arXivDaily: arXiv daily academic digest, updated Monday through Friday

1. Statistical Theory and Methods (28 papers)

2606.16941 2026-06-16 stat.ML cs.LG new submission

A nonparametric two-sample test using a parametric integral probability metric

Yuha Park, Yongdai Kim

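AI summary Proposes an integral probability metric built from a parametric discriminator class with a single neural-network node, yielding the nonparametric test statistic PReLU-IPM; proves consistency and asymptotic equivalence, and experiments show higher or comparable power in finite samples.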

Comments 45 pages. Accepted for publication in Statistical Analysis and Data Mining

Abstract

Detecting distributional differences between two independent samples is a fundamental problem in statistics and machine learning. Nonparametric two-sample testing provides a principled framework for determining whether two samples are drawn from the same underlying distribution, without assuming any specific parametric form for the distribution. In this study, we propose a new two-sample test statistic based on a newly introduced integral probability metric (IPM), using a specially designed parametric discriminator class with a single node of a neural network. We show that the resulting test statistic, called PReLU-IPM, is nonparametric and establish theoretical guarantees for the associated two-sample testing procedure, PReLU-TST, including its consistency and asymptotic equivalence to nonparametric IPM-based tests under regularity conditions. By analyzing multiple simulated and real benchmark datasets, we demonstrate that, in finite samples, PReLU-TST achieves higher power than its competitors across a range of alternatives or performs comparably to them.

2606.16913 2026-06-16 math.ST cs.NA math.NA stat.ML stat.TH new submission

Optimal Multiscale Learning of Linear Operators

Jiaheng Chen, Daniel Sanz-Alonso

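AI summary Studies the statistical and computational limits of learning bounded linear operators between Sobolev spaces from noisy input-output data; proposes a finite-resolution blockwise least-squares estimator that attains the minimax rate and achieves adaptive computational cost.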

Comments 48 pages, 2 figures

Abstract

We study the statistical and computational limits of learning bounded linear operators between Sobolev spaces from noisy input-output data. In wavelet coordinates, the problem is recast as an infinite-dimensional matrix regression problem with a heterogeneous two-sided multiscale structure. We establish minimax rates under Sobolev operator-norm loss and construct a finite-resolution blockwise least-squares estimator attaining these rates. The analysis reveals a nonuniform local estimation difficulty across scales, which can be exploited algorithmically: by assigning scale-adaptive sample sizes, the estimator achieves the optimal computational cost among dense least-squares implementations.

2606.16689 2026-06-16 stat.ME new submission

Statistical methods for assessing non-replicable, outlying, and influential studies

Yefeng Yang, Shinichi Nakagawa

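AI summary Clarifies the conceptual distinctions among non-replicability, statistical outlyingness, and study influence; reviews model diagnostics for detecting outlying and influential studies in meta-analysis; and discusses recent methodological refinements and practical recommendations.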

Abstract

Quantitative evidence synthesis has become a central tool for integrating findings across multiple studies, multi-centre trials, and multi-source cohort data. However, the identification and interpretation of non-replicable, outlying, and influential studies remain insufficiently addressed in practice, despite their potential to substantially affect the robustness and credibility of meta-analytic conclusions. In this paper, we clarify the conceptual distinctions between non-replicability, statistical outlyingness, and study influence, emphasizing that these concepts are related but not interchangeable. We then review the standard principles and procedures of model diagnostics for detecting outlying and influential studies in meta-analysis, together with their underlying statistical rationale. Building on recent methodological developments, we further discuss several practical and methodological refinements, including approaches for handling imprecise and correlated sampling variances, robust diagnostic procedures, and graphical tools for facilitating the identification and interpretation of unusual studies. Finally, we summarize recent advances in outlier and influence diagnostics and provide recommendations for the cautious interpretation and evaluation of studies identified as potentially non-replicable, outlying, or influential within meta-analytic frameworks.

2606.16289 2026-06-16 stat.ME math.ST stat.TH new submission

Moment-Free Kunchenko Stochastic Polynomials via Empirical Characteristic Function

Serhii Zabolotnii

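AI summary For settings where raw moments may not exist, reformulates Kunchenko's stochastic-polynomial construction through characteristic functions; proposes a geometry based on empirical characteristic functions on a fixed frequency grid and proves its well-definedness and almost sure consistency; introduces a minimum-CF-distance estimator and proves its identifiability, strong consistency, and asymptotic normality.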

Comments 19 pages. Lean 4 / Mathlib verification and numerical scripts: https://github.com/SZabolotnii/Ku_CF-code-supplement

Abstract

We give a characteristic-function formulation of Kunchenko's stochastic-polynomial construction for settings in which raw moments may fail to exist. In the finite-variance trigonometric case, the coefficients of the Kunchenko normal system are expressed through the characteristic function and its derivative. In the moment-free case, empirical characteristic functions on a fixed finite frequency grid define a bounded discrepancy geometry that remains meaningful for Cauchy, symmetric stable, and other heavy-tailed laws. We prove well-definedness and finite-grid almost sure consistency of this empirical characteristic-function geometry. We introduce the associated minimum-CF-distance estimator and establish its identifiability, strong consistency, and asymptotic normality on a fixed grid, with a covariance built from bounded trigonometric moments that stays finite even for Cauchy and stable laws; refining the grid increases the optimal-weight information monotonically to the Fisher information, so the estimator is asymptotically efficient in the dense-grid limit. We also relate bounded sine scores to weak stochastic-polynomial estimating equations. A small Lean 4 / Mathlib supplement checks selected deterministic identities underlying the bounded-score construction; convergence arguments and statistical interpretation remain outside the formalization.
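The boundedness that keeps the moment-free construction meaningful for heavy tails can be seen in a minimal Python sketch. It assumes only the standard definition of the empirical characteristic function, the sample average of exp(i t X_j). The names `sample_cauchy` and `ecf`, the sample size, and the three-point frequency grid are illustrative choices and are not taken from the paper or its code supplement; the estimator itself is not reproduced here.

```python
import cmath
import math
import random


def sample_cauchy(n, seed=0):
    """Draw n standard Cauchy variates by inverting the CDF (illustrative helper)."""
    rng = random.Random(seed)
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]


def ecf(xs, t):
    """Empirical characteristic function: the sample average of exp(i * t * x)."""
    return sum(cmath.exp(1j * t * x) for x in xs) / len(xs)


if __name__ == "__main__":
    xs = sample_cauchy(20000)
    for t in (0.5, 1.0, 2.0):  # hypothetical fixed frequency grid
        # Every term has modulus 1, so |ecf| <= 1 even though the Cauchy mean does not exist.
        # exp(-|t|) is the characteristic function of the standard Cauchy law.
        print(t, ecf(xs, t).real, math.exp(-abs(t)))
```

Because each summand lies on the unit circle, the real and imaginary parts are bounded trigonometric averages, which is the sense in which quantities built from them stay finite for Cauchy and stable laws.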

2606.16179 2026-06-16 math.ST stat.ML stat.TH new submission

On the Geometry of Separation in Finite Gaussian Mixtures

Huy Nguyen, Dung Le, Alessandro Rinaldo, Nhat Ho

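AI summary Studies how the minimum component separation affects convergence rates of parameter estimation in finite Gaussian mixtures; builds a unified geometric framework on Hellinger lower bounds, shows that separation complexity is driven by the spatial configuration of the components, and obtains separation-dependent convergence rates.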

Comments Huy Nguyen and Dung Le contributed equally to this work

Abstract

We study an open problem of understanding the effects of the minimum component separation on the convergence rates of parameter estimation in finite Gaussian mixtures. We address this by developing a unified geometric framework based on novel Hellinger lower bounds that directly relate discrepancies between mixture densities to Wasserstein distances between their underlying mixing measures, with explicit dependence on both the minimum separation and the minimum weight. Our approach combines carefully designed interpolation polynomials with confluent divided difference techniques to construct specialized moment-extraction test functions. When the number of components is known, these bounds uncover a localization phenomenon: the separation complexity is driven strictly by the spatial configuration of mixture components, namely, whether they are concentrated in a single cluster, partitioned into multiple clusters separated by a macroscopic gap, or arranged without any structural constraints. On the other hand, when the number of components is unknown and over-specified, the separation complexity is slightly reduced, while the minimum mixture weight disappears entirely from the convergence rates due to a transition from first-order to second-order Wasserstein geometry. As a consequence, we obtain separation-dependent convergence rates that continuously interpolate between point-wise and uniform estimation regimes, thereby settling the fundamental limits of parameter recovery in finite Gaussian mixtures.

2606.16089 2026-06-16 stat.ME math.ST stat.TH new submission

Wild bootstrap for mean response inference in functional linear regression models

Hyemin Yeon, Xiongtao Dai, Daniel Nordman

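AI summary In functional linear regression, residual bootstrap cannot handle heteroscedasticity and paired bootstrap is computationally costly; proposes a wild bootstrap that is both fast and widely applicable, together with a method for selecting truncation levels.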

Abstract

Functional regressors complicate inference in linear regression problems so that the bootstrap can play a useful role in quantifying uncertainty and calibrating intervals. The best bootstrap in practice, though, can depend on factors in the data as well as computational considerations and existing bootstraps can have limitations: residual bootstrap is computationally fast and simple but may fail when the errors are heterogeneous, while paired bootstrap applies more generally in functional linear regression at a cost of much higher computation. To bridge this gap, we develop a wild bootstrap method for functional linear regression, which is akin to a modified version of residual bootstrap but designed to have a wide scope of application like paired bootstrap, including to heteroscedastic errors. Its theoretical consistency is established and numerical studies suggest that wild bootstrap can provide accurate and computationally fast inference. Importantly, we also suggest a practical and effective approach of selecting truncation levels, specifically designed for mean response inference problems. The proposed bootstrap in functional linear regression is further illustrated through a weather data example, and an accompanying R package BTSinFLRM provides numerical implementations.
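The resampling idea behind a wild bootstrap can be sketched in a much simpler setting than the paper's. The Python below assumes an ordinary linear regression with a single scalar regressor and Rademacher (plus or minus one) multipliers; it does not handle functional regressors or truncation levels, and it is not the interface of the BTSinFLRM package. The function names and the default number of replicates are hypothetical.

```python
import random


def ols_fit(xs, ys):
    """Least-squares intercept and slope for one scalar regressor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - slope * x_bar, slope


def wild_bootstrap_mean_response(xs, ys, x0, n_boot=200, seed=0):
    """Return the fitted mean response at x0 and its wild-bootstrap replicates."""
    rng = random.Random(seed)
    intercept, slope = ols_fit(xs, ys)
    fitted = [intercept + slope * x for x in xs]
    residuals = [y - f for y, f in zip(ys, fitted)]
    replicates = []
    for _ in range(n_boot):
        # Each residual keeps its own magnitude and only its sign is randomized,
        # so observation-specific error scales (heteroscedasticity) are preserved.
        y_star = [f + r * rng.choice((-1.0, 1.0)) for f, r in zip(fitted, residuals)]
        b0, b1 = ols_fit(xs, y_star)
        replicates.append(b0 + b1 * x0)
    return intercept + slope * x0, replicates
```

Unlike a residual bootstrap, residuals are never moved between observations; unlike a paired bootstrap, the design points stay fixed, so quantities that depend only on the regressors need not be recomputed for every replicate.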

2606.16058 2026-06-16 stat.ME new submission

Jeffreys-Type Penalized GEE for Correlated Binary Data with an Odds-Ratio Parameterization

Anestis Touloumis

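AI summary Addresses the failure of GEE under separation in correlated binary data with a PGEE framework that combines a Jeffreys-prior penalty and a marginalized odds-ratio parameterization, yielding finite estimates under working independence and improving convergence; simulations and a real example show that it outperforms ordinary GEE.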

Abstract

Generalized estimating equations (GEE) are widely used for population-averaged inference on correlated binary responses, but ordinary GEE can fail under separation, a situation that is more likely in small-sample, sparse, or rare-event settings, leading to nonconvergence, infinite or extreme estimates, and unreliable inference. Existing penalized GEE (PGEE) approaches mitigate some of these problems but do not generally guarantee finite estimates under nonindependence working structures and often rely on correlation-coefficient parameterizations whose admissible range shrinks as fitted probabilities approach zero or one, forcing the working association toward independence under separation. We propose a PGEE framework that combines a Jeffreys-prior penalty with marginalized odds-ratio working parameterizations. The odds-ratio parameterization avoids this failure, while the penalty, with tunable strength $δ$ and default $δ= 1/2$, stabilizes estimation under separation. Under working independence, PGEE reduces to the Jeffreys-prior penalized maximum-likelihood estimator, yielding finite estimates for logit, probit, complementary log-log, and cauchit links. Under nonindependence odds-ratio structures, where a formal finiteness guarantee is unavailable, PGEE achieves near-complete empirical convergence even in separated settings. We also propose one-step and hybrid variants, OPGEE and HPGEE, that reduce computational cost. Simulations show that all three variants substantially outperform ordinary GEE under separation while retaining the performance of ordinary GEE in regular settings. We illustrate the method using a respiratory-illness trial in which ordinary GEE fails, and provide an implementation in the R package geer.

2606.16043 2026-06-16 stat.ME new submission

Bias-Reduced GEE via Adjusted Estimating Equations, with Odds-Ratio Extensions

Anestis Touloumis

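AI summary For small-sample correlated data, proposes a class of generalized estimating equation (GEE) estimators that achieve first-order bias reduction by adjusting the estimating equations, comprising six bias-reduced and bias-corrected estimators, with an extension to odds-ratio parameterizations for correlated binary data.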

Abstract

Generalized estimating equations (GEE) are widely used for correlated data, but with small to moderate numbers of independent clusters the ordinary GEE regression estimators can be substantially biased. We develop a first-order bias-reduction principle for GEE by viewing the estimator as a clustered-data $M$-estimator and deriving an adjustment to the estimating equations that targets the leading bias term while accounting for the dependence of the working covariance on the mean parameters. The resulting class includes three bias-reduced estimators and three one-step bias-corrected analogs, nesting the bias-corrected estimator of Lunardon and Scharfstein (2017) and the bias-reduced and bias-corrected estimators of Paul and Zhang (2014) as special cases. The framework applies to general response types through correlation-coefficient parameterizations for the association structure and extends to correlated binary data through pairwise odds-ratio parameterizations, yielding the first bias-reduced and bias-corrected GEE estimators under this parameterization, for which the marginal-mean compatibility constraints are far less restrictive than those of correlation-coefficient parameterizations, making them better suited for small-sample settings. Under standard regularity conditions, all six estimators share the same asymptotic distribution as the ordinary GEE. Simulation studies show that the proposed estimators reduce bias while maintaining efficiency and coverage close to those of ordinary GEE across a range of settings, and a clinical trial analysis illustrates the proposed estimators in practice. Software is available in the R package geer.

2606.16013 2026-06-16 cond-mat.dis-nn cs.LG physics.data-an stat.ML new submission

The limits of interpretability in multiple linear regression

Anand Sharma, Chen Liu, Daniele Coslovich, Misaki Ozawa

Affiliations Indian Institute of Science Education and Research; Innovation and Research Division, Ge-Room Inc.; Dipartimento di Fisica, Università di Trieste; Univ. Grenoble Alpes, CNRS, LIPhy

AI summary By analyzing the eigenmodes of the feature correlation matrix, explains theoretically how multicollinearity makes linear-regression weights unstable and oscillatory, causing a loss of interpretability, and verifies that ridge regression mitigates the effect.

Comments 23 pages, 8 figures

Abstract

Interpreting machine-learning models has attracted increasing attention, particularly in the physical sciences, where one often seeks to understand the underlying mechanisms rather than merely make predictions. Multiple linear regression is often regarded as an interpretable alternative to more complex models, such as deep neural networks, because its predictions are expressed as explicit weighted sums of input features. However, when input features are strongly correlated, namely in the presence of multicollinearity, the learned weights can exhibit large dataset-to-dataset fluctuations and oscillatory behavior across physically similar features, making their interpretation difficult or even impossible. Although the instability of the weights under multicollinearity is well known in statistics, its consequences for physical interpretation, in particular its connection to oscillatory weights across physically similar features, have not been systematically clarified. Here, we theoretically discuss the mechanism behind this loss of interpretability by analyzing the eigenmodes of the feature correlation matrix. We show that small-eigenvalue modes associated with multicollinearity amplify fluctuations in the weights and generate oscillatory patterns that do not necessarily reflect meaningful contributions. We test this theoretical picture numerically on physics datasets and show that Ridge regularization suppresses these unstable modes, although the resulting weights must still be interpreted with caution. We further confirm the generality of our findings beyond physics by analyzing a diverse collection of publicly available datasets. Our results clarify why, in the presence of multicollinearity, physical interpretation can remain difficult even for linear regression models.
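The eigenmode mechanism described in the abstract can be illustrated with the smallest possible case, two standardized features with correlation r. This sketch relies on textbook least-squares algebra rather than on the paper's derivation: the feature correlation matrix [[1, r], [r, 1]] has eigenvalues 1 + r and 1 - r, and the weight variance along a mode with eigenvalue lam is proportional to lam / (lam + ridge)^2, where the ridge penalty is assumed to be scaled per sample. The function name and mode labels are hypothetical.

```python
def weight_variance_by_mode(r, ridge=0.0):
    """Weight variance along each eigenmode, in units of sigma^2 / n.

    The in-phase mode (1, 1) has eigenvalue 1 + r; the alternating mode (1, -1)
    has eigenvalue 1 - r, which becomes small when the two features are strongly
    correlated. With ridge = 0 the expression reduces to 1 / eigenvalue
    (ordinary least squares).
    """
    eigenvalues = {"in_phase": 1.0 + r, "alternating": 1.0 - r}
    return {name: lam / (lam + ridge) ** 2 for name, lam in eigenvalues.items()}


if __name__ == "__main__":
    print(weight_variance_by_mode(0.99))             # alternating mode is amplified
    print(weight_variance_by_mode(0.99, ridge=0.1))  # ridge damps the alternating mode
```

The amplified mode is the one in which the two weights move in opposite directions, which matches the abstract's point that weights on similar features can oscillate without reflecting meaningful contributions.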

2606.15836 2026-06-16 math.ST stat.ME stat.TH new submission

Minimax Synthesis of Network Mechanisms

Marios Papamichalis, Regina Ruane

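AI summary For a single observed network that reflects several mechanisms, proposes estimating from the graph both each mechanism's contribution and the way the mechanisms combine; a bias correction gives valid inference, and a density threshold determines when the combination rule is identifiable.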

Comments Under Review

Abstract

A single observed network reflects several mechanisms at once: communities, hubs, and clustering coexist in one graph, each a different model. We treat the network as a combination of candidate mechanisms and study, from a single graph, how strongly each mechanism contributes and how they combine. We address two questions. The first is how to measure each mechanism's contribution when the mechanisms must themselves be estimated from the graph: fitting the mechanisms and their strengths from the same data biases the strengths toward zero, and a correction removes this bias and yields valid confidence intervals. The second is whether the rule of combination is itself recoverable: when a graph is generated by two mechanisms acting together, the graph alone determines whether they combine additively or interact, exactly when the graph is dense enough, a sharp threshold below which no test can decide. The estimate calibrates the candidate mechanisms against the observed edges. We establish matching minimax rates, against both a known-design benchmark and the estimated-design problem itself, confirm the methods in simulation, and apply them to real networks, where the signed coefficients recover known structure and, in one case, a confidence interval excludes any positive contribution from a candidate mechanism.

2606.15526 2026-06-16 stat.ME new submission

Latent Variable Models for Distributional Features

Luna Fazio, Paul-Christian Bürkner

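AI summary Proposes Distributional Feature Latent Variable Models (DFLVM), which model individual differences in distributional features as random intercepts and estimate their associations with downstream outcomes in a single step, avoiding the bias of two-step procedures.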

Abstract

Analyzing the mean response of study subjects in psychological research is a standard, well-justified practice. However, theoretical arguments and empirical evidence also suggest that there is value in investigating other aspects of the distribution of such responses, such as their variability or skewness. A particular challenge that practitioners face is statistical modeling of associations between distributional features and other outcomes of interest. The most common approach is to perform estimation in two steps: distributional features are estimated first, and then those estimates are used as predictors for the relevant outcomes. Such an approach is most amenable to implementation in standard statistical software, but it ignores estimation error and can therefore lead to biased estimates and increased error rates. We introduce Distributional Feature Latent Variable Models (DFLVM), a general framework that represents between-person difference in distributional features as random intercepts. These intercepts can be simultaneously used as predictors for downstream outcomes and their associations estimated in a single estimation step. We compare the performance of our approach against two-step procedures in a simulation study and through a re-analysis of a real dataset.

2606.15450 2026-06-16 stat.ME eess.SP new submission

Kernel Density Estimation by Spectral Decomposition: Data-Driven Tapering and Superposition

Mitchell A. Thornton

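AI summary Performs bandwidth selection and density estimation in the characteristic-function domain; an automatic selector, an adaptive Wiener estimator, and a superposition with a Gaussian mixture match or outperform fixed bandwidths on a range of densities and take the lead at large sample sizes.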

Comments v1: 23 pp., 22 figs

Abstract

Kernel density estimation depends largely on one choice, the smoothing bandwidth. We treat bandwidth selection and density estimation in the characteristic-function domain, where the cyclic group-averaged covariance of the binned data has the squared empirical characteristic function as its spectrum: the true characteristic function sits over a sampling-noise floor of $1/n$, and the bandwidth is the spectral cutoff where the two meet. Several methods follow. An automatic selector strips the floor and minimizes a frequency-domain error criterion, matching the rule of thumb on smooth densities and approaching the best fixed bandwidth on multimodal ones. An adaptive estimator generalizes the fixed kernel to the per-frequency optimal Wiener taper, matching or surpassing the best fixed bandwidth on most standard densities, including sharply peaked and comb-like cases where fixed bandwidths fail; deconvolution under known measurement error follows in the same domain. Because the Wiener estimator resolves sharp structure but does not fit smooth bases as economically as a mixture, a Gaussian mixture is combined with it two ways, a piecewise partition and a superposition of a smooth base and a band-limited residual, the default. A data-driven floor read from the spectrum replaces the assumed $1/n$ floor and stays robust on heaped and rounded data. On the Marron-Wand benchmark scored by exact integrated squared error, the advantage emerges with sample size, a bias-variance tradeoff: the spectral estimators carry low bias but pay in variance, so cross-validation leads at $n=100$ while the Wiener filter and superposition take the top two ranks at $n=5000$. The methods are validated on six real datasets (CRSP returns, NHANES self-reports, CMS dimuon and SDSS spectra, a random-beacon stream, and UNSW-NB15 traffic) and on a synthetic-data quality check. All experiments are reproducible.

2606.15433 2026-06-16 math.ST econ.EM stat.ME stat.TH new submission

Limit theorems of Azadkia-Chatterjee's conditional graph correlation

Muhong Gao, Fang Han, Qizhai Li

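AI summary Proves asymptotic normality of the estimator $T_n$ of the Azadkia-Chatterjee conditional dependence measure, gives a closed-form expression for the limiting variance, and constructs computationally efficient variance estimators, completing its inferential theory.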

Comments 87 pages

Abstract

Inferring the strength of conditional dependence and testing conditional independence are fundamental problems in statistics. A recent breakthrough by Azadkia and Chatterjee introduced, for the first time, a conditional dependence measure that equals $0$ if and only if the variables under study are conditionally independent, and equals $1$ if and only if they are conditionally perfectly dependent. They further proposed a computationally efficient and strongly consistent estimator, $T_n$, based on an ingenious use of ranks and nearest neighbors. Despite these attractive features, the asymptotic theory of $T_n$ has remained largely undeveloped. This paper closes that gap. We prove that, under general dependence, $T_n$ is asymptotically normal and its limiting variance admits a closed form. We also construct consistent variance estimators that are computationally efficient and implementable in $O(n\log n)$ time. Taken together with existing bias-correction methods, these results provide a complete inferential theory for $T_n$.

2606.15393 2026-06-16 stat.ML cs.LG stat.ME new submission

Finite Resources False Discovery Rate Control in Structured Hypothesis Spaces

Binyamin Perets, Shie Mannor

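Affiliations Technion – Israel Institute of Technology; NVIDIA

AI summary For finitely many null-distribution samples and structured hypothesis spaces, proposes a reproducing-kernel framework with two decision rules that trade off exact FDR control against statistical power, and optimizes the allocation of resources.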

Abstract

Scientific discovery relies on large-scale hypothesis testing. However, the capacity to identify true discoveries while controlling false discovery faces major challenges: obtaining relevant reference data (the null distribution) is resource-intensive, leaving finite-data uncertainty, and the procedure should account for the inherent structure in the hypothesis space, when such structure exists. Here, we present a framework for controlling the false discovery rate both when each hypothesis is evidenced only by a finite count of null draws, leaving its p-value uncertain, and when the hypothesis space carries arbitrary structure, requiring only that the structure be represented through a suitable reproducing kernel. We present two decision rules that are both robust to structural mis-specification, yet offer a distinct trade-off between exact FDR control and statistical power. The first rule guarantees exact FDR control; the second maximizes power by adapting mirror-statistic control into count space, utilizing an analytical framework to assess FDR control when exact mirror symmetry is relaxed. Furthermore, the tractability gained by the RKHS framework allows us to directly investigate finite-data uncertainties, which we leverage to suggest a policy for the efficient allocation of null distribution samples.

2606.15343 2026-06-16 eess.SP math-ph math.MP stat.ME new submission

Generalized likelihood ratio test for magnetic anomaly detection: a geometrical approach

C. Chenevas-Paule, S. Zozor, L. -L. Rouve, O. J. J. Michel, O. Pinaud, R. Kukla

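AI summary For magnetic anomaly detection, proposes a generalized likelihood ratio test that constrains the signal parameters to a semi-algebraic space (a cone under the dipole model), improving detection performance; numerical simulations show that it outperforms existing methods and approaches the optimal receiver.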

Abstract

State-of-the-art approaches to magnetic anomaly detection rely on the generalized likelihood ratio test (GLRT). These approaches are based on the formulation of a parametric model of the source to be detected, expressed in a suitable functional basis. One of the primary objectives of this study is to demonstrate that, for a given measurement configuration, the signal is constrained to evolve within a restricted subset of the space generated by these functional bases. The parametric representation of the signal is identified as a semi-algebraic space which, for the dipole model used in this article, turns out to be a cone outside of which the estimated signal does not satisfy the physical equations. Thus, a second objective is to exploit this property to constrain the signal parameters in the GLRT to belong to the semi-algebraic space, in order to improve detection performance. The performance of the proposed algorithm is compared with that of conventional approaches; numerical simulations show that the proposed approach not only outperforms state-of-the-art methods but can even provide results close to those of the clear-seeing (optimal) receiver.

2606.15237 2026-06-16 stat.ME new submission

Optimized Sequential Testing for Binary Ensemble Classifiers

Joseph Kalman, Amit Moscovich

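AI summary Proposes a sequential testing approach that cuts the computational cost of binary ensemble classifiers by stopping base-model evaluation early while controlling the rate of disagreement with the full ensemble; optimal stopping strategies are found by linear programming.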

Comments 33 pages, 5 figures

Abstract

Ensemble classifiers are predictive models that combine the results of simpler base models, often by majority vote. A classic example is random forests, which combine the predictions of decision trees. Ensembles that use more base models can be more accurate but also more costly to train and run. In this paper, we consider strategies for reducing the computational cost of binary classification using an approach from the field of sequential testing. Rather than evaluating all the base models and taking a majority vote, we evaluate the base models sequentially and stop execution when a clear majority emerges. We consider three different notions of optimality for early-stopping strategies that minimize the number of base models executed while controlling the rate of disagreement with the full ensemble. For each notion of optimality and allowable disagreement rate, we show that a linear program can be constructed and solved efficiently to find the optimal stopping strategy. We tested these methods on real-world datasets taken from the UC Irvine Machine Learning repository, and on the benchmark datasets proposed by Grinsztajn et al. We found that on most datasets, these methods provide speed-ups of 4x or more while controlling disagreement at 0.1%.
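The simplest member of this family of stopping rules can be sketched as follows. The Python below stops only when the remaining base models can no longer change the majority, so it never disagrees with the full ensemble. This is an assumption made for illustration: the strategies in the paper allow a small disagreement rate and are obtained from a linear program, which is not reproduced here. All function names are hypothetical, and ties in an even-sized ensemble are assumed to resolve to class 0.

```python
def make_threshold_models(thresholds):
    """Toy base models: each votes 1 when the input exceeds its threshold."""
    return [lambda x, t=t: int(x > t) for t in thresholds]


def full_vote(base_models, x):
    """Majority vote over all base models (a tie counts as class 0)."""
    return int(2 * sum(model(x) for model in base_models) > len(base_models))


def early_stop_vote(base_models, x):
    """Evaluate base models in order and stop once the majority is decided.

    Returns (predicted_class, number_of_models_evaluated).
    """
    n = len(base_models)
    ones_needed = n // 2 + 1   # votes for class 1 that guarantee a strict majority
    zeros_needed = n - n // 2  # votes for class 0 after which class 1 cannot win
    ones = 0
    for used, model in enumerate(base_models, start=1):
        ones += model(x)
        if ones >= ones_needed:
            return 1, used
        if used - ones >= zeros_needed:
            return 0, used
    return 0, n  # not reached for a non-empty ensemble; kept as a safe default


if __name__ == "__main__":
    models = make_threshold_models([0.1, 0.2, 0.3, 0.4, 0.5])
    print(early_stop_vote(models, 0.9))   # decided after three unanimous votes
    print(early_stop_vote(models, 0.25))  # a close vote needs every model
```

Allowing a nonzero disagreement rate, as the abstract describes, lets a rule stop before the outcome is mathematically certain, which is where the larger speed-ups would come from.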

2606.15097 2026-06-16 stat.ME stat.AP 新提交

Separate versus pooled winsorization for group mean contrasts: a finite-sample theory

分组均值对比的单独与合并截尾处理:有限样本理论

Chao Cheng, Chenshan Hu, Yukai Huang

AI总结 针对重尾数据的分组均值对比,证明合并截尾无法达到次高斯率,而单独截尾可达到,且偏差更小、集中性更好,建议在组内而非合并后截尾。

AI中文摘要

比较分组均值是许多统计领域的基础,包括双样本研究、随机试验和双重差分设计,但重尾结果会使传统估计量不稳定。一种常见的补救措施是在估计目标均值对比之前对数据进行截尾处理。主要方法——合并截尾——从所有组的合并样本中计算截尾阈值,而很少使用的替代方法——单独截尾——则在每组内计算阈值。我们研究了这两种截尾策略的有限样本偏差界,并证明了一个不可能结果:没有确定性的规则可以选择合并截尾水平以达到次高斯率。相比之下,单独截尾达到了这一速率,并且该保证扩展到分组均值的一般线性对比。模拟研究证实,合并截尾可能具有显著偏差,而单独截尾几乎无偏且围绕真实值集中。这些结果支持一个简单的建议:在每组内而非合并后进行截尾。

英文摘要

Comparing group means is foundational to many statistical areas, including two-sample studies, randomized trials, and difference-in-differences designs, yet heavy-tailed outcomes can make conventional estimators unstable. A common remedy is to winsorize the data before estimating the target mean contrast. The dominant approach, pooled winsorization, computes winsorization thresholds from the combined sample across all groups, while the rarely used alternative, separate winsorization, computes them within each group. We study finite-sample deviation bounds for these two winsorization strategies, and we prove an impossibility result: no deterministic rule for selecting the pooled winsorization level can attain the sub-Gaussian rate. In contrast, separate winsorization attains this rate, and the guarantee extends to general linear contrasts of group means. Simulation studies confirm that pooled winsorization can have substantial bias, while separate winsorization remains nearly unbiased and concentrates well around the truth. These results support a simple recommendation: winsorize within each group rather than after pooling.
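The difference between the two strategies can be seen in a toy example. The sketch below winsorizes by clipping at order statistics (the `k` most extreme values in each tail are pulled in), once with thresholds computed within each group and once with thresholds from the pooled sample. The helper names and the order-statistic thresholding rule are assumptions for illustration; the paper's threshold selection and its finite-sample theory are not reproduced here.

```python
from typing import List, Sequence, Tuple


def thresholds(values: Sequence[float], k: int) -> Tuple[float, float]:
    """Lower and upper clipping points: the (k+1)-th smallest and largest values."""
    ordered = sorted(values)
    return ordered[k], ordered[len(ordered) - 1 - k]


def clipped_mean(values: Sequence[float], lo: float, hi: float) -> float:
    """Mean after clipping every value into [lo, hi]."""
    clipped: List[float] = [min(max(v, lo), hi) for v in values]
    return sum(clipped) / len(clipped)


def separate_contrast(a: Sequence[float], b: Sequence[float], k: int) -> float:
    """mean(b) - mean(a), winsorizing each group with its own thresholds."""
    return clipped_mean(b, *thresholds(b, k)) - clipped_mean(a, *thresholds(a, k))


def pooled_contrast(a: Sequence[float], b: Sequence[float], k: int) -> float:
    """mean(b) - mean(a), winsorizing both groups with pooled thresholds."""
    lo, hi = thresholds(list(a) + list(b), k)
    return clipped_mean(b, lo, hi) - clipped_mean(a, lo, hi)


group_a = [0.0, 1.0, 2.0, 3.0, 4.0]       # mean 2
group_b = [10.0, 11.0, 12.0, 13.0, 14.0]  # mean 12, true contrast 10
sep = separate_contrast(group_a, group_b, k=1)  # 20% per tail within each group
pool = pooled_contrast(group_a, group_b, k=2)   # 20% per tail of the pooled sample
```

In this example the pooled thresholds fall inside the two groups, so clipping pulls the group means toward each other and shrinks the contrast, while within-group clipping leaves the contrast of these symmetric groups unchanged.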

2606.14921 2026-06-16 stat.ME 新提交

Flexible Method Comparison with the Probability of Agreement

灵活的方法比较:基于一致概率

Nathaniel T. Stevens

AI总结 提出基于一致概率(PoA)的灵活推断框架,放宽了先前实现中的若干限制性假设,从而提高方法比较的适用性,并用tPSA测量示例和模拟验证。

AI中文摘要

测量方法的比较是临床实践中的常见问题;随着新方法的发展,建立它们与现有方法的一致性至关重要。一致概率(PoA)先前已被提出作为一种直观且信息丰富的手段来评估两种测量方法之间的一致性。它直接量化了不同方法对同一受试者的两次测量在临床上无法区分的可能性。在本文中,我们通过开发一个推断框架来彻底改革和扩展PoA方法,该框架放宽了先前实现中做出的几个限制性假设,最终提高了其在更广泛应用中的实用性。我们通过一个比较总前列腺特异性抗原(tPSA)测量方法的示例来说明这种更灵活的方法。并通过模拟彻底研究了其性能。这项工作极大地提高了PoA方法在方法比较中的灵活性、可用性,从而提高了其影响力。

英文摘要

The comparison of methods of measurement is a common problem in clinical practice; as novel methods are developed, establishing their agreement with existing methods is crucial. The probability of agreement (PoA) has previously been proposed as an intuitive and informative means of assessing agreement between two methods of measurement. It straightforwardly quantifies the likelihood that two measurements by different methods on the same subject are clinically indistinguishable. In this paper, we overhaul and extend the PoA methodology by developing an inference framework that relaxes several restrictive assumptions made in previous implementations, ultimately increasing its utility in a wider range of applications. We illustrate this more flexible methodology in an example that compares methods of measuring total prostate-specific antigen (tPSA), and we thoroughly investigate its performance via simulation. This work dramatically increases the flexibility, availability, and hence impact of the PoA approach for method comparison.
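As a rough illustration of what the PoA measures, the sketch below computes the probability that the difference between two measurements falls within a tolerance `delta`, under the simplifying assumption that this difference is normally distributed with known mean and standard deviation. The normality assumption, the symbol `delta` (a clinically acceptable difference) and the function names are all assumptions of this sketch; the framework in the abstract is described as relaxing restrictive assumptions, so this should not be read as the paper's estimator.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probability_of_agreement(mean_diff: float, sd_diff: float, delta: float) -> float:
    """P(|D| <= delta) for a difference D ~ Normal(mean_diff, sd_diff**2)."""
    if sd_diff <= 0 or delta < 0:
        raise ValueError("sd_diff must be positive and delta non-negative")
    upper = (delta - mean_diff) / sd_diff
    lower = (-delta - mean_diff) / sd_diff
    return normal_cdf(upper) - normal_cdf(lower)
```

Under this model, systematic bias between the methods (a nonzero `mean_diff`) or larger measurement noise (a larger `sd_diff`) both lower the probability of agreement for a fixed `delta`.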

2606.14837 2026-06-16 stat.ME 新提交

Bartlett adjustment for Gaussian random effects meta-analysis

高斯随机效应元分析的Bartlett调整

Haben Michael

AI总结 针对元分析中研究数量少导致渐近方法失效的问题,推导了高斯随机效应模型的Bartlett校正,修正了文献中的公式。

AI中文摘要

元分析通常基于太少的研究,无法证明应用于它们的统计程序所依赖的渐近方法是合理的。我们考虑高阶渐近作为一种补救措施。我们推导了理想化高斯情况下的Bartlett校正,修正了当前文献中出现的公式。

英文摘要

Meta-analyses are often based on too few studies to justify the asymptotic methods underlying the statistical procedures applied to them. We consider higher-order asymptotics as a remedy. We derive the Bartlett correction for the idealized Gaussian case, correcting the formula currently appearing in the literature.

2605.13092 2026-06-16 stat.ML cs.LG stat.ME 版本更新

Adaptive Kernel Density Estimation with Pre-training

具有预训练的自适应核密度估计

Ruitong Zhang, Ke Deng

发表机构 * Department of Statistics and Data Science, Tsinghua University(统计与数据科学系,清华大学)

AI总结 本文提出利用预训练技术提升高维下自适应核密度估计效率,通过神经网络推荐合适核函数,实验证明在目标分布接近预训练分布时效果显著。

AI中文摘要

高维密度估计是一个重要且具有挑战性的统计问题。传统基于核平滑的方法在高维中效率低下,因难以指定合适的位置自适应核。本文将预训练技术引入非参数密度估计中,通过建立预训练神经网络为每个样本点推荐合适的位置自适应核,实现高维高效密度估计。大量数值实验表明,当目标分布接近预训练分布族时,该策略能显著提升密度估计精度。当目标分布与预训练分布族差异较大时,预训练策略的益处可能减弱,但可通过额外的微调过程重新激活。

英文摘要

Density estimation in high-dimensional settings is an important and challenging statistical problem. Traditional methods based on kernel smoothing are inefficient in high dimensions due to the difficulties in specifying appropriate location-adaptive kernels. In this work, we introduce pre-training, a key idea behind many cutting-edge AI technologies, to the context of non-parametric density estimation. By establishing a pre-trained neural network that can recommend an appropriate location-adaptive kernel for each sample point, efficient density estimation with adaptive kernels is achieved in high dimensions. A wide range of numerical experiments show that this strategy is highly effective for improving density-estimation accuracy when the target distribution is close to the distribution family for pre-training. When the target distribution is substantially different from the pre-training distribution family, the benefit from the proposed pre-training strategy may be diluted, but can be reactivated by an additional fine-tuning procedure.

2604.26819 2026-06-16 math.PR cs.IT math.IT math.ST stat.ML stat.TH 版本更新

Sharp One-Dimensional Sub-Gaussian Comparison in Convex Order

凸序下的尖锐一维次高斯比较

Yihan Zhang

AI总结 证明若随机变量X的矩生成函数被标准正态分布G点态上界,则X在凸序下被G/𝔼[|G|]控制,且该结果由均匀分布和绝对值函数证明是最优的。

AI中文摘要

我们证明,任何矩生成函数被$G \sim \mathcal{N}(0,1)$的矩生成函数点态上界的随机变量$X$,在凸序下必须被$G/\mathbb{E}[|G|]$控制,即对所有凸函数$f$,有$\mathbb{E}[f(X)] \le \mathbb{E}[f(G/\mathbb{E}[|G|])]$。这一结果是最优的,由$X \sim \mathrm{Unif}(\{-1,1\})$和$f(x)=|x|$所证实。

英文摘要

We prove that any random variable $X$ whose moment generating function is point-wise upper bounded by that of $ G \sim \mathcal{N}(0,1) $ must be dominated by $ G/\mathbb{E}[|G|] $ in convex order, meaning $ \mathbb{E}[f(X)] \le \mathbb{E}[f(G/\mathbb{E}[|G|])] $ for all convex $f$. This is sharp as witnessed by $ X \sim \mathrm{Unif}(\{-1,1\}) $ and $ f(x) = |x| $.
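The sharpness example can be checked by hand using standard facts, namely the series for $\cosh$ and the first absolute moment of a standard normal. The following is a verification sketch and is not taken from the paper's proof: $X \sim \mathrm{Unif}(\{-1,1\})$ satisfies the moment generating function hypothesis, and for $f(x)=|x|$ the convex-order inequality holds with equality, so the scaling $1/\mathbb{E}[|G|]$ cannot be reduced.

```latex
\begin{align*}
\mathbb{E}\bigl[e^{tX}\bigr]
  &= \cosh t
   = \sum_{k \ge 0} \frac{t^{2k}}{(2k)!}
   \le \sum_{k \ge 0} \frac{t^{2k}}{2^{k}\, k!}
   = e^{t^{2}/2}
   = \mathbb{E}\bigl[e^{tG}\bigr],
  && \text{since } (2k)! \ge 2^{k}\, k!, \\
\mathbb{E}[|G|] &= \sqrt{2/\pi}, \\
\mathbb{E}[f(X)] &= \mathbb{E}[|X|] = 1
   = \frac{\mathbb{E}[|G|]}{\mathbb{E}[|G|]}
   = \mathbb{E}\bigl[f\bigl(G/\mathbb{E}[|G|]\bigr)\bigr].
\end{align*}
```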

2412.17470 2026-06-16 math.ST econ.EM stat.ME stat.TH 版本更新

A Necessary and Sufficient Condition for Size Controllability of Heteroskedasticity Robust Test Statistics

异方差稳健检验统计量尺寸可控性的一个充要条件

Benedikt M. Pötscher, David Preinerstorfer

AI总结 针对回归模型中单个约束检验,给出了异方差稳健检验统计量尺寸可控性的充要条件,改进了现有仅充分条件的结果。

Comments Two footnotes added

AI中文摘要

我们重新审视了Pötscher和Preinerstorfer (2025)中关于回归模型中异方差稳健检验统计量的尺寸可控性结果。对于检验单个约束(例如,单个系数的零约束)这一特殊但重要的情形,我们给出了尺寸可控性的一个充要条件,而Pötscher和Preinerstorfer (2025)中的条件通常仅是充分的(即使在检验单个约束的情形下)。

英文摘要

We revisit size controllability results in Pötscher and Preinerstorfer (2025) concerning heteroskedasticity robust test statistics in regression models. For the special, but important, case of testing a single restriction (e.g., a zero restriction on a single coefficient), we provide a necessary and sufficient condition for size controllability, whereas the condition in Pötscher and Preinerstorfer (2025) is, in general, only sufficient (even in the case of testing a single restriction).

2602.17587 2026-06-16 math.ST cs.LG stat.ML stat.TH 版本更新

Asymptotically Optimal Sequential Testing with Markovian Data

马尔可夫数据的渐近最优序贯检验

Alhad Sethi, Kavali Sofia Sagar, Shubhada Agrawal, Debabrota Basu, P. N. Karthik

发表机构 * Indian Institute of Science, Bangalore(班加罗尔印度科学学院) Indian Institute of Technology, Hyderabad(海得拉巴印度理工学院) Univ. Lille, Inria, CNRS, Centrale Lille, UMR 9189 – CRIStAL(里尔大学、法国国家科学研究中心、中央里尔学院、UMR 9189 – CRIStAL)

AI总结 针对遍历有限状态马尔可夫链生成的数据,提出一种渐近最优的序贯假设检验方法,其期望停止时间与实例相关的下界渐近匹配,并应用于马尔可夫链蒙特卡洛模型误设检测和马尔可夫决策过程结构性质检验。

Comments ICML 2026

AI中文摘要

我们研究了由遍历有限状态马尔可夫链生成的数据的单侧和α-正确序贯假设检验。原假设是未知转移矩阵属于随机矩阵的指定集合P,备择假设对应于不相交的集合Q。我们建立了备择假设下任何有效序贯检验的期望停止时间的非渐近实例相关下界,该下界是渐近紧的。我们的新分析改进了现有下界,这些下界在此设置中要么是渐近的,要么被证明是次优的。我们的下界同时包含了由未知马尔可夫链诱导的平稳分布和转移结构。我们进一步提出了一种最优检验,其期望停止时间在α→0时渐近匹配该下界。我们通过应用该框架到马尔可夫链蒙特卡洛中模型误设的序贯检测以及马尔可夫决策过程中转移动力学的线性等结构性质的检验,说明了我们框架的实用性。我们的发现给出了马尔可夫依赖下最优序贯检验程序的尖锐且一般的刻画。

英文摘要

We study one-sided and $α$-correct sequential hypothesis testing for data generated by an ergodic, finite-state Markov chain. The null hypothesis is that the unknown transition matrix belongs to a prescribed set $P$ of stochastic matrices, and the alternative corresponds to a disjoint set $Q$. We establish a non-asymptotic instance-dependent lower bound on the expected stopping time of any valid sequential test under the alternative, which is asymptotically tight. Our novel analysis improves the existing lower bounds, which are either asymptotic or provably sub-optimal in this setting. Our lower bound incorporates both the stationary distribution and the transition structure induced by the unknown Markov chain. We further propose an optimal test whose expected stopping time matches this lower bound asymptotically as $α\to 0$. We illustrate the usefulness of our framework through applications to sequential detection of model misspecification in Markov Chain Monte Carlo and to testing structural properties, such as the linearity of transition dynamics, in Markov decision processes. Our findings yield a sharp and general characterization of optimal sequential testing procedures under Markovian dependence.

2602.05807 2026-06-16 stat.ME 版本更新

SpARCD: A Spectral Graph Framework for Revealing Differential Functional Connectivity in fMRI Data

SpARCD:一种揭示fMRI数据中差异功能连接的谱图框架

Shira Yoffe, Ziv Ben-Zion, Guy Gurevitch, Talma Hendler, Malka Gorfine, Ariel Jaffe

AI总结 提出SpARCD框架,利用距离相关和谱滤波检测两种实验条件下脑连接差异,通过置换检验得到区域级显著性图,在复杂依赖结构中优于传统方法。

AI中文摘要

识别在不同认知或情绪状态下表现出功能连接改变的脑区域是神经科学中的一个关键问题。现有方法,如边检验、基于种子的心理生理交互(PPI)分析或相关网络比较,通常存在统计功效低、阈值任意以及捕获分布式或非线性依赖模式能力有限的问题。我们提出SpARCD(揭示连接差异的谱分析),一种用于检测两种实验条件下脑连接差异的新统计框架。SpARCD利用距离相关(一种对线性和非线性关联都敏感的依赖度量)为每种条件构建加权图。然后通过谱滤波构建微分算子,并计算其前导特征向量来揭示连接变化。通过基于置换的检验方案实现推断,该方案生成可解释的区域级显著性图。广泛的模拟研究表明,SpARCD相对于传统的边检验或单变量方法具有更高的功效,特别是在存在复杂依赖结构时。对113名早期PTSD患者在执行情绪面孔匹配任务时的fMRI数据应用,揭示了与情绪反应和调节过程相关的不同网络。总体而言,SpARCD为比较高维连接结构提供了一个统计严谨且计算高效的框架,广泛适用于神经影像学和其他基于网络的科学领域。

英文摘要

Identifying brain regions that exhibit altered functional connectivity between cognitive or emotional states is a fundamental problem in neuroscience. We propose SpARCD (Spectral Analysis for Revealing Connectivity Differences), a statistical framework for detecting condition-specific patterns of functional connectivity. SpARCD uses distance correlation, a dependence measure sensitive to both linear and nonlinear associations, to construct weighted region-wise connectivity graphs for each condition. A differential operator obtained through spectral filtering is then used to identify connectivity changes via its leading eigenvectors. To assess statistical significance, we develop a permutation-based testing procedure that yields interpretable region-level significance maps. We establish finite-sample validity of the permutation test and derive asymptotic guarantees for the stability of the resulting region rankings. Simulation studies demonstrate improved power relative to conventional edge-wise and univariate approaches, particularly in settings with nonlinear dependence structures. We applied SpARCD to fMRI data from 113 individuals with early-stage PTSD and 42 controls during emotional and neutral task conditions. The method identified distinct connectivity networks associated with visual processing in both PTSD and control participants. Resting-state comparisons between PTSD and control participants highlighted similar visual networks. SpARCD provides a statistically rigorous and computationally efficient framework for comparing high-dimensional connectivity patterns.

2507.05689 2026-06-16 math.ST stat.ML stat.TH 版本更新

Optimal structure learning and conditional independence testing

最优结构学习与条件独立性检验

Ming Gao, Yuhao Wang, Bryon Aragam

AI总结 本文建立结构学习与条件独立性检验之间的最优性联系,证明结构学习的最小最大最优率由条件独立性检验决定,并基于此提出改进的PC算法。

AI中文摘要

我们建立了最优结构学习与最优条件独立性检验之间的基本联系,通过证明在结构学习问题中,最小最大最优率由这些问题中条件独立性检验的最小最大最优率决定。这通过在多森林情况下建立这两个问题之间的通用归约实现,并通过推导几个示例(包括伯努利、高斯和非参数模型)的最优率来证明。此外,我们表明这些设置中的最优算法是PC算法的适当修改。这一理论发现为通过最小最大检验的视角分析结构学习的统计复杂性提供了一个统一框架。

英文摘要

We establish a fundamental connection between optimal structure learning and optimal conditional independence testing by showing that the minimax optimal rate for structure learning problems is determined by the minimax rate for conditional independence testing in these problems. This is accomplished by establishing a general reduction between these two problems in the case of poly-forests, and demonstrated by deriving optimal rates for several examples, including Bernoulli, Gaussian and nonparametric models. Furthermore, we show that the optimal algorithm in these settings is a suitable modification of the PC algorithm. This theoretical finding provides a unified framework for analyzing the statistical complexity of structure learning through the lens of minimax testing.

2508.08564 2026-06-16 stat.ME math.ST stat.ML stat.TH 版本更新

Kernel Two-Sample Testing via Directional Components Analysis

基于方向成分分析的核双样本检验

Rui Cui, Yuhao Li, Xiaojun Song

AI总结 针对标准核双样本检验中尾部方向成分噪声导致功效下降的问题,提出通过截断MMD谱分解保留前导方向成分的核检验方法,结合高效参数自举过程,在高维和不平衡场景下实现更优功效和稳健性。

Comments Major Revision

AI中文摘要

标准的核双样本检验,例如基于最大均值差异(MMD)的检验,在再生核希尔伯特空间(RKHS)中聚合所有方向上的平方差异。然而,在有限样本中,尾部方向成分存在噪声,这会降低检验功效。我们提出了一种新的基于核的检验方法,通过截断MMD的谱分解,仅保留估计良好的前导特征方向来解决这一问题。通过聚合这些稳健的成分,我们的方法实现了优越的功效和稳健性,特别是在高维和不平衡设置中。此外,我们引入了一种计算高效的自举参数过程来近似临界值,该过程在理论上合理且比基于置换的方法快得多。广泛的模拟和实证研究表明,我们的方法在保持严格的第一类错误控制的同时,比现有的基于MMD的检验具有更高的功效。

英文摘要

Standard kernel two-sample tests, such as those based on the Maximum Mean Discrepancy (MMD), aggregate squared differences across all directions in a Reproducing Kernel Hilbert Space (RKHS). However, in finite samples, trailing directional components are noisy, which degrades test power. We propose a novel kernel-based test that resolves this by truncating the spectral decomposition of the MMD, retaining only the well-estimated leading eigen-directions. By aggregating these robust components, our method achieves superior power and robustness, particularly in high-dimensional and unbalanced settings. Furthermore, we introduce a computationally efficient parametric bootstrap procedure for approximating critical values, which is theoretically justified and significantly faster than permutation-based alternatives. Extensive simulations and empirical studies demonstrate that our method maintains strict Type I error control while delivering higher power than existing MMD-based tests.
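For orientation, the standard statistic that the abstract takes as its starting point can be written in a few lines. The sketch below computes the biased (V-statistic) estimate of the squared MMD with a Gaussian kernel; it is the untruncated baseline only. The spectral truncation and the parametric bootstrap proposed in the paper are not implemented here, and the kernel choice, the default bandwidth and the function names are assumptions of this sketch.

```python
import math
from typing import Sequence

Point = Sequence[float]


def gaussian_kernel(x: Point, y: Point, bandwidth: float) -> float:
    """k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs: Sequence[Point], ys: Sequence[Point], bandwidth: float) -> float:
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd2_biased(xs: Sequence[Point], ys: Sequence[Point], bandwidth: float = 1.0) -> float:
    """Biased estimate of MMD^2: mean k(x,x') + mean k(y,y') - 2 * mean k(x,y)."""
    return (
        mean_kernel(xs, xs, bandwidth)
        + mean_kernel(ys, ys, bandwidth)
        - 2.0 * mean_kernel(xs, ys, bandwidth)
    )


sample_x = [(0.0,), (1.0,), (2.0,)]
sample_y = [(5.0,), (6.0,), (7.0,)]
same = mmd2_biased(sample_x, sample_x)
shifted = mmd2_biased(sample_x, sample_y)
```

The estimate is zero when both samples coincide and grows as the samples separate; in practice a critical value is still needed, for example from permutations or a bootstrap.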

2312.01265 2026-06-16 math.PR stat.ME 版本更新

The optimal sub-Gaussian normalisation for randomised monotone functions

随机单调函数的最优亚高斯归一化

Thomas Anton, Rabee Tourky

AI总结 本文研究了随机单调函数的最优亚高斯归一化问题,通过分析有限样本大小下的概率不等式,推导出归一化尺度与对数函数之间的紧密关系。

Comments 41 pages, 1 figure. Copy editing. Signed measure processes

AI中文摘要

令$\mathcal{M}$表示从$\mathbb{R}$到$[0,1]$的随机单调函数类,令$U_{\mathcal{M}}\colon \mathbb{R}_+\to \mathbb{R}_+$为最小函数,使得对于每一个成员$f_Z$具有有限有效样本大小$\eta_f$和任意正数$\varepsilon$,有$$\mathbb{P}\left\{ \sqrt{\eta_f}\, \sup_{t\in\mathbb{R}} \left| f_Z(t) - \mathbb{E}\left[f_Z(t)\right] \right| \ge \varepsilon\sqrt{U_{\mathcal{M}}(\eta_f)} \right\} \le 2\mathrm{e}^{-2\varepsilon^2}$$成立。我们证明对于每个$x> 1$,$$\left| \sqrt{U_{\mathcal{M}}(x)} - \sqrt{\log_4 x} \right| \le 2 \min\!\left\{ 1,\, \frac{2 \ln(\mathrm{e} + \ln x)}{\sqrt{\ln x}} \right\}\,.$$最优尺度$\sqrt{U_{\mathcal{M}}(x)}$在有限样本大小上与$\frac{1}{\sqrt{2\ln 2}}\sqrt{\ln x}$紧密相关。

英文摘要

Let $\mathcal{M}$ denote the class of randomised monotone functions on $\mathbb{R}$ with values in $[0,1]$, and let $U_{\mathcal{M}}\colon \mathbb{R}_+\to \mathbb{R}_+$ be the minimal function for which $$ \mathbb{P}\left\{ \sqrt{\eta_f}\, \sup_{t\in\mathbb{R}} \left| f_Z(t) - \mathbb{E}\left[f_Z(t)\right] \right| \ge \varepsilon\sqrt{U_{\mathcal{M}}(\eta_f)} \right\} \le 2\mathrm{e}^{-2\varepsilon^2} $$ holds for every member $f_Z$ of $\mathcal{M}$ with finite effective sample size $\eta_f$ and every positive $\varepsilon$. We prove that for every $x> 1$, $$ \left| \sqrt{U_{\mathcal{M}}(x)} - \sqrt{\log_4 x} \right| \le 2 \min\!\left\{ 1,\, \frac{2 \ln(\mathrm{e} + \ln x)}{\sqrt{\ln x}} \right\}\,. $$ The optimal adjustment $\sqrt{U_{\mathcal{M}}(x)}$ matches $\frac{1}{\sqrt{2\ln 2}}\sqrt{\ln x}$ for all $x>1$, with residuals bounded as above.
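The two normalisations named in the abstract, $\sqrt{\log_4 x}$ in the bound and $\frac{1}{\sqrt{2\ln 2}}\sqrt{\ln x}$ in the closing sentence, are the same quantity. The short change-of-base computation below makes this explicit; it uses only the identity $\ln 4 = 2\ln 2$.

```latex
\begin{align*}
\log_4 x = \frac{\ln x}{\ln 4} = \frac{\ln x}{2\ln 2}
\quad\Longrightarrow\quad
\sqrt{\log_4 x} = \frac{1}{\sqrt{2\ln 2}}\,\sqrt{\ln x},
\qquad x > 1 .
\end{align*}
```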

2207.05190 2026-06-16 stat.ME math.ST stat.TH 版本更新

Estimation of High-Dimensional Normal Means through Inferential Models

通过推断模型估计高维正态均值

Samuel J. Eschker, Chuanhai Liu

AI总结 针对高维正态均值估计问题,提出基于推断模型的无先验点估计类,利用广义概率积分变换和重加权Anderson-Darling统计量的有序均匀预测随机集,实现有效推断并解释Stein悖论。

Comments 29 pages

AI中文摘要

多元正态均值的估计是一个基本问题,其突出表现为在二次损失下当$n\geq 3$时MLE的不可容许性。虽然收缩和经验贝叶斯方法通过几何推理或层次建模利用联合结构,本文提出了一类源自推断模型的无先验框架的点估计。我们为独立非独立同分布观测开发了广义概率积分变换,创建了从样本到有序均匀参考分布的双射映射。通过将此双射与基于重加权Anderson-Darling统计量的有序均匀预测随机集相结合,我们确保了有效且高效的推断,捕捉了有序观测揭示的全局形状结构。我们进一步引入了结合多个似然轮廓的最大最小(瓶颈)准则。为确保可计算性,我们开发了一个有放回抽样替代方法,将精确公式与过参数化(g)建模联系起来。我们的方法提供了Stein悖论的结构性解释,表明MLE对应于联合辅助分布的零密度点,从辅助角度揭示了其不可信性。数值研究表明,我们的估计器与最先进的自动建模方法具有竞争力,并优于经典的收缩和经验贝叶斯方法。

英文摘要

The estimation of the multivariate normal mean is a fundamental problem, highlighted by the inadmissibility of the MLE for $n\geq 3$ under quadratic loss. While shrinkage and empirical Bayes methods leverage joint structure through geometric reasoning or hierarchical modeling, this paper proposes a class of point estimators derived from the prior-free framework of inferential models. We develop a generalized probability integral transform for independent, non-i.i.d. observations, creating a bijective mapping from the sample to an ordered-uniform reference distribution. By combining this bijection with an ordered-uniform predictive random set based on a reweighted Anderson-Darling statistic, we ensure valid and efficient inference that captures the global shape structure revealed by the ordered observations. We further introduce a maximin (bottleneck) criterion for combining multiple plausibility contours. To ensure computability, we develop a sampling-with-replacement surrogate that connects the exact formulation to over-parameterized (g)-modeling. Our approach provides a structural explanation of Stein's paradox, showing that the MLE corresponds to a zero-density point of the joint auxiliary distribution, revealing its implausibility from an auxiliary perspective. Numerical studies show that our estimators are competitive with state-of-the-art auto-modeling methods and outperform classical shrinkage and empirical Bayes methods.
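The inadmissibility of the MLE mentioned above is classically demonstrated by the James-Stein estimator, which shrinks the observation vector toward the origin. The sketch below shows that classical shrinkage baseline, assuming a single observation $x \sim \mathcal{N}(\theta, I_n)$ with unit variance; it is not the inferential-model estimator proposed in the paper, and the function name is an assumption of this sketch.

```python
from typing import List, Sequence


def james_stein(x: Sequence[float]) -> List[float]:
    """Classical James-Stein estimate (1 - (n - 2) / ||x||^2) * x.

    Assumes x is one draw from N(theta, I_n) with n >= 3. The MLE is x itself;
    this estimator has smaller total quadratic risk for every theta.
    """
    n = len(x)
    if n < 3:
        raise ValueError("James-Stein shrinkage requires n >= 3")
    norm_sq = sum(v * v for v in x)
    if norm_sq == 0.0:
        return list(x)  # nothing to shrink at the origin
    shrink = 1.0 - (n - 2) / norm_sq
    return [shrink * v for v in x]
```

For `x = [1, 1, 1, 1]` the squared norm is 4 and the shrinkage factor is `1 - 2/4 = 0.5`, so every coordinate is halved. A positive-part variant that truncates the factor at zero is commonly used when the squared norm is small.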

2. 贝叶斯统计与概率建模 12 篇

2606.17005 2026-06-16 cs.AI stat.ME 新提交

Bayesian Inference and Decision Audits for Public Archives of Frontier AI Evaluations

前沿AI评估公共档案的贝叶斯推断与决策审计

Yanan Long

AI总结 本文通过贝叶斯推断和审计方法,分析公共AI评估档案中的选择性报告和缺失数据,发现单一终端记录与多种历史路径兼容,并验证了审计门限对虚假声明的过滤作用。

AI中文摘要

公共AI评估常被视为终端排行榜,但底层证据是由报告规则、基准修订和缺失数据塑造的选择性时间序列。LiveBench和Open LLM Leaderboard v2的重复公共档案作为主要纵向记录;LMArena提供偏好压力测试;GAIA和tau-bench贡献有限的智能体试点。这些档案共同实例化了一个贝叶斯推断问题:在固定报告约定下,一个仅包含$1{,}000$个系统的构造终端示例与两个终端前历史兼容,在相同终端尾模型下,达到天花板$0.05$以内的时间分别为$23.03$或$75.13$。在合成后验比较中,面向行动的诊断在不同观测制度下存在差异。候选选择感知的前沿模型未能通过合成恢复、目标档案预测、偏好转移和不确定性校准;相应地,固定审计门限拒绝了其更强的声明。一种档案与裁决协议重建了公共评估历史,隔离了验证的时间边界,并证伪了无依据的前沿声明。

英文摘要

Public AI evaluations are often read as terminal leaderboards, yet the underlying evidence is a selective time series shaped by reporting rules, benchmark revisions, and missingness. Repeated public archives for LiveBench and Open LLM Leaderboard v2 serve as the primary longitudinal record; LMArena provides a preference stress test; and GAIA and tau-bench contribute limited agentic pilots. Together, these archives instantiate a Bayesian inference problem: under a fixed reporting convention, one constructed terminal-only example over $1{,}000$ systems is compatible with two pre-terminal histories, yielding times of $23.03$ or $75.13$ to reach within $0.05$ of the ceiling under the same terminal-tail model. In synthetic posterior comparisons, action-facing diagnostics differ across observation regimes. The candidate selection-aware frontier model fails synthetic recovery, objective-archive prediction, preference transfer, and uncertainty calibration; correspondingly, fixed audit gates reject its stronger claims. An archive-and-adjudication protocol reconstructs public evaluation histories, isolates a verified timing boundary, and falsifies unsupported frontier claims.

2606.16923 2026-06-16 cs.AI stat.ML 新提交

MA-SBI: Misspecification-Aware Simulation-Based Inference via Side-Channel Guidance

MA-SBI: 通过侧信道引导的误设定感知仿真推断

Arunkumar V, Manoranjan Gandhudi, Gangadharan G. R., Arun Prakash, S. Senthilkumar

发表机构 * University College of Engineering, Anna University Tiruchirappalli(安娜大学蒂鲁吉拉伯利工程学院) Central University of Karnataka(卡纳塔克中央大学) National Institute of Technology Tiruchirappalli(蒂鲁吉拉伯利国立理工学院) School of Computer & Systems Sciences, Jawaharlal Nehru University(贾瓦哈拉尔·尼赫鲁大学计算机与系统科学学院)

AI总结 针对仿真模型误设定问题,提出无需校准的MA-SBI框架,利用侧信道文本信息进行后验校正,理论保证偏差减少界限,实验表明仅用文本即可匹配oracle后验。

Comments 23 pages, 9 figures, 12 tables

AI中文摘要

潜在参数的仿真推断(SBI)常受仿真器误设定困扰,即由于固有的建模简化导致的仿真观测与真实观测之间的不匹配。最新的鲁棒SBI方法RoPE通过真实与仿真观测学习表示之间的最优传输来解决此问题,但需要真实参数校准对,而这在需要SBI的设置中通常不可用。实践者拥有的是非结构化侧信息,如制度标签、指令文本和政策公告。我们提出误设定感知仿真推断(MA-SBI),一个无需校准的框架,将侧信道转化为后验校正。学习到的校正器将侧信道文本映射到观测空间偏移,应用于任何预训练的摊销后验之前,无需重新训练也无需参数真实值。我们的主要定理通过误设定与侧信道之间的互信息界定了可实现的偏差减少,通过Donsker-Varadhan扩展到所有次高斯噪声的非平凡常数。在隐藏校准基准上,仅使用文本的MA-SBI在10个种子和两个骨干网络上匹配oracle后验(TOST等价),而使用更多数据的RoPE则不能。两种方法互补:当误设定是结构性的且可从参数对中恢复时,RoPE占优,正如理论所预测。随机变体在真实COVID和OxCGRT流行病学数据上提高了后验预测对数似然,并在一个良好设定的认知科学语料库上正确保持后验不变。

英文摘要

Simulation-based inference (SBI) of latent parameters is often hindered by simulator misspecification, the mismatch between simulated and real-world observations caused by inherent modeling simplifications. RoPE, the recent state-of-the-art for robust SBI, addresses this through optimal transport between learned representations of real and simulated observations, but requires ground-truth parameter calibration pairs that are typically unavailable in the very settings where SBI is needed. What practitioners do have is unstructured side-information such as regime labels, instruction text, and policy bulletins. We propose Misspecification-Aware Simulation-Based Inference (MA-SBI), a calibration-free framework that turns this side-channel into a posterior correction. A learned corrector maps side-channel text to an observation-space shift applied before any pre-trained amortized posterior, requiring no retraining and no parameter ground-truth. Our main theorem bounds achievable bias reduction by the mutual information between misspecification and side-channel, with a non-vacuous constant that extends to all sub-Gaussian noise via Donsker-Varadhan. On hide-the-calibration benchmarks, MA-SBI with text alone matches the oracle posterior across 10 seeds and two backbones (TOST equivalence), while RoPE given more data does not. The two approaches are complementary: where misspecification is structural and recoverable from parameter pairs, RoPE dominates, as the theory predicts. A stochastic variant improves posterior-predictive log-likelihood on real COVID and OxCGRT epidemiological data, and correctly leaves the posterior unchanged on a well-specified cognitive-science corpus.

2606.16683 2026-06-16 stat.ME stat.OT 新提交

Two fully specified Bayes factors for hypothesis testing and sensitivity analysis in process tracing

过程追踪中用于假设检验和敏感性分析的两个完全指定的贝叶斯因子

Matias López, Jake Bowers, Daniel Gajardo Cooper

AI总结 提出两个完全指定的生成模型推导证据概率,解决过程追踪中贝叶斯因子手动指定概率的偏差问题,并通过敏感性分析驱动结论。

AI中文摘要

Fairfield 和 Charman (2022) 提出使用贝叶斯因子来总结过程追踪证据,但他们要求研究人员手动指定证据的概率,这引起了关于偏差的担忧 (Zaks 2021)。在本文中,我们通过直接从两个针对过程追踪研究设计的完全指定的观测生成模型中推导这些概率,提出了一个解决方案。我们完全指定的贝叶斯因子使研究人员能够报告,在考虑确凿证据权重的情况下,正面结论在转向支持对立假设之前可以吸收多少观测偏差。在实践中,这意味着最终结论更多地由敏感性测试驱动,而不是由贝叶斯因子本身驱动。为了展示我们方法的有用性,我们将该框架应用于发表在顶级政治学期刊上的六项近期过程追踪研究。

英文摘要

Fairfield and Charman (2022) propose using a Bayes factor to summarize process tracing evidence, but they require researchers to specify the probability of evidence by hand, and this has drawn concern about bias (Zaks 2021). In this paper, we present a solution by deriving such probabilities directly from two fully specified generative models of observation tailored to process-tracing research designs. Our fully specified Bayes factors enable researchers to report how much observation bias a positive conclusion can absorb before flipping in favor of the rival, taking dependence on smoking gun weight into consideration as well. In practice, this means that final conclusions are driven by sensitivity tests more than by Bayes factors themselves. To show the usefulness of our approach we apply the framework to six recent process-tracing studies published in top political science journals.

2606.16524 2026-06-16 cs.LG astro-ph.CO stat.ML 新提交

Neural Bayesian Anomaly Mitigation: A Robust Loss that Doubles as an Unsupervised Contamination Classifier

神经贝叶斯异常缓解:一种兼具无监督污染分类器功能的鲁棒损失函数

S. A. K. Leeney, W. J. Handley, H. T. J. Bevins, E. de Lera Acedo

发表机构 * Astrophysics Group, Cavendish Laboratory, University of Cambridge(剑桥大学卡文迪许实验室天体物理组) Institute of Astronomy, University of Cambridge(剑桥大学天文研究所)

AI总结 提出神经贝叶斯异常缓解(NBAM)损失,基于贝叶斯潜变量混合模型,既提供鲁棒监督损失又输出无监督污染后验,在CIFAR-10上优于Huber等基线。

Comments 13 pages, 4 figures

AI中文摘要

工程化的鲁棒损失函数(如Huber、Student-$t$和广义交叉熵)使监督模型能够容忍污染,但无法回答哪些观测被破坏。我们引入神经贝叶斯异常缓解(NBAM),一种通用的即插即用损失函数,源自贝叶斯潜在开关混合模型:边际似然定义了一个鲁棒的监督损失,相关的后验定义了一个无监督的污染分类器。与Huber或Student-$t$类似,NBAM可以替换任何监督流程中的标准训练损失;与它们不同,NBAM还学习了一个结构化的污染模型,并返回每个样本的校准污染后验。学习到的输入相关先验$π_ϕ(x)$捕获污染的空间局部性,使得靠近已知损坏的样本更可能被标记,同时自动出现奥卡姆惩罚并正则化以防止过度标记。在具有非对称标签污染的CIFAR-10上,NBAM无需监督即可恢复污染过程的结构:污染后验将干净样本与污染样本分开,学习到的异常头识别每个标签翻转对的方向。除了这些能力之外,在0.2-0.6的污染率下,NBAM的性能优于本文考虑的四种鲁棒损失基线。

英文摘要

Engineered robust losses such as Huber, Student-$t$, and generalised cross-entropy make supervised models tolerant of contamination but cannot answer which observations are corrupted. We introduce Neural Bayesian Anomaly Mitigation (NBAM), a general-purpose drop-in loss derived from a Bayesian latent-switch mixture model: the marginal likelihood defines a robust supervised loss, and the associated posterior defines an unsupervised contamination classifier. Like Huber or Student-$t$, NBAM can replace the standard training loss in any supervised pipeline; unlike them, it additionally learns a structured contamination model and returns a calibrated per-sample contamination posterior. A learned input-dependent prior $π_ϕ(x)$ captures the spatial locality of contamination, so that samples near known corruptions are more likely to be flagged, while an Occam penalty emerges automatically and regularises against over-flagging. On CIFAR-10 with asymmetric label contamination, NBAM recovers the structure of the corruption process without supervision: the contamination posterior separates clean from corrupted samples, and the learned anomaly head identifies the direction of every label-flip pair. Alongside these capabilities, NBAM outperforms the four robust-loss baselines considered here at contamination rates 0.2-0.6.
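The dual role described above, where one mixture model gives both a loss and a classifier, can be illustrated with a generic two-component latent-switch mixture. In the sketch below, the marginal likelihood of an observation gives a robust negative log-likelihood and Bayes' rule gives the per-sample contamination posterior. The specific likelihoods, the learned prior $π_ϕ(x)$ and the training procedure of NBAM are not given in the abstract, so every name and argument here is a hypothetical stand-in.

```python
import math


def marginal_likelihood(lik_clean: float, lik_contam: float, prior_contam: float) -> float:
    """Mixture likelihood: (1 - pi) * p_clean + pi * p_contaminated."""
    if not 0.0 <= prior_contam <= 1.0:
        raise ValueError("prior_contam must lie in [0, 1]")
    return (1.0 - prior_contam) * lik_clean + prior_contam * lik_contam


def robust_nll(lik_clean: float, lik_contam: float, prior_contam: float) -> float:
    """Negative log marginal likelihood, usable as a per-sample training loss."""
    return -math.log(marginal_likelihood(lik_clean, lik_contam, prior_contam))


def contamination_posterior(lik_clean: float, lik_contam: float, prior_contam: float) -> float:
    """Posterior probability that the sample came from the contaminated component."""
    marginal = marginal_likelihood(lik_clean, lik_contam, prior_contam)
    return prior_contam * lik_contam / marginal
```

When the clean model explains a sample poorly relative to the contamination component, the posterior moves toward one and that sample contributes less gradient through the clean term, which is the usual mechanism by which such mixture losses down-weight outliers.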

2606.16224 2026-06-16 stat.AP 新提交

A Bayesian hierarchical model for meta-analysis

用于元分析的贝叶斯分层模型

Jing Dai, Sijie Xu, Shufei Ge

AI总结 提出贝叶斯分层元分析框架,通过解析积分实现小样本下的稳健参数估计,并应用于奥卡西平与卡马西平的癫痫治疗安全性比较,发现前者副作用风险更低。

Comments 8 pages, 1 figure, 8 tables

AI中文摘要

元分析是综合临床试验数据以评估治疗效果的关键统计工具,但传统方法如固定效应和随机效应模型通常难以有效处理异质性、研究层面协变量或分层结构。为克服这些限制,我们开发了一个贝叶斯分层元分析框架,用于小样本下的稳健参数估计,并利用解析积分实现高效推断。模拟研究表明所提模型估计稳健。我们将该模型应用于奥卡西平和卡马西平在癫痫治疗中的安全性特征分析。结果表明,与卡马西平相比,奥卡西平与较低的副作用风险显著相关。本研究中使用的代码和相关数据可在GitHub上公开获取:https://github.com/xsjk/HierarchicalMetaAnalysis。

英文摘要

Meta-analysis is a key statistical tool for synthesizing clinical trial data to evaluate treatment effects, yet traditional methods like fixed and random-effects models often fail to handle heterogeneity, study-level covariates, or hierarchical structures effectively. To overcome these limitations, we developed a Bayesian hierarchical meta-analysis framework for robust parameter estimation on small samples and utilized analytical integration for efficient inference. Simulation studies indicated that the proposed model yields robust estimates. We applied it to the safety profiles of Oxcarbazepine (OXC) and Carbamazepine (CBZ) in epilepsy treatment. The results indicated that OXC was significantly associated with a lower risk of side effects than CBZ. The code and relevant data used in this study are openly available on GitHub at: https://github.com/xsjk/HierarchicalMetaAnalysis.

2606.16080 2026-06-16 stat.ME 新提交

Bayesian joint modelling using semiparametric accelerated failure time approaches

使用半参数加速失效时间方法的贝叶斯联合建模

Ding Ma, Patrick Maher, Andrew Martin

AI总结 提出一类半参数加速失效时间联合模型,直接建模协变量对事件时间的影响并灵活捕捉纵向-事件关联,采用贝叶斯框架进行估计,相比比例风险模型更具灵活性和可解释性。

AI中文摘要

纵向临床研究通常收集生物标志物或健康相关生活质量的重复测量以及时间至事件结局。这些过程本质上是相互关联的:纵向轨迹可能预测事件风险,而事件发生或其预期可能导致纵向过程的信息性删失。联合模型为处理这种依赖性提供了原则性框架,但大多数现有公式依赖于比例风险假设,这可能具有限制性,并在时间尺度上提供有限的可解释性。我们提出了一类半参数加速失效时间联合模型,直接建模协变量对事件时间的影响,同时灵活捕捉纵向-事件关联。生存部分通过加速失效时间模型指定,基线部分由灵活基展开表示,允许广泛的平滑基线规范。我们使用Bernstein多项式基线表示说明该框架,并引入重缩放策略以提高时间扭曲下的数值稳定性和参数可识别性。在贝叶斯框架内进行估计,实现对纵向、生存和关联参数的联合推断。模拟研究反映了现实的纵向轨迹、删失机制和依赖结构,用于评估有限样本性能。当事件风险依赖于潜在纵向过程时,所提出的模型与独立的线性混合模型相比,显示出对纵向治疗效应的更好恢复。总体而言,该框架通过提供一种灵活且可解释的比例风险方法替代方案,扩展了现有的联合建模方法。

英文摘要

Longitudinal clinical studies often collect repeated measurements of biomarkers or health-related quality of life together with a time-to-event outcome. These processes are intrinsically linked: longitudinal trajectories may predict event risk, while event occurrence, or its anticipation, can induce informative censoring of the longitudinal process. Joint models provide a principled framework for handling this dependence, but most existing formulations rely on proportional hazards assumptions that may be restrictive and offer limited interpretability on the time scale. We propose a class of semiparametric accelerated failure time joint models that directly model covariate effects on event timing while flexibly capturing longitudinal-event associations. The survival component is specified through an accelerated failure time model with the baseline component represented by a flexible basis expansion, allowing a broad class of smooth baseline specifications. We illustrate the framework using Bernstein polynomial baseline representations and introduce rescaling strategies to improve numerical stability and parameter identifiability under time-warping. Estimation is conducted within a Bayesian framework, enabling joint inference for longitudinal, survival, and association parameters. Simulation studies reflecting realistic longitudinal trajectories, censoring mechanisms, and dependence structures are used to evaluate finite-sample performance. The proposed models show improved recovery of longitudinal treatment effects compared with a standalone linear mixed model when event risk depends on the underlying longitudinal process. Overall, the framework extends existing joint modelling methodology by offering a flexible and interpretable alternative to proportional hazards-based approaches.

2606.15837 2026-06-16 cs.CV cs.LG stat.ME stat.ML 新提交

Learning a Sampling-Free Variational DNN Plugin from Tiny Training Sets to Refine OOD Segmentation With Uncertainty Estimation

学习一种无采样的变分DNN插件,从微小训练集精炼OOD分割并估计不确定性

Jimut B. Pal, Suyash P. Awate

发表机构 * Centre for Machine Intelligence and Data Science (C-MInDS), Indian Institute of Technology (IIT) Bombay(印度理工学院孟买分校机器智能与数据科学中心) Computer Science and Engineering (CSE) Department, Indian Institute of Technology (IIT) Bombay(印度理工学院孟买分校计算机科学与工程系)

AI总结 提出VarDeepPCA,一种轻量级变分DNN框架,利用小分布内数据集学习有效解剖几何分布,无需目标域数据或预训练,通过重新解释softmax映射实现无采样推理,并提供不确定性估计,在4种临床应用中显著提升OOD分割的解剖合理性和准确性。

Comments Accepted at the Journal of Machine Learning for Biomedical Imaging

AI中文摘要

深度神经网络(DNN)由于扫描仪和采集协议的变化,经常无法泛化到分布外(OOD)的医学图像。由于获取和标注新医学数据集的成本高昂,重新训练DNN模型以应对这些分布偏移通常不切实际。为了解决这个问题,我们引入了VarDeepPCA,一种新颖的轻量级变分DNN框架,旨在通过利用内在几何先验来恢复/精炼退化的分割图。与需要目标域数据或大量预训练的现有方法不同,我们的VarDeepPCA仅使用小的分布内(ID)数据集显式学习有效解剖几何的分布。理论上,我们的新颖变分学习框架利用对softmax映射的重新解释来隐式执行精确分布建模,从而实现计算高效、无采样的学习和推理。这也使VarDeepPCA能够为其恢复的分割图提供不确定性估计。我们在4种不同的临床应用上,使用14个公开可用的数据集,涉及心肌、神经视网膜边缘、前列腺和胎儿头部分割,对我们的框架进行了实证验证。与15种现有方法的比较表明,VarDeepPCA一致地恢复了现有方法在OOD数据上产生的分割图,以(i)显著提高几何的解剖合理性和分割的临床实用性,以及(ii)显著减少误差,而不需要比现有方法更多的训练数据。

English abstract

Deep neural networks (DNNs) frequently fail to generalize to out-of-distribution (OOD) medical images because of variations in scanners and acquisition protocols. Retraining DNN models to address these distribution shifts is often impractical due to the high cost of acquiring and annotating new medical datasets. To address this, we introduce VarDeepPCA, a novel lightweight variational DNN framework designed to restore/refine degraded segmentation maps by leveraging intrinsic geometric priors. Unlike existing approaches that require target-domain data or extensive pre-training, our VarDeepPCA explicitly learns a distribution of valid anatomical geometries using only small in-distribution (ID) datasets. Theoretically, our novel variational learning framework leverages a reinterpretation of the softmax mapping to implicitly perform exact distribution modeling, thereby enabling computationally efficient, sampling-free learning and inference. This also enables VarDeepPCA to provide uncertainty estimates associated with its restored segmentation maps. We empirically validate our framework across 4 distinct clinical applications, using 14 publicly available datasets, involving segmentation of the myocardium, neuroretinal rim, prostate, and fetal head. Comparisons against 15 existing methods demonstrate that VarDeepPCA consistently restores segmentation maps produced by the existing methods on OOD data to (i) significantly improve anatomical plausibility of geometries and clinical utility of the segmentations, and (ii) significantly reduce errors, without needing any more training data than that used by existing methods.

2606.15525 2026-06-16 stat.AP stat.ME new submission

Modeling Nonlinear Ability Trajectories and Learner Heterogeneity in Online Learning: A Bayesian Nonparametric Dynamic IRT Framework

在线学习中非线性能力轨迹与学习者异质性建模:一种贝叶斯非参数动态IRT框架

Zhihua Ma, Alice Xu, Icy Zhang, Guanyu Hu

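AI summary Proposes a Bayesian nonparametric dynamic IRT framework that captures nonlinear effects with B-spline basis functions and uses an MFM prior to determine the number of clusters automatically, overcoming the limitations of linearity assumptions, pre-specified cluster counts, and the inability to track longitudinal dynamics; applied to data from 198 undergraduates, it identifies four learner profiles.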

AI Chinese abstract

在线学习放大了理解学生参与模式如何影响学习成果的需求,特别是在技术中介环境的灵活性下。为此,我们提出了一种贝叶斯非参数动态项目反应理论(IRT)框架,用于追踪个体内部在教学单元中的能力轨迹。该模型整合了B样条基函数展开以捕捉参与行为对能力漂移的非线性效应,同时采用有限混合混合(MFM)先验自动确定潜在学习者聚类的数量。该框架克服了现有文献中的三个局限:(1)参与-能力关系中的刚性线性假设,(2)对预设聚类数的依赖,以及(3)无法追踪纵向能力动态。我们将该模型应用于198名本科生在CourseKata上完成9章入门统计学课程的纵向数据。模型自动识别出四种不同的学习者轮廓:挣扎下降型(11%)、低稳定型(23%)、主流稳定型(55%)和高进步型(12%)。结果表明,能力轨迹在各章节中保持显著稳定,且参与数量指标未能显著预测能力漂移。这些发现表明,在入门级在线统计教育中,学术能力主要反映一种稳定的预先存在的特征,而非动态可变的课程结果。最终,该框架为学习者画像提供了一种灵活工具,以指导适应性教学设计。

English abstract

Online learning has amplified the need to understand how student engagement patterns influence learning outcomes, particularly given the flexibility of technology-mediated environments. To address this, we propose a Bayesian nonparametric dynamic item response theory (IRT) framework that tracks within-individual ability trajectories across instructional units. The proposed model integrates B-spline basis expansions to capture nonlinear effects of engagement behaviors on ability drift, alongside a Mixture-of-Finite-Mixtures (MFM) prior to automatically determine the number of latent learner clusters. This framework overcomes three limitations in the existing literature: (1) rigid linearity assumptions in engagement-ability relationships, (2) dependence on pre-specified cluster counts, and (3) the inability to track longitudinal ability dynamics. We apply the model to longitudinal data from 198 undergraduates completing a 9-chapter introductory statistics course on CourseKata. The model automatically identified four distinct learner profiles: struggling-declining (11%), low-stable (23%), mainstream-stable (55%), and high-improving (12%). Results indicate that ability trajectories remained remarkably stable across chapters, and engagement quantity metrics did not significantly predict ability drift. These findings suggest that in introductory online statistics education, academic ability primarily reflects a stable pre-existing characteristic rather than a dynamically malleable course outcome. Ultimately, this framework offers a flexible tool for learner profiling to inform adaptive instructional design.
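As a rough illustration of how a B-spline basis expansion can represent a nonlinear effect, the sketch below evaluates a clamped cubic B-spline basis with the Cox-de Boor recursion and forms a weighted sum. It is not the authors' model: the knot vector, the coefficients, and the function names are hypothetical, and the engagement covariate is assumed to be rescaled to [0, 1).

```python
def bspline_basis(x, knots, degree):
    """Cox-de Boor recursion: values of all B-spline basis functions at x."""
    # Degree-0 functions are indicators of half-open knot intervals.
    b = [1.0 if knots[i] <= x < knots[i + 1] else 0.0
         for i in range(len(knots) - 1)]
    for d in range(1, degree + 1):
        nxt = []
        for i in range(len(knots) - d - 1):
            left_den = knots[i + d] - knots[i]
            right_den = knots[i + d + 1] - knots[i + 1]
            # Terms with a zero-length knot span are dropped (0/0 := 0).
            left = (x - knots[i]) / left_den * b[i] if left_den > 0 else 0.0
            right = ((knots[i + d + 1] - x) / right_den * b[i + 1]
                     if right_den > 0 else 0.0)
            nxt.append(left + right)
        b = nxt
    return b

def smooth_effect(x, knots, degree, coefs):
    """Nonlinear effect f(x) = sum_j coefs[j] * B_j(x)."""
    return sum(c * v for c, v in zip(coefs, bspline_basis(x, knots, degree)))

# Hypothetical clamped cubic knot vector and coefficients.
knots = [0.0, 0.0, 0.0, 0.0, 0.5, 1.0, 1.0, 1.0, 1.0]
coefs = [0.0, 0.4, -0.2, 0.3, 0.1]
effect = smooth_effect(0.3, knots, 3, coefs)
```

With this knot vector there are five basis functions, so the nonlinear effect is controlled by five coefficients. In a Bayesian model those coefficients would receive a prior.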

2606.14800 2026-06-16 stat.ME cs.LG eess.IV stat.ML new submission

Bridging data-driven priors via the score function for posterior sampling -- Comparative review and experimental study

通过得分函数桥接数据驱动先验进行后验采样——比较综述与实验研究

Elhadji Cisse Faye, Mame Diarra Fall, Sylvain Delchini, Nicolas Dobigeon

Affiliations IDP, Univ Orléans; LITIS, Univ Rouen Normandie; Bureau de Recherches Géologiques et Minières Orléans, France; IRIT, Univ Toulouse

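AI summary Reviews how a variety of data-driven priors used in Bayesian inverse problems can be unified through their score functions, shows their effective integration into a sampling algorithm, and validates the efficiency and versatility of the approach through image inpainting and super-resolution experiments.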

AI Chinese abstract

本文综述了贝叶斯逆问题中常用的多种数据驱动先验如何通过各自的得分函数统一起来。通过将这些先验置于这一共同视角下,我们表明它们可以受益于直接且有效地集成到最近提出的采样算法中。通过考虑几种数据驱动先验,即去噪正则化、基于归一化流的先验、基于得分的生成模型和凸脊正则化,说明了这一通用框架的适用性。对于这四种特定的先验,在图像修复和单图像超分辨率任务中评估了该方法的性能。这些结果以及在地质背景下恢复真实图像的结果证明了该方法的效率。这一统一框架证明足够通用,能够处理由广泛类别的基于得分函数的先验定义的任何后验分布,而不仅限于本文考虑的具体情况。

English abstract

This paper reviews how a diverse set of popular data-driven priors commonly used in Bayesian inverse problems can be unified through their respective score functions. By framing these priors under this common perspective, we show that they can benefit from their straightforward and effective integration into a recently proposed sampling algorithm. The applicability of this common framework is illustrated by considering several data-driven priors, namely regularization-by-denoising, normalizing flow-based priors, score-based generative models, and convex-ridge regularizers. For these four particular priors, the performance of the method is evaluated when conducting image inpainting and single image super-resolution. These results, as well as those obtained when restoring real images acquired in a geological context, demonstrate the efficiency of the method. This unified framework proves versatile enough to handle any posterior distribution defined by a broad class of score function-based priors, beyond the specific cases considered in this paper.
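The role played by the score function can be made explicit with Bayes' rule. The notation below is generic and not taken from the paper: $y$ is the observation, $x$ the unknown, $\gamma$ a step size, and the Langevin update is shown only as one familiar example of a score-driven sampler, not necessarily the algorithm the authors use.

```latex
\nabla_x \log p(x \mid y) = \nabla_x \log p(y \mid x) + \nabla_x \log p(x),
\qquad
x_{k+1} = x_k + \gamma \, \nabla_x \log p(x_k \mid y) + \sqrt{2\gamma}\, \xi_k,
\quad \xi_k \sim \mathcal{N}(0, I).
```

Only the prior score $\nabla_x \log p(x)$ depends on the choice of prior, which is why priors that expose or approximate this quantity can be swapped in a sampler of this kind.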

2509.21734 2026-06-16 stat.ME updated version

Optimal Stopping for Sequential Bayesian Experimental Design

序贯贝叶斯实验设计的最优停止

Chen Cheng, Xun Huan

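AI summary Addresses the question of when to stop in sequential experimental design by proposing a Bayesian optimal stopping framework based on a Markov decision process, with a curriculum learning strategy to avoid the local-optimum trap in joint training.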

AI Chinese abstract

序贯贝叶斯实验设计通常假设实验次数在数据收集开始前是固定的。然而,在实际操作中,实验可能需要提前终止,因为额外的测量相对于其成本可能提供递减的信息,从而引发核心决策问题:何时应该停止?常见的基于阈值的停止规则易于实现但目光短浅,因为它们将当前状态与固定标准进行比较,而未考虑未来实验的预期价值。本文通过将停止和设计表述为马尔可夫决策过程中的耦合决策,为序贯实验设计开发了一个贝叶斯最优停止框架。我们证明,对于任何设计策略,最优停止规则恰好当立即终止奖励超过预期继续价值时终止。然后,我们推导出一种用于学习基于价值的停止和设计策略的策略梯度方法。朴素的联合训练可能产生循环依赖,使学习陷入早期停止的局部最优。我们通过一种课程学习策略解决了这一困难,该策略在训练过程中逐渐从强制继续过渡到自适应停止。在线性高斯基准、一维非线性测试问题以及污染物源检测问题上的数值研究表明,所提出的方法学习了稳定的设计-停止策略,并提高了资源感知性能,在具有强序贯依赖的设置中增益最大。

English abstract

Sequential Bayesian experimental design is often formulated as a fixed-horizon policy optimization problem, in which the number of experiments is specified before data collection begins. In practical campaigns, however, additional measurements may provide diminishing information relative to their cost, making termination an integral part of experimental design. Common threshold-based stopping rules are easy to implement but myopic, because they compare the current state with a fixed criterion rather than the expected value of future experiments. This work develops a Bayesian optimal stopping framework for sequential experimental design by treating design and stopping as coupled decisions in a finite-horizon sequential decision problem. We prove that, for any fixed design policy, the optimal stopping rule terminates when the immediate terminal reward is no smaller than the expected continuation value. We then derive a policy-gradient method for learning continuous design policies with value-based stopping. The resulting optimization is challenging because the design policy, continuation value, and stopping boundary are mutually dependent, and naïve training can become trapped in early-stopping local optima. To address this difficulty, we introduce a curriculum strategy that gradually transitions from forced continuation to adaptive stopping during training. Numerical studies on a linear-Gaussian benchmark, a nonlinear test case, and a contaminant source detection problem show that the proposed approach learns stable, resource-aware design-stopping policies, with the largest gains in settings with strong sequential dependence.
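The stopping condition stated in the abstract above (stop when the immediate terminal reward is no smaller than the expected continuation value) can be illustrated by backward induction on a toy finite-horizon problem. The sketch below is not the paper's method: the state space, transition rule, reward, and cost are all hypothetical, and the design policy is held fixed and left implicit.

```python
def solve_stopping(horizon, n_states, reward, cost, p_up):
    """Backward induction for a toy finite-horizon stopping problem.

    When we continue, we pay `cost` and the state s moves to
    min(s + 1, n_states - 1) with probability p_up, else stays at s.
    Returns value[t][s] and stop[t][s] (True when stopping is optimal).
    """
    value = [[0.0] * n_states for _ in range(horizon + 1)]
    stop = [[True] * n_states for _ in range(horizon + 1)]
    for s in range(n_states):
        value[horizon][s] = reward(s)  # forced to stop at the horizon
    for t in range(horizon - 1, -1, -1):
        for s in range(n_states):
            up = min(s + 1, n_states - 1)
            continuation = (-cost + p_up * value[t + 1][up]
                            + (1 - p_up) * value[t + 1][s])
            terminal = reward(s)
            stop[t][s] = terminal >= continuation
            value[t][s] = max(terminal, continuation)
    return value, stop

# Hypothetical terminal reward with diminishing returns in the state.
def reward(s):
    return 1.0 - 0.5 ** s

value, stop = solve_stopping(horizon=10, n_states=8, reward=reward,
                             cost=0.05, p_up=0.8)
```

In this toy problem it is optimal to continue from the lowest state and to stop in the highest state, where another experiment cannot add reward but still costs something.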

2511.03954 2026-06-16 stat.ME stat.CO updated version

Nonparametric Modeling of Continuous-Time Markov Chains

连续时间马尔可夫链的非参数建模

Filippo Monti, Xiang Ji, Marc A. Suchard

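AI summary Proposes a Bayesian framework that models continuous-time Markov chain rates as nonlinear functions of covariates through Gaussian processes, and develops scalable gradient computations that substantially reduce computational complexity.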

AI Chinese abstract

推断连续时间马尔可夫链(CTMC)的无穷小速率是许多科学领域的核心挑战。这一任务十分困难,因为速率数量随状态空间呈二次增长,速率之间可能强相关,且许多转移可能仅被部分观测到。我们引入一个贝叶斯框架,通过高斯过程将CTMC速率建模为协变量的灵活函数。这实现了非线性协变量效应,通过纳入外部信息改进了推断,并有助于识别CTMC动态的潜在驱动因素。对于后验推断,我们使用哈密顿蒙特卡洛方法,并开发了可扩展的精确和近似梯度,用于涉及重复矩阵指数的似然计算。对于$N$个观测和$K$个CTMC状态,这些梯度将现有导数计算的主要成本从$O(NK^3)$(大常数)降低到$O(K^3+NK^2)$(更小的常数)。我们在贝叶斯系统发育和生物地理学推断中展示了该方法(其中CTMC是核心),并在合成和真实数据集上表现出强大的性能,包括在$N<K$时仍实现$K$的经验二次缩放。

English abstract

Inferring the infinitesimal rates of continuous-time Markov chains (CTMCs) is a central challenge in many scientific domains. This task is difficult because the number of rates grows quadratically with the state space, rates can be strongly dependent, and many transitions may be only partially observed. We introduce a Bayesian framework that models CTMC rates as flexible functions of covariates through Gaussian processes. This enables nonlinear covariate effects, improves inference by incorporating external information, and helps identify potential drivers of CTMC dynamics. For posterior inference, we use Hamiltonian Monte Carlo and develop scalable exact and approximate gradients for likelihoods involving repeated matrix exponentials. With $N$ observations and $K$ CTMC states, these gradients reduce the dominant cost of existing derivative calculations from $O(NK^3)$, with large constants, to $O(K^3+NK^2)$, with cheaper constants. We demonstrate the method in Bayesian phylogenetic and phylogeographic inference, where CTMCs are central, and show strong performance on synthetic and real datasets, including empirical quadratic scaling in $K$ even when $N<K$.
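For readers unfamiliar with why matrix exponentials appear in CTMC likelihoods: the transition probabilities over an interval of length t are given by P(t) = exp(Qt), where Q is the rate matrix. The sketch below computes this in pure Python for a two-state chain and compares it with the known closed form. The rates, the helper names, and the simple scaling-and-squaring routine are illustrative assumptions, not the gradient algorithm developed in the paper.

```python
import math

def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(m, terms=20, squarings=10):
    """exp(m) via scaling and squaring with a truncated Taylor series."""
    n = len(m)
    scaled = [[x / 2 ** squarings for x in row] for row in m]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def transition_matrix(rates, t):
    """Transition probabilities P(t) = exp(rates * t)."""
    return mat_exp([[q * t for q in row] for row in rates])

# Hypothetical two-state chain: rate a from state 0 to 1, rate b back.
a, b, t = 2.0, 1.0, 0.7
Q = [[-a, a], [b, -b]]
P = transition_matrix(Q, t)
closed_form_p00 = b / (a + b) + a / (a + b) * math.exp(-(a + b) * t)
```

Each dense K-by-K exponential costs on the order of K^3 operations, so repeating it for every one of N observations is where a naive O(NK^3) cost comes from.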

2112.07755 2026-06-16 stat.ME math.ST stat.TH updated version

Separate Exchangeability as Modeling Principle in Bayesian Nonparametrics

分离可交换性作为贝叶斯非参数建模原则

Giovanni Rebaudo, Qiaohui Lin, Peter Mueller

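AI summary Argues for separate exchangeability as a modeling principle in Bayesian nonparametric inference, discussing its definition, two classes of models that implement it (nested random partitions and regression models), and applications to real data.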

Comments Statistical Science 2026

AI Chinese abstract

我们主张在贝叶斯非参数(BNP)推断中将分离可交换性作为建模原则。分离可交换性在贝叶斯参数情况下实际上被广泛应用,例如,它自然出现在简单混合模型中。然而,尽管在某些领域(如随机图)中,分离和(密切相关的)联合可交换模型被广泛使用,但在BNP的其他一些应用中它们却奇怪地未被充分利用。我们简要回顾了分离可交换性的定义,重点关注该定义在贝叶斯建模中的含义。然后,我们讨论了两类实现分离可交换性的易处理模型,它们是熟悉的局部可交换BNP模型的自然对应。第一类是数据矩阵的嵌套随机划分,定义了列的划分和行的嵌套划分(嵌套在列簇内)。许多最近的嵌套划分模型实现了与众所周知的嵌套狄利克雷过程变体相关的局部可交换模型。我们认为,在这些模型下的推断在某些情况下忽略了实验设置的重要特征。我们获得了这种局部可交换划分结构的分离可交换对应。第二类涉及在涉及多组实验单元时,为非参数回归模型建立分离可交换先验。我们强调,线性模型的狄利克雷过程混合(称为ANOVA DDP)如何自然地实现此类回归问题中的分离可交换性。最后,我们通过两个真实数据示例说明了如何在这些模型下进行推断。

English abstract

We argue for the use of separate exchangeability as a modeling principle in Bayesian nonparametric (BNP) inference. Separate exchangeability is de facto widely applied in the Bayesian parametric case, e.g., it naturally arises in simple mixed models. However, while in some areas, such as random graphs, separate and (closely related) joint exchangeable models are widely used, they are curiously underused for several other applications in BNP. We briefly review the definition of separate exchangeability, focusing on the implications of such a definition in Bayesian modeling. We then discuss two tractable classes of models that implement separate exchangeability, which are the natural counterparts of familiar partially exchangeable BNP models. The first is nested random partitions for a data matrix, defining a partition of columns and nested partitions of rows, nested within column clusters. Many recent models for nested partitions implement partially exchangeable models related to variations of the well-known nested Dirichlet process. We argue that inference under such models in some cases ignores important features of the experimental setup. We obtain the separately exchangeable counterpart of such partially exchangeable partition structures. The second class is about setting up separately exchangeable priors for a nonparametric regression model when multiple sets of experimental units are involved. We highlight how a Dirichlet process mixture of linear models, known as ANOVA DDP, can naturally implement separate exchangeability in such regression problems. Finally, we illustrate how to perform inference under such models in two real data examples.

3. Causal Inference and Experimental Design (5 papers)

2606.15754 2026-06-16 stat.ME new submission

Bounding Causal Effects for Ordinal Outcomes Under Positive Dependence

正相依下有序结果因果效应的界

Micha Mandel, Daniel Rodan

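AI summary For ordinal data, studies when causal-effect bounds based on an independence working assumption are valid under positive dependence, proposes the diagonal tail dominance condition, and derives improved bounds.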

Comments 18 pages (main text), 2 figures. Supporting information at the end of the document

AI Chinese abstract

定义和估计有序数据的因果效应具有挑战性。标准平均处理效应不适用于有序尺度,而替代估计量,如处理结果超过或未恶化对照结果的概率,通常不可识别。现有工作仅基于边际分布为这些量提供尖锐界。受先前观察的启发,即独立工作假设下得到的界可以显著更紧,我们研究了这些界有效的条件。我们表明,常用的正相依概念,包括正象限相依和正回归相依,不足以证明这些界的合理性。然后我们提出一个新的相依条件,对角尾部优势(DTD),在该条件下,基于独立的界保证成立。我们解释了为什么这个条件相当强,在许多情况下可能不合适,限制了使用基于独立界的理由。然而,局部DTD在许多应用中可能是合理的,我们推导了改进的界,该界利用了概率表选定部分上的独立工作假设。通过理论结果、数值示例以及对急性缺血性卒中新疗法临床试验数据的分析,我们说明了这些界的性质以及所提出条件的作用。

English abstract

Defining and estimating causal effects for ordinal data is challenging. Standard average treatment effects are not appropriate for ordinal scales, and alternative estimands, such as the probabilities that the treatment outcome exceeds or does not worsen the control outcome, are generally not identifiable. Existing work provides sharp bounds for these quantities based only on marginal distributions. Motivated by a previous observation showing that bounds obtained under an independence working assumption can be substantially tighter, we investigate conditions under which such bounds are valid. We show that commonly used notions of positive dependence, including positive quadrant dependence and positive regression dependence, are not sufficient to justify these bounds. We then propose a new dependence condition, diagonal tail dominance (DTD), under which the independence-based bounds are guaranteed to hold. We explain why this condition is quite strong and may not be appropriate in many settings, limiting the justification for using the independence-based bounds. However, local DTD may be plausible in many applications, and we derive improved bounds that exploit an independence working assumption on selected parts of the probability table. Through theoretical results, numerical examples, and an analysis of data from a clinical trial of a new treatment for acute ischemic stroke, we illustrate the properties of the bounds and the role of the proposed conditions.

2606.14892 2026-06-16 cs.AI cs.LG cs.SI stat.ML new submission

Relational Structural Causal Models

关系结构因果模型

Adiba Ejaz, Elias Bareinboim

Affiliations Causal Artificial Intelligence Lab, Columbia University

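AI summary Proposes relational structural causal models, extending structural causal models to settings where objects and relations vary; relational causal graphs and symbolic identification criteria enable identification of causal and observational queries about unseen combinations, and the proposed relational neural causal models outperform non-relational baselines on traffic scenes.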

Comments Proceedings of the Forty-Third International Conference on Machine Learning

AI Chinese abstract

人工智能必须拥有一个因果的环境模型,支持关于干预和反事实的推理,同时具有组合性,支持对未见过的对象组合进行泛化。在这项工作中,我们正式研究了何时以及如何学习这样的模型。我们开发了关系结构因果模型,将结构因果模型(Pearl 2009)扩展到对象及其关系变化的场景。首先,我们展示了在没有进一步假设的情况下,不仅因果查询,而且关于未见对象组合的观测查询的答案也无法被识别。为了实现这种识别——包括在存在未观测混杂的情况下——我们定义了关系因果图并推导了符号识别准则。最后,我们提出了关系神经因果模型,这是一种可证明正确的方法,在具有不同汽车、信号和行人的模拟交通场景中优于非关系基线。

English abstract

An artificial intelligence must have a model of its environment that is causal, supporting reasoning about interventions and counterfactuals, and also combinatorial, supporting generalization to unseen combinations of objects. In this work, we formally study when and how such a model can be learned. We develop relational structural causal models, extending structural causal models (Pearl 2009) to settings where objects and their relations vary. First, we show that answers to not only causal but also observational queries about unseen combinations of objects cannot be identified without further assumptions. To enable such identification, including in the presence of unobserved confounding, we define relational causal graphs and derive symbolic identification criteria. Finally, we propose relational neural causal models, a provably correct approach that outperforms non-relational baselines on simulated traffic scenes with varying cars, signals, and pedestrians.

2606.14840 2026-06-16 stat.ME new submission

Causal Sufficient Dimension Reduction for Multiple Continuous Exposures with an Application to Environmental Mixtures

多连续暴露的因果充分降维及其在环境混合物中的应用

Thomas W. Hsiao, Howard H. Chang, Razieh Nabi

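AI summary Proposes the causal sufficient dimension reduction (CSDR) framework, which represents causal exposure-response surfaces through low-dimensional exposure summaries, designs a two-stage estimator, and demonstrates its effectiveness in an environmental mixtures study.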

AI Chinese abstract

估计多变量连续暴露的因果效应具有挑战性,因为因果暴露-反应曲面可能是高维的,使得联合暴露效应的估计和解释复杂化。这种情形出现在环境流行病学中,其关注点在于化学和污染物混合物对健康的影响。我们开发了因果充分降维(CSDR),这是一个半参数框架,通过低维暴露摘要来表征因果暴露-反应曲面。我们将降维目标形式化为因果中心均值子空间,并提出一个模块化的两阶段估计器,将 nuisance 函数估计与子空间估计解耦,相对于现有的基于边际结构模型的方法简化了实现。降维后的暴露保留了表征联合因果效应所需的信息,同时支持高效的下游估计。我们建立了因果子空间恢复的收敛速率,考虑了第一阶段 nuisance 估计误差,证明了结构维度可以一致估计,并引入了一个子空间重要性分数,用于量化每个暴露对降维的贡献。在模拟中,CSDR 在暴露-反应曲面的估计和不确定性量化方面比使用非因果降维或原始暴露的方法更准确。我们将 CSDR 应用于研究亚特兰大非洲裔美国母婴队列中母亲暴露于 PFAS 化学混合物对婴儿出生体重的影响。

English abstract

Estimating causal effects with multivariate continuous exposures is challenging because causal exposure-response surfaces can be high-dimensional, complicating estimation and interpretation of joint exposure effects. Such settings arise in environmental epidemiology, where interest centers on the health effects of chemical and pollutant mixtures. We develop causal sufficient dimension reduction (CSDR), a semiparametric framework for representing causal exposure-response surfaces through low-dimensional exposure summaries. We formalize the reduction target as the causal central mean subspace and propose a modular two-stage estimator that decouples nuisance-function estimation from subspace estimation, simplifying implementation relative to existing marginal structural model-based approaches. The reduced exposure preserves the information needed to characterize joint causal effects while enabling efficient downstream estimation. We establish a convergence rate for causal subspace recovery accounting for first-stage nuisance estimation error, show that the structural dimension can be estimated consistently, and introduce a subspace importance score that quantifies the contribution of each exposure to the reduction. In simulations, CSDR yielded more accurate estimation and uncertainty quantification of the exposure-response surface than methods using noncausal dimension reduction or the original exposure. We apply CSDR to study the effect of maternal exposure to PFAS chemical mixtures on infant birth weight in the Atlanta African American Maternal-Child Cohort.

2605.29641 2026-06-16 stat.ME cs.PF math.PR updated version

Experimentation for Different Scheduling Policies on Queues: Mixed Differences-in-Q Estimators Based on Little's Law

不同调度策略在队列上的实验:基于Little定律的混合差分Q估计量

Nanshan Jia, Ramesh Johari, Nian Si, Zeyu Zheng

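AI summary Addresses Markovian interference in A/B tests of data-center scheduling policies by proposing mixed Differences-in-Q estimators based on Little's Law, which significantly reduce bias and variance; simulations under non-stationary arrival rates, heterogeneous service rates, and other scenarios confirm robustness and efficacy.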

AI Chinese abstract

在数据中心,任务被分发到各个服务器以均匀分配工作负载。当数据中心考虑实施新的调度算法时,通常会在部署前进行A/B测试以评估该新方法的实际影响。然而,直接的A/B测试可能会受到所谓的“马尔可夫”干扰。我们利用Farias等人(2022)开发的差分Q估计量,并引入了基于Little定律的混合差分Q估计量。我们表明,我们的A/B测试方法在测试各种调度策略时显著减少了偏差和方差。在非平稳到达率、异构服务率和通信延迟等场景下进行了大量仿真。这些仿真突出了我们A/B测试方法的鲁棒性和有效性。

English abstract

In data centers, tasks are dispatched to various servers to evenly distribute the workload. When a data center considers implementing a new scheduling algorithm, it typically conducts an A/B test prior to deployment to assess the real-world impact of this new method. However, a straightforward A/B test might be subject to so-called "Markovian" interference. We utilized the Differences-in-Q estimator, as developed by Farias et al. (2022), and introduced mixed Differences-in-Q estimators grounded in Little's Law. We show that our A/B testing methods significantly reduce bias and variance when testing various scheduling policies. Extensive simulations were conducted under scenarios like non-stationary arrival rates, heterogeneous service rates, and communication delays. These simulations highlight the robustness and efficacy of our A/B testing approach.
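Little's Law states that the time-average number of jobs in a system equals the arrival rate times the mean time a job spends in the system (L = λW). The sketch below checks this identity on a small hand-made trace. The trace and the function name are hypothetical, the check assumes the system is empty at time 0 and at the last departure, and it does not reproduce the mixed Differences-in-Q estimators themselves.

```python
def littles_law_check(arrivals, departures):
    """Return (L, lambda * W) computed from one finite trace.

    arrivals[i] and departures[i] are the entry and exit times of job i.
    Assumes the system is empty at time 0 and at the last departure.
    """
    horizon = max(departures)
    n_jobs = len(arrivals)
    sojourn = [d - a for a, d in zip(arrivals, departures)]
    arrival_rate = n_jobs / horizon        # lambda
    mean_sojourn = sum(sojourn) / n_jobs   # W
    # Integrate the number-in-system process over [0, horizon].
    events = sorted([(a, 1) for a in arrivals] + [(d, -1) for d in departures])
    area, in_system, last_time = 0.0, 0, 0.0
    for time, delta in events:
        area += in_system * (time - last_time)
        in_system += delta
        last_time = time
    mean_in_system = area / horizon        # L
    return mean_in_system, arrival_rate * mean_sojourn

# Hypothetical trace of four jobs.
arrivals = [0.0, 1.0, 1.5, 4.0]
departures = [2.0, 2.5, 3.0, 5.0]
L, lambda_w = littles_law_check(arrivals, departures)
```

The identity links a queue-length quantity to a delay quantity, so an estimator of one can be converted into an estimator of the other.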

2602.21068 2026-06-16 stat.ME math.ST stat.AP stat.TH updated version

Detecting Where Effects Occur by Testing Hypotheses in Order

通过顺序假设检验检测效应发生的位置

Jake Bowers, David Kim, Nuole Chen

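AI summary Proposes a top-down, tree-structured hypothesis testing method that uses the administrative hierarchy of an experiment to control the family-wise error rate, with an error-load diagnostic indicating whether adjustment is needed; its effectiveness is demonstrated on education trials and a job-training study.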

AI Chinese abstract

公共政策的实验评估通常会在多个站点或区块内随机化干预。一旦报告了总体效应,对行动而言重要的问题是效应发生在哪里。标准的多次检验校正由于忽略了实验的组织方式(区块嵌套于队列、站点和区域)而功效较低。我们将假设组织成遵循这种行政结构的树,并自上而下地进行检验,仅当父节点原假设被拒绝时才进入子分支。我们证明,停止规则和有效的节点级检验足以弱控制族系错误率(FWER)。同一过程是否也能强控制FWER取决于一个在数据观测前即可计算的单一量:误差负荷,它总结了拒绝概率沿树中路径的累积方式。这个诊断指标预先告诉分析者,仅从设计量出发,未调整的过程是否控制FWER或需要调整。在25个区块随机化的MDRC教育试验中,该指标表明每个试验都不需要调整,因此仅这两个条件就能控制FWER,同时每个检验在完整的名义水平下运行;自上而下的过程检测到了Hommel校正遗漏的单个区块,并定位了自下而上检验无法评估的高层区块组。对于高误差负荷的设计,我们推导了一个自适应α调度,证明它在规则、不规则和修剪的树上控制FWER,并通过模拟确认。相同的诊断指标在需要时发出警告:在一个针对国家职业训练团研究校准的设计中,这是一个约一百个中心的大规模多站点试验,未调整的过程使FWER膨胀,自适应调度恢复控制,而自上而下的检验仍然比自下而上或分层校正检测到更多受影响的站点。

English abstract

Experimental evaluations of public policy often randomize an intervention within many sites or blocks. Once an overall effect is reported, the question that matters for action is where it occurred. Standard multiple-testing corrections answer with little power because they ignore how the experiment is organized: blocks nest within cohorts, sites, and districts. We organize the hypotheses as a tree that follows this administrative structure and test them top-down, descending into a branch only when its parent null is rejected. We show that this stopping rule and valid node-level tests suffice for weak control of the family-wise error rate (FWER). Whether the same procedure also controls the FWER in the strong sense depends on a single quantity computable before any data are seen: an error load that summarizes how rejection probability accumulates along paths through the tree. This diagnostic tells an analyst in advance, from design quantities alone, whether the unadjusted procedure controls the FWER or an adjustment is required. Across 25 block-randomized MDRC education trials it indicates that no adjustment is needed in any of them, so the two conditions alone control the FWER while each test runs at the full nominal level; the top-down procedure detects individual blocks that the Hommel correction misses and locates higher-level groups of blocks that bottom-up testing cannot evaluate. For high-error-load designs we derive an adaptive alpha-schedule, prove it controls the FWER on regular, irregular, and pruned trees, and confirm it in simulation. The same diagnostic flags when it is needed: in a design calibrated to the National Job Corps Study, a wide multisite trial of about one hundred centers, the unadjusted procedure inflates the FWER, the adaptive schedule restores control, and top-down testing still detects more affected sites than bottom-up or hierarchical corrections.
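The top-down stopping rule described above (test a node, and descend into its children only if its null is rejected) is easy to sketch. The example below uses a hypothetical tree, made-up p-values, and a fixed level alpha at every node. It covers only the unadjusted procedure and does not implement the error-load diagnostic or the adaptive alpha-schedule from the paper.

```python
def top_down_test(tree, p_values, alpha=0.05, node="root"):
    """Return the set of rejected nodes under the top-down stopping rule.

    tree maps a node name to the list of its children; a node is tested
    only if every ancestor on its path has already been rejected.
    """
    rejected = set()
    if p_values[node] <= alpha:
        rejected.add(node)
        for child in tree.get(node, []):
            rejected |= top_down_test(tree, p_values, alpha, child)
    return rejected

# Hypothetical administrative hierarchy: districts contain blocks.
tree = {
    "root": ["district_A", "district_B"],
    "district_A": ["block_1", "block_2"],
    "district_B": ["block_3"],
}
# Made-up p-values for illustration only.
p_values = {
    "root": 0.001,
    "district_A": 0.01,
    "district_B": 0.20,
    "block_1": 0.03,
    "block_2": 0.40,
    "block_3": 0.001,
}
rejected = top_down_test(tree, p_values)
```

Note that `block_3` is never tested, despite its small p-value, because the null for its parent `district_B` is not rejected.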

4. High-Dimensional Statistics and Regularization (10 papers)

2606.16681 2026-06-16 stat.ME stat.ML new submission

Spectral Sparsification of Laplacian-Constrained Gaussian and Hüsler-Reiss Graphical Models

拉普拉斯约束高斯和Hüsler-Reiss图模型的谱稀疏化

Ignacio Echave-Sustaeta Rodríguez, Aida Abiad, Frank Röttger

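AI summary To address overly dense graph estimates in Laplacian-constrained Gaussian and Hüsler-Reiss graphical models, proposes spectral graph sparsification as a post-estimation step: the estimate is replaced by a spectrally close, sparser Laplacian and the model is re-fitted, yielding two new methods, Spectral-LCGGM and Spectral-HR, which perform well in theoretical analysis and experiments.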

AI Chinese abstract

图拉普拉斯矩阵以矩阵形式编码图结构,从而促进线性代数在图论中的应用。在统计学中,两类相关的概率图模型可以通过图拉普拉斯矩阵参数化。第一个是拉普拉斯约束高斯图模型(LCGGM),它要求高斯随机向量的(伪)逆协方差矩阵是拉普拉斯矩阵。应用包括图信号处理和网络拓扑学习。第二个是Hüsler-Reiss图模型,被视为高斯图模型的极值模拟,可用于洪水、热浪和金融损失的极值依赖建模。对于这两种模型,图拉普拉斯矩阵中正边权的限制产生了一种无需调整参数的图结构学习方法。虽然这些方法在许多设置中产生了强模型拟合,但得到的图估计通常比底层真实图稠密得多,限制了可解释性和可扩展性。为了提高拉普拉斯约束图学习的准确性,我们提出使用谱图稀疏化作为后估计操作。为此,我们将原始拉普拉斯估计替换为谱接近的更稀疏的拉普拉斯矩阵,并在得到的图上重新拟合模型。我们将这两种方法称为Spectral-LCGGM和Spectral-HR。我们研究了所提估计量的性质,并展示了关于其性能的几个理论结果。此外,通过在Erdős-Rényi和随机块模型图上进行模拟,我们证明了新提出的方法表现良好,并展示了它们在真实数据上的应用。

English abstract

Graph Laplacians encode graph structures in matrix form, and thus facilitate the application of linear algebra to graph theory. In statistics, two related families of probabilistic graphical models can be parameterized by graph Laplacians. The first one is the Laplacian-constrained Gaussian graphical model (LCGGM), which imposes that the (pseudo-)inverse covariance matrix of a Gaussian random vector is a Laplacian matrix. Applications include graph signal processing and network topology learning. The second one is the Hüsler-Reiss graphical model, which is considered as an extremal analog of the Gaussian graphical model, and can be used in extremal dependence modeling of floods, heatwaves, and financial losses. For both models, the restriction to positive edge weights in the graph Laplacian gives rise to an approach for graph structure learning that does not require tuning parameters. While these approaches yield a strong model fit in many settings, the resulting graph estimates are typically much denser than the underlying ground truth, limiting interpretability and scalability. In order to improve the accuracy of Laplacian-constrained graph learning, we propose to use spectral graph sparsification as a post-estimation operation. To do so, we replace the original Laplacian estimate by a sparser Laplacian that is spectrally close, and re-fit the model on the resulting graph. We refer to the two resulting methods as Spectral-LCGGM and Spectral-HR. We investigate the properties of the proposed estimators and show several theoretical results on their performance. Furthermore, we demonstrate that the newly proposed methods perform well by running simulations on Erdős-Rényi and stochastic block model graphs, and we also showcase their applications to real data.
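As a small illustration of the object being sparsified, the sketch below builds the Laplacian of a weighted graph and evaluates its quadratic form. Two Laplacians are usually called spectrally close when these quadratic forms are close for all vectors x. The graph, weights, and function names are hypothetical, and no sparsification algorithm is implemented here.

```python
def laplacian(n_nodes, weighted_edges):
    """Graph Laplacian L = D - W from a list of (i, j, weight) edges."""
    lap = [[0.0] * n_nodes for _ in range(n_nodes)]
    for i, j, w in weighted_edges:
        lap[i][i] += w
        lap[j][j] += w
        lap[i][j] -= w
        lap[j][i] -= w
    return lap

def quadratic_form(lap, x):
    """x^T L x, which equals the sum over edges of w_ij * (x_i - x_j)^2."""
    n = len(x)
    return sum(x[i] * lap[i][j] * x[j] for i in range(n) for j in range(n))

# Hypothetical triangle graph with positive edge weights.
edges = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 0.5)]
L = laplacian(3, edges)
x = [1.0, 0.0, -1.0]
energy = quadratic_form(L, x)
edge_sum = sum(w * (x[i] - x[j]) ** 2 for i, j, w in edges)
```

Every row of a Laplacian sums to zero, which is why the abstract speaks of a (pseudo-)inverse covariance rather than an ordinary inverse.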

2606.15636 2026-06-16 stat.ME new submission

Paired Sample Tests for High-dimensional Uncorrelatedness via Random Integration

基于随机积分的高维不相关性配对样本检验

Shiyao Huang, Xiaojun Song

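AI summary Proposes a nonparametric test of uncorrelatedness between two high-dimensional random vectors that generalizes random integration to estimate a weighted L2 norm of the covariance matrix; the statistic is asymptotically normal as n and p diverge and is more powerful at detecting weak but pervasive correlation.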

AI Chinese abstract

本文提出了一种新的非参数检验方法,用于评估两个高维随机向量之间的不相关性。我们通过推广Jiang等人(2023, 2024)提出的随机积分来开发我们的检验,得到的检验统计量估计协方差矩阵的加权平方$\mathscr{L}_2$范数。通过让样本量$n$和维度$p$都趋于无穷大,推导了检验统计量的渐近性质。在不相关的原假设下,我们提出的检验统计量渐近服从均值为0、方差为1的正态分布,无需指定$n$和$p$的相对大小。蒙特卡洛模拟表明我们提出的方法具有良好的有限样本性能。与许多现有检验相比,我们的检验统计量在检测“弱但广泛”的依赖性方面更有效,同时保持可比较的经验尺寸。通过评估DNA甲基化与基因表达之间相关性的实证分析,进一步说明了所提出方法的优势。

English abstract

This paper proposes a novel nonparametric test to assess the uncorrelatedness between two high-dimensional random vectors. We develop our test by generalizing the random integration proposed by Jiang et al. (2023, 2024), and the resulting test statistic estimates a weighted squared $\mathscr{L}_2$ norm of the covariance matrix. Asymptotic properties of the test statistic are derived by letting both the sample size $n$ and the dimension $p$ diverge to infinity. Under the null hypothesis of uncorrelatedness, our proposed test statistic is asymptotically normal with zero mean and unit variance, without requiring any specification of the relative magnitude regarding $n$ and $p$. Monte Carlo simulations demonstrate the good finite-sample performance of our proposed methods. Compared with many existing tests, our test statistic is more powerful at detecting "weak but pervasive" dependence while maintaining a comparable empirical size. The advantages of the proposed methods are further illustrated by an empirical analysis that assesses the correlation between DNA methylation and gene expression.

2606.15602 2026-06-16 stat.ME new submission

Bias-Aware External-Model-Assisted Inference in High-Dimensional Regression

高维回归中的偏差感知外部模型辅助推断

Hongzhe Zhang, Hanxuan Ye, Hongzhe Li

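AI summary Proposes DEAL, which adapts the use of an external estimator through a bias-aware, cross-fitted shrinkage step and achieves shorter confidence intervals than debiased Lasso, PPI, and related methods in high-dimensional semi-supervised linear regression.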

AI Chinese abstract

在高维半监督线性回归中,预测驱动推断(PPI)使用从标记数据估计的校正器来修正外部预测器。然而,在线性模型中,该校正器会抵消预测器:PPI和PPI++退化为普通最小二乘法,并且当预测器接近最优时可能增加方差。我们提出去偏外部模型辅助Lasso(DEAL),它将外部估计器和未标记协变量引入去偏估计量的方差中,并通过一个偏差感知的交叉拟合收缩步骤,自适应地适应仅目标、接近最优和有偏差但有信息三种情况。我们证明了坐标渐近正态性并具有自适应方差,将有效性扩展到误设定和非线性标记器下的投影参数,并表明在常见的未标记预算下,DEAL区间比去偏Lasso、PPI和PPI++更短;一种移位感知变体在协变量移位下保持覆盖。在模拟中,DEAL区间长度为去偏Lasso的0.49-0.87倍;在涵盖天文学、化学、蛋白质组学和肿瘤学的六个真实数据应用中(最后一个使用大语言模型作为最优预测器),DEAL区间在每种情况下都更短,中位长度比为0.23-0.53。

English abstract

In high-dimensional semi-supervised linear regression, prediction-powered inference (PPI) corrects an external predictor with a rectifier estimated from the labeled data. In a linear model, however, this rectifier cancels the predictor: PPI and PPI++ reduce to ordinary least squares and can inflate variance when the predictor is close to the oracle. We propose the Debiased External-model-Assisted Lasso (DEAL), which routes the external estimator and the unlabeled covariates into the variance of a debiased estimator, with a bias-aware, cross-fitted shrinkage step that adapts across target-only, near-oracle, and biased-but-informative regimes. We prove coordinate-wise asymptotic normality with an adaptive variance, extend validity to the projection parameter under misspecification and nonlinear labelers, and show that, at a common unlabeled budget, DEAL intervals are shorter than those of debiased Lasso, PPI, and PPI++; a shift-aware variant preserves coverage under covariate shift. In simulations, DEAL intervals are 0.49-0.87 of the debiased-Lasso length, and across six real-data applications spanning astronomy, chemistry, proteomics, and oncology, the last using a large-language-model oracle, they tighten in every case, with median length ratios of 0.23-0.53.

2606.15581 2026-06-16 stat.ML cs.LG math.PR math.SP math.ST stat.TH new submission

Phase Transition in Convex Relaxations for Graph Alignment

图对齐凸松弛中的相变

Laurent Massoulié, Sushil Mahavir Varma, Louis Vassaux, Irène Waldspurger

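AI summary Studies graph alignment for correlated GOE matrices, analyzes convex relaxations, proves that the solution concentrates around the true permutation when the correlation parameter satisfies σ = o(n^{-1/2}/log^4 n), and characterizes the phase-transition threshold.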

Comments Accepted for presentation at the Conference on Learning Theory (COLT) 2026

AI Chinese abstract

我们研究了相关高斯正交系综(GOE)矩阵的图对齐问题,目标是在给定两个相关对称高斯矩阵$(A, B)$(相关性为$1/\sqrt{1+\sigma^2}$)的情况下恢复隐藏的顶点排列。虽然最大似然估计在信息论上是最优的,但其计算归结为二次分配问题,难以处理。受此启发,我们分析了基于在双随机矩阵集和单位超立方体上最小化$\|AX - XB\|_F$的凸松弛。我们证明,当相关参数满足$\sigma = o(n^{-1/2}/\log^4 n)$时,任一松弛的解$(X^\star)$集中在真实排列矩阵$(\Pi^\star)$附近,即$\|X^\star-\Pi^\star\|_F^2 = o(n)$,这意味着在简单的后处理后可以恢复除消失比例顶点外的所有顶点。结合现有下界,我们的结果精确刻画了$\|X^\star-\Pi^\star\|_F^2$从$\sigma = \tilde{o}(n^{-1/2})$时的$o(n)$到$\sigma = \tilde{\Omega}(n^{-1/2})$时的$\Omega(n)$的转变。在此过程中,我们的分析显著收紧了先前的结果,并将其扩展到双随机松弛之外。

English abstract

We study the graph alignment problem for correlated Gaussian Orthogonal Ensemble (GOE) matrices, where the goal is to recover a hidden vertex permutation given two correlated symmetric Gaussian matrices $(A, B)$ with correlation $1/\sqrt{1+\sigma^2}$. While the maximum likelihood estimator is information-theoretically optimal, its computation, which reduces to a quadratic assignment problem, is intractable. Motivated by this, we analyze convex relaxations based on minimizing $\|AX - XB\|_F$ over the set of doubly stochastic matrices and the unit hypercube. We show that when the correlation parameter satisfies $\sigma = o(n^{-1/2}/\log^4 n)$, the solution of either relaxation $(X^\star)$ concentrates around the ground-truth permutation matrix $(\Pi^\star)$, i.e., $\|X^\star-\Pi^\star\|_F^2 = o(n)$, implying recovery of all but a vanishing fraction of vertices after simple post-processing. Combined with existing lower bounds, our results precisely characterize that $\|X^\star-\Pi^\star\|_F^2$ transitions from $o(n)$ for $\sigma = \tilde{o}(n^{-1/2})$ to $\Omega(n)$ for $\sigma = \tilde{\Omega}(n^{-1/2})$. In doing so, our analysis significantly tightens prior results and extends them beyond doubly stochastic relaxations.
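To see why the objective $\|AX - XB\|_F$ is a natural one, consider the noiseless case (an assumption made here for illustration, corresponding to σ = 0): if B is A with its vertices relabeled by a permutation, the residual vanishes at the true permutation matrix. The sketch below checks this on a hypothetical 3-by-3 example. It evaluates the objective only and does not solve either convex relaxation.

```python
import math

def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(row) for row in zip(*a)]

def permutation_matrix(perm):
    """Matrix with a one in position (i, perm[i]) for each row i."""
    n = len(perm)
    return [[1.0 if perm[i] == j else 0.0 for j in range(n)] for i in range(n)]

def residual(a, x, b):
    """Frobenius norm of AX - XB."""
    ax, xb = mat_mul(a, x), mat_mul(x, b)
    return math.sqrt(sum((p - q) ** 2
                         for r1, r2 in zip(ax, xb) for p, q in zip(r1, r2)))

# Hypothetical symmetric matrix and hidden permutation.
A = [[0.0, 1.0, 2.0], [1.0, 0.0, 3.0], [2.0, 3.0, 0.0]]
Pi = permutation_matrix([1, 2, 0])
B = mat_mul(mat_mul(transpose(Pi), A), Pi)   # relabeled copy of A
identity = permutation_matrix([0, 1, 2])

at_truth = residual(A, Pi, B)
at_identity = residual(A, identity, B)
```

With B defined as the relabeled copy of A, the true permutation satisfies AΠ = ΠB exactly, while a wrong candidate such as the identity leaves a positive residual.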

2605.25855 2026-06-16 stat.ME math.ST stat.ML stat.TH updated version

High-Dimensional Robust Change-Point Detection via Angular Kernel Statistics

高维变点检测:基于角核统计量

Jyotishka Ray Choudhury, Yao Xie

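AI summary For high-dimensional, low sample size (HDLSS) data, proposes a dimension-averaged angular kernel scan framework that aggregates bounded one-dimensional angular discrepancies across coordinates, giving nonparametric, hyperparameter-free, moment-agnostic change-point detection with statistical guarantees for both offline and online procedures.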

AI Chinese abstract

我们研究在必须从少量观测批次中进行推断的高维数据变点检测问题。主要关注高维低样本量(HDLSS)情形,其中序列长度固定而环境维度发散。我们提出一种维度平均的角核扫描框架,用于检测边际分布变化。该统计量聚合跨坐标的有界一维角差异,得到一个完全非参数、无超参数且不依赖矩的估计量,该估计量在无需指定、估计或假设有限边际矩(例如在重尾或污染分布下)的情况下仍然定义良好。对于离线单变点问题,我们推导出精确的总体均值分解为通用确定性形状函数和标量信号因子,将零假设协方差结构表征至标量长期方差因子,并建立了跨坐标混合下的HDLSS多元中心极限定理。这些结果导致插件高斯校准、渐近第一类错误控制以及功效和定位保证,包括$d^{-1/2}$局部检测尺度。我们进一步将离线过程扩展为针对高维流数据的固定窗口序贯监测过程,并获得了ARL校准和最坏情况EDD界。模拟研究表明,所提方法能够在具有挑战性的HDLSS和流设置中准确检测和定位变化,而基于矩或超参数敏感的程序可能不可靠。

英文摘要

We study nonparametric change-point detection for high-dimensional data in regimes where inference must be performed from small batches of observations. Our primary focus is the high-dimensional, low sample size (HDLSS) regime, where the sequence length is fixed while the ambient dimension diverges. We propose a dimension-averaged angular kernel scan framework for detecting marginal distributional shifts. The statistic aggregates bounded one-dimensional angular discrepancies across coordinates, yielding a fully nonparametric, hyperparameter-free, and moment-agnostic estimator that remains well-defined without specifying, estimating, or assuming finite marginal moments; for example, under heavy-tailed or contaminated distributions. For the offline single-change problem, we derive an exact population mean factorization into a universal deterministic shape function and a scalar signal factor, and characterize the exact null covariance structure up to a scalar variance factor, both valid for any fixed sample size and dimension. We also establish an HDLSS multivariate central limit theorem under cross-coordinate strong mixing which leads to a variance-calibrated asymptotically distribution-free test, asymptotic type-I error control, and lower bounds on power and localization accuracy. We further extend the offline procedure to a fixed-window sequential monitoring procedure for high-dimensional streaming data, and obtain ARL calibration and worst-case Pollak EDD bounds. Simulation studies demonstrate that the proposed method can accurately detect and localize changes in many challenging HDLSS and streaming high-dimensional settings where moment-based or hyperparameter-sensitive procedures may be extremely unstable or inaccurate.

2512.12003 2026-06-16 stat.ME math.ST stat.TH 版本更新

Debiased Inference for High-Dimensional Regression Models Based on Profile M-Estimation

基于剖面M估计的高维回归模型去偏推断

Yuhao Deng, Yi Wang, Yu Gu, Yuanjia Wang, Donglin Zeng

AI总结 提出去偏剖面M估计(DPME)框架,通过牛顿-拉夫森一步校正实现高维回归模型的正则化估计的渐近正态推断,无需显式投影,计算成本低。

详情
AI中文摘要

高维回归模型的去偏推断近年来受到广泛关注,以确保正则化估计量具有有效的推断。许多现有方法通过显式构造到冗余(nuisance)参数空间的投影来实现 Neyman 正交性,但当投影的显式形式不可用时,这些方法不可行。我们引入了一个通用的去偏框架,即去偏剖面 $M$-估计(DPME),它适用于广泛的模型类别,并且不需要像现有方法那样进行模型特定的 Neyman 正交化或投影推导。我们的方法首先通过优化惩罚目标函数获得参数的初始估计量。为了纠正惩罚引入的偏差,我们使用牛顿-拉夫森更新构造一个一步估计量,该更新应用于剖面函数的梯度,其中剖面函数定义为在保持感兴趣参数固定时的最优目标函数。我们使用数值微分,无需显式计算梯度。得到的 DPME 估计量被证明是渐近线性和正态分布的。通过大量模拟,我们证明了所提出的方法在显著降低计算成本的同时,实现了比现有替代方法更好的覆盖率。最后,我们通过将方法应用于估计多发性骨髓瘤的治疗规则来说明其实用性。

英文摘要

Debiased inference for high-dimensional regression models has received substantial recent attention to ensure regularized estimators have valid inference. Many existing methods focus on achieving Neyman orthogonality through explicitly constructing projections onto the space of nuisance parameters, which is infeasible when an explicit form of the projection is unavailable. We introduce a general debiasing framework, Debiased Profile $M$-Estimation (DPME), which applies to a broad class of models and does not require model-specific Neyman orthogonalization or projection derivations as in existing methods. Our approach begins with obtaining an initial estimator of the parameters by optimizing a penalized objective function. To correct for the bias introduced by penalization, we construct a one-step estimator using the Newton--Raphson update, applied to the gradient of a profile function defined as the optimal objective function with the parameter of interest held fixed. We use numerical differentiation without requiring explicit calculation of the gradients. The resulting DPME estimator is shown to be asymptotically linear and normally distributed. Through extensive simulations, we demonstrate that the proposed method achieves better coverage rates than existing alternatives with largely reduced computational cost. Finally, we illustrate the utility of our method by applying it to estimate a treatment rule for multiple myeloma.
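
The one-step idea in the abstract above can be illustrated on a toy problem. The sketch below is only an illustration under strong assumptions: a scalar parameter of interest, a hand-picked quadratic objective with no penalty, and an inner minimizer known in closed form. The names `toy_objective`, `toy_inner_argmin`, `toy_profile`, and `one_step_update` are hypothetical and are not taken from the paper. It shows a single Newton-Raphson step applied to a profile function, with the gradient and curvature obtained by central finite differences rather than analytic derivatives.

```python
def toy_objective(theta, eta):
    """Hypothetical objective Q(theta, eta); theta is the parameter of interest."""
    return (theta - 2.0) ** 2 + (eta - theta) ** 2 + eta ** 2


def toy_inner_argmin(theta):
    """Minimizer of Q over eta with theta held fixed (closed form for this toy Q)."""
    return theta / 2.0


def toy_profile(theta):
    """Profile function: the objective evaluated at the best eta for this theta."""
    return toy_objective(theta, toy_inner_argmin(theta))


def one_step_update(profile_fn, theta0, h=1e-3):
    """One Newton-Raphson step on profile_fn using central finite differences."""
    f_plus = profile_fn(theta0 + h)
    f_zero = profile_fn(theta0)
    f_minus = profile_fn(theta0 - h)
    grad = (f_plus - f_minus) / (2.0 * h)            # numerical first derivative
    hess = (f_plus - 2.0 * f_zero + f_minus) / h**2  # numerical second derivative
    return theta0 - grad / hess


# Start from a deliberately biased initial value and apply one correction step.
theta_initial = 0.5
theta_one_step = one_step_update(toy_profile, theta_initial)
```

Because the toy profile function is quadratic, a single step lands on its minimizer up to rounding error. In a realistic penalized high-dimensional model the inner minimization would itself be a numerical optimization, and the step would correct the bias only asymptotically.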

2510.13715 2026-06-16 stat.ME 版本更新

Exact Coordinate Descent for High-Dimensional Regularized Huber Regression

高维正则化Huber回归的精确坐标下降法

Younghoon Kim, Po-Ling Loh, Sumanta Basu

AI总结 提出一种精确坐标下降算法求解弹性网惩罚的高维Huber回归,通过自适应变量筛选规则加速收敛,在重尾噪声和高相关预测变量下保持稳定高效。

详情
AI中文摘要

在这项研究中,针对弹性网惩罚的高维Huber回归,开发了一种精确坐标下降算法。与现有的梯度下降或坐标下降类方法不同,即使当协变量之间由于重尾分布而产生高相关性导致Hessian矩阵病态时,该算法仍然有效。对于每个坐标,边际增量仅来自内点观测值,而导数在基于部分残差构建的网格上保持单调递增。基于传统的坐标下降框架,提出了自适应变量筛选规则,以选择性地确定每次迭代中更新哪些变量,从而加速收敛。对所提出算法的收敛性进行了正式分析,并提出了实用的计算策略以加速其执行。这些增强确保了算法即使在具有挑战性的场景下也能快速稳定地运行。涉及重尾噪声和高相关预测变量的大量模拟研究以及实际数据应用,展示了该方法的实际效率以及计算增强的益处。

英文摘要

In this study, an exact coordinate descent algorithm is developed for high-dimensional Huber regression regularized with an elastic net penalty. Unlike existing gradient descent or coordinate descent-type methods, this algorithm remains effective even when the Hessian becomes ill-conditioned due to high correlations between covariates drawn from heavy-tailed distributions. For each coordinate, marginal increments arise solely from inlier observations, while the derivatives remain monotonically increasing over a grid constructed from the partial residuals. Building on conventional coordinate descent frameworks, adaptive variable screening rules are proposed to selectively determine which variables to update at each iteration, thereby accelerating convergence. The convergence of the proposed algorithm is formally analyzed, and practical computational strategies are presented to speed up its execution. These enhancements ensure that the algorithm operates rapidly and stably even under challenging scenarios. Extensive simulation studies involving heavy-tailed noise and highly correlated predictors, along with a real-world data application, demonstrate both the practical efficiency of this method and the benefits of the computational enhancements.

2306.02244 2026-06-16 math.ST stat.ME stat.TH 版本更新

KL-BSS: Rethinking optimality for neighbourhood selection in structural equation models

KL-BSS:重新思考结构方程模型中邻域选择的最优性

Ming Gao, Wai Ming Tai, Bryon Aragam

AI总结 提出KL-BSS方法,利用结构方程模型中的潜在结构,在更弱的特征值条件下以更少样本恢复线性模型支持,并通过实验验证其优于BSS和Lasso。

详情
AI中文摘要

我们提出了一种在线性结构方程模型中进行邻域选择的新方法,该方法改进了经典方法如最佳子集选择(BSS)和Lasso。我们的方法称为KL-BSS,利用了SEM中存在的潜在结构——即使这种结构未知——并且可以轻松使用现有求解器实现。与BSS和Lasso相比,在更弱的特征值条件下,KL-BSS能够以更少的样本可证明地恢复线性模型的支持集。我们建立了KL-BSS获得的逐点和极小极大样本复杂度。在真实和模拟数据上的大量实验证实了KL-BSS带来的改进。虽然众所周知Lasso在结构化依赖下会遇到困难,但较少人知道即使是BSS也会遇到麻烦,并且可以显著改进。这些结果对图模型中的结构学习具有启示意义,因为图模型通常依赖邻域选择作为子程序。

英文摘要

We introduce a new method for neighbourhood selection in linear structural equation models that improves over classical methods such as best subset selection (BSS) and the Lasso. Our method, called KL-BSS, takes advantage of the existence of underlying structure in SEM -- even when this structure is unknown -- and is easily implemented using existing solvers. Under weaker eigenvalue conditions compared to BSS and the Lasso, KL-BSS can provably recover the support of linear models with fewer samples. We establish both the pointwise and minimax sample complexity for support recovery, which KL-BSS obtains. Extensive experiments on both real and simulated data confirm the improvements offered by KL-BSS. While it is well-known that the Lasso encounters difficulties under structured dependencies, it is less well-known that even BSS runs into trouble as well, and can be substantially improved. These results have implications for structure learning in graphical models, which often relies on neighbourhood selection as a subroutine.

2508.20278 2026-06-16 stat.ME 版本更新

Interpretable Scalar-on-Image Linear Regression Models via the Generalized Dantzig Selector

通过广义Dantzig选择器的可解释标量对图像线性回归模型

Sijia Liao, Xiaoxiao Sun, Ning Hao, Hao Helen Zhang

AI总结 提出广义Dantzig选择器,联合施加稀疏性和平滑性约束,提高标量对图像回归中系数函数的可解释性,并通过理论和实验验证其优势。

详情
AI中文摘要

标量对图像回归模型通过估计二元系数函数来研究标量响应与二元函数(例如图像)之间的关联。现有方法通常施加平滑性约束以控制偏差-方差权衡,从而防止过拟合。然而,这种假设可能阻碍可解释性,尤其是当只有图像的某些区域影响响应变化时。在这种情况下,通过对系数函数施加稀疏性假设可以更好地捕捉可解释性。为了解决这一挑战,我们提出了广义Dantzig选择器,一种联合在系数函数上施加稀疏性和平滑性的新方法。所提出的方法通过准确识别对响应变化无贡献的区域来增强可解释性,同时保持估计的稳定性。广泛的模拟研究和实际数据应用表明,新方法具有高度可解释性,并且比现有方法有显著改进。此外,我们严格建立了估计误差的非渐近界,为所提出的框架提供了强有力的理论保证。

英文摘要

The scalar-on-image regression model examines the association between a scalar response and a bivariate function (e.g., images) through the estimation of a bivariate coefficient function. Existing approaches often impose smoothness constraints to control the bias-variance trade-off, and thus prevent overfitting. However, such assumptions can hinder interpretability, especially when only certain regions of an image influence changes in the response. In such a scenario, interpretability can be better captured by imposing sparsity assumptions on the coefficient function. To address this challenge, we propose the Generalized Dantzig Selector, a novel method that jointly enforces sparsity and smoothness on the coefficient function. The proposed approach enhances interpretability by accurately identifying regions with no contribution to the changes of response, while preserving stability in estimation. Extensive simulation studies and real data applications demonstrate that the new method is highly interpretable and achieves notable improvements over existing approaches. Moreover, we rigorously establish non-asymptotic bounds for the estimation error, providing strong theoretical guarantees for the proposed framework.

2407.09964 2026-06-16 math.ST stat.ML stat.TH 版本更新

TrIM: Transformed Iterative Mondrian Forests for Gradient-based Dimension Reduction and High-Dimensional Regression

TrIM: 基于梯度的降维和高维回归的变换迭代Mondrian森林

Ricardo Baptista, Eliza O'Reilly, Yangxinyu Xie

AI总结 提出一种计算高效的梯度线性降维和高维回归算法,通过Mondrian森林估计期望梯度外积矩阵,并迭代更新特征和权重以提升回归性能,理论保证一致性。

Comments 49 pages, 19 figures

详情
AI中文摘要

我们提出了一种计算高效的算法,用于基于梯度的线性降维和高维回归。该算法首先计算一个Mondrian森林,并利用该估计器从回归函数的期望梯度外积(EGOP)估计中识别输入的相关特征子空间。此外,我们引入了一种称为变换迭代Mondrian(TrIM)森林的迭代方法,通过使用EGOP估计更新Mondrian划分机制使用的特征和权重集,从而改进Mondrian森林估计器。我们获得了估计EGOP矩阵和从TrIM算法一次迭代得到的随机森林估计器的一致性保证和收敛速度。最后,我们通过模拟和真实数据展示了所提算法在各种设置下学习相关特征子空间的有效性。

英文摘要

We propose a computationally efficient algorithm for gradient-based linear dimension reduction and high-dimensional regression. The algorithm initially computes a Mondrian forest and uses this estimator to identify a relevant feature subspace of the inputs from an estimate of the expected gradient outer product (EGOP) of the regression function. In addition, we introduce an iterative approach known as Transformed Iterative Mondrian (TrIM) forest to improve the Mondrian forest estimator by using the EGOP estimate to update the set of features and weights used by the Mondrian partitioning mechanism. We obtain consistency guarantees and convergence rates for estimating the EGOP matrix and the random forest estimator obtained from one iteration of the TrIM algorithm. Lastly, we demonstrate the effectiveness of our proposed algorithm for learning the relevant feature subspace across various settings with both simulated and real data.

5. 时间序列与空间统计 11 篇

2606.17014 2026-06-16 cs.LG math.ST stat.ML stat.TH 新提交

Filtered Conformal Ellipsoids for Graph-Native Time Series

图原生时间序列的过滤共形椭球

Yannick Limmer

发表机构 * DRW London(DRW伦敦)

AI总结 提出过滤共形椭球方法,结合状态空间滤波与共形校准,为多元时间序列生成联合预测集,控制单事件并适应跨坐标依赖,通过可观测预测律商分析保证覆盖界。

详情
AI中文摘要

多元时间序列的联合预测集应控制单个事件,同时适应跨坐标依赖性。我们研究过滤共形椭球:一个冻结的状态空间滤波器输出一步预测均值和协方差,并对得到的马氏距离分数应用分割共形校准。滤波器用于选择椭球形状;共形校准选择标量半径,因此该构造受益于学习到的预测协方差,而不依赖高斯尾部概率来保证覆盖。主要困难在于过滤分数是依赖的,且学习到的循环滤波器不需要在其原始隐藏状态上收缩;因此,我们分析可观测预测律商中的收缩,该商识别产生相同未来发射高斯律序列的隐藏状态。在稳定的贝叶斯高斯投影滤波器、协方差界和有限时域可观测性费舍尔条件下,小超额高斯负对数似然意味着学习到的发射律的收缩。结合阈值自协方差包络,这给出了依赖下过滤分割共形预测的切比雪夫型近似覆盖界;更尖锐的伯恩斯坦型界需要额外的几何混合集中假设。在高斯预言可实现性下,我们还在条件有效的高斯椭球规则类中获得了接近预言的log体积比较。我们使用具有对角加低秩协方差的GCN-GRU滤波器实例化该框架。在中等规模的图原生交通基准(METRLA-$20$和PEMSBAY-$50$)上,学习到的滤波器比静态协方差和非滤波基线给出更尖锐的目标椭球;在全图规模和非图原生数据集上,因子和copula基线可能更强。

英文摘要

Joint prediction sets for multivariate time series should control a single event while adapting to cross-coordinate dependence. We study filtered conformal ellipsoids: a frozen state-space filter emits a one-step predictive mean and covariance, and split-conformal calibration is applied to the resulting Mahalanobis scores. The filter is used to choose the ellipsoid shape; conformal calibration chooses the scalar radius, so the construction benefits from a learned predictive covariance without relying on Gaussian tail probabilities for coverage. The main difficulty is that filtered scores are dependent and learned recurrent filters need not contract in their raw hidden state; we therefore analyse contraction in an observable predictive-law quotient that identifies hidden states producing the same future sequence of emitted Gaussian laws. Under a stable Bayes Gaussian-projection filter, covariance bounds, and a finite-horizon observability Fisher condition, small excess Gaussian negative log-likelihood implies contraction of the learned emitted laws. Combined with a threshold-autocovariance envelope this yields a Chebyshev-type approximate coverage bound for filtered split-conformal prediction under dependence; a sharper Bernstein-type bound requires an additional geometric-mixing concentration assumption. Under Gaussian oracle realisability we also obtain a near-oracle log-volume comparison within the class of conditionally valid Gaussian ellipsoid rules. We instantiate the framework with a GCN-GRU filter with diagonal-plus-low-rank covariance. On moderate-size graph-native traffic benchmarks (METRLA-$20$ and PEMSBAY-$50$), the learned filter gives sharper at-target ellipsoids than static-covariance and non-filter baselines; at full-graph scale and on non-graph-native datasets, factor and copula baselines can be stronger.
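
The division of labour described above (the filter supplies the ellipsoid shape, conformal calibration supplies the radius) can be sketched in a few lines. This is a minimal sketch, assuming a two-dimensional target and a predictive mean and covariance that are already available from some filter. The helper names `mahalanobis_sq_2d`, `conformal_radius`, and `in_ellipsoid` are hypothetical. The quantile rule is the standard split-conformal one, whose exact guarantee holds for exchangeable scores. The dependent-score analysis in the paper is not reproduced here.

```python
import math


def mahalanobis_sq_2d(y, mean, cov):
    """Squared Mahalanobis distance (y - mean)^T cov^{-1} (y - mean) in 2-D."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = y[0] - mean[0], y[1] - mean[1]
    # Uses the explicit inverse of a 2x2 matrix: (1/det) * [[d, -b], [-c, a]].
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det


def conformal_radius(scores, alpha):
    """Split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[k - 1]


def in_ellipsoid(y, mean, cov, radius_sq):
    """Membership test for the calibrated prediction ellipsoid."""
    return mahalanobis_sq_2d(y, mean, cov) <= radius_sq


# Calibration: score held-out points with the filter's own mean and covariance.
identity = ((1.0, 0.0), (0.0, 1.0))
calibration_scores = [9.0, 1.0, 8.0, 2.0, 7.0, 3.0, 6.0, 4.0, 5.0]
radius_sq = conformal_radius(calibration_scores, alpha=0.5)
covered = in_ellipsoid((1.0, 1.0), (0.0, 0.0), identity, radius_sq)
```

Only the scalar `radius_sq` comes from calibration, so a poorly shaped covariance costs volume rather than validity of the quantile step.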

2606.16773 2026-06-16 econ.EM stat.ME stat.ML 新提交

Generative Predictive Distributions for Time Series

时间序列的生成式预测分布

Jordi Llorens-Terrazas, Mika Meitz

AI总结 提出基于生成式表示的灵活框架,用于建模非线性多变量时间序列的预测分布,通过条件生成对抗网络估计,并建立弱时间依赖下的统计一致性。

详情
AI中文摘要

我们提出了一个灵活的框架,用于建模非线性、可能多变量时间序列的预测分布。我们的方法基于测度论概率中的一个民间结果,在适当的生成式表示中表达一般的预测分布。这种表示为预测分布提供了直接的基于模拟的近似,从而能够直接计算条件均值和方差的预测、扇形图、风险价值、预期亏损、联合尾部风险以及其他感兴趣的量。我们使用条件生成对抗网络的一个版本来估计这种生成式表示,并提供了弱时间依赖下估计的形式化统计分析。具体来说,估计被表述为一个特定的极小极大问题,并且我们建立了其近似解在豪斯多夫距离下的一致性。通过应用于股票收益、已实现方差和已实现协方差的例子,说明了该方法的实证相关性。所提出的方法在计算上也是可管理的,在我们的应用中,在标准笔记本电脑上估计大约需要一分钟。

英文摘要

We propose a flexible framework for modeling the predictive distributions of nonlinear, possibly multivariate time series. Our approach expresses a general predictive distribution in an appropriate generative representation that is based on a folklore result from measure theoretic probability. This representation provides a direct simulation-based approximation to the predictive distribution, enabling straightforward computation of forecasts for the conditional mean and variance, fan charts, value at risk, expected shortfall, joint tail risks, and other quantities of interest. We estimate this generative representation using a version of conditional generative adversarial networks and provide a formal statistical analysis of estimation under weak temporal dependence. Specifically, estimation is expressed as a particular minimax problem and we establish consistency of its approximate solutions in Hausdorff distance. The empirical relevance of the approach is illustrated using applications to equity returns, realized variance, and realized covariances. The proposed method is also computationally manageable, with estimation in our applications taking approximately one minute on a standard laptop.

2606.16677 2026-06-16 stat.AP 新提交

Distributional Forecasting of EU Asylum Applications with Dynamic Multivariate Count Models

基于动态多元计数模型的欧盟庇护申请分布预测

Gregor Zens, Jakub Bijak

AI总结 提出贝叶斯框架联合预测EU-27月度庇护申请分布,分解潜在强度为国家随机游走和共同因子,结合厚尾或随机波动冲击,发现联合模型优于单国模型,尤其在上尾风险中表现显著。

详情
AI中文摘要

我们提出了一个贝叶斯框架,用于联合预测EU-27各国月度庇护申请的分布。该模型将潜在申请强度分解为国家特定的随机游走和共同因子,并允许特质性和共同冲击表现出厚尾或随机波动性。使用2008年至2026年的欧盟统计局数据,我们在滚动样本外预测中评估预测分布,对整体分布准确性和上尾风险进行评分。三个发现浮现:第一,最优规格因国家、评分规则和预测期而异,强调了模型需与政策特定损失函数对齐。第二,联合EU-27模型优于单国基准模型,在上尾(准备成本最相关)中增益最大。第三,随机游走对数强度为国家庇护申请动态提供了有用的短期描述,尤其当与灵活的创新动态结合时。最后,我们讨论了这些发现对涉及庇护预测和准备规划的国家及欧盟层面机构的启示。

英文摘要

We propose a Bayesian framework for joint distributional forecasting of monthly asylum applications across the EU-27. The model decomposes latent application intensities into country-specific random walks and common factors, with idiosyncratic and shared shocks allowed to exhibit heavy tails or stochastic volatility. Using Eurostat data from 2008 to 2026, we evaluate predictive distributions in a rolling out-of-sample exercise, scoring overall distributional accuracy and upper-tail risk. Three findings emerge. First, the preferred specification varies across countries, scoring rules, and horizons, underscoring the need to align models with policy-specific loss functions. Second, joint EU-27 models improve on country-by-country benchmarks, with the largest gains in the upper tail, where preparedness costs are most relevant. Third, random-walk log-intensities provide a useful short-run description of national asylum-application dynamics, especially when combined with flexible innovation dynamics. We conclude by discussing implications for national and EU-level agencies involved in asylum forecasting and preparedness planning.

2606.15953 2026-06-16 stat.ME 新提交

Drift-Aware Spectral Conformal Prediction for Non-Exchangeable Streaming Data

面向非可交换流数据的漂移感知谱共形预测

Jeffery Opoku, David Banahene

AI总结 针对非可交换流数据的分布漂移问题,提出漂移感知谱共形预测(DASC),通过局部谱相似性加权校准残差和基于传输的漂移评分,动态调整校准池与目标误覆盖率,实现近名义覆盖并降低区间宽度。

Comments 25 pages, includes figures and references

详情
AI中文摘要

共形预测在可交换性假设下提供无分布预测区间,但许多现代数据流既非独立也非稳定,它们表现出周期性模式、变化的季节频率、突变和渐变漂移。我们提出漂移感知谱共形预测(DASC),一种针对受分布漂移影响的结构化非可交换数据流的流式不确定性量化框架。DASC 使用由局部谱相似性加权的校准残差形成共形预测区间,同时基于传输的漂移评分监测当前测试分布是否偏离过去的校准模式。当漂移轻微时,DASC 从结构相似的历史窗口借用校准残差;当漂移严重时,它收缩或重新加权校准池并在线更新目标误覆盖率。该方法还报告一个有效样本量诊断,当加权共形分位数统计上脆弱时发出警告。我们建立了一个近似覆盖界,将覆盖损失分解为漂移、残差不匹配和加权有效样本量。在合成实验和五个压力测试场景中,DASC 在漂移后保持近名义覆盖,而滚动、近期加权和仅谱共形方法可能覆盖不足。在实际电力和天气数据流中,相对于最佳校准的非 DASC 基线,DASC 分别将平均区间宽度减少约 28% 和 42%,同时保持校准或保守覆盖。一个金融波动率示例显示了一个更微妙的场景,其中仅谱校准具有竞争力,但 DASC 保持近名义覆盖并添加漂移诊断。

英文摘要

Conformal prediction provides distribution-free prediction intervals under exchangeability, but many modern data streams are neither independent nor stable. They exhibit recurring regimes, changing seasonal frequencies, abrupt shifts, and gradual drift. We propose drift-aware spectral conformal prediction (DASC), a streaming uncertainty quantification framework for structured non-exchangeable data subject to distributional drift. DASC forms conformal prediction intervals using calibration residuals weighted by local spectral similarity, while a transport-based drift score monitors whether the current test distribution has moved away from past calibration regimes. When drift is mild, DASC borrows calibration residuals from structurally similar historical windows; when drift is severe, it contracts or reweights the calibration pool and updates the target miscoverage level online. The method also reports an effective sample size diagnostic that warns when a weighted conformal quantile is statistically fragile. We establish an approximate coverage bound that decomposes coverage loss into drift, residual mismatch, and weighted effective sample size. In synthetic experiments and five stress-test regimes, DASC maintains near-nominal coverage after drift where rolling, recency-weighted, and spectral-only conformal methods can under-cover. In real electricity and weather streams, DASC reduces average interval width by approximately 28% and 42%, respectively, relative to the best calibrated non-DASC baseline, while preserving calibrated or conservative coverage. A financial volatility example shows a more nuanced regime in which spectral-only calibration is competitive, but DASC retains near-nominal coverage and adds drift diagnostics.
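
Two ingredients named in the abstract, a weighted conformal quantile and an effective sample size diagnostic, can be sketched as follows. This is a simplified illustration, not the DASC procedure. The weights are taken as given, whereas the paper derives them from spectral similarity and a drift score. The quantile omits the extra point mass at infinity that some weighted conformal constructions add. The diagnostic uses the common Kish formula, which is an assumption about what the paper's diagnostic looks like. The function names are hypothetical.

```python
def effective_sample_size(weights):
    """Kish effective sample size: (sum w)^2 / sum w^2."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq


def weighted_quantile(residuals, weights, level):
    """Smallest residual whose normalized cumulative weight reaches `level`."""
    total = sum(weights)
    pairs = sorted(zip(residuals, weights))
    cumulative = 0.0
    for residual, weight in pairs:
        cumulative += weight
        if cumulative / total >= level:
            return residual
    return pairs[-1][0]  # guard against rounding when level is close to 1


# Absolute calibration residuals and (hypothetical) similarity weights.
residuals = [0.4, 0.1, 0.3, 0.2]
uniform_weights = [1.0, 1.0, 1.0, 1.0]
concentrated_weights = [1.0, 0.0, 0.0, 0.0]

half_width_uniform = weighted_quantile(residuals, uniform_weights, 0.5)
half_width_concentrated = weighted_quantile(residuals, concentrated_weights, 0.5)
ess_uniform = effective_sample_size(uniform_weights)
ess_concentrated = effective_sample_size(concentrated_weights)
```

When the weights concentrate on a single residual, the effective sample size drops to one and the quantile is determined by that single point. That is the fragile situation the diagnostic is meant to flag.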

2606.15950 2026-06-16 stat.ML cs.LG 新提交

Spectral Adaptive Conformal Prediction for Structured Non-Exchangeable Data

面向结构化非可交换数据的谱自适应共形预测

Jeffery Opoku, David Banahene

发表机构 * University of Texas Rio Grande Valley(德克萨斯大学里奥格兰德河谷分校) Florida International University(佛罗里达国际大学)

AI总结 针对非可交换时间序列数据,提出谱自适应共形预测方法,通过局部谱相似性加权共形分位数并在线调整目标误覆盖率,在循环模式和频率变化场景下提升区间覆盖的长期校准性。

Comments 35 pages, includes figures and references

详情
AI中文摘要

当数据可交换时,共形预测提供具有有限样本覆盖率的预测区间。许多时间索引数据集是不可交换的,它们具有季节、循环模式、变化频率或其他形式的结构化依赖。本文研究了一种利用这种结构的简单方法。我们提出了谱自适应共形预测,该方法使用局部谱相似性形成加权共形分位数,然后在线更新目标误覆盖率。谱权重选择与当前测试点相关的校准残差。自适应更新在不确定性随时间变化时纠正长期错误率。我们给出了固定谱加权分位数的近似覆盖结果,以及自适应更新的确定性长期校准结果。涉及循环模式和缓慢变化频率的模拟,以及三个美国真实数据示例表明,混合方法可以改进固定谱加权,同时也表明谱加权必须通过有效样本量诊断进行监控。

英文摘要

Conformal prediction gives prediction intervals with finite-sample coverage when the data are exchangeable. Many time-indexed datasets are not exchangeable. They have seasons, recurring regimes, changing frequencies, or other forms of structured dependence. This paper studies a simple way to use that structure. We propose spectral adaptive conformal prediction, a method that forms weighted conformal quantiles using local spectral similarity and then updates the target miscoverage level online. The spectral weights choose calibration residuals that look relevant to the current test point. The adaptive update corrects the long-run miss rate when uncertainty changes over time. We give an approximate coverage result for the fixed spectral weighted quantile and a deterministic long-run calibration result for the adaptive update. Simulations with recurring regimes and slowly changing frequencies, together with three U.S. real-data examples, show that the hybrid method can improve on fixed spectral weighting, while also showing that spectral weighting must be monitored through effective sample size diagnostics.
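
The abstract does not state the update rule for the target miscoverage level. The sketch below assumes the widely used adaptive conformal inference recursion, alpha_{t+1} = alpha_t + step * (alpha - err_t), where err_t is 1 when the interval at time t missed the observation and 0 otherwise. The function names and the step size are hypothetical, and the paper's rule may differ.

```python
def update_miscoverage(alpha_t, target_alpha, missed, step=0.05):
    """One online update of the working miscoverage level (assumed ACI-style rule)."""
    err = 1.0 if missed else 0.0
    return alpha_t + step * (target_alpha - err)


def run_updates(misses, target_alpha=0.1, step=0.05):
    """Apply the update along a sequence of miss indicators and return the path."""
    alpha_t = target_alpha
    path = [alpha_t]
    for missed in misses:
        alpha_t = update_miscoverage(alpha_t, target_alpha, missed, step)
        path.append(alpha_t)
    return path


# Two misses followed by three covered points.
alpha_path = run_updates([True, True, False, False, False])
```

A miss lowers the working level, so the next interval uses a higher quantile and becomes wider. A covered point raises the level slightly. The working level can leave the interval [0, 1], and a common convention is to return an unbounded interval whenever it is at or below zero.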

2606.15452 2026-06-16 cs.LG math.AT q-fin.RM stat.ML 新提交

PHINN: Persistent Homology Inspired Neural Network for Rare-Event Time Series Generation

PHINN: 受持续同调启发的稀有事件时间序列生成神经网络

Emre Yusuf, Ren Takahashi, Jayabrata Bhaduri

发表机构 * Defense.Codes (a DBA of CapaCloud Corp)(Defense.Codes(CapaCloud Corp 的商用名))

AI总结 提出PHINN框架,利用动态Betti曲线作为条件信号和持久景观损失保持同调一致性,在金融、流行病和多模态基准上拓扑保真度优于统计和扩散基线。

Comments 15 pages, 4 figures

详情
AI中文摘要

时间序列中的稀有事件对建模至关重要,但由于数据稀缺而难以学习。当前的生成模型难以处理极端值。我们观察到稀有事件会留下独特的拓扑指纹——从点云嵌入中Betti数的转变——这些指纹比统计矩更稳定且更具判别性。我们提出了PHINN,一个流匹配框架,使用动态Betti曲线作为条件信号,并采用持久景观损失来保持同调一致性。它可扩展到多变量数据,包含一个自然语言接口来设置Betti目标,支持跨领域元学习和少样本生成,并提供经过认证的对抗鲁棒性。在金融、流行病和多模态基准上,PHINN在拓扑保真度(beta-RMSE降低41-63%,转换准确率提高84%)方面优于统计和扩散基线,在尾部覆盖方面与跳跃扩散模型相当,在形状保真度方面超过它们。所有结果均具有95%置信区间。

英文摘要

Rare events in time series are critical to model but hard to learn due to data scarcity. Current generative models struggle with extreme values. We observe that rare events leave distinct topological fingerprints - transitions in Betti numbers from point-cloud embeddings - that are more stable and discriminative than statistical moments. We introduce PHINN, a flow-matching framework using dynamic Betti curves as conditioning signals and a persistence landscape loss for homology consistency. It scales to multivariate data, includes a natural-language interface to set Betti targets, supports cross-domain meta-learning and few-shot generation, and provides certified adversarial robustness. On financial, epidemiological, and multi-modal benchmarks, PHINN outperforms statistical and diffusion baselines in topological fidelity (beta-RMSE down 41-63%, transition accuracy up 84%) and matches jump-diffusion models in tail coverage while exceeding them in shape fidelity. All results have 95% confidence intervals.

2606.15012 2026-06-16 stat.ME q-bio.QM 新提交

A Kuramoto-von Mises Time Series Model for Probabilistic Modeling of Coupled Oscillators

耦合振荡器概率建模的Kuramoto-von Mises时间序列模型

Yun Hwang, Todd P. Coleman

AI总结 提出一种不假设热力学平衡的耦合振荡器概率分布估计方法,基于Langevin动力学构建,在高采样率下具有闭式解,在非平衡模拟数据和真实脑/胃电生理数据中优于现有方法。

Comments 15 pages, 4 figures

详情
AI中文摘要

耦合振荡器系统为建模广泛的物理和生物现象提供了基本框架。在神经科学中,中枢神经系统与相邻脑区表现出同步振荡活动,例如在睡眠期间产生行波动力学。类似地,在胃肠系统中,神经肌肉细胞协调其振荡以产生慢波活动的传播波。为了估计多变量相位关系的概率分布,现有方法通常依赖于平衡热力学,通过成对指数族分布以玻尔兹曼形式表达系统。然而,这些假设在现实系统中常常被违反,现实系统本质上是动态的,并经常在平衡和非平衡状态之间转换。为了解决这个问题,我们提出了一种估计耦合振荡器概率分布的有效方法,该方法不假设热力学平衡。通过基于Langevin动力学的构建,该方法即使在非平衡状态下也能实现精确建模。最大似然估计方法在高采样率条件下具有闭式代数解,这一条件通常被现代数据采集系统满足,使其易于实际应用。我们在模拟数据上展示了其鲁棒性,在非平衡设置中优于现有方法,并进一步说明了其在表征脑刺激响应中的动态脑行波以及在人胃电生理记录背景下的假设检验中的实用性。

英文摘要

A system of coupled oscillators provides a fundamental framework for modeling a wide range of physical and biological phenomena. In neuroscience, the central nervous system exhibits synchronized oscillatory activity with adjacent brain regions, giving rise to traveling wave dynamics for instance during sleep. Similarly, in the gastrointestinal system, neuromuscular cells coordinate their oscillations to generate propagating waves of slow wave activity. To estimate probability distributions of multivariate phase relationships, existing approaches typically rely on equilibrium thermodynamics, expressing the system in a Boltzmann form through a pairwise exponential family distribution. However, these assumptions are often violated in real-world systems, which are inherently dynamic and frequently transition between equilibrium and non-equilibrium regimes. To address this, we propose an efficient method for estimating the probability distribution of coupled oscillators that does not assume thermodynamic equilibrium. Using a Langevin dynamics-based construction, the approach enables accurate modeling even in non-equilibrium regimes. The maximum likelihood estimation method is shown to have a closed form algebraic solution in the high sampling rate regime, a condition commonly satisfied by modern data acquisition systems, which makes it readily applicable in practice. We demonstrate its robustness on simulated data, where it outperforms existing approaches in non-equilibrium settings, and further illustrate its utility for characterizing dynamic brain traveling waves in response to brain stimulation and in hypothesis testing within the context of electrophysiologic recordings of the human stomach.

2603.00874 2026-06-16 stat.ME 版本更新

Detecting Distributional Differences in Spatially Correlated Multivariate Data via Kernel-Smoothed Rank-Based Empirical Copula Tests

通过核平滑秩经验Copula检验检测空间相关多变量数据的分布差异

Marco Mandap

AI总结 针对非正态和空间自相关的多变量产量分布比较,提出基于核平滑经验Copula过程的非参数空间Cramer-von Mises型检验,通过秩变换和空间核权重控制类型I错误,并建立弱收敛理论。

Comments An error was identified in the underlying distribution proof used for the empirical copula test. The authors are withdrawing this version while finalizing a formally verified proof of the distribution in Lean 4

详情
AI中文摘要

比较跨空间参考农业田块的多变量产量质量分布因两个普遍特征而复杂化:非正态性和空间自相关。经典程序如ANOVA、MANOVA和标准秩检验假设独立性,因此在存在空间依赖性时表现出严重的类型I错误膨胀。我们提出了一种基于从池化分量秩构建的核平滑经验Copula过程的非参数空间Cramer-von Mises型检验。空间核权重明确考虑了局部依赖性,而秩变换消除了对边际分布形式的敏感性。在固定域填充渐近性和多项式α混合条件下,我们建立了平滑经验Copula过程向均值为零的高斯极限的弱收敛,并证明了所得二次检验统计量收敛到限制在K-1维对比子空间上的卡方随机变量的加权和。通过在高斯Copula模型下使用精确离散空间协方差算子校准的Satterthwaite近似获得实际推断。双变量对数正态空间数据的蒙特卡洛实验表明,与变得严重反保守的经典参数和非空间秩方法相比,所提出的检验在不同强度的空间依赖性下保持了名义大小。该程序为精准农业及相关应用领域中比较多变量空间产量分布提供了一个理论上合理且计算可行的框架。

英文摘要

Comparing multivariate yield quality distributions across spatially referenced agricultural fields is complicated by two pervasive features: non-normality and spatial autocorrelation. Classical procedures such as ANOVA, MANOVA, and standard rank tests assume independence and therefore exhibit severe Type I error inflation when spatial dependence is present. We propose a nonparametric spatial Cramer-von Mises-type test based on kernel-smoothed empirical copula processes constructed from pooled componentwise ranks. Spatial kernel weights account explicitly for local dependence, while the rank transformation removes sensitivity to marginal distributional form. Under fixed-domain infill asymptotics and polynomial alpha-mixing conditions, we establish weak convergence of the smoothed empirical copula process to a mean-zero Gaussian limit and show that the resulting quadratic test statistic converges to a weighted sum of chi-squared random variables restricted to the K-1-dimensional contrast subspace. Practical inference is obtained through a Satterthwaite approximation calibrated using the exact discrete spatial covariance operator under a Gaussian copula model. Monte Carlo experiments with bivariate log-normal spatial data demonstrate that the proposed test maintains nominal size across varying strengths of spatial dependence, in contrast to classical parametric and non-spatial rank-based methods, which become severely anti-conservative. The procedure provides a theoretically justified and computationally tractable framework for comparing multivariate spatial yield distributions in precision agriculture and related applied settings.

2412.20316 2026-06-16 stat.ME 版本更新

A Rank-Based Test for Comparing Multiple Fields' Yield Quality Distributions Under Spatial Dependence

空间依赖下多个田地产量质量分布比较的基于秩的检验

Marco Mandap

AI总结 针对空间依赖和非正态性,提出基于秩的检验框架,利用空间核平滑构建稳健经验分布函数,并证明统计量渐近服从加权卡方分布,通过Satterthwaite近似校正空间方差膨胀。

Comments An error was identified in the underlying distribution proof used for the empirical copula test. The authors are withdrawing this version while finalizing a formally verified proof of the distribution in Lean 4

详情
AI中文摘要

比较多个农业田地的产量质量分布是评估管理实践的基础,但两个普遍存在的数据特征——非正态性和空间自相关——使其复杂化。传统的参数检验(如ANOVA)在空间依赖性违反独立性假设时,常遭受严重的I类错误膨胀。本文引入一种新颖的基于秩的检验框架,利用空间核平滑构建稳健的经验分布函数(EDF)。我们建立了在$\alpha$-混合条件下检验统计量的渐近性质,证明其收敛到加权卡方随机变量之和。为便于实际推断,我们采用Satterthwaite近似推导有效自由度,以考虑空间方差'膨胀'。详细发展了理论框架,为所提方法提供了严格基础。模拟研究和实际产量质量数据的应用留待未来工作。

英文摘要

Comparing yield quality distributions across multiple agricultural fields is fundamental for evaluating management practices, yet it is complicated by two pervasive data characteristics: non-normality and spatial autocorrelation. Traditional parametric tests, such as ANOVA, frequently suffer from severe Type I error inflation when the independence assumption is violated by spatial dependence. This paper introduces a novel rank-based test framework that utilizes spatial kernel smoothing to construct robust empirical distribution functions (EDFs). We establish the asymptotic properties of the test statistic under $α$-mixing conditions, proving its convergence to a weighted sum of chi-squared random variables. To facilitate practical inference, we employ a Satterthwaite approximation to derive effective degrees of freedom that account for the spatial 'inflation' of variance. The theoretical framework is developed in detail, providing a rigorous foundation for the proposed method. Simulation studies and applications to real yield quality data are left to future work.
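
The Satterthwaite step mentioned above is a generic two-moment matching device and can be written down independently of the paper's specific weights. A minimal sketch, assuming the limit is a sum of independent chi-squared variables with one degree of freedom each, weighted by known positive constants: the sum is approximated by a scaled chi-squared variable with the same mean and variance. The function name and the example weights are hypothetical.

```python
def satterthwaite(weights):
    """Match sum_i w_i * chi2_1 to scale * chi2_df by equating mean and variance.

    Mean of the weighted sum is sum(w), its variance is 2 * sum(w^2).
    Mean of scale * chi2_df is scale * df, its variance is 2 * scale^2 * df.
    """
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    scale = total_sq / total
    df = total * total / total_sq
    return scale, df


# Equal weights recover an ordinary chi-squared distribution.
scale_equal, df_equal = satterthwaite([1.0, 1.0, 1.0])

# Unequal weights give fewer effective degrees of freedom than terms.
scale_unequal, df_unequal = satterthwaite([3.0, 1.0])
```

With unequal weights the effective degrees of freedom fall below the number of terms, which is how the approximation absorbs variance inflation into the reference distribution.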

2511.18553 2026-06-16 math.ST stat.ML stat.TH 版本更新

Matching correlated VAR time series

匹配相关VAR时间序列

Ernesto Araya, Hemant Tyagi

AI总结 研究匹配两个相关VAR时间序列数据库的问题,提出概率框架,通过线性分配估计器实现完美或部分恢复,并利用凸松弛高效求解。

详情
AI中文摘要

我们研究了匹配相关VAR时间序列数据库的问题,其中多元时间序列与其扰动和置换版本同时被观测,目标是恢复它们之间的未知匹配。为此,我们引入了一个概率框架,其中两个时间序列$(x_t)_{t\in[T]},(x^\#_t)_{t\in[T]}$联合生成,使得$x^\#_t=x_{\pi^*(t)}+\sigma \tilde{x}_{\pi^*(t)}$,其中$(x_t)_{t\in[T]},(\tilde{x}_t)_{t\in[T]}$是独立同分布的一阶向量自回归(VAR)时间序列,具有高斯增量,$\pi^*$是隐藏的。目标是从观测$(x_t)_{t\in[T]},(x^\#_t)_{t\in[T]}$中恢复$\pi^*$。这推广了经典的匹配独立点云问题到时间序列设置。我们推导了最大似然估计(MLE),导致在排列上的二次优化,并从理论上分析了基于线性分配的估计器。对于后一种方法,我们建立了恢复保证,识别出允许完美或部分恢复的$\sigma$阈值。此外,我们提出通过考虑排列矩阵的凸松弛(例如,在Birkhoff多面体上)来求解MLE。这允许通过交替最小化高效估计$\pi^*$和VAR参数。实验上,我们发现线性分配通常匹配或优于基于MLE松弛的方法。

英文摘要

We study the problem of matching correlated VAR time series databases, where a multivariate time series is observed along with a perturbed and permuted version, and the goal is to recover the unknown matching between them. To model this, we introduce a probabilistic framework in which two time series $(x_t)_{t\in[T]},(x^\#_t)_{t\in[T]}$ are jointly generated, such that $x^\#_t=x_{π^*(t)}+σ\tilde{x}_{π^*(t)}$, where $(x_t)_{t\in[T]},(\tilde{x}_t)_{t\in[T]}$ are independent and identically distributed vector autoregressive (VAR) time series of order $1$ with Gaussian increments, for a hidden $π^*$. The objective is to recover $π^*$, from the observation of $(x_t)_{t\in[T]},(x^\#_t)_{t\in[T]}$. This generalizes the classical problem of matching independent point clouds to the time series setting. We derive the maximum likelihood estimator (MLE), leading to a quadratic optimization over permutations, and theoretically analyze an estimator based on linear assignment. For the latter approach, we establish recovery guarantees, identifying thresholds for $σ$ that allow for perfect or partial recovery. Additionally, we propose solving the MLE by considering convex relaxations of the set of permutation matrices (e.g., over the Birkhoff polytope). This allows for efficient estimation of $π^*$ and the VAR parameters via alternating minimization. Empirically, we find that linear assignment often matches or outperforms MLE relaxation based approaches.
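
The matching step can be illustrated on a tiny instance. The sketch below assumes a squared-Euclidean cost between each observation $x^\#_t$ and each candidate $x_s$ and finds the cost-minimizing permutation by exhaustive search. The cost choice and the function names are assumptions for illustration and may differ from the estimator analysed in the paper. Exhaustive search is only feasible for very small $T$.

```python
from itertools import permutations


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def match_by_linear_assignment(x, x_sharp):
    """Return pi minimizing sum_t ||x_sharp[t] - x[pi[t]]||^2 by brute force."""
    n = len(x)
    best_cost, best_pi = None, None
    for pi in permutations(range(n)):
        cost = sum(sq_dist(x_sharp[t], x[pi[t]]) for t in range(n))
        if best_cost is None or cost < best_cost:
            best_cost, best_pi = cost, pi
    return list(best_pi)


# Toy data: x_sharp[t] = x[pi_star[t]] + a small fixed perturbation.
x = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
pi_star = [2, 0, 3, 1]
x_sharp = [[x[s][0] + 0.1, x[s][1] - 0.1] for s in pi_star]
pi_hat = match_by_linear_assignment(x, x_sharp)
```

For realistic sizes the same objective is a linear assignment problem and can be solved in polynomial time, for example with `scipy.optimize.linear_sum_assignment` applied to the matrix of pairwise costs.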

2302.14505 2026-06-16 stat.AP stat.ME 版本更新

Nonlinear regression models to forecast PM$_{2.5}$ concentration

基于非线性回归模型的PM$_{2.5}$浓度预测

Jinghong Zeng

AI总结 提出基于非线性回归的PM$_{2.5}$浓度预测模型,包括单值和区间预测,结合NCEP CFS2提高精度,在武汉数据上验证有效。

Comments In Chinese, supervised by Prof. Yurong Chen

详情
AI中文摘要

预测PM$_{2.5}$浓度对于解决武汉的空气污染问题至关重要。本文提出了一种基于非线性回归的PM$_{2.5}$浓度预测模型,包括单值预测模型和区间预测模型。单值预测模型能够精确预测第二天的PM$_{2.5}$浓度,在拟合优度分析中预测偏差约为6 $\mu g/m^3$。区间预测模型能够有效预测高浓度和低浓度天数,在模型验证中覆盖了60%-80%的观测样本。此外,本文将PM$_{2.5}$浓度预测模型与NCEP气候预报系统第2版相结合以实现其预测应用,然后开发了NCEP CFS2的PM$_{2.5}$浓度预测模型以提高预测精度。结果表明,PM$_{2.5}$浓度预测模型具有良好的独立预测能力。

英文摘要

Forecasting PM$_{2.5}$ concentration is important for addressing air pollution in Wuhan. This paper proposes a PM$_{2.5}$ concentration forecast model based on nonlinear regression, including a single-value forecast model and an interval forecast model. The single-value forecast model can precisely forecast the PM$_{2.5}$ concentration for the next day, with a forecast bias of about 6 $\mu g/m^3$ in goodness-of-fit analysis. The interval forecast model can efficiently forecast high-concentration and low-concentration days, covering 60%-80% of observed samples in model validation. Moreover, this paper combines the PM$_{2.5}$ concentration forecast model with the NCEP Climate Forecast System Version 2 (CFS2) to realize its forecast application, then develops NCEP CFS2's PM$_{2.5}$ concentration forecast model to enhance forecast accuracy. The results indicate that the PM$_{2.5}$ concentration forecast model has good capacity for independent forecasting.

6. 计算统计与MCMC 23 篇

2606.16985 2026-06-16 stat.ML cs.LG eess.SP nlin.CD stat.ME 新提交

Dynestyx: A Probabilistic Programming Library for Dynamical Systems

Dynestyx: 一个面向动态系统的概率编程库

Daniel Waxman, Dmitry Batenkov, John Feser, Andy Zane, Eli Bingham, Youssef Marzouk, Matthew E. Levine

AI总结 提出dynestyx库,通过统一接口支持状态空间模型的先验指定、混合效应推断及状态与参数估计,实现贝叶斯动态系统分析。

Comments 7 pages

详情
AI中文摘要

状态空间模型(SSMs)是贝叶斯处理动态系统的标准形式,在统计学、信号处理和机器学习中有自然应用。尽管在理论和应用中都很重要,但动态系统已被证明难以融入现代概率编程语言(PPLs),使得最先进的方法对实践者不太可及,并在遵循“贝叶斯工作流”时引入摩擦。我们介绍了dynestyx,一个对SSMs提供一流支持的概率编程库,包括在状态和参数估计方面的最先进方法。通过一个统一的接口,用户可以指定离散时间或连续时间动态系统的任意先验,对混合效应数据进行推断,并进行具有原则性不确定性量化的状态和参数估计。

英文摘要

State-space models (SSMs) are the standard formalism for Bayesian treatment of dynamical systems, with natural applications in statistics, signal processing, and machine learning. Despite their importance in both theory and application, dynamical systems have proven difficult to incorporate in modern probabilistic programming languages (PPLs), making state-of-the-art methods less accessible to practitioners and introducing friction in following the "Bayesian workflow." We introduce dynestyx, a probabilistic programming library with first-class support for SSMs, including state-of-the-art methods in the estimation of both states and parameters. Through a single, unified interface, users may specify arbitrary priors for discrete-time or continuous-time dynamical systems, perform inference over mixed-effect data, and make state and parameter estimates with principled uncertainty quantification.

2606.16138 2026-06-16 stat.ML cs.LG 新提交

Closing the Approximation Gap in Simulation-free Latent SDEs

弥合无模拟潜在随机微分方程中的近似差距

Henry D. Smith, Brian L. Trippe, Scott W. Linderman

发表机构 * Stanford University(斯坦福大学)

AI总结 针对现有无模拟变分推断算法因参数化限制导致后验推断和参数学习性能下降的问题,提出Helmholtz-SDE算法,通过优化与指定边际分布兼容的路径律来弥合近似差距,在保持高效的同时恢复更准确的动力学。

详情
AI中文摘要

从含噪声观测中恢复动力系统是包括神经科学和物理学在内的科学领域中的反复挑战。潜在随机微分方程通过将系统建模为根据可学习SDE演化并生成观测的未观测状态来解决这一问题。变分推断为拟合潜在SDE提供了可处理的目标。传统的VI算法通过在时间离散化上进行数值模拟来评估该目标,在保真度和计算成本之间进行权衡。最近一类算法,即无模拟VI,通过其瞬时边际而不是漂移来参数化后验,从而避开了这种权衡。在这项工作中,我们表明现有无模拟VI算法的效率是有代价的:它们的参数化将近似后验限制为基于模拟的方法可用的SDE的子集,降低了后验推断和参数学习。我们提出了Helmholtz-SDE,一种无模拟VI算法,通过优化与指定边际分布集合兼容的路径律来弥合这一差距。Helmholtz-SDE比先前的无模拟方法更忠实地恢复动力学,在高后验不确定性下增益最大。它进一步以一小部分运行时间匹配基于模拟的VI的性能。

英文摘要

Recovering dynamical systems from noisy observations is a recurring challenge across scientific domains, including neuroscience and physics. Latent stochastic differential equations (SDEs) address this by modeling the system as an unobserved state that evolves according to a learnable SDE and generates the observations. Variational inference (VI) provides a tractable objective for fitting latent SDEs. Traditional VI algorithms evaluate this objective by numerical simulation over a time discretization, trading fidelity for computational cost. A recent class of algorithms, simulation-free VI, sidesteps this tradeoff by parameterizing the posterior through its instantaneous marginals rather than its drift. In this work, we show that the efficiency of existing simulation-free VI algorithms comes at a price: their parameterizations restrict the approximate posterior to a subset of the SDEs available to simulation-based methods, degrading posterior inference and parameter learning. We propose Helmholtz-SDE, a simulation-free VI algorithm that closes this gap by optimizing over path laws compatible with a prescribed collection of marginals. Helmholtz-SDE recovers dynamics more faithfully than prior simulation-free methods, with the largest gains under high posterior uncertainty. It further matches the performance of simulation-based VI at a fraction of the runtime.

2606.16073 2026-06-16 cs.LG stat.ML 新提交

Stop the Sampler! Classifier-Based Adaptive Stopping for Sampling Kernels

停止采样器!基于分类器的采样核自适应停止

Kirill Korolev, Nikita Morozov, Stepan Pavlenko, Esmeralda S. Whitammer, Sergey Samsonov

发表机构 * Stanford University(斯坦福大学)

AI总结 提出将MCMC轨迹终止作为可学习组件,利用非无环生成流网络训练状态依赖分类器,在保证详细平衡条件下自适应停止采样,显著缩短轨迹长度并改善模式覆盖与混合。

Comments ICML 2026 SPIGM Workshop

详情
AI中文摘要

从复杂、未归一化的概率密度中采样是贝叶斯推断和概率建模中的基本挑战。虽然马尔可夫链蒙特卡罗(MCMC)方法提供了渐近保证,但由于轨迹长度固定或需手动调整,它们常常存在混合缓慢、计算成本高的问题。在这项工作中,我们提出了一种新颖的框架,将轨迹终止视为采样动力学的可学习组件。通过将MCMC置于非无环生成流网络(GFlowNets)的理论中,我们训练状态依赖的神经分类器来决定轨迹何时到达高密度区域并应终止。我们通过详细平衡条件从理论上建立了最优分类器与目标密度之间的联系,并引入了一种多级训练方案以促进复杂几何中的探索。在各种基准密度上的实验结果表明,与标准MCMC基线相比,我们的方法显著减少了平均轨迹长度,同时改善了模式覆盖和混合。

英文摘要

Sampling from complex, unnormalized probability densities is a fundamental challenge in Bayesian inference and probabilistic modeling. While Markov chain Monte Carlo (MCMC) methods provide asymptotic guarantees, they often suffer from slow mixing and high computational costs due to fixed or manually tuned trajectory lengths. In this work, we propose a novel framework that treats trajectory termination as a learnable component of the sampling dynamics. By framing MCMC within the theory of non-acyclic generative flow networks (GFlowNets), we train state-dependent neural classifiers to decide when a trajectory has reached a high-density region and should terminate. We theoretically establish the connection between optimal classifiers and the target density via detailed balance conditions and introduce a multilevel training scheme to facilitate exploration in complex geometries. Experimental results across various benchmark densities demonstrate that our approach significantly reduces average trajectory lengths while improving mode coverage and mixing compared to standard MCMC baselines.

2606.15962 2026-06-16 stat.ME cs.LG 新提交

p-PSO: A Penalized Particle Swarm Optimization Technique for Finding D-Optimal Designs with Mixed Factors in Generalized Linear Models

p-PSO: 一种用于广义线性模型中混合因子D-最优设计的惩罚粒子群优化技术

Shrabanti Chowdhury, Abhyuday Mandal

发表机构 * Icahn School of Medicine at Mount Sinai(伊坎医学院) University of Georgia(佐治亚大学)

AI总结 提出一种新的惩罚粒子群优化方法p-PSO,通过通用惩罚公式解决广义线性模型中混合因子D-最优设计问题,高效且可直接使用现成PSO算法。

详情
AI中文摘要

寻找广义线性模型(GLMs)的D-最优设计具有挑战性,因为Fisher信息矩阵依赖于未知参数且缺乏闭式解,尤其当输入因子包含离散和连续变量时。尽管经典算法和最近的元启发式方法提供了部分解决方案,但仍需要稳健且计算高效的方法。本文提出了一种惩罚粒子群优化(PSO)方法,称为$p$-PSO。我们引入了一种新的、通用的约束优化惩罚公式,并展示了其在最优设计问题中的有效性。该公式与算法无关,适用于一大类黑箱优化方法。结果表明,该方法非常高效,其主要贡献在于提出了一种惩罚公式,使得可以直接使用现成的PSO算法,并自然地扩展到更一般的约束优化任务。

英文摘要

Finding D-optimal designs for generalized linear models (GLMs) is challenging due to the dependence of the Fisher information matrix on unknown parameters and the lack of closed-form solutions, particularly when input factors include both discrete and continuous variables. Although classical algorithms and recent metaheuristic approaches have offered partial solutions, there remains a need for robust and computationally efficient methods. In this paper, we propose a penalized Particle Swarm Optimization (PSO) approach, named $p$-PSO. Here we introduce a new, general-purpose penalty formulation for constrained optimization and demonstrate its effectiveness in optimal design problems. The formulation is algorithm-agnostic and applicable to a broad class of black-box optimization methods. Results show that the method is highly efficient, with its primary contribution being a penalty formulation that enables the direct use of an off-the-shelf PSO algorithm and extends naturally to more general constrained optimization tasks.

2606.15871 2026-06-16 stat.CO cs.LG stat.ML 新提交

Amortized mean-shift interacting particles

摊销均值漂移交互粒子

Ali Siahkoohi

发表机构 * Department of Computer Science, University of Central Florida(中佛罗里达大学计算机科学系)

AI总结 提出摊销均值漂移交互粒子方法,通过学习映射从观测和少量后验样本直接输出加权节点,无需评估密度或得分,实现比同等数量蒙特卡洛样本更精确的积分估计。

详情
AI中文摘要

针对逆问题的贝叶斯推断需要在一系列观测上计算积分(后验期望、尾部概率和风险)。标准估计对后验样本上的被积函数求平均,这种蒙特卡洛平均的误差仅按样本量平方根的倒数衰减,因此要达到高精度需要大量样本;当每个样本都要调用一次偏微分方程正演模型时,其代价高得难以承受。均值漂移交互粒子需要的样本少得多:它们返回一小组带符号权重的节点,即一种确定性求积,其加权平均用于估计这些积分。然而,寻找节点需要对每个观测分别进行优化,而在其最精确的形式中,每一步都要读取后验得分,于是又付出了它本想节省的成本。我们提出摊销均值漂移交互粒子,这是一种学习得到的映射,可在单次前向传递中由一个观测和少量后验样本输出加权节点。训练只需要参数-观测的联合样本和一个可供抽样的后验(条件归一化流、经验条件分布或用户能够抽样的任何参考分布),映射仅凭样本学习对该后验积分,既不评估其密度,也不评估其得分。训练完成后,它可泛化到未见过的观测和被积函数,适用于任意节点预算,并以两种方式改进独立样本:一是对样本重新加权,可证明不劣于蒙特卡洛的等权重;二是移动样本,经验上可进一步降低误差。在闭式、抽样、学习和基于物理的后验上(最大到含一千个系数的地下水场),它在每个预算下都比相同数量的样本积分得更准确,并且经后验白化、维度感知的核消除了高维障碍。其结果是对蒙特卡洛积分的帕累托改进,而不是抽取更多样本这一做法的竞争者。

英文摘要

Bayesian inference for inverse problems is run to evaluate integrals -- posterior expectations, tail probabilities, and risks -- across a stream of observations. The standard estimate averages the integrand over posterior samples, a Monte-Carlo average whose error decays only as the square root of the sample size, so accuracy demands many samples -- prohibitive when each one calls a partial-differential-equation forward model. Mean-shift interacting particles need far fewer: they return a small set of signed-weight nodes -- a deterministic quadrature whose weighted averages estimate those integrals. Finding the nodes, however, is a per-observation optimization that, in its most accurate form, reads the posterior score at every step -- returning the cost it meant to save. We introduce amortized mean-shift interacting particles, a learned map that emits the weighted nodes from an observation and a few posterior samples in a single forward pass. Training asks only for joint parameter-observation samples and a posterior to draw from -- a conditional normalizing flow, an empirical conditional, or any reference the user can sample -- and the map learns to integrate that posterior from samples alone, evaluating neither its density nor its score. Once trained, it generalizes to unseen observations and integrands at any node budget and improves on independent samples in two ways: by reweighting them, provably no worse than the equal weights of Monte-Carlo; and by moving them, which empirically lowers it further. Across closed-form, sampled, learned, and physics-based posteriors -- up to a thousand-coefficient groundwater field -- it integrates more accurately than the same number of samples at every budget, and a posterior-whitened, dimension-aware kernel removes the high-dimensional wall. The result is a Pareto improvement on Monte-Carlo integration, not a competitor to drawing more samples.

2606.15793 2026-06-16 cs.LG cs.AI stat.ML 新提交

Proximal Policy Optimization for Amortized Discrete Sampling

用于摊销离散采样的近端策略优化

Anna Zykova-Myzina, Timofei Gritsaev, Daniil Tiapkin, Nikita Morozov

发表机构 * HSE University(高等经济学院) Constructor University(康斯特大学) CMAP, CNRS, École polytechnique, IPP(CMAP,CNRS,巴黎综合理工学院,IPP)

AI总结 本文在生成流网络框架下,推导了策略梯度算法并首次应用近端策略优化,提升了离散概率分布采样的收敛速度和数据效率。

详情
AI中文摘要

本文探讨了在生成流网络(GFlowNet)框架下,使用策略梯度算法训练随机策略以从结构化离散概率分布中采样。基于GFlowNet与熵正则化强化学习之间的广泛理论联系,我们推导了用于训练GFlowNet的标准策略梯度算法的等价形式,并实验性地探索了其各种方法论方面,包括基线训练和优势估计。最重要的是,我们的工作是首次推导并成功将近端策略优化应用于GFlowNet,在从合成能量到分子图生成的基准测试中,与标准GFlowNet训练目标相比,显示出更快的收敛速度和更高的数据效率。

英文摘要

This paper explores policy gradient algorithms for training stochastic policies to sample from structured discrete probability distributions under the Generative Flow Network (GFlowNet) framework. Building on extensive theoretical connections between GFlowNets and entropy-regularized reinforcement learning, we derive equivalents of standard policy gradient algorithms for training GFlowNets, as well as experimentally explore their various methodological aspects, including baseline training and advantage estimation. Most importantly, our work is the first to derive and successfully apply proximal policy optimization to GFlowNets, showing its improved convergence speed and data efficiency compared to standard GFlowNet training objectives on benchmarks ranging from synthetic energies to molecular graph generation.

2606.15725 2026-06-16 stat.CO stat.ML 新提交

Score-Based Martingale Posteriors for Deep Neural Networks

基于得分的鞅后验分布用于深度神经网络

Abylay Zhumekenov, Ajay Jasra, Mohamed Maama, Raul Tempone

AI总结 研究将基于得分的鞅后验分布(SMP)应用于大规模机器学习,通过随机梯度上升递归构建参数鞅序列,实现快速不确定性量化,并与蒙特卡洛方法对比。

Comments 20 pages, 7 figures, 6 tables, appendix

详情
AI中文摘要

本文研究了基于得分的鞅后验分布(SMP)(Cui & Walker, 2025; Fong et al., 2023)在现代大规模机器学习问题中的有效性及其在不确定性量化方面的潜力。SMP在随机模型的参数空间上采用随机梯度上升型递归,并在参数空间上构造一个鞅。在简单的数学假设下,可以构建递归使得参数形成一个鞅序列,该序列具有一个极限随机变量,后者可以非常快速地模拟,这与马尔可夫链蒙特卡洛等基于蒙特卡洛的方法形成对比。在这篇说明性论文中,我们探讨了SMP用于推断深度神经网络(DNN)的参数,并在可行的情况下,将我们的结果与旨在推断传统贝叶斯后验的最先进的蒙特卡洛方法进行比较。

英文摘要

In this paper we investigate the efficacy of the score-based martingale posteriors (SMP) (Cui & Walker, 2025; Fong et al., 2023) in the context of modern and large-scale machine learning problems and its potential for meaningful uncertainty quantification. SMPs work with a stochastic gradient ascent-type recursion on the parameter space of stochastic models and construct a martingale on the parameter space. Under simple mathematical assumptions, the recursion can be built so that the parameters form a martingale sequence which possesses a limiting, in time, random variable, the latter of which can be simulated very quickly, in contrast to Monte Carlo-based methods such as Markov chain Monte Carlo. In this expository paper we explore the SMP for inferring the parameters of deep neural networks (DNNs) and, where feasible, compare our results to the state-of-the-art Monte Carlo methods aimed at inferring conventional Bayesian posteriors.

2606.15679 2026-06-16 stat.ML cs.LG cs.NA math.NA 新提交

Stochastic trace estimation with tensor train random vectors

基于张量列随机向量的随机迹估计

Zvonimir Bujanović, Daniel Kressner, Hrvoje Olić

发表机构 * University of Zagreb, Faculty of Science, Department of Mathematics(萨格勒布大学理学院数学系) Institute of Mathematics, EPFL(EPFL数学研究所)

AI总结 研究使用高斯随机张量列向量进行随机迹估计,证明适当秩下可恢复维度无关保证,并应用于Nyström++框架。

详情
AI中文摘要

随机迹估计是一种标准工具,用于近似仅通过矩阵-向量乘积可获得的大规模矩阵的迹。然而,在张量结构设置中,非结构化的高斯或Rademacher测试向量在存储和计算上可能过于昂贵,而更便宜的秩一张量积向量可能需要随张量阶数指数增长的样本复杂度。本文研究高斯随机张量列向量作为随机迹估计的结构化替代方案。我们证明,通过适当选择张量列秩,随机张量列向量可以恢复Girard-Hutchinson估计器的维度无关保证。特别地,基于张量列秩$r \geq d-1$的中位数均值变体在精度$\varepsilon$和失败概率$\delta$上实现了与基于非结构化高斯向量的经典估计器相同的依赖性。我们进一步证明了由独立高斯随机张量列向量形成的草图的一个无意识子空间注入结果:张量列秩$r\geq d-1$和$\mathcal{O}(\varepsilon^{-2}(k+\log(1/δ)))$个样本足以用于$k$维目标子空间。最后,我们研究了此类草图在Nyström++框架中的应用。我们证明,在额外的谱尾条件下,所得估计器可以实现所需的$\mathcal{O}(\varepsilon^{-1})$样本复杂度。这些结果阐明了随机张量列向量在随机迹估计中的潜力和局限性。

英文摘要

Stochastic trace estimation is a standard tool for approximating the trace of a large-scale matrix available only through matrix-vector products. However, in tensor-structured settings, unstructured Gaussian or Rademacher test vectors may be prohibitively expensive to store and compute with, while cheaper rank-one tensor-product vectors can require sample complexities that grow exponentially with the tensor order. This work studies Gaussian random tensor train vectors as a structured alternative for stochastic trace estimation. We show that, with a suitable choice of the tensor train rank, random tensor train vectors recover dimension-independent guarantees for the Girard--Hutchinson estimator. In particular, a median-of-means variant with tensor train rank $r \geq d-1$ achieves the same dependence on the accuracy $\varepsilon$ and failure probability $\delta$ as the classical estimator based on unstructured Gaussian vectors. We further prove an oblivious subspace injection result for sketches formed from independent Gaussian random tensor train vectors: tensor train rank $r\geq d-1$ and $\mathcal{O}(\varepsilon^{-2}(k+\log(1/\delta)))$ samples suffice for a $k$-dimensional target subspace. Finally, we investigate the use of such sketches within the Nyström++ framework. We show that the resulting estimator can achieve the desired $\mathcal{O}(\varepsilon^{-1})$ sample complexity under an additional spectral-tail condition. These results clarify both the potential and the limitations of random tensor train vectors in stochastic trace estimation.
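
For orientation, the classical Girard--Hutchinson estimator that the abstract takes as its baseline can be sketched in a few lines. The sketch below uses unstructured Rademacher probe vectors and plain averaging; it does not implement the tensor train probes or the median-of-means variant studied in the paper, and the function names and test matrices are illustrative assumptions.

```python
import random


def matvec(A, z):
    """Dense matrix-vector product; stands in for a black-box operator."""
    return [sum(a_ij * z_j for a_ij, z_j in zip(row, z)) for row in A]


def hutchinson_trace(apply_A, n, num_samples, rng):
    """Average z^T A z over Rademacher probes z; unbiased for trace(A)."""
    total = 0.0
    for _ in range(num_samples):
        z = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        Az = apply_A(z)
        total += sum(zi * yi for zi, yi in zip(z, Az))
    return total / num_samples


rng = random.Random(0)

# For a diagonal matrix every Rademacher probe returns the trace exactly.
diag = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 5.0]]
trace_diag = hutchinson_trace(lambda z: matvec(diag, z), 3, 10, rng)

# With off-diagonal entries the estimate is only correct on average.
dense = [[2.0, 1.0, 0.5], [1.0, 3.0, -1.0], [0.5, -1.0, 5.0]]
trace_dense = hutchinson_trace(lambda z: matvec(dense, z), 3, 20000, rng)
```

The estimator touches the matrix only through `apply_A`, which is why the cost of storing and applying each probe vector becomes the bottleneck in tensor-structured problems.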

2606.15458 2026-06-16 stat.ML cs.LG 新提交

Structured Nonparametric Variational Inference for Dependent Latent Modeling

面向依赖潜变量建模的结构化非参数变分推断

Yuda Shao, Zhiling Gu, Shan Yu

AI总结 提出结构化非参数变分推断(SN-VI),利用多元样条技术建模后验分布中潜变量的复杂依赖关系,无需均值场假设,具有理论保证和自动依赖发现能力,在计算机视觉和空间转录组学中表现优异。

详情
AI中文摘要

变分推断(VI)是现代人工智能的核心引擎,能够实现大规模概率和生成模型的可扩展近似贝叶斯学习及不确定性感知训练。本文提出结构化非参数变分推断(SN-VI),一种利用多元样条技术对后验近似中潜变量间的复杂依赖关系进行建模的新框架。与依赖均值场假设的传统方法不同,SN-VI保留了复杂的潜变量依赖关系,能够灵活且准确地逼近任意形状的后验分布。我们建立了严格的理论保证,包括变分目标下界的推导以及后验估计渐近一致性的证明。为便于实际实现,我们开发了一种算法,可自动识别依赖潜变量及其底层依赖结构,无需手动指定。模拟研究验证了SN-VI在逼近具有有界支撑和复杂依赖的后验分布方面的有效性。该方法已成功应用于高维结构化数据,包括计算机视觉数据集和空间转录组学。在这些应用中,SN-VI展示了改进的生成模型性能,并通过学习到的依赖结构有效揭示了耦合的生物信号。

英文摘要

Variational inference (VI) is a core engine of modern AI, enabling scalable approximate Bayesian learning and uncertainty-aware training of large probabilistic and generative models. In this paper, we propose Structured Nonparametric Variational Inference (SN-VI), a novel framework for modeling complex dependencies among latent variables in posterior approximation, leveraging multivariate spline techniques. Unlike traditional methods that rely on the mean-field assumption, SN-VI preserves intricate latent variable dependencies, providing a flexible and accurate approximation of posteriors with arbitrary shapes. We establish rigorous theoretical guarantees, including the derivation of the lower bound for the variational objective and proof of asymptotic consistency in posterior estimation. To facilitate practical implementation, we develop an algorithm that automatically identifies dependent latent variables and their underlying dependence structure, without requiring manual specification. Simulation studies validate the effectiveness of SN-VI in approximating posterior distributions with bounded support and complex dependencies. The proposed method has been successfully applied to high-dimensional structured data, including computer vision datasets and spatial transcriptomics. In these applications, SN-VI demonstrates improved generative model performance and effectively uncovers coupled biological signals through the learned dependency structure.

2606.15442 2026-06-16 stat.ML cs.LG 新提交

The Reverse Telescoping Coordinate System for Positive Definite Matrices: Geometry, Computation, and Generative Modeling

正定矩阵的反向望远镜坐标系:几何、计算与生成建模

Anindya Bhadra

发表机构 * Purdue University(普渡大学)

AI总结 提出一种新的无约束坐标系,通过反向望远镜映射表示对称正定矩阵,实现雅可比仅依赖对数行列式、矩阵与逆矩阵的符号表示,并设计分裂体积-形状流模型用于生成建模。

详情
AI中文摘要

我们设计了一种新的无约束坐标系,其中 $p\times p$ 对称正定(SPD)矩阵 $\Theta$ 由反向望远镜映射 $\Theta(x)=\rm{RT}(x)$ 表示,其中 $x=(v,d,r)\in\mathbb{R}\times\mathbb{R}^{(p-1)}\times\mathbb{R}^{p(p-1)/2}$ 分别代表对数体积或对数行列式;以及形状,由对数相对对角尺度与节点间的部分协方差编码。这一构造产生了其他坐标图(如矩阵对数)所不具备的重要性质,例如雅可比仅依赖于对数行列式。我们构造的一个有用特性是 $x$ 包含矩阵及其逆的无损符号表示。许多涉及矩阵及其逆的重要计算可以在变换域中以 $O(p^2)$ 完成,而将结果以矩阵形式呈现(按需)才需要 $O(p^3)$ 成本。此外,变换域中两个单位行列式矩阵可以通过一条路径上单位行列式的直线连接。对于生成建模,这允许设计一个分裂体积-形状流模型,通过条件流匹配在单位行列式路径上传输形状,并有一个独立的一维流传输体积或行列式。令人生畏的SPD约束被驯服为强大的引导力,带来令人惊讶的洞察:在某种意义上,为SPD设计体积归一化的形状流比无约束的 $\mathbb{R}^{p\times p}$ 更容易,因为后者没有内在的体积概念来辅助归一化,而SPD矩阵的行列式则提供了这一点。我们将我们的构造应用于高达 $p=200$ 的SPD矩阵生成建模,针对一个困难的合成双峰目标,以及通过fMRI数据训练的模型生成脑连接网络;还应用于SPD流形上的内在扩散。

英文摘要

We design a new unconstrained coordinate system where a $p\times p$ symmetric positive definite (SPD) matrix $\Theta$ is represented by a reverse telescoping map $\Theta(x)=\mathrm{RT}(x)$, with $x=(v,d,r)\in\mathbb{R}\times\mathbb{R}^{(p-1)}\times\mathbb{R}^{p(p-1)/2}$, representing respectively the log volume or log determinant; and the shape, as encoded by log relative diagonal scales and partial covariances among the nodes. This construction results in important properties not available in other charts, e.g., matrix logarithm, such as a Jacobian depending only on the log-determinant. A useful feature of our construction is that $x$ contains a lossless symbolic representation of both the matrix and its inverse. Many important computations involving a matrix and its inverse can be performed in $O(p^2)$ in the transformed domain, while it is the rendering of results in matrix forms (on demand) that must incur an $O(p^3)$ cost. Moreover, two unit-determinant matrices in the transformed domain can be joined by a straight line with pathwise unit determinant. For generative modeling, this allows designing a split volume-shape flow model trained by conditional flow matching for transporting the shape over the unit-determinant path, with a separate one-dimensional flow for transporting the volume or the determinant. The forbidding SPD constraint, tamed thus into a powerful guiding force, leads to the surprising insight that it is in some sense easier to design a volume-normalized shape flow for SPD compared to the unconstrained $\mathbb{R}^{p\times p}$, with no intrinsic notion of volume to aid normalization, unlike the determinant of SPD matrices. We apply our construction for up to $p=200$ in generative modeling of SPD matrices on a difficult synthetic bimodal target, and in generating brain connectivity networks by models trained on fMRI data; as well as in intrinsic diffusion on the SPD manifold.

2606.15414 2026-06-16 cond-mat.dis-nn cond-mat.stat-mech stat.CO 新提交

Cluster-based Message-Passing (CluMP) Optimization for Complex QUBO Problems

基于聚类的消息传递(CluMP)优化复杂QUBO问题

Paolo Rissone, Stefan Boetcher, Alfonso Amendola, Simone Sala, Federico Ricci-Tersenghi

AI总结 提出CluMP算法,通过信念传播控制聚类内阻挫,实现自旋集体更新,在稀疏图上以更少操作达到更低能量,优于局部更新启发式方法。

Comments 8 pages, 4 figures, 1 table

详情
AI中文摘要

二次无约束布尔优化(QUBO)问题在工业应用和科学研究中广泛存在。QUBO问题对应于定义在通常稀疏且异质图上的伊辛自旋系统的优化。当QUBO问题包含相互冲突的要求时,相应的伊辛系统存在阻挫,产生复杂的能量景观,难以探索和优化。尽管有广泛的算法和硬件发展,在这些系统中找到低能构型仍然具有挑战性(例如,局部更新启发式方法通常陷入亚稳态),特别是当(可能存在阻挫的)相互作用产生扩展的相关域时。我们引入CluMP(基于聚类的消息传递),一种利用信念传播(BP)信息对自旋连接聚类进行集体更新的算法。通过控制聚类内的阻挫程度,CluMP使得BP在大子图上收敛,并提出了涉及单次移动中多达数百个自旋的非局域重排。我们在几种图拓扑(包括随机正则图和二维、三维晶格正则图)上定义的自旋玻璃模型上,将CluMP与最先进的局部更新启发式方法进行基准测试。聚类移动始终如一地绕过局部陷阱,并以比单自旋动力学更少的有效操作达到更低的能量。这些结果表明,容忍阻挫的聚类更新可以在稀疏图上高效实现。CluMP框架为大规模组合优化和推理问题提供了一种可扩展的策略,其中利用中长程相关性是导航复杂能量景观的关键。

英文摘要

Quadratic Unconstrained Boolean Optimization (QUBO) problems are widespread in both industrial applications and scientific studies. A QUBO problem corresponds to the optimization of a system of Ising spins defined on a generally sparse and heterogeneous graph. When the QUBO problem contains conflicting requests, the corresponding Ising system is frustrated, generating a complex energy landscape, which is hard to explore and optimize. Despite extensive algorithmic and hardware developments, finding low-energy configurations in these systems remains challenging (e.g., local-update heuristics typically become trapped in metastable states), especially when the (possibly frustrated) interactions generate extended correlated domains. We introduce CluMP (Cluster-based Message-Passing), an algorithm that performs collective updates on connected clusters of spins using information from Belief Propagation (BP). By controlling the amount of frustration within clusters, CluMP enables BP convergence on large subgraphs and proposes nonlocal rearrangements involving up to hundreds of spins in a single move. We benchmark CluMP against state-of-the-art local-update heuristics on spin-glass models defined on several graph topologies, including random regular graphs and lattice regular graphs in two and three dimensions. Cluster moves consistently bypass local trapping and reach lower energies with fewer effective operations than single-spin dynamics. These results demonstrate that frustration-tolerant cluster updates can be implemented efficiently on sparse graphs. The CluMP framework provides a scalable strategy for large-scale combinatorial optimization and inference problems, where exploiting medium- and long-range correlations is key to navigating complex energy landscapes.

2606.15360 2026-06-16 stat.ME physics.data-an stat.CO 新提交

Generating-Element Maximum Entropy for Non-Gaussian Uncertainty Evaluation

非高斯不确定性评估的生成元最大熵方法

Serhii Zabolotnii

AI总结 提出生成元最大熵框架,通过选择不同的生成元(如分数幂、三角函数、对数有理函数)替代传统单项式约束,解决非高斯分布重建中的条件数、可行性及尾部建模问题,在双峰混合和重尾分布上显著提升精度。

Comments 29 pages, 3 figures, 8 tables, 1 algorithm. Reproducibility code (base R): https://github.com/SZabolotnii/Ku-MaxEnt-code-supplement

详情
AI中文摘要

矩约束最大熵(MaxEnt)在不确定性评估(GUM)和可靠性分析中从少量矩重建概率密度。经典方法使用单项式约束 x^i。我们证明单项式仅是 Kunchenko 分解空间的一种生成元选择,且该选择(而非求解器)决定了哪些密度可表示以及对偶问题的条件数。我们在同一对偶求解器下研究三种生成元:分数幂生成元(PATP),将分数矩指数选择简化为有符号支撑上的一维扫描;三角函数(特征函数)生成元,其约束对所有分布存在且保持对偶 Hessian 有界;对数有理生成元 log(1+(x/s)^2),其单一约束产生 Student/Cauchy 族 (1+(x/s)^2)^lambda,表示前两者无法产生的代数尾部。奇偶可容许性定理表明,奇函数生成元无法表示任何非均匀对称密度;统一教训是匹配生成元与目标尾部类别的设计映射。实验上,在双峰高斯混合上,扫描选择的分数成员将重建 MSE 比六矩单项式基线降低 8.5 倍(所有 20 个种子),而三角函数生成元条件数最佳。在重尾分布上,分数生成元在单项式 MaxEnt 不可行时恢复可行性(19/20 种子)并重建主体(KS 0.068)但非尾部,而匹配的对数生成元从单一约束恢复 Cauchy 尾部指数。方差最优规则(oPMM-alpha)为报告的功能选择生成元。解析乘积矩评估器使测量与验证优化适应度完全确定且比蒙特卡洛更快,消除其噪声引起的违规。

英文摘要

Moment-constrained maximum entropy (MaxEnt) reconstructs probability densities from a few moments in uncertainty evaluation (GUM) and reliability analysis. The classical method uses monomial constraints x^i. We show that monomials are merely one choice of generating element of the underlying Kunchenko decomposition space, and that this choice -- more than the solver -- governs which densities are representable and how well-conditioned the dual problem is. We study three elements under one dual solver: a fractional-power element (PATP) that reduces fractional-moment exponent selection to a one-dimensional scan on signed supports; a trigonometric (characteristic-function) element whose constraints exist for every distribution and keep the dual Hessian bounded; and a logarithmic-rational element log(1+(x/s)^2) whose single constraint yields the Student/Cauchy family (1+(x/s)^2)^lambda, representing algebraic tails the first two do not produce. A parity-admissibility theorem shows that an element of odd functions cannot represent any non-uniform symmetric density; the unifying lesson is a design map matching the element to the target's tail class. Empirically, on a bimodal Gaussian mixture the scan-selected fractional member cuts reconstruction MSE by 8.5x over the six-moment monomial baseline (all 20 seeds), while the trigonometric element is best-conditioned. On heavy tails the fractional element restores feasibility where monomial MaxEnt is infeasible (19/20 seeds) and reconstructs the body (KS 0.068) but not the tail, whereas the matched logarithmic element recovers the Cauchy tail index from one constraint. A variance-optimal rule (oPMM-alpha) selects the element for the reported functional. An analytical product-moment evaluator makes a measurement-and-verification optimization fitness exactly deterministic and faster than Monte Carlo, removing its noise-induced violations.
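
The claim that a single logarithmic-rational constraint yields the Student/Cauchy family follows from the standard Lagrangian argument. The derivation below is a sketch using the abstract's notation; the multiplier names $\lambda_0$, $\lambda$, the constraint value $c$, and the normalizer $Z(\lambda)$ are introduced here for illustration.

```latex
\begin{aligned}
&\max_{p}\; -\int p(x)\,\log p(x)\,dx
\quad\text{s.t.}\quad \int p(x)\,dx = 1,\qquad
\int p(x)\,\log\!\bigl(1+(x/s)^2\bigr)\,dx = c, \\[4pt]
&\text{stationarity in } p:\quad
-\log p(x) - 1 + \lambda_0 + \lambda\,\log\!\bigl(1+(x/s)^2\bigr) = 0, \\[4pt]
&p(x) = \frac{1}{Z(\lambda)}\,\bigl(1+(x/s)^2\bigr)^{\lambda},
\qquad
Z(\lambda) = \int_{-\infty}^{\infty}\bigl(1+(x/s)^2\bigr)^{\lambda}\,dx < \infty
\iff \lambda < -\tfrac12 .
\end{aligned}
```

Since $p(x)\sim |x|^{2\lambda}$ for large $|x|$, the tail is algebraic. Taking $\lambda=-1$ gives the Cauchy density with scale $s$, and $\lambda=-(\nu+1)/2$ with $s=\sqrt{\nu}$ gives the Student density with $\nu$ degrees of freedom.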

2606.14854 2026-06-16 hep-ph stat.ML 新提交

Event Generation with Parallel Langevin Sampling and Learned Stein Diagnostics

基于并行Langevin采样和学习的Stein诊断的事件生成

Rob Verheyen

AI总结 提出并行欠阻尼Langevin链生成无权重事件,用学习的Stein差异作为收敛诊断,应用于树级uū→Z+ng事件生成,仅需少量精确目标Langevin步,神经网络初始化可进一步减少计算。

Comments 13 pages, 4 figures

详情
AI中文摘要

高效的事件生成是精确对撞机现象学中的一个主要计算挑战,特别是对于高多重性末态,其中矩阵元评估昂贵且拒绝采样效率低。我们研究了一种基于多个并行欠阻尼Langevin链的替代方法,从每条链中保留一个末态以获得无权重事件,同时避免链内自相关。使用学习的Stein差异作为收敛诊断,提供松弛时间的数据驱动估计。我们将该方法应用于树级$u\bar u\to Z+n g$事件生成,发现松弛仅需要适度数量的精确目标Langevin步,且随研究的多重性增长缓慢。最后,我们表明简单的神经网络代理初始化可以显著减少所需的精确矩阵元和梯度评估次数。

英文摘要

Efficient event generation is a major computational challenge for precision collider phenomenology, especially for high-multiplicity final states where matrix-element evaluations are expensive and rejection-sampling efficiencies are low. We study an alternative approach based on many parallel underdamped Langevin chains, retaining one terminal state from each chain to obtain unweighted events while avoiding within-chain autocorrelation. A learned Stein discrepancy is used as a convergence diagnostic, providing a data-driven estimate of the relaxation time. We apply the method to tree-level $u\bar u\to Z+n g$ event generation and find that relaxation requires only a modest number of exact-target Langevin steps, with mild growth over the multiplicities studied. Finally, we show that simple neural-network surrogate initialization can substantially reduce the required number of exact matrix-element and gradient evaluations.

2606.14743 2026-06-16 stat.CO math.PR 新提交

Delayed acceptance sampling with Hamiltonian proposal subchains for random field materials inference

基于哈密顿子链的延迟接受采样用于随机场材料推断

Simona Bérešová, Michal Béreš, Tomáš Luber, Stanislav Sysala

AI总结 提出延迟接受Metropolis-Hastings算法结合神经网络代理模型和哈密顿子链,加速贝叶斯反问题中正向模型评估占主导的MCMC采样,应用于岩土工程非平稳高斯随机场参数推断。

Comments 21 pages, 15 figures

详情
AI中文摘要

本文关注在正向模型评估主导计算成本的贝叶斯反问题中加速马尔可夫链蒙特卡洛采样。它建立在先前在相关场景中使用过的几个成熟要素之上:延迟接受、神经网络代理模型、哈密顿提议和提议子链。主要框架是Christen和Fox(2005)的延迟接受Metropolis-Hastings算法。第一阶段提议分布由针对代理后验的哈密顿轨迹子链构建。对于每个固定的代理模型,哈密顿子链和延迟接受校正定义了一个相对于精确后验不变的核。在本文中,代理仅在烧入阶段更新,之后的生产运行使用固定的代理模型。采样框架使用并行进程在Python中实现。多个链并行生成,并共享一个在烧入期间在所有收集数据上训练的单一代理模型。正向模型被视为黑箱;因此应用领域广泛。然而,主要动机是高效求解材料属性由高斯随机场表示的岩土工程反问题。在本研究中,采样框架应用于一个岩土工程反问题,其中水力传导率和孔隙度被建模为非平稳高斯随机场,并使用截断的Karhunen-Loeve展开近似。基于预计算,分别选择水力传导率和孔隙度的截断维度。正向模型输出是控制点和选定观测时间的孔隙压力值。这些值与加拿大地下实验室隧道密封实验期间一年内收集的原位孔隙压力测量值进行比较。

英文摘要

This paper focuses on accelerating Markov chain Monte Carlo sampling in Bayesian inverse problems in which forward model evaluations dominate the computational cost. It builds on several established ingredients previously used in related scenarios: delayed acceptance, neural network surrogate models, Hamiltonian proposals, and proposal subchains. The main framework is the delayed-acceptance Metropolis-Hastings algorithm of Christen and Fox (2005). The first-stage proposal distribution is constructed from a subchain of Hamiltonian trajectories targeting the surrogate posterior. For each fixed surrogate model, the Hamiltonian subchain and delayed-acceptance correction define a kernel invariant with respect to the exact posterior. In the present work, the surrogate is updated only during a burn-in phase, after which the production run uses a fixed surrogate model. The sampling framework is implemented in Python using parallel processes. Several chains are generated in parallel and share a single surrogate model trained during burn-in on all collected data. The forward model is treated as a black box; therefore, the application area is broad. However, the main motivation is efficient solution of geotechnical inverse problems with material properties represented by Gaussian random fields. In this study, the sampling framework is applied to a geotechnical inverse problem in which hydraulic conductivity and porosity are modeled as non-stationary Gaussian random fields approximated using truncated Karhunen-Loeve expansions. Based on a precomputation, the truncation dimensions are chosen separately for hydraulic conductivity and porosity. The forward model outputs are pore pressure values at control points and selected observation times. These are compared with in situ pore pressure measurements collected over one year during the Tunnel Sealing Experiment in an underground laboratory in Canada.
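
The two-stage structure of delayed acceptance can be sketched as follows. This is a minimal one-dimensional illustration, assuming a symmetric random-walk proposal, a standard normal stand-in for the expensive exact posterior, and a deliberately mis-scaled normal as the surrogate. It does not use the Hamiltonian subchains or the neural network surrogate described in the abstract, and all names are illustrative.

```python
import math
import random


def log_target(x):
    """Stand-in for the expensive exact log-posterior (standard normal)."""
    return -0.5 * x * x


def log_surrogate(x):
    """Cheap approximation: a normal with the wrong scale (1.5)."""
    return -0.5 * (x / 1.5) ** 2


def delayed_acceptance_mh(n_steps, step, rng, x0=0.0):
    """Two-stage Metropolis-Hastings with a symmetric random-walk proposal."""
    x = x0
    lt_x, ls_x = log_target(x), log_surrogate(x)
    samples, exact_calls = [], 0
    for _ in range(n_steps):
        y = x + rng.gauss(0.0, step)
        ls_y = log_surrogate(y)
        # Stage 1: screen the proposal using the surrogate only.
        if math.log(1.0 - rng.random()) < ls_y - ls_x:
            lt_y = log_target(y)      # exact model is evaluated only here
            exact_calls += 1
            # Stage 2: correct for the surrogate error so that the chain
            # remains invariant with respect to the exact target.
            if math.log(1.0 - rng.random()) < (lt_y - lt_x) - (ls_y - ls_x):
                x, lt_x, ls_x = y, lt_y, ls_y
        samples.append(x)
    return samples, exact_calls


rng = random.Random(1)
samples, exact_calls = delayed_acceptance_mh(20000, step=2.0, rng=rng)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

Proposals rejected at stage 1 never reach the exact model, so `exact_calls` stays below the number of steps while the sample moments still match the exact target rather than the surrogate.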

2606.08946 2026-06-16 stat.CO physics.chem-ph physics.comp-ph 新提交

A Diffusion Monte Carlo algorithm employing depth first traversal and a stack instead of a swarm

一种采用深度优先遍历和栈而非群体的扩散蒙特卡罗算法

Bastiaan J. Braams

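AI summary: Proposes a diffusion Monte Carlo algorithm (DMCD) based on depth-first traversal and a stack. Managing the splitting history with a stack can be more memory efficient than the traditional breadth-first swarm approach, and it unifies the algorithmic treatment of eigenvalue problems and linear equation problems.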

Comments 12 pages. The code in the original (v1) arXiv submission could randomly get trapped in a cycle in which the same walker is repeatedly restarted with ever-decreasing weight. The issue is described and addressed in this (v2) submission.

AI Chinese abstract

扩散蒙特卡罗(DMC)和用于粒子输运的蒙特卡罗方法(带重要性抽样)都涉及加权游走者的模拟,这些游走者经历出生和死亡过程(分裂和俄罗斯轮盘赌)。这些方法的既定实现截然不同:粒子模拟蒙特卡罗使用栈来处理分裂历史,而传统DMC则跟踪一群游走者。粒子模拟蒙特卡罗方法对访问过的构型进行深度优先遍历,而传统DMC方法可视为广度优先遍历。在本工作中,描述了基于深度优先、栈的DMC实现,并给出了完整代码。深度优先方法(此处称为DMCD)在总内存以及内存层次结构和协处理器的使用方面可能比广度优先方法更节省内存。该实现对于群体控制和后代加权非常自然,并统一了特征值问题(DMC)与线性方程问题(粒子输运)的算法处理。DMCD中存在而广度优先方法中没有的一个问题(本文成功解决了)是,当需要新游走者而栈为空时,需要维护一个起始者池。DMCD方法有潜力成为许多DMC应用的首选实现。

English abstract

Diffusion Monte Carlo (DMC) and Monte Carlo for particle transport with importance sampling both involve simulations of weighted walkers that undergo birth and death processes (splitting and Russian roulette). The established implementations of these methods are quite different: particle simulation Monte Carlo employs a stack to handle the splitting history, whereas traditional DMC follows a swarm of walkers. The particle simulation Monte Carlo approach involves a depth-first traversal of the visited configurations, whereas the traditional DMC approach may be seen as a breadth-first traversal. In the present work, the implementation of a depth-first, stack-based approach to DMC is described and a complete code is presented. The depth-first approach, called DMCD here, can be more memory efficient than the breadth-first approach, both in total memory and in its use of a memory hierarchy and of co-processors. The implementation appears very natural for population control and for descendant weighting, and it unifies the algorithmic treatment of the eigenvalue problem (DMC) with that of the linear equation problem (particle transport). A concern with DMCD that is not present in the breadth-first approach, and that is successfully addressed here, is the need to maintain a pool of starters for use when a new walker is required and the stack is empty. The DMCD approach appears to have the potential to become the preferred implementation for many DMC applications.
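
The stack-based, depth-first bookkeeping can be illustrated on a toy weighted-walker problem. This is a sketch in the particle-transport style under assumptions of our own: a walker whose weight is multiplied by 0.5 or 1.5 with equal probability at each step, a split into two halves above `w_high`, and Russian roulette below `w_low`. It is not the DMCD code from the paper and it omits the starter pool. Because splitting and roulette are both unbiased, the expected final weight per starter stays equal to 1.

```python
import random


def run_history(rng, n_steps, w_low=0.5, w_high=2.0, w_survive=1.0):
    """Follow one starter and all of its descendants depth first."""
    stack = [(1.0, 0)]  # pending walkers as (weight, step index)
    total = 0.0
    while stack:
        w, t = stack.pop()
        alive = True
        while t < n_steps:
            w *= rng.choice((0.5, 1.5))  # toy weight update (assumption)
            t += 1
            if w > w_high:
                # Splitting: keep one half, push the sibling for later.
                w *= 0.5
                stack.append((w, t))
            elif w < w_low:
                # Russian roulette: survive with probability w / w_survive.
                if rng.random() < w / w_survive:
                    w = w_survive
                else:
                    alive = False
                    break
        if alive:
            total += w
    return total


def mean_final_weight(n_histories, n_steps=10, seed=0):
    rng = random.Random(seed)
    acc = sum(run_history(rng, n_steps) for _ in range(n_histories))
    return acc / n_histories
```

Only the siblings waiting along the current lineage are held in memory at any time, in contrast to a swarm that stores every live walker at once.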

2605.03573 2026-06-16 stat.ML cs.LG 版本更新

Stochastic Schrödinger Diffusion Models for Pure-State Ensemble Generation

随机薛定谔扩散模型用于纯态集合生成

Jian Xu, Wei Chen, Shigui Li, Chao Li, Jingyuan Zheng, Delu Zeng, John Paisley, Qibin Zhao

Affiliations: RIKEN iTHEMS; RIKEN AIP; South China University of Technology; Stanford University; Columbia University

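AI summary: Proposes Stochastic Schrödinger Diffusion Models (SSDMs), a score-based generative framework on the complex projective space $\mathbb{CP}^{d-1}$. A local Euclidean Ornstein-Uhlenbeck approximation enables training without analytic transition densities, and the generated states improve generalization in quantum machine learning.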

AI Chinese abstract

在量子机器学习(QML)中,经典数据通常被编码为量子纯态并直接处理为量子表示,推动了在底层表示层面生成模型的发展,该模型从底层纯态集合中采样新量子态,而非从扰动的经典输入重新准备。然而,将具有明确反向时间采样器的分数扩散模型扩展到量子纯态集合仍具挑战性,由于复射影空间CP^{d-1}的非欧几里得几何和过渡密度的不可行性。我们提出了随机薛定谔扩散模型(SSDMs),一种内在的基于分数的生成框架,配备了Fubini-Study(FS)度量。SSDMs通过随机薛定谔方程(SSE)实现正向黎曼扩散,并推导出由黎曼分数∇_{FS} log p_t驱动的反向时间动力学。为了在没有解析过渡密度的情况下进行训练,我们引入了一个基于FS法坐标中局部欧几里得奥恩斯坦-乌伦贝克(Ornstein-Uhlenbeck)近似的局部时间目标,从而得到一个映射回流形的解析教师分数。实验表明,SSDMs能够忠实捕捉目标纯态集合的统计特性,包括可观测量的矩、重叠核MMD和纠缠度量,并且SSDM生成的量子表示通过表示层面的数据增强提升了下游QML的泛化能力。

English abstract

Quantum machine learning increasingly relies on pure-state representations, motivating generative models that sample directly in quantum representation space rather than perturbing classical inputs and re-encoding. We introduce Stochastic Schrödinger Diffusion Models (SSDMs), a score-based generative framework that defines diffusion, scores, and reverse-time sampling intrinsically on the complex projective manifold $\mathbb{CP}^{d-1}$ under the Fubini--Study metric. SSDMs combine a Riemannian Ornstein--Uhlenbeck forward diffusion with a stochastic Schrödinger realization, and learn reverse-time dynamics driven by the Riemannian score. Our central technical contribution is a local-time learning objective that exploits the local Euclidean OU limit of intrinsic manifold diffusions in Fubini--Study normal coordinates to obtain an analytic teacher score, bypassing the intractable transition densities that limit existing Riemannian score-based models. Across synthetic, physics-inspired (TFIM, XXZ), and quantum feature-state benchmarks up to $14$ qubits, SSDMs match target pure-state ensembles, improving MMD and observable statistics by orders of magnitude over both ambient Euclidean and matched Riemannian score-based baselines, and improve representation-level diagnostics for downstream quantum kernel methods.

2604.23952 2026-06-16 stat.ML cs.LG nlin.CD 版本更新

Conditional Score-Based Modeling of Effective Langevin Dynamics

基于条件分数的有效朗之万动力学建模

Ludovico T. Giorgini

Affiliation: Department of Mathematics, Massachusetts Institute of Technology

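AI summary: Proposes a calibration method for stochastic reduced-order models based on the conditional score of the finite-time transition density. Drift and diffusion coefficients are inferred from data by a least-squares fit, avoiding trajectory differentiation and state-space partitioning.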

AI Chinese abstract

随机降阶模型广泛用于表示复杂系统的有效动力学,但根据数据估计其漂移和扩散系数仍然具有挑战性。标准方法通常依赖于短时间轨迹增量、状态空间划分或候选模型的重复模拟,这些方法对于高维系统、粗时间采样或非均匀采样数据变得不可靠或计算成本高昂。我们引入了一种数据驱动的校准方法,该方法基于随机降阶模型系数与有限时间转移密度的条件分数(定义为转移密度对初始状态的对数梯度)之间的新关系。由此得到的恒等式将滞后相关函数的导数表示为观测到的滞后对上的平稳期望,其中涉及该条件分数和未知模型系数。这种公式允许直接从有限滞后统计量约束漂移和扩散结构,而无需在校准过程中对轨迹进行微分、划分状态空间或重复积分候选降阶模型,从而产生一个关于平稳滞后对的最小二乘拟合问题。我们在三个复杂度递增的系统上验证了该方法:一个解析可解的Cox-Ingersoll-Ross扩散过程、一个具有仿射乘性噪声的二维非平衡扩散过程,以及一个周期性的软自旋随机朗道-利夫希茨链。在这些测试中,推断出的模型在再现有限滞后动力学相关性的同时保持了不变统计量。该框架为从数据中学习再现规定统计和动力学性质的随机降阶模型提供了一种可扩展的途径。

English abstract

Stochastic reduced-order models are widely used to represent the effective dynamics of complex systems, but estimating their drift and diffusion coefficients from data remains challenging. Standard approaches often rely on short-time trajectory increments, state-space partitioning, or repeated simulation of candidate models, which become unreliable or computationally expensive for high-dimensional systems, coarse temporal sampling, or unevenly sampled data. We introduce a data-driven calibration method based on a novel relationship between the coefficients of a stochastic reduced model and the conditional score of the finite-time transition density, defined as the gradient of the logarithm of the transition density with respect to the initial state. The resulting identity expresses derivatives of lagged correlation functions as stationary expectations over observed lagged pairs involving this conditional score and the unknown model coefficients. This formulation allows the drift and diffusion structure to be constrained directly from finite-lag statistics, without differentiating trajectories, partitioning state space, or repeatedly integrating candidate reduced models during calibration, yielding a least-squares fitting problem over stationary lagged pairs. We validate the approach on three systems of increasing complexity: an analytically tractable Cox--Ingersoll--Ross diffusion, a two-dimensional nonequilibrium diffusion with affine multiplicative noise, and a periodic soft-spin stochastic Landau--Lifshitz chain. Across these tests, the inferred models preserve the invariant statistics while reproducing finite-lag dynamical correlations. The framework provides a scalable route for learning stochastic reduced-order models from data that reproduce prescribed statistical and dynamical properties.

2601.16470 2026-06-16 stat.ME physics.data-an 版本更新

Variational Dimension Lifting for Robust Tracking of Nonlinear Stochastic Dynamics

变分维度提升用于非线性随机动力学的鲁棒跟踪

Yonatan L. Ashenafi

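AI summary: Proposes a variational dimension-lifting framework that converts a nonlinear state-space model into a higher-dimensional linear stochastic representation, so that efficient linear filtering techniques can track nonlinear stochastic dynamics. Accuracy and robustness are validated on three models.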

AI Chinese abstract

非线性随机运动对贝叶斯粒子跟踪提出了重大挑战。为了解决这一挑战,我们提出了一种提升框架,该框架构建了非线性状态空间模型的高维线性随机表示。所得的替代模型能够使用计算高效的线性滤波技术,同时保持与底层非线性动力学的直接联系。本文利用伊藤引理和变分法推导了此类变换的必要条件,并在双稳态三次运动模型、径向布朗运动模型和具有乘性噪声的逻辑模型上展示了该方法。模拟结果证实,变换后的线性系统在投影回原空间时,能够准确重建非线性动力学,并且在刚性和奇异性的不同区域中,跟踪精度与传统滤波器相当,同时避免了它们的结构不稳定性。

English abstract

Nonlinear stochastic motion presents significant challenges for Bayesian particle tracking. To address this challenge, we propose a lifting framework that constructs a higher-dimensional linear stochastic representation of nonlinear state-space models. The resulting surrogates enable the use of computationally efficient linear filtering techniques while retaining a direct connection to the underlying nonlinear dynamics. The paper derives the necessary conditions for such transformations using Itô's lemma and variational calculus, and illustrates the method on a bistable cubic motion model, a radial Brownian process model, and a logistic model with multiplicative noise. Simulations confirm that the transformed linear systems, when projected back, accurately reconstruct the nonlinear dynamics and, in distinct regimes of stiffness and singularity, yield tracking accuracy competitive with conventional filters, while avoiding their structural instabilities.

2603.21075 2026-06-16 stat.CO 版本更新

Neural Inference Functions for Margins for Time Series Copula Models

时间序列Copula模型的边际神经推断函数

Daniel Fynn, David Gunawan, Andrew Zammit-Mangion

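AI summary: Proposes N-IFM, a neural-network-based method for efficiently estimating the parameters of multivariate time series copula models, which substantially reduces computational cost while maintaining inferential accuracy.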

Comments 86 pages, 29 figures

AI Chinese abstract

Copula模型广泛应用于多元时间序列分析,因为它们允许独立于依赖结构(完全由Copula函数刻画)对边际分布进行灵活建模。然而,随着时间序列中变量数量的增加,这些模型的贝叶斯推断计算量变得很大。受经典边际推断函数(IFM)方法的启发,我们提出了一种新的基于神经网络的Copula模型参数估计推断框架,称为边际神经推断函数(N-IFM)。N-IFM能够对新数据进行快速参数估计、快速序列预测,并通过时间序列验证进行高效的模型比较。我们使用模拟和真实数据集评估N-IFM的性能,并将其与哈密顿蒙特卡洛方法进行比较,结果表明在推断精度相当的情况下,计算量大幅降低。

English abstract

Copula models are widely employed in multivariate time series analysis because they permit flexible modelling of marginal distributions independently of the dependence structure, which is fully characterised by the copula function. However, Bayesian inference with these models becomes computationally demanding as the number of variables in the time series increases. Motivated by the classical inference functions for margins (IFM) approach, we propose a new neural-network based inference framework for estimating parameters in copula models, termed the neural inference functions for margins (N-IFM). N-IFM enables rapid parameter estimation for new data, fast sequential prediction, and efficient model comparison via time-series validation. We assess the performance of N-IFM using both simulated and real datasets and compare it to Hamiltonian Monte Carlo, demonstrating substantial computational gains with comparable inferential accuracy.

2502.07396 2026-06-16 stat.CO cs.CE stat.ML 版本更新

Optimality in importance sampling: a gentle survey

重要性采样中的最优性:一个温和的综述

Fernando Llorente, Luca Martino

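AI summary: Surveys notions of optimality for proposal densities in importance sampling, covering marginal likelihood approximation, multiple proposal densities, sequences of tempered posteriors, and noisy scenarios (such as ABC and reinforcement learning), with theoretical and empirical comparisons.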

AI Chinese abstract

蒙特卡洛采样方法的性能依赖于提议密度的关键选择。最优性概念对于在蒙特卡洛方案中设计合适的提议密度自适应过程至关重要。本文是对重要性采样中最优性概念的详尽综述。描述并分析了多种框架,例如用于模型选择的边际似然近似、多提议密度的使用、一系列退火后验序列,以及包括近似贝叶斯计算(ABC)和强化学习在内的噪声场景。还提供了一些理论和实证比较。

English abstract

The performance of Monte Carlo sampling methods relies on the crucial choice of a proposal density. The notion of optimality is fundamental to designing suitable adaptive procedures for the proposal density within Monte Carlo schemes. This work is an exhaustive review of the concept of optimality in importance sampling. Several frameworks are described and analyzed, such as the marginal likelihood approximation for model selection, the use of multiple proposal densities, a sequence of tempered posteriors, and noisy scenarios, including applications to approximate Bayesian computation (ABC) and reinforcement learning, to name a few. Some theoretical and empirical comparisons are also provided.
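
A standard textbook case shows why the choice of proposal matters when approximating a normalizing constant (marginal likelihood). The sketch below is our own illustration, not taken from the survey. It assumes the unnormalized target $\tilde\pi(x) = e^{-x^2/2}$, whose normalizing constant is $Z = \sqrt{2\pi}$, and a zero-mean Gaussian proposal with standard deviation `scale`. When the proposal is proportional to the target (`scale = 1`), every importance weight equals $Z$ and the estimator has zero variance. The name `estimate_z` is hypothetical.

```python
import math
import random


def estimate_z(scale, n, seed=0):
    """Importance-sampling estimate of Z = integral of exp(-x^2 / 2)."""
    rng = random.Random(seed)
    weights = []
    for _ in range(n):
        x = rng.gauss(0.0, scale)
        log_target = -0.5 * x * x  # unnormalized target
        log_q = -0.5 * (x / scale) ** 2 - math.log(scale * math.sqrt(2.0 * math.pi))
        weights.append(math.exp(log_target - log_q))
    mean = sum(weights) / n
    var = sum((w - mean) ** 2 for w in weights) / (n - 1)
    return mean, var  # estimate of Z and sample variance of the weights
```

Calling `estimate_z(1.0, n)` and `estimate_z(3.0, n)` side by side makes the effect visible: both are unbiased for $Z$, but only the matched proposal has weights with (numerically) zero variance.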

2512.20566 2026-06-16 math.OC stat.ML 版本更新

Random Gradient-Free Optimization in Infinite Dimensional Spaces

无限维空间中的随机无梯度优化

Caio Peixoto, Daniel Csillag, Bernardo F. P. da Costa, Yuri F. Saporito

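AI summary: Proposes a gradient-free optimization method for infinite-dimensional Hilbert spaces that requires only directional derivatives. Using a pre-basis and random directional derivatives, it achieves provable convergence and is applied to solving partial differential equations in the style of physics-informed neural networks.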

Comments 23 pages, 4 figures

AI Chinese abstract

我们提出了一种新的无梯度方法,用于希尔伯特空间中的无限维优化,该方法仅需计算方向导数。尽管函数优化通常通过有限维梯度下降(例如神经网络)在参数化上求解,但我们转而利用优化问题的函数性质来获得可证明的保证。然而,无限维梯度在实践中往往难以计算,使得朴素的函数梯度下降难以处理。为克服这一限制,我们的框架仅利用方向导数和希尔伯特空间的预基(即一个线性无关集,其张成空间稠密)。这解决了可处理性问题,因为预基比完全正交基或再生核(甚至可能不存在)更容易获得,且单个方向导数可通过自动微分计算。我们展示了该方法在物理信息神经网络(PINNs)求解偏微分方程中的应用,有效实现了可证明的收敛。

English abstract

We propose a new gradient-free method for infinite-dimensional optimization in Hilbert spaces that requires only the computation of directional derivatives. Though functional optimization is often solved through finite-dimensional gradient descent over a parametrization, such as neural networks, we instead propose to leverage the functional nature of the optimization problem to enable provable guarantees. However, infinite-dimensional gradients are often hard to compute in practice, rendering naïve functional gradient descent intractable. To overcome this limitation, our framework leverages only directional derivatives and a pre-basis for the Hilbert space, i.e., a linearly independent set whose span is dense. This resolves the tractability issue, as pre-bases are much more accessible than full orthonormal bases or reproducing kernels -- which may not even exist -- and individual directional derivatives can be computed using automatic differentiation. We showcase the use of our method to solve partial differential equations à la physics-informed neural networks (PINNs), where it effectively enables provable convergence.
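
The idea of descending with directional derivatives alone can be sketched in finite dimensions. This is a simplified analogue under our own assumptions, not the paper's Hilbert-space algorithm: directions are standard Gaussian vectors in $\mathbb{R}^d$ instead of elements drawn from a pre-basis, and the directional derivative is taken by central finite differences instead of automatic differentiation. For $u \sim \mathcal{N}(0, I)$, the update direction $D_u f(x)\, u$ equals the gradient in expectation. All function names are hypothetical.

```python
import random


def quadratic_loss(x, b):
    # Toy objective (assumption): f(x) = 0.5 * ||x - b||^2.
    return 0.5 * sum((xi - bi) ** 2 for xi, bi in zip(x, b))


def directional_derivative(f, x, u, h=1e-5):
    # Central finite difference of f at x along direction u.
    x_plus = [xi + h * ui for xi, ui in zip(x, u)]
    x_minus = [xi - h * ui for xi, ui in zip(x, u)]
    return (f(x_plus) - f(x_minus)) / (2.0 * h)


def random_direction_descent(f, x0, n_steps=300, lr=0.1, seed=0):
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(n_steps):
        u = [rng.gauss(0.0, 1.0) for _ in x]  # random direction
        d = directional_derivative(f, x, u)   # a single scalar per step
        x = [xi - lr * d * ui for xi, ui in zip(x, u)]
    return x
```

Each step needs one scalar directional derivative rather than a full gradient, which is the property that makes the approach attractive when gradients are hard to form.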

2506.08328 2026-06-16 stat.ME 版本更新

Diffusion Non-Additive Model for Multi-Fidelity Simulations with Tunable Precision

扩散非加性模型:用于可调精度多保真度模拟

Junoh Heo, Romain Boutelet, Wenjia Wang, Chih-Li Sung

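AI summary: Proposes the Diffusion Non-Additive (DNA) model, which uses Gaussian processes to capture nonlinear dependencies and extrapolate to the exact solution, improving accuracy and providing uncertainty quantification for multi-fidelity simulations.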

Comments 35 pages including references and 27 pages supplementary

AI Chinese abstract

计算机模拟对于分析复杂系统不可或缺,然而高保真度模型通常带来高昂的计算成本。多保真度框架通过结合低成本低保真度模拟与昂贵的高保真度模拟来解决这一挑战,以提高准确性和效率。然而,某些科学问题要求比现有最高保真度模拟更精确的结果,特别是当存在控制模拟精度的调优参数,但对应零值的精确解仍无法获得时。本文受生成扩散模型启发,引入扩散非加性(DNA)模型,该模型利用高斯过程先验捕获不同保真度水平之间的非线性依赖,并外推至精确解。DNA模型:(i) 适应不同保真度水平之间复杂的非加性关系;(ii) 采用不可分离协方差核来建模调优参数与输入变量之间的交互,提升预测性能;(iii) 提供后验预测均值和方差的闭式表达式,实现高效推理和不确定性量化;(iv) 建立预测误差的严格理论界限,从而得到最优实验设计策略。该方法在一系列数值研究和实际案例研究中得到验证。提供了实现所提方法的R包以支持实际应用。

English abstract

Computer simulations are indispensable for analyzing complex systems, yet high-fidelity models often incur prohibitive computational costs. Multi-fidelity frameworks address this challenge by combining inexpensive low-fidelity simulations with costly high-fidelity simulations to improve both accuracy and efficiency. However, certain scientific problems demand even more accurate results than the highest-fidelity simulations available, particularly when a tuning parameter controlling simulation accuracy is available, but the exact solution corresponding to a zero-valued parameter remains out of reach. In this paper, we introduce the Diffusion Non-Additive (DNA) model, inspired by generative diffusion models, which captures nonlinear dependencies across fidelity levels using Gaussian process priors and extrapolates to the exact solution. The DNA model: (i) accommodates complex, non-additive relationships across fidelity levels; (ii) employs a nonseparable covariance kernel to model interactions between the tuning parameter and input variables, improving predictive performance; (iii) provides closed-form expressions for the posterior predictive mean and variance, allowing efficient inference and uncertainty quantification; and (iv) establishes rigorous theoretical bounds on the prediction error, leading to an optimal experimental design strategy. The methodology is validated on a suite of numerical studies and real-world case studies. An R package implementing the proposed methodology is available to support practical applications.

2509.03945 2026-06-16 stat.CO cs.DC cs.NA math.NA stat.ML 版本更新

Prob-GParareal: A Probabilistic Numerical Parallel-in-Time Solver for Differential Equations

Prob-GParareal:一种用于微分方程的概率数值并行时间求解器

Guglielmo Gattiglio, Lyudmila Grigoryeva, Massimiliano Tamborrino

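AI summary: Proposes Prob-GParareal, which models the Parareal correction function with Gaussian processes to provide uncertainty quantification for the parallel-in-time solution of differential equations. Accuracy and robustness are verified on five benchmark ODE systems.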

AI Chinese abstract

我们介绍了Prob-GParareal,这是GParareal算法的概率扩展,旨在为(常微分和偏微分)方程(ODE、PDE)的并行时间(PinT)求解提供不确定性量化。该方法采用高斯过程(GP)对Parareal校正函数进行建模,与GParareal一致,进一步实现了数值不确定性在时间上的传播,并产生系统演化的概率预测。此外,Prob-GParareal支持概率初始条件,并保持与经典数值求解器的兼容性,确保其易于集成到现有的Parareal框架中。在此,我们首先对Prob-GParareal的计算复杂度进行理论分析,并推导误差界。然后,我们在五个基准ODE系统(包括混沌、刚性和分岔问题)上数值展示了所提算法的准确性和鲁棒性。为了展示所提算法的灵活性和潜在可扩展性,我们还考虑了Prob-nnGParareal,这是通过将Parareal中的GP替换为最近邻GP得到的变体,并在一个额外的PDE示例上展示了其性能提升。这项工作弥合了现有PinT方法概率对应物发展中的一个关键空白。

English abstract

We introduce Prob-GParareal, a probabilistic extension of the GParareal algorithm designed to provide uncertainty quantification for the Parallel-in-Time (PinT) solution of (ordinary and partial) differential equations (ODEs, PDEs). The method employs Gaussian processes (GPs) to model the Parareal correction function, in line with GParareal, further enabling the propagation of numerical uncertainty across time and yielding probabilistic forecasts of the system's evolution. Furthermore, Prob-GParareal accommodates probabilistic initial conditions and maintains compatibility with classical numerical solvers, ensuring its straightforward integration into existing Parareal frameworks. Here, we first conduct a theoretical analysis of the computational complexity of Prob-GParareal and derive error bounds for it. Then, we numerically demonstrate the accuracy and robustness of the proposed algorithm on five benchmark ODE systems, including chaotic, stiff, and bifurcation problems. To showcase the flexibility and potential scalability of the proposed algorithm, we also consider Prob-nnGParareal, a variant obtained by replacing the GPs in Parareal with nearest-neighbor GPs, illustrating its increased performance on an additional PDE example. This work bridges a critical gap in the development of probabilistic counterparts to established PinT methods.
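
For readers unfamiliar with the underlying scheme, the classical Parareal iteration that GParareal and Prob-GParareal build on can be sketched as follows. This is the deterministic textbook update $U_{n+1}^{k+1} = G(U_n^{k+1}) + F(U_n^k) - G(U_n^k)$, not the GP-based or probabilistic variant from the paper. The test problem $y' = -y$, the explicit Euler propagators, and all function names are assumptions made for illustration.

```python
def euler(y, dt, n_sub):
    """Integrate dy/dt = -y over one time slice with n_sub explicit Euler steps."""
    h = dt / n_sub
    for _ in range(n_sub):
        y = y + h * (-y)
    return y


def serial_fine(y0, t_end, n_slices, fine_sub=100):
    """Reference: apply the fine propagator slice after slice."""
    dt = t_end / n_slices
    u = [y0]
    for n in range(n_slices):
        u.append(euler(u[n], dt, fine_sub))
    return u


def parareal(y0, t_end, n_slices, n_iter, coarse_sub=1, fine_sub=100):
    dt = t_end / n_slices
    coarse = lambda y: euler(y, dt, coarse_sub)  # cheap propagator G
    fine = lambda y: euler(y, dt, fine_sub)      # expensive propagator F
    u = [y0]
    for n in range(n_slices):                    # iteration 0: coarse sweep
        u.append(coarse(u[n]))
    for _ in range(n_iter):
        # These fine solves are independent and could run in parallel.
        fine_old = [fine(u[n]) for n in range(n_slices)]
        coarse_old = [coarse(u[n]) for n in range(n_slices)]
        u_new = [y0]
        for n in range(n_slices):                # sequential coarse correction
            u_new.append(coarse(u_new[n]) + fine_old[n] - coarse_old[n])
        u = u_new
    return u
```

After as many iterations as there are time slices, the Parareal solution coincides with the serial fine solution up to rounding; the practical interest lies in stopping much earlier.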

7. Statistical Foundations of Machine Learning (35 papers)

2606.17048 2026-06-16 cs.LG cs.CV stat.ML 新提交

Exact Posterior Score Estimation for Solving Linear Inverse Problems

精确后验分数估计用于求解线性逆问题

Abbas Mammadov, Ozgur Kara, Kaan Oktay, Iskander Azangulov, Adil Kaan Akan, Hyungjin Chung, James Matthew Rehg, Yee Whye Teh

Affiliations: University of Oxford; UIUC (University of Illinois Urbana-Champaign); EverEx

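AI summary: Proposes the Exact Posterior Score (EPS) method, which uses a closed-form posterior score to turn linear inverse problems into denoising problems, requires no likelihood gradients or projections, and outperforms existing methods on FFHQ and ImageNet.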

AI Chinese abstract

扩散和基于流的模型通过训练去噪器来逆转高斯损坏,从而学习强大的数据先验。为了利用这一先验解决线性逆问题,需要从后验中采样,但先验提供的分数是无条件分数,而非后验分数。现有方法要么使用近似测量匹配校正来引导固定的预训练去噪器,要么训练一个放弃先验去噪结构的条件恢复模型。我们在一般高斯插值下推导了线性高斯逆问题的精确后验分数闭式,并表明后验采样可归结为在算子依赖的偏移枢轴和各向异性噪声协方差下的去噪问题。我们将这一恒等式转化为精确后验分数(EPS),这是一种去噪训练目标,保留了标准预训练的输入/输出结构,因此可以从头训练或从预训练去噪器微调。在推理时,EPS使用与底层骨干相同的采样器,无需似然梯度或投影。我们在FFHQ和ImageNet上的五个线性逆问题上评估了EPS,在保真度、感知和分布指标上优于无训练和基于训练的基线,同时使用的去噪器评估次数比基于梯度的后验采样器少大约一个数量级。

English abstract

Diffusion and flow-based models learn powerful data priors by training a denoiser to reverse Gaussian corruption. To use this prior to solve a linear inverse problem, one needs to sample from the posterior, but the score that the prior provides is the unconditional score, not the posterior score. Existing methods either steer a fixed pretrained denoiser with approximate measurement-matching corrections, or train a conditional restoration model that abandons the denoising structure of the prior. We derive the exact posterior score in closed form for linear Gaussian inverse problems under general Gaussian interpolants, and show that posterior sampling reduces to a denoising problem at an operator-dependent shifted pivot under an anisotropic noise covariance. We turn this identity into Exact Posterior Score (EPS), a denoising training objective that preserves the input/output structure of standard pretraining and can therefore be trained from scratch or fine-tuned from a pretrained denoiser. At inference, EPS uses the same sampler as the underlying backbone, with no likelihood gradients or projections. We evaluate EPS on five linear inverse problems across FFHQ and ImageNet, where it outperforms training-free and training-based baselines on fidelity, perceptual, and distributional metrics, while using roughly an order of magnitude fewer denoiser evaluations than gradient-based posterior samplers.
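
The gap between the unconditional score and the posterior score is easiest to see in a case where everything is available in closed form. The sketch below is a scalar toy example of our own, not the paper's EPS construction: it assumes a Gaussian prior $x \sim \mathcal{N}(0, s^2)$ and a measurement $y = a x + \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, \sigma^2)$. By Bayes' rule the posterior score is the prior score plus the likelihood score, and here it matches the score of the known Gaussian posterior. All function names are hypothetical.

```python
def prior_score(x, s2):
    # d/dx log N(x; 0, s2): the unconditional score.
    return -x / s2


def likelihood_score(x, y, a, sigma2):
    # d/dx log N(y; a * x, sigma2): the measurement-matching term.
    return a * (y - a * x) / sigma2


def posterior_score_closed_form(x, y, a, s2, sigma2):
    # The posterior is Gaussian with variance v and mean m.
    v = 1.0 / (1.0 / s2 + a * a / sigma2)
    m = v * a * y / sigma2
    return -(x - m) / v
```

For a learned, non-Gaussian prior the likelihood term at a noisy diffusion state is not available this simply, which is the difficulty the abstract describes.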

2606.16975 2026-06-16 stat.ML cs.LG 新提交

Sobolev Approximation by Fixed-Size Neural Networks with Arbitrary Accuracy

固定大小神经网络实现任意精度的Sobolev逼近

Baicheng Li, Haizhao Yang, Shijun Zhang

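AI summary: Proposes new activation functions ($\mathrm{EUAF}$, $\mathrm{DUAF}_{\infty}$, and others) with which fixed-size neural networks can approximate functions in Sobolev spaces to arbitrary accuracy, and gives explicit width and depth bounds.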

AI Chinese abstract

本文研究用于固定大小神经网络实现任意精度Sobolev逼近的新型激活函数。我们首先证明,任何$W^{2,\infty}((a,b)^d)$中的函数都可以通过使用基本通用激活函数($\mathrm{EUAF}$)的固定大小神经网络,以$W^{1,\infty}$范数度量达到任意精度。为了将此结果推广到$s\in\mathbb{N}$时的$W^{s,\infty}((a,b)^d)$,我们引入了来自可微通用激活函数族($\mathrm{DUAF}_n$)的光滑激活函数$\mathrm{DUAF}_{\infty}$。我们证明,任何$W^{s,\infty}((a,b)^d)$中的函数都可以通过固定大小的$\mathrm{DUAF}_{\infty}$激活网络,以$W^{s-1,\infty}$范数度量达到任意精度。我们进一步构造了Sigmoid变体$\widetilde{\mathrm{DUAF}}_n$,并证明对于每个$1\leq s\leq n$,固定大小的$\widetilde{\mathrm{DUAF}}_n$激活网络仍能以$W^{s-1,\infty}$范数度量任意逼近任何$f\in W^{s,\infty}((a,b)^d)$。在所有结果中,宽度和深度界均被显式计算,且所提出的激活函数是初等的。

English abstract

In this work, we investigate new activation functions for achieving arbitrary-accuracy Sobolev approximation by fixed-size neural networks. We first show that any function in $W^{2,\infty}((a,b)^d)$ can be approximated with arbitrary accuracy, measured in the $W^{1,\infty}$-norm, by a fixed-size neural network using the Elementary Universal Activation Function ($\mathrm{EUAF}$). To extend this result to $W^{s,\infty}((a,b)^d)$ for $s\in\mathbb{N}$, we introduce a smooth activation $\mathrm{DUAF}_{\infty}$ from the family of Differentiable Universal Activation Functions ($\mathrm{DUAF}_n$). We prove that any function in $W^{s,\infty}((a,b)^d)$ can be approximated with arbitrary accuracy in the $W^{s-1,\infty}$-norm by a fixed-size $\mathrm{DUAF}_{\infty}$-activated network. We further construct sigmoidal variants $\widetilde{\mathrm{DUAF}}_n$ and show that, for every $1\leq s\leq n$, fixed-size $\widetilde{\mathrm{DUAF}}_n$-activated networks still approximate any $f\in W^{s,\infty}((a,b)^d)$ with arbitrary accuracy in the $W^{s-1,\infty}$-norm. In all these results, the width and depth bounds are computed explicitly, and the proposed activations are elementary.

2606.16926 2026-06-16 math.OC cs.LG stat.ML 新提交

Functional Gradient Descent with Adaptive Representations

自适应表示的函数梯度下降

Daniel Csillag, Rodrigo Schuller, Pedro Dall'Antonia, Leonidas Guibas, Luiz Velho, Tiago Novello

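AI summary: Proposes a functional gradient descent algorithm with adaptive representations. By incorporating the approximation error into the analysis, it converges to a stationary point for smooth losses and to a global minimizer under a PL-type condition, and it outperforms fixed-approximation FGD and neural network baselines on regression, PDE solving, and computer vision.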

AI Chinese abstract

函数优化问题通常通过优化固定表示(如神经网络)的参数来解决,这导致高度非凸的损失,使训练和理论分析复杂化。一个有趣的替代方案是函数梯度下降(FGD),即直接在函数空间中进行梯度下降,它受益于强收敛结果并具有简洁的理论。然而,FGD在实践中难以实现,因为函数梯度是无限维的,因此无法完全计算或存储在内存中。现有的实现因此依赖于固定近似,这引入了近似误差。我们提出了一种新的、有理论基础的FGD算法,该算法在优化过程中自适应地调整函数梯度的表示。通过将这种近似明确地纳入分析,我们证明了无论近似如何,算法都能收敛到驻点(对于平滑损失)和全局最小值(在平滑性和Polyak-Lojasiewicz型条件下)。据我们所知,这是第一个在一般设置下具有此类保证的可实现FGD方法。我们在回归、偏微分方程的数值求解和现代计算机视觉中展示了我们方法的有效性。在各种设置中,我们的方法在效率和准确性上始终优于固定近似的FGD和神经网络基线。

English abstract

Functional optimization problems are typically solved by optimizing the parameters of a fixed representation, such as a neural network, resulting in highly nonconvex losses that complicate both training and theoretical analysis. An interesting alternative is functional gradient descent (FGD), that is, gradient descent directly in function space, which benefits from strong convergence results and admits a clean theory. However, FGD is difficult to implement in practice because functional gradients are infinite-dimensional and thus can be neither fully computed nor stored in memory. Existing implementations therefore rely on fixed approximations, which introduce approximation error. We propose a new, theoretically grounded FGD algorithm that adapts the representation of the functional gradients over the course of optimization. By explicitly incorporating this approximation into the analysis, we establish convergence to a stationary point (for smooth losses) and to a global minimizer (under smoothness and a Polyak-Lojasiewicz-type condition) regardless of our approximations. To the best of our knowledge, this is the first implementable FGD method with such guarantees in a general setting. We demonstrate the effectiveness of our method on regression, numerical solution of PDEs, and modern computer vision. Across settings, our method consistently outperforms both FGD with fixed approximations and neural network baselines in efficiency and accuracy.

2606.16730 2026-06-16 stat.ML cs.AI cs.LG 新提交

Attention is Just Another Name for Coupling?: A Fast-Slow ODE Perspective on Hierarchical Pretraining

注意力只是耦合的另一个名字?:关于层级预训练的快速-慢速ODE视角

Zhengyuan Gao

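AI summary: Proposes a fast-slow ODE perspective that treats causal self-attention as a coupling mechanism, and introduces a slow subsystem that feeds back into the fast path through a zero-initialised gate. Theoretical and empirical analysis relates it to the stationary distribution of the master equation.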

AI Chinese abstract

因果自注意力是一种耦合机制:每个token的隐藏状态通过同一时间尺度上前置token的学习混合来更新。本文提出一个疑问:是否存在第二个时间上更慢的耦合——一个在序列的时间下采样视图上运行并通过零初始化门控反馈到快路径的慢子系统——来补充它?该问题以奇异摄动常微分方程(ODE)的语言提出,其中快变量$x$以token速率演化,慢变量$y$每$P$个token更新一次,时间尺度比$\varepsilon = 1/P$通过因果块均值池化在结构上强制执行。本文将快慢ODE形式具体化为一个神经网络:一个在$T$个token上的标准因果注意力快路径,一个在$T/P$个池化token上的全注意力慢路径(每层便宜$P^2$倍),以及一个零初始化的加法门控。此外,在快动力学的线性生成器假设下,我们证明了平衡流形$x = \phi(y)$恰好是主方程(ME)的平稳分布$p_{\mathrm{st}}(y)$;在该机制下,学习的MLP $\phi_\theta(y)$是其变分近似(训练块不是生成器,因此该恒等式是结构极限,而非对训练网络的断言)。实验上,在50万token时,耦合是中性的——门控保持关闭,耦合和冻结消融在运行间噪声范围内——其墙钟成本与密集基线相当。贡献在于精确的、带有间隙标记的映射本身,而非性能提升。

English abstract

Causal self-attention is a coupling mechanism: each token's hidden state is updated by a learned mixture of preceding tokens at the same timescale. This paper asks whether a second, temporally slower coupling (a slow sub-system operating on a temporally-downsampled view of the sequence and fed back into the fast path through a zero-initialised gate) complements it. The question is framed in the language of singularly perturbed ordinary differential equations (ODEs), where the fast variable $x$ evolves at the token rate, the slow variable $y$ evolves at one update per $P$ tokens, and the timescale ratio $\varepsilon = 1/P$ is enforced structurally by causal block-mean pooling. The paper instantiates the fast-slow ODE formalism as a concrete neural network: a fast path of standard causal attention over $T$ tokens, a slow path of full attention over $T/P$ pooled tokens ($P^2 \times$ cheaper per layer), and a zero-initialised additive gate. In addition, under a linear-generator assumption on the fast dynamics, we prove that the equilibrium manifold $x = \phi(y)$ is exactly the master-equation (ME) stationary distribution $p_{\mathrm{st}}(y)$; in that regime a learned MLP $\phi_\theta(y)$ is a variational approximation of it (the trained block is not a generator, so this identity is the structured limit, not a claim about the network as trained). Empirically, at $500$k tokens the coupling is neutral -- the gate stays closed and the coupled and frozen ablations are within run-to-run noise -- at a wall-clock cost comparable to a dense baseline. The contribution is the precise, gap-marked mapping itself, not a performance gain.
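
The block-mean pooling and the $P^2$ cost ratio can be made concrete with a small sketch. It rests on assumptions of our own rather than on the paper's code: tokens are scalars instead of vectors, a pooled block is treated as visible at token index `t` only once all of its `P` tokens have been seen, and attention cost is counted as the number of query-key pairs, ignoring the constant factor of the causal mask. All function names are hypothetical.

```python
def block_mean_pool(tokens, P):
    """Average non-overlapping blocks of P tokens (slow-path input)."""
    if len(tokens) % P != 0:
        raise ValueError("sequence length must be a multiple of P")
    return [sum(tokens[j * P:(j + 1) * P]) / P for j in range(len(tokens) // P)]


def visible_blocks(t, P):
    """Number of pooled blocks fully completed at 0-based token index t."""
    return (t + 1) // P


def attention_pairs(n):
    """Query-key pairs scored by attention over n positions."""
    return n * n
```

With `T = 16` and `P = 4`, the slow path scores `attention_pairs(4) = 16` pairs against `attention_pairs(16) = 256` on the fast path, a factor of $P^2 = 16$.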

2606.16610 2026-06-16 stat.ML cs.LG 新提交

Diffusion Flow Matching: Dimension-Improved KL Bounds and Wasserstein Guarantees

扩散流匹配:维度改进的KL界和Wasserstein保证

Marta Gentiloni Silveri, Giovanni Conforti, Alain Durmus

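AI summary: For Brownian-motion-based diffusion flow matching, derives improved convergence bounds on the discretization error in KL divergence and 2-Wasserstein distance, with state-of-the-art scaling in the dimension.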

AI Chinese abstract

扩散流匹配(DFM)最近已成为生成建模的多功能框架,但其理论收敛性质仍仅被部分理解。在这项工作中,我们为基于布朗运动的DFM提供了精炼且新颖的收敛保证,重点关注离散化误差。我们的分析是在Kullback-Leibler(KL)散度和2-Wasserstein距离下进行的。在有限矩条件和温和的分数可积性假设下,我们推导了KL收敛界,与先前工作相比具有改进的维度依赖性,据我们所知,在最小条件下实现了最先进的缩放。我们进一步将分析扩展到2-Wasserstein距离:在额外的一阶分数可积性假设和弱对数凹性条件下,我们获得了与KL情况一致的维度依赖性的收敛保证。

English abstract

Diffusion Flow Matching (DFM) has recently emerged as a versatile framework for generative modeling, yet its theoretical convergence properties remain only partially understood. In this work, we provide refined and novel convergence guarantees for Brownian-motion-based DFMs, focusing on the discretization error. Our analysis is conducted under the Kullback-Leibler (KL) divergence and the 2-Wasserstein distance. Under finite-moment conditions and a mild score integrability assumption, we derive KL convergence bounds with improved dimensional dependence compared to prior work, achieving, to our knowledge, state-of-the-art scaling under minimal conditions. We further extend the analysis to the 2-Wasserstein distance: under an additional first-order score integrability assumption and a weak log-concavity condition, we obtain convergence guarantees with dimensional dependence consistent with the KL case.

2606.16301 2026-06-16 cs.LG stat.ML 新提交

One-Step Generalization Ratio Guided Optimization for Domain Generalization

一步泛化比率引导的域泛化优化

Sumin Cho, Dongwon Kim, Kwangsu Kim

Affiliation: Korea Advanced Institute of Science and Technology (KAIST)

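AI summary: Proposes the GENIE optimizer, which uses the One-Step Generalization Ratio (OSGR) to dynamically equalize parameter updates, suppressing spurious correlations and promoting domain-invariant feature learning. It outperforms existing optimizers on domain generalization tasks.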

Comments 29 pages, accepted at the 42nd International Conference on Machine Learning (ICML 2025)

AI Chinese abstract

域泛化(DG)旨在训练模型泛化到未见过的目标域,但常常过拟合到域特定特征,即所谓的非期望相关性。基于梯度的DG方法通常引导梯度朝向主导方向,但往往无意中强化了虚假相关性。最近的工作采用dropout来正则化过度自信的参数,但未明确调整梯度对齐或确保平衡的参数更新。我们提出GENIE(泛化增强迭代均衡器),一种新颖的优化器,利用一步泛化比率(OSGR)量化每个参数对损失减少的贡献并评估梯度对齐。通过预条件因子动态均衡OSGR,GENIE防止少量参数主导优化,从而促进域不变特征学习。理论上,GENIE平衡参数间的收敛贡献和梯度对齐,在保持SGD收敛速度的同时实现更高的OSGR。实验上,它优于现有优化器,并在与各种DG和单DG方法集成时提升性能。

English abstract

Domain Generalization (DG) aims to train models that generalize to unseen target domains but often overfit to domain-specific features, known as undesired correlations. Gradient-based DG methods typically guide gradients in a dominant direction but often inadvertently reinforce spurious correlations. Recent work has employed dropout to regularize overconfident parameters, but has not explicitly adjusted gradient alignment or ensured balanced parameter updates. We propose GENIE (Generalization-ENhancing Iterative Equalizer), a novel optimizer that leverages the One-Step Generalization Ratio (OSGR) to quantify each parameter's contribution to loss reduction and assess gradient alignment. By dynamically equalizing OSGR via a preconditioning factor, GENIE prevents a small subset of parameters from dominating optimization, thereby promoting domain-invariant feature learning. Theoretically, GENIE balances convergence contribution and gradient alignment among parameters, achieving higher OSGR while retaining SGD's convergence rate. Empirically, it outperforms existing optimizers and enhances performance when integrated with various DG and single-DG methods.

2606.16273 2026-06-16 stat.ML cs.LG stat.ME 新提交

Generative Modeling on Metric Graphs via Neural Optimal Transport

基于神经最优传输的度量图生成建模

Alessandro Micheli, Yueqi Cao, Anthea Monod, Samir Bhatt

Affiliations: Imperial College London; KTH Royal Institute of Technology; Statens Serum Institut; University of Copenhagen

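AI summary: Proposes what is, to the authors' knowledge, the first deep generative modeling framework for continuous distributions on metric graphs: embed the graph, solve an entropic Kantorovich problem with a neural semidual parameterization, and project back onto the original graph. Convergence is proved theoretically, and experiments match or improve upon discrete graph OT baselines.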

AI Chinese abstract

我们提出了,据我们所知,首个用于紧度量图上连续支撑概率分布的深度生成建模框架。给定度量图上的源测度和目标测度,我们的方法将图嵌入到光滑环境空间,通过神经半对偶参数化求解熵Kantorovich问题,并将生成的样本投影回原始图。我们研究了两种嵌入几何:外在欧几里得实现和内在热带Abel--Jacobi嵌入到Jacobian环面。在这两种情况下,生成的生成器通过构造支持在图上。我们证明,在增加神经表达能力的联合极限下,学习到的生成器弱收敛到原始图测度之间的有效传输耦合。实验上,在一系列几何不同的图上,我们的方法匹配或改进了基于离散图OT的启发式传输基线,同时具有更好的可扩展性。最后,我们通过在纽约市曼哈顿的一百万Uber上车点数据上训练模型,展示了在真实世界城市移动数据上的可扩展性。

English abstract

We introduce, to our knowledge, the first deep generative modeling framework for probability distributions continuously supported on compact metric graphs. Given source and target measures on a metric graph, our method embeds the graph into a smooth ambient space, solves an entropic Kantorovich problem via a neural semidual parameterization, and projects generated samples back onto the original graph. We study two embedded geometries: an extrinsic Euclidean realization and the intrinsic tropical Abel--Jacobi embedding into the Jacobian torus. In both cases, the resulting generator is graph-supported by construction. We prove that, in the joint limit of increasing neural expressivity, the learned generator converges weakly to a valid transport coupling between the original graph measures. Empirically, across a range of geometrically distinct graphs, our method matches or improves upon heuristic transport baselines based on discrete graph OT, while scaling more favorably. Finally, we demonstrate scalability on real-world urban mobility data by training our model on one million Uber pickup locations in Manhattan, New York City.

2606.15897 2026-06-16 cs.LG cs.AI stat.ML New submission

Topological Flow Matching

Kacper Wyrwal, İsmail İlkan Ceylan, Alexander Tong

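AI summary Proposes topological flow matching, which augments the reference process with a Laplacian-derived drift to capture the topology of the underlying domain while keeping flow matching's stable, simulation-free objective; applicable to structured data such as brain fMRI and ocean currents.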

Comments Accepted at ICLR 2026. 26 pages, 24 figures. Code: https://github.com/KacperWyrwal/topological-flow-matching

English abstract

Flow matching is a powerful generative modeling framework, valued for its simplicity and strong empirical performance. However, its standard formulation treats signals on structured spaces, such as fMRI data on brain graphs, as points in Euclidean space, overlooking the rich topological features of their domains. To address this, we introduce topological flow matching, a topology-aware generalization of flow matching. We interpret flow matching as a framework for solving a degenerate Schrödinger bridge problem and inject topological information by augmenting the reference process with a Laplacian-derived drift. This principled modification captures the structure of the underlying domain while preserving the desirable properties of flow matching: a stable, simulation-free objective and deterministic sample paths. As a result, our framework serves as a drop-in replacement for standard flow matching. We demonstrate its effectiveness on diverse structured datasets, including brain fMRIs, ocean currents, seismic events, and traffic flows.

2606.15665 2026-06-16 stat.ML cs.LG math.ST stat.TH New submission

Information Gap and Feasibility-Aware Inference in Binomial Logistic Mixtures

Yuta Hayashida, Shonosuke Sugasawa

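AI summary Studies the information gap between mixture detection and label recovery in binomial logistic mixtures and proposes feasibility-aware inference based on a posterior-entropy penalty, which avoids misleading component selection and improves the calibration of posterior label probabilities.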

Comments 33 pages (main) + 30 pages (supplement)

English abstract

This paper studies the information gap between mixture detection and label recovery in binomial logistic mixtures. Standard likelihood-based criteria such as the Bayesian information criterion (BIC) can detect the presence of two components, but this does not guarantee that the corresponding labels are recoverable. We show that this gap is intrinsic to binomial logistic mixtures with a fixed number of trials: observed-data evidence for mixture structure and per-observation information for label recovery have different local orders in the component separation, and only the former accumulates with the sample size. As a result, there exists a detectable-but-unrecoverable regime in which BIC selects two components while the posterior labels remain essentially uninformative. To address this issue, we propose two feasibility-aware inference procedures: a recoverability-aware BIC with a posterior-entropy penalty and an entropy-regularized estimator that mitigates the tendency of the maximum likelihood estimator to produce overly separated components and overly concentrated posterior responsibilities. Numerical experiments confirm the predicted gap and demonstrate that the proposed methods avoid misleading component selections and improve the calibration of posterior label probabilities.
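The detectable-but-unrecoverable regime can be illustrated numerically. The sketch below uses an intercept-only two-component binomial logistic mixture, which is a simplifying assumption made here for brevity; the paper's model and its entropy penalty may differ. The names `responsibility` and `label_entropy` are hypothetical. The code computes the posterior probability that an observation belongs to component 1 and the entropy of that label distribution; an entropy close to log 2 means the label is essentially uninformative.

```python
import math

def binom_pmf(y, m, p):
    """Binomial probability of y successes in m trials with success rate p."""
    return math.comb(m, y) * p ** y * (1.0 - p) ** (m - y)

def sigmoid(z):
    """Logistic link mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def responsibility(y, m, eta1, eta2, pi1=0.5):
    """Posterior probability of component 1 for an observation y out of m trials.

    eta1 and eta2 are the (intercept-only) linear predictors of the two
    components; pi1 is the mixing weight of component 1.
    """
    a = pi1 * binom_pmf(y, m, sigmoid(eta1))
    b = (1.0 - pi1) * binom_pmf(y, m, sigmoid(eta2))
    return a / (a + b)

def label_entropy(r):
    """Entropy (in nats) of a two-class label with P(component 1) = r."""
    if r <= 0.0 or r >= 1.0:
        return 0.0
    return -(r * math.log(r) + (1.0 - r) * math.log(1.0 - r))

# Weakly separated components: the label posterior stays close to uniform.
weak = label_entropy(responsibility(2, 5, -0.1, 0.1))
# Well separated components: the label posterior is nearly degenerate.
strong = label_entropy(responsibility(0, 5, -3.0, 3.0))
```

With a fixed number of trials per observation, `weak` stays near its maximum no matter how many observations are collected, which is the per-observation side of the gap described in the abstract.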

2606.15569 2026-06-16 cs.LG math.ST stat.ML stat.TH New submission

A Decision-Theoretic View of Test-Time Training: When, How Far, and Which Directions to Adapt

Tomoya Wakayama

Affiliations * N/A

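AI summary Takes a decision-theoretic view of test-time training as implicit Bayesian inference in the kernel regime, shows how update steps and subspace choice affect performance, and derives adaptive strategies, a PAC-Bayes guarantee, and a Bayes-optimal subspace selection rule.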

English abstract

Test-time training (TTT) adapts a pretrained model to each prompt via parameter updates, improving accuracy under pretraining-to-test distribution shifts. Yet, its performance often suffers from instability and sensitivity to hyperparameters such as update steps and subspace. We explain this behavior through a decision-theoretic lens, treating TTT as implicit Bayesian inference in the kernel regime. Under a Gaussian process benchmark, we show that TTT reduces prediction error when updates are spectrally matched to the prompt's signal-to-noise ratio and aligned with query-relevant eigen-directions. This perspective underpins the following results: (1) we show when fixed update steps and subspaces fail under distribution shifts, motivating adaptive strategies; (2) we prove that selecting update steps via prompt evidence admits a PAC-Bayes guarantee against overfitting; and (3) we characterize the Bayes-optimal update subspace under a linear-Gaussian correction model, yielding a scoring rule for selecting Transformer blocks and heads. Our theory helps explain the empirical instability of TTT, taking a step toward principled guidance for when, how far, and which directions to adapt.

2606.15555 2026-06-16 math.OC cs.AI cs.LG stat.ML New submission

Service-Induced Congestion in Memory-Constrained LLM Serving

Ruicheng Ao, Jing Dong, Gan Luo, David Simchi-Levi

Affiliations * Institute for Data, Systems, and Society, Massachusetts Institute of Technology; Columbia Business School, Columbia University; School of Mathematical Sciences, Peking University

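AI summary Uses a discrete-time dynamical model to study service-induced congestion caused by key-value cache growth in memory-constrained LLM serving. For homogeneous workloads the eviction-free equilibrium is unstable and the system converges to a worst-case limit cycle; for heterogeneous workloads stability depends on whether the decoding lengths are coprime. Scheduling design principles are derived.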

Comments 101 pages

English abstract

In large language model (LLM) serving, each request accumulates persistent graphics processing unit (GPU) memory during service as its key-value cache grows with every generated token. Under high concurrency, aggregate memory usage therefore increases endogenously over time: the service process itself creates future capacity pressure. When memory capacity is exceeded, systems evict active requests, discarding cached state and restarting them later, which wastes computation and reduces throughput. We develop a discrete-time dynamical model of memory-constrained LLM inference that captures admission, memory growth, and eviction under continuous batching. In the saturated-input regime, the system admits both eviction-free fixed points and limit cycles with evictions. For homogeneous workloads, we show that the eviction-free equilibrium is unstable and that, except for a Lebesgue-measure-zero exact-capture set, the system converges to a unique worst-case limit cycle that is asymptotically stable outside this exceptional set, with throughput losses as large as 50%. For heterogeneous workloads, we prove a stability criterion in the two-class common-input setting and explain how the survival-polynomial mechanism generalizes to multiple classes and heterogeneous-input lengths. Under an input-dominated scaling regime, coprime decoding lengths stabilize the eviction-free equilibrium, while non-coprime lengths create synchronized modes that drive instability. These results characterize when workload heterogeneity desynchronizes completions and helps stabilize memory-constrained serving. More broadly, we identify service-induced congestion as a structural instability mechanism and derive scheduling design principles for sustaining high throughput.

2606.15482 2026-06-16 stat.ML cs.LG New submission

Ricci-Filtration: Boosting Retrieval-Augmented Generation Reranker to Query-Answer Tasks by Discrete Ricci Flow

Tian Qin, Wei-Min Huang

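AI summary Proposes Ricci-Filtration, a geometric reranker enhancement based on discrete curvature and Ricci flow: the query and retrieved chunks are modeled as a network, and curvature is used to filter out noisy chunks, improving RAG generation performance.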

English abstract

Ricci flow is a curvature-guided diffusion process that deforms space by shrinking regions of high positive curvature and expanding those with negative curvature. Similarly, discrete Ricci flow on weighted graphs modifies edge weights by shrinking edges with positive Ricci curvature and stretching those with negative Ricci curvature, effectively increasing the separation between clusters. Inspired by these two cornerstone works, we propose a geometry-based RAG reranker enhancement procedure called Ricci-Filtration. By modeling the input query and initial retrieved chunks as a network, where the input query and chunks serve as nodes and embedding-based pairwise relations define an initial graph, Ricci-Filtration leverages discrete curvature and Ricci flow to evaluate the structural importance of each chunk with respect to the user query. The system first filters the initial chunks based on their geometric curvature relative to the query; then, a reranker processes the remaining chunks to enhance generative performance. We theoretically prove that normalized discrete Ricci flow can detect community structures by identifying distinct asymptotic behaviors in edge weights. This supports the removal of "noisy" document chunks characterized by large weights and negative Ricci curvature relative to the query node. Extensive experiments confirm that Ricci-Filtration outperforms several baseline reranking methods in accuracy, precision, recall, and F1 score. Furthermore, ablation studies demonstrate that Ricci-Filtration generally outperforms the baseline under various settings, highlighting the framework's robustness across different architectures.

2606.15219 2026-06-16 cs.LG cs.DS math.ST stat.ML stat.TH New submission

Can Neural Networks Achieve Optimal Computational-statistical Tradeoff? An Analysis on Single-Index Model

Siyu Chen, Beining Wu, Miao Lu, Zhuoran Yang, Tianhao Wang

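AI summary Proposes a unified gradient-based algorithm that trains a two-layer neural network in polynomial time to learn Gaussian single-index models, with sample complexity matching the SQ lower bound up to polylogarithmic factors, and extends it to the sparse setting.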

Comments 96 pages, 4 figures

English abstract

In this work, we tackle the following question: Can neural networks trained with gradient-based methods achieve the optimal computational-statistical tradeoff in learning Gaussian single-index models? Prior research has shown that any polynomial-time algorithm under the statistical query (SQ) framework requires $\Omega(d^{s^\star/2}\lor d)$ samples, where $s^\star$ is the generative exponent representing the intrinsic difficulty of learning the underlying model. However, it remains unknown whether neural networks can achieve this sample complexity. Inspired by prior techniques such as label transformation and landscape smoothing for learning single-index models, we propose a unified gradient-based algorithm for training a two-layer neural network in polynomial time. Our method is adaptable to a variety of loss and activation functions, covering a broad class of existing approaches. We show that our algorithm learns a feature representation that strongly aligns with the unknown signal $\theta^\star$, with sample complexity $\widetilde{O}(d^{s^\star/2} \lor d)$, matching the SQ lower bound up to a polylogarithmic factor for all generative exponents $s^\star\geq 1$. Furthermore, we extend our approach to the setting where $\theta^\star$ is $k$-sparse for $k = o(\sqrt{d})$ by introducing a novel weight perturbation technique that leverages the sparsity structure. We derive a corresponding SQ lower bound of order $\widetilde{\Omega}(k^{s^\star})$, matched by our method up to a polylogarithmic factor. Our framework, especially the weight perturbation technique, is of independent interest, and suggests potential gradient-based solutions to other problems such as sparse tensor PCA.

2606.15217 2026-06-16 stat.ML cs.LG New submission

Conformal Candidate Certification for Offline Model-Based Optimization

Seungjin Choi

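AI summary Proposes Conformal Candidate Certification (CCC), which uses weighted conformal prediction to attach a calibrated one-sided lower bound to each candidate design in offline model-based optimization and certifies only candidates whose bound exceeds the target threshold, addressing statistical reliability under distribution shift.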

Comments ICML 2026 Workshop on Decision-Making from Offline Datasets to Online Adaptation: Black-Box Optimization to Reinforcement Learning

English abstract

Offline model-based optimization (MBO) proposes candidates by optimizing a surrogate trained on a fixed historical dataset. Because candidates are deliberately out-of-distribution, surrogate rankings are least reliable exactly where the optimizer is most aggressive, yet existing methods provide no per-candidate statistical certificate that a design meets a target threshold. We propose Conformal Candidate Certification (CCC), a post-hoc wrapper that attaches a calibrated one-sided lower bound to each candidate and advances only those whose bound exceeds the target. We show that entropy-regularized surrogate maximization induces a Gibbs-tilted proposal, so the same surrogate supplies importance weights for weighted conformal prediction without a separate density-ratio estimation step. In a controlled synthetic study, CCC certifies 16.7% of an aggressive proposal pool with empirical coverage 0.990 at nominal 0.90, while standard conformal prediction ignoring the covariate shift collapses to 0.416 coverage.
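A minimal sketch of how a one-sided lower bound from weighted conformal prediction could be computed for a single candidate. It assumes calibration scores of the form "surrogate prediction minus observed value" and importance weights proportional to exp(prediction / tau) as a stand-in for the Gibbs tilt mentioned in the abstract. The function names, the choice of score, and the temperature `tau` are illustrative assumptions, not the paper's implementation.

```python
import math

def gibbs_weight(pred, tau):
    """Unnormalized importance weight exp(pred / tau) (assumed form of the tilt)."""
    return math.exp(pred / tau)

def weighted_lower_bound(cal_scores, cal_weights, test_weight, test_pred, alpha):
    """Return test_pred - q, where q is the weighted (1 - alpha) quantile of the
    calibration scores, with the test point's own mass placed at +infinity."""
    total = sum(cal_weights) + test_weight
    cum = 0.0
    for score, weight in sorted(zip(cal_scores, cal_weights)):
        cum += weight / total
        if cum >= 1.0 - alpha:
            return test_pred - score
    # The calibration mass never reaches 1 - alpha: no finite bound exists.
    return -math.inf

def certify(lower_bound, threshold):
    """A candidate advances only if its lower bound exceeds the target."""
    return lower_bound > threshold

# Toy usage with equal weights (no shift) and a made-up target of 0.5.
bound = weighted_lower_bound([0.1, 0.2, 0.3, 0.4], [1.0] * 4, 1.0, 1.0, alpha=0.5)
advance = certify(bound, threshold=0.5)
```

When a candidate lies far from the calibration data its weight dominates the total, the loop never reaches the 1 - alpha level, and the bound is minus infinity, so the candidate is not certified. That is the conservative behavior one would want for aggressive proposals.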

2606.14929 2026-06-16 cs.LG cs.AI stat.ML New submission

Policy Regret for Embedding Model Routing: Contextual Bandits with Low-Rank Experts

Yan Dai, Negin Golrezaei, Patrick Jaillet

Affiliations * Operations Research Center, MIT; Sloan School of Management, MIT; Department of EECS, MIT

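AI summary Formalizes embedding model routing in recommendation systems as an adversarial contextual linear bandit with low-rank experts and proposes the Hypentropy Policy Gradient algorithm, which attains $\tilde{\mathcal O}(s\sqrt{M T})$ linearized policy regret.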

English abstract

Modern recommendation systems increasingly rely on dynamically routing diverse queries to multiple embedding models. Despite its practical significance, this problem remains poorly understood under realistic conditions such as adversarial queries, bandit feedback, and limited observability of models. We formalize embedding model routing as an adversarial contextual linear bandit with low-rank experts, where contexts are queries, actions are items, and experts are the embedding models working on low-rank latent representation spaces. We first establish that standard regret notions suffer from structural misspecification or statistical intractability, and we identify a log-quadratic policy class that is expressive enough to capture query-dependent model routing, yet structured enough to allow efficient online learning. Second, we propose a policy gradient algorithm called Hypentropy Policy Gradient (HPG). It provably adapts to the unknown low-rank structure under incomplete information and attains $\tilde{\mathcal O}(s\sqrt{M T})$ linearized policy regret -- where $s, M$, and $T$ are the intrinsic rank of the experts, the number of models, and the number of rounds -- thus avoiding a curse of dimensionality. Finally, we also provide a computationally efficient and parameter-free implementation of HPG.

2606.14737 2026-06-16 q-bio.BM cs.LG stat.ML New submission

Learning Topological Representations for Molecular Dynamics

Dominik Geng, Florian Graf, Martin Uray, Roland Kwitt

Affiliations * University of Salzburg; Centre for Intelligent and Secure Industrial Automation; University of Applied Sciences

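AI summary Proposes the masked Flood complex for persistent homology analysis, giving geometry-aware representations of protein conformations in a shared representation space, with competitive performance on classification, regression, and Markov state model estimation.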

Comments 20 pages, 4 figures

English abstract

Molecular dynamics (MD) simulations generate trajectories in a high-dimensional configuration space whose analysis critically depends on molecular descriptors, typically handcrafted observables or learned kinetic embeddings. Designing descriptors that are both expressive and broadly applicable, however, remains challenging. We study persistent homology (PH) as a general-purpose representation for MD and introduce the masked Flood complex, a protein-tailored modification of a recently introduced simplicial complex construction that emphasizes inter-residue structure at low computational cost. Vectorized persistence diagrams then provide information-rich, geometry-aware summaries of protein conformations, which we evaluate on protein class prediction, frame-level observable regression, and Markov state model (MSM) estimation from learned low-dimensional coordinates in a single shared representation space. Results on the mdCATH dataset show that PH-based descriptors are competitive across tasks, with masked Flood PH yielding the most consistent overall performance. Further, when using topologically informed MSMs as a drop-in replacement within the recent MarS-FM framework for generative modeling of protein conformations, we obtain consistently better ensemble statistics than MSMs based on physical observables. Finally, we explore the transferability of the generative model to qualitatively different, fast-folding proteins.

2606.14095 2026-06-16 cs.LG math.OC math.PR stat.ML New submission

Lyapunov-Based Sample Complexity Analysis for Weakly-Coupled MDPs

Tianhao Wu, Matthew Zurek, Weina Wang, Qiaomin Xie

Affiliations * Department of Industrial and Systems Engineering, University of Wisconsin-Madison; Department of Computer Sciences, University of Wisconsin-Madison; Computer Science Department, Carnegie Mellon University

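AI summary For average-reward weakly coupled MDPs and restless bandits, proposes a Lyapunov-based analysis framework that yields sample and computational complexity polynomial in the number of arms $N$, and gives the first finite-sample PAC guarantee for the fully heterogeneous case.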

Comments Accepted for presentation at the Conference on Learning Theory (COLT) 2026

English abstract

We study the sample complexity of learning in average-reward weakly-coupled Markov decision processes (WCMDPs) and Restless Bandits (RBs) under a generative model. Naive reduction to a tabular MDP leads to high complexity bounds as the state-action space is exponentially large in the number of arms $N$. By exploiting the weakly coupled structure, we show that near-optimal policies can be learned with sample and computational complexities that are polynomial in $N$. Specifically, we analyze the plug-in approach, which applies an efficient planning algorithm to an empirical model estimated from data. For fully heterogeneous WCMDPs, we establish the first finite-sample PAC guarantee with polynomial complexity and an $O(1/\sqrt{N})$ optimality gap. For homogeneous RBs, we further prove that a smaller optimality gap is achievable under mild structural assumptions. A primary technical contribution of our work is a novel Lyapunov-based analysis framework. Unlike classical approaches that rely on the difficult-to-control bias function, our framework uses an explicitly constructed Lyapunov function along with a drift transfer technique between the true and empirical models. A key step of independent interest in our framework is a fine-grained perturbation analysis for the underlying linear programming (LP) relaxation, which provides a general tool for analyzing LP-based policies and weakly-coupled systems.

2606.13295 2026-06-16 stat.ML cs.LG stat.ME New submission

Simultaneous Latent Budget Trees for Stratified Classification

Cristian Buoncompagni, Stefano Pellegrino, Giulia Vannucci, Raffaele Dubbioso, Roberta Siciliano

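AI summary Proposes the Simultaneous Latent Budget Trees framework, which handles a stratification factor through a model-based split rule to give interpretable classification, and applies it to gender-related differences in amyotrophic lateral sclerosis.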

English abstract

In the era of Explainable Artificial Intelligence, there is a renewed focus on single trees for their ease of interpretation. This paper introduces Simultaneous Latent Budget Trees, a probabilistic machine learning framework for classification trees in the presence of a stratification factor, such as a temporal, spatial, or demographic variable, acting as a control variable or potential confounder. Standard tree-growing procedures are not designed to optimize a conditional split rule. A model-based split rule is proposed in which child nodes are interpreted as latent components of a simultaneous mixture model, such as the Simultaneous Latent Budget Model and its constrained versions, fitted to the parent node. Mixing parameters drive the observations to the child nodes, differently for each group, whereas latent budget parameters update the response-class profile of each level of the control variable. Parameters are estimated by least squares, taking a neural network perspective on the model. An informative tree structure can be interactively visualized with interpretation aids on the nodes and paths, including visual pruning and a decision tree selection procedure. Suitable measures are proposed to handle an unbalanced response class distribution. The proposed methodology is applied to investigate gender-related differences in disease progression of Amyotrophic Lateral Sclerosis. The SLBT library with the various tree-based algorithms is available in the linked GitHub repository.

2605.03289 2026-06-16 stat.ML cs.LG math.ST stat.TH Updated version

Imbalanced Classification under Capacity Constraints

Daniel Fraiman, Ricardo Fraiman

Affiliations * Departamento de Matemática y Ciencias Universidad de San Andrés; CONICET Argentina; PEDECIBA Matemática Uruguay

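AI summary For minority-class detection under capacity constraints, proposes a formal classification framework whose optimal classifier is equivalent to a Bayes classifier with reweighted prior probabilities, and introduces a capacity-adjusted performance metric; experiments show it outperforms classical methods and SMOTE.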

English abstract

Detecting observations from a minority class under severe class imbalance is a central challenge in applications such as fraud detection, medical screening, and industrial quality control. In these settings, each positive prediction triggers a costly follow-up action (an MRI scan, a transaction audit) whose execution is subject to real operational constraints. This paper proposes a formal classification framework under capacity constraints: given a user-defined bound $b$ on the proportion of observations that can be labeled as belonging to the minority class, the goal is to find the classifier that maximizes sensitivity on that class. We characterize the optimal classifier under this constraint and establish its equivalence with the classical Bayes classifier under a reweighting of the prior probabilities. We also introduce a capacity-adjusted performance metric $M$ that accounts for the effective detection rate when the capacity constraint is binding. The framework is implemented on top of standard learning methods (k-NN, SVM, random forests, and neural networks), and statistical consistency is established for each. We further show that these methods reduce to post-hoc thresholding when no hyperparameters are oriented toward the capacity-constrained objective, and introduce a capacity-aware support vector machine that exploits the constraint during training and achieves the strongest empirical performance. Experiments on the Taiwanese credit card default dataset confirm that capacity-constrained classifiers substantially outperform both classical approaches and SMOTE under high imbalance regimes. The framework extends naturally to multiclass settings and online environments.
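The post-hoc thresholding case mentioned in the abstract can be sketched as follows: given scores from any fitted classifier, flag only the top fraction $b$ of observations. This is a plain empirical illustration under the assumption that higher scores mean "more likely minority class". The function names are hypothetical, and the paper's capacity-adjusted metric $M$ and capacity-aware SVM are not reproduced here.

```python
import math

def capacity_constrained_labels(scores, b):
    """Flag at most floor(b * n) observations, those with the highest scores."""
    n = len(scores)
    k = math.floor(b * n)
    # Indices ordered from highest to lowest score.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:k])
    return [1 if i in flagged else 0 for i in range(n)]

def sensitivity(labels, truth):
    """Fraction of true minority-class observations that were flagged."""
    positives = sum(truth)
    hits = sum(1 for l, t in zip(labels, truth) if l == 1 and t == 1)
    return hits / positives if positives else 0.0

# Toy usage: four observations, capacity for half of them.
scores = [0.9, 0.1, 0.8, 0.3]
truth = [1, 0, 0, 1]
labels = capacity_constrained_labels(scores, b=0.5)
detected = sensitivity(labels, truth)
```

Flagging exactly the top-ranked observations uses the full capacity, which matters because sensitivity can only increase as more observations are flagged.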

2505.24275 2026-06-16 cs.LG math.OC stat.ML Updated version

GradPower: Powering Gradients for Faster Language Model Pre-Training

Jinbo Wang, Mingze Wang, Jiaqi Zhang, Wei Wang, Peng Pei, Xunliang Cai, Weinan E, Lei Wu

Affiliations * University of Science and Technology of China

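AI summary Proposes GradPower, a lightweight gradient-transformation technique for accelerating language model pre-training. An elementwise sign-power transformation is applied to the gradient before it is passed to the base optimizer, with no change to the optimizer's internal logic or hyperparameters, and it achieves lower terminal loss across architectures, parameter scales, datasets, and learning-rate schedules.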

Comments 24 pages, accepted by ICML 2026

English abstract

We propose GradPower, a lightweight gradient-transformation technique for accelerating language model pre-training. Given a gradient vector $g=(g_i)_i$, GradPower first applies the elementwise sign-power transformation: $φ_p(g)=({\rm sign}(g_i)|g_i|^p)_{i}$ for a fixed $p>0$, and then feeds the transformed gradient into a base optimizer. Notably, GradPower requires only a single-line code change and no modifications to the base optimizer's internal logic, including the hyperparameters. When applied to Adam (termed AdamPower), GradPower consistently achieves lower terminal loss across diverse architectures (LLaMA, Qwen2MoE), parameter scales (66M to 2B), datasets (C4, OpenWebText), and learning-rate schedules (cosine, warmup-stable-decay). The most pronounced gains are observed when training modern mixture-of-experts models with warmup-stable-decay schedules. GradPower also integrates seamlessly with other state-of-the-art optimizers, such as Muon, yielding further improvements. Finally, we provide theoretical analyses that reveal the underlying mechanism of GradPower and highlight the influence of gradient noise.
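The sign-power transformation described above is simple enough to sketch directly. The example below is an illustration in plain Python rather than the authors' code: `grad_power` is a hypothetical name, and plain SGD is used as a stand-in base optimizer purely to show where the transformed gradient is fed in (the abstract's experiments use Adam and Muon).

```python
import math

def grad_power(grad, p):
    """Elementwise sign-power transform: sign(g_i) * |g_i| ** p, for p > 0."""
    return [math.copysign(abs(g) ** p, g) for g in grad]

def sgd_step(params, grad, lr):
    """Plain SGD update, used here only as a stand-in base optimizer."""
    return [w - lr * g for w, g in zip(params, grad)]

# The base optimizer is untouched; only the gradient it receives changes.
params = [1.0, 1.0, 1.0]
grad = [4.0, -9.0, 0.0]
params = sgd_step(params, grad_power(grad, p=0.5), lr=0.1)
```

With $p < 1$ large gradient entries are compressed relative to small ones, with $p > 1$ they are amplified, and $p = 1$ recovers the base optimizer unchanged.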

2605.18324 2026-06-16 cs.CV cs.AI cs.GR cs.LG stat.ML Updated version

Improved Baselines with Representation Autoencoders

Jaskirat Singh, Boyang Zheng, Zongze Wu, Richard Zhang, Eli Shechtman, Saining Xie

Affiliations * Adobe Research; ANU (Australian National University); New York University

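AI summary Studies design choices for Representation Autoencoders (RAE) and reports three insights that simplify and improve them. First, the representation is defined as the sum of the last k encoder layers rather than the final layer alone. Second, the assumption that RAE replaces representation alignment (REPA) is examined, and the two are found to have complementary working mechanisms. Third, RAE is improved under classifier-free guidance (CFG): re-parameterizing the DiT output provides guidance without training a second model. RAEv2 reaches a gFID of 1.06 on ImageNet-256 with markedly better training efficiency.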

English abstract

Representation Autoencoders (RAE) replace traditional VAE with pretrained vision encoders. In this paper, we systematically investigate several design choices and find three insights which simplify and improve RAE. First, we study a generalized formulation where the representation is defined as sum of the last k encoder layers rather than solely the final layer. This simple change greatly improves reconstruction without encoder finetuning or specialized data (e.g., text, faces). Second, we study the prevalent assumption that RAE (using pretrained representation as encoder) replaces representation alignment (REPA), which distills the same representation to intermediate layers instead. Through large-scale empirical analysis, we uncover a surprising finding: RAE and REPA exhibit complementary working mechanisms, allowing the same representation to be used as both encoder and target for intermediate diffusion layers. Finally, the original RAE struggles with classifier-free guidance (CFG) and requires training a second, weaker diffusion model for AutoGuidance (AG). We show that REPA itself can be viewed as x-prediction in RAE latent space. By simply re-parameterizing the output of the DiT model, it can provide guidance for "free". Overall, RAEv2 leads to more than 10x faster convergence over the original RAE, achieving a state-of-the-art gFID of 1.06 in just 80 epochs on ImageNet-256. On FDr6, RAEv2 achieves a state-of-the-art 2.17 at just 80 epochs compared to the previous best 3.26 (800 epochs) without any post-training. This motivates EPFID@k (epochs to reach unguided gFID < k) as a measure of training efficiency. RAEv2 attains an EPFID@2 of 35 epochs, versus 177 for the original RAE. We also validate our approach across diverse settings for text-to-image generation and navigation world models, showing consistent improvements. The code is available at https://raev2.github.io.

2511.09465 2026-06-16 stat.ML cs.LG 版本更新

Branching Flows: Discrete, Continuous, and Manifold Flow Matching with Splits and Deletions

分支流:带有分裂和删除的离散、连续和流形流匹配

Lukas Billera, Hedwig Nora Nordlinder, Jack Collier Ryder, Anton Oresten, Aron Stålmarck, Theodor Mosetti Björk, Ben Murrell

发表机构 * Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet(卡罗林斯卡研究所微生物学、肿瘤和细胞生物学系)

AI总结 提出分支流框架,通过随机分支和死亡过程控制序列元素数量,适用于变长数据生成,并在小分子、抗体序列和蛋白质骨架生成中验证效果。

Comments 39 pages, 16 figures

AI中文摘要

扩散和流匹配方法在状态空间连续的领域(如图像生成或蛋白质折叠与设计)以及离散领域(如扩散大语言模型)中显示出前景。当状态中的元素数量预先固定时(如图像),它们自然适用,但当大语言模型响应的长度或蛋白质链中的氨基酸数量未知时,则需要临时解决方案。这里我们提出分支流,一种生成建模框架,与扩散和流匹配方法一样,将简单分布传输到数据分布。但在分支流中,状态中的元素在二叉树森林上演化,以模型学习的速率随机分支和死亡。这使得模型在生成过程中能够控制序列中的元素数量。我们还表明,分支流可以与离散集、连续欧几里得空间、光滑流形以及混合这些组件的“多模态”乘积空间上的任何流匹配基础过程组合。我们在三个领域进行了演示:小分子生成(多模态)、抗体序列生成(离散)和蛋白质骨架生成(多模态),并表明分支流是一个学习目标稳定、性能出色的分布学习器,并且能够实现新的功能。

英文摘要

Diffusion and flow matching approaches to generative modeling have shown promise in domains where the state space is continuous, such as image generation or protein folding & design, and discrete, exemplified by diffusion large language models. They offer a natural fit when the number of elements in a state is fixed in advance (e.g. images), but require ad hoc solutions when, for example, the length of a response from a large language model, or the number of amino acids in a protein chain is not known a priori. Here we propose Branching Flows, a generative modeling framework that, like diffusion and flow matching approaches, transports a simple distribution to the data distribution. But in Branching Flows, the elements in the state evolve over a forest of binary trees, branching and dying stochastically with rates that are learned by the model. This allows the model to control, during generation, the number of elements in the sequence. We also show that Branching Flows can compose with any flow matching base process on discrete sets, continuous Euclidean spaces, smooth manifolds, and 'multimodal' product spaces that mix these components. We demonstrate this in three domains: small molecule generation (multimodal), antibody sequence generation (discrete), and protein backbone generation (multimodal), and show that Branching Flows is a capable distribution learner with a stable learning objective, and that it enables new capabilities.

2602.08210 2026-06-16 cs.LG stat.ML 版本更新

CADO: From Imitation to Cost Minimization for Heatmap-based Solvers in Combinatorial Optimization

CADO:从模仿到成本最小化的组合优化热力图求解器

Hyungseok Song, Deunsol Yoon, Kanghoon Lee, Han-Seul Jeong, Soonyoung Lee, Woohyung Lim

发表机构 * LG AI Research(LG人工智能研究院)

AI总结 针对热力图求解器监督训练中模仿损失与成本最小化的目标不匹配问题,提出CADO框架,通过强化学习微调直接优化解码后解的成本,在多个基准上取得最优性能。

Comments 22 pages, 4 figures. Accepted for publication in Transactions on Machine Learning Research (TMLR), 2026. OpenReview: https://openreview.net/forum?id=fvxx5FOED6

AI中文摘要

基于热力图的求解器已成为组合优化(CO)的一种有前景的范式。然而,我们认为主流的监督学习(SL)训练范式存在根本性的目标不匹配:最小化模仿损失(例如交叉熵)并不能保证解的成本最小化。我们将这种不匹配分解为两个缺陷:解码器盲区(忽视不可微的解码过程)和成本盲区(优先考虑结构模仿而非解的质量)。我们通过实验证明,这些内在缺陷施加了硬性性能上限。为了克服这一限制,我们提出了CADO(成本感知的优化扩散模型),一个简化的强化学习微调框架,将扩散去噪过程建模为MDP,以直接优化解码后的解成本。我们引入了标签中心奖励,将真实标签重新用作无偏基线而非模仿目标,以及混合微调以实现参数高效的适应。CADO在多个基准上取得了最先进的性能,验证了目标对齐对于释放热力图求解器全部潜力的必要性。

英文摘要

Heatmap-based solvers have emerged as a promising paradigm for Combinatorial Optimization (CO). However, we argue that the dominant Supervised Learning (SL) training paradigm suffers from a fundamental objective mismatch: minimizing imitation loss (e.g., cross-entropy) does not guarantee solution cost minimization. We dissect this mismatch into two deficiencies: Decoder-Blindness (being oblivious to the non-differentiable decoding process) and Cost-Blindness (prioritizing structural imitation over solution quality). We empirically demonstrate that these intrinsic flaws impose a hard performance ceiling. To overcome this limitation, we propose CADO (Cost-Aware Diffusion models for Optimization), a streamlined Reinforcement Learning fine-tuning framework that formulates the diffusion denoising process as an MDP to directly optimize the post-decoded solution cost. We introduce Label-Centered Reward, which repurposes ground-truth labels as unbiased baselines rather than imitation targets, and Hybrid Fine-Tuning for parameter-efficient adaptation. CADO achieves state-of-the-art performance across diverse benchmarks, validating that objective alignment is essential for unlocking the full potential of heatmap-based solvers.

2602.08026 2026-06-16 cs.LG stat.ML 版本更新

Sharp analysis of linear ensemble sampling

线性集成采样的尖锐分析

David Janz, Arya Akhavan, Csaba Szepesvári

AI总结 本文针对随机线性bandits中的线性集成采样(ES)方法,证明当集成大小$m=\Theta(d \log n)$时,ES达到$\tilde O(d^{3/2}\sqrt n)$的高概率遗憾,弥合了与汤普森采样基准的差距,同时保持计算量相当。

AI中文摘要

我们分析了随机线性bandits中具有标准高斯扰动的线性集成采样(ES)。我们证明,对于集成大小$m=\Theta(d \log n)$,ES达到了$\tilde O(d^{3/2}\sqrt n)$的高概率遗憾,弥合了与汤普森采样基准的差距,同时保持计算量相当。证明通过将分析简化为$m$个独立布朗运动的时间一致超越问题,为线性bandits中的随机探索带来了新视角。这种连续时间视角在这里显得特别自然:它给出了相关离散时间过程的精确表示,而我们不知道有其他途径能得到尖锐的ES界。

英文摘要

We analyse linear ensemble sampling (ES) with standard Gaussian perturbations in stochastic linear bandits. We show that for ensemble size $m=\Theta(d\log n)$, ES attains $\tilde O(d^{3/2}\sqrt n)$ high-probability regret, closing the gap to the Thompson sampling benchmark while keeping computation comparable. The proof brings a new perspective on randomized exploration in linear bandits by reducing the analysis to a time-uniform exceedance problem for $m$ independent Brownian motions. This continuous-time lens appears particularly natural here: it yields an exact representation of the relevant discrete-time processes, and we do not know another route to a sharp ES bound.

2602.00781 2026-06-16 cs.LG stat.ML 版本更新

Fast Non-Episodic Finite-Horizon RL with K-Step Lookahead Thresholding

快速非情节有限时域强化学习:K步前瞻阈值法

Jiamin Xu, Kyra Gan


AI总结 针对非情节有限时域MDP,提出K步前瞻Q函数与阈值机制,实现快速有限样本收敛,在合成环境和标准RL任务中优于现有方法。

AI中文摘要

非情节、有限时域MDP中的在线强化学习仍未充分探索,且面临估计到固定终止时间的回报的挑战。现有的无限时域方法通常依赖折扣收缩,无法自然适应这种固定时域结构。我们引入一种修改的Q函数:不针对全时域,而是学习一个K步前瞻Q函数,将规划截断到接下来的K步。为了进一步提高样本效率,我们引入阈值机制:仅当动作的估计K步前瞻值超过时变阈值时才选择该动作。我们为这一新目标提供了一种高效的表格学习算法,证明其实现了快速有限样本收敛:对于$K=1$,达到极小极大最优常数遗憾;对于任意$K \geq 2$,达到$\mathcal{O}(\max((K-1),C_{K-1})\sqrt{SAT\log(T)})$遗憾。我们在最大化奖励的目标下数值评估了算法性能。我们的实现自适应地随时间增加K,平衡前瞻深度与估计方差。实验结果表明,在合成MDP和RL环境(JumpRiverswim、FrozenLake和AnyTrading)中,累积奖励优于最先进的表格RL方法。代码见 https://github.com/jamie01713/K-Step-Lookahead 。

英文摘要

Online reinforcement learning in non-episodic, finite-horizon MDPs remains underexplored and is challenged by the need to estimate returns to a fixed terminal time. Existing infinite-horizon methods, which often rely on discounted contraction, do not naturally account for this fixed-horizon structure. We introduce a modified Q-function: rather than targeting the full horizon, we learn a K-step lookahead Q-function that truncates planning to the next K steps. To further improve sample efficiency, we introduce a thresholding mechanism: actions are selected only when their estimated K-step lookahead value exceeds a time-varying threshold. We provide an efficient tabular learning algorithm for this novel objective, proving it achieves fast finite-sample convergence: it achieves minimax optimal constant regret for $K=1$ and $\mathcal{O}(\max((K-1),C_{K-1})\sqrt{SAT\log(T)})$ regret for any $K \geq 2$. We numerically evaluate the performance of our algorithm under the objective of maximizing reward. Our implementation adaptively increases K over time, balancing lookahead depth against estimation variance. Empirical results demonstrate superior cumulative rewards over state-of-the-art tabular RL methods across synthetic MDPs and RL environments: JumpRiverswim, FrozenLake and AnyTrading. Code is available at https://github.com/jamie01713/K-Step-Lookahead.
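The idea of a K-step lookahead Q-function with thresholding can be illustrated on a toy problem. The sketch below is not the authors' algorithm: it assumes a known tabular model (`P`, `R`), no discounting, and a hypothetical fallback to the greedy action when nothing clears the threshold. All function names are illustrative.

```python
def k_step_lookahead_q(P, R, K):
    """Truncated backward induction.

    P[s][a] is a list of next-state probabilities and R[s][a] the expected
    reward (both assumed known here, which the paper's learner does not assume).
    Returns q[s][a], the best total reward over the next K steps.
    """
    n_states, n_actions = len(P), len(P[0])
    v = [0.0] * n_states  # value with zero steps to go
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(K):
        q = [[R[s][a] + sum(p * v[s2] for s2, p in enumerate(P[s][a]))
              for a in range(n_actions)] for s in range(n_states)]
        v = [max(row) for row in q]
    return q


def thresholded_actions(q_row, threshold):
    """Actions whose lookahead value exceeds the threshold.

    Falling back to the greedy action when none qualifies is an assumption
    made for this sketch.
    """
    keep = [a for a, val in enumerate(q_row) if val > threshold]
    return keep if keep else [max(range(len(q_row)), key=q_row.__getitem__)]


# Toy 2-state MDP: staying in state 0 pays 0.1, moving to state 1 pays 0 now
# but state 1 pays 1.0 per step afterwards.
P = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
R = [[0.1, 0.0], [1.0, 0.0]]
q1 = k_step_lookahead_q(P, R, 1)
q2 = k_step_lookahead_q(P, R, 2)
```

With one step of lookahead the myopic action looks best in state 0; with two steps the move to state 1 wins, which is the kind of trade-off between lookahead depth and estimation effort the abstract describes.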

2510.10981 2026-06-16 stat.ML cs.LG 版本更新

In-Context Learning Is Provably Bayesian Inference: A Generalization Theory for Meta-Learning

上下文学习可证明是贝叶斯推断:元学习的泛化理论

Tomoya Wakayama, Taiji Suzuki

发表机构 * University of Tokyo(东京大学) National Institute of Information and Communications Technology(信息通信技术国家研究所)

AI总结 本文在元学习框架下,将上下文学习总风险分解为贝叶斯差距和后验方差,并证明Transformer通过预训练选择最优元算法,在测试时快速收敛到真实任务的最优算法。

AI中文摘要

本文在元学习框架下,为上下文学习(ICL)发展了一个有限样本统计理论,该框架能够容纳多种任务类型的混合。我们引入了一个原则性的风险分解,将总ICL风险分解为两个正交分量:贝叶斯差距和后验方差。贝叶斯差距量化了训练模型逼近贝叶斯最优上下文预测器的程度。对于均匀注意力Transformer,我们推导出该差距的非渐近上界,明确阐明了其对预训练提示数量及其上下文长度的依赖关系。后验方差是一个与模型无关的风险,代表内在的任务不确定性。我们的关键发现是,该项仅由真实底层任务的难度决定,而任务混合带来的不确定性随着少量上下文示例呈指数级消失。这些结果共同提供了ICL的统一视角:Transformer在预训练期间选择最优元算法,并在测试时快速收敛到真实任务的最优算法。

英文摘要

This paper develops a finite-sample statistical theory for in-context learning (ICL), analyzed within a meta-learning framework that accommodates mixtures of diverse task types. We introduce a principled risk decomposition that separates the total ICL risk into two orthogonal components: Bayes Gap and Posterior Variance. The Bayes Gap quantifies how well the trained model approximates the Bayes-optimal in-context predictor. For a uniform-attention Transformer, we derive a non-asymptotic upper bound on this gap, which explicitly clarifies the dependence on the number of pretraining prompts and their context length. The Posterior Variance is a model-independent risk representing the intrinsic task uncertainty. Our key finding is that this term is determined solely by the difficulty of the true underlying task, while the uncertainty arising from the task mixture vanishes exponentially fast with only a few in-context examples. Together, these results provide a unified view of ICL: the Transformer selects the optimal meta-algorithm during pretraining and rapidly converges to the optimal algorithm for the true task at test time.

2505.03201 2026-06-16 stat.ML cs.LG 版本更新

Enhancing Visual Feature Attribution via Weighted Integrated Gradients

通过加权积分梯度增强视觉特征归因

Kien Tran Duc Tuan, Tam Nguyen Trong, Son Nguyen Hoang, Khoat Than, Anh Nguyen Duc

发表机构 * Institute of Information and Communication Technology, Vietnam Academy of Science and Technology(越南科学与技术学院信息与通信技术研究所)

AI总结 针对积分梯度方法对基线选择敏感的问题,提出加权积分梯度,通过无监督准则自适应选择和加权基线,在保持公理性质的同时提升归因可靠性,实验显示在卷积和Transformer架构上最高提升36%。

AI中文摘要

积分梯度(IG)是可解释AI中广泛使用的归因方法,尤其在需要可靠特征归因的计算机视觉应用中。IG的一个关键限制是其对基线(参考)图像选择的敏感性。多基线扩展如期望梯度(EG)假设基线均匀加权,隐含地认为所有基线图像信息量相等。在高维视觉模型中,这一假设常导致噪声或不稳定的解释。本文提出加权积分梯度(WG),一种通过评估和加权基线来增强归因可靠性的原则性方法。WG引入了一个无监督的基线适用性标准,实现了基于每个输入的自适应基线选择和加权。该方法在广义加权基线形式下保留了IG的核心公理性质。在预期的、基于代理的适应度-相关性单调性假设下,WG为更信息丰富的基线分配更大权重提供了概率依据。在常用图像数据集和模型上的实验表明,在我们的协议下,WG优于EG,在评估的卷积和Transformer架构上最高提升36%。这些提升伴随着额外的适应度评估成本,因此WG应被视为归因保真度的权衡,而非EG的更快替代方案。通过超越所有基线贡献相等的假设,加权积分梯度为解释计算机视觉模型提供了更清晰、更可靠的方法,提高了可解释AI的理解和实际可用性。

英文摘要

Integrated Gradients (IG) is a widely used attribution method in explainable AI, particularly in computer vision applications where reliable feature attribution is essential. A key limitation of IG is its sensitivity to the choice of baseline (reference) images. Multi-baseline extensions such as Expected Gradients (EG) assume uniform weighting over baselines, implicitly treating all baseline images as equally informative. In high-dimensional vision models, this assumption often leads to noisy or unstable explanations. This paper proposes Weighted Integrated Gradients (WG), a principled approach that evaluates and weights baselines to enhance attribution reliability. WG introduces an unsupervised criterion for baseline suitability, enabling adaptive selection and weighting of baselines on a per-input basis. The method preserves the core axiomatic properties of IG in a generalized weighted-baseline form. Under an expected, proxy-based fitness--relevance monotonicity assumption, WG provides a probabilistic justification for assigning larger weights to more informative baselines. Experiments on commonly used image datasets and models show that WG improves over EG under our protocol, with up to 36% gains across evaluated convolutional and Transformer architectures. These gains come with additional fitness-evaluation cost, so WG should be viewed as an attribution-fidelity trade-off rather than a faster alternative to EG. By moving beyond the assumption that all baselines contribute equally, Weighted Integrated Gradients offers a clearer and more reliable approach to explaining computer-vision models, improving both understanding and practical usability in explainable AI.
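A small numerical sketch may help make the weighted-baseline form concrete. It assumes a toy differentiable function `f` with a hand-written gradient and takes the baseline weights as given; the paper's unsupervised fitness criterion for choosing those weights is not reproduced here. Uniform weights would correspond to the Expected Gradients setting described above.

```python
def f(x):
    # Toy model standing in for a network output (illustrative only).
    return x[0] ** 2 + 3.0 * x[0] * x[1]


def grad_f(x):
    return [2.0 * x[0] + 3.0 * x[1], 3.0 * x[0]]


def integrated_gradients(x, baseline, steps=50):
    """Midpoint-rule approximation of IG along the straight path baseline -> x."""
    n = len(x)
    avg_grad = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = grad_f(point)
        for i in range(n):
            avg_grad[i] += g[i] / steps
    return [(xi - b) * gi for xi, b, gi in zip(x, baseline, avg_grad)]


def weighted_ig(x, baselines, weights, steps=50):
    """Convex combination of per-baseline IG attributions (weights are normalized)."""
    total = sum(weights)
    attr = [0.0] * len(x)
    for b, w in zip(baselines, weights):
        ig = integrated_gradients(x, b, steps)
        for i in range(len(x)):
            attr[i] += (w / total) * ig[i]
    return attr


x = [1.0, 2.0]
baselines = [[0.0, 0.0], [1.0, 0.0]]
weights = [3.0, 1.0]  # hypothetical weights, not derived from the paper's criterion
attr = weighted_ig(x, baselines, weights)
```

In this sketch the attributions sum to f(x) minus the weighted average of f over the baselines, which is the weighted analogue of IG's completeness axiom.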

2510.24043 2026-06-16 cs.LG stat.ML 版本更新

Localized Kernel Projection Outlyingness: A Two-Stage Approach for Multi-Modal Outlier Detection

局部核投影离群度:一种用于多模态离群检测的两阶段方法

Akira Tamamori

发表机构 * Department of Computer Science, Aichi Institute of Technology(爱知工业大学计算机科学系)

AI总结 提出两阶段LKPLO框架,结合自适应损失函数、全局核PCA和局部聚类,解决多模态离群检测问题,在10个基准数据集上取得最优性能。

Comments 12 pages, 5 figures; accepted by The IEICE Transactions on Information and Systems

AI中文摘要

本文提出两阶段LKPLO,一种新颖的多阶段离群检测框架,克服了传统基于投影的方法同时存在的局限性:它们依赖于固定的统计度量并假设单一数据结构。我们的框架独特地综合了三个关键概念:(1) 一种基于广义损失的离群度度量(PLO),用灵活的自适应损失函数(如我们提出的SVM类损失)替代固定度量;(2) 一个全局核PCA阶段,用于线性化非线性数据结构;(3) 一个后续的局部聚类阶段,用于处理多模态分布。在10个基准数据集上进行的全面5折交叉验证实验,结合自动超参数优化,表明两阶段LKPLO达到了最先进的性能。在现有方法失败且具有挑战性结构的数据集上,尤其是在多簇数据(Optdigits)和复杂高维数据(Arrhythmia)上,它显著优于强基线。此外,消融研究实证证实,核化和局部化阶段的协同组合对其优越性能不可或缺。这项工作为重要类别的离群检测问题贡献了一个强大的新工具,并强调了混合多阶段架构的重要性。

英文摘要

This paper presents Two-Stage LKPLO, a novel multi-stage outlier detection framework that overcomes the coexisting limitations of conventional projection-based methods: their reliance on a fixed statistical metric and their assumption of a single data structure. Our framework uniquely synthesizes three key concepts: (1) a generalized loss-based outlyingness measure (PLO) that replaces the fixed metric with flexible, adaptive loss functions like our proposed SVM-like loss; (2) a global kernel PCA stage to linearize non-linear data structures; and (3) a subsequent local clustering stage to handle multi-modal distributions. Comprehensive 5-fold cross-validation experiments on 10 benchmark datasets, with automated hyperparameter optimization, demonstrate that Two-Stage LKPLO achieves state-of-the-art performance. It significantly outperforms strong baselines on datasets with challenging structures where existing methods fail, most notably on multi-cluster data (Optdigits) and complex, high-dimensional data (Arrhythmia). Furthermore, an ablation study empirically confirms that the synergistic combination of both the kernelization and localization stages is indispensable for its superior performance. This work contributes a powerful new tool for a significant class of outlier detection problems and underscores the importance of hybrid, multi-stage architectures.

2510.06647 2026-06-16 stat.ML cs.LG 版本更新

Q-Learning with Fine-Grained Gap-Dependent Regret

具有细粒度间隙依赖遗憾的Q学习

Haochen Zhang, Zhong Zheng, Lingzhou Xue

发表机构 * Department of Statistics, The Pennsylvania State University(统计学系,宾夕法尼亚州立大学)

AI总结 针对表格型马尔可夫决策过程,提出细粒度间隙依赖遗憾界,分别改进UCB和非UCB算法,并修正了AMB算法的设计缺陷。

详情
AI中文摘要

我们研究了在情节式表格马尔可夫决策过程中无模型强化学习的细粒度间隙依赖遗憾界。现有的无模型算法实现了极小化极大最坏情况遗憾,但其间隙依赖界仍然粗糙,未能完全捕捉次优间隙的结构。我们通过为基于UCB和非UCB的算法建立细粒度间隙依赖遗憾界来解决这一限制。在基于UCB的设置中,我们开发了一个新颖的分析框架,明确分离了最优和次优状态-动作对的分析,从而为UCB-Hoeffding (Jin et al., 2018) 提供了第一个细粒度遗憾上界。为了突出该框架的通用性,我们引入了ULCB-Hoeffding,这是一种受AMB (Xu et al., 2021) 启发但结构简化的新UCB算法,它享有细粒度遗憾保证并在经验上优于AMB。在非UCB设置中,我们重新审视了唯一已知的算法AMB,并识别出其算法设计和分析中的两个关键问题:Q更新中的不当截断以及其集中论证中鞅差条件的违反。我们提出了AMB的改进版本,解决了这些问题,为非UCB方法建立了第一个严格的细粒度间隙依赖遗憾,实验表明其性能优于AMB。

英文摘要

We study fine-grained gap-dependent regret bounds for model-free reinforcement learning in episodic tabular Markov Decision Processes. Existing model-free algorithms achieve minimax worst-case regret, but their gap-dependent bounds remain coarse and fail to fully capture the structure of suboptimality gaps. We address this limitation by establishing fine-grained gap-dependent regret bounds for both UCB-based and non-UCB-based algorithms. In the UCB-based setting, we develop a novel analytical framework that explicitly separates the analysis of optimal and suboptimal state-action pairs, yielding the first fine-grained regret upper bound for UCB-Hoeffding (Jin et al., 2018). To highlight the generality of this framework, we introduce ULCB-Hoeffding, a new UCB-based algorithm inspired by AMB (Xu et al., 2021) but with a simplified structure, which enjoys fine-grained regret guarantees and empirically outperforms AMB. In the non-UCB-based setting, we revisit the only known algorithm, AMB, and identify two key issues in its algorithm design and analysis: improper truncation in the $Q$-updates and violation of the martingale difference condition in its concentration argument. We propose a refined version of AMB that addresses these issues, establishing the first rigorous fine-grained gap-dependent regret for a non-UCB-based method, with experiments demonstrating improved performance over AMB.
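For readers unfamiliar with UCB-Hoeffding, the single-entry update from Jin et al. (2018) can be sketched as follows. This shows only the baseline algorithm the abstract analyses, not ULCB-Hoeffding or the refined AMB; the constant `c` and the log factor `iota` are treated as free inputs, and the function names are illustrative.

```python
import math


def ucb_hoeffding_update(q, v_next, reward, visits, H, iota, c=1.0):
    """One optimistic Q-learning update for a visited (step, state, action).

    visits is the visit count t >= 1 of this entry, H the episode length.
    Uses learning rate (H + 1) / (H + t) and a Hoeffding-style bonus
    c * sqrt(H^3 * iota / t).
    """
    alpha = (H + 1) / (H + visits)
    bonus = c * math.sqrt(H ** 3 * iota / visits)
    return (1 - alpha) * q + alpha * (reward + v_next + bonus)


def clipped_value(q_row, H):
    """Optimistic state value, truncated at the maximum possible return H."""
    return min(H, max(q_row))


# First visit: the learning rate is 1, so the optimistic initial value is overwritten.
new_q = ucb_hoeffding_update(q=2.0, v_next=1.0, reward=0.5, visits=1, H=2, iota=1.0)
```

The truncation in `clipped_value` is the standard one for this algorithm; the abstract's point about improper truncation concerns the Q-updates of AMB, which this sketch does not model.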

2510.01175 2026-06-16 cs.LG eess.SP math.OC stat.ML 版本更新

On the Benefits of Weight Normalization for Overparameterized Matrix Sensing

关于过参数化矩阵感知中权重归一化的优势

Yudong Wei, Liang Zhang, Bingcong Li, Niao He

发表机构 * ETH Zurich(苏黎世联邦理工学院)

AI总结 本文证明在过参数化矩阵感知中,权重归一化结合黎曼优化可实现线性收敛,相比未使用归一化的方法获得指数级加速,且过参数化程度越高,迭代和样本复杂度多项式级降低。

AI中文摘要

尽管归一化技术在深度学习中广泛应用,但其理论理解仍然相对有限。在这项工作中,我们建立了(广义)权重归一化(WN)应用于过参数化矩阵感知问题的优势。我们证明,使用黎曼优化的WN实现了线性收敛,相比未使用WN的标准方法获得了指数级加速。我们的分析进一步表明,随着过参数化程度的增加,迭代和样本复杂度都多项式级地改善。据我们所知,这项工作首次描述了WN如何利用过参数化在矩阵感知中实现更快的收敛。

英文摘要

While normalization techniques are widely used in deep learning, their theoretical understanding remains relatively limited. In this work, we establish the benefits of (generalized) weight normalization (WN) applied to the overparameterized matrix sensing problem. We prove that WN with Riemannian optimization achieves linear convergence, yielding an exponential speedup over standard methods that do not use WN. Our analysis further demonstrates that both iteration and sample complexity improve polynomially as the level of overparameterization increases. To the best of our knowledge, this work provides the first characterization of how WN leverages overparameterization for faster convergence in matrix sensing.

2501.19401 2026-06-16 cs.LG stat.ML 版本更新

DAL: A Practical Prior-Free Black-Box Framework for Piecewise Stationary Bandits

DAL:一种面向分段平稳赌博机的实用无先验黑盒框架

Argyrios Gerogiannis, Yu-Han Huang, Subhonmesh Bose, Venugopal V. Veeravalli

发表机构 * Georgia Institute of Technology(佐治亚理工学院) University of California, Berkeley(加州大学伯克利分校)

AI总结 提出检测增强学习(DAL)框架,无需非平稳性先验知识,将任意最优静态赌博机算法与变化检测器结合,在多种非平稳场景下超越现有方法。

Comments 28 pages, 12 figures

AI中文摘要

我们引入了一种实用的黑盒框架,称为检测增强学习(DAL),用于解决无需底层非平稳性先验知识的分段平稳赌博机问题。DAL接受任何具有阶数最优遗憾的静态赌博机算法作为输入,并通过变化检测器对其进行增强,使其适用于所有常见的赌博机变体。大量实验表明,DAL在各种非平稳场景中(包括合成基准和真实世界数据集)始终优于所有最先进的方法,凸显了其通用性和可扩展性。我们提供了对DAL强大经验性能的理论见解,并辅以彻底的经验验证。

英文摘要

We introduce a practical, black-box framework termed Detection Augmented Learning (DAL) for the problem of piecewise stationary bandits without knowledge of the underlying non-stationarity. DAL accepts any stationary bandit algorithm with order-optimal regret as input and augments it with a change detector, enabling applicability to all common bandit variants. Extensive experimentation demonstrates that DAL consistently surpasses all state-of-the-art methods across diverse non-stationary scenarios, including synthetic benchmarks and real-world datasets, underscoring its versatility and scalability. We provide theoretical insights into DAL's strong empirical performance, complemented by thorough empirical validation.

2508.03867 2026-06-16 math.AG cs.LG stat.ML 版本更新

Constraining the outputs of ReLU neural networks

约束ReLU神经网络的输出

Yulia Alexandr, Guido Montúfar

发表机构 * University of California, Los Angeles(加州大学洛杉矶分校) Max Planck Institute for Mathematics in the Sciences(马克斯·普朗克科学数学研究所)

AI总结 通过引入与ReLU网络相关的代数簇,利用激活区域内的秩约束推导多项式方程,刻画网络可表示的函数,并研究簇达到预期维度的条件。

Comments 33 pages, 4 figures

AI中文摘要

我们引入了一类与ReLU神经网络自然相关的代数簇,这些代数簇源于网络输出在输入空间激活区域上的分段线性结构,以及在参数空间上的分段多线性结构。通过分析每个激活区域内网络输出的秩约束,我们推导出刻画网络可表示函数的多项式方程。我们进一步研究了这些簇达到预期维度的条件,从而深入理解ReLU网络的表达能力和结构特性。

英文摘要

We introduce a class of algebraic varieties naturally associated with ReLU neural networks, arising from the piecewise linear structure of their outputs across activation regions in input space, and the piecewise multilinear structure in parameter space. By analyzing the rank constraints on the network outputs within each activation region, we derive polynomial equations that characterize the functions representable by the network. We further investigate conditions under which these varieties attain their expected dimension, providing insight into the expressive and structural properties of ReLU networks.

2505.06589 2026-06-16 stat.ML cs.AI math.OC 版本更新

Optimal Transport for Machine Learners

机器学习者的最优传输

Gabriel Peyré

AI总结 本书从机器学习角度介绍最优传输(OT)技术,涵盖从Monge映射、Kantorovich对偶到Sinkhorn算法等核心方法,并展示其在损失函数、生成模型、领域适应、梯度流等ML任务中的应用。

AI中文摘要

现代机器学习反复操作概率测度:经验数据集、生成样本、潜在分布、类别条件律、粒子系统、宽网络权重和注意力模式。最优传输在此场景中很有用,因为它通过询问质量应如何移动来比较这些对象。因此,它结合了具有统计意义的差异概念与插值几何、对偶证书和变分动力学。这使得OT成为损失函数、生成建模、领域适应、鲁棒学习、重心、梯度流和学习算法的平均场描述的通用语言。本书以这些机器学习用途为出发点,介绍主要的OT技术。它从有限分配和Monge映射视角开始,过渡到Kantorovich耦合和对偶势,然后解释使传输可用的算法思想:线性规划、半离散单元、Sinkhorn缩放和低维投影。随后,相同的对象被重新用作测度几何,给出Wasserstein距离、重心、梯度流、动态公式和高斯/Bures公式。最后几章强调与现代ML最相关的变体:散度和对抗损失、熵松弛和非平衡松弛、鲁棒或谱地面几何、Gromov和量子扩展,以及基于传输的生成模型、平均场网络和注意力动态视图。目标是保持数学的明确性,同时揭示将OT转化为机器学习者可用工具箱所需的计算和几何直觉。

英文摘要

Modern machine learning repeatedly manipulates probability measures: empirical datasets, generated samples, latent distributions, class-conditional laws, particle systems, weights of wide networks and attention patterns. Optimal transport is useful in this setting because it compares such objects by asking how mass should move. It therefore combines a statistically meaningful notion of discrepancy with a geometry of interpolation, dual certificates and variational dynamics. This makes OT a common language for losses, generative modeling, domain adaptation, robust learning, barycenters, gradient flows and mean-field descriptions of learning algorithms. This book presents the main OT techniques with these machine-learning uses in mind. It starts from finite assignment and the Monge map viewpoint, passes to Kantorovich couplings and dual potentials, and then explains the algorithmic ideas that make transport usable: linear programming, semi-discrete cells, Sinkhorn scaling and low-dimensional projections. The same objects are then reused as a geometry of measures, giving Wasserstein distances, barycenters, gradient flows, dynamic formulations and Gaussian/Bures formulas. The final chapters emphasize the variants most relevant to modern ML: divergences and adversarial losses, entropic and unbalanced relaxations, robust or spectral ground geometries, Gromov and quantum extensions, and transport-based views of generative models, mean-field networks and attention dynamics. The goal is to keep the mathematics explicit while exposing the computational and geometric intuitions needed to turn OT into a working toolbox for machine learners.
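Of the algorithmic ideas listed in the abstract, Sinkhorn scaling is the easiest to show in a few lines. The sketch below is a plain-Python version for two small histograms, written for illustration rather than taken from the book; the regularization strength `eps` and the iteration count are arbitrary choices.

```python
import math


def sinkhorn(a, b, cost, eps=0.5, iters=200):
    """Entropic optimal transport between histograms a and b.

    Alternately rescales the rows and columns of the Gibbs kernel
    K = exp(-cost / eps) until the plan diag(u) K diag(v) has marginals a and b.
    """
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


a = [0.5, 0.5]
b = [0.25, 0.75]
cost = [[0.0, 1.0], [1.0, 0.0]]
plan = sinkhorn(a, b, cost)
transport_cost = sum(plan[i][j] * cost[i][j] for i in range(2) for j in range(2))
```

Smaller `eps` brings the plan closer to the unregularized Kantorovich solution but slows convergence and can underflow in this naive form; log-domain updates are the usual remedy.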

2409.18909 2026-06-16 cs.LG cs.IT math.IT stat.ML 版本更新

Best Arm Identification with Minimal Regret

最小化遗憾的最佳臂识别

Junwen Yang, Vincent Y. F. Tan, Tianyuan Jin

发表机构 * Institute of Operations Research and Analytics, National University of Singapore(新加坡国立大学运筹学与分析研究所) Department of Mathematics, National University of Singapore(新加坡国立大学数学系) Department of Electrical and Computer Engineering, National University of Singapore(新加坡国立大学电气与计算机工程系)

AI总结 提出在最小化累积遗憾的同时以置信度δ识别最佳臂的问题,利用信息论推导下界,并设计渐近最优的Double KL-UCB算法。

AI中文摘要

受需要负责任实验的现实应用启发,我们提出了最小化遗憾的最佳臂识别(BAI)问题。这一多臂老虎机问题的变体优雅地融合了其两个最普遍的目标:遗憾最小化和BAI。更准确地说,智能体的目标是以规定的置信水平δ识别最佳臂,同时最小化直到停止时间的累积遗憾。聚焦于单参数指数族分布,我们利用信息论技术建立了期望累积遗憾的实例相关下界。此外,我们提出了一个不可能结果,强调了固定置信度BAI中累积遗憾与样本复杂度之间的张力。作为补充,我们设计并分析了Double KL-UCB算法,该算法在置信水平趋近于零时达到渐近最优性。值得注意的是,该算法采用两种不同的置信界限以随机方式指导臂选择。我们的发现阐明了遗憾最小化与BAI之间内在联系的新视角。

英文摘要

Motivated by real-world applications that necessitate responsible experimentation, we introduce the problem of best arm identification (BAI) with minimal regret. This variant of the multi-armed bandit problem elegantly amalgamates two of its most ubiquitous objectives: regret minimization and BAI. More precisely, the agent's goal is to identify the best arm with a prescribed confidence level $\delta$, while minimizing the cumulative regret up to the stopping time. Focusing on single-parameter exponential families of distributions, we leverage information-theoretic techniques to establish an instance-dependent lower bound on the expected cumulative regret. Moreover, we present an impossibility result that underscores the tension between cumulative regret and sample complexity in fixed-confidence BAI. Complementarily, we design and analyze the Double KL-UCB algorithm, which achieves asymptotic optimality as the confidence level tends to zero. Notably, this algorithm employs two distinct confidence bounds to guide arm selection in a randomized manner. Our findings elucidate a fresh perspective on the inherent connections between regret minimization and BAI.
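As background for the KL-based confidence bounds mentioned above, here is a sketch of the generic KL-UCB upper index for Bernoulli rewards, computed by bisection. It is not the Double KL-UCB algorithm itself: the abstract does not say how its two bounds are defined or randomized, so the exploration `level` is left as a free parameter.

```python
import math


def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped away from 0 and 1."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_ucb_index(mean, pulls, level, tol=1e-9):
    """Largest q >= mean such that pulls * KL(mean, q) <= level (up to tol)."""
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if pulls * kl_bernoulli(mean, mid) <= level:
            lo = mid  # mid is still inside the confidence region
        else:
            hi = mid
    return lo


index_100 = kl_ucb_index(mean=0.5, pulls=100, level=math.log(100))
index_1000 = kl_ucb_index(mean=0.5, pulls=1000, level=math.log(100))
```

Calling `kl_ucb_index` with two different values of `level` (for example one tied to the round number and one tied to $\log(1/\delta)$) would give two distinct bounds; whether that matches the paper's construction is an assumption.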

2405.15768 2026-06-16 stat.ML cs.AI cs.LG 版本更新

Canonical Variates in Wasserstein Metric Space

Wasserstein度量空间中的典型变量

Jia Li, Lin Lin

发表机构 * Department of Statistics, The Pennsylvania State University(宾夕法尼亚州立大学统计学系) Department of Biostatistics and Bioinformatics, Duke University(杜克大学生物统计学与生物信息学系)

AI总结 针对分布数据分类问题,提出基于Wasserstein距离的Fisher比最大化降维方法,通过迭代优化算法实现,实验证明能显著提升分类性能。

Comments single space 39 pages, 10 figures

AI中文摘要

在本文中,我们处理由向量空间上的分布(而非单个点)表示的实例的分类问题。我们考虑基于成对距离的分类算法,特别是分布之间的Wasserstein度量。我们研究的核心是在Wasserstein度量空间中进行降维以提高分类准确性。我们引入了一种基于最大化Fisher比(定义为类间变异与类内变异之比)原理的新方法。该比值最大化的方向被称为判别坐标或典型变量轴。在实践中,类间变异和类内变异被定义为分布对之间的平均平方Wasserstein距离,这些分布对要么属于同一类,要么属于不同类。该比值优化通过一种迭代算法实现,该算法在向量空间中的最优传输和最大化步骤之间交替进行。进行了实证研究以评估算法的收敛性;实验结果表明,降维技术显著提高了分类性能。此外,新方法优于基于从分布数据派生的向量表示运行的成熟算法。它对实例如何由分布总结的变化(例如高斯混合模型表示中的分量数量)也表现出鲁棒性。

英文摘要

In this paper, we address the classification of instances represented by distributions on a vector space rather than single points. We consider classification algorithms based on pairwise distances, specifically, the Wasserstein metric between distributions. Central to our investigation is dimension reduction within the Wasserstein metric space to enhance classification accuracy. We introduce a novel approach grounded in the principle of maximizing Fisher's ratio, defined as the quotient of between-class variation to within-class variation. The directions in which this ratio is maximized are termed discriminant coordinates or canonical variates axes. In practice, both between-class and within-class variations are defined as the average squared Wasserstein distances between pairs of distributions, with the pairs either belonging to the same class or to different classes. This ratio optimization is achieved through an iterative algorithm, which alternates between optimal transport and maximization steps within the vector space. Empirical studies are conducted to assess the algorithm's convergence; and experimental results demonstrate that the dimension reduction technique substantially enhances classification performance. Moreover, the new method outperforms well-established algorithms that operate on vector representations derived from distributional data. It also exhibits robustness to variations in how instances are summarized by distributions, such as the number of components in a Gaussian mixture model (GMM) representation.

8. 生物统计与医学统计 9 篇

2606.16726 2026-06-16 q-bio.QM stat.AP 新提交

Too Few or Too Many? Sample Size Estimation for Differential Abundance Studies

太少还是太多?差异丰度研究的样本量估计

Michael Agronah, Benjamin M. Bolker

AI总结 提出一种基于效应大小、平均丰度和统计功效的样本量计算方法,通过R包power.nb实现,并利用30个真实微生物组数据集验证,发现现有研究样本量不足。

AI中文摘要

确定适当的研究样本量是规划科学研究的关键步骤。适当的样本量规划可避免样本量不足和过度膨胀。样本量过大会浪费资源、受试者的时间和精力以及实验动物的生命。样本量不足(一个更常见的问题)会因无法检测到生物学上有意义的差异而浪费更多资源,并助长可疑的研究实践,如$p$-hacking。微生物组研究尤其受到小样本量的挑战,特别是在人类受试者或昂贵动物模型的研究中。在实践中,差异丰度研究中分类群的统计功效受效应大小(通常量化为倍数变化)、单个分类群的平均丰度和样本数量的影响。我们提出了一种新的样本量计算方法,用于差异丰度研究,作为效应大小、平均丰度和分类群统计功效的函数。我们的方法已在power.nb R包中实现,可从https://michaelagronah.com/power.nb/articles/stub.html获取。我们利用从30个真实世界微生物组数据集中获得的分类群平均丰度和倍数变化估计值,应用我们的模型进行样本量计算。结果表明,差异丰度微生物组研究需要比当前文献中普遍存在的样本量更大的样本量,才能达到足够的统计功效。我们的框架将帮助研究人员就适当的样本量做出明智的决策。

英文摘要

Determining an appropriate sample size for a study is a crucial step in planning scientific research. Appropriate sample size planning avoids both inadequate and inflated sample sizes. Inflated sample sizes waste resources, the time and effort of human subjects, and the lives of experimental animals. Inadequate sample sizes, a much more common problem, waste even more resources through the inability to detect biologically meaningful differences and encourage questionable research practices like $p$-hacking. Microbiome studies are particularly challenged by small sample sizes, especially in studies of human subjects or expensive animal models. In practice, the statistical power of taxa within a differential abundance study is influenced by the effect size (typically quantified as fold change), mean abundance of individual taxa, and the number of samples. We present a novel approach for sample size calculation for differential abundance studies as a function of effect size, mean abundance and statistical power of taxa. Our method is implemented in the power.nb R package, available at https://michaelagronah.com/power.nb/articles/stub.html. We applied our model for sample size calculation using estimates of mean abundance and fold change of taxa obtained from thirty real-world microbiome datasets. Our results showed that differential abundance microbiome studies require larger sample sizes than are currently prevalent in the literature to achieve adequate statistical power. Our framework will help researchers make informed decisions about appropriate sample sizes.

2606.16460 2026-06-16 stat.AP stat.ME 新提交

Module-Structured Mixture Factor Models to Identify Outcome-Specific Signatures in Gene Expression Data

模块化结构混合因子模型识别基因表达数据中的结果特异性特征

Jinran Wu, Geoffrey J. McLachlan, Saumyadipta Pyne

AI总结 提出模块化结构混合因子模型,结合有限混合建模与基因模块级低秩因子表示,分解表达变异性,实现可解释的无监督疾病亚型识别。

Comments 24 pages, 2 figures

详情
AI中文摘要

高通量基因表达数据表现出高维度、复杂的基因间依赖性和样本间显著的生物学异质性,给无监督聚类和疾病亚型发现带来了重大挑战。我们引入了一种模块化结构混合因子模型,该模型将有限混合建模与在基因模块级别定义的低秩潜在因子表示相结合。通过在均值和协方差结构中显式建模基因模块,所提出的框架将表达变异性分解为全局基因特异性效应、簇特异性模块级偏移、模块内的潜在依赖性以及基因特异性残差噪声。开发了一种期望-条件最大化算法用于参数估计,允许在高维转录组学环境中进行稳定且可扩展的推断。该框架利用大型临床转录组数据集,能够对两种自身免疫性疾病中与疾病相关的分子亚型和表型异质性进行可解释的无监督识别。

英文摘要

High-throughput gene expression data exhibit high dimensionality, complex intergene dependence, and pronounced biological heterogeneity across samples, presenting major challenges for unsupervised clustering and disease subtype discovery. We introduce a module-structured mixture factor model that combines finite mixture modelling with low-rank latent factor representations defined at the gene-module level. By explicitly modelling gene modules in both the mean and covariance structure, the proposed framework decomposes expression variability into global gene-specific effects, cluster-specific module-level shifts, latent dependence within modules, and gene-specific residual noise. An Expectation--Conditional Maximisation algorithm is developed for parameter estimation, allowing stable and scalable inference in high-dimensional transcriptomic settings. This framework enables interpretable unsupervised identification of disease-associated molecular subtypes and phenotypic heterogeneity across two autoimmune diseases using a large clinical transcriptomic dataset.

2606.15478 2026-06-16 stat.ME 新提交

A Bayesian Functional Accelerated Failure-Time Model with Varying Effects Correcting for Measurement Error

贝叶斯函数加速失效时间模型:考虑测量误差的变效应

Joseph Yang, Roger Zoh, Carmen Tekwe, Lan Xue

AI总结 提出贝叶斯函数加速失效时间模型,通过高斯过程单指标结构建模功能系数随时间和标量协变量的变化,并利用工具变量处理功能协变量的测量误差,以分析可穿戴设备数据中步数活动与缺血性卒中死亡时间的关系。

详情
AI中文摘要

在许多生物医学环境中,作为连续观测轨迹收集的功能数据自然出现,一个关键的推断目标是理解这些功能协变量如何与时间-事件结局相关,同时允许这种关系在由标量特征定义的子组之间变化。现有的函数加速失效时间(AFT)模型的频率学派方法难以灵活捕捉标量协变量和时间对功能效应的联合非线性影响,并且没有充分解决经常污染功能观测暴露的测量误差。我们提出一个贝叶斯函数AFT模型,其中功能系数是时间和一组标量协变量的变效应函数,通过高斯过程单指标结构建模,为子组修正提供灵活的非线性框架。功能协变量中的测量误差通过配对代理观测与工具变量来处理,该工具变量适应与潜在功能暴露的非线性关联。这种功能数据的一个主要来源是可穿戴设备,它可以连续监测身体活动(PA)行为模式随时间的变化,但其输出众所周知容易受到测量误差的影响,并且在不同人口统计子组中表现出与健康结果的异质性关联。通过模拟,我们表明我们的方法恢复了真实的变功能效应,并相对于忽略测量误差的朴素模型减少了偏差。我们将我们的方法应用于中风地理和种族差异原因(REGARDS)研究,以调查步数身体活动如何与不同种族和地区组中的缺血性卒中死亡时间相关。

英文摘要

Functional data collected as continuously observed trajectories arise naturally in many biomedical settings, and a key inferential goal is understanding how such functional covariates relate to time-to-event outcomes while allowing that relationship to vary across subgroups defined by scalar characteristics. Existing frequentist approaches to functional accelerated failure-time (AFT) models struggle to flexibly capture the joint, nonlinear influence of scalar covariates and time on the functional effect, and none adequately address the measurement error that frequently contaminates functionally observed exposures. We propose a Bayesian functional AFT model in which the functional coefficient is a varying effect function of both time and a set of scalar covariates, modeled through a Gaussian process single-index structure that provides a flexible, nonlinear framework for subgroup modification. Measurement error in the functional covariate is handled by pairing a proxy observation with an instrumental variable that accommodates non-linear associations with the latent functional exposure. A prominent source of such functional data is wearable devices, which can continuously monitor physical activity (PA) behavioral patterns over time, yet whose outputs are well known to be prone to measurement error and to exhibit heterogeneous associations with health outcomes across demographic subgroups. Through simulations, we show that our approach recovers the true varying functional effects and reduces bias relative to naïve models that ignore measurement error. We apply our methods to the Reasons for Geographical and Racial Differences in Stroke (REGARDS) study to investigate how step-count physical activity relates to time-to-death from ischemic stroke across racial and regional groups.

2606.15445 2026-06-16 stat.ME 新提交

Interim Monitoring as an Information-Time Alignment Problem: The WCR Framework for Time-to-Event Trials

作为信息时间对齐问题的期中监测:用于时间至事件试验的WCR框架

Haitao Pan, Zhongheng Cai

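AI summary Proposes the WCR framework, which parameterizes follow-up maturity through a locked cohort and a calibrated follow-up requirement, resolves the timing tension between event-driven and enrollment-driven designs, and controls type I error and power while balancing calendar time, maturity, and decision lag.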

Comments Main manuscript with supplementary material. R package WCRBayesDesign is available on CRAN. Submitted to Biometrics

AI中文摘要

时间至事件试验中的期中监测必须在推断成熟度与操作上有意义的时机之间取得平衡。事件驱动设计将分析事件累积对齐,但可能产生大量且不可预测的日历延迟,而入组驱动设计提供可预测的时机,但可能依赖于不成熟的随访。我们提出窗口队列与校准随访要求(WCR)框架,该框架通过锁定队列规模和锁定后随访要求直接参数化随访成熟度。期中分析在预设队列累积了校准的最小随访时间后进行,此时入组可能继续,后续患者保留用于最终分析。该框架区分了用于里程碑生存估计的受限随访和用于比例风险估计的非受限随访,从而将有效信息范围与估计量联系起来。设计参数和决策阈值通过约束优化联合校准,以控制I类错误和功效,同时平衡日历时间、期中成熟度和决策延迟负担。由一项罕见儿科肿瘤学试验驱动的模拟研究表明,WCR在校准模型下达到目标操作特征,并提供比传统事件驱动和入组驱动方法更稳定且可解释的期中时机。该方法已在开源R包WCRBayesDesign中实现,可从CRAN获取。WCR将期中监测重新定义为信息时间对齐问题,并为事件稀疏、入组缓慢和长终点周期的单臂试验提供了实用设计策略。

英文摘要

Interim monitoring in time-to-event trials must balance inferential maturity with operationally meaningful timing. Event-driven designs align analyses with event accumulation but can produce substantial and unpredictable calendar delays, whereas enrollment-driven designs provide predictable timing but may rely on immature follow-up. We propose the Window-Cohort with Calibrated Follow-Up Requirement (WCR) framework, which directly parameterizes follow-up maturity through a locked cohort size and a post-lock follow-up requirement. The interim analysis is conducted after the prespecified cohort has accrued the calibrated minimum follow-up, while enrollment may continue and later patients are reserved for the final analysis. The framework distinguishes restricted follow-up for landmark survival estimands from unrestricted follow-up for proportional hazards estimands, thereby linking the effective information horizon to the estimand. Design parameters and decision thresholds are jointly calibrated through constrained optimization to control type I error and power while balancing calendar time, interim maturity, and decision-lag burden. Simulation studies motivated by a rare pediatric oncology trial show that WCR attains target operating characteristics under the calibration model and offers more stable and interpretable interim timing than conventional event-driven and enrollment-driven approaches. The methodology is implemented in the open-source R package WCRBayesDesign, available on CRAN. WCR reframes interim monitoring as an information-time alignment problem and provides a practical design strategy for single-arm trials with sparse events, slow accrual, and long-horizon endpoints.
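
As a rough illustration of the timing rule described in this abstract, the sketch below computes the calendar time at which an interim analysis would be triggered once a locked cohort has accrued a minimum post-lock follow-up. The function names and parameters are hypothetical and are not taken from the WCRBayesDesign package; the joint calibration of these parameters by constrained optimization is not shown.

```python
def interim_calendar_time(enroll_times, cohort_size, min_follow_up):
    """Calendar time at which the locked cohort meets the follow-up requirement.

    enroll_times  : enrollment calendar times (any order)
    cohort_size   : number of earliest-enrolled patients locked for the interim
    min_follow_up : follow-up required after the cohort is locked (same units)
    """
    if cohort_size < 1 or cohort_size > len(enroll_times):
        raise ValueError("cohort_size must be between 1 and the number enrolled")
    ordered = sorted(enroll_times)
    lock_time = ordered[cohort_size - 1]  # cohort locks when its last patient enrolls
    return lock_time + min_follow_up


def enrolled_after_lock_by_interim(enroll_times, cohort_size, min_follow_up):
    """Enrollment times of later patients who arrive before the interim analysis.

    In this sketch these patients are simply held back for the final analysis.
    """
    t_interim = interim_calendar_time(enroll_times, cohort_size, min_follow_up)
    ordered = sorted(enroll_times)
    return [t for t in ordered[cohort_size:] if t <= t_interim]
```

Unlike an event-driven trigger, the interim time here depends only on accrual and the follow-up requirement, so it is known as soon as the cohort is locked.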

2606.15397 2026-06-16 q-bio.PE stat.AP 新提交

On the Equivalence of Instantaneous and Mechanistic Reproduction Numbers

瞬时繁殖数与机制繁殖数的等价性

Jeremy Goldwasser, Ryan J. Tibshirani, Alyssa Bilinski

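AI summary Proves that, under homogeneous mixing, the instantaneous reproduction number defined via the renewal equation is equivalent to the mechanistic reproduction number of compartmental models such as SEIR, and derives the generation interval distribution implied by SEIR dynamics.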

AI中文摘要

有效繁殖数($R_t$)被广泛用于实时追踪流行病动态。标准估计框架使用通过更新方程定义的“瞬时$R_t$”,该方程通过世代间隔分布将新感染与过去感染联系起来。SEIR等房室模型基于有效接触率和传染期长度产生一个看似不同的量——“机制$R_t$”。我们证明在均匀混合(房室模型的标准假设)下,这两个定义是等价的。我们还推导了SEIR动力学隐含的世代间隔分布。一个实际后果是,通常被视为更新方程估计器的低假设输入的世代间隔,实际上编码了特定的房室结构。

英文摘要

The effective reproduction number ($R_t$) is widely used to track epidemic dynamics in real time. The standard estimation framework uses "instantaneous $R_t$," defined via the renewal equation, which relates new infections to past infections through a generation interval distribution. Compartmental models like SEIR yield a seemingly distinct quantity, "mechanistic $R_t$," based on the effective contact rate and duration of infectiousness. We prove these two definitions are equivalent under homogeneous mixing, the standard assumption in compartmental modeling. We also derive the generation interval distribution implied by SEIR dynamics. A practical consequence is that generation intervals, often treated as assumption-light inputs to renewal equation estimators, in fact encode specific compartmental structure.
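
The two definitions contrasted in this abstract can be written down in a few lines. The sketch below uses the textbook discrete renewal equation and the textbook mechanistic form for a homogeneously mixing compartmental model. The function names are illustrative, and this is not the authors' derivation or estimator.

```python
def instantaneous_rt(incidence, gen_interval):
    """Estimate R_t = I_t / sum_s w_s * I_{t-s} from the renewal equation.

    incidence    : new infections per time step, I_0, I_1, ...
    gen_interval : weights w_1, w_2, ... of the generation interval distribution
    Returns a list with None wherever the denominator is zero.
    """
    rt = []
    for t in range(len(incidence)):
        # Infection pressure: past infections weighted by the generation interval
        pressure = sum(
            w * incidence[t - s]
            for s, w in enumerate(gen_interval, start=1)
            if t - s >= 0
        )
        rt.append(incidence[t] / pressure if pressure > 0 else None)
    return rt


def mechanistic_rt(beta, gamma, susceptible, population):
    """Contact rate times mean infectious duration times susceptible share."""
    return beta / gamma * susceptible / population
```

Early values of `instantaneous_rt` are computed from a truncated history and should be read with caution.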

2606.15145 2026-06-16 stat.ME 新提交

On the estimation of the median odds ratio for measuring contextual effects in multilevel binary data from complex survey designs

复杂调查设计中多水平二值数据中测量情境效应的中位数比值比估计

Shafayet Khan Shafee, M. Shafiqur Rahman

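AI summary For multilevel binary data, proposes Delta-method interval estimation of the median odds ratio (MOR) for two-level and three-level models; simulations show small bias and satisfactory coverage probability for moderate to large samples.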

Comments 16 pages, 1 figure, 3 tables and 3 supplementary tables; supplementary material included as an appendix within the same file

AI中文摘要

在具有聚类或分层数据结构的研究中,量化组间异质性(称为情境效应)对于有效的组级推断至关重要。中位数比值比(MOR)源自聚类二值数据的随机效应(RE)逻辑回归模型,提供了对情境效应的直观评估。现有研究大多关注两水平模型MOR的点估计,对其在复杂多水平结构下的统计性质探索有限。然而,相应的区间估计量的开发对于统计推断至关重要。此外,许多现实世界数据集,特别是来自多阶段调查的数据集,涉及超过两水平的分层结构,其中每个水平的情境效应都值得关注。本文讨论了两水平和三水平二值数据MOR的估计,特别强调区间估计。由于MOR是基于RE logit模型方差分量的后估计量,其置信区间使用Delta方法推导,将对数变换后的MOR视为渐近正态。该方法在两水平和三水平设置的不同模型规范中进行了演示。一项广泛的模拟研究评估了MOR估计量在分层数据设置的不同场景下的性能。结果表明,对于中到大样本,估计量表现出可忽略的偏差和满意的95%置信区间覆盖概率,小样本偏差主要归因于方差分量估计。将所提方法应用于估计剖宫产的情境效应表明,该框架增强了可解释性,并支持更明智的统计和政策导向分析。

英文摘要

In studies with clustered or hierarchical data structures, quantifying between-cluster heterogeneity, referred to as contextual effects, is crucial for valid cluster-level inference. The median odds ratio (MOR), derived from random effects (RE) logistic regression models for clustered binary data, provides an intuitive assessment of contextual effects. Most existing research focuses on point estimation of the MOR for two-level models, with limited exploration of its statistical properties under complex multilevel structures. However, the development of corresponding interval estimators is essential for statistical inference. Moreover, many real-world datasets, particularly those from multistage surveys, involve hierarchical structures beyond two levels, where contextual effects at each level are of interest. This paper discusses the estimation of the MOR for both two- and three-level binary data, with particular emphasis on interval estimation. Since the MOR is a post-estimation measure based on variance components of the RE logit model, its confidence interval is derived using the Delta method, treating the log-transformed MOR as asymptotically normal. The approach is demonstrated across different model specifications in two- and three-level settings. An extensive simulation study evaluated the performance of the MOR estimators across diverse scenarios in hierarchical data settings. The results showed that the estimators exhibited negligible bias and satisfactory coverage probability of a 95% confidence interval for moderate to large samples, with small-sample bias mainly due to variance component estimation. An application of the methods for estimating the contextual effect on C-section delivery demonstrated that the proposed framework enhances interpretability and supports more informed statistical and policy-oriented analyses.
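
To make the interval construction concrete, here is a minimal sketch that assumes the standard definition MOR = exp(sqrt(2σ²)·Φ⁻¹(0.75)) for a single random-intercept variance σ², and applies the Delta method on the log scale. The function name and the choice to parameterize by the variance (rather than the standard deviation) are assumptions; the paper's exact formulas for two- and three-level models may differ.

```python
from math import exp, sqrt
from statistics import NormalDist


def mor_with_ci(var_re, se_var_re, level=0.95):
    """Median odds ratio with a Delta-method confidence interval.

    var_re    : estimated random-intercept variance (sigma^2) at one level
    se_var_re : standard error of that variance estimate
    Returns (MOR, lower limit, upper limit).
    """
    if var_re <= 0:
        raise ValueError("variance component must be positive")
    z75 = NormalDist().inv_cdf(0.75)  # 75th percentile of N(0, 1)
    log_mor = sqrt(2.0 * var_re) * z75
    # The derivative of sqrt(2 * sigma^2) with respect to sigma^2 is 1 / sqrt(2 * sigma^2)
    se_log_mor = z75 * se_var_re / sqrt(2.0 * var_re)
    zc = NormalDist().inv_cdf(0.5 + level / 2.0)
    return (
        exp(log_mor),
        exp(log_mor - zc * se_log_mor),
        exp(log_mor + zc * se_log_mor),
    )
```

Building the interval for log(MOR) and then exponentiating keeps both limits positive, which matches the abstract's treatment of the log-transformed MOR as asymptotically normal.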

2606.14902 2026-06-16 stat.ME stat.AP 新提交

Heterogeneous behavioral mechanisms in epidemiological models

流行病学模型中的异质性行为机制

Jessica Pavani, Rob Deardon, Alexandra M. Schmidt

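AI summary Proposes a Bayesian mixture model that splits the population into risk-neutral and risk-averse groups; integrating this behavioral heterogeneity markedly improves parameter recovery, epidemic trajectory estimation, and forecasting precision.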

AI中文摘要

传统流行病模型通常假设行为同质性。易感-感染-恢复模型为描述疾病传播提供了坚实基础,但未考虑人们实际如何应对风险。相比之下,行为变化模型纳入了捕捉个体在疫情期间调整行为的机制,认识到感染风险上升通常会激发保护行为。然而,这两种方法都有一个关键局限:它们忽视了人群固有的异质性。现实中,社区是风险容忍度和行为倾向的复杂混合体。忽略这种固有异质性会掩盖个体感知和应对疾病威胁的重要差异。本文提出一种新颖的贝叶斯混合模型,通过将人群划分为两种不同的行为模式来克服这一局限:风险中性个体(维持基线接触率)和风险规避个体(根据疫情严重程度调节行为)。通过将这些不同的动态整合到一个统一的传播框架中,所提出的模型明确考虑了聚合方法常忽略的多样化人群行为。通过模拟研究和实证数据应用,我们证明该方法在参数恢复、疫情轨迹估计和预测精度方面显著优于传统模型。结果表明,未能考虑行为多样性会导致峰值估计偏差和人为拉长的流行曲线。因此,本研究为预测社会分化环境中的疫情暴发轨迹提供了更精细的计算工具,确保公共卫生干预策略建立在行为现实的基础上。

英文摘要

Traditional epidemic models frequently assume behavioral homogeneity. The susceptible-infected-recovered model provides a robust foundation for characterizing disease transmission, but it does so without accounting for how people actually respond to risk. In contrast, behavioral change models incorporate mechanisms that capture how individuals adjust their actions during an outbreak, recognizing that rising infection risk typically motivates protective behaviors. Yet both approaches share a key limitation: they overlook the inherent heterogeneity of a population. In reality, communities are a complex mixture of risk tolerances and behavioral tendencies. Ignoring this inherent heterogeneity can obscure important differences in how individuals perceive and respond to disease threats. This paper introduces a novel Bayesian mixture model designed to address this limitation by partitioning the population into two distinct behavioral patterns: risk-neutral individuals, who maintain baseline contact rates, and risk-averse individuals, who modulate their behavior in response to epidemic severity. By integrating these disparate dynamics into a unified transmission framework, the proposed model explicitly accounts for varying population behaviors often overlooked by aggregate approaches. Through simulation studies and empirical data applications, we demonstrate that this approach significantly outperforms traditional models in parameter recovery, epidemic trajectory estimation, and forecasting precision. The findings suggest that failing to account for behavioral diversity leads to biased peak estimates and artificially stretched epidemic curves. Consequently, this research provides a more nuanced computational toolkit for predicting outbreak trajectories in socially fragmented environments, ensuring that public health intervention strategies are informed by a foundation of behavioral realism.
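
The two behavioral patterns described in this abstract can be illustrated with a deterministic toy model. The sketch below is not the authors' Bayesian mixture model: the exponential damping of contacts for the risk-averse share, the parameter `kappa`, and the function name are all assumptions chosen only to show how mixing the two groups changes the effective contact rate.

```python
from math import exp


def simulate_mixture_sir(beta, gamma, pi_neutral, kappa, n_pop, i0, steps):
    """Discrete-time SIR in which a share of the population reduces contacts.

    pi_neutral : share of risk-neutral individuals (baseline contact rate)
    kappa      : hypothetical sensitivity of risk-averse individuals to prevalence
    Returns the infectious count at each time step.
    """
    s, i, r = float(n_pop - i0), float(i0), 0.0
    path = [i]
    for _ in range(steps):
        prevalence = i / n_pop
        # Risk-averse contacts shrink as prevalence rises (assumed functional form)
        averse_factor = exp(-kappa * prevalence)
        beta_eff = beta * (pi_neutral + (1.0 - pi_neutral) * averse_factor)
        new_inf = min(s, beta_eff * s * i / n_pop)
        new_rec = gamma * i
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        path.append(i)
    return path
```

Setting `pi_neutral=1.0` recovers a homogeneous SIR model, which gives a baseline for comparing peak height and timing against a mixed population.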

2604.18863 2026-06-16 stat.ME 版本更新

Overstuffed sandwiches and separation anxiety: finite-sample variance estimation for penalized GEE with near-separated binary data

过度填充三明治与分离焦虑:近分离二元数据下惩罚GEE的有限样本方差估计

Awan Afiaz, M. Shafiqur Rahman

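AI summary For near-separation in longitudinal binary data, proposes a new finite-sample variance estimator $\hat{V}_{AR}$ that keeps the score-level leverage correction and adds a finite-sample upward translation, giving conservative or near-nominal type I error in low-event, small-sample settings.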

Comments 56 pages, 9 figures, 7 tables. Includes supplementary appendix in the main file

AI中文摘要

惩罚广义估计方程(PGEE)稳定了近分离下纵向二元数据的点估计,但推断仍取决于三明治方差的校正方式。现有的PGEE校正方法可能在高杠杆方向上过度调整,需要限制性的合并假设,或在不解释偏差的情况下添加全局正则化。我们建立了沿收敛内根序列的PGEE一阶渐近性质,并推导了完全杠杆调整引起的参数特定过度校正的矩阵表征。有限样本校准受到均值偏差和杠杆校正方差估计变异性的限制。我们提出$\hat{V}_{AR}$,它保持得分级杠杆校正,并添加一个有限样本向上平移,其一阶主导项为有限总体因子,中心项较小。在模拟中,$\hat{V}_{AR}$在低事件、小$N$设置(包括$N=10$)下给出保守或接近名义I类错误率,而几种标准校正仍保持反保守,且对于非平衡设计,合并估计量不可用。

英文摘要

Penalized generalized estimating equations (PGEE) stabilize point estimation for longitudinal binary data under near-separation, but inference still depends on how the sandwich variance is corrected. Existing corrections for PGEE can overadjust in high-leverage directions, require restrictive pooling assumptions, or add global regularization without explaining the bias. We establish first-order asymptotics for PGEE along convergent interior-root sequences and derive a matrix characterization of the parameter-specific overcorrection induced by full leverage adjustment. Finite-sample calibration is limited by both mean bias and the variability of leverage-corrected variance estimates. We propose $\hat{V}_{AR}$, which keeps the score-level leverage correction and adds a finite-sample upward translation dominated at first order by the finite-population factor, with a smaller centering term. In simulations, $\hat{V}_{AR}$ gives conservative or near-nominal type I error in low-event, small-$N$ settings, including $N = 10$, where several standard corrections remain anti-conservative and pooling estimators are unavailable for unbalanced designs.

2602.18241 2026-06-16 stat.ME 版本更新

Online FDR Controlling procedures for statistical SIS Model and its application to COVID19 data

统计SIS模型的在线FDR控制程序及其在COVID19数据中的应用

Seohwa Hwang, Junyong Park

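AI summary Proposes an online FDR control method based on conditional local FDR for infectious disease datasets that are discrete and have complex dependencies, achieving FDR control with high detection power under the SIS model via a dynamic Bayesian network.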

Comments I have revised this paper substantially, so I think re-uploading it is required

AI中文摘要

我们提出了一种基于条件局部FDR(LIS)的在线错误发现率(FDR)控制方法,专为离散且具有复杂依赖性的传染病数据集设计。与现有的在线FDR方法(通常假设独立性或在依赖设置中统计功效较低)不同,我们的方法在现实流行病场景中有效控制FDR的同时保持高检测功效。对于疾病建模,我们在易感-感染-易感(SIS)模型(一种广泛使用的传染病流行病学框架)中建立了动态贝叶斯网络(DBN)结构。我们的方法除了滑动窗口宽度外不需要额外的调优参数,使其适用于实时疾病监测。从统计角度来看,我们证明了该方法在平稳和遍历依赖下确保有效的FDR控制,将在线假设检验扩展到更广泛的依赖性和离散数据集。此外,通过利用LIS(已被证明比传统的基于$p$值的方法更强大),我们的方法实现了比现有方法更高的统计功效。我们通过大量模拟和实际应用(包括传染病发病率数据分析)验证了该方法。结果表明,所提出的方法在保持严格FDR控制的同时,通过实现更高的检测功效优于现有方法。

英文摘要

We propose an online false discovery rate (FDR) controlling method based on conditional local FDR (LIS), designed for infectious disease datasets that are discrete and exhibit complex dependencies. Unlike existing online FDR methods, which often assume independence or suffer from low statistical power in dependent settings, our approach effectively controls FDR while maintaining high detection power in realistic epidemic scenarios. For disease modeling, we establish a Dynamic Bayesian Network (DBN) structure within the Susceptible-Infected-Susceptible (SIS) model, a widely used epidemiological framework for infectious diseases. Our method requires no additional tuning parameters apart from the width of the sliding window, making it practical for real-time disease monitoring. From a statistical perspective, we prove that our method ensures valid FDR control under stationary and ergodic dependencies, extending online hypothesis testing to a broader range of dependent and discrete datasets. Additionally, our method achieves higher statistical power than existing approaches by leveraging LIS, which has been shown to be more powerful than traditional $p$-value-based methods. We validate our method through extensive simulations and real-world applications, including the analysis of infectious disease incidence data. Our results demonstrate that the proposed approach outperforms existing methods by achieving higher detection power while maintaining rigorous FDR control.

9. Economics, Finance, and Social Science Statistics 11 papers

2606.15881 2026-06-16 stat.ME cs.LG stat.AP 新提交

Biarchetype analysis for univariate functional data. An application to macroeconomic financial time series

单变量函数数据的双原型分析及其在宏观经济金融时间序列中的应用

Aleix Alcacer, Rafael Benitez, Vicente J. Bolos, Irene Epifanio

Affiliations * Jaume I University, University of València

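AI summary Proposes biarchetype analysis, which identifies archetypal structure simultaneously across cases and time; applied to 10-year government bond yields of European countries, it reveals three time regimes and three country archetypes.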

Comments 6 pages, 2 figures. To be published in the proceedings of SIS-FENStatS 2026, Sapienza University of Rome, Italy, June 22-25, 2026

AI中文摘要

我们首次在单变量函数数据背景下引入双原型分析。这种无监督方法通过同时识别案例(在我们的应用中为国家)和时间参数上的原型结构,扩展了原型分析。案例和时间点都被表示为双原型的混合,从而得到复杂函数观测的简洁且高度可解释的表示。尽管双原型分析并非旨在作为一种聚类技术,但与双聚类方法相比,它提供了更优的可解释性,因为它基于极端的、有代表性的模式而非平均质心,从而增强了人类的理解。我们将所提出的方法应用于2001-2025年期间欧洲国家的10年期政府债券收益率。结果识别出三个不同的时间区间(危机前时期、欧元区主权债务危机时期和危机后时期),并揭示了德国、希腊和匈牙利作为国家原型。

英文摘要

We introduce biarchetype analysis for the first time in the context of univariate functional data. This unsupervised methodology extends archetype analysis by simultaneously identifying archetypal structures across both the cases (countries, in our application) and the temporal argument. Both cases and time points are expressed as mixtures of biarchetypes, yielding a concise and highly interpretable representation of complex functional observations. Although biarchetype analysis is not intended as a clustering technique, it offers superior interpretability compared with biclustering approaches, as it is based on extreme, representative patterns rather than average centroids, thereby enhancing human comprehension. We apply the proposed method to 10-year government bond yields of European countries over the period 2001-2025. The results identify three distinct time regimes (the pre-crisis period, the euro-area sovereign debt crisis, and the post-crisis period), and reveal Germany, Greece, and Hungary as country archetypes.

2606.15876 2026-06-16 stat.ME stat.AP 新提交

Archetypal analysis of European 10-year government bond yields with multidimensional scaling of two-mode three-way asymmetric dissimilarities

基于双模三向非对称相异度多维缩放的欧洲10年期政府债券收益率原型分析

Aleix Alcacer, Rafael Benitez, Vicente J. Bolos, Irene Epifanio

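AI summary Applies a recent method for extracting archetypal profiles from three-way asymmetric proximity data to asymmetric dissimilarity matrices built from the oriented wavelet squared coherence of 10-year government bond yields in 23 European countries, using h-plot visualization and archetypoid analysis to identify archetypal countries and quantify asymmetry.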

Comments 6 pages, 1 figure. To be published in the proceedings of SIS-FENStatS 2026, Sapienza University of Rome, Italy, June 22-25, 2026

AI中文摘要

一种从三向非对称邻近数据中提取原型轮廓的近期方法被应用于包含23×23×3双模三向非对称相异度矩阵的数据集。这些非对称相异度基于2001年至今三个时间区间内23个欧洲国家10年期政府债券收益率的定向小波平方相干性。首先,计算无条件双模三向数据的h-plot,并将其表示在统一的欧几里得空间中,提供直观且可解释的可视化。随后,进行原型分析。这种无监督方法识别出原型国家,并将所有剩余国家表示为这些原型实例的混合。此外,计算每个国家的不对称程度,进一步揭示相异度结构。提供数据集和代码以支持可重复研究。

英文摘要

A recent methodology for extracting archetypal profiles from three-way asymmetric proximity data is applied to a dataset comprising 23 x 23 x 3 two-mode, three-way asymmetric dissimilarity matrices. The asymmetric dissimilarities are based on the oriented wavelet squared coherence of 10-year government bond yields among 23 European countries across three time intervals from 2001 to the present. First, the h-plot is computed for the unconditional two-mode three-way data, which is then represented in a unified Euclidean space, providing an intuitive and interpretable visualization. Subsequently, archetypoid analysis is performed. This unsupervised methodology identifies the archetypal countries and expresses all remaining countries as mixtures of these archetypal instances. Additionally, the degree of asymmetry for each country is calculated, offering further insight into the structure of the dissimilarities. The dataset and code are provided to support reproducible research.

2606.15755 2026-06-16 q-fin.RM q-fin.ST stat.ME 新提交

A Multiplex Network Hawkes Model for Systemic Risk Measurement

用于系统性风险测量的多重网络霍克斯模型

Mante Zelvyte, Jim E. Griffin

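AI summary Proposes a multiplex network Hawkes model that separates multiple contagion channels and allows covariate-dependent excitation to study risk contagion in financial networks, finding that systemic risk is concentrated in outward flows from a small number of influential institutions.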

AI中文摘要

我们引入了多重网络霍克斯模型,该模型通过允许多个激励层(其权重依赖于观测到的边和节点协变量)扩展了Linderman & Adams (2014)的网络霍克斯框架。我们使用该模型研究金融网络中的传染如何受到不同传输渠道的影响。多重结构在单个推断的传输网络内分离了特定渠道的贡献,使得候选传播机制可以直接比较,而不是被吸收到一个同质的激励层中。依赖于协变量的激励使我们能够研究传输的来源。我们使用MCMC采样器对推断的有向网络及其激励动态进行后验推断。该应用使用了2004-2022年间涵盖99家北美和欧洲公司(包括银行、保险公司和非金融公司)的广泛跨行业信用违约互换(CDS)数据集。我们评估了与资产相似性、偿付能力和盈利能力相关的三个候选传染渠道。结果表明传染路径稀疏,系统性风险传播集中在少数有影响力机构的向外流动,而非机构之间的相互反馈。渠道结果显示,行业相似性是最持续支持的资产相似性效应,而聚合层贡献表明资产相似性、偿付能力和盈利能力渠道都对推断的激励有贡献。

英文摘要

We introduce the Multiplex Network Hawkes model, which extends the network Hawkes framework of Linderman & Adams (2014) by allowing multiple excitation layers whose weights depend on observed edge and node covariates. We use the model to investigate how contagion in financial networks is affected by different transmission channels. The multiplex structure separates channel-specific contributions within a single inferred transmission network, allowing candidate propagation mechanisms to be compared directly rather than being absorbed into one homogeneous excitation layer. Covariate-dependent excitation allows us to investigate sources of transmission. We make posterior inference about the inferred directed network and its excitation dynamics using an MCMC sampler. The application uses a broad cross-industry credit default swap (CDS) dataset of 99 North American and European firms, including banks, insurers and non-financial firms over 2004-2022. We evaluate three candidate contagion channels associated with asset similarity, solvency and profitability. The results indicate sparse contagion pathways, with systemic-risk transmission concentrated in outward flows from a small number of influential institutions rather than in mutual feedback between institutions. The channel results show that industry similarity is the most consistently supported asset-similarity effect, while aggregate layer contributions indicate that asset-similarity, solvency and profitability channels all contribute to inferred excitation.
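
To show what "multiple excitation layers" could look like computationally, the sketch below evaluates a multivariate Hawkes intensity whose node-to-node excitation is a weighted sum over layers. The shared exponential kernel, the fixed layer weights (the paper lets weights depend on edge and node covariates), and all names are assumptions for illustration, not the authors' specification.

```python
from math import exp


def multiplex_intensity(t, events, baseline, layer_weights, layer_mix, decay):
    """Conditional intensity of each node at time t under summed excitation layers.

    events        : list of (time, source_node) pairs
    baseline      : background rate mu_i for each node
    layer_weights : layer_weights[l][j][i], excitation from node j to node i in layer l
    layer_mix     : non-negative coefficient for each layer
    decay         : rate of the exponential kernel shared by all layers (assumption)
    """
    n = len(baseline)
    lam = list(baseline)
    for t_k, j in events:
        if t_k >= t:
            continue  # only strictly earlier events contribute
        kernel = decay * exp(-decay * (t - t_k))
        for i in range(n):
            # Channel-specific contributions add up within a single network
            total_w = sum(c * w[j][i] for c, w in zip(layer_mix, layer_weights))
            lam[i] += total_w * kernel
    return lam
```

Because each layer enters additively, the contribution of a single channel to a node's intensity can be read off by zeroing the other entries of `layer_mix`.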

2606.15058 2026-06-16 cs.LG stat.AP 新提交

Machine Learning and the Random Walk Puzzle: Forecasting the CAD/USD Exchange Rate with Expanding Window Evaluation and SHAP Interpretability

机器学习与随机游走难题:基于扩展窗口评估和SHAP可解释性的CAD/USD汇率预测

Louis Agyekum, Edmund Fosu Agyemang, Obu-Amoah Ampomah, Kofi Acheampong, Emmanuel Boadi, Priscilla Yaa Amakye, Fafa Shalom Tchorly, Enock Adu Bonsu, Eric Nyarko

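AI summary Examines whether machine learning models can beat the naive random walk benchmark in forecasting the monthly USD/CAD exchange rate, using expanding-window evaluation and SHAP interpretation; linear regression significantly outperforms the random walk, while ensemble models show only marginal differences.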

Comments 10 pages, 14 figures, 8 tables

AI中文摘要

本研究考察机器学习(ML)模型能否在预测月度美元/加元汇率时超越朴素随机游走基准。使用加拿大银行2017年1月至2026年5月的日度数据,重采样为113个月度观测值,评估了五种ML模型:线性回归、随机森林、梯度提升、XGBoost和AdaBoost。这些模型以朴素随机游走模型和带有Holt-Winters季节性的指数平滑(ETS)为基准。所有模型均采用扩展窗口框架评估以保持严格的样本外完整性,并使用Diebold-Mariano(DM)检验评估预测精度差异。结构断点检测识别出序列中的四个显著断点,分别对应2018年中美贸易战升级、2020年COVID-19经济复苏、2022年加拿大银行加息周期峰值以及2024年加拿大银行降息周期开始。应用SHAP(Shapley Additive Explanations)分析解释表现最佳ML模型的驱动因素。结果表明,朴素随机游走模型仍然是一个强大的基准。线性回归是唯一在统计上优于朴素随机游走模型的模型,DM统计量为3.0585,p值为0.0071,而ML集成模型仅显示出微小差异。采用扩展窗口框架的随机森林在所有模型(除随机游走外)中实现了最低的MAPE,为1.17%。SHAP分析证实,短期滞后(尤其是滞后1和滞后2)以及近期滚动均值主导预测,这与汇率的近随机游走行为一致。

英文摘要

This study examines whether machine learning (ML) models can outperform the naive random walk benchmark in forecasting the monthly USD/CAD exchange rate. Using daily data from the Bank of Canada spanning January 2017 to May 2026, resampled into 113 monthly observations, five ML models are evaluated: linear regression, random forest, gradient boosting, XGBoost, and AdaBoost. These models are benchmarked against the naive random walk model and exponential smoothing with Holt-Winters seasonality (ETS). All models are evaluated using an expanding-window framework to maintain strict out-of-sample integrity, and forecast-accuracy differences are assessed using the Diebold-Mariano (DM) test. Structural break detection identifies four significant breakpoints in the series, corresponding to the escalation of the US-China trade war in 2018, the COVID-19 economic recovery in 2020, the peak of the Bank of Canada rate-hiking cycle in 2022, and the start of the Bank of Canada rate-cutting cycle in 2024. SHAP, or Shapley Additive Explanations, analysis is applied to interpret the drivers of the best-performing ML model. The results show that the naive random walk model remains a formidable benchmark. Linear regression is the only model that statistically outperforms the naive random walk model, with a DM statistic of 3.0585 and a p value of 0.0071, whereas the ML ensemble models show only marginal differences. Random Forest with an expanding-window framework achieves the lowest MAPE of 1.17 percent among all models except the random walk. SHAP analysis confirms that short-term lags, particularly lag1 and lag2, and recent rolling means dominate predictions, consistent with the near-random-walk behavior of exchange rates.
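
The expanding-window protocol and the random walk benchmark mentioned in this abstract can be sketched without any ML library. The code below is a generic illustration, not the authors' pipeline; the function names and the simple rolling-mean comparison model are assumptions.

```python
def expanding_window_mape(series, forecaster, min_train):
    """One-step-ahead MAPE (in percent) with an expanding training window."""
    errors = []
    for t in range(min_train, len(series)):
        train = series[:t]        # only observations before time t are visible
        pred = forecaster(train)  # forecast for time t
        errors.append(abs((series[t] - pred) / series[t]))
    return 100.0 * sum(errors) / len(errors)


def random_walk(train):
    """Naive benchmark: the next value equals the last observed value."""
    return train[-1]


def rolling_mean(train, window=3):
    """Simple comparison model: mean of the most recent observations."""
    recent = train[-window:]
    return sum(recent) / len(recent)
```

Any model exposed as a function of the training history can be passed as `forecaster`, so every candidate is scored on exactly the same out-of-sample points.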

2606.15002 2026-06-16 econ.EM stat.AP stat.ME 新提交

Decision Theory for the Archetype Discovery Problem

原型发现问题的决策理论

José Luis Montiel Olea, Amilcar Velez, Zhuoheng Xu, Haomin Yu, Shunqi Zhang

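AI summary Uses decision theory to propose forming the archetype sets by weighted K-means clustering of the heterogeneous policy effects, and shows that this approach outperforms conventional quantile-based grouping under a mean-squared-error criterion.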

Comments 63 pages, 14 figures

AI中文摘要

在原型发现问题中,研究者希望总结N个异质政策效应,这些效应随一组离散协变量变化。目标是将协变量集划分为K<N个组(原型集),并为每组提供政策效应的总结。我们使用决策理论证明,在加权均方误差准则下,类似于排序组平均处理效应(GATES)的程序可以解决原型发现问题。关键区别在于,在最优程序中,原型集是通过对N个异质政策效应进行加权K-means聚类获得的,而不是依赖于K个等间距分位数。我们表明,对于给定先验,最小化平均风险的程序可以通过对感兴趣政策效应的后验均值估计的不同值进行聚类来获得。类似地,在大样本中,近似极小极大程序可以通过对政策效应的一致估计量进行聚类来获得。在这两种情况下,加权K-means聚类问题的精确解都可以使用一个简单且众所周知的动态规划算法找到。

英文摘要

In the archetype discovery problem a researcher wants to summarize N heterogeneous policy effects of interest that vary over a discrete set of covariates. The goal is to partition the set of covariates into K<N groups -- the archetype sets -- and to provide a summary of the policy effects for each group. We use decision theory to show that, under a weighted mean-squared-error criterion, a procedure analogous to the Sorted Group Average Treatment Effects (GATES) solves the archetype discovery problem. The key difference is that, in the optimal procedure, archetype sets are obtained by weighted K-means clustering of the N heterogeneous policy effects, instead of relying on K equally-spaced quantiles. We show that the procedure that minimizes average risk for a given prior can be obtained by clustering the different values of the posterior mean estimate of the policy effects of interest. Similarly, an approximately minimax procedure in large samples can be obtained by clustering a consistent estimator of the policy effects. In both of these cases, an exact solution to the weighted K-means clustering problem can be found using a simple and well-known dynamic programming algorithm.
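
The abstract notes that weighted K-means on scalar effects can be solved exactly by dynamic programming. A minimal sketch of one such algorithm follows. It relies on the fact that optimal one-dimensional clusters are contiguous after sorting, and assumes strictly positive weights. Whether this matches the specific algorithm the authors have in mind is an assumption.

```python
def weighted_kmeans_1d(values, weights, k):
    """Exact weighted K-means for scalar effects via dynamic programming.

    Returns (total weighted squared error, groups of original indices).
    """
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")
    order = sorted(range(n), key=lambda i: values[i])
    x = [values[i] for i in order]
    w = [weights[i] for i in order]

    # Prefix sums give the weighted SSE of any sorted segment in O(1)
    pw, pwx, pwxx = [0.0], [0.0], [0.0]
    for xi, wi in zip(x, w):
        pw.append(pw[-1] + wi)
        pwx.append(pwx[-1] + wi * xi)
        pwxx.append(pwxx[-1] + wi * xi * xi)

    def sse(a, b):  # segment x[a:b], with b exclusive
        sw = pw[b] - pw[a]
        swx = pwx[b] - pwx[a]
        return max(0.0, (pwxx[b] - pwxx[a]) - swx * swx / sw)

    inf = float("inf")
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for g in range(1, k + 1):           # number of groups used so far
        for b in range(g, n + 1):       # first b sorted points are covered
            for a in range(g - 1, b):   # last group is x[a:b]
                c = cost[g - 1][a] + sse(a, b)
                if c < cost[g][b]:
                    cost[g][b], cut[g][b] = c, a

    # Walk the stored cut points backwards to recover the groups
    groups, b = [], n
    for g in range(k, 0, -1):
        a = cut[g][b]
        groups.append([order[i] for i in range(a, b)])
        b = a
    return cost[k][n], groups[::-1]
```

This version runs in O(K·N²) time, which is adequate when N is the number of distinct covariate cells rather than the number of individuals.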

2606.14798 2026-06-16 q-fin.RM q-fin.MF q-fin.PM stat.ME 新提交

Two Sides of Schur Damping: High-Dimensional Pseudo-Likelihoods and Portfolio Allocation

Schur阻尼的两面:高维伪似然与投资组合配置

Peter Cotton

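AI summary Shows that the Schur complement in spatial statistics (used to estimate high-dimensional Gaussian pseudo-likelihoods) and residual risk in portfolio construction (used for hierarchical risk parity and minimum-variance portfolios) are the same mathematical object, unified through reliability shrinkage, and that the optimal damping has a closed form.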

AI中文摘要

两个很少相互引用的领域——拟合高维天气场的空间统计学家和构建投资组合的量化投资者——独立地得到了相同的数学对象:一个由单个可解释参数阻尼的Schur补。在空间建模中,Schur补是条件协方差,使得高斯(Vecchia)伪似然在规模上可估计,最近的工作通过向基模型收缩来正则化它。在资产配置中,它是净对冲后的残余风险,相同的参数在层次风险平价和最小方差投资组合之间插值。我们证明这些是同一操作——条件高斯分布的可靠性收缩——因此天气模型在站点数超过观测数时需要保持可估计的阻尼,与投资组合在资产数超过回报数时需要保持稳定的阻尼逐项相同。最优量是闭式可靠性,一种同时是Ledoit-Wolf强度的James-Stein收缩。收缩机制是经典的,但这一恒等式似乎是新的:据我们所知,两个文献都没有注意到空间模型拟合的条件收缩与投资组合选择的分散化-方差倾斜是同一个量。我们精确地建立了对应关系,指出两个文献各自提供了对方所缺乏的内容,并报告了一个关于唯一真正开放的选择——如何设置阻尼——的小实验,表明空间社区的拟合强度(如果有的话)是更好的配方。

英文摘要

Two communities that rarely cite each other -- spatial statisticians fitting high-dimensional weather fields, and quantitative investors building portfolios -- have independently arrived at the same mathematical object: a Schur complement, damped by one interpretable parameter. In spatial modeling the Schur complement is the conditional covariance that makes a Gaussian (Vecchia) pseudo-likelihood estimable at scale, and recent work regularizes it by shrinking toward a base model. In allocation it is the residual risk of a bet net of its hedge, and the same parameter interpolates hierarchical risk parity and the minimum-variance portfolio. We show these are one operation -- reliability shrinkage of a conditional Gaussian -- so that the damping a weather model needs to remain estimable when stations outnumber observations is, term for term, the damping a portfolio needs to remain stable when assets outnumber returns. The optimal amount is a closed-form reliability, a James-Stein shrinkage that is simultaneously a Ledoit-Wolf intensity. The shrinkage machinery is classical, but the identity appears to be new: to our knowledge neither literature has noted that the conditional shrinkage a spatial model fits and the diversification-variance tilt a portfolio chooses are one and the same quantity. We make the correspondence precise, note that the two literatures have each supplied what the other lacks, and report a small experiment on the one genuinely open choice -- how to set the damping -- suggesting the spatial community's fitted intensity is, if anything, the better recipe.
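
For intuition, a damped Schur complement can be written for the simplest case of two scalar blocks. The form below, with a damping parameter `gamma` in [0, 1] scaling the correction term, is an assumption about how the "one interpretable parameter" enters; the paper's exact parameterization and its closed-form optimal value are not reproduced here.

```python
def damped_schur(var_a, cov_ab, var_b, gamma):
    """Damped Schur complement var_a - gamma * cov_ab**2 / var_b (scalar blocks).

    gamma = 0 ignores the cross-covariance and returns the marginal variance;
    gamma = 1 returns the full conditional variance of A given B.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    if var_b <= 0:
        raise ValueError("var_b must be positive")
    return var_a - gamma * cov_ab ** 2 / var_b
```

Intermediate values of `gamma` shrink the conditional correction toward zero, which is one way to keep the quantity stable when the cross-covariance is poorly estimated.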

2601.20875 2026-06-16 stat.AP cs.LG econ.EM stat.ME stat.ML 版本更新

Drivers, Receivers, and Dynamic Linkages: The Directed Structure of SDG Interdependence, 2000-2024

驱动者、接收者与动态联系:可持续发展目标相互依赖的有向结构,2000-2024

Md Muhtasim Munif Fahim, Md Jahid Hasan Imran, Md. Naim Molla, Luknath Debnath, Tonmoy Shil, Ehsanul Bashar Pranto, Md Mostafizur Rahman Likhon, Md Shafin Sanyan Saad, Md. Rezaul Karim

Affiliations * Data Science Research Lab, Department of Statistics, University of Rajshahi

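AI summary Uses panel Granger causality tests and local projections to analyze the directed interdependence network of the 17 Sustainable Development Goals across 114 countries over 2000-2024, finding 84 significant linkages (40 synergies, 44 trade-offs), fragile driver-receiver rankings, peace and strong institutions as a net receiver, and poverty reduction as an effect-size-weighted driver.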

Comments 27 pages, 5 figures. Panel Granger non-causality and local projections on 114 countries (2000-2024). Submitted to Sustainability Science

AI中文摘要

财政和行政能力有限的政府需要知道哪些可持续发展目标(SDGs)通过目标系统传播进展以及传播速度有多快。我们利用2000年至2024年每年观测的114个国家的平衡面板数据,绘制了所有17个目标的有向相互依赖结构。目标序列具有持续性、趋势性和横截面依赖性,因此我们应用了两种适用于该机制的估计量:对一阶差分序列运行的Dumitrescu-Hurlin面板格兰杰非因果性检验,以恢复有向交互网络;以及具有Driscoll-Kraay标准误的面板局部投影,以测量31个理论推导的指标联系的动态幅度。在272个有向目标对中,84个联系通过了错误发现控制(40个协同,44个权衡;网络密度0.31)。协同和权衡以相当的强度出现,因此没有单一目标表现为通用加速器,目标层级本身也很脆弱。驱动者-接收者排名在滞后阶数和中心性指标上弱相关,并且在国家自助法下只有两个角色与零可区分:和平与强大机构作为最清晰的净接收者,以及减贫作为最可能的效应量加权驱动者。支持的联系是动态的,在四到五年内累积:卫生设施和贫困改善是降低儿童死亡率的最强预测因子,教育-儿童健康关联在183个国家的独立世界发展指标数据中得到证实。这些结果警示基于排名的加速器政策,并支持基于通过组成指标监测的、有支持的时间滞后联系构建的自适应投资组合。

英文摘要

Governments with limited fiscal and administrative capacity need to know which Sustainable Development Goals (SDGs) propagate progress through the goal system and how quickly. We map the directed interdependence structure of all seventeen goals using a balanced panel of 114 countries observed annually from 2000 to 2024. The goal series are persistent, trending, and cross-sectionally dependent, so we apply two estimators matched to this regime: a Dumitrescu-Hurlin panel Granger non-causality test, run on first-differenced series, to recover the directed interaction network, and panel local projections with Driscoll-Kraay standard errors to measure the dynamic magnitude of 31 theory-derived indicator linkages. Of 272 directed goal pairs, 84 linkages survive false-discovery control (40 synergies, 44 trade-offs; network density 0.31). Synergies and trade-offs occur at comparable strength, so no single goal behaves as a universal accelerator, and the goal-level hierarchy itself is fragile. Driver-receiver rankings correlate weakly across lag orders and centrality metrics, and under a country bootstrap only two roles are distinguishable from zero: peace and strong institutions as the clearest net receiver, and poverty reduction as the most probable effect-size-weighted driver. The supported linkages are dynamic, accruing over four to five years: sanitation and poverty improvements are the strongest predictors of lower child mortality, and the education-child-health association is corroborated in independent World Development Indicators data across 183 countries. These results caution against rankings-based accelerator policy and support adaptive portfolios built on supported, time-lagged linkages monitored through constituent indicators.
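
The network figures quoted in this abstract follow from simple counting, which the short check below reproduces: 17 goals give 17 × 16 ordered pairs, and 84 surviving linkages out of those pairs give the reported density. The function name is illustrative.

```python
def directed_density(n_nodes, n_edges):
    """Number of ordered pairs without self-loops, and the share carrying an edge."""
    possible = n_nodes * (n_nodes - 1)
    return possible, n_edges / possible


# 17 goals and 84 linkages, as reported in the abstract
pairs, density = directed_density(17, 84)
```

Here `pairs` equals 272 and `density` rounds to 0.31, consistent with the abstract; the 40 synergies and 44 trade-offs also sum to 84.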

2601.04608 2026-06-16 q-fin.MF q-fin.CP stat.ML 版本更新

Forecasting the U.S. Treasury Yield Curve: A Distributionally Robust Machine Learning Approach for Interest Rate Risk Management

预测美国国债收益率曲线:一种用于利率风险管理的分布鲁棒机器学习方法

Jinjun Liu, Ming-Yen Cheng

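AI summary To address distributional uncertainty in yield curve forecasting, proposes a distributionally robust ensemble framework that combines parametric factor models with machine learning, improves out-of-sample forecasting performance by penalizing tail risk, and supports DV01-based interest-rate risk management.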

Comments 44 pages (including e-companion), 6 figures, under journal review

AI中文摘要

美国国债收益率是全球资产定价的核心,但受政策不确定性、供需力量和行为效应影响而存在噪声,使预测用户面临下行风险。本文将收益率曲线预测建模为分布不确定性下的决策问题,并提出一种分布鲁棒集成框架,该框架将参数因子模型与机器学习预测相结合。因子增强的动态Nelson-Siegel模型捕捉收益率曲线动态,而随机森林模型则对非线性交互进行建模。鲁棒预测组合惩罚尾部风险,并改善各期限的样本外表现。该框架支持企业、机构和资产负债表决策者进行基于$DV01$的严格利率风险管理。

英文摘要

U.S. Treasury yields are central to global asset pricing but are noisy and subject to policy uncertainty, supply-demand forces, and behavioral effects, exposing forecast users to downside risk. We formulate yield curve forecasting as a decision problem under distributional uncertainty and propose a distributionally robust ensemble framework that combines parametric factor models with machine-learning forecasts. A factor-augmented Dynamic Nelson-Siegel model captures yield-curve dynamics, while Random Forests model nonlinear interactions. Robust forecast combinations penalize tail risk and improve out-of-sample performance across maturities. The framework supports disciplined $DV01$-based interest-rate risk management for corporate, institutional and balance-sheet decision makers.
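For readers unfamiliar with the $DV01$ measure named in the abstract, the usual textbook definition is sketched below. It is not taken from the paper; the symbols $P$ (bond or portfolio price) and $y$ (yield, in decimal form) are assumptions introduced here for illustration.

```latex
\mathrm{DV01} \;=\; -\frac{\partial P}{\partial y} \times 10^{-4}
\;\approx\; \frac{P(y - 0.0001) - P(y + 0.0001)}{2}
```

The right-hand side is a central finite difference: it reprices the position one basis point down and one basis point up and halves the difference, which is how the quantity is typically computed when no closed-form derivative is available.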

2512.08219 2026-06-16 cs.DL stat.AP 版本更新

Any Old Tom, Dick or Harry: The Citation Impact of First Name Genderedness

任何老汤姆、迪克或哈里:名字性别化的引用影响

Maxime Holmberg Sainte-Marie, Vincent Larivière

AI总结 本研究通过融合维基数据名字性别化表与Web of Science索引的2010-2019年美国作者论文数据,分析名字性别化与引用分布的关系,发现女性化和中性名字在所有学科和作者角色中普遍存在引用劣势。

详情
AI中文摘要

本文研究了作者名字的性别化程度与学术产出中引用分布之间的关系。通过将来自Wikidata的名字性别化表与2010年至2019年间发表、由美国附属作者撰写、被Web of Science索引的文章的文献计量数据合并,我们开发了一个相对分布框架,沿连续性别化谱比较名字、文章和引用计数。结果表明,语料库的词汇结构(由唯一名字(类型)与其出现次数(标记)之间的关系捕捉)在作者角色间高度稳定。生产力分析显示,学科群体在女性化与男性化名字之间的不平衡方向上存在显著差异,物理科学呈现一致的男性化偏向,而社会科学倾向于女性化一端。然而,引人注目的是,分布差异的幅度在学科群体间相对稳定,而方向差异显著。引用分析揭示了所有学科群体和作者角色中女性化和中性名字普遍存在的引用赤字。这种不对称在生命科学中跨作者角色最为一致,而在物理科学中除了中间作者外不存在,中间作者的不成比例份额可能反映了大型合作结构的引用动态。总体而言,尽管数据和研究设计支持关联性而非因果性主张,但所揭示的趋势与假设一致,即名字性别化通过低审慎评估环境中的隐性偏见影响引用认可。

英文摘要

This paper examines the relationship between the genderedness of authors' first names and citation distributions in scholarly production. Merging a first name genderedness table derived from Wikidata with bibliometric data from articles by US-affiliated authors published between 2010 and 2019 and indexed in the Web of Science, we develop a relative distributional framework that compares name, article, and citation counts along a continuous genderedness spectrum. Results show that the lexical structure of the corpus, as captured by the relationship between unique first names (types) and the number of their occurrences (tokens), proves highly stable across author roles. Productivity analyses reveal that disciplinary groups diverge substantially in the direction of the imbalance between femininely- and masculinely-gendered names, with physical sciences showing a consistent masculine skew and social sciences a tendency toward the feminine end of the spectrum. Strikingly, however, the amplitude of distributional divergence remains relatively stable across disciplinary groups, in contrast to its substantial variations in direction. Citation analyses reveal a pervasive citation deficit for femininely- and neutrally-gendered names across all disciplinary groups and author roles. This asymmetry is most consistent across author roles in the life sciences, and absent in the physical sciences except among middle authors, whose unparalleled share plausibly reflects the citation dynamics of large collaborative structures. Overall, although the data and research design support associative rather than causal claims, the trends they reveal are nonetheless consistent with the hypothesis that first name genderedness influences citation recognition through implicit bias operating in low-deliberation evaluative contexts.

2512.08144 2026-06-16 stat.ME 版本更新

Propensity score adjustment when errors in achievement measures inform treatment assignment

当成绩测量误差影响处理分配时的倾向得分调整

Joshua Wasserman, Ben B. Hansen, Michael R. Elliott

AI总结 针对学业成绩差距评估中测量误差导致的小组平均分噪声问题,提出一种平衡小组真实平均分的倾向得分估计方法,改善重叠性并减少匹配估计偏差,通过模拟和德州暑期学习损失项目验证。

Comments 30 pages, 3 figures

详情
AI中文摘要

美国州教育机构将人口统计子群体之间存在学业成绩差距的学校标记为需要改进。一些学校在这些子群体中可能只有少数学生,因此期末考试成绩的平均值只能有噪声地衡量“真实”平均分——即如果学生多次参加考试所期望的分数。除了公开评估数据中掩盖小群体平均值的问题之外,这给旨在缩小成绩差距的干预措施评估带来了挑战。我们引入了旨在平衡子群体真实平均分的倾向得分估计。即使当噪声测量不可用时,这些估计也可用,并且与忽略测量误差的估计相比改善了重叠性,从而更大程度地减少了匹配估计的偏差。我们通过模拟和在德克萨斯州一项旨在遏制暑期学习损失的州级倡议中的应用来展示我们的方法。

英文摘要

U.S. state education agencies mark schools displaying achievement gaps between demographic subgroups as needing improvement. Some schools may have few students in these subgroups, such that average end-of-year test scores only noisily measure the average "true" score, that is, the score one would expect if students took the test many times. This, in addition to the masking of small subgroup averages in publicly available assessment data, poses challenges for evaluating interventions aimed at closing achievement gaps. We introduce propensity score estimates designed to achieve balance on subgroup average true scores. These estimates are available even when noisy measurements are not and improve overlap compared to those that ignore measurement error, leading to greater bias reduction of matching estimators. We demonstrate our methods through simulation and an application to a statewide initiative in Texas for curbing summer learning loss.

2502.06530 2026-06-16 econ.TH math.ST stat.TH

Ranking Statistical Experiments via the Linear Convex Order and the Lorenz Zonoid: Economic Applications

通过线性凸序和洛伦兹带状体(Lorenz zonoid)对统计实验进行排序:经济应用

Kailin Chen

AI总结 本文提出线性-Blackwell序,用于比较二元行动决策问题和准凹支付决策问题中的实验,以及道德风险和事后信号筛选问题中的实验。

Comments The main text ends on page 45, and the supplementary material follows thereafter. This paper was previously circulated under the title "Experiments in the Linear Convex Order"

详情
AI中文摘要

本文介绍了一种新的统计实验排序方法,即线性-Blackwell (LB) 序,其可通过三种等价方式表征:(i) 诱导后验和似然比在线性凸序意义下的分散性,(ii) 洛伦兹带状体(Lorenz zonoid,即逐状态期望轮廓的集合)的大小,或 (iii) 后验均值的变异性。我们将LB序应用于比较二元行动决策问题以及具有准凹支付的决策问题中的实验,后者如Kolotilin, Corrao, 和Wolitzky (2025) 所分析。我们还利用它来比较道德风险问题中的实验,基于Holmström (1979) 和Kim (1995),以及具有事后信号的筛选问题。

英文摘要

This paper introduces a novel ranking of statistical experiments, the linear-Blackwell (LB) order, which can equivalently be characterized by (i) the dispersion of the induced posterior and likelihood ratios in the sense of the linear convex order, (ii) the size of the Lorenz zonoid (the set of statewise expectation profiles), or (iii) the variability of the posterior mean. We apply the LB order to compare experiments in binary-action decision problems and in decision problems with quasi-concave payoffs, as analyzed by Kolotilin, Corrao, and Wolitzky (2025). We also use it to compare experiments in moral hazard problems, building on Holmström (1979) and Kim (1995), and in screening problems with ex post signals.

10. 数据隐私、稳健性与公平性 9 篇

2606.16952 2026-06-16 cs.LG cs.AI stat.AP stat.ME stat.ML 新提交

Phantoms and Disclosures: a Causal Framework for Auditing Synthetic Data

幻象与披露:合成数据审计的因果框架

Kareem Amin, Rudrajit Das, Alessandro Epasto, Adel Javanmard, Dennis Kraft, Mónica Ribero, Sergei Vassilvitskii

发表机构 * Google(谷歌) University of Southern California(南加州大学)

AI总结 提出一个可定制的实证审计框架,通过区分真实披露与幻象披露,利用统计假设检验检测合成数据中的隐私泄露,无需模型访问或参考模型,提供比先前方法更紧的隐私泄露下界。

Comments 35 pages, 10 tables, 5 figures

详情
AI中文摘要

生成式AI和大语言模型(LLMs)的快速普及激发了人们对合成数据的兴趣,将其作为敏感真实数据集的隐私保护替代方案。然而,生成高实用性合成数据往往存在记忆和复述训练语料中隐私信息的风险。在这项工作中,我们提出了一个可定制的实证审计框架,旨在检测和解释此类数据披露。我们的框架引入了一种机制来区分“真实披露”——系统直接复现用户信息的情况,以及“幻象披露”——系统偶然生成用户数据的情况。通过将输入数据划分为训练集和保留集,并应用严格的统计假设检验,我们确定观察到的披露是否与严格的隐私基线(如零学习或特定的差分隐私(DP)边界)一致。关键的是,这种方法不需要模型访问、不需要插入金丝雀数据,也不需要参考模型训练——仅需要合成输出和保留的控制集。我们证明,该框架有效地充当了成员推断攻击,提供了比先前基于数据的审计方法更紧的隐私泄露经验下界。我们的方法是模型无关的,适用于任何合成数据生成机制,并且所需的计算资源比影子模型或基于金丝雀的替代方法少几个数量级。

英文摘要

The rapid adoption of generative AI and Large Language Models (LLMs) has spurred interest in synthetic data as a privacy-preserving alternative to sensitive real-world datasets. However, generating high-utility synthetic data often carries the risk of memorizing and regurgitating private information from the training corpus. In this work, we present a customizable empirical auditing framework designed to detect and explain such data disclosures. Our framework introduces a mechanism to distinguish between "true disclosures," where the system directly reproduces a user's information, and "phantom disclosures," where the system incidentally generates a user's data. By partitioning input data into training and holdout sets and applying rigorous statistical hypothesis testing, we determine if observed disclosures are consistent with strict privacy baselines, such as zero-learning or specific Differential Privacy (DP) bounds. Crucially, this approach requires no model access, no canary insertion, and no reference model training: only the synthetic output and a held-out control set. We demonstrate that this framework effectively functions as a membership inference attack, providing empirical lower bounds on privacy leakage that are tighter than prior data-based auditing methods. Our approach is model-agnostic, applies to any synthetic data generation mechanism, and requires orders of magnitude fewer computational resources than shadow-model or canary-based alternatives.

2606.16872 2026-06-16 stat.ME 新提交

Towards Fair Predictions: Group Conditional Concordance Index to Quantify Fairness in Time-to-Event Prognostication

迈向公平预测:用于量化时间至事件预测中公平性的组条件一致性指数

Haoyuan Wang, Riddhiman Bhattacharya, Richardo Henao, Daniel Wojdyla, Chuan Hong, Matthew Engelhard

AI总结 提出组条件一致性指数(xCI),通过扩展Harrell一致性指数来量化生存分析中的组间公平性,并在右删失数据下进行估计,通过案例研究证明其能检测现有指标忽略的偏差。

Comments 28 pages

详情
AI中文摘要

公平性度量对于严格定义、量化和减轻预测模型中的偏差至关重要。虽然大多数现有度量侧重于二元分类任务,但时间至事件分析中的公平性受到的关注有限。为了解决这一差距,我们提出了一种新的组公平性度量——组条件一致性指数(xCI),它通过以组成员身份为条件扩展了Harrell一致性指数(CI)。xCI在存在右删失数据的情况下测量组内和跨组的排序准确性。我们正式定义了xCI,证明了CI是所有可能组对之间xCI的加权平均值,并利用逆删失概率加权(IPCW)开发了一致估计量。通过分析推导和模拟研究,我们进一步研究了xCI与预测风险评分之间的关系。为了展示其实用性,我们提出了两个案例研究:(i)评估基于Framingham后代、MESA和ARIC研究协调数据训练的生存模型的公平性,以及(ii)使用大规模电子健康记录(EHR)数据库Truveta评估现有心血管疾病(CVD)风险预测模型的公平性。我们的结果表明,xCI有效地检测了现有指标忽略的跨人口统计组的偏差。总体而言,xCI为生存分析中的公平性评估提供了有价值的工具,特别是在资源分配受限的环境中,并补充了现有的公平性评估方法。

英文摘要

Fairness metrics are essential for rigorously defining, quantifying, and mitigating biases in predictive models. While most existing metrics focus on binary classification tasks, fairness in time-to-event analyses has received limited attention. To address this gap, we propose a novel group fairness metric, the group-conditional Concordance Index (xCI), which extends Harrell's Concordance Index (CI) by conditioning on group membership. The xCI measures both within-group and cross-group ranking accuracy in the presence of right-censored data. We formally define the xCI, prove that CI is a weighted average of xCIs across all possible group pairs, and develop a consistent estimator using inverse probability of censoring weights (IPCW). We further investigate the relationship between xCI and predicted risk scores through analytical derivations and simulation studies. To demonstrate its practical utility, we present two case studies: (i) assessing the fairness of survival models trained on harmonized data from the Framingham Offspring, MESA, and ARIC studies, and (ii) evaluating fairness in existing cardiovascular disease (CVD) risk prediction models using Truveta, a large-scale electronic health record (EHR) database. Our results show that xCI effectively detects biases across demographic groups that are overlooked by existing metrics. Overall, xCI provides a valuable tool for fairness assessment in survival analysis, particularly in constrained resource allocation settings, and complements existing fairness evaluation approaches.
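The decomposition claimed in the abstract (the overall CI as a weighted average of group-pair xCIs) can be illustrated with a small sketch. This is not the authors' estimator: it uses plain Harrell-style pair counting without the IPCW weights, and it assumes one plausible reading of "conditioning on group membership", namely indexing each comparable pair by the ordered groups of the earlier-event subject and the later subject. The function name `group_conditional_cindex` is hypothetical.

```python
from collections import defaultdict


def group_conditional_cindex(times, events, risks, groups):
    """Harrell-style concordance split by ordered group pair (no IPCW).

    A pair (i, j) is comparable when subject i has an observed event
    (events[i] is truthy) and times[i] < times[j]. It is concordant when
    the earlier-event subject has the higher predicted risk; risk ties
    count as one half.
    """
    concordant = defaultdict(float)
    comparable = defaultdict(int)
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                key = (groups[i], groups[j])
                comparable[key] += 1
                if risks[i] > risks[j]:
                    concordant[key] += 1.0
                elif risks[i] == risks[j]:
                    concordant[key] += 0.5
    total = sum(comparable.values())
    xci = {k: concordant[k] / comparable[k] for k in comparable}
    weights = {k: comparable[k] / total for k in comparable}
    overall = sum(concordant.values()) / total if total else float("nan")
    return xci, weights, overall
```

With weights equal to each group pair's share of comparable pairs, `sum(weights[k] * xci[k] for k in xci)` reproduces `overall`, which mirrors the weighted-average property stated in the abstract for this simplified, unweighted version.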

2606.16488 2026-06-16 stat.ME 新提交

An Energy-Driven Framework for Privacy-Aware Synthetic Data Generation

一种能量驱动的隐私感知合成数据生成框架

Pierpaolo Massoli, Fabio Spagnuolo

AI总结 提出一种能量驱动框架,通过约束随机探索结合可解释性惩罚,在混合类型数据中平衡统计保真度与披露风险,实现隐私感知的合成数据生成。

Comments First release of the paper

详情
AI中文摘要

官方统计和数据密集型应用中对微观数据访问的需求日益增长,这引发了关于披露风险、推断有效性和统计效用保留的重要挑战。本文提出了一种可解释的能量驱动框架,用于混合类型数据中的隐私感知合成数据生成。该方法结合了判别建模、贝叶斯网络提议机制、Metropolis-Hastings采样和生成后优化,在一个约束概率框架内进行。与基于扰动的方法不同,隐私感知行为通过由明确的合理性、隐私性、多样性和结构一致性惩罚引导的约束随机探索来实现。该框架专门针对具有稀疏配置、异构变量类型和复杂多元依赖结构的混合类型表格数据设计。生成过程被表述为一个多目标采样问题,在保留预测效用的同时平衡统计保真度和披露风险。使用一个包含人口统计、行为和健康相关变量的混合类型个体级数据集进行了广泛的实证评估。验证策略结合了统计保真度诊断、预测分析、多样性度量、最近邻风险分析、成员推断攻击和分裂共形预测。实证结果表明,所提出的框架能够保留原始数据的大部分预测和多元结构,同时限制精确记忆现象并保持有利的隐私感知行为。该方法为在竞争效用和隐私约束下生成合成数据提供了一个可解释的框架。

英文摘要

The increasing demand for access to microdata in official statistics and data-intensive applications raises important challenges concerning disclosure risk, inferential validity and preservation of statistical utility. This paper proposes an interpretable energy-driven framework for privacy-aware synthetic data generation in mixed-type data. The proposed methodology combines discriminative modelling, Bayesian-Network proposal mechanisms, Metropolis--Hastings sampling and post-generation optimization within a constrained probabilistic framework. Unlike perturbation-based approaches, privacy-aware behaviour is achieved through constrained stochastic exploration guided by explicit plausibility, privacy, diversity and structural-coherence penalties. The framework is specifically designed for mixed-type tabular data characterized by sparse configurations, heterogeneous variable types and complex multivariate dependency structures. The generation process is formulated as a multi-objective sampling problem balancing statistical fidelity and disclosure-risk while preserving predictive utility. An extensive empirical evaluation is conducted using a mixed-type individual-level dataset containing demographic, behavioural and health-related variables. The validation strategy combines statistical fidelity diagnostics, predictive analyses, diversity measures, nearest-neighbour risk analysis, membership inference attacks and Split Conformal Prediction. The empirical results suggest that the proposed framework is capable of preserving a substantial portion of the predictive and multivariate structure of the original data while limiting exact memorization phenomena and maintaining favourable privacy-aware behaviour. The proposed methodology provides an interpretable framework for synthetic data generation under competing utility and privacy constraints.

2606.15964 2026-06-16 stat.ML cs.LG 新提交

PromptShift-CRC: Drift-Aware Conformal Risk Control for Foundation Models Under Prompt and Domain Shift

PromptShift-CRC: 面向提示和领域漂移的基础模型的漂移感知保形风险控制

Jeffery Opoku, David Banahene

发表机构 * The University of Texas Rio Grande Valley(德克萨斯大学里奥格兰德河谷分校) Florida International University(佛罗里达国际大学)

AI总结 提出PromptShift-CRC方法,通过嵌入提示和响应、测量漂移、加权校准样本并在线更新风险水平,在提示和领域漂移下控制基础模型输出的风险。

详情
AI中文摘要

基础模型现在被用于其接收的提示可能快速变化的场景。用户变化、主题变化、策略变化,模型可能突然面临在校准数据中罕见的请求类型。这使得固定校准变得有风险。保形预测和保形风险控制提供了与模型无关的控制错误的方法,但当校准数据与未来数据相似时效果最佳。本文开发了PromptShift CRC,一种面向提示和领域漂移的基础模型输出的漂移感知保形风险控制方法。该方法嵌入提示和响应,测量当前提示流与校准池的偏离程度,对相关或最近的校准示例赋予更大权重,并在观察到违规后在线更新风险水平。它报告三个实用诊断指标:实现风险误差、提示漂移和有效校准大小。我们给出了该方法在分布不匹配和加权分位数不确定性项下控制风险的条件。在一个合成提示漂移基准中,静态保形风险控制在漂移后急剧失效,而PromptShift-CRC在所考虑的适应性基线中提供了最佳覆盖。然后,我们在公开基准的派生流上评估相同的校准层,包括问答、毒性、摘要事实性和长上下文幻觉风险。

英文摘要

Foundation models are now used in settings where the prompts they receive can change quickly. Users change, topics change, policies change, and the model may suddenly face a kind of request that was rare in the calibration data. This makes fixed calibration risky. Conformal prediction and conformal risk control give model-agnostic ways to control error, but they work best when the calibration data still look like the future data. This paper develops PromptShift-CRC, a drift-aware conformal risk control method for foundation-model outputs under prompt and domain shift. The method embeds prompts and responses, measures how far the current prompt stream has moved from the calibration pool, gives more weight to relevant or recent calibration examples, and updates the risk level online after observed violations. It reports three practical diagnostics: realized risk error, prompt drift, and effective calibration size. We give conditions under which the method controls risk up to terms for distribution mismatch and weighted quantile uncertainty. In a synthetic prompt-shift benchmark, static conformal risk control fails sharply after drift, while PromptShift-CRC gives the best coverage among the adaptive baselines considered. We then evaluate the same calibration layer on streams derived from public benchmarks for question answering, toxicity, summarization factuality, and long-context hallucination risk.
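Two ingredients mentioned in the abstract, weighting calibration examples and reporting an effective calibration size, can be sketched with standard stand-ins. The code below is not the paper's procedure: it assumes a generic weighted conformal quantile (with the test point's mass placed at infinity) and Kish's effective sample size, and both function names are hypothetical.

```python
import math


def effective_sample_size(weights):
    """Kish's effective sample size: (sum w)^2 / sum w^2."""
    s1 = sum(weights)
    s2 = sum(w * w for w in weights)
    return (s1 * s1) / s2 if s2 > 0 else 0.0


def weighted_conformal_threshold(scores, weights, alpha, test_weight=1.0):
    """Smallest calibration score whose weighted CDF reaches 1 - alpha.

    The test point contributes `test_weight` of mass at +infinity, so the
    threshold is infinite when the calibration weights are too small to
    reach the target level.
    """
    total = sum(weights) + test_weight
    cumulative = 0.0
    for score, w in sorted(zip(scores, weights)):
        cumulative += w
        if cumulative / total >= 1.0 - alpha:
            return score
    return math.inf
```

With equal weights the threshold reduces to the usual split-conformal quantile, and the effective sample size equals the number of calibration points; concentrating weight on a few recent examples shrinks the effective size, which is the kind of trade-off the reported diagnostic is meant to expose.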

2606.15474 2026-06-16 cs.AI stat.AP 新提交

Who Drifted: the System or the Judge? Anytime-Valid Attribution in LLM Evaluation Pipelines

谁漂移了:系统还是裁判?LLM评估流水线中的随时有效归因

Yitao Li

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 提出一种基于固定锚点集和赌检验的方法,区分LLM评估中产品性能下降与裁判模型变化导致的分数漂移,并证明其随时有效性和归因准确性。

详情
AI中文摘要

对LLM产品的持续评估依赖于一个被视为基准真值(ground truth)的强大LLM裁判:一个廉价的监控器对每次交互进行评分,当分数下降时团队会收到警报。但裁判本身是一个API背后的模型,静默的版本升级或评分提示更新会改变其评分方式——因此每次漂移警报在更差的产品和变化的裁判之间是模糊的。我们通过一个固定的人工标注锚点集(当前裁判以稳定间隔重新评分)、一个关于裁判与人类差距的第二个投注e过程(betting e-process),以及一个返回{无, 系统, 裁判}判决的守卫窗口规则来解决这种模糊性。我们证明了随时有效性、单向识别(只有裁判可以移动锚点)、一个归因竞赛(其设计法则是锚点必须跑赢它们守卫的主过程)以及过程正交性。在两个真实的裁判变化中,静默版本升级在60/60次运行中被检测为裁判漂移,且零次误归因为系统;而一个污染性的严格提示变化在守卫宽度为300时,120次运行中有110次被正确归因——而行业默认的滚动z检验在75%的无漂移流上产生误报。每个实验在第二个领域(TL;DR摘要)上重复,无需重新调整参数,并且当领域不同时,差异正是竞赛所预测的:严格提示变化在那里更强烈地改变分数,因此锚点触发更快,归因变得完美(240/240)。该监控器的运行成本约为对每个项目使用强裁判的0.64倍,或在更便宜但更聋的模式下为0.21倍。

英文摘要

Continuous evaluation of LLM products relies on a strong LLM judge treated as ground truth: a cheap monitor scores every interaction and a team is paged when the score drifts down. But the judge is itself a model behind an API, and a silent version bump or scoring-prompt update changes how it scores -- so every drift alarm is ambiguous between a worse product and a changed judge. We resolve the ambiguity with a fixed, human-labeled anchor set that the current judge re-scores at a steady interleave, a second betting e-process on the judge-versus-human gap, and a guard-window rule returning a verdict in {none, system, judge}. We prove anytime-validity, one-way identification (only the judge can move the anchors), an attribution race whose design law is that the anchors must out-run the main process they guard, and process orthogonality. On two real judge changes, a silent version bump is detected as judge drift in 60/60 runs with zero judge-to-system misattribution, and a contaminating strict-prompt change is correctly attributed on 110 of 120 runs at guard width 300 -- while the industry-default rolling z-test false-alarms on 75% of drift-free streams. Every experiment replicates on a second domain (TL;DR summarization) with nothing re-tuned, and where the domains differ the differences are the ones the race predicts: the strict-prompt change shifts scores harder there, so the anchors fire faster and attribution becomes perfect (240/240). The monitor runs at approximately 0.64 of the cost of strong-judging every item, or 0.21 in a cheaper-but-deafer regime.
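The "betting e-process" in the abstract belongs to a standard family of anytime-valid tests, and a generic member of that family can be sketched as follows. This is a textbook construction, not the paper's monitor: it assumes observations bounded in [0, 1], a null hypothesis that their conditional mean is at most `m`, and a fixed bet size `lam`; the anchor set, guard window, and attribution rule are not modelled. The function name is hypothetical.

```python
def betting_e_process(xs, m, lam, alpha=0.05):
    """Fixed-bet wealth process for H0: E[x_t | past] <= m, with x_t in [0, 1].

    Wealth starts at 1 and is multiplied by 1 + lam * (x_t - m) each step.
    Under H0 the wealth is a nonnegative supermartingale, so by Ville's
    inequality it exceeds 1 / alpha with probability at most alpha at any
    stopping time. Returns (first 1-based step that crosses, wealth), or
    (None, final wealth) if the threshold is never reached.
    """
    if not 0.0 < m <= 1.0:
        raise ValueError("m must lie in (0, 1]")
    if not 0.0 <= lam <= 1.0 / m:
        raise ValueError("lam must lie in [0, 1/m] to keep wealth nonnegative")
    wealth = 1.0
    for t, x in enumerate(xs, start=1):
        wealth *= 1.0 + lam * (x - m)
        if wealth >= 1.0 / alpha:
            return t, wealth
    return None, wealth
```

Because the guarantee holds at every stopping time, the stream can be checked after each observation without inflating the false-alarm rate, which is the property a fixed-window rolling z-test lacks.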

2606.14909 2026-06-16 stat.ML cs.LG 新提交

Audited Conformal Prediction for Classification under Unknown Distribution Shift

未知分布漂移下分类问题的审计共形预测

Yanfei Zhou, Rizal Fathony, Nam H. Nguyen, Matteo Sesia

发表机构 * Department of Data Sciences and Operations, University of Southern California(数据科学与运营系,南加州大学) AI Foundations, Capital One(Capital One人工智能基础) Department of Data Sciences and Operations, Thomas Lord Department of Computer Science, University of Southern California(数据科学与运营系,托马斯·劳德计算机科学系,南加州大学)

AI总结 提出审计共形预测方法,利用目标群体小标注数据训练审计模型识别旧模型可能失败的输入,结合共形预测框架在保证边际覆盖的同时提高条件覆盖,并提供理论保证。

详情
AI中文摘要

我们考虑在未知分布漂移下部署的预训练分类模型的不确定性量化问题。我们提出了审计共形预测(ACP),该方法利用来自目标群体的小标注数据集训练一个辅助审计模型,以识别旧模型可能失败的输入。通过将审计模型的输出整合到共形预测框架中,ACP 产生的预测集在保证边际覆盖的同时,在实践中比现有方法实现了更高的条件覆盖。我们开发并分析了两种互补的整合策略——一种针对边际覆盖并改善条件性能,另一种提供明确的组条件覆盖保证——并为两者建立了理论保证。在合成和真实世界数据集上的实验验证了该方法,并说明了预测集大小与条件覆盖之间的权衡。

英文摘要

We consider the problem of uncertainty quantification for a pretrained classification model deployed under unknown distribution shift. We propose Audited Conformal Prediction (ACP), a method that leverages a small labeled dataset from the target population to train an auxiliary audit model identifying inputs where the legacy model is likely to fail. By integrating the audit model's outputs into the conformal prediction framework, ACP produces prediction sets that guarantee marginal coverage while achieving substantially higher conditional coverage in practice than existing approaches. We develop and analyze two complementary integration strategies -- one targeting marginal coverage with improved conditional performance, the other providing explicit group-conditional coverage guarantees -- and establish theoretical guarantees for both. Experiments on synthetic and real-world datasets validate the method and illustrate trade-offs between prediction set size and conditional coverage.

2504.11775 2026-06-16 stat.ML cs.CY cs.LG q-fin.RM 版本更新

Discrimination-free Insurance Pricing with Privatized Sensitive Attributes

基于隐私化敏感属性的无歧视保险定价

Tianhe Zhang, Suhan Liu, Peng Shi

发表机构 * Department of Risk and Insurance, University of Wisconsin-Madison(风险与保险系,威斯康星大学麦迪逊分校) Department of Statistics and Operations Research, University of North Carolina-Chapel Hill(统计与运筹系,北卡罗来纳大学教堂山分校)

AI总结 针对保险公司无法直接获取敏感属性(如性别、种族)的公平定价问题,提出利用隐私化(加噪)敏感属性估计无歧视保费的方法,并建立理论保证与实证验证。

详情
AI中文摘要

公平性已成为保险定价中的重要关注点,因为保险公司越来越依赖机器学习模型来预测预期损失。同时,监管和隐私约束通常限制保险公司访问或使用敏感属性(如性别或种族)。最近的精算研究通过无歧视保费的概念来解决这一背景下的公平性问题,该概念消除了敏感属性的直接和间接影响,同时保持精算一致性。然而,实施这种方法通常需要访问敏感属性本身,而在实践中可能无法获得。本文研究了当敏感属性仅以隐私化或噪声扰动形式被观测时,无歧视保险保费的估计问题。我们考虑一个多方数据设置,其中保险公司观测非敏感属性和结果,而一个可信第三方持有通过隐私机制生成的隐私化敏感属性。在此框架内,我们开发了仅使用隐私化属性估计无歧视保费的统计方法。我们研究了两种实际相关的情况:隐私机制已知和其噪声水平未知。对于这两种情况,我们为所提出的估计量建立了理论保证。数值实验和实证应用表明,所提出的方法能够在尊重隐私和监管约束的同时实现公平的保险定价。

英文摘要

Fairness has become an important concern in insurance pricing as insurers increasingly rely on machine learning models to predict expected losses. At the same time, regulatory and privacy constraints often restrict insurers' ability to access or use sensitive attributes such as gender or race. Recent actuarial research addresses fairness in this context through the concept of the discrimination-free premium, which removes both the direct and indirect effects of sensitive attributes while preserving actuarial consistency. However, implementing this approach typically requires access to the sensitive attributes themselves, which may not be available in practice. This paper studies the estimation of discrimination-free insurance premiums when sensitive attributes are observed only in privatized or noise-perturbed form. We consider a multi-party data setting in which insurers observe non-sensitive attributes and outcomes, while a trusted third party holds privatized sensitive attributes generated through a privacy mechanism. Within this framework, we develop statistical methods for estimating discrimination-free premiums using only the privatized attributes. We study two settings of practical relevance: when the privacy mechanism is known and when its noise level is unknown. For both cases, we establish theoretical guarantees for the proposed estimators. Numerical experiments and empirical applications demonstrate that the proposed approach enables fair insurance pricing while respecting privacy and regulatory constraints.

2401.14283 2026-06-16 stat.ML cs.LG 版本更新

Information Leakage Detection through Approximate Bayes-optimal Prediction

通过近似贝叶斯最优预测的信息泄露检测

Pritha Gupta, Marcel Wever, Eyke Hüllermeier

发表机构 * University of Potsdam(波茨坦大学) University of Hanover(汉诺威大学) Ludwig-Maximilians-University Munich(慕尼黑大学)

AI总结 提出基于统计学习与信息论的理论框架,通过自动机器学习近似贝叶斯预测器的对数损失和准确率来估计互信息,从而检测信息泄露,在合成和真实OpenSSL TLS服务器数据集上优于现有方法。

Comments Accepted at Information Sciences

详情
AI中文摘要

在当今数据驱动的世界中,公开可用信息的激增因信息泄露(IL)问题而引发安全担忧。IL涉及通过可观察的系统信息无意中将敏感信息暴露给未经授权的方。传统的统计方法依赖于估计可观察信息与秘密信息之间的互信息(MI)来检测IL,面临维度灾难、收敛性、计算复杂性和MI误估计的挑战。尽管有效,新兴的基于监督机器学习的方法检测IL仅限于二元系统敏感信息,并且缺乏全面的框架。为了解决这些局限性,我们利用统计学习理论和信息论建立了一个理论框架,以准确量化和检测IL。使用自动机器学习,我们证明通过近似通常未知的贝叶斯预测器的对数损失和准确率,可以准确估计MI。基于此,我们展示了如何有效估计MI以检测IL。在考虑合成和真实OpenSSL TLS服务器数据集的实证研究中,我们的方法优于最先进的基线方法。

英文摘要

In today's data-driven world, the proliferation of publicly available information raises security concerns due to the information leakage (IL) problem. IL involves unintentionally exposing sensitive information to unauthorized parties via observable system information. Conventional statistical approaches, which rely on estimating mutual information (MI) between observable and secret information to detect ILs, face challenges from the curse of dimensionality, convergence, computational complexity, and MI misestimation. Emerging supervised machine learning approaches to detecting ILs, though effective, are limited to binary system-sensitive information and lack a comprehensive framework. To address these limitations, we establish a theoretical framework using statistical learning theory and information theory to quantify and detect IL accurately. Using automated machine learning, we demonstrate that MI can be accurately estimated by approximating the typically unknown Bayes predictor's log-loss and accuracy. Based on this, we show how MI can effectively be estimated to detect ILs. Our method outperforms state-of-the-art baselines in an empirical study on synthetic and real-world OpenSSL TLS server datasets.
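The link between a predictor's log-loss and mutual information that the abstract relies on follows from a standard identity, sketched here with symbols introduced for illustration ($X$ the observable information, $Y$ the secret, $q$ any probabilistic predictor). The paper's actual estimators may differ.

```latex
I(X;Y) \;=\; H(Y) - H(Y \mid X),
\qquad
H(Y \mid X) \;=\; \min_{q}\; \mathbb{E}\bigl[-\log q(Y \mid X)\bigr]
```

The minimum is attained by the true conditional distribution, that is, by the Bayes-optimal probabilistic predictor. The expected log-loss of any learned predictor therefore upper-bounds $H(Y \mid X)$, so $H(Y)$ minus that log-loss is a lower bound on $I(X;Y)$ that tightens as the predictor approaches Bayes optimality.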

2502.12445 2026-06-16 cs.AI cs.LG stat.ML 版本更新

Computational Safety for Generative AI: A Hypothesis Testing Perspective

生成式AI的计算安全性:假设检验视角

Pin-Yu Chen

发表机构 * IBM Research(IBM研究院)

AI总结 本文从假设检验角度形式化生成式AI的计算安全性,提出基于信号处理的方法检测恶意输入和AI生成内容。

Comments Extended version of the paper presented at the ICML 2026 Workshop on Hypothesis Testing

详情
AI中文摘要

AI安全是一个快速发展的研究领域,旨在防止前沿AI技术的危害和滥用,特别是针对能够通过文本提示创建逼真高质量内容的生成式AI(GenAI)工具。此类工具的例子包括大型语言模型(LLM)和文本到图像(T2I)扩散模型。由于相似的训练数据源和神经网络架构设计,各种领先GenAI模型的性能趋于饱和,因此开发可靠的安全护栏已成为责任和可持续性的关键差异化因素。本文提出了计算安全性概念的形式化,这是一个数学框架,通过信号处理理论和方法的视角,能够对GenAI中的安全挑战进行定量评估、表述和研究。特别是,我们探讨了GenAI中两类可表述为假设检验问题的计算安全挑战。对于模型输入的安全性,我们展示了如何使用敏感性分析和损失景观分析来检测带有越狱尝试的恶意提示。对于模型输出的安全性,我们阐明了如何使用统计信号处理来检测AI生成的内容。最后,我们讨论了关键的开放研究挑战、机遇以及信号处理在计算AI安全中的重要作用。

英文摘要

AI safety is a rapidly growing area of research that seeks to prevent the harm and misuse of frontier AI technology, particularly with respect to generative AI (GenAI) tools that are capable of creating realistic and high-quality content through text prompts. Examples of such tools include large language models (LLMs) and text-to-image (T2I) diffusion models. As the performance of various leading GenAI models approaches saturation due to similar training data sources and neural network architecture designs, the development of reliable safety guardrails has become a key differentiator for responsibility and sustainability. This paper presents a formalization of the concept of computational safety, which is a mathematical framework that enables the quantitative assessment, formulation, and study of safety challenges in GenAI through the lens of signal processing theory and methods. In particular, we explore two exemplary categories of computational safety challenges in GenAI that can be formulated as hypothesis testing problems. For the safety of model input, we show how sensitivity analysis and loss landscape analysis can be used to detect malicious prompts with jailbreak attempts. For the safety of model output, we elucidate how statistical signal processing can be used to detect AI-generated content. Finally, we discuss key open research challenges, opportunities, and the essential role of signal processing in computational AI safety.

11. 数据集、软件与应用 9 篇

2606.16455 2026-06-16 stat.ME stat.AP 新提交

hyreg2: An R package to Estimate Latent Classes on a Mixture of Continuous and Dichotomous Data

hyreg2: 一个用于估计连续和二分类数据混合的潜在类别的R包

Svenja Elkenkamp, John Grosser, Kim Rand

AI总结 提出hyreg2 R包,基于联合似然方法估计混合结果类型的潜在类别模型,支持连续和二分类数据,使用EM算法实现,并提供用户友好接口。

Comments Package hyreg2 available on CRAN

详情
AI中文摘要

R包hyreg2引入了一个频率学派框架,用于使用联合似然方法估计混合结果类型的潜在类别模型。该方法在假设两种结果类型来自共同底层数据生成过程的情况下,结合了连续和二分类数据。在实现的模型中,连续响应假设服从正态分布,而二分类响应使用二项分布建模。这类模型在多个科学学科中用于估计不同类型数据(例如临床试验、计量经济学和健康经济学)的共同参数集。潜在类别估计使用广泛使用的R包flexmix中实现的期望最大化算法。hyreg2包提供了该联合似然框架的用户友好实现,允许用户无需显式编程似然函数即可估计模型。异方差性和删失数据可以被考虑。除了模型估计,该包还提供了专门的汇总和可视化函数以促进结果解释。本文介绍了该包所依据的方法论框架,并通过基于EQ-5D-5L价值集估计的示例说明了其功能。

英文摘要

The R package hyreg2 introduces a frequentist framework for estimating latent class models for mixed outcome types using a joint likelihood approach. The method combines continuous and dichotomous data under the assumption that both outcome types arise from a common underlying data-generating process. In the implemented model, continuous responses are assumed to follow a normal distribution, while dichotomous responses are modeled using a binomial distribution. Such models are used in various scientific disciplines to estimate a common set of parameters across different types of data (e.g. clinical trials, econometrics and health economics). Latent class estimation is performed using the expectation-maximization algorithm as implemented in the widely used R package flexmix. The hyreg2 package offers a user-friendly implementation of this joint likelihood framework, allowing users to estimate models without explicitly programming the likelihood function. Heteroskedasticity as well as censored data can be taken into account. In addition to model estimation, the package provides dedicated summary and visualization functions to facilitate the interpretation of results. The article presents the methodological framework underlying the package and illustrates its functionality through an example based on the estimation of an EQ-5D-5L value set.

2606.15933 2026-06-16 stat.ME stat.CO 新提交

A Comparison of $\texttt{R}$ Packages for Estimating Generalized Linear Mixed Models

用于估计广义线性混合模型的 $\texttt{R}$ 包比较

Xiang Li, Mirko Signorelli

AI总结 本文通过蒙特卡洛模拟,系统比较了七个代表性R包在估计广义线性混合模型时的收敛性、计算速度、估计精度和假设检验性能,为实际应用提供了选择建议。

Comments 22 pages, 13 figures

详情
AI中文摘要

广义线性混合模型(GLMM)广泛用于分析相关数据,如纵向和多层次数据。由于CRAN上有超过15个R包可用于拟合GLMM,实践者面临艰难选择:哪个包能提供准确估计、可靠收敛和合理计算速度?现有比较要么局限于单个包内的方法,要么只关注速度等狭窄标准。为填补这一空白,我们系统比较了七个代表性R包——lme4、GLMMadaptive、glmmTMB、MASS、hglm、brms和rstanarm——它们实现了不同的估计框架。通过24种情景的蒙特卡洛模拟,我们评估了每个包的收敛率、计算时间、估计精度和假设检验性能。结果表明,lme4_AGQ和GLMMadaptive具有最高的精度和收敛率,尽管GLMMadaptive在复杂随机效应结构下变慢。lme4_LA和glmmTMB计算速度快,但收敛率较低且偏差较大,尤其是方差分量。MASS和hglm也很快,但MASS产生宽松的单变量检验,hglm缺乏对相关随机效应和多元检验的支持。在两个贝叶斯包中,rstanarm可靠收敛并产生有效的单变量检验,而brms极慢,限制了其实用性。基于这些发现,我们为应用研究中选择GLMM工具提供了实用建议。

英文摘要

Generalized linear mixed models (GLMMs) are widely used for analyzing correlated data, such as longitudinal and multilevel data. With over 15 $\texttt{R}$ packages available on $\texttt{CRAN}$ for fitting GLMMs, practitioners face a difficult choice regarding which package yields accurate estimates, converges reliably, and offers reasonable computational speed. Existing comparisons are either limited to methods within a single package or focus on narrow criteria such as speed alone. To address this gap, we systematically compared seven representative $\texttt{R}$ packages -- $\texttt{lme4}$, $\texttt{GLMMadaptive}$, $\texttt{glmmTMB}$, $\texttt{MASS}$, $\texttt{hglm}$, $\texttt{brms}$, and $\texttt{rstanarm}$ -- that implement different estimation frameworks. Using Monte Carlo simulations across 24 scenarios, we evaluated each package in terms of convergence ratios, computational time, estimation accuracy, and hypothesis testing performance. Our results showed that $\texttt{lme4\_AGQ}$ and $\texttt{GLMMadaptive}$ yield the highest accuracy and convergence ratios, although $\texttt{GLMMadaptive}$ becomes slower under complex random-effect structures. $\texttt{lme4\_LA}$ and $\texttt{glmmTMB}$ are computationally fast but exhibit lower convergence ratios and larger bias, especially for variance components. $\texttt{MASS}$ and $\texttt{hglm}$ are also fast, but $\texttt{MASS}$ yields liberal univariate tests and $\texttt{hglm}$ lacks support for correlated random effects and multivariate testing. Of the two Bayesian packages, $\texttt{rstanarm}$ converges reliably and produces valid univariate tests, whereas $\texttt{brms}$ is extremely slow, limiting its practical utility. Based on these findings, we provide practical recommendations for choosing a GLMM tool in applied research.

2606.15760 2026-06-16 cs.LG stat.ML 新提交

The Data Manifold under the Microscope

显微镜下的数据流形

Marios Koulakis, Constantin Seibold

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 针对深度学习理论与实践的差距,提出一个基准框架,通过扩展dSprites和COIL-20数据集并配合有限差分估计器,实现曲率、可达性和体积的近真实值估计,用于校准几何估计器和验证理论假设。

Comments Accepted at ICML 2026. Camera-ready version

详情
AI中文摘要

深度学习理论与实践之间存在显著差距。泛化和近似误差界通常针对简化模型推导,或者过于宽松而缺乏信息。许多工作依赖于流形假设以及内在维度、曲率和可达性等几何正则性。进展需要深入了解数据流形几何和合适的基准,但现有选项两极分化:具有已知几何但适用性有限的分析流形,或几何只能粗略估计的真实世界数据集。我们引入了一个用于研究数据几何的基准框架。我们重新利用并扩展了dSprites和COIL-20,增加了额外的变换维度和密集的轴对齐采样,并将它们与有限差分估计器配对,在通用估计器不可靠或难以部署的情况下,以接近真实值的精度恢复曲率、可达性和体积。该框架旨在作为一个受控测试平台,可用作几何估计器的校准环境和探索理论假设的沙盒。为了说明其用途,我们展示了两个应用研究,即评估Genovese等人和Fefferman等人的界的缩放行为,以及跟踪$β$-VAE的逐层几何,突出了当前界的行为以及受控基准对指导和验证未来理论的价值。参考实现可在https://github.com/koulakis/manifold-microscope获取。

英文摘要

A significant gap exists between theory and practice in deep learning. Generalization and approximation error bounds are often derived for simplified models or are too loose to be informative. Many rely on the manifold hypothesis and on geometric regularity such as intrinsic dimension, curvature, and reach. Progress requires insight into data-manifold geometry and suitable benchmarks, yet existing options are polarized: analytic manifolds with known geometry but limited applicability, or real-world datasets where geometry is only coarsely estimable. We introduce a benchmarking framework for studying data geometry. We repurpose and extend dSprites and COIL-20 with additional transformation dimensions and dense, axis-aligned sampling, and pair them with finite-difference estimators that recover curvature, reach, and volume at near-ground-truth accuracy in a regime where general-purpose estimators are unreliable or difficult to deploy. The framework is intended as a controlled testbed, useful as a calibration environment for geometric estimators and a sandbox for probing theoretical assumptions. To illustrate its use, we present two application studies, namely assessing the scaling behavior of the bounds of Genovese et al. and Fefferman et al., and tracking the layer-wise geometry of a $β$-VAE, highlighting the behavior of current bounds and the value of controlled benchmarks for guiding and validating future theory. A reference implementation is available at https://github.com/koulakis/manifold-microscope.

2606.15314 2026-06-16 cs.LG cs.AI stat.ML 新提交

LLMs on Tabular Data with Limited Semantics: Evidence from Industrial Car Retrofit Prediction

有限语义表格数据上的LLM:来自工业汽车改造预测的证据

Aina Vila Pons, Ioannis Tzachristas, Constantinos Antoniou

发表机构 * Technical University of Munich(慕尼黑工业大学) BMW Group(宝马集团)

AI总结 研究在工业表格数据中,LLM(嵌入、直接分类、混合堆叠)与经典树集成方法的对比,发现LLM在语义受限时效果有限,但嵌入和混合方法仍有价值。

详情
AI中文摘要

工业改造规划依赖于结构化操作数据而非自由文本:规划者必须估计新注册的原型是否需要改造、需要哪种改造包以及工作将花费多长时间。我们研究了一个工业数据集,该数据集将原型注册系统(284,271辆车)与改造管理系统(48,716次清洗后的访问)相连接,并在行序列化输入上比较了强大的表格机器学习基线与三种基于LLM的策略:嵌入特征(Amazon Titan)、直接提示分类(Claude Sonnet 4)和ML+LLM堆叠方法。在二分类发生预测、15类改造类型分类、每次访问持续时间回归以及聚合的月度基准测试中,经典树集成仍然是最强的独立模型。然而,LLM结果揭示了一致的模式:嵌入在表格上仍然有用(二分类AUC = 0.982),直接提示在通过哈希去除语义信号后崩溃(二分类AUC = 0.500;多类加权F1 = 0.018),而混合堆叠产生了最佳的手动构建多类模型(加权F1 = 0.626)。在月度基准测试中,基于滞后的机器学习优于时间序列基础模型,尽管Chronos-small在零样本预测中仍具有竞争力。结果表明,在隐私受限的工业表格上,LLM作为补充组件比替代强大的表格基线更有效。

英文摘要

Industrial retrofit planning depends on structured operational data rather than free text: planners must estimate whether a newly registered prototype will require a retrofit, which retrofit package it will need, and how long the work will take. We study an industrial dataset linking a prototype-registration system (284,271 vehicles) with a retrofit-management system (48,716 cleaned visits), and compare strong tabular machine learning baselines with three LLM-based strategies on row-serialized inputs: embedding features (Amazon Titan), direct prompted classification (Claude Sonnet 4), and an ML+LLM stacking approach. Across binary occurrence prediction, 15-way retrofit-type classification, per-visit duration regression, and an aggregated monthly benchmark, classical tree ensembles remain the strongest standalone models. However, the LLM results reveal a consistent pattern: embeddings remain useful on tables (binary AUC = 0.982), direct prompting collapses once semantic signal is stripped by hashing (binary AUC = 0.500; multiclass weighted F1 = 0.018), and hybrid stacking yields the best manually built multiclass model (weighted F1 = 0.626). On the monthly benchmark, lag-based machine learning outperforms time-series foundation models, though Chronos-small remains competitive in zero-shot forecasting. The results suggest that on privacy-constrained industrial tables, LLMs are more effective as complementary components than as replacements for strong tabular baselines.

2606.15201 2026-06-16 stat.AP math.DS stat.ML 新提交

A Koopman-PINN Framework for Epidemic Models: Parameter Inference and Forecasting

Koopman-PINN框架用于流行病模型:参数推断与预测

Achraf Zinihi, Matthias Ehrhardt, Moulay Rchid Sidi Ammi

AI总结 提出Koopman增强物理信息神经网络(K-PINN)框架,结合Koopman算子理论与物理信息学习,实现非线性流行病模型的参数推断与长期预测,在合成猴痘数据和真实SARS-CoV-2数据集上优于经典PINN和Koopman-EDMD方法。

详情
AI中文摘要

我们提出了一个Koopman增强的物理信息神经网络(K--PINN)框架,用于非线性流行病模型中的参数推断和预测。该方法结合了Koopman算子理论和物理信息学习。它将流行病状态映射到一个潜在的可观测空间,在该空间中动力学近似线性演化,同时通过自动微分满足控制流行病方程。这种集成提高了可解释性、参数可辨识性和长期预测稳定性。我们将所提出的框架应用于归一化的SEIRSD流行病模型,并使用合成的猴痘(Mpox)数据和来自德国、摩洛哥和瑞典的SARS-CoV-2病毒真实数据集进行评估。合成轨迹使用保持结构的非标准有限差分方案生成,以确保可靠的训练数据。数值结果表明,K--PINN在参数估计、轨迹重建和长期预测方面比经典PINN和Koopman-EDMD方法更准确。这些结果表明,K--PINN是一种有效的流行病建模机器学习框架,可以扩展到更复杂的系统。

英文摘要

We propose a Koopman-enhanced physics-informed neural network (K--PINN) framework for parameter inference and forecasting in nonlinear epidemic models. This method combines Koopman operator theory and physics-informed learning. It maps epidemic states into a latent observable space where the dynamics evolve approximately linearly while satisfying the governing epidemic equations through automatic differentiation. This integration improves interpretability, parameter identifiability, and long-term predictive stability. We apply the proposed framework to a normalized SEIRSD epidemic model and evaluate it using synthetic monkeypox (Mpox) data and real-world datasets from Germany, Morocco, and Sweden for the SARS-CoV-2 virus. Synthetic trajectories are generated using a structure-preserving, nonstandard finite difference scheme to ensure reliable training data. Numerical results demonstrate that K--PINN achieves more accurate parameter estimation, trajectory reconstruction, and long-term forecasting than classical PINNs and Koopman-EDMD approaches. These results suggest that K--PINN is an effective machine learning framework for epidemic modeling that can be extended to more complex systems.

2606.07622 2026-06-16 cs.LG stat.AP 新提交

Airport Terminal Passenger Queue Forecasting for Departure Gates and Security Checkpoints

机场航站楼登机口与安检点旅客排队预测

Juhwan Lee, Seokbin Yoon, Keumjin Lee, Hojong Baik, Seyeon Jung

发表机构 * Korea Aerospace University(韩国航空大学) Korea Airports Corporation(韩国机场公社)

AI总结 提出基于Transformer的框架,利用历史队列长度、等待时间和旅客吞吐量数据,预测登机口和安检点未来两小时的队列长度与等待时间,支持主动排队管理。

Comments 10 pages, 6 figures, accepted at DASC 2026

详情
AI中文摘要

准确的机场航站楼旅客排队预测对于高效的离港运营至关重要,因为它能够实现主动的拥堵管理。然而,时变的旅客需求以及多个离港设施中异构的设施使用情况使得预测具有挑战性。在这项工作中,我们提出了一种旅客排队预测框架,该框架从运营数据中学习历史旅客流量模式。所提出的模型采用基于Transformer的架构,利用过去登机口和安检点的队列长度和等待时间,以及值机岛的旅客吞吐量,来捕捉时间依赖性和设施间相关性。学习到的表示被映射到两个设施特定的MLP头部,以预测登机口和安检点的队列长度和等待时间。实验结果表明,该模型能够准确预测未来两小时内的排队情况。所提出的方法为机场航站楼运营中的主动排队管理和人员重新分配提供了实用的实时决策支持。

英文摘要

Accurate passenger queue forecasting in airport terminals is essential for efficient departure operations, as it enables proactive congestion management. However, time-varying passenger demand and heterogeneous facility usage across multiple departure facilities make forecasting challenging. In this work, we propose a passenger queue forecasting framework that learns historical passenger flow patterns from operational data. The proposed model employs a Transformer-based architecture to capture temporal dependencies and inter-facility correlations using past queue length and waiting time at departure gates and security checkpoints, together with passenger throughput at check-in islands. The learned representations are mapped to two facility-specific prediction heads to predict queue length and waiting time at departure gates and security checkpoints. Experimental results demonstrate accurate forecasts up to two hours ahead. The proposed approach offers practical real-time decision support for proactive queue management and staff reallocation in airport terminal operations.

2603.12881 2026-06-16 stat.AP stat.ME 版本更新

Multivariate lattice deformation: A spatially explicit framework for assessing crop rotation impacts on soil nutrient dynamics

多元晶格变形:评估轮作对土壤养分动态影响的空间显式框架

Marco Mandap

AI总结 提出多元晶格模型,将土壤视为4D张量,用力向量表示轮作,通过核平滑模拟养分横向移动,以N-P-K空间欧氏距离量化累积影响,识别风险区域并指导针对性管理。

Comments An error was identified in the underlying distribution proof used for the empirical copula test. The authors are withdrawing this version while finalizing a formally verified proof of the distribution in Lean 4

详情
AI中文摘要

轮作对土壤养分的影响通常使用田间平均或单一养分分析来评估,忽略了空间异质性和多元相互作用。我们提出了一个多元晶格模型,将土壤视为4D张量(空间、时间以及N、P、K通道)。轮作表示为力向量,土壤缓冲能力(“刚度”)随质地空间变化。通过核平滑引入养分横向移动。累积影响通过N-P-K空间中的欧氏距离量化,并使用Cramér-von Mises置换检验评估显著性。在20×20异质网格上模拟三年玉米-大豆-小麦轮作显示,一个周期后平均应力为0.63,沙质区域最大值为0.91。磷耗竭(17.9%)超过氮(10.8%),在19%的单元格中主导应力——这被单一养分分析所掩盖。连续玉米使平均应力增加41%。Cramér-von Mises检验检测到显著偏差(p ≤ 0.002),Moran's I(0.29-0.30)确认了空间自相关。我们的框架识别风险区域并指导针对性管理,连接了地质统计学与机理作物模型。

英文摘要

Crop rotation impacts on soil nutrients are typically assessed using field-averaged or single-nutrient analyses that ignore spatial heterogeneity and multivariate interactions. We propose a multivariate lattice model treating soil as a 4D tensor (space, time, and N, P, K channels). Crop rotations are represented as force vectors, with soil buffering capacity ("stiffness") varying spatially with texture. Lateral nutrient movement is introduced via kernel smoothing. Cumulative impact is quantified by Euclidean distance in N-P-K space, with significance assessed via Cramér-von Mises permutation tests. Simulating a three-year corn-soybean-wheat rotation on a 20 x 20 heterogeneous grid shows mean stress of 0.63 after one cycle, with maximum 0.91 in sandy areas. Phosphorus depletion (17.9%) exceeds nitrogen depletion (10.8%) and dominates stress in 19% of cells, a pattern obscured by single-nutrient analyses. Continuous corn increases mean stress by 41%. Cramér-von Mises tests detect significant deviation (p <= 0.002), and Moran's I (0.29-0.30) confirms spatial autocorrelation. Our framework identifies risk zones and guides site-specific management, bridging geostatistics with mechanistic crop models.
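The abstract reports Moran's I as its check for spatial autocorrelation on a grid. As a rough illustration only, the statistic can be sketched as below. The rook (4-neighbour) binary weights and the function name `morans_i` are assumptions made for this example; the abstract does not say which weight matrix the paper uses.

```python
def morans_i(grid):
    """Moran's I for a rectangular grid, assuming rook (4-neighbour) binary weights."""
    rows, cols = len(grid), len(grid[0])
    n = rows * cols
    mean = sum(sum(row) for row in grid) / n
    # Deviations from the grid mean.
    z = [[v - mean for v in row] for row in grid]
    cross = 0.0      # sum of z_i * z_j over ordered neighbour pairs
    weight_sum = 0   # total weight W (one unit per ordered neighbour pair)
    for i in range(rows):
        for j in range(cols):
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                a, b = i + di, j + dj
                if 0 <= a < rows and 0 <= b < cols:
                    cross += z[i][j] * z[a][b]
                    weight_sum += 1
    denom = sum(v * v for row in z for v in row)
    # I = (n / W) * sum_ij w_ij z_i z_j / sum_i z_i^2
    return (n / weight_sum) * cross / denom


# A checkerboard is perfectly negatively autocorrelated under rook weights.
checker = [[(i + j) % 2 for j in range(4)] for i in range(4)]
# A grid split into two uniform halves is positively autocorrelated.
halves = [[0, 0, 1, 1] for _ in range(4)]
```

Values near zero indicate no spatial pattern, so the reported range of 0.29-0.30 corresponds to moderate positive clustering under whatever weights the paper adopts.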

2510.14092 2026-06-16 stat.ML cs.LG 版本更新

deFOREST: Fusing Optical and Radar satellite data for Enhanced Sensing of Tree-loss

deFOREST: 融合光学与雷达卫星数据增强树木损失感知

Julio Enrique Castrillon-Candas, Hanfeng Gu, Caleb Meredith, Yulin Li, Xiaojing Tang, Pontus Olofsson, Mark Kon

AI总结 提出融合光学与SAR数据的森林砍伐检测流程,利用离散KL展开残差空间构建异常图,结合HMM分类,在亚马逊区域验证混合方法优于现有技术且对稀疏光学数据更鲁棒。

详情
Journal ref
IEEE Transactions on Geoscience and Remote Sensing, vol. 64, 2026, Art no. 4409213
AI中文摘要

本文开发了一个结合光学和合成孔径雷达(SAR)数据的森林砍伐检测流程。该流程的一个关键组成部分是利用离散Karhunen-Loève(KL)展开的残差空间构建光学数据的异常图。异常通过森林标称状态下残差分量分布的浓度界限来量化。该界限不需要关于数据分布的先验知识。这与假设知道数据分布的统计参数方法形成对比,这种假设不切实际,尤其对于高维数据(如我们的数据)不可行。一旦计算出光学异常图,它们与SAR数据结合,并通过隐马尔可夫模型(HMM)对森林状态进行分类。我们在亚马逊森林中一个$92\,km \times 92\,km$的区域使用Sentinel-1(SAR)和Sentinel-2(光学)数据测试了我们的方法。结果表明,混合光学-雷达方法和仅光学方法都实现了高精度,优于最新的混合方法。此外,在高度多云地区常见的光学数据稀疏情况下,混合方法显著更鲁棒。

英文摘要

In this paper we develop a deforestation detection pipeline that incorporates optical and Synthetic Aperture Radar (SAR) data. A crucial component of the pipeline is the construction of anomaly maps of the optical data, which is done using the residual space of a discrete Karhunen-Loève (KL) expansion. Anomalies are quantified using a concentration bound on the distribution of the residual components for the nominal state of the forest. This bound does not require prior knowledge of the distribution of the data. This is in contrast to statistical parametric methods that assume knowledge of the data distribution, an impractical assumption that is especially infeasible for high-dimensional data such as ours. Once the optical anomaly maps are computed, they are combined with SAR data, and the state of the forest is classified using a Hidden Markov Model (HMM). We test our approach with Sentinel-1 (SAR) and Sentinel-2 (Optical) data on a $92\,km \times 92\,km$ region in the Amazon forest. The results show that both the hybrid optical-radar and optical-only methods achieve high accuracy that is superior to the recent state-of-the-art hybrid method. Moreover, the hybrid method is significantly more robust in the case of sparse optical data, which are common in highly cloudy regions.

2502.10182 2026-06-16 stat.ME stat.AP 版本更新

Scalable Generalised Accuracy Estimation for Multisource Register-based Official Statistics

基于多源登记册的官方统计的可扩展广义精度估计

Nina Deliu, Piero Demetrio Falorsi, Stefano Falorsi, Diego Chianella, Giorgio Alleva

AI总结 针对多源登记册统计中的多重误差,提出一种基于多项逻辑模型的全局误差解析近似方法,实现可解释、灵活且计算可扩展的精度量化。

Comments 49 pages (main manuscript and supplementary material); 7 tables, 5 figures

详情
AI中文摘要

官方统计正在经历重大转型,国家统计机构从传统的单源数据生产系统转向整合行政、普查和调查数据的统计登记册集成系统。由此产生的多源登记册估计值容易受到多种交互误差源的影响,然而用于量化其精度的严格可扩展框架仍不成熟。本文讨论并验证了一种用于此类多源登记册统计的全局误差评估度量。聚焦于两个核心不确定性来源——抽样和建模,我们推导出一种解析解,能够精确近似多项逻辑模型下大规模插补过程的全局误差。所提出的度量具有可解释性、灵活性和计算可扩展性,能够为用户定义的、非计划的特定领域人口总量统计提供即时精度量化。其有效性在理论上得到确立,并通过模拟研究得到证实。最后,展示了在意大利国家统计局教育数据上的应用。

英文摘要

Official statistics are undergoing a significant transformation, as national statistical institutes transition from traditional single-source data production systems to integrated systems of statistical registers combining administrative, census, and survey data. The resulting multisource register-based estimates are prone to multiple interacting sources of error, yet rigorous scalable frameworks for quantifying their accuracy remain underdeveloped. This work discusses and validates a global measure of error assessment for such multisource register-based statistics. Focusing on two central sources of uncertainty, sampling and modelling, we derive an analytical solution that accurately approximates the global error of mass-imputation procedures under a multinomial logistic model. The proposed measure is interpretable, flexible, and computationally scalable, enabling on-the-fly accuracy quantification for user-defined, unplanned domain-specific statistics on population totals. Its validity is established theoretically and confirmed through simulation studies. An application to education data from the Italian National Institute of Statistics is presented.

12. 其他/综合统计 24 篇

2606.17022 2026-06-16 math.ST cs.LG stat.ML stat.TH 新提交

Learning the Geometry of Data: A Mathematical Review of Shape Space Analysis

学习数据的几何:形状空间分析的数学综述

Gary P. T. Choi, Khanh Dao Duc, Shira Faigenbaum-Golovin, Karen Habermann, Emmanuel Hartman, Christoph von Tycowicz, Chi Zhang, Wenjun Zhao, Felix Zhou

AI总结 本文综述形状空间分析,利用微分几何、统计学和机器学习构建从形状表示到几何感知学习的分析流程,用于表征几何数据中的非线性结构。

Comments 79 pages, 10 figures, 8 tables

详情
AI中文摘要

机器学习的一个核心目标是识别数据中的结构和模式。数据采集的进步日益产生具有丰富几何形态的观测数据集,从而产生了编码对象几何变异的形状空间。这类数据集出现在广泛的学科中,包括生物学、医学、人类学和计算机视觉,其中微妙的几何差异通常携带重要的科学信息。然而,传统的机器学习方法常常不足以解释这些数据背后的非线性几何结构。本综述综合了快速增长的形状空间分析工作,该工作为几何数据的研究提供了数学和计算框架。借鉴微分几何、统计学和机器学习的理念,我们围绕一个共同的分析流程组织文献:形状表示和参数化、稳健测地距离的严格构造、形状空间上的统计分析以及几何感知的学习方法。我们讨论了这些工具如何能够表征形状变异、比较几何对象以及分析跨群体和时间的结构轨迹。为了说明该领域的广度,我们重点介绍了跨越多个生物组织尺度的应用,包括亚细胞形态学和灵长类牙齿进化的研究。在这些以及许多其他领域中,研究人员面临着由复杂、非线性且常常未对齐的几何变异引起的共同挑战。本综述最后指出了关键的理论和计算挑战,以及由日益庞大和多样化的几何数据集驱动的新兴机遇。

英文摘要

A central objective of machine learning is to identify structure and patterns in data. Advances in data acquisition have increasingly produced datasets whose observations possess rich geometric form, giving rise to shape spaces that encode variability in object geometry. Such datasets arise across a wide range of disciplines, including biology, medicine, anthropology, and computer vision, where subtle geometric differences often carry important scientific information. Traditional machine learning methods, however, are frequently ill-equipped to account for the nonlinear geometric structure underlying these data. This survey synthesizes a rapidly growing body of work on shape space analysis, which provides a mathematical and computational framework for the study of geometric data. Drawing on ideas from differential geometry, statistics, and machine learning, we organize the literature around a common analytical pipeline: shape representation and parameterization, the rigorous construction of robust geodesic metrics, statistical analysis on shape spaces, and geometry-aware learning methods. We discuss how these tools enable the characterization of shape variability, the comparison of geometric objects, and the analysis of structural trajectories across populations and time. To illustrate the breadth of the field, we highlight applications spanning multiple scales of biological organization, including studies of subcellular morphology and primate tooth evolution. Across these and many other domains, researchers face common challenges arising from complex, nonlinear, and often unaligned geometric variation. The review concludes by identifying key theoretical and computational challenges, as well as emerging opportunities driven by increasingly large and diverse geometric datasets.

2606.16715 2026-06-16 cs.IT math.IT math.PR math.ST stat.TH 新提交

Testing for a Hidden Geometry in Random Graphs

随机图中隐藏几何结构的检测

Amit Silber, Mor Oren-Loberman, Wasim Huleihel

AI总结 研究在随机图中检测隐藏几何信号的问题,推导了检测不可能的信息论下界,并揭示了易-难-不可能相变。

Comments Accepted to COLT 2026; 54 pages

详情
AI中文摘要

我们研究在随机图中检测微弱几何信号的问题。形式上,考虑一个假设检验问题:在原假设下,观测图是 Erdős--Rényi 随机图 $\mathcal{G}(n,q)$;而在备择假设下,一个随机几何图 $\mathcal{G}(k,q,d)$ 被植入在 $k\le n$ 个顶点上。该植入子图由单位球面 $\mathbb{S}^{d-1}$ 上的独立随机点生成,边由潜在几何邻近性决定并校准为边密度 $q$。我们的目标是刻画检测这种隐藏几何结构的统计和计算极限。我们推导了尖锐的信息论下界,识别出检测不可能的区域,并提供在检测可行时达到这些极限的算法。我们进一步研究了该问题的计算复杂度,并确定何时存在有效的多项式时间检验。该模型展现出“易-难-不可能”相变:某些区域允许高效检测,另一些区域仅能通过计算上不可行的过程检测,而其余区域即使拥有无限计算能力也无法检测。作为计算障碍的证据,我们证明所有低次多项式算法在推测的困难区域均失败,展示了统计可行性与计算可行性之间的尖锐差距。

英文摘要

We study the problem of detecting a faint geometric signal hidden in an otherwise random graph. Formally, we consider a hypothesis testing problem in which, under the null, the observed graph is an Erdős--Rényi random graph $\mathcal{G}(n,q)$, while under the alternative a random geometric graph $\mathcal{G}(k,q,d)$ is planted on $k\le n$ vertices. The planted subgraph is generated from independent random points on the unit sphere $\mathbb{S}^{d-1}$, with edges determined by latent geometric proximity and calibrated to have edge density $q$. Our goal is to characterize the statistical and computational limits of detecting this hidden geometry. We derive sharp information-theoretic lower bounds that identify regimes where detection is impossible and provide algorithms that achieve these limits whenever detection is feasible. We further investigate the computational complexity of the problem and determine when efficient polynomial-time tests exist. The model exhibits an \emph{easy--hard--impossible} phase transition: some regimes allow efficient detection, others permit detection only with computationally intractable procedures, and still others render detection impossible even with unlimited computational power. As evidence for the computational barrier, we prove that all low-degree polynomial algorithms fail throughout the conjecturally hard regime, demonstrating a sharp gap between statistical and computational feasibility.

2606.16482 2026-06-16 math.AG math.CO math.ST stat.TH 新提交

Euler Stratifications of Second Hypersimplices via Delta-matroids

第二超单形的欧拉分层通过Delta-拟阵

Janike Oldekop

AI总结 通过Delta-拟阵与主A-行列式的非零因子对应,研究第二超单形缩放环面的欧拉示性数,证明Clarke等人(2024)关于最小ML度数的猜想。

Comments 20 pages, 1 figure, 6 tables

详情
AI中文摘要

我们研究由第二超单形产生的缩放环面的欧拉示性数。在代数统计学中,这些与环面模型的最大似然(ML)度密切相关。我们建立了delta-拟阵与主$A$-行列式的非零因子之间的对应关系,提供了delta-拟阵理论与代数统计学之间的显式联系。利用这一框架,我们证明了通过合适的簇嵌入可以实现猜想的最小ML度。此外,对于阶数不超过六的第二超单形,我们证明该值在所有嵌入中是最小的,正如Clarke等人(2024)所猜想的那样。

英文摘要

We study Euler characteristics of scaled toric varieties arising from second hypersimplices. In algebraic statistics, these are closely connected to maximum likelihood (ML) degrees of toric models. We establish a correspondence between delta-matroids and the non-vanishing factors of the principal $A$-determinant, providing an explicit connection between delta-matroid theory and algebraic statistics. Using this framework, we show that a conjectured minimum ML degree is realizable by a suitable embedding of the variety. Furthermore, for second hypersimplices up to order six, we prove that this value is minimal among all embeddings, as conjectured by Clarke et al. (2024).

2606.16393 2026-06-16 stat.AP math.ST physics.app-ph physics.data-an stat.TH 新提交

Calibrating the Brody exponent as a quantitative measure of short-range exclusion in 2D spatial point processes

将Brody指数校准为二维空间点过程中短程排斥的定量度量

Dawid Kucharski

AI总结 本文将Brody分布校准为二维空间点过程中短程排斥的定量度量,通过重新校准完全空间随机基线(β=0.96±0.15)和建立β-排斥半径经验校准(Spearman ρ=0.988),并应用于制造表面、相位提取干涉测量和素数嵌入等案例。

Comments 22 pages, 6 figures, 3 tables, 33 references; submitted to a peer-reviewed journal

详情
AI中文摘要

Brody分布最初是量子混沌中泊松和维格纳能级间距统计之间的现象学插值,本文将其校准为二维空间点过程中短程排斥的定量度量。核心结果有两个。首先,二维完全空间随机基线被重新校准为β=0.96±0.15,纠正了不恰当的一维泊松参考。其次,经验β-排斥半径校准与有效硬核半径的Spearman ρ=0.988得到验证。该框架在58个制造表面(10种材料,10种工艺)、认证圆度标准的相位提取干涉轮廓测量以及素数的二维二元嵌入上进行了演示。一个稀疏整数对照证明素数β=2.15信号是真正的算术信号(相对于随机整数对照Δβ=+0.68),而康托尔嵌入零结果(β=1.40,TOST p<0.01)表明二维排斥是由嵌入产生的而非内在的。密度稀疏实验表明β捕捉的是排斥强度而非点密度,但绝对值依赖于密度。识别了低填充分数下二元场的独特CSR基线,并提供了决策表。β-排斥半径校准、CSR基线校正和对照协议共同构成了一个用于二维空间点过程中短程排斥可重复表征的校准测量框架。

英文摘要

The Brody distribution, originally a phenomenological interpolation between Poisson and Wigner level-spacing statistics in quantum chaos, is calibrated here as a quantitative measure of short-range exclusion in 2D spatial point processes. Two results form the core. First, the 2D complete-spatial-randomness baseline is recalibrated to $β=0.96\pm0.15$, correcting the inappropriate 1D Poisson reference. Second, an empirical $β$--$r_{\text{excl}}$ calibration is validated against the effective hard-core radius with Spearman $ρ=0.988$. The framework is demonstrated on 58 manufactured surfaces (10 materials, 10 processes), phase-extracted interferometric profilometry of a certified roundness standard, and 2D binary embeddings of prime numbers. A sparse-integer control proves the prime $β=2.15$ signal is genuinely arithmetic ($Δβ=+0.68$ over random-integer control), while a Cantor-embedding null result ($β=1.40$, TOST $p<0.01$) demonstrates that 2D exclusion is embedding-created rather than intrinsic. Density-thinning experiments establish that $β$ captures exclusion strength rather than point density, while absolute values are density-dependent. A distinct CSR baseline for binary fields at low fill fraction is identified, with a decision table provided. The $β$--$r_{\text{excl}}$ calibration, the CSR baseline correction, and the control protocols together constitute a calibrated measurement framework for reproducible characterisation of short-range exclusion in 2D spatial point processes.
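For readers unfamiliar with the Brody distribution named in this abstract, the sketch below evaluates its standard textbook density for unit-mean spacings, $P(s) = (β+1)\,b\,s^{β}\exp(-b\,s^{β+1})$ with $b = Γ\big((β+2)/(β+1)\big)^{β+1}$. It illustrates the general formula only and is not the paper's calibration procedure; the helper names `brody_pdf` and `integrate` are made up for this example.

```python
import math


def brody_pdf(s, beta):
    """Brody spacing density for a unit-mean spacing s >= 0 and exponent beta >= 0."""
    # Constant b fixes the mean spacing at 1.
    b = math.gamma((beta + 2.0) / (beta + 1.0)) ** (beta + 1.0)
    return (beta + 1.0) * b * s ** beta * math.exp(-b * s ** (beta + 1.0))


def integrate(f, lo, hi, steps=20000):
    """Composite trapezoidal rule, used here only as a sanity check."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, steps):
        total += f(lo + k * h)
    return total * h


# beta = 0 recovers the Poisson (exponential) spacing law,
# beta = 1 recovers the Wigner surmise (pi/2) * s * exp(-pi * s^2 / 4).
poisson_at_1 = brody_pdf(1.0, 0.0)
wigner_at_1 = brody_pdf(1.0, 1.0)

# Normalisation and mean at the 2D baseline value beta = 0.96 quoted in the abstract.
mass = integrate(lambda s: brody_pdf(s, 0.96), 0.0, 10.0)
mean = integrate(lambda s: s * brody_pdf(s, 0.96), 0.0, 10.0)
```

Larger values of $β$ push probability mass away from $s = 0$, which is why the exponent can serve as a proxy for short-range exclusion.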

2606.16373 2026-06-16 math.ST math.PR math.SP stat.TH 新提交

Higher-order spectral perturbation expansions II: Kernel matrices and manifold learning

高阶谱扰动展开 II:核矩阵与流形学习

Bernhard Stankewitz, Martin Wahl

AI总结 本文在弱假设下建立核矩阵作为核积分算子近似的谱集中界,处理大重数、大有效维度和重尾分布,应用于无限维主成分分析、流形学习和贝叶斯非参数统计。

详情
AI中文摘要

我们研究了核矩阵作为相应核积分算子近似的谱集中界。结果是在数据设置和再生核的弱假设下建立的,仅依赖于Mercer条件和局部Weyl律。这使我们能够处理核矩阵的关键特征,例如大重数、大有效维度和重尾分布。我们的结果适用于无限维主成分分析、流形学习和贝叶斯非参数统计。我们通过两个典型例子说明:球面上的热核和来自贝叶斯非参数的小波先验。

英文摘要

We study spectral concentration bounds for kernel matrices as approximations of the corresponding kernel integral operator. Results are established under weak assumptions on the data setting and the reproducing kernel, relying only on a Mercer condition and a local Weyl law. This allows us to deal with key features of kernel matrices, such as large multiplicities, large effective dimension, and heavy-tailed distributions. Our results apply to infinite-dimensional principal component analysis, manifold learning, and Bayesian nonparametric statistics. We illustrate this via two prototypical examples: the heat kernel on the sphere and a wavelet prior from Bayesian nonparametrics.

2606.05672 2026-06-16 math.ST stat.TH 版本更新

Trace-Class Results for MCMC Algorithms for Student-t Regression Models

Student-t 回归模型的 MCMC 算法的迹类结果

Yasuyuki Hamura

AI总结 本文研究 Student-t 回归模型的 MCMC 算法,通过分析迹类性质来评估马尔可夫链的效率,发现标准数据增广算法在无信息先验下不是迹类,而折叠 Gibbs 算法是迹类;在正态-逆伽马先验下标准算法是迹类。

Comments 22 pages

详情
AI中文摘要

本文考虑 Student-$t$ 回归模型的 MCMC 算法。我们根据迹类结果是否成立来研究基于这些算法的马尔可夫链的效率。我们首先考虑回归系数和误差方差服从不变的不恰当先验分布的情况。与标准数据增广算法相关的马尔可夫算子不是迹类的,但与折叠 Gibbs 算法相关的算子是迹类的。接下来我们考虑参数服从正态-逆伽马分布的情况。在这种情况下,标准马尔可夫算子是迹类的。

英文摘要

In this paper, we consider MCMC algorithms for Student-$t$ regression models. We investigate the efficiency of Markov chains based on the algorithms in terms of whether trace-class results hold or not. We first consider the case where the parameters follow a matrix-normal-inverse-Wishart distribution and show that the Markov operator associated with a standard data augmentation algorithm is trace-class. We next consider the case of an improper prior and univariate outcomes. In this case, the standard Markov operator is not trace-class but the Markov operator associated with a collapsed Gibbs algorithm is trace-class. Finally, we consider the case of an improper prior and multivariate outcomes. We obtain a trace-class result for a parameter-expanded data augmentation algorithm which is based on a univariate working parameter.

2606.05072 2026-06-16 math.ST stat.TH 版本更新

Adaptive Sequential Change Detection using Mixtures of Predictive Distributions

使用预测分布混合的自适应序列变化检测

Topi Halme, H. Vincent Poor, Visa Koivunen

AI总结 针对后变化分布未知的独立观测序列变化检测问题,提出一种基于滑动窗口预测分布混合的PM-CuSum算法,实现一阶渐近最优性且渐近延迟余项更小。

详情
AI中文摘要

本文研究了当后变化分布未知时,检测独立观测序列分布变化的问题。我们提出了一种新颖的变化检测算法,称为预测混合CuSum(PM-CuSum),该算法在CuSum递归中结合了从不同长度滑动窗口构建的预测分布。预测分布根据其近期预测性能使用自适应权重进行聚合。我们证明,在温和条件下,PM-CuSum实现了一阶渐近最优性,并且其渐近延迟界具有比任何固定(甚至先知)窗口更小的余项阶数。数值模拟表明,与现有方法相比,PM-CuSum表现良好。此外,与插件似然相比,使用完整预测分布形成似然比可以显著提高性能。

英文摘要

This paper studies the problem of detecting a change in the distribution of a sequence of independent observations when the post-change distribution is unknown. We propose a novel change detection algorithm, termed Predictive-Mixture CuSum (PM-CuSum), which combines predictive distributions constructed from sliding windows of different lengths within a CuSum recursion. The predictive distributions are aggregated using adaptive weights based on their recent predictive performance. We show that PM-CuSum achieves first-order asymptotic optimality under mild conditions, and that its asymptotic delay bound has a smaller remainder order than that achieved by procedures using a single fixed (even oracle) window. Numerical simulations demonstrate that PM-CuSum performs well compared to existing methods. Moreover, it is demonstrated that forming likelihood ratios using full predictive distributions can substantially improve performance compared to plug-in likelihoods.
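For context, the classical CuSum recursion that PM-CuSum builds on can be sketched as follows. This is the textbook version with a known Gaussian post-change mean, not the PM-CuSum algorithm itself: the paper replaces the fixed likelihood ratio with a mixture of sliding-window predictive distributions, which is not reproduced here. The function names and parameter values are illustrative assumptions.

```python
def gaussian_llr(x, mu0, mu1, sigma):
    """Log-likelihood ratio log(f1(x) / f0(x)) for N(mu1, sigma^2) against N(mu0, sigma^2)."""
    return ((mu1 - mu0) / sigma ** 2) * (x - 0.5 * (mu0 + mu1))


def cusum_alarm(xs, mu0, mu1, sigma, threshold):
    """Return the 1-based index of the first alarm, or None if no alarm is raised."""
    w = 0.0
    for t, x in enumerate(xs, start=1):
        # CuSum recursion: accumulate evidence for a change, reset at zero.
        w = max(0.0, w + gaussian_llr(x, mu0, mu1, sigma))
        if w >= threshold:
            return t
    return None


# Toy data (made up for illustration): the mean shifts from 0 to 1 after 5 observations.
observations = [0.0] * 5 + [1.0] * 10
alarm_time = cusum_alarm(observations, mu0=0.0, mu1=1.0, sigma=1.0, threshold=2.0)
no_change = cusum_alarm([0.0] * 15, mu0=0.0, mu1=1.0, sigma=1.0, threshold=2.0)
```

In this toy run each post-change observation adds 0.5 to the statistic, so the threshold of 2 is reached four steps after the change. When the post-change distribution is unknown, as in the setting of the abstract, the fixed `mu1` above is exactly the quantity that has to be learned from the data.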

2605.28429 2026-06-16 math.ST stat.TH 版本更新

On Extending Type-I Error to Data-Dependent Levels

第一类错误到数据依赖水平的“正确”扩展

Nick W. Koning

AI总结 本文通过三个公理证明第一类错误到数据依赖水平的扩展是唯一的,并以此支持E-value的常用定义。

详情
AI中文摘要

关于数据依赖和事后显著性水平的假设检验文献依赖于第一类错误到数据依赖水平的特定扩展。现有对该扩展的论证是启发式的,主要动机源于其与E-value的联系。我们的主要贡献是通过展示该扩展从三个公理中产生来论证其是“正确”的:它是唯一嵌套了数据无关水平下经典第一类错误有效性的扩展,保留了数据依赖水平下的经典有效性,并且在拒绝声明的强度上是单调的。随后,我们应用这一结果来支持E-value的常用定义,通过展示它作为可能在不同数据驱动显著性水平下拒绝的广义假设检验数值表示的正确有效性概念而出现。

英文摘要

The emerging literature on hypothesis testing with data-dependent and post-hoc significance levels relies on a particular extension of the Type-I error to data-dependent levels. Existing arguments for this extension are heuristic, and primarily motivated by a resulting connection to the E-value. Our main contribution is to argue that the extension is 'right', by showing that it emerges from three axioms: within a large class of possible extensions, it is the only extension that nests classical Type-I error validity for data-independent levels, preserves classical validity for data-dependent levels and is monotone in the strength of the rejection claim. As a second contribution, we apply this result to support the common definition of the E-value, by showing that it arises as the 'right' notion of validity for the numerical representation of a generalized hypothesis test that may reject at different data-driven significance levels.

2504.11320 2026-06-16 cs.LG cs.AI cs.DC math.OC stat.ML 版本更新

Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints

优化大语言模型推理:带有内存约束的流引导在线调度

Ruicheng Ao, Gan Luo, David Simchi-Levi, Xinshang Wang

发表机构 * Institute for Data, Systems, and Society, Massachusetts Institute of Technology(数据、系统与社会研究所,麻省理工学院) School of Mathematical Sciences, Peking University(北京大学数学科学学院) Alibaba Group(阿里巴巴集团)

AI总结 本文提出流引导在线调度方法,通过等待阈值算法和嵌套等待算法,在内存约束下优化大语言模型推理的延迟和容量,减少过载时的延迟。

Comments 79 pages, 20 figures

详情
AI中文摘要

大型语言模型现在每天服务于数百万用户,提供商每天的支出超过70万美元。每个请求需要逐token推理,使GPU调度成为延迟、容量和成本的关键因素。难点在于内生内存增长:生成的token会扩展键值(KV)缓存,溢出可能导致正在进行的请求被驱逐并浪费先前计算。我们将推理视为一个具有内生内存增长、线性迭代时间和驻留GPU的KV缓存约束的多阶段在线调度问题。我们引入了流模型,该模型表征了平衡批处理组成、内存需求和稳定性区域。受流模型指导,我们设计了WAIT(等待累积推理阈值),一种针对已知输出长度的基于阈值的准入规则,以及嵌套WAIT,后者通过调节请求在各解码阶段分段间的推进方式,将该规则扩展到未知输出长度的情形。两种算法在所陈述的内存条件下渐近逼近流基准。嵌套WAIT使用额外的中等规模安全缓冲区,以应对未知输出长度引起的内存溢出导致的驱逐。在配置为Llama-2-7B的A100 GPU上的Vidur模拟中,补充的实GPU验证在附录中报告,这些策略相对于广泛使用的基线算法扩大了经验上观察到的稳定运行范围,并在接近过载和过载区域显著降低了延迟。

英文摘要

Large language models now serve millions of users daily, with providers incurring costs exceeding $700,000 per day. Each request requires token-by-token inference, making GPU scheduling central to latency, capacity, and cost. The difficulty is endogenous memory growth: generated tokens expand the Key-Value (KV) cache, and overflow can evict in-progress requests and waste prior computation. We formulate inference as a multi-stage online scheduling problem with endogenous memory growth, linear iteration times, and GPU-resident KV-cache constraints. We introduce a fluid model that characterizes equilibrium batch composition, memory requirement, and stability region. Guided by the fluid model, we design WAIT (Waiting for Accumulated Inference Threshold), a threshold-based admission rule for known output lengths, and Nested WAIT, which extends the rule to unknown output lengths by regulating how requests advance across decode-stage segments. Both algorithms approximate the fluid benchmark asymptotically under the stated memory conditions. Nested WAIT uses an additional safety buffer of moderate scale to hedge against memory-overflow-induced evictions under unknown output lengths. In Vidur simulations configured for Llama-2-7B on an A100 GPU, with supplemental real-GPU validation reported in the appendix, the policies enlarge the empirically observed stable operating range relative to widely used baseline algorithms and reduce latency especially in near-overloaded and overloaded regimes.

2603.29463 2026-06-16 math.ST stat.TH 版本更新

Robustified Gaussian quasi-BIC for volatility

波动率的稳健化高斯拟BIC

Shoichi Eguchi, Hiroki Masuda

AI总结 针对受有限活动跳跃污染的非遍历连续波动率回归模型,提出两种基于密度功率加权和Hölder不等式归一化的Schwarz型统计量,并证明其模型选择一致性。

详情
AI中文摘要

我们为一类受有限活动跳跃污染的非遍历连续波动率回归模型中的稳健模型比较奠定了理论基础。通过使用密度功率加权和基于Hölder不等式的传统高斯拟似然函数归一化,我们提出了两种Schwarz型统计量,并建立了它们关于最小真实参数波动率系数的模型选择一致性。进行了数值实验以说明我们的理论发现。

英文摘要

We develop a theoretical foundation for robust model comparison in a class of non-ergodic continuous volatility regression models contaminated by finite-activity jumps. Using the density-power weighting and the Hölder(-inequality)-based normalization of the conventional Gaussian quasi-likelihood function, we propose two Schwarz-type statistics and also establish their model selection consistency with respect to the minimal true parametric volatility coefficient. Numerical experiments are conducted to illustrate our theoretical findings.

2603.25032 2026-06-16 math.ST stat.TH 版本更新

Treatment effect estimation under convergent network interference

收敛网络干扰下的处理效应估计

Bryan Park, Stefan Wager

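AI summary Introduces a notion of convergence for sequences of finite populations under anonymous interference and, building on the graph limit framework, proves asymptotic normality of standard estimators under Bernoulli assignment, including on dense, non-random exposure graphs.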

详情
AI中文摘要

在网络干扰下,一个单元的观测结果取决于其邻接单元在暴露图中的处理分配。现有的基于设计的渐近理论通常通过限制暴露图中的邻域大小来考虑局部干扰。这些方法不适用于密集的暴露图,因此先前的工作通常采用超总体方法,通过随机图模型施加正则性。在本文中,我们引入了匿名干扰下有限总体序列的收敛概念。基于Lovász和Szegedy的图极限框架,我们表明暴露图的大尺度几何结构可以在稀疏性假设或随机图建模之外提供正则性来源。在伯努利分配下,我们的收敛概念使得平均直接效应的标准估计量具有渐近正态性,即使在密集、非随机的暴露图上也是如此。作为一个特例,先前工作中研究的基于图极限函数(graphon)的随机图模型生成的有限总体在我们的意义下是收敛的。在这些模型下,图的随机性产生了具有稳定大尺度几何结构的暴露图,而平均直接效应估计中的一阶不确定性则由处理分配驱动。

英文摘要

Under network interference, a unit's observed outcome depends on the treatment assignment of its neighboring units in an exposure graph. Existing design-based asymptotic theory typically considers local interference by restricting neighborhood sizes in the exposure graph. Such methods do not apply to dense exposure graphs, so prior work has often adopted a superpopulation approach instead, imposing regularity through random-graph models. In this paper, we introduce a notion of convergence for a sequence of finite populations under anonymous interference. Building on the graph limit framework of Lovász and Szegedy, we show that large-scale geometry of the exposure graph can provide a source of regularity beyond sparsity assumptions or random-graph modeling. Under Bernoulli assignment, our convergence notion yields asymptotic normality of standard estimators for the average direct effect, even on dense, non-random exposure graphs. As a special case, graphon-based random-graph models studied in prior work generate finite populations that converge in our sense. Under these models, graph randomness generates exposure graphs with stable large-scale geometry, while first-order uncertainty in average direct effect estimation is driven by treatment assignment.

2602.02819 2026-06-16 cs.LG stat.ML 版本更新

Causal Evaluation of Membership Inference Attacks

成员推断攻击的因果评估

Mathieu Even, Clément Berenfeld, Linus Bleistein, Tudor Cebere, Julie Josse, Aurélien Bellet

Affiliations * Inria; PreMeDICaL, Inserm, Montpellier, France; School of Computer and Communication Science (EPFL); School of Life Sciences (EPFL); Lausanne, Switzerland

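AI summary Treats the evaluation of membership inference attacks as a causal inference problem, defines memorization as the causal effect of including a data point, and proposes and validates practical estimators for the multi-run, one-run, and zero-run settings.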

Comments Fixed ref label problems

详情
AI中文摘要

成员推断攻击(MIA)旨在区分训练点(成员)和未见数据(非成员),并广泛用于量化记忆化和评估隐私风险。标准MIA评估需要重复训练,对于大型模型计算成本高昂。单轮(单次训练,随机数据包含)和零轮(事后评估)方法常被用作替代,但其统计有效性尚不清楚。我们通过将MIA评估框架化为因果推断问题来填补这一空白,将记忆化定义为在训练集中包含一个数据点的因果效应。这一新颖的表述揭示并形式化了现有协议中偏差的关键来源:单轮方法受到联合包含点之间的干扰,而零轮评估还受到成员与非成员评估数据之间分布偏移的混淆。我们推导了标准MIA指标的因果类比,并提出了多轮、单轮和零轮设置下的实用估计器,具有非渐近一致性保证。我们在多个设置中验证了我们的方法,包括预训练和微调的大型语言模型,表明它能够在无需重新训练且存在分布偏移的情况下可靠地测量MIA性能。总体而言,我们的框架为现代AI系统中的隐私评估提供了原则性基础。

英文摘要

Membership Inference Attacks (MIAs) aim to distinguish training points (members) from unseen data (non-members), and are widely used to quantify memorization and assess privacy risks. Standard MIA evaluation requires repeated retraining, which is computationally costly for large models. One-run (single training with randomized data inclusion) and zero-run (post hoc evaluation) methods are often used instead, but their statistical validity remains unclear. We address this gap by framing MIA evaluation as a causal inference problem, defining memorization as the causal effect of including a data point in the training set. This novel formulation reveals and formalizes key sources of bias in existing protocols: one-run methods suffer from interference between jointly included points, while zero-run evaluations are additionally confounded by distribution shift between member and non-member evaluation data. We derive causal analogues of standard MIA metrics and propose practical estimators for multi-run, one-run, and zero-run regimes with non-asymptotic consistency guarantees. We validate our approach in several settings, including pretrained and fine-tuned LLMs, showing that it enables reliable measurement of MIA performance without retraining and under distribution shift. Overall, our framework provides a principled foundation for privacy evaluation in modern AI systems.

2512.21577 2026-06-16 cs.CL cs.AI cs.LG stat.ML 版本更新

A Unified Definition of Hallucination: It's The World Model, Stupid!

幻觉的统一定义:是世界模型的问题,笨蛋!

Emmy Liu, Varun Gangal, Chelsea Zou, Michael Yu, Xiaoqi Huang, Alex Chang, Zhuofu Tao, Karan Singh, Sachin Kumar, Steven Y. Feng

Affiliations * University of California, Berkeley

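AI summary Proposes a unified definition of hallucination as inaccurate internal world modeling that is observable to the user, and connects it to the HalluWorld benchmark to distinguish true hallucinations from planning or reward errors.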

Comments ICML 2026. HalluWorld benchmark at https://github.com/DegenAI-Labs/HalluWorld

详情
AI中文摘要

尽管自语言模型诞生以来已有无数缓解尝试,但即使在当今最前沿的LLM中,幻觉仍然是一个持续存在的问题。这是为什么?我们回顾了现有的幻觉定义,并将它们整合为一个统一的定义,其中先前的定义被包含在内。我们认为,幻觉可以通过将其简单地定义为不准确的(内部)世界建模来统一,其形式是用户可观察到的。例如,陈述与知识库相矛盾的事实,或生成与来源相矛盾的摘要。通过改变参考世界模型和冲突策略,我们的框架统一了先前的定义。我们认为,这种统一观点是有用的,因为它迫使评估澄清其假定的参考“世界”,区分真实幻觉与规划或奖励错误,并为跨基准比较和缓解策略讨论提供共同语言。基于这一定义,我们还将我们的框架连接到HalluWorld,这是一个补充基准,它实例化了完全指定的参考世界模型,用于压力测试模型幻觉。

英文摘要

Despite numerous attempts at mitigation since the inception of language models, hallucinations remain a persistent problem even in today's frontier LLMs. Why is this? We review existing definitions of hallucination and fold them into a single, unified definition wherein prior definitions are subsumed. We argue that hallucination can be unified by defining it as simply inaccurate (internal) world modeling, in a form where it is observable to the user. For example, stating a fact which contradicts a knowledge base OR producing a summary which contradicts the source. By varying the reference world model and conflict policy, our framework unifies prior definitions. We argue that this unified view is useful because it forces evaluations to clarify their assumed reference "world", distinguishes true hallucinations from planning or reward errors, and provides a common language for comparison across benchmarks and discussion of mitigation strategies. Building on this definition, we also connect our framework to HalluWorld, a complementary benchmark that instantiates fully specified reference world models for stress-testing model hallucinations.

2601.11422 2026-06-16 math.ST math.PR stat.TH 版本更新

Stein's method for the matrix normal distribution

矩阵正态分布的Stein方法

Robert E. Gaunt, Frédéric Ouimet, Donald Richards

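AI summary Presents the first systematic development of Stein's method for matrix distributions: derives a Stein identity from a matrix Ornstein-Uhlenbeck diffusion, gives a semigroup representation and regularity estimates for the solution of the Stein equation, and applies them to the matrix central limit theorem, approximation of the matrix $T$ distribution, and covariance estimation.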

Comments 25 pages, 0 figures

详情
AI中文摘要

本文首次系统发展矩阵分布的Stein方法。我们建立了矩阵正态近似的Stein方法的基本要素:从具有双边尺度的矩阵Ornstein-Uhlenbeck扩散推导出基于扩展生成元的Stein恒等式,为Stein方程的解提供了明确的半群表示,并获得了解的正则性估计。新方法通过三个例子展示:(i) 量化矩阵中心极限定理的光滑Wasserstein距离界(教学示例),(ii) 中心化矩阵$T$分布的矩阵正态近似的Wasserstein距离界,以及(iii) 估计矩阵正态的行和列协方差因子的Stein矩方法,产生一类灵活的加权翻转-翻转Stein估计量,推广了Dutilleul的经典翻转-翻转算法,并自然适应行/列重要性权重、系统缺失和投影到结构化协方差族。后两个例子本质上是矩阵值的,不能通过简单的向量化处理。

英文摘要

This work presents the first systematic development of Stein's method for matrix distributions. We establish the basic essential ingredients of Stein's method for matrix normal approximation: we derive an extended-generator-based Stein identity from a matrix Ornstein-Uhlenbeck diffusion with two-sided scales, provide an explicit semigroup representation for the solution of the Stein equation, and obtain regularity estimates for the solution. The new methodology is demonstrated in three examples: (i) smooth Wasserstein distance bounds to quantify the matrix central limit theorem (a didactic example), (ii) a Wasserstein distance bound for the matrix normal approximation of the centered matrix $T$ distribution, and (iii) a Stein's method-of-moments approach to estimating the row and column covariance factors of the matrix normal, yielding a flexible class of weighted flip-flop Stein estimators that generalize Dutilleul's classical flip-flop algorithm and naturally accommodate row/column importance weights, systematic missingness, and projection onto structured covariance families. The latter two examples are intrinsically matrix-valued and cannot be treated using naive vectorization.
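For readers unfamiliar with the classical flip-flop algorithm that the abstract says is generalized, the following is a minimal NumPy sketch of the unweighted maximum-likelihood version for the matrix normal. It is not the weighted Stein estimator from the paper; the function name, the fixed iteration count, and the identity initialization are assumptions made for illustration.

```python
import numpy as np

def flip_flop(X, n_iter=50):
    """Classical flip-flop estimation for the matrix normal distribution.

    X has shape (N, n, p): N independent n x p matrix observations.
    Returns the mean M, row covariance U (n x n), column covariance V (p x p).
    U and V are identified only up to a scalar: (c*U, V/c) gives the same model.
    """
    N, n, p = X.shape
    M = X.mean(axis=0)
    R = X - M                      # centered observations
    U = np.eye(n)
    V = np.eye(p)
    for _ in range(n_iter):
        # Update the row covariance given the current column covariance.
        V_inv = np.linalg.inv(V)
        U = sum(r @ V_inv @ r.T for r in R) / (N * p)
        # Update the column covariance given the new row covariance.
        U_inv = np.linalg.inv(U)
        V = sum(r.T @ U_inv @ r for r in R) / (N * n)
    return M, U, V
```

Because of the scale ambiguity, it is safer to compare the Kronecker product `np.kron(U, V)` with the covariance of the row-major vectorized data than to compare `U` and `V` individually.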

2510.02666 2026-06-16 math.ST stat.TH 版本更新

Robustified Gaussian quasi-likelihood inference for volatility

鲁棒化的波动率高斯拟似然推断

Shoichi Eguchi, Hiroki Masuda

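AI summary For high-frequency data contaminated by finite-activity jumps and spike noise, proposes robustified Gaussian quasi-likelihood estimators based on density-power weighting and Hölder-inequality-based normalization, proves their asymptotic mixed normality, and shows robustness against contamination in both the covariate and response processes.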

详情
AI中文摘要

我们考虑基于高频观测的一类连续半鞅回归模型的统计推断,这些观测受到有限活动跳跃和尖峰噪声的污染。通过采用密度幂加权和Hölder不等式归一化,我们提出了易于实现的、传统高斯拟最大似然估计量的鲁棒化版本,该版本仅需一个调节参数。我们证明了它们在$\sqrt{n}$的标准速率下具有渐近混合正态性。理论上表明,这些估计量同时对协变量和响应过程中的污染具有鲁棒性。此外,在调节参数选择的适当条件下,所提出的估计量在无污染情况下达到与传统估计量相同的渐近分布。仿真结果突出了估计量对调节参数选择的不敏感性。

英文摘要

We consider statistical inference for a class of continuous semimartingale regression models based on high-frequency observations subject to contamination by finite-activity jumps and spike noise. By employing density-power weighting and Hölder-inequality-based normalization, we propose easy-to-implement, robustified versions of the conventional Gaussian quasi-maximum-likelihood estimator that require only a single tuning parameter. We prove their asymptotic mixed normality at the standard rate of $\sqrt{n}$. It is theoretically shown that these estimators are simultaneously robust against contamination in both the covariate and response processes. Additionally, under suitable conditions on the selection of the tuning parameter, the proposed estimators achieve the same asymptotic distribution as the conventional estimator in the contamination-free case. Illustrative simulation results highlight the estimators' insensitivity to the choice of the tuning parameter.

2410.05517 2026-06-16 math.ST stat.TH 版本更新

Functional Extreme-PLS

函数型极端偏最小二乘法

Stéphane Girard, Cambyse Pakzad

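AI summary Proposes an extreme dimension reduction method for the discretized functional framework that combines PLS and SIR ideas, captures tail information by maximizing the covariance of projections, estimates the index under a non-linear inverse single-index model, and proves asymptotic consistency.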

Comments 44 pages, 9 figures

详情
AI中文摘要

我们提出了一种极端降维方法,将极端偏最小二乘法(Extreme-PLS)扩展到离散化函数型框架,其中协变量位于无限维希尔伯特空间$L^2([0,1])$中,但在密集时间网格上部分观测。该思想部分借鉴了偏最小二乘法(PLS)和切片逆回归(SIR)技术。该方法依赖于将协变量投影到子空间,并最大化其投影与响应之间的协方差,条件于捕捉尾部信息的极端事件。协变量和重尾响应通过非线性逆单指标模型关联,我们的目标是在该回归框架中推断指标。我们提出了一族新的估计量,并证明了其在模型下的渐近一致性和收敛速度。在噪声的温和假设下,大多数假设以正则变化的形式给出,这与标准SIR和单指标回归文献不同。此外,我们扩展了理论分析,在一般可分离希尔伯特空间中得到了经验尾部矩的几乎必然一致性结果(无模型假设)。最后,我们在合成函数数据和高频金融数据的有限样本研究中展示了结果,突出了降维在捕捉尾部依赖和极端风险管理中的有效性。

英文摘要

We propose an extreme dimension reduction method extending the Extreme-PLS approach to the discretized functional framework, where the covariate lies in the infinite-dimensional Hilbert space $L^2([0,1])$ but is partially observed on a dense time grid. The ideas are partly borrowed from both Partial Least-Squares (PLS) and Sliced Inverse Regression (SIR) techniques. Accordingly, the method relies on the projection of the covariate onto a subspace and maximizes the covariance between its projection and the response conditionally on an extreme event capturing the tail-information. The covariate and the heavy-tailed response are supposed to be linked through a non-linear inverse single-index model and our goal is to infer the index in this regression framework. We propose a new family of estimators and show its asymptotic consistency with convergence rates under the model. Assuming mild conditions on the noise, most of the assumptions are stated in terms of regular variation unlike the standard literature on SIR and single-index regression. In addition, we expand the theoretical analysis with a model-free almost sure consistency result for the empirical tail-moments in a general separable Hilbert space. Finally, our results are illustrated on a finite-sample study with synthetic functional data as well as high-frequency financial data, highlighting the effectiveness of the dimension reduction for capturing tail dependence and for extreme risk management.

2509.24223 2026-06-16 cs.LG cs.CV stat.ML 版本更新

Semantic Editing with Coupled Stochastic Differential Equations

耦合随机微分方程的语义编辑

Jianxin Zhang, Clayton Scott

Affiliations * University of California, Berkeley

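AI summary Proposes coupled stochastic differential equations (coupled SDEs) to guide the sampling process of pretrained generative models, achieving semantic editing with high prompt fidelity and near-pixel-level consistency without retraining.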

详情
AI中文摘要

使用预训练的文本到图像模型编辑图像内容仍然具有挑战性。现有方法常常扭曲细节或引入意外伪影。我们提出使用耦合随机微分方程(coupled SDEs)来引导任何可以通过求解SDE进行采样的预训练生成模型的采样过程,包括扩散模型和整流流模型。通过用相同的相关噪声驱动源图像和编辑图像,我们的方法将新样本引导至所需语义,同时保持与源图像的视觉相似性。该方法开箱即用,无需重新训练或辅助网络,并实现了高提示保真度和近像素级一致性。这些结果使耦合SDE成为受控生成式AI的简单而强大的工具。项目页面:https://z-jianxin.github.io/syncSDE-release/。代码:https://github.com/Z-Jianxin/syncSDE-release。

英文摘要

Editing the content of an image with a pretrained text-to-image model remains challenging. Existing methods often distort fine details or introduce unintended artifacts. We propose using coupled stochastic differential equations (coupled SDEs) to guide the sampling process of any pretrained generative model that can be sampled by solving an SDE, including diffusion and rectified flow models. By driving both the source image and the edited image with the same correlated noise, our approach steers new samples toward the desired semantics while preserving visual similarity to the source. The method works out-of-the-box, without retraining or auxiliary networks, and achieves high prompt fidelity along with near-pixel-level consistency. These results position coupled SDEs as a simple yet powerful tool for controlled generative AI. Project page: https://z-jianxin.github.io/syncSDE-release/. Code: https://github.com/Z-Jianxin/syncSDE-release.
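The effect of driving two SDEs with the same noise can be seen in a toy example. The sketch below is not the authors' sampler: it replaces the generative model with two scalar Ornstein-Uhlenbeck processes whose drift targets `mu_src` and `mu_edit` stand in for the source and edit conditions, and it uses identical (perfectly correlated) Brownian increments. All names and parameter values are assumptions.

```python
import math
import random

def coupled_euler_maruyama(mu_src, mu_edit, x0=0.0, theta=1.0, sigma=0.5,
                           dt=0.01, steps=200, seed=0):
    """Simulate dX = -theta * (X - mu) dt + sigma dW for two drift targets,
    feeding the SAME Brownian increment to both paths at every step."""
    rng = random.Random(seed)
    x_src = x_edit = x0
    for _ in range(steps):
        dw = rng.gauss(0.0, math.sqrt(dt))   # shared noise increment
        x_src += -theta * (x_src - mu_src) * dt + sigma * dw
        x_edit += -theta * (x_edit - mu_edit) * dt + sigma * dw
    return x_src, x_edit
```

With shared noise the random terms cancel in the difference of the two paths, so the gap between them is driven only by the difference in drift. In this toy setting that is the mechanism by which the edited sample stays close to the source while still moving toward its own target.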

2410.04057 2026-06-16 math.ST stat.TH 版本更新

Change Point Detection in Precision Matrices with D-trace Loss

基于D-trace损失的精度矩阵变点检测

Ying Lin, Benjamin Poignard, Ting Kei Pong, Akiko Takeda

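AI summary For change point detection in piecewise constant sparse precision matrices, proposes an estimator based on the D-trace loss with Group Fused LASSO and LASSO penalties, gives conditions for consistency of the change points and the sparse estimators, introduces a modified regularizer that guarantees existence of solutions, and solves the revised problem with ADMM.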

详情
AI中文摘要

我们考虑估计一个随时间变化的稀疏精度矩阵的问题,该矩阵被假定为分段常数演化。基于组融合LASSO和LASSO惩罚函数,我们同时估计精度矩阵和变点。我们提出了一种替代常用的高斯似然损失的估计量,即D-trace损失。我们给出了估计变点的一致性和每个块中稀疏估计量的条件。我们证明了当惩罚函数的调谐参数满足某些条件时,相应估计问题的解存在。不幸的是,这些条件通常不可验证,给实际调参带来了挑战。为了解决这个问题,我们引入了一个修正正则化器,并开发了一个总是有解的修正问题:这些解可用于检测原始问题可能无解的情况,否则获得原始问题的解。然后提出了一种交替方向乘子法(ADMM)来求解修正问题。通过数值实验说明了该方法的有效性。

英文摘要

We consider the problem of estimating a time-varying sparse precision matrix, which is assumed to evolve in a piecewise constant manner. Building upon the Group Fused LASSO and LASSO penalty functions, we estimate both the precision matrix and the change points. We propose an alternative estimator to the commonly employed Gaussian likelihood loss, namely the D-trace loss. We provide the conditions for the consistency of the estimated change points and of the sparse estimators in each block. We show that the solutions to the corresponding estimation problem exist when some conditions relating to the tuning parameters of the penalty functions are satisfied. Unfortunately, these conditions are not verifiable in general, posing challenges for tuning the parameters in practice. To address this issue, we introduce a modified regularizer and develop a revised problem that always admits solutions: these solutions can be used for detecting possible unsolvability of the original problem or obtaining a solution of the original problem otherwise. An alternating direction method of multipliers (ADMM) is then proposed to solve the revised problem. The relevance of the method is illustrated through numerical experiments.
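As background on the loss named in the abstract, the sketch below evaluates the standard D-trace loss from the sparse precision matrix literature, L(Θ, Σ) = ½ tr(Θ² Σ) − tr(Θ), and its gradient for a single covariance matrix. It omits the Group Fused LASSO and LASSO penalties, the time-varying structure, and the ADMM solver described in the abstract; the function names are assumptions.

```python
import numpy as np

def d_trace_loss(theta, sigma):
    """Unpenalized D-trace loss: 0.5 * tr(theta^2 sigma) - tr(theta)."""
    return 0.5 * np.trace(theta @ theta @ sigma) - np.trace(theta)

def d_trace_grad(theta, sigma):
    """Gradient for symmetric theta and sigma:
    0.5 * (theta sigma + sigma theta) - I."""
    return 0.5 * (theta @ sigma + sigma @ theta) - np.eye(sigma.shape[0])
```

Setting the gradient to zero gives Θ = Σ⁻¹ when Σ is positive definite, so the loss targets the precision matrix without a log-determinant term. The loss is quadratic in Θ, which is one reason it is a convenient alternative to the Gaussian likelihood.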

2408.05568 2026-06-16 cs.AI cs.CL cs.CY stat.AP 版本更新

Metacognitive Myopia in Large Language Models

大型语言模型中的元认知近视

Florian Scholten, Tobias R. Rebholz, Mandy Hütter

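AI summary Proposes metacognitive myopia as a framework for explaining LLM biases, argues that biased samples in the information environment cause five symptoms, and outlines how technically approximating monitoring and control could mitigate them.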

详情
AI中文摘要

大型语言模型(LLMs)表现出潜在有害的偏见,这些偏见强化了文化嵌入的刻板印象,影响道德判断,或放大对多数群体的积极评价。我们提出元认知近视作为一个认知生态框架,用以解释一系列已建立和新兴的LLM偏见。我们的理论框架认为,信息环境中的有偏样本导致LLM中元认知近视的五种症状:整合无效嵌入、易受冗余信息影响、在条件计算中忽略基率、基于频率的决策规则,以及对嵌套数据结构的错误高阶统计推断。此外,该框架认为元认知的两个主要组成部分——监控和控制——可以解释这五种症状。因此,我们进一步概述了如何从技术上近似监控和控制,例如通过隐藏的并行推理历史,使交互式LLM在生成公开响应之前能够评估近视推理的风险。我们的理论框架为有缺陷的人机交互和代理AI提供了新的视角,并对在组织结构和高风险决策中实施LLM提出了重要的伦理关切。

英文摘要

Large Language Models (LLMs) exhibit potentially harmful biases that reinforce culturally embedded stereotypes, influence moral judgments, or amplify positive evaluations of majority groups. We propose metacognitive myopia as a cognitive-ecological framework accounting for a conglomerate of established and emerging LLM biases. Our theoretical framework posits that biased samples in the information environment cause five symptoms of metacognitive myopia in LLMs: integration of invalid embeddings, susceptibility to redundant information, neglect of base rates in conditional computation, decision rules based on frequency, and inappropriate higher-order statistical inference for nested data structures. Moreover, it posits that the two main components of metacognition, monitoring and control, could account for these five symptoms. Accordingly, we further outline how monitoring and control could be approximated technically, for instance, through hidden parallel reasoning histories that allow interactive LLMs to evaluate risks of myopic inference before generating overt responses. Our theoretical framework provides a novel perspective on flawed human-machine interactions and agentic AI and raises significant ethical concerns regarding the implementation of LLMs in organizational structures and high-stakes decisions.

2502.06178 2026-06-16 math.OC cs.LG stat.ML

Bayesian Optimization by Kernel Regression and Density-based Exploration

基于核回归和密度探索的贝叶斯优化

Tansheng Zhu, Hongyu Zhou, Ke Jin, Xusheng Xu, Qiufan Yuan, Lijie Ji

Affiliations * Zhiyuan College, Shanghai Jiao Tong University, Shanghai 200240, P. R. China; School of Mathematical Sciences, Shanghai Jiao Tong University, Shanghai 200240, P. R. China; Shanghai Institute of Aerospace Systems Engineering, Shanghai 201109, P. R. China; Department of Mathematics, Shanghai University, Shanghai 200444, P. R. China; Newtouch Center for Mathematics of Shanghai University, Shanghai University, Shanghai 200444, P. R. China

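AI summary Proposes BOKE, a new Bayesian optimization algorithm that combines kernel regression with density-based exploration, reduces the computational cost to quadratic, and establishes its convergence and effectiveness theoretically and experimentally.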

详情
AI中文摘要

贝叶斯优化在优化昂贵评估的黑盒函数时非常有效,但因高斯过程的每次迭代三次计算复杂度而面临显著的计算挑战,导致总时间复杂度与迭代次数的四次方成正比。为了解决这一限制,我们提出了一种新的算法,即基于核回归和密度探索的贝叶斯优化(BOKE)。BOKE利用核回归进行高效的函数近似,核密度用于探索,并将它们整合到置信界标准中以指导优化过程,从而将计算成本降低到二次。我们的理论分析严格建立了在噪声评估下的BOKE全局收敛性。通过广泛的数值实验,在合成和现实优化任务中,我们证明了BOKE不仅在与高斯过程方法和其他基线方法相比具有竞争力,而且表现出优越的计算效率。这些结果突显了BOKE在资源受限环境中的有效性,为工程应用中的优化问题提供了一种实用的方法。

英文摘要

Bayesian optimization is highly effective for optimizing expensive-to-evaluate black-box functions, but it faces significant computational challenges due to the cubic per-iteration cost of Gaussian processes, which results in a total time complexity that is quartic with respect to the number of iterations. To address this limitation, we propose a novel algorithm, Bayesian optimization by kernel regression and density-based exploration (BOKE). BOKE uses kernel regression for efficient function approximation, kernel density for exploration, and integrates them into the confidence bound criteria to guide the optimization process, thus reducing computational costs to quadratic. Our theoretical analysis rigorously establishes the global convergence of BOKE under noisy evaluations. Through extensive numerical experiments on both synthetic and real-world optimization tasks, we demonstrate that BOKE not only performs competitively compared to Gaussian process-based methods and several other baseline methods but also exhibits superior computational efficiency. These results highlight BOKE's effectiveness in resource-constrained environments, providing a practical approach for optimization problems in engineering applications.
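The combination of kernel regression and kernel density in a confidence-bound criterion can be sketched in one dimension as follows. The abstract does not give BOKE's exact acquisition function, so the form used here (Nadaraya-Watson mean minus an exploration bonus proportional to one over the square root of the unnormalized kernel density) is an assumption, as are the Gaussian kernel, the names, and the default parameters.

```python
import math

def gaussian_kernel(u):
    return math.exp(-0.5 * u * u)

def kernel_stats(x, xs, ys, bandwidth):
    """Return the Nadaraya-Watson estimate and the unnormalized kernel density at x."""
    weights = [gaussian_kernel((x - xi) / bandwidth) for xi in xs]
    density = sum(weights)
    if density > 0.0:
        mean = sum(w * y for w, y in zip(weights, ys)) / density
    else:
        mean = sum(ys) / len(ys)     # fallback far from all observations
    return mean, density

def acquisition(x, xs, ys, bandwidth=0.2, beta=1.0, eps=1e-12):
    """Hypothetical lower confidence bound for minimization:
    low predicted value or low data density makes x more attractive."""
    mean, density = kernel_stats(x, xs, ys, bandwidth)
    return mean - beta / math.sqrt(density + eps)
```

Each evaluation costs time linear in the number of observations and needs no matrix factorization, so under this sketch a run with a fixed number of acquisition evaluations per iteration costs quadratic time overall, in line with the complexity stated in the abstract.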

2503.12147 2026-06-16 math.ST stat.TH

Two statistical problems for multivariate mixture distributions

Ricardo Fraiman, Leonardo Moreno, Thomas Ransford

Comments 41 pages, 12 figures

详情
英文摘要

We address two important statistical problems: that of estimating mixtures of multivariate normal distributions and mixtures of $t$-distributions based on univariate projections, and that of quantifying a discrepancy between mixture distributions induced by two model-based clusterings. In the second problem, rather than introducing a direct metric on partitions, we propose a model-based distributional discrepancy between the fitted mixture distributions associated with two clusterings. The results are based on an earlier work of the authors, where it was shown that mixtures of multivariate Gaussian or $t$-distributions can be distinguished by projecting them onto a certain predetermined finite set of lines, the number of lines depending only on the total number of distributions involved and on the ambient dimension. We also compare our proposal with robust versions of the expectation-maximization (EM) method. In each case, we present algorithms for effecting the task, and compare them with existing methods by carrying out some simulations.

2511.10911 2026-06-16 stat.ME

Improving Variance and Confidence Interval Estimation in Small-Sample Propensity Score Analyses: Bootstrap vs. Asymptotic Methods

Baoshan Zhang, Sean M. O'Brien, Yuan Wu, Laine E. Thomas

详情
Journal ref
Statistics in Medicine (2026)
英文摘要

Propensity score (PS) methods are widely used to estimate treatment effects in non-randomized studies. Variance is typically estimated using sandwich or bootstrap methods, which can either treat the PS as estimated or fixed. The latter is thought to be conservative. The sandwich and bootstrap estimators have been compared in moderate to large sample sizes, with results favoring the bootstrap estimator. With the growing interest in treatments for rare diseases and externally controlled clinical trials, very small sample sizes are not uncommon and the asymptotic properties of sandwich estimators may not hold. Bootstrap methods that allow for PS re-estimation can also generate problems with quasi-separation in small samples. It is unclear whether it is safe to prefer sandwich estimators or to assume that treating the PS as fixed is conservative. We conducted a Monte Carlo simulation to compare the performance of bootstrap versus sandwich variance and CI estimators for average treatment effects estimated with PS methods. We systematically evaluated the impact of treating the PS as fixed versus re-estimating it. These methodological comparisons were performed using Inverse Probability of Treatment Weighting (IPTW) and Augmented Inverse Probability of Treatment Weighting (AIPW) estimators. Simulations assessed performance under various conditions, including small sample sizes and different outcome and treatment prevalences. We illustrate the differences in our motivating example, the LIMIT-JIA trial. We show that the sandwich estimators can perform quite poorly in small samples, and fixed PS methods are not necessarily conservative. A stratified bootstrap avoids quasi-separation and performs well. Differences were large enough to alter statistical conclusions in our motivating example, LIMIT-JIA.
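A stratified bootstrap for an IPTW estimate can be sketched in a few lines. This is an illustration only, not the authors' procedure: it stratifies the resampling on treatment arm (the abstract does not say which strata are used), treats the propensity scores `ps` as fixed inputs so the example stays self-contained, and uses a normalized (Hajek) IPTW estimator with a percentile interval. All function names are assumptions.

```python
import random

def iptw_ate(y, t, ps):
    """Normalized (Hajek) IPTW estimate of the average treatment effect."""
    w1 = [ti / pi for ti, pi in zip(t, ps)]
    w0 = [(1 - ti) / (1 - pi) for ti, pi in zip(t, ps)]
    mu1 = sum(w * yi for w, yi in zip(w1, y)) / sum(w1)
    mu0 = sum(w * yi for w, yi in zip(w0, y)) / sum(w0)
    return mu1 - mu0

def stratified_bootstrap_ci(y, t, ps, n_boot=1000, alpha=0.05, seed=0):
    """Percentile interval from resampling treated and control units separately,
    so every bootstrap sample keeps the original arm sizes."""
    rng = random.Random(seed)
    treated = [i for i, ti in enumerate(t) if ti == 1]
    control = [i for i, ti in enumerate(t) if ti == 0]
    estimates = []
    for _ in range(n_boot):
        idx = (rng.choices(treated, k=len(treated))
               + rng.choices(control, k=len(control)))
        estimates.append(iptw_ate([y[i] for i in idx],
                                  [t[i] for i in idx],
                                  [ps[i] for i in idx]))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

To re-estimate the PS instead of fixing it, the propensity model would be refit inside the loop on each resample. Keeping both arms at their original sizes guarantees that no resample is missing an arm entirely, which is one way a refit can break down in very small samples.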

2508.06580 2026-06-16 stat.AP q-bio.PE

Actuarial Analysis of an Infectious Disease Insurance based on an SEIARD Epidemiological Model

Achraf Zinihi, Matthias Ehrhardt, Moulay Rchid Sidi Ammi

详情
英文摘要

The growing number of infectious disease outbreaks, like the one caused by the SARS-CoV-2 virus, underscores the necessity of actuarial models that can adapt to epidemic-driven risks. Traditional life insurance frameworks often rely on static mortality assumptions that fail to capture the temporal and behavioral complexity of disease transmission. In this paper, we propose an integrated actuarial framework based on the SEIARD epidemiological model. This framework enables the explicit modeling of incubation periods and disease-induced mortality. We derive key actuarial quantities, including the present value of annuity benefits, payment streams, and net premiums, based on SEIARD dynamics. We formulate a prospective reserve function and analyze its evolution throughout the course of an epidemic. Additionally, we examine the forces of infection, mortality, and removal to assess their impact on epidemic-adjusted survival probabilities. Numerical simulations implemented via a nonstandard finite difference (NSFD) scheme illustrate the model's applicability under various parameter settings and insurance policy assumptions.
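A nonstandard finite difference step for a compartmental model of this kind might look like the sketch below. The abstract does not state the model equations, so the compartments (susceptible, exposed, infectious, asymptomatic, recovered, deceased), the transition structure, and every parameter name here are assumptions; the scheme shown is a generic Mickens-style discretization, not necessarily the one used in the paper.

```python
def nsfd_step(state, h, beta, sigma, p, gamma_i, gamma_a, mu):
    """One NSFD step for a hypothetical SEIARD model.

    state = (S, E, I, A, R, D); h is the time step.
    beta: transmission rate; sigma: 1 / incubation period;
    p: fraction of exposed who become symptomatic;
    gamma_i, gamma_a: recovery rates; mu: disease-induced death rate.
    Loss terms are evaluated at the new time level, which yields explicit
    updates with positive denominators for any h > 0.
    """
    s, e, i, a, r, d = state
    n = s + e + i + a + r + d            # total is conserved by the scheme
    force = beta * (i + a) / n           # force of infection at the old level
    s_new = s / (1.0 + h * force)
    e_new = (e + h * force * s_new) / (1.0 + h * sigma)
    i_new = (i + h * p * sigma * e_new) / (1.0 + h * (gamma_i + mu))
    a_new = (a + h * (1.0 - p) * sigma * e_new) / (1.0 + h * gamma_a)
    r_new = r + h * (gamma_i * i_new + gamma_a * a_new)
    d_new = d + h * mu * i_new
    return (s_new, e_new, i_new, a_new, r_new, d_new)
```

Unlike a forward Euler step, this update cannot produce negative compartments even with a large step size, and the compartments always sum to the initial total. Both properties matter when the trajectories feed survival probabilities and reserve calculations.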

1705.08544 2026-06-16 stat.OT

Data Visualization on Day One: Bringing Big Ideas into Intro Stats Early and Often

Xiaofei Wang, Cynthia Rush, Nicholas Jon Horton

Comments Accepted in Technology Innovations in Statistics Education

详情
Journal ref
Technology Innovations in Statistics Education, 10(1) (2017). https://escholarship.org/uc/item/84v3774z
英文摘要

In a world awash with data, the ability to think and compute with data has become an important skill for students in many fields. For that reason, inclusion of some level of statistical computing in many introductory-level courses has grown more common in recent years. Existing literature has documented multiple success stories of teaching statistics with R, bolstered by the capabilities of R Markdown. In this article, we present an in-class data visualization activity intended to expose students to R and R Markdown during the first week of an introductory statistics class. The activity begins with a brief lecture on exploratory data analysis in R. Students are then placed in small groups tasked with exploring a new dataset to produce three visualizations that describe particular insights that are not immediately obvious from the data. Upon completion, students will have produced a series of univariate and multivariate visualizations on a real dataset and practiced describing them.