arXivDaily: arXiv daily academic digest, updated Monday through Friday
1. Statistical Theory and Methods (8 papers)

2606.17424 2026-06-17 stat.ME New submission

The dangers of using three-number summaries to estimate unknown standard deviations: sensitivity analyses and some possible improvements incorporating shape

Udara Kumaranathunga, Alysha De Livera, Luke A. Prendergast

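AI summary This paper shows that three-number summaries (minimum, median, maximum) are insufficient to reliably estimate standard deviations, proposes a new estimator based on the scaled Beta distribution, and develops sensitivity-analysis tools to make inference more reliable.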

Abstract

In recent years, there has been much progress toward the development of methods for converting three- and five-number summary statistics (i.e. minimum, maximum, median, and quartiles) to means and standard deviations (SDs). This is commonly done in the meta-analysis setting, where some studies report means and SDs, while others report quantile summaries. However, we show that three-number summaries, which are the most common, do not contain enough information to reliably estimate SDs. We show that very poor estimates can result, which may invalidate any inference, and we provide details of a sensitivity analysis that can allow researchers to have greater confidence in their results, or highlight potential sources of bias. We further explore whether nominating additional information can provide enough information regarding the unknown data shape to improve SD estimations, and in doing so introduce a new estimator using the scaled Beta distribution. Simulations and a real data example are used to highlight the advantages and disadvantages of this approach. A Web application is also provided to help researchers perform sensitivity analyses.
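The claim that a three-number summary does not pin down the SD can be illustrated with a toy case. The sketch below is not the authors' estimator or sensitivity analysis; the two samples and the helper name `three_number_summary` are hypothetical, chosen so that both samples share the same minimum, median, and maximum while their sample SDs differ.

```python
import statistics

def three_number_summary(data):
    """Return (minimum, median, maximum) of a sample."""
    return min(data), statistics.median(data), max(data)

# Two hypothetical samples with identical three-number summaries.
concentrated = [0, 5, 5, 5, 10]  # most mass sits at the median
spread_out = [0, 0, 5, 10, 10]   # most mass sits at the extremes

for name, sample in [("concentrated", concentrated), ("spread_out", spread_out)]:
    summary = three_number_summary(sample)
    sd = statistics.stdev(sample)  # sample SD with n - 1 denominator
    print(f"{name}: summary={summary}, sd={sd:.3f}")
```

Any estimator that sees only the summary must return the same SD for both samples, so it is necessarily wrong for at least one of them; extra shape information is what separates the two cases.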

2606.17293 2026-06-17 stat.ME math.ST New submission

Dependent Censoring Based on Geometric Optimization

Anis Fradi, Salima Helali, Bilel Bousselmi

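AI summary To address dependent censoring in survival analysis, the paper proposes a framework based on Extended Generalized Marshall-Olkin models, uses geometric optimization to estimate the dependence between failure and censoring times, and establishes asymptotic properties.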

Abstract

In survival analysis, dependent censoring poses significant challenges in accurately estimating model parameters and survival functions. This study introduces a novel framework leveraging Extended Generalized Marshall-Olkin (EGMO) models to address dependent censoring mechanisms. Geometric optimization techniques are employed to develop efficient estimation procedures that capture dependencies between failure and censoring times. We establish their asymptotic properties. Simulation studies and real data applications illustrate the method's robustness and effectiveness.

2606.18218 2026-06-17 math.PR cs.LG eess.SY math.OC stat.ML New submission

Finite-Time Queue Peak Laws in Stochastic Networks: Logarithmic Scaling After Geometric Thresholds

Hao Liang, Cheng Tang, Yunzong Xu

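AI summary Studies finite-time queue peaks in generalized switches and proves that, under uniform interior slack, the peak envelope of drift-minimizing scheduling policies shifts from a square-root law to a logarithmic law, with matching lower bounds and a geometric threshold.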

Abstract

We study finite-horizon queue peaks in generalized switches, a standard stochastic-network model in which many queues share constrained service resources. Arrivals may be dependent, time-varying, and adapted to the past; the standing load condition is uniform interior slack, meaning the conditional mean arrival vector stays in a fixed contraction of the capacity region. We show that this slack reshapes the finite-time peak law for drift-minimizing scheduling policies such as MaxWeight. The square-root envelope that is sharp without slack persists only up to a geometry-dependent threshold; beyond that threshold, the running maximum grows only logarithmically with the horizon, both with high probability and in expectation. The mechanism is self-normalization: in the current queue direction, the projected fluctuation scale is normalized by the stabilizing drift scale. This removes capacity geometry from the logarithmic coefficient, while geometry remains in the threshold. Matching lower bounds show that both the logarithmic term and a geometric threshold are unavoidable. When finite-time state-space collapse is available, the threshold can be sharpened using local bottleneck geometry. For generalized input-queued switches, we obtain finite-time peak bounds with tight logarithmic coefficients. Simulations illustrate the two-phase envelope, local geometric refinements, and variance-sensitive improvements predicted by the theory.

2606.18199 2026-06-17 math.ST q-fin.RM stat.ME stat.ML New submission

Conformal Prediction Intervals with Tail-Specific Guarantees

Simone Cuonzo, Nina Deliu

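AI summary Extends the classical conformal framework by constructing one-sided conformal intervals and intersecting them into a two-sided interval, giving explicitly calibrated coverage guarantees for the upper and lower tails separately; the theory establishes tail-specific and global marginal coverage, and directional calibration improves on skewed data.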

Abstract

This paper extends classical conformal frameworks for constructing prediction intervals with global marginal coverage $1-\alpha$ to intervals that provide explicitly calibrated guarantees for the upper and lower tails separately. Focusing on split conformal prediction, we first construct lower and upper one-sided conformal intervals that achieve marginal validity, and then derive the induced two-sided interval by intersection. Theoretical results prove both tail-specific and global marginal coverage of the induced two-sided interval. Results are presented first for the exchangeable setting, where coverage has finite-sample guarantees, and then for non-exchangeable data, where guarantees are asymptotic. Simulation studies show that the proposed approach achieves improved directional calibration relative to classical two-sided intervals, especially relevant in skewed data. Finally, the benefit of the proposed framework is showcased in a financial application, where one aims for return maximization while seeking strict control on the left tail.
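The construction described in the abstract (two one-sided split-conformal bounds, then their intersection) can be sketched as follows. This is a minimal illustration under the exchangeable setting and not the authors' implementation: the function names, the signed-residual scores, and the separate tail levels `alpha_lo` and `alpha_hi` are assumptions made for the example.

```python
import math

def conformal_quantile(scores, alpha):
    """Finite-sample split-conformal quantile: the k-th smallest score with
    k = ceil((n + 1) * (1 - alpha)); infinite when k exceeds n."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]

def tail_specific_interval(cal_pred, cal_y, new_pred, alpha_lo, alpha_hi):
    """Intersect a lower and an upper one-sided conformal bound.

    alpha_lo bounds P(Y < lower) and alpha_hi bounds P(Y > upper), so by the
    union bound the intersection covers with probability at least
    1 - alpha_lo - alpha_hi when calibration and test points are exchangeable.
    """
    below = [p - y for p, y in zip(cal_pred, cal_y)]  # how far y falls below the prediction
    above = [y - p for p, y in zip(cal_pred, cal_y)]  # how far y rises above the prediction
    lower = new_pred - conformal_quantile(below, alpha_lo)
    upper = new_pred + conformal_quantile(above, alpha_hi)
    return lower, upper

# Hypothetical right-skewed calibration residuals around a constant prediction of 0.
cal_pred = [0] * 7
cal_y = [-1, 0, 0, 1, 2, 4, 9]
print(tail_specific_interval(cal_pred, cal_y, 10, 0.25, 0.25))
```

Because each tail gets its own score and its own level, the interval is allowed to be asymmetric around the prediction, which is the point of tail-specific calibration on skewed data.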

2605.14610 2026-06-17 stat.ME eess.SP math.ST Updated version

Parametrically Adaptive Transition Polynomial: a Signed-Parity Continuous-alpha Extension of Kunchenko Stochastic Polynomials

Serhii Zabolotnii

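AI summary Proposes the Parametrically Adaptive Transition Polynomial (PATP), a signed-parity continuous-alpha extension of Kunchenko stochastic polynomials controlled by a continuous parameter alpha in [0,1], addresses parameter estimation under non-Gaussian errors, and examines the boundary of applicability for extremely heavy-tailed distributions.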

Comments 38 pages, 8 figures. Code and Lean 4 proofs: this https URL (https://github.com/SZabolotnii/Ku-PATP-code-supplement). v3: the full F_2^{-1}b estimator is now used throughout (its Monte Carlo g_2(alpha) converges to the closed form); added regression and real-data (EuStockMarkets) validations; scope restricted to symmetric error laws; corrected the Laplace g_2 illustration

Abstract

Kunchenko's method of polynomial maximization provides a semiparametric apparatus for parameter estimation under non-Gaussian errors, but its classical power basis relies on finite higher-order integer moments. This paper introduces the Parametrically Adaptive Transition Polynomial (PATP), a signed-parity fractional-power family controlled by a continuous parameter alpha in [0,1]. The quadratic exponent map p_i(alpha) connects the fractal regime p_i(0)=1/i, the degenerate linear point p_i(1/2)=1, and the signed-parity integer-power regime p_i(1)=i. For the degree-S=2 case we derive a closed-form variance-reduction coefficient g_2(alpha) in terms of signed and absolute fractional moments, identify the singular behavior at alpha=1/2, and state the moment and regularity conditions under which the formula is meaningful. The construction should be read as a Form-B PATP analogue within Kunchenko's generalized apparatus, not as an exact recovery of the canonical even-power PMM basis at alpha=1. Numerical illustrations on canonical distributions are used to examine the finite-sample behavior of the signed-parity estimator and to mark the boundary of applicability for extremely heavy-tailed cases such as Cauchy.

2501.10729 2026-06-17 stat.ME cs.LG stat.ML Updated version

Robust Local Polynomial Regression with Similarity Kernels

Yaniv Shulman

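AI summary To address the sensitivity of traditional local polynomial regression to outliers, the paper proposes a conditional density kernel weighting scheme that incorporates response-variable information and uses localized density estimation to reduce the influence of outliers, achieving lower empirical bias while remaining competitive with standard LOWESS.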

Abstract

Local Polynomial Regression (LPR) is a widely used nonparametric method for modeling complex relationships due to its flexibility and simplicity. It estimates a regression function by fitting low-degree polynomials to localized subsets of the data, weighted by proximity. However, traditional LPR is sensitive to outliers and high-leverage points, which can significantly affect estimation accuracy. This paper revisits the kernel function used to compute regression weights and proposes a novel framework that incorporates both predictor and response variables in the weighting mechanism. The focus of this work is a conditional density kernel that robustly estimates weights by mitigating the influence of outliers through localized density estimation. The proposed method is implemented in Python and is publicly available at this https URL. The population analysis quantifies the bias induced by density-based robust weighting, and the reported experiments show lower empirical bias than iterative robust LOWESS while remaining competitive with standard LOWESS. This advancement provides a promising extension to traditional LPR, opening new possibilities for robust regression applications.
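For readers unfamiliar with the baseline, the sketch below shows plain local linear regression with proximity-only weights and how a single outlier distorts it. It is not the paper's conditional density kernel: the tricube kernel, the function names, and the toy data are assumptions chosen for illustration.

```python
def tricube(u):
    """Tricube kernel on (-1, 1), the proximity weight used by classical LOWESS."""
    a = abs(u)
    return (1 - a ** 3) ** 3 if a < 1 else 0.0

def local_linear_fit(x, y, x0, bandwidth):
    """Fit a weighted least-squares line around x0 and evaluate it at x0."""
    w = [tricube((xi - x0) / bandwidth) for xi in x]
    sw = sum(w)
    if sw == 0:
        raise ValueError("no observations within the bandwidth")
    # Weighted means of predictor and response
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    return ybar + slope * (x0 - xbar)

# Hypothetical data on the line y = 2x + 1, then the same data with one outlier.
x = list(range(11))
y_clean = [2 * xi + 1 for xi in x]
y_outlier = list(y_clean)
y_outlier[5] = 100  # the true value at x = 5 is 11

print(local_linear_fit(x, y_clean, 5.0, 3.0))
print(local_linear_fit(x, y_outlier, 5.0, 3.0))
```

The weights here depend only on the predictor, so the outlier at x = 5 receives the largest weight in its own neighborhood. A kernel that also looks at the response, as the abstract proposes, is one way to downweight such points.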

2409.16534 2026-06-17 stat.AP Updated version

Dependencies in Item-Adaptive CAT Data and Differential Item Functioning Detection: A Multilevel Framework

Dandan Chen Kaptur, Justin Kern, Chingwei David Shin, Jinming Zhang

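AI summary Proposes a two-level logistic model that improves DIF detection by accounting for CAT-induced structural dependency; simulations indicate better control of spurious DIF and competitive statistical power.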

Comments 38 pages, preprint

Abstract

Differential item functioning (DIF) detection is an important yet understudied problem in computerized adaptive testing (CAT). In this article, we proposed a two-level logistic model to improve DIF detection in CAT by explicitly accounting for nuisance effects arising from CAT-induced structural dependency. First, we conceptualized that adaptive item selection induces systematic dependencies among examinees and items through provisional ability estimates, whereas traditional single-level DIF methods assume independent observations and may yield misleading results in CAT settings. Then, using a numeric example and Monte Carlo simulations, we compared our proposed two-level model with competing single-level models under various CAT conditions, manipulating test length, exposure control, ability estimator, DIF type, and DIF prevalence. Item-level Type-I error and statistical power conditional on joint model convergence were reported for each model. We showed that the proposed two-level model has improved control of spurious DIF and competitive power relative to single-level models, particularly with shorter tests and smaller exposure rates. However, we observed that the model convergence varied systematically across simulated conditions, highlighting that inferential accuracy and convergence reliability are intertwined in complex CAT DIF settings. Through this study, we underscored both the promise of multilevel DIF modeling in CAT and the need for future research to jointly evaluate convergence and inferential performance when assessing DIF models.

2403.12711 2026-06-17 stat.ME math.ST stat.AP Updated version

Tests for categorical data beyond Pearson: A distance covariance and energy distance approach

Fernando Castro-Prado, Wenceslao González-Manteiga, Javier Costas, Fernando Facal, Dominic Edelmann

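AI summary For testing independence of categorical variables, the paper proposes methods based on distance covariance and energy distance that avoid the drawbacks of Pearson's chi-squared test and the G-test and allow the null distribution to be calibrated without resampling.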

Comments 21 pages (including appendices) with 5 figures

Abstract

Categorical variables are of utmost importance in biomedical research. When two of them are considered, it is often the case that one wants to test whether or not they are statistically dependent. We show weaknesses of classical methods -- such as Pearson's and the G-test -- and we propose testing strategies based on distances that lack those drawbacks. We first develop this theory for classical two-dimensional contingency tables, within the context of distance covariance, an association measure that characterizes general statistical independence of two variables. We then apply the same fundamental ideas to one-dimensional tables, namely to testing for goodness of fit to a discrete distribution, for which we resort to an analogous statistic called energy distance. We prove that our methodology has desirable theoretical properties, and we show that we can calibrate the null distribution of our test statistics without resampling. We illustrate all this in simulations, as well as with some real data examples, demonstrating the adequate performance of our approach for biostatistical practice.
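To make the contrast with Pearson's statistic concrete, the sketch below computes both quantities on a two-way table. It assumes the 0/1 discrete metric on each variable; under that assumption the plug-in squared distance covariance reduces to the sum over cells of (p_ij - p_i. p_.j)^2, which follows from the general definition because the constant and single-variable terms integrate to zero. The paper's exact statistic, normalization, and null calibration may differ, and the function name is hypothetical.

```python
def table_statistics(counts):
    """Return (Pearson chi-squared, squared distance covariance) for a two-way
    table of counts, using the 0/1 discrete metric on both margins."""
    n = sum(sum(row) for row in counts)
    row_p = [sum(row) / n for row in counts]
    col_p = [sum(col) / n for col in zip(*counts)]
    chi2 = 0.0
    dcov2 = 0.0
    for i, row in enumerate(counts):
        for j, c in enumerate(row):
            expected = row_p[i] * col_p[j]  # cell probability under independence
            diff = c / n - expected
            dcov2 += diff ** 2  # unweighted squared deviation
            if expected > 0:
                chi2 += n * diff ** 2 / expected  # Pearson divides by the expected probability
    return chi2, dcov2

print(table_statistics([[10, 20], [20, 40]]))  # rows proportional: independence
print(table_statistics([[30, 0], [0, 30]]))    # perfect association
```

Both statistics are built from the same cell deviations. Pearson's version divides each by its expected probability, which inflates the contribution of cells with very small expected counts, while the distance-based version leaves the deviations unweighted.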

2. Bayesian Statistics and Probabilistic Modeling (8 papers)

2606.17491 2026-06-17 stat.ML cs.LG stat.ME New submission

A Bayesian Boolean Matrix Factorization with Application to Copy Number Analysis in Cancer

Adolphus Wagala, Mehmet Samur, Giovanni Parmigiani

Affiliations Department of Data Science, Dana-Farber Cancer Institute; Department of Biostatistics, Harvard T.H. Chan School of Public Health

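AI summary Proposes the Bayesian Boolean Matrix Factorization (BBMF) model, which achieves interpretable factorization under Boolean constraints through a fully conjugate generative model with sparsity-inducing priors, and applies it to arm-level copy-number alteration data in multiple myeloma, revealing discrete latent structure in tumor heterogeneity.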

Abstract

Binary data factorization is common, but real-valued methods ignore discreteness and yield hard-to-interpret factors. Boolean Matrix Factorization (BooMF) instead decomposes a binary matrix into two lower-rank binary matrices via logical AND and OR, expressing the data as a Boolean disjunction of interpretable patterns. In cancer genomics, BooMF can reveal coordinated feature changes that may drive tumor evolution, unlike rotational or additive decompositions. Most existing BooMF methods are heuristic, greedy, sensitive to initialization, prone to local optima, and do not support principled model selection or uncertainty quantification. We introduce Bayesian Boolean Matrix Factorization (BBMF), a fully conjugate generative model with sparsity-inducing priors. It enforces Boolean constraints, yields interpretable latent factors with coherent uncertainty quantification, and admits Gibbs sampling with closed-form full conditionals. Because cancer evolution often involves widespread, near-simultaneous chromosome-number changes (e.g., whole-genome duplication followed by instability and selection), Boolean factorizations capture these patterns more naturally than additive models. Applied to arm-level copy-number alteration data in multiple myeloma, where entries indicate presence/absence of chromosomal-arm amplifications, BBMF finds a small set of interpretable bicliques linking patient subsets to recurrently co-altered chromosomal arms, providing a compact, biologically meaningful summary of tumor heterogeneity and demonstrating BBMF's utility for uncovering discrete latent structure in complex binary data.
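The Boolean product that BooMF inverts can be written in a few lines. The sketch below only shows the forward operation (logical AND within a pattern, logical OR across patterns); it does not implement BBMF, its priors, or its Gibbs sampler, and the matrices are hypothetical toy data.

```python
def boolean_product(U, V):
    """Boolean matrix product: X[i][j] = OR over k of (U[i][k] AND V[k][j])."""
    rank = len(V)
    cols = len(V[0])
    return [
        [int(any(row[k] and V[k][j] for k in range(rank))) for j in range(cols)]
        for row in U
    ]

# Hypothetical toy example: 3 patients, 2 latent patterns, 4 chromosome arms.
U = [[1, 0], [1, 1], [0, 1]]      # which patterns each patient carries
V = [[1, 1, 0, 0], [0, 1, 1, 0]]  # which arms each pattern alters
X = boolean_product(U, V)
for row in X:
    print(row)
```

The second patient carries both patterns, and the arm shared by the two patterns still yields 1 rather than 2. That saturation is what distinguishes a Boolean disjunction of patterns from an additive decomposition.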

2606.17413 2026-06-17 cs.LG stat.AP New submission

Amortized Probabilistic Retrieval of Atmospheric CO2 from OCO-2 Spectra Using Deep Learning with Laplace Approximations and Normalizing Flows

Alejandro Calle-Saldarriaga, Felix Jimenez, Jack Grosskreuz, Jiazheng Wang, Jonathan Hobbs, Matthias Katzfuss

Affiliations University of Wisconsin–Madison; Jet Propulsion Laboratory, California Institute of Technology

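AI summary Proposes a deep learning framework that uses Laplace approximations and normalizing flows to retrieve atmospheric CO2 concentrations from OCO-2 spectra quickly and accurately with quantified uncertainty; it is reported to be orders of magnitude faster and more accurate than conventional methods.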

Comments 23 pages, 8 figures

Abstract

Space-based monitoring of atmospheric carbon dioxide (CO2) is essential for constraining the global carbon budget. NASA's Orbiting Carbon Observatory-2 (OCO-2) estimates column-averaged dry-air mole fractions of CO2 (XCO2) using high-resolution spectra. However, current operational retrieval algorithms are computationally expensive and do not properly quantify uncertainties. We present a novel deep learning framework that addresses these challenges. Due to the difficulties of ground-truth data for real satellite observations, we develop and validate our approach using a high-fidelity simulation dataset. This dataset, created to support OCO-2 uncertainty quantification (UQ), incorporates realistic forward model errors. Our architecture encodes spectral bands using a multi-branch neural network and estimates posteriors of the full CO2 column or desired summaries thereof using two scalable UQ methods: Laplace approximations and normalizing flows. Our approach has five key advantages relative to operational "full-physics" solvers: (1) Amortization: Inference is orders of magnitude faster, enabling real-time processing of massive data streams; (2) Model error robustness: By training on simulations that explicitly include model discrepancies, our method accounts for systematic errors often neglected by standard inversions; (3) Point estimate accuracy: We achieve superior predictive accuracy compared to baseline methods; (4) Improved UQ: The probabilistic outputs yield better-calibrated uncertainty estimates; and (5) Non-Gaussian posteriors: When utilizing normalizing flows, our framework successfully models complex, asymmetric posterior distributions, overcoming the limitations of the Gaussian assumption. These results suggest that simulation-based deep learning is a viable path toward next-generation operational processing systems.

2606.17343 2026-06-17 cs.CV stat.AP New submission

Bayesian Magnetic Resonance Joint Image Reconstruction and Uncertainty Quantification using Sparsity Prior Models and Markov Chain Monte Carlo Sampling

Ahmed Karam Eldaly, Matteo Figini, Daniel C. Alexander

Affiliations Department of Computer Science, University of Exeter; UCL Hawkes Institute, Department of Computer Science, University College London

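AI summary Proposes an uncertainty-quantification framework based on compressed-sensing magnetic resonance image reconstruction, formulated as a Bayesian linear inverse problem with sparsity priors (total variation or wavelet transform) and sampled by MCMC with a split-and-augmented Gibbs sampler; on single-coil and multi-coil datasets it is reported to outperform optimisation-based and deep learning-based methods in image reconstruction and uncertainty quantification.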

Abstract

We propose a novel framework for uncertainty quantification using compressed sensing magnetic resonance image reconstruction. The problem is formulated within a Bayesian framework as a linear inverse problem, with prior distributions assigned to the unknown model parameters. Specifically, the image to be reconstructed is assumed to be sparse in a given basis. We develop a general framework applicable to any basis and as examples, we test the sparsity of the image in its (1) spatial gradients using a total variation prior model, and in its (2) wavelet transform. A Markov chain Monte Carlo (MCMC) method, based on a split-and-augmented Gibbs sampler, is then employed to sample from the posterior distribution of the unknown parameters. The non-differentiable conditional distributions are efficiently sampled using a proximal MCMC method. The proposed algorithms are validated on both single-coil and multi-coil datasets using various k-space sub-sampling patterns and ratios. The results demonstrate the superior performance of each proposed approach in reconstructing images compared to its counterpart optimisation-based method. Moreover, our framework effectively quantifies uncertainty, showing a notable correlation between estimated uncertainty maps and error maps computed using ground truth and reconstructed images, compared with existing deep learning-based methods.

2606.17267 2026-06-17 stat.ME econ.EM math.NA stat.AP stat.ML New submission

Bayesian Poisson-Randomized Gamma Tensor Factorization with Application to International Trade Flows

Jie Jian, Aaron Schein

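AI summary Proposes a Bayesian hierarchical tensor factorization model that combines a low-rank CP structure with a conditional Gamma model to handle sparse semi-continuous tensor data, scales posterior inference with a hybrid variational-Monte Carlo algorithm, and applies it to international trade flows.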

Abstract

We study sparse semi-continuous tensor data with excess zeros, heavy right tails, and slice-specific dispersion. Such features arise naturally in monetary-valued multi-way data, such as international trade, where most exporter--importer--product--year cells are zero while positive values are continuous and highly variable. To model these data, we propose a Bayesian hierarchical tensor factorization model that places a low-rank CP structure on a latent Poisson rate tensor and couples it with a conditional Gamma model for positive outcomes, with rate parameters that can vary across slices within a mode. The model therefore separates the occurrence and magnitude of positive observations while borrowing strength across all tensor dimensions through a shared low-rank latent structure. To scale posterior inference to large arrays, we develop a hybrid variational--Monte Carlo algorithm that combines efficient coordinate ascent updates with a partially collapsed augmented-data sampler. Applied to approximately 60 million trade flows, the method surfaces multiway dependence across exporters, importers, products, and years that is difficult to recover from gravity-type or pairwise network analyses, which do not jointly model the product and temporal dimensions.

2509.22474 2026-06-17 stat.ME Updated version

Generative multi-scale modeling and downscaling via spatial autoregressive transport maps

Alejandro Calle-Saldarriaga, Paul F.V. Wiemann, Matthias Katzfuss

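AI summary Proposes a scalable Bayesian method that uses scale-aware autoregressive Gaussian processes to learn the joint non-Gaussian distribution and nonlinear dependence structure of multi-scale nonstationary spatial fields, enabling efficient downscaling.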

Comments 23 pages, 8 figures

Abstract

Spatial fields in the Earth and environmental sciences are often available at multiple scales or resolutions. While coarse-scale data (e.g., from global circulation models) are often abundant, they lack the local detail provided by fine-scale data (e.g., from regional climate models), which are typically computationally expensive to generate. Statistical downscaling and multi-scale data fusion address this challenge by predicting high-resolution fields from low-resolution or related inputs. We propose a highly scalable Bayesian approach that can learn the joint non-Gaussian distribution and nonlinear dependence structure of nonstationary spatial fields across multiple scales from a small number of training samples. Our method employs scale-aware autoregressive Gaussian processes with suitably chosen regularization-inducing priors to model the conditional distribution of fine-scale fields given coarse-scale data. Exploiting conjugacy, the integrated likelihood is available in closed form, enabling efficient parameter optimization via stochastic gradient descent. Once trained, the method provides a closed-form characterization of the posterior distribution of fine-scale fields given coarse-scale inputs. In numerical comparisons, we demonstrate that our approach substantially outperforms existing methods and effectively characterizes and simulates fine-scale climate behavior based on output from coarse global circulation models.

2505.19643 2026-06-17 stat.AP Updated version

Online activity prediction via generalized Indian buffet process models

Mario Beraha, Lorenzo Masoero, Stefano Favaro, Thomas S. Richardson

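AI summary Proposes a Bayesian nonparametric model for predicting new-user counts and total triggers that accommodates the heavy-tailed engagement patterns of web experiments, requires no MCMC or variational inference, and outperforms existing methods on public and proprietary datasets.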

Comments This paper supersedes the two technical reports by the same authors arXiv:2401.14722 (https://arxiv.org/abs/2401.14722) and arXiv:2402.03231 (https://arxiv.org/abs/2402.03231)

Abstract

Online A/B tests are the standard tool for data-driven decision-making at scale. Among the design choices with the largest impact on statistical power is the triggering mechanism: how many users to expose and for how long. This often requires forecasting user engagement, i.e., whether enough users will trigger, and when a target participation level will be reached, from limited pilot data. We introduce a Bayesian nonparametric model for predicting both new-user counts and total triggers, accommodating the heavy-tailed engagement patterns typical of web experiments. All predictive quantities can be computed without intensive numerical procedures such as MCMC or variational inference. We evaluate on three public datasets (over 450 public benchmark evaluations) and 1,774 proprietary A/B tests. In all settings, our models show improved accuracy in forecasting new users, total triggers, and time to reach a target sample size compared with state-of-the-art competitors, especially when only a few pilot days are observed.

2502.10257 2026-06-17 math.ST stat.ME Updated version

Extended feature allocation models

Mario Beraha, Federico Camerlenghi, Lorenzo Ghilotti

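AI summary Proposes a unified Bayesian framework that jointly models feature labels and proportions, overcoming the standard models' neglect of label dependencies, and introduces Cox process and determinantal point process priors, with applications to genomic variants and forest surveys.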

Abstract

Feature allocation models are Bayesian nonparametric tools tailored to data in which each observation can simultaneously exhibit multiple characteristics, or features. A fundamental limitation of standard formulations is that feature labels are assumed to be independent and identically distributed, and therefore play no role in posterior inference. The present paper introduces a unified Bayesian framework for extended feature allocation models, in which feature labels and proportions are modeled jointly, thereby enabling the simultaneous discovery of features and learning of dependencies among their labels. Building on point process theory, we develop a full Bayesian analysis of these models. Within this general setting, we also characterize previously proposed priors as those leading to poor predictive distributions, which cannot capture label dependencies and are insensitive to the observed frequency spectrum. Our methodology is designed to move beyond such standard formulations by leveraging the information carried by feature labels. We demonstrate the usefulness of our approach by introducing: (i) a Cox process prior that clusters genomic variant embeddings while predicting new variants and new variant clusters; (ii) a determinantal point process prior for repeated forest surveys, where prediction concerns both the number and the locations of unobserved trees.

2412.08895 2026-06-17 eess.SP stat.AP stat.CO Updated version

Fully Bayesian Wideband Direction-of-Arrival Estimation and Detection via RJMCMC

Kyurae Kim, Philip T. Clemson, James P. Reilly, Jason F. Ralph, Simon Maskell

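AI summary Proposes a wideband signal model that, via circular convolution and a sparse matrix decomposition in the frequency domain, reduces the time complexity of computing the marginal likelihood from O(N^3 k^3) to O(N k^3), and combines it with non-reversible RJMCMC for fully Bayesian source-number detection and DOA estimation.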

Abstract

Consider an array receiving unknown wideband signals from an unknown number of sources $k$. Wideband signals can occupy arbitrarily wide bandwidths, rendering demodulation-based approaches inapplicable, a common situation in settings involving acoustic signals. Here, we aim to determine $k$ given $N$ noisy array-valued measurements, a task known as the "detection problem," for which Bayesian model comparison is a common approach. To render Bayesian inference tractable, it is typically necessary to marginalize the source signals. Unfortunately, for wideband signals, naive marginalization has an unaffordable time complexity of $\mathcal{O}(N^3 k^3)$. As a result, fully Bayesian signal detection has yet to be demonstrated in wideband settings. In this work, we propose a wideband signal model that allows for computationally tractable marginalization of the source signals. We begin from the canonical model of linear time-invariant (LTI) signal propagation, which is then augmented into a circular convolution, all without loss of generality. This allows for efficient computation in the frequency domain, where the resulting linear system admits a decomposition into a sparse matrix we refer to as a \textit{stripe matrix decomposition}. Exploiting this sparsity pattern reduces the time complexity of computing the marginal likelihood to $\mathcal{O}(N k^3)$. These computational improvements enable efficient posterior inference via reversible-jump Markov chain Monte Carlo (RJMCMC). In this work, we use the non-reversible extension of RJMCMC (NRJMCMC), which often achieves lower autocorrelation and faster convergence than RJMCMC. Detection of the latent source signals can then be performed in a fully Bayesian manner using samples drawn by NRJMCMC. We evaluate our procedure by comparing it against generalized likelihood ratio testing (GLRT) and information criteria.

3. 因果推断与实验设计 12 篇

2606.18197 2026-06-17 stat.AP stat.ME 新提交

A Sensitivity Framework for Identifying Contagion under Latent Homophily for Fixed-in-Time Network Analyses, with an Application to U.S. House Congressional Voting

固定时间网络分析中潜在同质性下识别传染的敏感性框架——以美国众议院投票为例

Duncan A. Clark

AI总结 针对固定时间网络数据中传染效应与同质性难以区分的问题,提出基于选择偏差的敏感性分析框架,通过非参数界将传染识别转化为潜在同质性强度问题,并应用于2008年美国众议院TARP投票分析。

详情
AI中文摘要

连接的单位是否因为影响在联系中传播而相似,还是因为相似的单位形成联系,这是一个长期存在的问题。从观测网络数据中,传染或影响通常无法被识别。我们考虑一个最小且常见的设置:单一网络,时间固定,具有两波二元节点结果。我们不假设网络形成的参数模型,而是将传染的识别重新构建为一个选择偏差问题,并开发了一个敏感性框架。我们定义了一个控制直接效应(CDE),即在保持联系存在的同时干预他人的结果。我们表明,CDE与观察到的连接二元组风险比之间的差距由潜在同质性变量如何改变连接二元组的组成所决定。受Smith式选择偏差敏感性分析和Ding与VanderWeele的风险比边界函数的启发,我们开发了可解释的非参数界。这将问题“是否存在传染?”转化为“潜在同质性需要多强才能解释观察到的传染?”模拟研究表征了这些界的误差控制和功效。我们将该框架应用于2008年美国众议院对问题资产救助计划的投票,识别了在哪些假设下传染是合理的。

英文摘要

Whether connected units are similar because influence spreads across ties or because similar units form ties is a long-standing problem. Contagion or influence is generically unidentified from observational network data. We consider the minimal and common setting of a single network, fixed over time, with two waves of a binary nodal outcome. Rather than positing a parametric model for network formation, we reframe identification of contagion as a selection-bias problem and develop a sensitivity framework. We define a controlled direct effect (CDE) holding a tie present while intervening on an alter's outcome. We show that the gap between the CDE and the observed connected-dyad risk ratio is governed by how strongly a latent homophily variable shifts the composition of connected dyads. Inspired by Smith-style selection-bias sensitivity analysis and the risk-ratio bounding function of Ding and VanderWeele, we develop interpretable nonparametric bounds. This translates the question "is there contagion?" into the question "how strong would latent homophily have to be to explain away the observed contagion?" A simulation study characterizes the bounds' error control and power. We apply the framework to the 2008 U.S. House votes on the Troubled Asset Relief Program, identifying under which assumptions contagion is plausible.
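
For readers unfamiliar with the risk-ratio bounding function mentioned above, the following is a minimal sketch of the classic Ding and VanderWeele bounding factor for unmeasured confounding, together with the associated E-value. It is not the paper's selection-bias bound, whose exact form may differ; the function and argument names are hypothetical.

```python
import math

def bounding_factor(rr_eu, rr_ud):
    """Classic bounding factor: the largest ratio by which a latent variable with
    exposure association rr_eu and outcome association rr_ud (both >= 1)
    can inflate an observed risk ratio."""
    return rr_eu * rr_ud / (rr_eu + rr_ud - 1.0)

def e_value(rr_obs):
    """Smallest common strength (rr_eu = rr_ud) that could fully explain away
    an observed risk ratio; risk ratios below 1 are inverted first."""
    if rr_obs < 1.0:
        rr_obs = 1.0 / rr_obs
    return rr_obs + math.sqrt(rr_obs * (rr_obs - 1.0))

# Hypothetical observed risk ratio of 2: how strong must the latent variable be?
strength = e_value(2.0)
explained = bounding_factor(strength, strength)
```

Setting both sensitivity parameters to the E-value makes the bounding factor equal the observed risk ratio, which is the sense in which the question becomes "how strong would the latent variable have to be?"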

2606.17515 2026-06-17 stat.ME stat.ML 新提交

Anytime-valid Optimal Policy Identification

任意有效的最优策略识别

Daniel Molitor

AI总结 针对日志化情境赌博数据,提出一种任意有效框架,通过构建高概率包含真实最优策略集的时间索引集,支持连续监测和自适应停止,并给出样本复杂度界。

Comments 15 pages, 3 figures

详情
AI中文摘要

我们开发了一个用于从日志化情境赌博数据中识别最优策略的任意有效框架。在许多应用场景中，分析者希望从候选策略类 $\Pi$ 中选择最优策略，但数据由外部确定的日志策略生成，分析者无法控制。分析者也可能希望连续监测证据，一旦最优策略明确就停止，而不是事先承诺固定样本量。本文通过构建一个时间索引集 $S_t$ 来解决这些挑战，该集合以高概率随时间一致地保留真实最优策略集。由此产生的程序允许分析者监测策略值、消除明显次优策略，并在数据依赖的时间停止而不使推断失效。当最优策略唯一时，我们定义了其识别的停止时间，并推导出样本复杂度界为 $O\!\left(\frac{\log |\Pi|+\log\log(1/\Delta_{\min})}{\Delta_{\min}^2}\right)$，其中 $\Delta_{\min}$ 是最优与次优策略值之间的差距。模拟表明，相对于固定样本量设计，任意有效方法可以节省大量样本。应用于一个减少在线错误信息的大型自适应实验，说明了该方法如何在最优策略证据积累时提供动态视图。

英文摘要

We develop an anytime-valid framework for optimal policy identification from logged contextual bandit data. In many applied settings, the analyst wants to select the optimal policy from a candidate policy class $\Pi$, but data are generated by an externally determined logging policy that they do not control. The analyst may also wish to monitor evidence continuously, stopping once the optimal policy is clear rather than committing to a fixed sample size in advance. This paper addresses these challenges by constructing a time-indexed set $S_t$ that retains the true optimal policy set uniformly over time with high probability. The resulting procedure allows the analyst to monitor policy values, eliminate clearly suboptimal policies, and stop at data-dependent times without invalidating inference. When the optimal policy is unique, we define a stopping time for its identification and derive a sample-complexity bound scaling as $O\!\left(\frac{\log |\Pi|+\log\log(1/\Delta_{\min})}{\Delta_{\min}^2}\right)$, where $\Delta_{\min}$ is the gap between the best and second-best policy values. Simulations demonstrate that the anytime-valid approach can yield substantial sample savings relative to fixed-$N$ designs. An application to a large adaptive experiment on reducing misinformation online illustrates how the method provides a dynamic view as evidence on the optimal policy accumulates.
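
To get a feel for the stated sample-complexity scaling, the sketch below evaluates the expression inside the $O(\cdot)$ with all constants dropped, so the numbers are only meaningful relative to one another. The function name and the example values of $|\Pi|$ and $\Delta_{\min}$ are hypothetical.

```python
import math

def sample_complexity_scale(num_policies, gap):
    """Evaluate (log|Pi| + log log(1/gap)) / gap^2, ignoring constant factors.

    Requires 0 < gap < 1 so that log log(1/gap) is defined.
    """
    if num_policies < 1 or not (0.0 < gap < 1.0):
        raise ValueError("need num_policies >= 1 and 0 < gap < 1")
    return (math.log(num_policies) + math.log(math.log(1.0 / gap))) / gap ** 2

# Hypothetical policy class of 100 candidates, with the gap halved from 0.10 to 0.05.
wide = sample_complexity_scale(100, 0.10)
narrow = sample_complexity_scale(100, 0.05)
ratio = narrow / wide
```

Halving the gap multiplies the scale by slightly more than four: the $1/\Delta_{\min}^2$ term contributes the factor of four, and the iterated logarithm adds a small extra cost.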

2606.17308 2026-06-17 stat.ME stat.ML 新提交

Kernel-Based Functional Balancing for Causal Inference with Compositional Treatments

基于核的协变量函数平衡法用于成分处理下的因果推断

Sungbum Kim, Jiayi Wang

AI总结 针对成分处理(暴露位于单纯形)的因果效应估计,提出基于核的协变量函数平衡加权法,通过最小化再生核希尔伯特空间中的最坏情况平衡误差构造权重,并构建增强加权估计量,实现√n一致性。

Comments 40 pages, 3 figures

详情
AI中文摘要

我们研究成分处理下的因果效应估计,其中暴露位于单纯形上,估计量定义在成分上而非标量或二元值。通过考虑平均潜在结果在处理空间上的投影,采用基于核的协变量函数平衡方法进行权重构造。权重通过直接最小化在由处理和协变量联合空间定义的再生核希尔伯特空间(RKHS)上的最坏情况平衡误差获得,而非在处理分配模型下估计。基于这些权重,提出了一个增强加权估计量(AWE),其中结果函数通过核岭回归估计,并与协变量分布的边际增广相结合。尽管所得目标函数结构复杂,但通过表示定理和低秩近似,我们将其转化为有限维凸优化问题。所提出的估计量在不要求权重一致估计或光滑性的情况下实现了√n一致性。建立了围绕样本特定目标的渐近正态性结果。通过模拟研究和真实数据应用展示了经验性能。

英文摘要

We study causal effect estimation with compositional treatments, where the exposure lies on a simplex and the estimand is defined over compositions rather than scalar or binary values. By considering a projection of the average potential outcome onto the treatment space, a kernel-based covariate functional balancing approach is adopted for weight construction. The weights are obtained by directly minimizing a worst-case balancing error over a reproducing kernel Hilbert space (RKHS) defined on the joint space of treatments and covariates, instead of being estimated under a treatment assignment model. Building on these weights, an augmented weighted estimator (AWE) is proposed, where the outcome function is estimated via kernel ridge regression and combined with a marginal augmentation over the covariate distribution. Despite the complex structure of the resulting objective, a finite-dimensional convex optimization problem is formulated via a representer theorem and a low-rank approximation. The proposed estimator achieves $\sqrt{n}$-consistency without requiring consistent estimation or smoothness of the weights. An asymptotic normality result is established around a sample-specific target. Empirical performance is demonstrated through simulation studies and a real data application.

2606.17232 2026-06-17 stat.ME 新提交

Semiparametric Mediation Analysis with Separately Observed Mediator and Outcome under Unmeasured Confounding

存在未测量混杂时基于分别观测的中介变量和结局变量的半参数中介分析

Sijia Li, Ruoyu Wang

AI总结 针对中介变量和结局变量从未同时观测的数据不完整性,提出一种数据融合框架,利用共享工具变量在无交互条件下识别自然直接和间接效应,并开发具有多重稳健性的半参数影响函数估计器。

Comments 24 pages; 2 figures

详情
AI中文摘要

中介分析被广泛用于解构因果路径,然而在许多实际研究中,中介变量 M 和结局变量 Y 从未被同时观测。这种不完整性破坏了自然直接和间接效应的标准识别策略。我们引入了一种新颖的数据融合框架,通过结合两个不完整的数据源(一个测量 M,另一个测量 Y)来恢复识别。我们的方法利用共享工具变量(IVs)来规避联合观测 (M,Y) 的需求,在无交互条件下对未测量混杂仍然有效,并通过潜在对齐条件适应跨数据源的协变量和暴露偏移。我们建立了两种识别策略:一种适用于已知有效 IV 集合的场景,另一种适用于需要学习有效 IV 的场景。我们进一步开发了具有多重稳健性的半参数影响函数估计器,并提出了一个在适当条件下达到半参数效率界的估计器。我们将我们的框架应用于量化 SNP rs610932 对痴呆风险的影响在多大程度上通过免疫相关基因表达途径中介。

英文摘要

Mediation analysis is widely used to disentangle causal pathways, yet in many real-world studies the mediator $M$ and outcome $Y$ are never jointly observed. This incompleteness breaks the standard identification strategy for natural direct and indirect effects. We introduce a novel data fusion framework that restores identification by combining two incomplete data sources, one measuring $M$ and the other measuring $Y$. Our approach leverages shared instrumental variables (IVs) to circumvent the need to observe $(M, Y)$ jointly, remains valid under unmeasured confounding via a no-interaction condition, and accommodates covariate and exposure shifts across data sources under a latent alignment condition. We establish two identification strategies, one for settings with a known set of valid IVs, and another for settings where valid IVs must be learned. We further develop semiparametric, influence-function-based estimators with multiple robustness properties, and propose an estimator that attains the semiparametric efficiency bound under appropriate conditions. We apply our framework to quantify the extent to which the effect of SNP rs610932 on dementia risk is mediated through immune-related gene-expression pathways.

2606.17516 2026-06-17 cs.LG cs.AI stat.ME stat.ML 新提交

FoundCause: Causal Discovery with Latent Confounders from Observational Data

FoundCause: 从观测数据中发现含隐混淆因子的因果关系

Patrick Blöbaum, Krishnakumar Balasubramanian, Shiva Prasad Kasiviswanathan

发表机构 * Amazon Web Services(亚马逊云服务) Department of Statistics, University of California, Davis(加州大学戴维斯分校统计系)

AI总结 提出FoundCause,一种基于合成数据训练的摊销因果发现模型,通过单次前向传递直接映射数据集到因果图,显式建模隐混淆因子,在15个真实数据集上优于11种非摊销和4种摊销方法。

Comments Download the model at https://github.com/amazon-science/foundcause

详情
AI中文摘要

从观测数据中发现因果关系仍然具有挑战性,因为需要在没有干预的情况下恢复有向结构和隐混淆因子。我们提出了FoundCause,一种完全在合成数据上训练的摊销因果发现模型,它通过单次前向传递直接将数据集映射到因果图。通过从大量模拟结构因果模型中学习,FoundCause捕获了可迁移的统计模式,这些模式泛化到单个数据集之外。该架构融合了因果发现的几个关键归纳偏置。它使用一个置换不变的Transformer编码器,通过交替关注样本和变量来联合建模跨变量依赖性和每个变量的分布。通过统计条件注意力注入来自经典非对称度量的成对统计特征,引导模型朝向已知的因果信号。一个分解的解码器将边的存在性与方向分离,而一个三角细化模块使得能够推理高阶因果模式,如链和碰撞器。此外,一个基于可学习隐令牌的专用混淆因子模块显式建模隐藏的共同原因,并且模型通过其掩码输入表示显式处理缺失数据。据我们所知,FoundCause是第一个显式建模隐混淆因子的摊销因果发现方法。FoundCause在15个真实数据集上优于11种经典非摊销方法(如PC、GES、NOTEARS风格优化)和4种摊销因果发现方法,相对于最强的非摊销方法,在$F_1$上提高了9.6%,在AUROC上提高了1.2%,结构汉明距离减少了18.9%,同时仅需单次前向传递即可完成推理。

英文摘要

Causal discovery from observational data remains challenging due to the need to recover directed structure and latent confounding without interventions. We propose FoundCause, an amortized causal discovery model trained entirely on synthetic data that maps datasets directly to causal graphs in a single forward pass. By learning from large collections of simulated structural causal models, FoundCause captures transferable statistical patterns that generalize beyond individual datasets. The architecture incorporates several key inductive biases for causal discovery. It uses a permutation-invariant transformer encoder with alternating attention over samples and variables to jointly model cross-variable dependence and per-variable distributions. Pairwise statistical features derived from classical asymmetry measures are injected through statistics-conditioned attention, guiding the model toward known causal signals. A factorized decoder separates edge existence from direction, while a triangular refinement module enables reasoning over higher-order causal motifs such as chains and colliders. In addition, a dedicated confounder module based on learnable latent tokens explicitly models hidden common causes, and the model explicitly handles missing data via its masked input representation. To our knowledge, FoundCause is the first amortized causal discovery approach to explicitly model latent confounding. FoundCause outperforms 11 classical non-amortized methods (e.g., PC, GES, NOTEARS-style optimization) and 4 amortized causal discovery methods on 15 real-world datasets, achieving +9.6% improvement in $F_1$, +1.2% in AUROC, and an 18.9% reduction in structural Hamming distance relative to the strongest non-amortized methods, while performing inference in a single forward pass.

2606.17790 2026-06-17 stat.AP cs.IT 新提交

Distributed Experimental Design: Bayes-optimal Fusion of Local Designs

分布式实验设计:局部设计的贝叶斯最优融合

Nagananda K G, Lav R. Varshney, Pramod K. Varshney

AI总结 提出分布式贝叶斯实验设计的决策理论框架,推导贝叶斯最优融合规则,实现局部设计决策的全局最优融合,并通过数值实验验证其接近集中式性能。

Comments 12 pages, 4 figures

详情
AI中文摘要

我们为分布式贝叶斯实验设计开发了一个决策理论框架,其中局部代理使用期望信息增益评估候选实验,并将其局部设计决策传输到融合中心。与集中式贝叶斯设计不同(其中所有似然分量和信息增益值都可供单个规划者使用),分布式设置中的融合中心从压缩的局部建议中选择全局实验。我们推导了贝叶斯最优融合规则,该规则选择在给定观察到的局部设计决策条件下条件期望集中信息增益最大的实验。该规则在精神上类似于分布式检测中的最优融合规则,但存在根本差异,因为底层效用是期望信息增益,而导致的损失是信息增益遗憾而非分类错误。我们还建立了信息损失界限,并确定了仅决策融合规则渐近等价于集中式设计的条件。数值实验表明,贝叶斯最优融合紧密逼近集中式理想情况,而当少数站点携带不成比例的信息时,多数投票可能高度次优。

英文摘要

We develop a decision-theoretic framework for distributed Bayesian experimental design in which local agents evaluate candidate experiments using expected information gain and transmit their local design decisions to a fusion center. Unlike centralized Bayesian design, where all likelihood components and information-gain values are available to a single planner, the fusion center in the distributed setting chooses a global experiment from compressed local recommendations. We derive the Bayes-optimal fusion rule, which selects the experiment with largest conditional expected centralized information gain given the observed local design decisions. This rule is analogous in spirit to optimal fusion rules in distributed detection, but differs fundamentally because the underlying utility is expected information gain and the resulting loss is information-gain regret rather than classification error. We also establish information-loss bounds and identify conditions under which the decision-only fusion rule is asymptotically equivalent to the centralized design. Numerical experiments show that Bayes-optimal fusion closely approximates the centralized oracle, whereas majority voting can be highly suboptimal when a minority of sites carry disproportionate information.
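
The utility that each local agent evaluates, expected information gain (EIG), can be made concrete for a discrete parameter and a discrete outcome. The sketch below computes the EIG of a single candidate experiment as the mutual information between parameter and outcome; it does not implement the paper's fusion rule, and the priors and likelihood tables are hypothetical.

```python
import math

def expected_information_gain(prior, likelihood):
    """EIG of one experiment for discrete theta and discrete y.

    prior[i]         = p(theta_i)
    likelihood[i][j] = p(y_j | theta_i) under the candidate experiment
    Returns sum_i sum_j p(theta_i) p(y_j | theta_i) log(p(y_j | theta_i) / p(y_j)).
    """
    n_outcomes = len(likelihood[0])
    marginal = [
        sum(prior[i] * likelihood[i][j] for i in range(len(prior)))
        for j in range(n_outcomes)
    ]
    eig = 0.0
    for i, p_theta in enumerate(prior):
        for j in range(n_outcomes):
            p = likelihood[i][j]
            if p_theta > 0 and p > 0:  # zero-probability terms contribute nothing
                eig += p_theta * p * math.log(p / marginal[j])
    return eig

prior = [0.5, 0.5]  # hypothetical two-point prior
eig_sharp = expected_information_gain(prior, [[1.0, 0.0], [0.0, 1.0]])  # outcome reveals theta
eig_flat = expected_information_gain(prior, [[0.5, 0.5], [0.5, 0.5]])   # outcome ignores theta
```

An experiment whose outcome reveals the parameter attains the prior entropy ($\log 2$ nats here), while one whose outcome is independent of the parameter has zero gain.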

2606.17777 2026-06-17 stat.ME math.ST stat.ML 新提交

On Response-Adaptive Targeting Strategies for Multi-Treatment Experiments

多处理实验中的响应自适应目标策略

Redouane Yagouti, Rémy Degenne, Emilie Kaufmann

AI总结 提出统一框架αRTS,将两臂ERADE策略推广到多臂实验,证明渐近性质并引入强制探索变体解决稀疏目标问题。

详情
AI中文摘要

临床试验中的响应自适应随机化(RAR)旨在通过根据观察到的结果动态分配患者到治疗组来提高伦理和统计效率。虽然基于目标最优分配的RAR已在两臂设置中得到广泛研究,但其扩展到多处理实验($K \geq 2$)在理论上仍然零散,大多数现有方法集中于特定算法或受限的目标分配。在本文中,我们引入了一个响应自适应目标的统一框架,即$\alpha$再平衡目标策略($\alpha$RTS),它推广了Hu等人[2009]的ERADE两臂策略。我们证明了该族中的所有设计共享基本的渐近性质:强相合性、分配比例和处理效应估计量的渐近正态性以及渐近效率。为了解决稀疏目标情况(其中某些处理被渐近消除),我们进一步提出了带有强制探索的$\alpha$RTS,这是一种保证所有处理无限采样同时保持渐近保证的变体。广泛的模拟说明了$\alpha$RTS变体在三臂背景下的有限样本行为,特别强调了强制探索在稀疏设置中的关键作用。

英文摘要

Response-adaptive randomization (RAR) in clinical trials aims to improve ethical and statistical efficiency by dynamically allocating patients to treatments based on observed outcomes. While RAR designs based on a target optimal allocation have been extensively studied in two-arm settings, their extension to multi-treatment experiments ($K \geq 2$) remains theoretically fragmented, with most existing methods focusing on specific algorithms or restricted target allocations. In this paper, we introduce a unified framework for response-adaptive targeting, the $\alpha$-Rebalancing Targeting Strategies ($\alpha$RTS), which generalize the ERADE two-armed strategy of Hu et al. [2009]. We prove that all designs in this family share fundamental asymptotic properties: strong consistency, asymptotic normality of allocation proportions and treatment effect estimators, and asymptotic efficiency. To address sparse target regimes (where some treatments are asymptotically eliminated), we further propose $\alpha$RTS with Forced Exploration, a variant that guarantees infinite sampling for all treatments while preserving the asymptotic guarantees. Extensive simulations illustrate the finite-sample behavior of $\alpha$RTS variants in a 3-armed context, highlighting in particular the critical role of forced exploration in sparse settings.

2606.17600 2026-06-17 stat.ME math.ST stat.ML 新提交

Proximal Mediation Analysis with Hidden Recanting Witnesses

存在隐藏反悔证人的近端中介分析

Sihan Wu, Yang Bai, Yifan Cui

AI总结 针对中介分析中未知反悔证人(治疗诱导的中介-结局混杂因素)导致的路径效应识别难题,提出三种基于近端因果推断的识别策略,并开发了近端多重稳健估计量,在部分模型正确设定时仍一致,且渐近正态达到半参效率界。

详情
AI中文摘要

中介分析对于将治疗的因果效应分解为直接和间接路径至关重要。然而,许多实际场景依赖于一个严格的假设,即反悔证人(定义为治疗诱导的中介-结局混杂因素)要么不存在,要么事先完全已知。这一要求往往难以成立,尤其是当这些变量由于测量困难或隐私限制而无法观测时。在本文中,我们利用近端因果推断,提出了三种新的识别策略,以应对在存在未知反悔证人的情况下识别路径特定效应的挑战。在此基础上,我们开发了一个半参数推断框架,推导了有效影响函数,并提出了一种近端多重稳健估计量,该估计量在至少一组 nuisance 模型正确设定时保持一致。当所有 nuisance 模型正确设定并以适当速率收敛时,该估计量渐近正态并达到半参数效率界。我们提供了一种基于极小极大优化的去偏机器学习程序,用于点估计和构建有效置信区间。通过模拟研究和真实数据应用,展示了所提方法的性能。

英文摘要

Mediation analysis is essential for decomposing the causal effect of a treatment into direct and indirect pathways. However, many practical settings rely on the stringent assumption that recanting witnesses, defined as treatment-induced mediator-outcome confounders, are either absent or fully known a priori. Such a requirement is often untenable, especially when these variables remain unobservable due to measurement difficulties or privacy constraints. In this paper, we leverage proximal causal inference to develop three novel identification strategies to address the challenge of identifying path-specific effects in the presence of unknown recanting witnesses. Building on this, we develop a semiparametric inference framework that derives the efficient influence function and proposes a proximal multiply robust estimator, which remains consistent if at least one set of nuisance models is correctly specified. When all nuisance models are correctly specified and converge at appropriate rates, the estimator is asymptotically normal and achieves the semiparametric efficiency bound. We provide a minimax optimization-based debiased machine learning procedure for point estimation and constructing valid confidence intervals. The performance of the proposed methods is demonstrated by simulation studies and a real data application.

2606.17165 2026-06-17 stat.ME cs.AI econ.EM math.ST 新提交

Statistical Foundations of LLM-based A/B Testing: A Surrogacy Framework for Human Causal Inference

基于LLM的A/B测试的统计基础:用于人类因果推断的替代指标框架

Joel Persson, Mårten Schultzberg, Sebastian Ankargren

发表机构 * Spotify USA, Inc.(Spotify美国公司)

AI总结 提出替代指标理论框架,证明在弱于分布等价条件下,校准LLM输出可识别平均处理效应,并分析随机性带来的偏差与方差。

详情
AI中文摘要

组织和研究者越来越有兴趣在A/B测试中使用大型语言模型(LLM)代替人类参与者,以期更快、更低成本地进行实验。我们研究当在LLM结果上估计的处理效应何时能够恢复在感兴趣的人类群体上测量的效应。LLM与人类结果之间的分布等价性会使任何标准估计量有效,但这不现实。因此,我们开发了一个统计框架,将替代终点理论适配到LLM。该框架表明,将LLM结果校准到人类结果,在替代性和可比性条件(联合弱于分布等价性)下,可以识别平均处理效应。当这些条件不成立时,感兴趣的效应仅部分可识别,我们提供了诊断方法,可以在历史实验上证伪替代性,并给出有限重叠下最坏情况偏差的界限。我们进一步证明,LLM固有的随机性会引入偏差和方差,但使用多次抽取的平均值作为替代指标可以同时缓解两者。我们在模拟和Upworthy标题的A/B测试应用中展示了方法和理论。我们工作的一个核心结论是,LLM结果作为替代指标的有效性只能对过去的处理被证伪,而无法对新处理被验证,因此对于新颖干预,人类实验仍然不可或缺。我们讨论了LLM选择、提示和温度作为设计变量的作用,以及如何确定人类实验的规模以进行验证。

英文摘要

Organizations and researchers show increasing interest in using large language models (LLMs) in place of human participants in A/B tests, in the hope of experimenting faster and at lower cost. We study when a treatment effect estimated on LLM outcomes recovers the effect that would have been measured on the human population of interest. Distributional equivalence between LLM and human outcomes would make any standard estimator valid but is unrealistic. We therefore develop a statistical framework that adapts surrogate endpoint theory to LLMs. The framework shows that calibrating LLM outcomes to human outcomes identifies the average treatment effect under surrogacy and comparability conditions that are jointly weaker than distributional equivalence. When these conditions fail, the effect of interest is only partially identified, and we provide diagnostics that can falsify surrogacy on historical experiments together with a bound on the worst-case bias from limited overlap. We further show that the stochasticity inherent to LLMs introduces both bias and variance, but using an average of multiple draws as the surrogate mitigates both. We illustrate the methods and theory in simulations and an application to A/B tests on Upworthy headlines. A central takeaway from our work is that the validity of LLM outcomes as surrogates can only be falsified for past treatments and never verified for new ones, so human experiments remain indispensable for novel interventions. We discuss the role of LLM choice, prompting, and temperature as design variables, and how to size human experiments for validation.

2606.12623 2026-06-17 stat.AP cs.LG 新提交

Estimating Individualized Treatment Effects in Acute Ischemic Stroke with Causal Transformation Models (TRAM-DAG): A Multi-Centre Observational Study with External RCT Validation

使用因果变换模型(TRAM-DAG)估计急性缺血性卒中个体化治疗效果:一项多中心观察性研究及外部RCT验证

Oliver Dürr, Lisa Herzog, Pascal Bühler, Susanne Wegener, Beate Sick

AI总结 提出因果变换模型(TRAM-DAG)估计急性缺血性卒中患者个体化治疗效果,基于观察数据拟合后,在RCT人群中验证其平均效果与ATE一致,并能正确排序患者预后。

Comments This submission has been withdrawn by the authors pending completion of internal review. A revised version will be posted in due course

详情
AI中文摘要

急性缺血性卒中的个体化医疗需要从平均治疗效果(ATE)转向个体化治疗效果(ITE)估计,以支持治疗决策。在急性缺血性卒中中,随机对照试验(如MR CLEAN研究)显示机械取栓平均优于溶栓。我们旨在识别哪些个体患者从机械取栓中获益最大。关注的结局是三个月时的改良Rankin量表(mRS),这是一个有序的功能残疾指标(0:无症状,6:死亡)。我们证明,在观察性MAGIC多中心卒中患者数据上拟合后,有向无环图上的因果变换模型(TRAM-DAG)可用于ITE估计。为确保与用于验证的MR CLEAN人群的可比性,我们在MAGIC子人群(入院NIHSS≥6,对应MR CLEAN的一项纳入标准)上训练TRAM-DAG。然后使用拟合模型估计MR CLEAN人群中卒中患者的ITE。虽然这些ITE估计无法通过实验确认,但我们显示其平均值与试验报告的ATE一致。此外,ITE估计正确地将试验患者按观察到的良好结局(三个月mRS≤2)频率排序。这些发现支持使用像TRAM-DAG这样的因果模型进行卒中护理中的个性化决策,并突显其弥合观察性证据与临床试验之间差距的能力。

英文摘要

Personalized medicine in acute ischemic stroke requires moving beyond average treatment effects (ATE) to individualized treatment effect (ITE) estimates to support treatment decisions. In acute ischemic stroke, mechanical thrombectomy has been shown to be more effective on average than lysis in randomized controlled trials (RCTs), such as the MR CLEAN study. We aim to identify which individual patients benefit most from mechanical thrombectomy compared to lysis. The outcome of interest is the modified Rankin Scale (mRS) at three months, an ordinal measure of functional disability (0: no symptoms, 6: death). We demonstrate that causal transformation models on directed acyclic graphs (TRAM-DAG) can be used for ITE estimation after being fitted on observational MAGIC multi-center stroke patient data. To ensure comparability with the MR CLEAN population, which we use for validation, we train the TRAM-DAG on a MAGIC sub-population with NIHSS at admission >= 6, corresponding to one inclusion criterion of MR CLEAN. The fitted model is then used to estimate ITEs for stroke patients in the MR CLEAN population. While these ITE estimates cannot be confirmed experimentally, we show that their average is consistent with the trial's reported ATE. Furthermore, the ITE estimates correctly rank trial patients by their observed frequency of a good outcome (mRS at three months <= 2). These findings support the use of causal models like TRAM-DAG for personalized decision-making in stroke care and highlight their ability to bridge the gap between observational evidence and clinical trials.

2506.03336 2026-06-17 stat.ME 版本更新

Causal Inference with Missing Exposures and Missing Outcomes

缺失暴露和缺失结局的因果推断

Kirsten E. Landsiedel, Rachel Abbott, Atukunda Mucunguzi, Florence Mwangwa, Elijah Kakande, Edwin D. Charlebois, Carina Marquez, Moses R. Kamya, Laura B. Balzer

AI总结 提出一个扩展的反事实框架,同时处理暴露和基线结局缺失,定义反事实分层效应,并通过TMLE和Super Learner在乌干达农村酒精消费对结核病感染风险的研究中验证。

Comments 17 pages of main text (double-spaced; including 4 figures) + 20 pages of supplementary material (double-spaced; 1 figure; 2 tables) + 83 references

详情
AI中文摘要

缺失数据在公共卫生研究中普遍存在。在估计因果效应时,已有成熟的方法来处理因结局缺失导致的偏倚。通常,因果估计量是在假设干预“设定”暴露并防止缺失的情况下定义的。我们展示了如何将该框架扩展到缺失暴露的情形。我们进一步扩展该框架以纳入基线结局的缺失,这会导致感兴趣人群(例如,风险人群)的缺失。为此,我们强调了反事实分层效应,这是一类通用的因果估计量,其中关注人群受到缺失和/或暴露的影响。之所以这样命名,是因为估计量涉及对反事实层进行条件化。在每种设定中,我们提出了因果模型、相关反事实、因果估计量和识别结果。我们通过一个真实数据示例来演示,研究乌干达农村酒精消费对结核病(TB)感染风险的影响。我们强调了使用带有超级学习器的TMLE进行估计和推断,并讨论了我们的方法的实际意义。

英文摘要

Missing data are ubiquitous in public health research. When estimating causal effects, there are well-established methods to address bias due to missing outcomes. Commonly, causal estimands are defined under hypothetical interventions to "set" the exposure and to prevent missingness. We demonstrate how this framework can be extended to missing exposures. We further extend this framework to incorporate missingness on the baseline outcome, which induces missingness on the population of interest (e.g., persons at-risk). To do so, we highlight Counterfactual Strata Effects, a general class of causal estimands where the focus population is subject to missingness and/or impacted by the exposure. They are termed such because the estimand involves conditioning on a counterfactual strata. In each setting, we present the causal model, relevant counterfactuals, causal estimand, and identification result. We demonstrate with a real-data example to investigate the effect of alcohol consumption on the risk of incident tuberculosis (TB) infection in rural Uganda. We highlight the use of TMLE with Super Learner for estimation and inference and discuss the practical consequences of our approach.

2603.02159 2026-06-17 stat.ML cs.LG 版本更新

Instrumental and Proximal Causal Inference with Gaussian Processes

基于高斯过程的工具变量和近端因果推断

Yuqi Zhang, Krikamol Muandet, Dino Sejdinovic, Edwin Fong, Siu Lun Chau

AI总结 提出去条件高斯过程框架,用于存在未观测混杂时的因果推断,同时提供可靠的后验不确定性量化,并通过边际似然优化实现模型选择。

详情
AI中文摘要

工具变量(IV)和近端因果学习(Proxy)方法是在存在未观测混杂情况下进行因果推断的核心框架。尽管方法论上取得了重大进展,现有方法很少提供可靠的认知不确定性(EU)量化。我们通过一个去条件高斯过程(DGP)框架来解决这一差距,用于不确定性感知的因果学习。我们的公式将流行的核估计量恢复为后验均值,确保了预测精度,而后验方差则提供了有原则且校准良好的EU。此外,概率结构通过边际对数似然优化实现了系统的模型选择。实证结果表明,通过经验覆盖频率和决策感知的准确率拒绝曲线评估,该方法在提供信息丰富的EU量化的同时,表现出强大的预测性能。总之,我们的方法为存在未观测混杂情况下的因果推断提供了一个统一、实用的解决方案,并具有可靠的不确定性。

英文摘要

Instrumental variable (IV) and proximal causal learning (Proxy) methods are central frameworks for causal inference in the presence of unobserved confounding. Despite substantial methodological advances, existing approaches rarely provide reliable epistemic uncertainty (EU) quantification. We address this gap through a Deconditional Gaussian Process (DGP) framework for uncertainty-aware causal learning. Our formulation recovers popular kernel estimators as the posterior mean, ensuring predictive precision, while the posterior variance yields principled and well-calibrated EU. Moreover, the probabilistic structure enables systematic model selection via marginal log-likelihood optimization. Empirical results demonstrate strong predictive performance alongside informative EU quantification, evaluated via empirical coverage frequencies and decision-aware accuracy rejection curves. Together, our approach provides a unified, practical solution for causal inference under unobserved confounding with reliable uncertainty.

4. 高维统计与正则化 2 篇

2606.17121 2026-06-17 stat.AP cs.LG physics.flu-dyn 新提交

Regularized Machine Learning for System Identification of Ship Free-Running Manoeuvres from CFD-Based Synthetic Data: A Comparative Study

基于CFD合成数据的船舶自由航行操纵系统辨识的正则化机器学习:比较研究

R.F. Suárez, J.C. Berndt, M. Abdel-Maksoud

发表机构 * Hamburg University of Technology (TUHH)(汉堡技术大学)

AI总结 本研究使用正则化回归方法从CFD生成的自由航行数据中辨识船舶水动力系数,重点评估了系数集大小、训练长度和操纵组合对模型性能的影响,发现Ridge回归在计算效率和预测精度间取得最佳平衡。

Comments 28 pages

详情
AI中文摘要

本研究探讨了从CFD生成的自由航行仿真数据中辨识船舶水动力系数的监督机器学习技术。具体而言,将普通最小二乘法和正则化回归方法应用于Abkowitz型操纵模型。训练和验证数据集来自Z形和回转操纵的URANS仿真,这些仿真已通过实验基准数据验证。分析评估了系数集大小、预测模型训练所需的最小训练长度以及操纵组合对模型性能的影响。结果表明,只要通过适当的系数选择、回归模型或输入数据变异性解决多重共线性问题,大角度Z形操纵适用于水动力系统辨识。较大的系数集为可变条件提供了更大的模型灵活性,但更容易出现多重共线性。正则化回归技术有效缓解了多重共线性,并显著提高了预测精度,而纳入更多样化的操纵数据同样如此。在测试的模型中,Ridge回归在计算效率和预测精度之间提供了最佳折衷。

英文摘要

This study investigates supervised machine learning techniques for identifying ship hydrodynamic coefficients from CFD-generated data from free-running simulations. Specifically, ordinary least squares and regularized regression methods are applied to Abkowitz-type manoeuvring models. Training and validation datasets are derived from URANS simulations of zig-zag and turning circle manoeuvres, which are validated against experimental benchmark data. The analysis evaluates the effects of coefficient set size, minimum training length required for predictive model training, and manoeuvre combinations on model performance. Results demonstrate the suitability of large-angle zig-zag manoeuvres for hydrodynamic system identification, provided that multicollinearity is addressed through appropriate coefficient selection, regression models, or input data variability. Larger coefficient sets offer greater model flexibility for variable conditions but are more prone to multicollinearity. Regularized regression techniques effectively mitigate multicollinearity and notably enhance prediction accuracy, as does incorporating more diverse manoeuvring data. Among tested models, Ridge regression provided the best compromise between computational efficiency and prediction accuracy.
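
The role of regularization under multicollinearity can be seen in a two-predictor toy problem. The sketch below solves the ridge normal equations $(X^\top X + \lambda I)\beta = X^\top y$ in closed form and compares $\lambda = 0$ (ordinary least squares) with $\lambda = 1$. The data are hypothetical and unrelated to the Abkowitz-type models or URANS simulations in the study.

```python
def ridge_two_predictors(x1, x2, y, lam):
    """Closed-form ridge fit with two predictors and no intercept.

    Solves (X^T X + lam * I) beta = X^T y by inverting the 2x2 system directly;
    lam = 0 reduces to ordinary least squares.
    """
    a = sum(u * u for u in x1) + lam
    b = sum(u * v for u, v in zip(x1, x2))
    d = sum(v * v for v in x2) + lam
    r1 = sum(u * w for u, w in zip(x1, y))
    r2 = sum(v * w for v, w in zip(x2, y))
    det = a * d - b * b
    return ((d * r1 - b * r2) / det, (a * r2 - b * r1) / det)

# Hypothetical, nearly collinear predictors; y is roughly x1 + x2 plus small noise.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.01, 1.99, 3.02, 3.98]
y = [2.1, 3.9, 6.1, 7.9]

ols = ridge_two_predictors(x1, x2, y, lam=0.0)
ridge = ridge_two_predictors(x1, x2, y, lam=1.0)
```

With these data the least-squares coefficients take large values of opposite sign, because the near-singular $X^\top X$ amplifies the noise, while the ridge coefficients both stay close to 1. The penalty trades a small bias for this stability, and in practice $\lambda$ would be chosen by validation rather than fixed in advance.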

2411.13763 2026-06-17 math.ST stat.ME stat.ML 版本更新

Active Subsampling for Measurement-Constrained M-Estimation of Individualized Thresholds with High-Dimensional Data

高维数据下个体化阈值测量受限M估计的主动子采样

Jingyi Duan, Lehao Fu, Yang Ning

AI总结 针对测量受限问题,提出K步主动子采样算法,通过迭代采样最具信息量的观测并求解正则化M估计,实现高维线性阈值参数估计,并揭示条件密度光滑性导致的相变现象。

Comments Accepted to Annals of Statistics, 2026

详情
AI中文摘要

测量受限问题在现代应用如电子健康记录研究中频繁出现。在此类问题中,尽管有大量数据集可用,但收集标记数据可能非常昂贵或耗时,导致在给定预算内只能标记一小部分数据。这引发了一个关键问题:在预算约束下,哪些数据点最有益于标记?我们在测量受限M估计框架下研究估计最优个体化阈值的问题。具体地,我们的目标是估计连续变量$X$的线性阈值$\theta^TZ$中的高维参数$\theta$,使得$X$是否超过阈值$\theta^TZ$与二元结果$Y$之间的差异最小化。在测量受限设置中,我们提出了一种新颖的$K$步主动子采样算法来估计$\theta$,该算法迭代地采样数据集中最具信息量的观测,并求解正则化M估计量。我们的理论分析揭示了关于$\beta$(给定$Y$和$Z$时$X$的条件密度的光滑性)的尖锐相变现象。完整摘要请参见论文。

英文摘要

Measurement-constrained problems frequently arise in modern applications such as electronic health record studies. In such problems, despite the availability of large datasets, collecting labeled data can be highly costly or time-consuming, allowing only a small portion of the data to be labeled within a given budget. This raises a critical question: which data points are most beneficial to label given the budget constraint? We study this question in the context of estimating an optimal individualized threshold under a measurement-constrained M-estimation framework. In particular, our goal is to estimate a high-dimensional parameter $\theta$ in a linear threshold $\theta^TZ$ for a continuous variable $X$ such that the discrepancy between whether $X$ exceeds the threshold $\theta^TZ$ and a binary outcome $Y$ is minimized. In the measurement-constrained setting, we propose a novel $K$-step active subsampling algorithm to estimate $\theta$, which iteratively samples the most informative observations in the dataset and solves a regularized M-estimator. Our theoretical analysis reveals a sharp phase transition phenomenon with respect to $\beta$, the smoothness of the conditional density of $X$ given $Y$ and $Z$. Please see the paper for the full abstract.

5. 时间序列与空间统计 9 篇

2606.18078 2026-06-17 stat.ME 新提交

Spatial prediction of environmental processes using random forests: How best to account for spatial dependence?

使用随机森林对环境过程进行空间预测:如何最好地考虑空间依赖性?

Duncan Lee, Vinny Davies, Helen R. Savage, Hussein Twabi, Marriott Nliwasa, Peter MacPherson

AI总结 本文比较了随机森林融合空间依赖性的多种方法,通过模拟和空气污染数据实验,发现空间基函数方法表现一致良好。

详情
AI中文摘要

环境过程的地统计空间预测通常通过克里金法使用高斯过程模型进行,而机器学习算法是非空间预测的最先进技术。最近这些思想的融合令人兴奋,使传统机器学习算法具备了处理空间自相关的能力,从而提高了预测性能。已经提出了多种方法,包括与高斯过程的融合、观测驱动的相关结构、空间基函数和局部地理拟合。然而,尚未对其相对预测性能进行数值比较,而这对于指导环境科学家选择最优方法至关重要。本文填补了这一知识空白,并专注于随机森林作为机器学习算法,因为它们在计算和概念上比深度学习算法更易于实现。本文展示了两项研究的结果,第一项是受控模拟实验,研究是否有任何单一方法在不同空间自相关类型中始终表现优越。第二项研究关注马拉维布兰太尔市一项结核病患病率研究中空气污染浓度的预测。结果表明,虽然没有单一方法普遍优越,但使用空间基函数在模拟和真实数据研究中均表现一致良好。

英文摘要

Geostatistical spatial prediction for environmental processes is typically undertaken using Gaussian process models via Kriging, while machine learning (ML) algorithms are state-of-the-art for non-spatial prediction. An exciting recent fusion of these ideas imbues traditional ML algorithms with the capacity to deal with spatial autocorrelation, leading to improved predictive performance. A range of approaches have been proposed, including fusion with Gaussian processes, observation-driven correlation structures, spatial basis functions and local geographical fitting. However, there has been no numerical comparison of their relative predictive performances, which is needed to advise environmental scientists on the optimal approach to use. This paper fills this knowledge gap, and focuses on random forests as the ML algorithm because they are more computationally and conceptually straightforward to implement than deep learning algorithms. The results from two studies are presented, the first being a controlled simulation experiment investigating whether any single approach is consistently superior across different spatial autocorrelation types. The second study focuses on the prediction of air pollution concentrations within a tuberculosis prevalence study in Blantyre, Malawi. The results show that whilst no single approach is universally superior, utilising spatial basis functions appears to perform consistently well across both the simulation and real data studies.

2606.18044 2026-06-17 stat.AP 新提交

Model-based clustering of compositional trajectories for the analysis of mobility data

基于模型的成分轨迹聚类用于移动数据分析

Andrea Panarotto, Manuela Cattelan, Ruggero Bellio

AI总结 提出一种基于状态空间模型的成分时间序列聚类方法,将电话数据中的移动轨迹表示为道路类型比例,以识别城市移动模式。

Comments 36 pages (26 for the main text, 10 in the supplementary), 13 figures (6 in the main text, 7 in the supplementary)

详情
AI中文摘要

理解城市移动模式对于设计高效且可持续的交通系统至关重要。受帕多瓦市及其周边地区应用的启发,我们提出了一种新颖的统计框架,用于分析和聚类源自电话数据的移动轨迹。我们引入了个体移动的成分表示,该表示将不确定的设备位置与周围道路网络的信息相结合,在每个时间点编码与观测位置兼容的不同道路类型的比例。这种表述自然地考虑了测量不确定性,并产生了在单纯形中演化的轨迹。为了对这些数据进行建模,我们开发了一个用于成分时间序列的状态空间框架,该框架同时捕捉电话测量误差和潜在移动过程的时间动态。基于这一表示,我们提出了一种基于模型的聚类方法,该方法基于状态空间模型的混合,以识别具有相似演化轨迹的组。这使我们能够将个体移动聚合成在人口层面上可解释的移动模式。案例研究的结果表明,该方法能够揭示有意义的移动行为,为政策制定者提供潜在相关的见解。

英文摘要

Understanding urban mobility patterns is crucial for designing efficient and sustainable transportation systems. Motivated by an application to the municipality of Padova and its surroundings, we propose a novel statistical framework for the analysis and clustering of mobility trajectories derived from telephonic data. We introduce a compositional representation of individual movements that integrates the uncertain device location with information on the surrounding road network, encoding at each time point the proportions of different road types compatible with the observed position. This formulation naturally accounts for measurement uncertainty and yields trajectories evolving in the simplex. To model these data, we develop a state-space framework for compositional time series that captures both the telephonic measurement error and the temporal dynamics of the latent mobility process. Building on this representation, we propose a model-based clustering approach based on mixtures of state-space models to identify groups of trajectories with similar evolution. This allows us to aggregate individual movements into interpretable mobility patterns at the population level. The results of the case study demonstrate the ability of the approach to uncover meaningful mobility behaviors, providing insights that are potentially relevant to policy makers.

2606.17939 2026-06-17 stat.AP stat.ML 新提交

Understanding Long-Term Dynamics of Individual Metro Usage: A Hidden Semi-Markov State Framework with Survival Analysis

理解个体地铁使用的长期动态:基于生存分析的隐半马尔可夫状态框架

Bingxun Wang, Valeria Maria Urbano, Shan He, Yang Chen, Wei Liu, Zhibin Jiang, Piercesare Secchi

AI总结 提出融合隐半马尔可夫模型与离散时间生存分析的框架,利用上海地铁四年刷卡数据识别五种可解释的出行状态及其转移层次,揭示退出风险与状态相关但独立于时长,而重返风险随不活跃时长急剧衰减。

详情
AI中文摘要

理解个体地铁使用在多年时间尺度上的演化对于交通规划和乘客留存至关重要。然而,现有方法通常将移动模式表征为静态聚类或短期变化,忽略了交通参与的生命周期动态。本研究提出一个基于状态的生命周期建模框架,将隐半马尔可夫模型(HSMM)与离散时间生存分析相结合,以刻画个体地铁移动性的演化。HSMM推断具有显式持续时间分布的潜在移动状态以及控制状态变迁的转移矩阵,而生存组件通过依赖于移动状态轨迹和行为历史的状态相关风险函数,对退出和重新进入事件进行建模。将该框架应用于上海地铁系统四年(2021-2024)的智能卡数据,能够识别可解释的移动状态,刻画转移动态,并量化状态依赖的退出和重新进入过程。分析揭示了五种稳健的移动状态,具有以偶尔使用网关状态为中心的方向性转移层次,以及控制脱离和回归的根本不同的时间机制:退出风险与状态相关但与持续时间无关,而重新进入风险随不活跃时长急剧衰减。这些发现为面向生命周期的移动性分析提供了方法论基础,并为交通运营商识别风险用户和安排留存干预提供了实践指导。

英文摘要

Understanding how individual metro usage evolves over multi-year horizons is essential for transit planning and passenger retention. However, existing approaches typically characterize mobility patterns as static clusters or short-term variability, leaving the lifecycle dynamics of transit participation underexplored. This study proposes a state-based lifecycle modeling framework that integrates Hidden Semi-Markov Models (HSMM) with discrete-time survival analysis to characterize the evolution of individual metro mobility. The HSMM infers latent mobility states with explicit duration distributions and a transition matrix governing regime changes, while the survival component models exit and re-entry events via state-dependent hazard functions conditioned on mobility-state trajectories and behavioral history. Applied to four years of smart card data from the Shanghai metro system (2021-2024), the framework enables the identification of interpretable mobility states, the characterization of transition dynamics, and the quantification of state-dependent exit and re-entry processes. The analysis reveals five robust mobility states with a directional transition hierarchy centered on an occasional-usage gateway state, and fundamentally different temporal mechanisms governing disengagement and return: exit hazard is state-dependent but duration-independent, whereas re-entry hazard decays sharply with inactivity length. These findings provide a methodological foundation for lifecycle-oriented mobility analysis and practical guidance for transit operators to identify at-risk users and time retention interventions.

2606.17717 2026-06-17 stat.ME stat.AP 新提交

Double zero-inflated spatio-temporal modeling of daily precipitation under detection thresholds

检测阈值下日降水量的双零膨胀时空建模

Juan Marcen-Gutierrez, Jorge Castillo-Mateo, Alan E. Gelfand, Jesús Asín, Ana C. Cebrián

AI总结 针对日降水量中两种零值(无降水事件和低于检测限的未测量降水)问题,提出结合Probit回归、Gamma回归和阈值截断观测机制的多层时空模型,并应用高斯过程捕捉空间依赖,在贝叶斯框架下实现精确推断。

Comments 38 pages (+33 pages supplement), 7 figures (+35 figures supplement), 5 tables

详情
AI中文摘要

解释日尺度降水行为对于精细理解降水驱动机制至关重要。然而,由于零值的频繁出现,这一工作具有挑战性。两种类型的零值——作为干旱事件的无降水和由于检测限导致的未测量降水——的公认存在加剧了这一挑战。在这项工作中,我们提出了一个多层时空模型,该模型允许我们区分和解释两种类型的零值,并对高于检测限的正降水进行建模。该方法结合了通过Probit回归建模概率的零处点质量、潜在正降水量的Gamma回归以及受阈值截断影响的观测机制。为了捕捉空间依赖性,在每个回归模型中采用了高斯过程。在贝叶斯框架下工作,我们可以获得具有精确不确定性的丰富推断范围。特别是,我们提供了基于模型的推断工具,以比较和量化真实降水过程与其观测对应物在相关特征上的差异。我们将模型应用于西班牙东北部埃布罗河流域70个站点15年间的春季日观测数据分析。我们的发现表明,阈值强烈影响观测降水的发生,特别是在湿润地区。虽然其对总累积量的影响较小,但它可能对上分位数产生显著影响。

英文摘要

Explaining precipitation behavior at the daily scale is important for fine-scale understanding of the mechanisms driving precipitation. However, this effort is challenging because of the frequent incidence of zeros. The challenge is amplified by the acknowledged incidence of two types of zeros -- absence of precipitation as a dry event and absence of measured precipitation due to detection limits. In this work, we propose a multilevel spatio-temporal model which allows us to distinguish and explain the two types of zeros, as well as to model positive precipitation above the detection limit. The methodology combines a point mass at zero with probability modeled through a probit regression, a Gamma regression for latent positive precipitation amounts, and an observation mechanism subject to threshold-induced censoring. To capture spatial dependencies, Gaussian processes are employed in each regression model. Working within a Bayesian framework, we can obtain a rich range of inference with exact uncertainty. In particular, we provide model-based inference tools to compare and quantify differences between the true precipitation process and its observed counterpart across relevant characteristics. We apply our model to the analysis of daily spring observations at 70 sites over 15 years from the Ebro River Basin in northeastern Spain. Our findings indicate that the threshold strongly affects the occurrence of observed precipitation, especially in humid regions. While its impact on total accumulated amounts is small, it can exert a relevant effect on upper quantiles.
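The three ingredients named in the abstract (probit occurrence, Gamma amounts, threshold censoring) can be sketched as a data-generating simulation for a single site. This is only a toy version under assumed values: the linear predictor `eta`, the Gamma shape and mean, and the 0.2 threshold are hypothetical, and the spatial Gaussian processes and covariates of the actual model are omitted.

```python
import math
import random

def std_normal_cdf(z):
    """Standard normal CDF, used as the probit link."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def simulate_day(eta, shape, mean, threshold, rng):
    """Return (latent_amount, observed_amount) for one site-day."""
    p_wet = std_normal_cdf(eta)                      # probit occurrence probability
    if rng.random() >= p_wet:
        return 0.0, 0.0                              # first kind of zero: a dry day
    latent = rng.gammavariate(shape, mean / shape)   # Gamma amount with the given mean
    observed = latent if latent >= threshold else 0.0  # second kind: below detection
    return latent, observed

rng = random.Random(0)
days = [simulate_day(0.2, 0.8, 4.0, 0.2, rng) for _ in range(5000)]
dry_zeros = sum(1 for lat, obs in days if lat == 0.0)
censored_zeros = sum(1 for lat, obs in days if lat > 0.0 and obs == 0.0)
observed_zeros = sum(1 for lat, obs in days if obs == 0.0)
```

Comparing `observed_zeros` with `dry_zeros` shows how much the detection threshold inflates the apparent frequency of dry days, which is the distinction the model is designed to recover.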

2606.17369 2026-06-17 math.ST stat.ME 新提交

Inference Optimal Long Run Variance Estimation with Lugsail Kernels

使用Lugsail核的推断最优长程方差估计

Rebecca P. Kurtz-Garcia, James M. Flegal

AI总结 针对具有平稳序列依赖的数据,提出基于非标准固定平滑极限分布的Lugsail估计器最优带宽规则,改善偏差校正并优化推断。

详情
AI中文摘要

对于具有未知但平稳序列依赖的数据集,鲁棒的长程方差估计器对于处理各种场景至关重要。谱方差估计器常用,但在存在正相关时往往表现出显著的负偏差。为了克服这一点,引入了零lugsail估计器,无论相关结构如何,都能提供零渐近偏差。然而,目前尚无选择lugsail估计器最优带宽的指南,而这是估计过程中的关键组成部分。我们基于研究中发展的非标准固定平滑极限分布,提出了lugsail估计器的推断最优带宽规则。该方法显著改善了偏差校正,考虑了变异性,并提供了针对鲁棒推断优化的估计器。我们的理论发现得到了模拟研究的支持。

英文摘要

For datasets with unknown but stationary serial dependence, a robust long run variance estimator is essential to handle diverse scenarios. Spectral variance estimators are commonly used but tend to exhibit significant negative bias in the presence of positive correlation. To overcome this, zero lugsail estimators have been introduced, offering zero asymptotic bias regardless of the correlation structure. However, there are currently no guidelines for selecting the optimal bandwidth for lugsail estimators, a critical component in the estimation process. We propose an inference optimal bandwidth rule for lugsail estimators, based on nonstandard fixed-smoothing limiting distributions developed in our study. This approach significantly improves bias correction, accounts for variability, and provides an estimator optimized for robust inference. Our theoretical findings are supported by a simulation study.
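As a rough illustration of what a lugsail estimator looks like, the sketch below applies the lugsail transform k_L(x) = (k(x) - c k(r x)) / (1 - c) to a Bartlett window and plugs it into a kernel long run variance estimator. The form of the transform and the setting r = 2, c = 1/2 (my recollection of the zero-lugsail choice for the Bartlett window) come from the wider lugsail literature, not from this abstract, and the fixed bandwidth of 20 is arbitrary rather than the inference-optimal rule the paper proposes.

```python
import random

def bartlett(x):
    """Bartlett (triangular) lag window."""
    return max(0.0, 1.0 - abs(x))

def lugsail(x, r=2.0, c=0.5, base=bartlett):
    """Lugsail transform of a base lag window: (k(x) - c*k(r*x)) / (1 - c)."""
    return (base(x) - c * base(r * x)) / (1.0 - c)

def autocov(xs, lag):
    """Sample autocovariance at the given lag (divisor n)."""
    n = len(xs)
    m = sum(xs) / n
    return sum((xs[t] - m) * (xs[t + lag] - m) for t in range(n - lag)) / n

def lrv(xs, bandwidth, window):
    """Kernel estimate: gamma_0 + 2 * sum_s window(s / bandwidth) * gamma_s."""
    total = autocov(xs, 0)
    for s in range(1, len(xs)):
        w = window(s / bandwidth)
        if w != 0.0:
            total += 2.0 * w * autocov(xs, s)
    return total

# Hypothetical AR(1) series with positive correlation (phi = 0.7)
rng = random.Random(0)
x, series = 0.0, []
for _ in range(1000):
    x = 0.7 * x + rng.gauss(0.0, 1.0)
    series.append(x)
est_bartlett = lrv(series, 20, bartlett)
est_lugsail = lrv(series, 20, lugsail)
```

For this AR(1) process the true long run variance is 1 / (1 - 0.7)^2, about 11.1. The lugsail window is at least as large as the Bartlett window at every lag, which is how it counteracts the downward bias under positive correlation.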

2606.17530 2026-06-17 physics.soc-ph cs.LG econ.GN stat.AP 新提交

Public transit gains and spatially uneven travel demand changes after NYC congestion pricing

纽约市拥堵收费后公共交通增益与空间不均的出行需求变化

Donghang Li, Dingyi Zhuang, Yunlin Li, Chenan Shen, Nina Cao, Yunhan Zheng, Shenhao Wang, Jinhua Zhao

发表机构 Department of Civil and Environmental Engineering, Massachusetts Institute of Technology(麻省理工学院土木与环境工程系); Department of Urban Studies and Planning, Massachusetts Institute of Technology(麻省理工学院城市研究与规划系); Mathematical Institute, University of Oxford(牛津大学数学院); Department of Mechanical Engineering, Massachusetts Institute of Technology(麻省理工学院机械工程系); College of Urban and Environmental Sciences, Peking University(北京大学城市与环境科学学院); Department of Urban and Regional Planning, University of Florida(佛罗里达大学城市与区域规划系); Center for Computational Science and Engineering, Massachusetts Institute of Technology(麻省理工学院计算科学与工程中心)

AI总结 利用时间序列基础模型生成概率反事实预测,评估纽约市2025年实施的拥堵收费政策,发现公交和地铁客流量显著增加,但总体出行需求略有下降,且影响存在空间异质性。

详情
AI中文摘要

纽约市于2025年1月实施了全国首个基于区域的拥堵收费计划,为评估全系统城市出行如何响应大规模定价干预提供了机会。由于此类政策会在不同交通方式和区域间产生溢出效应,因此难以构建可信的控制组。我们利用时间序列基础模型生成具有校准不确定性的概率反事实需求预测,以应对这一挑战。将该框架应用于公交、地铁和总出行量数据,我们发现,与预期无政策需求相比,政策实施后公交和地铁客流量显著增加,而总体出行需求略有下降。影响存在空间异质性:总体出行需求的减少集中在拥堵缓解区内,而公共交通的增益则延伸至曼哈顿核心区以外。社会人口分析进一步揭示了不同社区之间的适应差异,凸显了空间公平性问题。我们的框架为在缺乏干净控制组的情况下,对全系统城市干预进行不确定性感知评估提供了一种可扩展的方法。

英文摘要

New York City implemented the nation's first cordon-based congestion pricing program in January 2025, providing an opportunity to evaluate how system-wide urban mobility responds to large-scale pricing interventions. Because such policies generate spillovers across modes and locations, credible control groups are difficult to construct. We address this challenge using time series foundation models to generate probabilistic counterfactual demand forecasts with calibrated uncertainty. Applying this framework to bus, subway, and aggregate trip volume data, we find that post-policy bus and subway ridership increased significantly relative to expected no-policy demand, while overall travel demand decreased modestly. The effects are spatially heterogeneous: while reductions in overall travel demand are concentrated within the Congestion Relief Zone, transit gains extend beyond Manhattan's core. Socio-demographic analyses further reveal uneven adaptation across neighborhoods, highlighting spatial equity implications. Our framework provides a scalable approach for the uncertainty-aware evaluation of system-wide urban interventions when clean control groups are unavailable.

2606.12097 2026-06-17 stat.AP physics.data-an 新提交

Weibull-Stationary Stochastic Differential Equations for Conditional Long-Horizon Wind Power Forecasting

条件长期风电预测的威布尔平稳随机微分方程

Luca Di Persio, Mehrdad Ghadiri

AI总结 提出一种基于威布尔平稳SDE的月度风电概率预测框架,通过异方差卡尔曼滤波和三种SDE模型实现高分辨率预测,CRPS约1.57 m/s,功率Wasserstein距离低于额定容量1.4%。

详情
AI中文摘要

我们提出了一个以十分钟分辨率进行一个月前风电预测的条件概率框架。从序列相关的SCADA风速数据中估计月度威布尔形状和尺度参数,通过Godambe协方差修正,并使用异方差卡尔曼滤波在双变量VAR(1)状态空间模型上进行预测。以MMSE预测的威布尔不变律为条件,我们构建并比较了三种正风速SDE模型:Ornstein-Uhlenbeck-Weibull变换、Fokker-Planck漂移优先扩散和Fokker-Planck扩散优先模型。模拟的风速集合通过校准的XGBoost功率曲线映射到功率。应用于Kelmarsh风电场Senvion MM92涡轮机2021年1月的数据,三种SDE公式在概率精度上统计上不可区分,平均CRPS值在1.569至1.575 m/s之间。因此,扩散优先模型在计算上更优,运行时间相对于OU-Weibull模型减少了约七倍。在功率域中,模拟与观测分布之间的Wasserstein距离为26.1-27.6 kW,低于额定容量的1.4%,而所检查月份的月能量产出偏差约为-7.3%。在0-1500 kW范围内,超越概率误差保持在1.6个百分点以下,在额定功率附近约为2.2个百分点。这些量为下游运行问题提供了决策相关的概率输入,而非完成的备用、储能、市场或疲劳优化决策。完全边缘化卡尔曼预测律下的威布尔参数是一个自然的扩展。

英文摘要

We present a one-month-ahead conditional probabilistic framework for wind-power forecasting at ten-minute resolution. Monthly Weibull shape and scale parameters are estimated from serially dependent SCADA wind-speed data, corrected through a Godambe covariance, and forecast by a heteroskedastic Kalman filter on a bivariate VAR(1) state-space model. Conditional on the MMSE forecasted Weibull invariant law, we construct and compare three positive wind-speed SDE models: an Ornstein-Uhlenbeck-Weibull transform, a Fokker-Planck drift-first diffusion, and a Fokker-Planck diffusion-first model. The simulated wind-speed ensembles are mapped to power through a calibrated XGBoost power curve. Applied to January 2021 data from a Senvion MM92 turbine at Kelmarsh Wind Farm, the three SDE formulations are statistically indistinguishable in probabilistic accuracy, with mean CRPS values between 1.569 and 1.575 m/s. The diffusion-first model is therefore preferred on computational grounds, reducing runtime by about a factor of seven relative to the OU-Weibull model. In the power domain, the Wasserstein distance between simulated and observed distributions is 26.1-27.6 kW, below $1.4\%$ of rated capacity, while the monthly energy-yield bias is about $-7.3\%$ for the examined month. Exceedance-probability errors remain below 1.6 percentage points over the 0-1500 kW range and about 2.2 percentage points near rated power. These quantities provide decision-relevant probabilistic inputs for downstream operational problems, rather than completed reserve, storage, market, or fatigue-optimization decisions. Full marginalisation over the Kalman predictive law of the Weibull parameters is left as a natural extension.
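The CRPS figures quoted above score a simulated ensemble against an observation. A minimal sketch of the standard empirical estimator, CRPS = E|X - y| - 0.5 E|X - X'|, is given below; the Weibull shape and scale used to draw the toy ensemble and the observed value are hypothetical and unrelated to the Kelmarsh data.

```python
import random

def crps_ensemble(ensemble, obs):
    """Empirical CRPS: mean |X - y| minus half the mean |X - X'| over members."""
    m = len(ensemble)
    term1 = sum(abs(x - obs) for x in ensemble) / m
    term2 = sum(abs(a - b) for a in ensemble for b in ensemble) / (m * m)
    return term1 - 0.5 * term2

# Hypothetical wind-speed ensemble drawn from a Weibull law (scale 8 m/s, shape 2)
rng = random.Random(0)
ensemble = [rng.weibullvariate(8.0, 2.0) for _ in range(200)]
score = crps_ensemble(ensemble, obs=6.5)   # in m/s, lower is better
```

The score is in the units of the forecast variable, which is why the abstract reports CRPS in m/s, and it reduces to the absolute error when the ensemble has a single member.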

2412.00607 2026-06-17 stat.ME q-fin.RM 版本更新

On a risk model with tree-structured Poisson Markov random field frequency, with application to rainfall events

基于树结构泊松马尔可夫随机场频率的风险模型及其在降雨事件中的应用

Hélène Cossette, Benjamin Côté, Alexandre Dubeau, Etienne Marceau

AI总结 提出一种树结构泊松马尔可夫随机场模型来刻画组合风险中的频率相依性,研究无限增长树上的渐近风险,并在极端降雨数据上验证了模型灵活性和可扩展性。

Comments 40 pages

详情
AI中文摘要

在许多保险情境中,组合风险之间的相依性可能源于其频率。我们研究了一个相依风险模型,其中假设计数变量向量为具有泊松边缘分布的树结构马尔可夫随机场。树结构转化为多种相依方案。我们研究了组合的整体风险及其所有组成部分的风险分配。我们提供了定义在无限增长树上的组合的渐近结果。为了说明其灵活性和对更高维度的计算可扩展性,我们在真实世界的极端降雨数据上校准了风险模型并进行了风险分析。

英文摘要

In many insurance contexts, dependence between risks of a portfolio may arise from their frequencies. We investigate a dependent risk model in which we assume the vector of count variables to be a tree-structured Markov random field with Poisson marginals. The tree structure translates into a wide variety of dependence schemes. We study the global risk of the portfolio and the risk allocation to all its constituents. We provide asymptotic results for portfolios defined on infinitely growing trees. To illustrate its flexibility and computational scalability to higher dimensions, we calibrate the risk model on real-world extreme rainfall data and perform a risk analysis.

2506.00561 2026-06-17 stat.AP stat.ME 版本更新

Mortality Forecasting under Climate Risk: A Stochastic Approach with Distributed Lag Non-Linear Models

气候风险下的死亡率预测:基于分布滞后非线性模型的随机方法

Jiacheng Min, Han Li, Thomas Nagler, Shuanming Li

AI总结 提出将分布滞后非线性模型融入随机死亡率模型,通过新回拟合算法分离气候与非气候风险,在三个欧洲区域验证了短期预测优势,并利用未来气候数据预测至2045年的死亡率变化。

Comments 25 pages, 10 figures, and 2 tables

详情
AI中文摘要

评估气候驱动的死亡风险近几十年来已成为一个新兴研究领域。本文提出一种新方法,将气候驱动效应明确纳入单种群和多种群随机死亡率模型。新模型由两部分组成:随机死亡率模型和分布滞后非线性模型(DLNM)。随机部分捕捉死亡率中非气候的长期趋势、波动性和季节模式。DLNM部分捕捉气候变量对死亡率的非线性和滞后效应,以及热浪和寒潮对不同年龄组的影响。对于模型校准,我们提出一种新的回拟合算法,能够将气候驱动的死亡风险与非气候驱动的随机死亡风险分离开来。我们利用来自三个欧洲地区(雅典、里斯本和罗马)的数据,展示了我们的模型相对于四种替代模型的有效性和改进的短期(1-18个月)预测性能。此外,作为所提出建模框架的一个应用,我们利用气候模型生成的未来UTCI数据,在两种代表性浓度路径(RCP)情景下,同时考虑随机死亡率改善趋势和气候风险,提供了这些地区到2045年的总死亡率预测。预测显示,随着UTCI随时间普遍升高,冬季死亡率显著下降,而夏季死亡率上升。尽管我们预计在RCP8.5情景下短期总死亡率略低于RCP2.6,但在RCP8.5情景下,长期总死亡率预计将增加。

英文摘要

Assessing climate-driven mortality risk has become an emerging area of research in recent decades. In this paper, we propose a novel approach to explicitly incorporate climate-driven effects into both single- and multi-population stochastic mortality models. The new model consists of two components: a stochastic mortality model, and a distributed lag non-linear model (DLNM). The stochastic component captures the non-climate long-term trend, volatility, and seasonal patterns in mortality rates. The DLNM component captures non-linear and lagged effects of climate variables on mortality, as well as the impact of heat waves and cold waves across different age groups. For model calibration, we propose a novel backfitting algorithm that allows us to disentangle the climate-driven mortality risk from the non-climate-driven stochastic mortality risk. We illustrate the effectiveness and improved short-term (1--18 month) forecasting performance of our model against four alternative models, using data from three European regions: Athens, Lisbon, and Rome. Furthermore, as an application of the proposed modeling framework, we utilize future UTCI data generated from climate models to provide total mortality forecasts into 2045 across these regions under two Representative Concentration Pathway (RCP) scenarios, taking both stochastic mortality improvement trend and climate risk into account. The projections show a noticeable decrease in winter mortality alongside a rise in summer mortality, driven by a general increase in UTCI over time. Although we expect slightly lower overall mortality in the short term under RCP8.5 compared to RCP2.6, a long-term increase in total mortality is anticipated under the RCP8.5 scenario.

6. 计算统计与MCMC 11 篇

2606.17486 2026-06-17 stat.ME stat.CO 新提交

Improving Linear Regression on Small Datasets via Gaussian Process and Extreme Value Theory-Based Data Augmentation

基于高斯过程和极值理论的数据增强改进小样本线性回归

Ibrahim Salay, Jagath Senarathne

AI总结 针对小样本回归中经典假设违背问题,提出GP-MEVT混合数据增强方法,结合高斯过程与极值理论扩展预测空间并保留线性结构,在模拟和真实数据上优于标准bootstrap方法。

详情
AI中文摘要

小样本量在回归分析中带来显著挑战,常导致正态性、同方差性和残差独立性等经典假设的违背。这些违背损害了参数估计的准确性,降低了统计功效,并限制了结果的泛化能力。本研究引入了基于高斯过程的改进极值定理(GP-MEVT)方法,这是一种新颖的混合数据增强方法,结合了高斯过程与极值理论以解决这些局限性。GP-MEVT方法生成增强观测值,将预测空间扩展到观测范围之外,同时保留底层线性结构,并根据残差变异引入受控变异性。通过在三个方差场景(sigma = 2, 5, 8)和样本量(n = 10, 15, 20)下的全面模拟研究,我们证明GP-MEVT实现了更高的假设满足率,显著优于标准bootstrap和带噪声的bootstrap方法。所提出的方法还表现出合理的参数估计准确性,截距和斜率估计值始终更接近真实参数值,并且在均方根误差衡量下保持竞争性或更优的模型拟合性能。应用于真实世界数据集证实了这些优势,GP-MEVT实现了67.1%的假设满足率,而bootstrap替代方法分别为17.3%和21.2%。这些发现确立了GP-MEVT作为拟合小数据集线性回归模型的稳健可靠框架,为实践者在样本量限制不可避免时提供了一种原则性的统计推断方法。

英文摘要

Small sample sizes pose significant challenges in regression analysis, often leading to violations of classical assumptions such as normality, homoscedasticity, and independence of residuals. These violations compromise parameter estimation accuracy, reduce statistical power, and limit the generalizability of findings. This study introduces the Gaussian Process-based Modified Extreme Value Theorem (GP-MEVT) method, a novel hybrid data augmentation approach that combines Gaussian processes with Extreme Value Theory to address these limitations. The GP-MEVT method generates augmented observations that extend the predictor space beyond the observed range while preserving the underlying linear structure and introducing controlled variability based on residual variation. Through comprehensive simulation studies across three variance scenarios (sigma = 2, 5, 8) and sample sizes (n = 10, 15, 20), we demonstrate that GP-MEVT achieves a higher rate of assumption satisfaction, substantially outperforming standard bootstrap and bootstrap with noise methods. The proposed method also exhibits reasonable parameter estimation accuracy, with intercept and slope estimates consistently closer to true parameter values, and maintains competitive or superior model fitting performance as measured by root mean square error. Application to a real-world dataset confirms these advantages, with GP-MEVT achieving a 67.1% assumption satisfaction rate compared to 17.3% and 21.2% for bootstrap alternatives. These findings establish GP-MEVT as a robust and reliable framework for fitting linear regression models to small datasets, offering practitioners a principled approach to statistical inference when sample size limitations are unavoidable.

2606.17181 2026-06-17 stat.ME stat.AP 新提交

Tropical Viterbi Tubes for Decoding Uncertainty in Hidden Markov Models

热带维特比管:隐马尔可夫模型解码不确定性

Aurélien Nicosia

AI总结 提出热带维特比管,通过容忍度阈值捕获隐马尔可夫模型中接近最优的路径不确定性,并给出精确投影算法与校准方法。

Comments 33 pages, 4 figures; supplementary material included as ancillary file; submitted to The Annals of Applied Statistics

详情
AI中文摘要

隐马尔可夫模型广泛用于从序列数据推断潜在状态序列,但维特比解码仅报告一条最可能的完整路径。当解码状态具有科学意义时,这一单一最大化器可能掩盖由多条近最优轨迹产生的路径不确定性。在拟合的HMM条件下,我们引入热带维特比管:其完整数据对数得分在维特比最优值容忍度内的隐藏轨迹集合。状态、转移和变化状态投影显示哪些局部特征与全局近最优完整路径兼容,为序列分析、生态学、金融、生物医学监测及相关领域的HMM提供了路径不确定性层。该管是完整隐藏路径空间上的后验上水平集,容忍度解释为相对于维特比路径的对数后验几率损失。将容忍度校准到目标后验质量,为完整潜在路径提供了HPD阈值可信区域和保守的同时投影带。我们证明了单调性、阶梯函数行为和确定性稳定性保证,并通过最大加前向-后向递归在O(TK^2)时间内精确计算密集转移的投影管。后验管质量和HPD校准是单独的路径层面计算,通过FFBS近似。在一个公开的蝙蝠追踪应用中,鲁棒觅食管段富含捕食嗡嗡声,而鲁棒通勤管段则缺乏:在eta=0.005时,鲁棒觅食的富集度为2.25,95%自助法区间为(1.73, 2.85);鲁棒通勤的富集度为0.27,区间为(0.16, 0.44)。

英文摘要

Hidden Markov models are widely used to infer latent state sequences from sequential data, but Viterbi decoding reports only one most likely complete path. When decoded states carry scientific meaning, this single maximizer can conceal pathwise uncertainty created by multiple near-optimal trajectories. Conditional on a fitted HMM, we introduce the tropical Viterbi tube: the set of hidden trajectories whose complete-data log-score lies within a tolerance of the Viterbi optimum. State, transition, and change-status projections show which local features remain compatible with globally near-optimal complete paths, giving a pathwise uncertainty layer for HMMs in sequence analysis, ecology, finance, biomedical monitoring, and related domains. The tube is a posterior superlevel set on complete hidden-path space, with tolerance interpreted as a log posterior-odds loss relative to a Viterbi path. Calibrating the tolerance to a target posterior mass gives an HPD-threshold credible region for the complete latent path and conservative simultaneous projected bands. We prove monotonicity, step-function behavior, and deterministic stability guarantees, and compute projected tubes exactly by max-plus forward-backward recursions in O(TK^2) time for dense transitions. Posterior tube mass and HPD calibration are separate pathwise calculations approximated by FFBS. In a public bat-tracking application, robust foraging tube segments are enriched for feeding buzzes, whereas robust commuting segments are depleted: at eta = 0.005, enrichment is 2.25 with 95% bootstrap interval (1.73, 2.85) for robust foraging and 0.27 with interval (0.16, 0.44) for robust commuting.
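The state projection of such a tube can be sketched with a max-plus forward-backward pass: for each time t and state k, the best complete-path score through (t, k) is the forward max plus the backward max, and the state belongs to the projected tube when that score is within the tolerance of the Viterbi optimum. The function name `state_tube`, the toy two-state HMM, and the small numerical slack are assumptions for illustration; this is not the authors' implementation.

```python
import math

NEG_INF = float("-inf")

def state_tube(log_init, log_trans, log_emit, eta):
    """Max-plus forward-backward in O(T*K^2).
    Returns (best_score, tube), where tube[t] is the set of states lying on
    some complete path whose log-score is within eta of the optimum."""
    T, K = len(log_emit), len(log_init)
    fwd = [[NEG_INF] * K for _ in range(T)]
    bwd = [[0.0] * K for _ in range(T)]
    for k in range(K):
        fwd[0][k] = log_init[k] + log_emit[0][k]
    for t in range(1, T):
        for k in range(K):
            fwd[t][k] = log_emit[t][k] + max(
                fwd[t - 1][j] + log_trans[j][k] for j in range(K))
    for t in range(T - 2, -1, -1):
        for j in range(K):
            bwd[t][j] = max(
                log_trans[j][k] + log_emit[t + 1][k] + bwd[t + 1][k] for k in range(K))
    best = max(fwd[T - 1])
    slack = 1e-12  # guards against rounding when comparing equal scores
    tube = [{k for k in range(K) if fwd[t][k] + bwd[t][k] >= best - eta - slack}
            for t in range(T)]
    return best, tube

# Hypothetical two-state HMM; log_emit[t][k] = log p(y_t | state k)
log_init = [math.log(0.5), math.log(0.5)]
log_trans = [[math.log(0.9), math.log(0.1)], [math.log(0.1), math.log(0.9)]]
log_emit = [[math.log(0.9), math.log(0.1)],
            [math.log(0.5), math.log(0.5)],
            [math.log(0.9), math.log(0.1)]]
best, narrow = state_tube(log_init, log_trans, log_emit, eta=1.0)
_, wide = state_tube(log_init, log_trans, log_emit, eta=5.0)
```

In this toy example every path that visits state 1 loses log(81), roughly 4.39, relative to the optimum, so the tube contains only state 0 at tolerance 1 and both states at tolerance 5. Growing the tolerance can only enlarge the tube, consistent with the monotonicity property stated in the abstract.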

2606.17233 2026-06-17 cs.LG stat.ML 新提交

Uncertainty Quantification of Engineering Structures by Polynomial Chaos Expansion and Multivariate Active Learning

基于多项式混沌展开与多元主动学习的工程结构不确定性量化

Qitian Lu, Jafar Jafari-Asl, Panagiotis Spyridis, Lukas Novak

发表机构 Brno University of Technology(布尔诺理工大学); University of Rostock(罗斯托克大学)

AI总结 针对多输出工程问题中单一实验设计难以同时准确近似所有输出量的问题,提出一种自适应序贯采样方法,通过平衡输入空间探索与多输出聚合方差信息,构建多项式混沌展开代理模型,数值实验表明该方法提高了代理精度和稳定性。

详情
AI中文摘要

在许多工程应用中,单个高保真模型在相同输入参数下产生多个感兴趣的量(QoIs),例如复杂物理系统的有限元模型。为了减轻直接模型评估的高计算成本,代理模型被广泛用于构建模型响应的高效近似。自然地,代理模型的精度强烈依赖于实验设计(ED)的质量。然而,单个ED可能无法同时为所有输出提供足够的表示,特别是当不同输出对输入变量表现出不同的敏感性时。一个直接的解决方案是为每个输出分别进行采样,但这会导致采样复杂性和计算成本增加。从统计角度来看,这种方法也忽略了所有输出之间潜在的相关性,并可能损害数据一致性。为了解决这个问题,一种用于构建多项式混沌展开代理模型的自适应序贯采样方法被推广到向量值QoIs。该方法基于新样本对输出方差的局部贡献,从候选池中顺序选择新样本,同时平衡基于距离的输入空间探索和跨所有输出的聚合方差信息的利用。通过来自工程问题的几个数值示例,将其性能与非序贯拉丁超立方采样进行比较。数值结果表明,所提出的策略提高了代理模型的精度和稳定性,并提供了更可靠的二阶统计量估计。

英文摘要

In many engineering applications, a single high-fidelity model produces multiple quantities of interest (QoIs) under the same input parameters, e.g. finite element models of complex physical systems. To alleviate the high computational cost of direct model evaluations, surrogate models are widely used to construct efficient approximations of model responses. Naturally, the accuracy of surrogates strongly depends on the quality of the experimental design (ED). However, a single ED may not provide an adequate representation for all outputs simultaneously, especially when different outputs exhibit varying sensitivities to the input variables. A straightforward solution is to perform separate sampling for each output, but this results in increased sampling complexity and computational cost. From a statistical perspective, such an approach also ignores potential correlations among all outputs and may compromise data consistency. To address this issue, an adaptive sequential sampling method for constructing polynomial chaos expansion surrogate models is generalized to vector-valued QoIs. The method sequentially selects new samples from a candidate pool based on their local contribution to the output variance, while balancing distance-based exploration of the input space and exploitation of aggregated variance information across all outputs. Its performance is compared with non-sequential Latin Hypercube Sampling through several numerical examples from engineering problems. Numerical results demonstrate that the proposed strategy improves both surrogate accuracy and stability, and provides a more reliable estimation of second-order statistics.

2606.17916 2026-06-17 stat.CO astro-ph.CO astro-ph.IM stat.ML 新提交

Nested Sampling: A Critical and Comprehensive Theoretical Guide

嵌套采样:一个批判性且全面的理论指南

Luca Martino, Fernando Llorente

AI总结 本文全面详细地阐述了嵌套采样(NS)的推导过程,澄清其理论基础和实际挑战,旨在为新手提供教程,为经验丰富的从业者提供批判性回顾。

详情
AI中文摘要

嵌套采样(NS)技术因其能够高效探索高似然区域而受到广泛关注,尤其是在宇宙学和天文学领域——这一特性类似于隐式似然优化,是其成功的基础。虽然NS的完整理论推导复杂且涉及多个近似,但核心挑战在于从似然约束先验中进行采样,这对性能至关重要。本文提供了NS推导的全面详细阐述,澄清了其理论基础和实际挑战。我们详细描述了NS过程,强调了其优势及潜在局限性。通过这样做,本文旨在加深对该方法的理解,并促进未来在广泛科学应用中改进、新变体和更高效实现的发展。因此,本文的主要贡献是双重的:既作为该领域新手的教程,又作为经验丰富的从业者的批判性回顾。

英文摘要

The nested sampling (NS) technique has gained widespread attention, particularly in cosmology and astronomy, due to its ability to efficiently explore high-likelihood regions, a feature akin to an implicit likelihood optimization that underlies its success. While the full theoretical derivation of NS is complex and involves several approximations, the central challenge lies in sampling from the likelihood-constrained priors, which is crucial for its performance. This work provides a comprehensive and detailed exposition of the NS derivation, clarifying both its theoretical foundations and practical challenges. We provide a thorough description of the NS procedure, emphasizing both its strengths and potential limitations. In doing so, this work seeks to deepen understanding of the method and to foster the development of future enhancements, novel variants, and more efficient implementations across a wide range of scientific applications. Thus, the main contribution of this work is twofold: it serves both as a tutorial for newcomers to the field and as a critical review for experienced practitioners.
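For readers new to the method, the basic NS loop can be sketched in a few lines: repeatedly discard the live point with the lowest likelihood, credit it with a slice of prior mass, and replace it with a draw from the likelihood-constrained prior. The version below uses naive rejection sampling for that constrained draw and the deterministic shrinkage approximation X_i = exp(-i/N). The toy problem (uniform prior on [0, 1], likelihood exp(-theta)) and all names are assumptions chosen so that the evidence is known in closed form.

```python
import math
import random

def nested_sampling(log_like, sample_prior, n_live, n_iter, rng):
    """Basic nested sampling; returns an estimate of the evidence
    Z = integral of the likelihood with respect to the prior."""
    live = [sample_prior(rng) for _ in range(n_live)]
    live_ll = [log_like(th) for th in live]
    z, x_prev = 0.0, 1.0
    for i in range(1, n_iter + 1):
        worst = min(range(n_live), key=lambda j: live_ll[j])
        threshold = live_ll[worst]
        x_i = math.exp(-i / n_live)              # approximate remaining prior mass
        z += math.exp(threshold) * (x_prev - x_i)
        x_prev = x_i
        while True:                              # rejection draw from the constrained prior
            cand = sample_prior(rng)
            ll = log_like(cand)
            if ll > threshold:
                live[worst], live_ll[worst] = cand, ll
                break
    # the surviving live points share the leftover prior mass
    z += x_prev * sum(math.exp(ll) for ll in live_ll) / n_live
    return z

rng = random.Random(1)
z_hat = nested_sampling(lambda th: -th, lambda r: r.random(),
                        n_live=50, n_iter=250, rng=rng)
z_true = 1.0 - math.exp(-1.0)   # exact evidence for this toy problem
```

Rejection from the full prior becomes slow as the constrained region shrinks, since the acceptance rate is roughly the remaining prior mass. That inefficiency is the practical form of the constrained-sampling challenge the abstract identifies as central.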

2606.13827 2026-06-17 math.NA cs.LG stat.ML 新提交

Approximating Gaussian Whittle-Matern Fields over Well-Centered Triangulations of Riemannian Manifolds

黎曼流形良好中心三角剖分上的高斯Whittle-Matérn场逼近

Srinivas Nambirajan

AI总结 提出一种基于离散外微分的GMRF逼近方法,统一处理Whittle-Matérn场族,支持推断参数,兼容点/分段平滑测量,计算独立于插值函数,并给出低秩近似用于压缩感知。

Comments More specific title, updated acknowledgement, minor typos fixed

详情
AI中文摘要

马尔可夫Whittle-Matérn场已通过稀疏精度矩阵的高斯马尔可夫随机场(GMRF)收敛逼近,使用两参数族SPDE的有限元近似:\[ (\kappa^2 - \Delta)^{\alpha/2} u = \mathcal{W}, \;\; \kappa \in \mathbb{R}, \; \alpha \in \mathbb{N}. \] 利用离散外微积分(DEC)分析的最新进展,我们提出了一种不同但密切相关的收敛GMRF逼近方法,适用于离散化为良好中心单纯复形的完备无边黎曼流形上的Matérn场。该收敛方法:(i) 对 $\alpha, \kappa$ 不可知,从而允许对整个 $(\alpha, \kappa)$ 族GMRF的精度和协方差矩阵进行通用逼近方案,因此它们可以被推断而非猜测。(ii) 固有地模拟随机场的逐点和分段平滑测量,并对两者同样好地逼近。(iii) 计算上与所用插值函数无关——如果将一种收敛插值替换为同一网格上的另一种合适插值,不会产生额外开销。此外,我们证明,在精确意义上良好连接且体积集中的离散化上,精度矩阵是图拉普拉斯的谱函数。我们为该族Matérn GMRF提供了一个低秩逼近器,并提及一个用例:通过压缩感知减少建模GMRF所需的测量数量。

英文摘要

Markovian Whittle-Matérn fields have been convergently approximated by discrete Gauss-Markov random fields (GMRFs) with sparse precision matrices using a finite element approximation of the two-parameter family of SPDEs \[ (\kappa^2 - \Delta)^{\alpha/2} u = \mathcal{W}, \;\; \kappa \in \mathbb{R}, \; \alpha \in \mathbb{N}. \] Using recent developments in the analysis of Discrete Exterior Calculus (DEC), we present a different, yet closely related, convergent GMRF approximation to these Matérn fields over complete, boundaryless Riemannian manifolds discretized as well-centered simplicial complexes. This convergent method (i) is agnostic to $\alpha, \kappa$ and thus allows a universal approximation scheme for the precision and covariance matrices of the entire $(\alpha, \kappa)$-family of GMRFs, so they may be inferred rather than guessed; (ii) inherently models pointwise and piecewise-smoothed measurements of a random field and approximates both equally well; and (iii) is computationally independent of the interpolants used: it suffers no overhead if one convergent interpolant is replaced with another suitable interpolant over the same mesh. Furthermore, we show that, on discretizations that are well-connected in a precise sense and volume-concentrated, the precision matrices are spectral functions of a graph Laplacian. We provide a low-rank approximator to the family of such Matérn GMRFs and mention a use case: reducing the number of measurements needed to model the GMRF by compressed sensing.

2606.09049 2026-06-17 stat.ME cs.LG math.ST stat.ML 新提交

Data augmented bootstrap: Unifying confidence interval construction by approximate invariance

数据增强自助法:通过近似不变性统一置信区间构建

Kevin Han Huang

AI总结 提出数据增强自助法(DAB),利用数据的近似不变性构建置信区间,统一了经典自助法、共形预测等方法的理论,并引入数据增强启发式方法。

Comments Added comparison with arXiv:2604.15229 (https://arxiv.org/abs/2604.15229)

详情
AI中文摘要

我们提出了数据增强自助法(DAB),这是一个通过数据的近似不变变换来构建置信区间的框架。作为特例,DAB 恢复了依赖于精确群对称性的流行方法,例如共形预测、最大均值差异 U-统计量的 wild bootstrap 以及最近提出的 SymmPI。同时,DAB 也恢复了经典的自助法,该方法利用了随着数据集大小增长,数据索引均匀采样下数据集的近似不变性。对于所有 DAB 方法,我们建立了理论覆盖结果,这些结果根据不变性的强度在有限样本和渐近保证之间插值,且不假设群结构。近似不变性通过 Kolmogorov 距离度量,并且对于满足高斯普适性的统计量,简化为条件均值和方差匹配。这使我们能够将数据增强(DA)——一种基于近似不变性的广泛使用的机器学习启发式方法——纳入已知的统计方法中。我们通过实验测试了将 DA 纳入自助法、wild bootstrap 和共形预测在模拟设置以及图像、语言和科学数据上的性能。

英文摘要

We propose the data augmented bootstrap (DAB), a framework for constructing confidence intervals from approximately invariant transformations of the data. As special cases, DAB recovers popular methods that rely on exact group symmetries, such as conformal prediction, wild bootstrap for Maximum Mean Discrepancy U-statistics and the recently proposed SymmPI. Meanwhile, DAB also recovers the classical bootstrap method, which exploits the dataset's approximate invariance under uniform sampling of data indices as the dataset size grows. For all DAB methods, we establish theoretical coverage results that interpolate between finite-sample and asymptotic guarantees according to the strength of the invariance, and without assuming a group structure. The approximate invariance is measured in the Kolmogorov distance and, for statistics that satisfy Gaussian universality, reduces to conditional mean and variance matching. This allows us to incorporate data augmentation (DA), a widely used machine learning heuristic based on approximate invariances, into known statistical methods. We empirically test the performance of incorporating DA into bootstrap, wild bootstrap and conformal prediction for simulated settings as well as for image, language and scientific data.

2605.16900 2026-06-17 stat.ME math.ST 版本更新

Splitting schemes and estimators for stochastic differential equations with Hölder multiplicative noise

具有Hölder乘性噪声的随机微分方程的分裂方案和估计器

Bowen Fang, Dario Spanò, Massimiliano Tamborrino

AI总结 本文研究了具有局部Lipschitz漂移和Hölder连续乘性扩散的单变量随机微分方程的参数估计问题,提出了一种基于数值分裂方案的首个显式伪似然估计器,该方案在强均方收敛性和状态空间保持性方面优于传统的Euler-Maruyama离散化方法,并通过模拟验证了其在准确性和计算效率上的优越性。

Comments Additional simulation results. 56 pages, 14 figures, 2 tables

详情
AI中文摘要

我们研究了具有局部Lipschitz漂移和Hölder连续乘性扩散的单变量随机微分方程的参数估计问题,这类方程常见于多种应用。现有的推断方法通常要么依赖于Euler-Maruyama离散化(尽管其缺乏强收敛性且无法保持状态空间),要么依赖于近似方法,例如高斯近似或Hermite展开的截断,这影响了其稳定性和计算效率。我们引入了首个基于数值分裂方案的显式伪似然估计器,这些方案对于此类SDE同时具有强均方收敛性和状态空间保持性。我们的方法基于一种新的SDE分解,利用了可约性和Lamperti变换,得到Lie-Trotter(LT)和Strang分裂方案,进而得到基于这些方案的显式伪似然和最大似然估计器。我们证明了强均方收敛性、状态空间保持性,以及相对于基于Euler-Maruyama的方法对离散化步长更强的稳健性。我们进一步建立了LT估计器的一致性和渐近正态性。由于所提出的数值方案在伪似然中耦合了漂移和扩散参数,因此渐近分析需要新的证明技术。广泛的模拟显示,所提出的估计器在准确性和计算效率上均优于现有方法。

英文摘要

We study parameter estimation for univariate stochastic differential equations with locally Lipschitz drift and Hölder continuous multiplicative diffusion, a class commonly arising in several applications. Existing inference methods typically rely on either the Euler-Maruyama discretisation, despite its lack of strong convergence and failure to preserve the state space, or on approximations, e.g. Gaussian approximation or truncation of Hermite's expansions, impacting on their stability and computational efficiency. We introduce the first explicit pseudo-likelihood estimators based on numerical splitting schemes that are both strong mean-square convergent and state space preserving for this class of SDEs. Our approach is based on a novel decomposition of the SDE that exploits reducibility and the Lamperti transform, leading to Lie-Trotter (LT) and Strang splitting schemes yielding explicit pseudo-likelihoods and maximum likelihood estimators based on them. We prove strong mean-square convergence, state space preservation, and improved robustness with respect to the discretisation step compared to Euler-Maruyama-based methods. We further establish consistency and asymptotic normality of the LT estimator. Because the proposed numerical scheme couples drift and diffusion parameters in the pseudo-likelihood, the asymptotic analysis requires new proof techniques. Extensive simulations demonstrate that the proposed estimators outperform existing methods in both accuracy and computational efficiency.
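The state-space issue mentioned in the abstract is easy to reproduce. The sketch below assumes a square-root (CIR-type) diffusion as one example of Hölder-1/2 multiplicative noise; the model, parameter values, and function names are illustrative assumptions, not taken from the paper, and the paper's splitting schemes are not implemented here.

```python
import math

def euler_maruyama_step(x, dt, dw, kappa=1.0, theta=0.05, sigma=1.0):
    """One Euler-Maruyama step for dX = kappa*(theta - X) dt + sigma*sqrt(X) dW."""
    return x + kappa * (theta - x) * dt + sigma * math.sqrt(x) * dw

def lamperti(x, sigma=1.0):
    """Lamperti transform Y = 2*sqrt(X)/sigma, under which the diffusion coefficient becomes 1."""
    return 2.0 * math.sqrt(x) / sigma

x0 = 0.01
# A moderately negative Brownian increment (about -1.6 standard deviations for dt = 0.1)
# is enough to push the discretised path below zero, outside the state space [0, inf).
x1 = euler_maruyama_step(x0, dt=0.1, dw=-0.5)
```

Once the iterate is negative, the next step cannot even evaluate `math.sqrt(x)`, which is why practical Euler-Maruyama implementations resort to ad hoc fixes such as truncation or reflection.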

2604.06531 2026-06-17 math.OC cs.LG cs.MA eess.SY stat.ML 版本更新

A Generalized Sinkhorn Algorithm for Mean-Field Schrödinger Bridge

平均场薛定谔桥的广义Sinkhorn算法

Asmaa Eldesoukey, Yongxin Chen, Abhishek Halder

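AI summary: For the mean-field Schrödinger bridge problem, the paper generalizes the Hopf-Cole transform and builds on it a Sinkhorn-type recursive algorithm for the associated system of integro-PDEs. Convergence guarantees are discussed under mild assumptions on the interaction potential, and numerical examples with repulsive and attractive interactions illustrate the theory.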

详情
AI中文摘要

平均场薛定谔桥(MFSB)问题涉及设计一个最小努力控制器,引导具有非局部相互作用的扩散过程在固定截止时间内从给定分布到达另一个分布。与标准薛定谔桥不同,MFSB的动态约束是带有控制器的相互作用智能体群体的平均场极限。它是大规模多智能体系统的自然模型。由于非局部相互作用使问题非凸,MFSB在计算上具有挑战性。我们提出了MFSB的Hopf-Cole变换的推广,并在此基础上设计了一种Sinkhorn型递归算法来求解相关的积分-偏微分方程组。在相互作用势的温和假设下,我们讨论了所提算法的收敛性保证。我们通过排斥和吸引相互作用的数值示例来说明理论贡献。

英文摘要

The mean-field Schrödinger bridge (MFSB) problem concerns designing a minimum-effort controller that guides a diffusion process with nonlocal interaction to reach a given distribution from another by a fixed deadline. Unlike the standard Schrödinger bridge, the dynamical constraint for MFSB is the mean-field limit of a population of interacting agents with controls. It serves as a natural model for large-scale multi-agent systems. The MFSB is computationally challenging because the nonlocal interaction makes the problem nonconvex. We propose a generalization of the Hopf-Cole transform for MFSB and, building on it, design a Sinkhorn-type recursive algorithm to solve the associated system of integro-PDEs. Under mild assumptions on the interaction potential, we discuss convergence guarantees for the proposed algorithm. We present numerical examples with repulsive and attractive interactions to illustrate the theoretical contributions.
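As background for the Sinkhorn-type recursion, here is the classical Sinkhorn iteration for discrete entropic optimal transport. It is a sketch of the standard algorithm only, not the generalized mean-field version proposed in the paper; the marginals, cost matrix, and regularization `eps` are assumed values.

```python
import math

def sinkhorn(mu, nu, cost, eps=0.5, n_iter=500):
    """Classical Sinkhorn iterations for discrete entropic optimal transport."""
    K = [[math.exp(-c / eps) for c in row] for row in cost]  # Gibbs kernel
    u = [1.0] * len(mu)
    v = [1.0] * len(nu)
    for _ in range(n_iter):
        # Rescale u so that the row sums of diag(u) K diag(v) match mu ...
        u = [mu[i] / sum(K[i][j] * v[j] for j in range(len(nu))) for i in range(len(mu))]
        # ... then rescale v so that the column sums match nu.
        v = [nu[j] / sum(K[i][j] * u[i] for i in range(len(mu))) for j in range(len(nu))]
    return [[u[i] * K[i][j] * v[j] for j in range(len(nu))] for i in range(len(mu))]

mu = [0.5, 0.5]
nu = [0.25, 0.75]
cost = [[0.0, 1.0], [1.0, 0.0]]
plan = sinkhorn(mu, nu, cost)
```

The two scaling vectors `u` and `v` play the role that the pair of Hopf-Cole factors plays in the continuous Schrödinger bridge: each is updated so that one boundary marginal is matched while the other factor is held fixed.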

2410.10137 2026-06-17 cs.LG math.DG stat.CO stat.ML 版本更新

Variational autoencoders with latent high-dimensional steady geometric flows for dynamics

具有潜在高维稳态几何流的变分自编码器用于动力学

Andrew Gracyk

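AI summary: Proposes VAE-DLM, a variational autoencoder whose latent space evolves under a steady-state geometric flow. The flow is solved in moderately high dimensions with a physics-informed approach, which allows more expressive latent representations. On select PDE-type datasets, out-of-distribution (OOD) error is frequently reduced by 15% to 35%.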

详情
AI中文摘要

我们开发了用于PDE型环境数据的变分自编码器(VAE)的黎曼方法,其中包含正则化几何潜在动力学,称为VAE-DLM(具有动态潜在流形的VAE)。我们重新构建了VAE框架,使得嵌入欧几里得空间中的流形几何(受我们的几何流约束)在编码器和解码器开发的中间潜在空间中被学习。通过定制潜在空间演化的几何流,我们诱导出我们选择的潜在几何性质,这些性质反映在经验性能中。我们通过谨慎选择先验重新表述了传统的证据下界(ELBO)损失。我们开发了一个具有稳态正则化项的线性几何流。该流只需要对一个时间导数进行自动微分,并且可以在中等高维度上以物理信息方法求解,从而允许更具表达力的潜在表示。我们讨论了该流如何被表述为梯度流,并保持熵远离度量奇点。这结合特征值惩罚条件,有助于确保流形在测度上足够大、非退化且具有规范几何,从而有助于鲁棒表示。我们的方法侧重于改进的多层感知器架构,使用tanh激活函数用于流形编码器-解码器。我们在感兴趣的数据集上证明,我们的方法至少与传统VAE表现相当,且通常更好。我们的方法可以超越传统VAE以及采用我们提出架构的VAE,在选定数据集上经常将分布外(OOD)误差降低15%至35%。我们重点展示了我们的方法在环境PDE上的应用,这些PDE的解在后期保持最小变化。我们提供了经验性证明,说明如何通过VAE改进外部动力学的鲁棒学习。

英文摘要

We develop Riemannian approaches to variational autoencoders (VAEs) for PDE-type ambient data with regularizing geometric latent dynamics, which we refer to as VAE-DLM, or VAEs with dynamical latent manifolds. We redevelop the VAE framework such that manifold geometries, subject to our geometric flow, embedded in Euclidean space are learned in the intermediary latent space developed by encoders and decoders. By tailoring the geometric flow in which the latent space evolves, we induce latent geometric properties of our choosing, which are reflected in empirical performance. We reformulate the traditional evidence lower bound (ELBO) loss with a considered choice of prior. We develop a linear geometric flow with a steady-state regularizing term. This flow requires only automatic differentiation of one time derivative, and can be solved in moderately high dimensions in a physics-informed approach, allowing more expressive latent representations. We discuss how this flow can be formulated as a gradient flow, and how it maintains entropy away from metric singularity. This, along with an eigenvalue penalization condition, helps ensure the manifold is sufficiently large in measure, is nondegenerate, and has a canonical geometry, which together contribute to a robust representation. Our methods focus on the modified multi-layer perceptron architecture with tanh activations for the manifold encoder-decoder. We demonstrate that, on our datasets of interest, our methods perform at least as well as the traditional VAE, and oftentimes better. Our methods can outperform this and a VAE endowed with our proposed architecture, frequently reducing out-of-distribution (OOD) error by 15% to 35% on select datasets. We highlight our method on ambient PDEs whose solutions maintain minimal variation at late times. We provide empirical justification for how VAEs can improve robust learning of external dynamics.

2506.14594 2026-06-17 cond-mat.dis-nn cond-mat.stat-mech stat.ML 版本更新

Uncertainty in AI-driven Monte Carlo simulations

人工智能驱动的蒙特卡洛模拟中的不确定性

Dimitrios Tzivrailis, Alberto Rosso, Eiji Kawasaki

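AI summary: To address the epistemic uncertainty that AI surrogate models introduce into Monte Carlo simulations, the paper proposes the Penalty Ensemble Method (PEM), which modifies the Metropolis acceptance rule to raise the rejection probability in high-uncertainty regions and thereby improve simulation reliability.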

详情
AI中文摘要

在复杂系统研究中,评估物理可观测量通常需要借助蒙特卡洛技术对代表性构型进行采样。这些方法依赖于对系统能量和力场的重复评估,可能带来高昂的计算成本。为加速此类模拟,深度学习模型越来越多地被用作替代函数来近似能量景观或力场。然而,这类模型会在其预测中引入认知不确定性,这种不确定性可能通过采样过程传播并影响模拟的宏观行为。在我们的工作中,我们提出了罚函数集成法(PEM)来量化认知不确定性并减轻其对蒙特卡洛采样的影响。我们的方法引入了一种对不确定性敏感的Metropolis接受规则修改,该规则在高不确定性区域增加拒绝概率,从而增强模拟结果的可靠性。

英文摘要

In the study of complex systems, evaluating physical observables often requires sampling representative configurations via Monte Carlo techniques. These methods rely on repeated evaluations of the system's energy and force fields, which can become computationally expensive. To accelerate these simulations, deep learning models are increasingly employed as surrogate functions to approximate the energy landscape or force fields. However, such models introduce epistemic uncertainty in their predictions, which may propagate through the sampling process and affect the simulation's macroscopic behavior. In our work, we present the Penalty Ensemble Method (PEM) to quantify epistemic uncertainty and mitigate its impact on Monte Carlo sampling. Our approach introduces an uncertainty-aware modification of the Metropolis acceptance rule, which increases the rejection probability in regions of high uncertainty, thereby enhancing the reliability of the simulation outcomes.
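One way such an uncertainty-aware acceptance rule could look is sketched below. The abstract does not state the exact formula, so everything here is an assumption: the ensemble disagreement is summarized by its variance, and the penalty term `0.5 * beta**2 * var` is a form familiar from noisy Monte Carlo, used purely for illustration. The function name and the example ensembles are hypothetical.

```python
import math
import statistics

def penalized_acceptance(delta_e_ensemble, beta=1.0):
    """Acceptance probability min(1, exp(-beta*mean - 0.5*beta^2*var)).

    delta_e_ensemble: energy differences predicted by each surrogate in an ensemble.
    The variance penalty is an assumed form, not the rule from the paper.
    """
    mean = statistics.fmean(delta_e_ensemble)
    var = statistics.pvariance(delta_e_ensemble)  # ensemble disagreement
    return min(1.0, math.exp(-beta * mean - 0.5 * beta**2 * var))

# Same mean energy change, different ensemble disagreement.
confident = penalized_acceptance([0.5, 0.5, 0.5, 0.5])
uncertain = penalized_acceptance([-0.5, 1.5, -0.5, 1.5])
```

With zero disagreement the rule reduces to the ordinary Metropolis probability; as the ensemble members disagree more, the same proposed move is accepted less often.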

2405.15379 2026-06-17 stat.ML cs.LG math.PR math.ST 版本更新

Randomized Midpoint Method for Log-Concave Sampling under Constraints

对数凹分布约束采样的随机中点方法

Yifeng Yu, Shijie Zhang, Lu Yu

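AI summary: Studies the randomized midpoint discretization of overdamped and kinetic Langevin diffusions in constrained domains. A unified framework based on projection operators yields convergence guarantees in Wasserstein-$q$ distance, together with lower bounds showing that the results are near-optimal in order.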

详情
AI中文摘要

本文研究在凸紧集上支撑的对数凹分布的采样问题,特别关注约束域中过阻尼和动能朗之万扩散的随机中点离散化。我们重新审视了通过投影算子处理约束的近端框架,并发展了一个更通用的公式,涵盖了欧几里得、Bregman和Gauge投影。由此产生的光滑近似允许对约束下的朗之万算法及其变体进行统一且易于处理的分析。在此框架内,我们建立了光滑代理与目标分布之间Wasserstein-$q$($q\geqslant 1$)距离的收敛保证。我们进一步推导了互补的下界,表明结果在阶上是近乎最优的。基于这种紧致近似分析,我们获得了约束下随机中点朗之万算法的新收敛保证,以及普通和动能朗之万蒙特卡洛方法的改进界,从而推进了约束扩散采样的理论理解。

英文摘要

In this paper, we study the problem of sampling from log-concave distributions supported on convex and compact sets, with a particular focus on the randomized midpoint discretization of both overdamped and kinetic Langevin diffusions in constrained domains. We revisit the proximal framework for handling constraints through projection operators and develop a more general formulation that encompasses Euclidean, Bregman, and Gauge projections. The resulting smooth approximation allows a unified and tractable analysis of Langevin algorithms and their variants under constraints. Within this framework, we establish convergence guarantees in Wasserstein-$q$ $(q\geqslant 1)$ distances between the smooth surrogate and the target distribution. We further derive complementary lower bounds, showing that the results are near-optimal in order. Building upon this tight approximation analysis, we obtain new convergence guarantees for the randomized midpoint Langevin algorithms and refined bounds for both vanilla and kinetic Langevin Monte Carlo methods under constraints, thereby advancing the theoretical understanding of constrained diffusion-based sampling.
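For readers unfamiliar with the randomized midpoint idea, a scalar, unconstrained sketch for the overdamped diffusion dX = -f'(X) dt + sqrt(2) dW follows. It deliberately omits the constraint handling (the projection-based smooth approximation) that is the paper's subject; the function names, step size, and Gaussian target are assumptions.

```python
import math
import random

def rm_update(x, grad, h, alpha, z1, z2):
    """One randomized midpoint step; alpha in (0, 1) picks the random evaluation time alpha*h."""
    w_mid = math.sqrt(alpha * h) * z1                   # Brownian motion at time alpha*h
    w_end = w_mid + math.sqrt((1.0 - alpha) * h) * z2   # Brownian motion at time h, same path
    x_mid = x - alpha * h * grad(x) + math.sqrt(2.0) * w_mid
    # The full step uses the gradient evaluated at the random midpoint.
    return x - h * grad(x_mid) + math.sqrt(2.0) * w_end

def sample_chain(grad, x0=0.0, h=0.1, n_steps=1000, seed=0):
    """Run the scheme with alpha ~ Uniform(0, 1) and standard normal z1, z2 at each step."""
    rng = random.Random(seed)
    x, path = x0, []
    for _ in range(n_steps):
        x = rm_update(x, grad, h, rng.random(), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        path.append(x)
    return path

# Assumed target: standard Gaussian, f(x) = x^2 / 2, so grad f(x) = x.
path = sample_chain(lambda x: x)
```

Evaluating the gradient at a uniformly random time inside the step makes the drift integral estimate unbiased, which is the source of the improved discretization error compared with the plain Euler scheme.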

7. Statistical Foundations of Machine Learning, 33 papers

2606.17423 2026-06-17 q-fin.CP stat.ML 新提交

Martingale Doppelgänger-Eval: An Identification Framework for Auditing Candlestick Understanding in Vision-Language Models

鞅双生评估:审计视觉语言模型对K线图理解的识别框架

Ziyao Wang

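AI summary: Introduces the Martingale Doppelgänger-Eval benchmark, which uses controlled experiments to identify whether VLMs base their judgments on candlestick evidence rather than trend extrapolation. The evaluated models either ignore candlestick semantics or respond opposite to the rule-implied direction.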

详情
AI中文摘要

我们引入了Martingale Doppelgänger-Eval,一个公开的影子市场基准,用于审计视觉语言模型(VLM)是否使用K线证据而非外推过去趋势。核心困难在于识别:在真实市场历史中,图表证据和趋势高度耦合,因此观测得分无法确定流畅的技术分析叙述是否基于局部视觉证据。我们形式化证明了这一局限性:在强耦合下,没有基于观测的图表-标签数据计算的评估函数能够区分基于证据的响应者和基于趋势捷径的响应者,而匹配的证据干预以指数速率区分相同的响应者,趋势-标签交换提供了独立的捷径压力测试。因此,该基准在四种受控机制下评估冻结的VLM:鞅零市场、注入阿尔法的反事实对、趋势混杂交换和制度转换。结构行为模型识别了零市场偏差、趋势敏感性、证据敏感性、提示/渲染器脆弱性和证据忠实性;附带的统计工具包提供了最小可检测效应、针对计量API的块感知序贯测试以及重叠加权伪影检查。在冻结的商业和开源VLM中,识别回归将大的正系数分配给过去趋势,但证据系数为零或与规则隐含符号相反。匹配对分析表明,模型要么忽略注入的K线语义,要么在响应时朝与规则隐含方向相反的方向移动。该基准隔离了标准观测图表基准无法检测的失败模式,并为具有可控标签机制的时间序列图像提供了可复用的审计模板。

英文摘要

We introduce Martingale Doppelgänger-Eval, a public shadow-market benchmark for auditing whether vision-language models (VLMs) use candlestick evidence rather than extrapolate past trends. The central difficulty is identification: on real market histories, chart evidence and trend are strongly coupled, so an observational score cannot determine whether a fluent technical-analysis narrative is grounded in local visual evidence. We prove this limitation formally: no evaluation functional computed from observational chart--label data can distinguish a grounded responder from a trend-shortcut responder under strong coupling, whereas matched evidence interventions separate the same responders at an exponential rate and trend--label swaps provide an independent shortcut stress test. The benchmark therefore evaluates frozen VLMs on rendered OHLCV charts under four controlled mechanisms: a martingale-null market, injected-alpha counterfactual pairs, trend-confounder swaps, and regime shifts. A structural behavioral model identifies null-market bias, trend sensitivity, evidence sensitivity, prompt/renderer fragility, and evidence faithfulness; the accompanying statistical toolkit provides minimum detectable effects, block-aware sequential testing for metered APIs, and an overlap-weighted artifact check. Across frozen commercial and open VLMs, the identified regression assigns large positive coefficients to past trend but evidence coefficients that are zero or opposite to the rule-implied sign. Matched-pair analyses show that models either ignore injected candlestick semantics or move opposite to the rule-implied direction conditional on responding. The benchmark isolates a failure mode that standard observational chart benchmarks cannot detect and gives a reusable audit template for time-series imagery with controllable label mechanisms.

2606.18074 2026-06-17 stat.ML cs.LG stat.ME 新提交

Tensor-based second-order causal discovery

基于张量的二阶因果发现

Nathan Ouyang, Kexin Wan, Anna Seigal

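AI summary: Proposes the TSCD algorithm, which takes a tensor built from covariance matrices of observational and interventional data and, under a linear structural equation model, identifies the directed acyclic graph and its edge functions while requiring only uncorrelated noise. The approach extends to nonlinear models and is identifiable from a number of interventions that is logarithmic in the number of variables.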

Comments 27 pages, 7 figures. Code available at https://github.com/QWE123665/Tensor-based-Second-order-Causal-Discovery

详情
AI中文摘要

因果发现旨在揭示变量间的因果依赖关系。为此,我们提出了一种称为基于张量的二阶因果发现(TSCD)的算法。其输入是从观测数据和干预数据的协方差矩阵中得到的张量。假设因果依赖关系遵循有向无环图(DAG)上的线性结构方程模型,TSCD输出DAG及其边上的函数,仅要求噪声变量不相关。我们还实现了该方法在非线性模型中的版本。我们关注二阶统计量(通过协方差矩阵)的动机是:相对于高阶矩,它们在统计和计算上更高效;相对于一阶统计量,它们具有可识别性;并且无论变量是否为高斯分布,它们都适用。我们证明,TSCD从对数于变量数量的干预次数中可识别因果顺序和参数。实验表明,TSCD对噪声具有鲁棒性,与现有方法相比具有竞争力,并且可扩展到数百个变量。

英文摘要

Causal discovery seeks to uncover the causal dependencies among variables. For this purpose, we propose an algorithm called Tensor-based Second-order Causal Discovery (TSCD). Its input is a tensor obtained from the covariance matrices of observational and interventional data. Assuming the causal dependencies follow a linear structural equation model on a directed acyclic graph (DAG), TSCD outputs the DAG and the functions on its edges, requiring only that the noise variables are uncorrelated. We also implement a version of the approach for nonlinear models. Our focus on second-order statistics (via the covariance matrices) is motivated by their statistical and computational efficiency relative to higher-order moments, their identifiability relative to first-order statistics, and that they work regardless of whether the variables are Gaussian. We show that TSCD has identifiable causal order and parameters from a number of interventions that is logarithmic in the number of variables. Experiments show that TSCD is robust to noise, competitive with existing methods, and scales to hundreds of variables.

2606.18011 2026-06-17 stat.ML cs.LG stat.ME 新提交

Fast Nonparametric Conditional Independence Testing via Two-Stage Regression

通过两阶段回归的快速非参数条件独立性检验

Eric V. Strobl

发表机构 * Department of Biomedical Informatics, University of Pittsburgh(生物医学信息学系,匹兹堡大学)

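AI summary: Proposes BLITZ, which removes the influence of the conditioning set through two-stage regression (low-order polynomials followed by shallow trees), giving a fast, well-calibrated nonparametric conditional independence test suited to causal discovery.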

Comments A fast R implementation with C++ back-end is available at https://github.com/ericstrobl/BLITZ

详情
AI中文摘要

基于约束的因果发现依赖于重复的条件独立性检验,但快速非参数检验往往牺牲校准性,尤其是当变量通过非线性关系依赖于条件集时。我们提出了BLITZ(Broad-to-Local Independence Testing via residualiZation),一种非参数条件独立性检验,旨在在一秒内运行良好,同时保持约束因果发现算法执行数千次查询所需的准确性。BLITZ首先使用低阶多项式回归消除对条件集的广泛平滑依赖,然后应用一个小型非线性特征映射,并通过浅层树回归对这些特征进行残差化。得到的统计量检验残差互协方差,并采用矩匹配卡方近似于零分布。我们从理论上证明,两阶段设计降低了树残差化器面临的有效复杂度,使得浅层树能够控制残差条件均值偏差,同时避免过度过拟合。在模拟中,BLITZ提供了比快速核、随机特征和基于回归的竞争者更好的零校准,同时保持所测试方法中最快的速度之一。在合成图和流式细胞术数据的因果发现实验中,BLITZ在保留的邻接中产生了更可靠的端点方向,并具有竞争力的结构恢复。这些结果表明,从宽到局部残差化是实现因果发现中校准、可扩展的非参数条件独立性检验的实用途径。

英文摘要

Constraint-based causal discovery relies on repeated conditional independence tests, but fast nonparametric tests often sacrifice calibration, especially when variables depend on the conditioning set through nonlinear relationships. We introduce BLITZ (Broad-to-Local Independence Testing via residualiZation), a nonparametric conditional independence test designed to run well under a second while maintaining the accuracy needed for the thousands of queries performed by constraint-based causal discovery algorithms. BLITZ first removes broad smooth dependence on the conditioning set using low-order polynomial regression, then applies a small nonlinear feature map and residualizes those features with shallow tree regressions. The resulting statistic tests residual cross-covariance, with a moment-matched chi-square approximation to the null distribution. We show theoretically that the two-stage design reduces the effective complexity faced by the tree residualizers, allowing shallow trees to control residual conditional-mean bias while avoiding excessive overfitting. In simulations, BLITZ provides better null calibration than fast kernel, random-feature, and regression-based competitors while remaining among the fastest methods tested. In causal discovery experiments on synthetic graphs and flow-cytometry data, BLITZ yields more reliable endpoint orientations among retained adjacencies and competitive structural recovery. These results suggest that broad-to-local residualization is a practical route to calibrated, scalable nonparametric conditional independence testing for causal discovery.

2606.17383 2026-06-17 q-fin.RM cs.AI cs.LG stat.ML 新提交

Model Validation of Agentic AI Systems: A POMDP-Based Framework for Belief-State, Forecast, and Policy Validation

智能体AI系统的模型验证:基于POMDP的信念状态、预测与策略验证框架

Matthew Francis Dixon

发表机构 * Quiota LLC(Quiota公司)

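AI summary: Proposes a model validation framework for agentic AI based on Partially Observable Markov Decision Processes (POMDPs). Autonomous decision making is decomposed into information, belief, forecast, action, and utility components that are validated independently, and a portfolio-management case study demonstrates the approach.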

Comments 28 pages, 3 figures, 6 tables. Source code available from https://github.com/mfrdixon/agentic-AI-as-POMDP

详情
AI中文摘要

智能体人工智能系统引入了一类新的模型风险。与传统预测模型不同,自主智能体持续获取信息,形成关于环境潜在状态的信念,生成预测,选择行动,并随时间调整其行为。现有的验证方法主要关注预测准确性,因此对底层决策过程的质量提供的洞察有限。本文提出了一种基于部分可观测马尔可夫决策过程(POMDP)的智能体AI模型验证框架。该框架将自主决策分解为信息、信念、预测、行动和效用,允许每个组件独立验证。大型语言模型(LLM)被形式化为近似贝叶斯滤波算子,并开发了一个模型风险分类体系,涵盖状态空间、滤波、预测、策略、效用规范和参数风险。通过一个投资组合管理案例研究展示了模型风险验证方法,其中智能体从市场和宏观经济信息中推断潜在市场制度,生成基于信念的预测,并使用Black-Litterman框架构建投资组合。实证验证结合了性能分析、信念校准诊断、覆盖测试、消融研究和参数敏感性分析。结果表明,潜在状态推断对决策质量有独立贡献,且主要结论在广泛的参数值范围内保持稳健。本文的主要贡献是提供了一个实用框架,将已建立的模型风险管理概念扩展到自主AI系统,并为其验证、治理和监控提供了严格的基础。

英文摘要

Agentic artificial intelligence systems introduce a new class of model risk. Unlike traditional predictive models, autonomous agents continuously acquire information, form beliefs regarding latent states of the environment, generate forecasts, select actions, and adapt their behavior over time. Existing validation methodologies focus primarily on predictive accuracy and therefore provide limited insight into the quality of the underlying decision process. This paper proposes a model validation framework for agentic AI based on Partially Observable Markov Decision Processes (POMDPs). The framework decomposes autonomous decision making into information, beliefs, forecasts, actions, and utility, allowing each component to be validated independently. Large language models (LLMs) are formalized as approximate Bayesian filtering operators, and a model-risk taxonomy is developed encompassing state-space, filtering, forecast, policy, utility-specification, and parameter risks. The model risk validation methodology is demonstrated through a portfolio-management case study in which an agent infers latent market regimes from market and macroeconomic information, generates belief-conditioned forecasts, and constructs portfolios using a Black--Litterman framework. Empirical validation combines performance analysis, belief calibration diagnostics, coverage tests, ablation studies, and parameter-sensitivity analysis. The results indicate that latent-state inference contributes independently to decision quality and that the principal conclusions remain robust across a broad range of parameter values. The principal contribution of the paper is a practical framework for extending established model risk management concepts to autonomous AI systems and providing a rigorous foundation for their validation, governance, and monitoring.

2606.17196 2026-06-17 stat.ML cs.LG stat.ME 新提交

Another Look at Log-PCA for Probability Measures: A Dynamical Formulation and Statistical Convergence

再探概率测度的Log-PCA:一种动力学公式与统计收敛性

Peng Xu, Changbo Zhu, Young-Heon Kim, Xiaohui Chen

Affiliations: Department of Statistics, University of Illinois Urbana-Champaign; Department of ACMS, University of Notre Dame; Department of Mathematics, University of British Columbia; Department of Mathematics and Thomas Lord Department of Computer Science, University of Southern California

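AI summary: Under the Wasserstein geometry, the paper gives a dynamical formulation that interprets log-PCA, termed Wasserstein Tangential PCA (WT-PCA), and derives a statistical convergence rate for the empirical WT-PCA in terms of the 2-Wasserstein distance between the population and empirical barycenter reference measures.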

详情
AI中文摘要

本文关注在Wasserstein几何下学习随机概率测度在$\mathbb{R}^m$上的主变差。我们引入一种新的动力学公式来解释log-PCA(一种线性化的主测地线分析)作为变分方法。我们的可微版本称为Wasserstein切向PCA(WT-PCA),通过其在重心处的协方差算子捕获Wasserstein空间上(加权)概率测度的局部主测地线变差模式。基于动力学视角并利用最优传输问题的平行传输结构,我们推导了从数据估计的经验WT-PCA相对于总体和经验重心参考测度之间的2-Wasserstein距离的通用统计收敛速率。

英文摘要

This paper is concerned with learning principal variations of random probability measures on $\mathbb{R}^m$ under the Wasserstein geometry. We introduce a new dynamical formulation to interpret the log-PCA, a linearized principal geodesic analysis, as a variational approach. Our differentiable version, termed Wasserstein Tangential PCA (WT-PCA), captures the local principal modes of geodesic variations of a (weighted) probability measure on the Wasserstein space via its covariance operator at the barycenter. Based on the dynamical perspective and leveraging the parallel transport structure of optimal transport problems, we derive a general statistical convergence rate of the empirical WT-PCA when estimated from data in terms of the 2-Wasserstein distance between the population and empirical barycenter reference measures.

2606.17215 2026-06-17 cs.LG cs.DS stat.ML 新提交

Sum-of-Squares Degree Barriers for the Reweighted-Hinge Method in Robust Halfspace Learning: A Christoffel-Function Characterization

鲁棒半空间学习中重加权铰链方法的平方和度障碍:一个Christoffel函数刻画

Xiaoyu Li

发表机构 * Xiaoyu Li(李小宇)

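AI summary: Using the Christoffel function, the paper characterizes exactly the outlier mass that a bounded-degree certificate cannot remove. For the reweighted-hinge method of learning γ-margin halfspaces under malicious noise, this exposes a basic tradeoff between the certificate's Sum-of-Squares (SoS) degree and the tolerable amount of corruption.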

详情
AI中文摘要

一个去除异常值的证书仅通过低阶矩观察数据,而对手恰恰利用这一点,将腐败隐藏在干净数据已经看似典型的盲区中,该盲区无法被任何有界度测试分辨。这个盲区恰好有一个精确的大小:干净边际分布的Christoffel函数,这正是现代数据分析中用于检测异常值的量,此处从对手的角度解读为有界度证书无法去除的腐败。我们将这一反转作为在恶意噪声下鲁棒学习γ-间隔半空间的重加权铰链方法(Shen, 2025; Zeng and Shen, 2025)的组织原则:支配性资源是异常去除证书的平方和(SoS)度,而分辨原则指出,在中心c处能够对度-2t证书隐藏的最大腐败质量恰好是干净边际分布的Christoffel函数λ_{t+1}(c)。由此得出三个推论,均针对证书方法(而非信息论极限)。边际-度权衡:将密集煎饼认证到误差ϵ需要SoS度Ω(log(1/ϵ))或边际Ω(√(log(1/ϵ))/√d),解释了Shen (2025)中记录的log(1/ϵ)边际是必然的,通过加权Chebyshev归约使得阈值2t=Θ((|c|/s)^2)在经典加权极值估计下是紧的。度-2异常障碍:分辨原则实现为一个显式实例,其中度2卡在η^{1/2}而度4逃脱,将方法的小崩溃率定位在度上而非分析中。以及一个度-2t算法追踪前沿η^{1-1/2t}(在t=1时恢复Shen (2025)),其增益为显式常数,受限于煎饼密度,并由度-2障碍证明不可改进。

英文摘要

A certificate that removes outliers sees the data only through its low-degree moments, and an adversary exploits exactly this, hiding corruption where the clean data already looks typical, in the blind spot no bounded-degree test resolves. That blind spot turns out to have an exact size: the Christoffel function of the clean marginal, the very quantity modern data analysis thresholds to detect outliers, here read from the adversary's side as the corruption a bounded-degree certificate cannot remove. We turn this inversion into the organizing principle of the reweighted-hinge approach to robustly learning $\gamma$-margin halfspaces under malicious noise (Shen, 2025; Zeng and Shen, 2025): the governing resource is the Sum-of-Squares degree of the outlier-removal certificate, and the resolution principle states that the maximal corruption mass which can hide at a center $c$ from a degree-$2t$ certificate is exactly the Christoffel function $\lambda_{t+1}(c)$ of the clean marginal. Three consequences follow, all against the certificate method (not information-theoretic). A margin-degree tradeoff: certifying the dense pancake to error $\epsilon$ costs SoS degree $\Omega(\log(1/\epsilon))$ or margin $\Omega(\sqrt{\log(1/\epsilon)}/\sqrt{d})$, explaining why the $\log(1/\epsilon)$ margin Shen (2025) records is forced, with a weighted-Chebyshev reduction making the threshold $2t=\Theta((|c|/s)^2)$ tight modulo one classical weighted-extremal estimate. A degree-$2$ outlier barrier: the resolution principle realized as an explicit instance on which degree $2$ is stuck at $\eta^{1/2}$ while degree $4$ escapes, locating the method's small breakdown rate in the degree, not the analysis. And a degree-$2t$ algorithm tracing the frontier $\eta^{1-1/2t}$ (recovering Shen (2025) at $t=1$), whose gain is an explicit constant, capped by the pancake density and shown unimprovable by the degree-$2$ barrier.

2606.18183 2026-06-17 stat.ML cs.LG math.PR 新提交

A Diffusion Approximation for Temporal-Difference Learning with Linear Features under Markovian Noise

马尔可夫噪声下线性特征时序差分学习的扩散近似

M. Forzo, E. Monzio Compagnoni, A. Russo, A. Pacchiano

发表机构 * Technical University of Munich (TUM), Munich, Germany(慕尼黑技术大学) University of Basel, Basel, Switzerland(巴塞尔大学) Boston University, Boston, USA(波士顿大学)

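AI summary: To capture the stochastic fluctuations of linear TD(0) under Markovian noise, the paper proposes a stochastic differential equation approximation. The model separates the contraction dynamics of the projected Bellman operator from the effect of Markovian sampling and thereby explains the constant-stepsize error floor.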

详情
AI中文摘要

带有线性函数逼近的时序差分(TD)学习是策略评估的核心方法。其经典连续时间描述为常微分方程(ODE),捕捉渐近均值动态但忽略了决定误差下限的随机波动。我们引入了马尔可夫噪声下线性TD(0)的随机微分方程(SDE)近似。所得模型将投影Bellman算子控制的收缩动力学与马尔可夫采样的影响区分开来。因此,该模型通过马尔可夫长期协方差与投影Bellman算子收缩几何之间的相互作用解释了常数步长误差下限。

英文摘要

Temporal difference (TD) learning with linear function approximation is a core method for policy evaluation. Its classical continuous-time description is an ordinary differential equation (ODE), which captures the asymptotic mean dynamics but neglects stochastic fluctuations determining the error floor. We introduce a stochastic differential equation (SDE) approximation for linear TD(0) under Markovian noise. The resulting model distinguishes the contraction dynamics governed by the projected Bellman operator from the influence of Markovian sampling. As a consequence, the model explains the constant-stepsize error floor through the interaction between Markovian long-run covariance and the contraction geometry of the projected Bellman operator.
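The recursion being approximated is the standard linear TD(0) update, sketched below with a constant stepsize. The two-state chain, one-hot features, rewards, and all parameter values are assumptions made for illustration; the paper's SDE model is not reproduced here.

```python
import random

def td0_update(theta, phi_s, reward, phi_next, gamma, alpha):
    """One linear TD(0) step: theta <- theta + alpha * delta * phi(s)."""
    v_s = sum(t * p for t, p in zip(theta, phi_s))
    v_next = sum(t * p for t, p in zip(theta, phi_next))
    delta = reward + gamma * v_next - v_s  # temporal-difference error
    return [t + alpha * delta * p for t, p in zip(theta, phi_s)]

def run_td0(n_steps=5000, gamma=0.9, alpha=0.05, seed=0):
    """Run TD(0) along a single trajectory, so consecutive samples are Markov-dependent."""
    rng = random.Random(seed)
    features = [[1.0, 0.0], [0.0, 1.0]]  # one-hot features (tabular special case)
    rewards = [0.0, 1.0]                 # reward collected when leaving each state
    theta, s = [0.0, 0.0], 0
    for _ in range(n_steps):
        s_next = s if rng.random() < 0.7 else 1 - s  # sticky two-state chain
        theta = td0_update(theta, features[s], rewards[s], features[s_next], gamma, alpha)
        s = s_next
    return theta

theta_hat = run_td0()
```

Because `alpha` is held constant, the iterates do not settle at a point but keep fluctuating around the TD fixed point; the size of that residual fluctuation is the error floor the abstract refers to.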

2606.17426 2026-06-17 stat.ML cs.LG math.PR 新提交

Bounded Difference Concentration for Infinitely Exchangeable Sequences with Applications to AI Benchmark Uncertainty

无限可交换序列的有界差分集中不等式及其在AI基准不确定性中的应用

Fangyuan Lin, Spencer Frei, Victor H. de la Pena

发表机构 * Department of Statistics, Columbia University(哥伦比亚大学统计系) Google DeepMind(谷歌DeepMind)

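AI summary: Decomposes the deviation of bounded-difference functions by conditioning on the de Finetti directing measure, derives a concentration inequality with an effective variance proxy, and shows that the latent mixture term cancels exactly for zero-sum linear contrasts. The results are applied to uncertainty quantification for AI benchmarks such as MMLU.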

详情
AI中文摘要

我们考虑无限可交换随机变量函数的集中性质。通过对de Finetti导向测度取条件,我们证明任何具有有界差分常数$c_1, \dots, c_n$的函数的偏差分解为条件采样波动和潜在混合波动。当该潜在混合是$\sigma_{\mathrm{mix}}^2$-次高斯时,我们建立了一个有效方差代理为$\frac{1}{4}\sum_i c_i^2 + \sigma_{\mathrm{mix}}^2$的集中不等式。关键的是,我们证明对于零和线性对比,例如子样本均值与总体均值之差,潜在混合项完全抵消。这种抵消产生了一个紧的、无混合的Hoeffding型界,为近期有限可交换集中结果的无限可扩展极限提供了直接的de Finetti机制。我们将该框架应用于量化复合AI基准(如MMLU)中的不确定性,其中问题项在领域间自然表现出可交换依赖性。我们的结果既提供了一个领域分层层次模型来限制准确率分数的不确定性,也提供了一个无分布、节省成本的统计保证,用于从随机子集准确估计完整的基准分数。

英文摘要

We consider the concentration properties of functions of infinitely exchangeable random variables. By conditioning on the de Finetti directing measure, we show that the deviation of any function with bounded-difference constants $c_1, \dots, c_n$ decomposes into a conditional sampling fluctuation and a latent mixture fluctuation. When this latent mixture is $\sigma_{\mathrm{mix}}^2$-subgaussian, we establish a concentration inequality with an effective variance proxy of $\frac{1}{4}\sum_i c_i^2 + \sigma_{\mathrm{mix}}^2$. Crucially, we demonstrate that for zero-sum linear contrasts, such as the difference between a subsample mean and a full population mean, the latent mixture term cancels exactly. This cancellation yields a tight, mixture-free Hoeffding-type bound that provides a direct de Finetti mechanism for the infinite-extendibility limit of recent finite-exchangeable concentration results. We apply this framework to quantify uncertainty in composite AI benchmarks, such as MMLU, where question items naturally exhibit exchangeable dependence across domains. Our results provide both a domain-stratified hierarchical model for bounding the uncertainty of accuracy scores, and a distribution-free, cost-saving statistical guarantee for accurately estimating full benchmark scores from random subsets.
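To see what the variance proxy means in practice, the sketch below plugs it into the usual sub-Gaussian tail form 2*exp(-t^2 / (2*proxy)). The abstract states only the proxy, so the tail form and constants are an assumed standard reading, and the function names and numbers are illustrative.

```python
import math

def variance_proxy(c, sigma_mix_sq=0.0):
    """Effective variance proxy (1/4) * sum(c_i^2) + sigma_mix^2 from the abstract."""
    return 0.25 * sum(ci * ci for ci in c) + sigma_mix_sq

def two_sided_tail(t, c, sigma_mix_sq=0.0):
    """Assumed sub-Gaussian tail bound 2*exp(-t^2 / (2*proxy)), capped at 1."""
    return min(1.0, 2.0 * math.exp(-t * t / (2.0 * variance_proxy(c, sigma_mix_sq))))

# Mean accuracy over n benchmark items scored in [0, 1]: changing one item moves it by 1/n.
n = 100
c = [1.0 / n] * n
no_mixture = two_sided_tail(0.1, c)           # equals Hoeffding's 2*exp(-2*n*t^2)
with_mixture = two_sided_tail(0.3, c, 0.01)   # latent mixture variance widens the bound
```

With no mixture term and `c_i = 1/n` the expression collapses to Hoeffding's inequality, which matches the abstract's remark that the mixture-free case yields a Hoeffding-type bound.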

2606.17319 2026-06-17 stat.ML cs.LG math.CO math.ST 新提交

Tight $L_\infty$ Sample Complexity for Low-Degree and Sparse Boolean Polynomials

低次稀疏布尔多项式的紧 $L_\infty$ 样本复杂度

Jasper van Doornmalen, Mathieu Molina, Victor Verdugo, José Verschae

发表机构 * Institute for Mathematical and Computational Engineering(数学与计算工程研究所) Pontificia Universidad Católica de Chile(智利天主教大学) Blavatnik School of Computer Science and AI(Blavatnik计算机科学与人工智能学院) Tel Aviv University(特拉维夫大学) Department of Industrial and Systems Engineering(工业与系统工程系)

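AI summary: Motivated by the optimization of bounded binary black-box functions, the paper studies learning polynomial surrogates over the Boolean hypercube with uniform $L_\infty$ error guarantees, and characterizes the minimax sample complexity for two classes of bounded polynomials under subgaussian noise.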

详情
AI中文摘要

受有界二进制黑箱函数优化的启发,我们研究了在布尔超立方体上学习多项式代理的问题。为了确保优化代理能为底层目标产生良好解,我们需要均匀的 $L_\infty$ 误差保证,而非通常的 $L_2$ 型保证。我们刻画了次高斯噪声下两类有界多项式的均匀估计的最小最大样本复杂度。首先,对于 $n$ 个变量上次数至多为 $d$ 的多项式,样本复杂度为 $n^{d+1}$。其次,对于 $s$-稀疏 Fourier-Walsh 多项式且 $s \leq n$,样本复杂度为 $ns^2$。这些速率在结构上不同于无噪声情形,其中均匀精确恢复的速率分别为 $n^d$ 和 $ns$。我们的下界甚至对任意自适应学习者也成立,表明额外的因子是噪声情形固有的。$L_2$ 范数的标准傅里叶分析工具不能自然地扩展到 $L_\infty$ 设置以产生均匀保证。我们的证明通过依赖适当选择的辅助范数作为控制 $L_\infty$ 误差的代理来克服这一困难。总之,我们的结果提供了学习优化安全多项式代理的样本复杂度的紧刻画。

英文摘要

Motivated by the optimization of bounded binary black-box functions, we study the problem of learning polynomial surrogates over the Boolean hypercube. To ensure that optimizing the surrogate yields good solutions for the underlying objective, we require uniform $L_\infty$-error guarantees rather than the usual $L_2$-type guarantees. We characterize the minimax sample complexity of uniform estimation under subgaussian noise for two classes of bounded polynomials. First, for polynomials of degree at most $d$ on $n$ variables, the sample complexity scales as $n^{d+1}$. Second, for $s$-sparse Fourier-Walsh polynomials with $s \leq n$, it scales as $ns^2$. These rates differ structurally from the noiseless setting, where uniform exact recovery scales as $n^d$ and $ns$, respectively. Our lower bounds hold even for arbitrary adaptive learners, showing that the additional factors are intrinsic to the noisy cases. Standard Fourier-analysis tools for the $L_2$-norm do not naturally extend to the $L_\infty$-setting in a way that yields uniform guarantees. Our proofs overcome this difficulty by relying on suitably chosen auxiliary norms that serve as proxies for controlling the $L_\infty$-error. Together, our results provide a tight characterization of the sample complexity of learning optimization-safe polynomial surrogates.

2606.17185 2026-06-17 cs.LG eess.SP math.DG stat.ML 新提交

Finsler Geometry, Graph Neural Networks, and You

芬斯勒几何、图神经网络与你

T. Mitchell Roddenberry, Richard G. Baraniuk

发表机构 * Rice University(莱斯大学)

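AI summary: Because the graph Laplacian can only approximate isotropic operators, the paper proposes graph neural network layers based on the Finsler Laplacian, proves that the discrete estimates converge to the true operator, and shows that the resulting networks recover the geometry underlying nonlinear diffusion equations.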

详情
AI中文摘要

基于图拉普拉斯的图神经网络架构近似拉普拉斯-贝尔特拉米算子,因此限制了它们在各向同性算子上的应用。作为拉普拉斯-贝尔特拉米算子的非线性替代,我们考虑从流形上采样的点云上芬斯勒拉普拉斯的估计。我们证明,随着点样本数量的增加,这些离散估计收敛到流形上的真实算子。此外,我们表明该算子可以表示为图神经网络层,我们用它来定义一组受约束以表达芬斯勒几何的芬斯勒图神经网络。我们表明,芬斯勒图神经网络在实践中恢复了非线性扩散方程背后的几何结构。

英文摘要

Graph neural network architectures based on the graph Laplacian approximate the Laplace-Beltrami operator, thus limiting their application to isotropic operators. As a nonlinear alternative to the Laplace-Beltrami operator, we consider estimates of the Finsler Laplacian on point clouds sampled from a manifold. We prove that these discrete estimates converge to the true operator on the manifold as the number of point samples grows. Moreover, we show that this operator can be expressed as a graph neural network layer, which we use to define a family of Finslerian graph neural networks constrained to express Finsler geometry. We show that Finslerian graph neural networks recover the geometry underlying nonlinear diffusion equations in practice.

2606.17665 2026-06-17 math.ST math.PR stat.ML 新提交

Non-asymptotic Tail Bounds for the Kostlan--Shub--Smale Field: Tensor PCA and Spherical $k$-Spin Complexity

Kostlan–Shub–Smale 场的非渐近尾部界:张量 PCA 和球面 $k$-自旋复杂度

Jean-Marc Azaïs (IMT), Federico Dalmao (UDELAR), Yohann De Castro (ICJ, ECL, PSPM, IUF)

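AI summary: Builds a hierarchy of explicit, non-asymptotic tail bounds for the supremum of the KSS random field on the sphere and applies it to Spiked Tensor PCA and the landscape of the spherical k-spin model. The Tube Method and a rank-reduction step reduce the estimation error to the supremum of the KSS field, and the Kac-Rice formula, a Mehta-Fyodorov representation, and related tools yield the explicit tail bounds.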

详情
AI中文摘要

本文为球面上Kostlan–Shub–Smale (KSS) 随机场的上确界建立了一个显式非渐近尾部界的层次结构,并将其应用于两个问题:尖峰张量PCA和球面$k$-自旋模型的景观。对于张量PCA,我们研究了在信噪比$\lambda$下,通过\textit{轮廓最大似然估计}(即限制在相干性至少为$\kappa$的归一化秩-$R$张量上的MLE),从单个高斯观测中估计秩为$R$、阶数为$k\ge 3$、维度为$d\ge 3$的对称信号张量的非渐近统计极限。我们的分析使用了一个单一的简化:一个确定性几何不等式(管方法)和秩约简步骤将估计误差界定为典型KSS场的上确界,而Kac-Rice公式将其转化为一个高斯积分,该积分涉及平移高斯正交系综的期望绝对特征多项式,进而由我们层次结构中的四个显式尾部界(三个来自Mehta–Fyodorov表示,一个来自Ben Arous–Dembo–Guionnet大偏差)控制。相同的简化产生了两个结果,每个都带有显式常数。对于估计,一个有限$(k,d)$误差界恢复了Perry、Wein和Bandeira的渐近最优速率$\sqrt{d\log k}$,并显式依赖于秩$R$和相干性$\kappa$。对于景观,球面$k$-自旋哈密顿量的退火复杂度的双侧非渐近括号界恢复了高维极限下的Auffinger–Ben Arous–Černý复杂度函数。

英文摘要

This paper builds a hierarchy of explicit, non-asymptotic tail bounds for the supremum of the Kostlan–Shub–Smale (KSS) random field on the sphere, and applies it to two problems: Spiked Tensor PCA and the landscape of the spherical $k$-spin model. For Tensor PCA, we study the non-asymptotic statistical limits of estimating a rank-$R$ symmetric signal tensor of order $k\ge 3$ and dimension $d\ge 3$ from a single Gaussian observation at signal-to-noise ratio $\lambda$, through the profile maximum likelihood estimator, the MLE restricted to normalized rank-$R$ tensors of coherence at least $\kappa$. Our analysis uses a single reduction: a deterministic geometric inequality (the Tube Method) and a rank-reduction step bound the estimation error by the supremum of the canonical KSS field, which the Kac–Rice formula turns into a Gaussian integral against the expected absolute characteristic polynomial of a shifted Gaussian Orthogonal Ensemble, controlled in turn by the four explicit tail bounds of our hierarchy (three from a Mehta–Fyodorov representation, one from a Ben Arous–Dembo–Guionnet large deviation). The same reduction yields two results, each with explicit constants. For estimation, a finite-$(k,d)$ error bound recovers the asymptotically optimal rate $\sqrt{d\log k}$ of Perry, Wein and Bandeira, with explicit dependence on the rank $R$ and the coherence $\kappa$. For the landscape, a two-sided non-asymptotic bracketing of the annealed complexity of the spherical $k$-spin Hamiltonian recovers the Auffinger–Ben Arous–Černý complexity function in the high-dimensional limit.

2606.17364 2026-06-17 math.ST math.OC stat.ML 新提交

A Polyak-Ruppert Central Limit Theorem for SA-Adam with Momentum and Non-Convergent Adaptive Preconditioning

带动量与非收敛自适应预条件化的SA-Adam的Polyak-Ruppert中心极限定理

Sunyoung An, Xiaoming Huo

AI总结 证明在动量和非收敛预条件化下,Polyak-Ruppert平均的Adam迭代仍满足经典中心极限定理,渐近协方差与SGD相同。

Comments 44 pages, 3 figures

详情
AI中文摘要

结合预条件化、动量和权重衰减的自适应优化器(Adam和AdamW),在Polyak-Ruppert平均下,是单次推断的候选引擎。在动量和非收敛预条件化下,平均迭代是否保持经典的Polyak-Ruppert中心极限定理(CLT),具有三明治协方差$H^{-1}SH^{-1}$(Hessian $H$,梯度协方差$S$)?仅预条件化的分析不适用:带动量时,规范分解退化为同义反复。将增广状态(迭代、动量缓冲区)视为时变线性随机逼近(SA),我们证明了(在局部稳定化下)正漂移稳定性、非自治Polyak-Ruppert CLT和投影恒等式。结论:迭代边际协方差恰好是普通随机梯度下降(SGD)的三明治$H^{-1}SH^{-1}$,因此自适应在渐近意义下不可见。这适用于SA-Adam(次线性衰减动量增益,$\gamma\in(\alpha,1)$;次线性区间是关键),而非恒定$\beta$的Adam。耦合$L_2$权重衰减产生岭惩罚三明治,将单次推断扩展到正则化问题。

英文摘要

Adaptive optimizers combining preconditioning, momentum, and weight decay (Adam and AdamW) are, under Polyak-Ruppert averaging, candidate engines for one-pass inference. Does the averaged iterate keep the classical Polyak-Ruppert central limit theorem (CLT), with sandwich covariance $H^{-1}SH^{-1}$ (Hessian $H$, gradient covariance $S$), under momentum and non-convergent preconditioning? The preconditioner-only analysis does not carry over: with momentum the canonical decomposition collapses to a tautology. Treating the augmented state (iterate, momentum buffer) as a time-varying linear stochastic approximation (SA), we prove (under local stabilization) positive drift stability, a non-autonomous Polyak-Ruppert CLT, and a projection identity. The upshot: the iterate-marginal covariance is exactly the plain stochastic gradient descent (SGD) sandwich $H^{-1}SH^{-1}$, so the adaptivity is asymptotically invisible. This holds for SA-Adam (sub-linearly vanishing momentum gain, $\gamma\in(\alpha,1)$; the sub-linear regime is essential), not constant-$\beta$ deployed Adam. Coupled $L_2$ weight decay yields the ridge-penalized sandwich, extending one-pass inference to regularized problems.

2606.17260 2026-06-17 math.OC cs.LG stat.ML 新提交

Accelerated Convex Optimization via Hamiltonian Dynamics with Deterministic Integration Time

基于确定性积分时间的哈密顿动力学的加速凸优化

Xiuyuan Wang, Vishwak Srinivasan, Qiang Fu, Siddharth Mitra, Ashia Wilson, Andre Wibisono

发表机构 * Department of Computer Science, Yale University(耶鲁大学计算机科学系) Department of EECS, Massachusetts Institute of Technology(麻省理工学院电子工程与计算机科学系)

AI总结 提出基于哈密顿动力学的平滑凸优化算法,通过利用平均哈密顿流轨迹的收缩而非端点收缩,实现确定性加速收敛,并推导出具有最优一阶复杂度的离散实现。

Comments 51 pages, 7 figures. Accepted to the 39th Annual Conference on Learning Theory (COLT 2026)

详情
AI中文摘要

我们开发了基于哈密顿动力学的平滑凸优化算法,实现了加速收敛速率。通过利用平均哈密顿流轨迹的收缩而非要求轨迹端点处的收缩,我们证明了基于哈密顿动力学的优化方法具有确定性的加速收敛保证,扩展了先前仅限于二次目标或仅在期望中成立的工作。我们分析了一个理想的连续时间算法,并推导了具有最优一阶复杂度的实用离散时间实现,从而将哈密顿动力学确立为确定性加速凸优化的有用算法原语。

英文摘要

We develop Hamiltonian dynamics-based algorithms for smooth convex optimization that achieve accelerated rates of convergence. By exploiting contraction of averaged Hamiltonian flow trajectories rather than requiring contraction at trajectory endpoints, we show that Hamiltonian dynamics-based optimization methods admit deterministic and accelerated convergence guarantees, extending prior work that is limited to quadratic objectives or holds only in expectation. We analyze an idealized continuous-time algorithm and derive practical discrete-time implementations with optimal first-order complexity, thereby establishing Hamiltonian dynamics as a useful algorithmic primitive for deterministic accelerated convex optimization.

2606.16379 2026-06-17 cs.LG stat.ML 新提交

Scalable and Interpretable Representation Alignment with Ordinal Similarity

可扩展且可解释的序数相似性表示对齐

Diogo Soares, Pankhil Gawade, Andrea Dittadi, Ewa Szczurek

发表机构 * University of Maryland(马里兰大学) Google Research(谷歌研究院)

AI总结 针对现有表示相似性度量缺乏可解释性、对异常值敏感且计算复杂的问题,提出基于序数相似性的三元组和四元组相似性指数,实现可解释、鲁棒且高效的对齐度量。

详情
AI中文摘要

评估表示相似性是表示学习的基础。然而,现有度量存在显著局限性:由于基线漂移而缺乏可解释性,对异常值缺乏鲁棒性,并且对于大型数据集计算上难以处理,迫使依赖启发式近似。为了解决这些问题,我们开发了一个序数相似性框架,通过三元组相似性指数(TSI)和四元组相似性指数(QSI)实例化,通过量化序数关系的一致性来衡量对齐。我们从理论上证明,这种公式本质上是可解释的、对异常值鲁棒的,并且计算高效。最后,我们建立了TSI与通过互近邻度量的局部邻域对齐之间的形式等价性。实验上,我们验证了这些性质,并表明序数相似性提供了一种可扩展的对齐度量方法,使从业者能够更好地理解和设计表示。

英文摘要

Evaluating representation similarity is fundamental to representation learning. However, existing metrics suffer from significant limitations: they lack interpretability due to shifting baselines, lack robustness to outliers, and are computationally intractable for large datasets, forcing reliance on heuristic approximations. To address this, we develop an ordinal-similarity framework, instantiated by the Triplet (TSI) and Quadruplet (QSI) Similarity Indices, which measure alignment by quantifying the consistency of ordinal relationships. We theoretically demonstrate this formulation is inherently interpretable, robust to outliers, and computationally efficient. Finally, we establish a formal equivalence between TSI and local neighborhood alignment, measured by Mutual Nearest Neighbors. Empirically, we validate these properties and show that ordinal similarity offers a scalable approach to measuring alignment, enabling practitioners to better understand and design representations.
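To make the idea of "consistency of ordinal relationships" concrete, here is a minimal sketch of a triplet-style agreement score. It is an illustration only and not the paper's definition of TSI: the function name `triplet_agreement`, the use of Euclidean distance, and the choice to skip ties are all assumptions made for this example.

```python
import math
from itertools import combinations


def triplet_agreement(rep_a, rep_b):
    """Fraction of (anchor, j, k) triplets ordered the same way in both spaces.

    Hypothetical illustration: for each anchor i and pair (j, k), check whether
    "j is closer to i than k" holds in representation A and in representation B.
    """
    n = len(rep_a)
    agree = 0
    total = 0
    for i in range(n):
        others = [m for m in range(n) if m != i]
        for j, k in combinations(others, 2):
            diff_a = math.dist(rep_a[i], rep_a[j]) - math.dist(rep_a[i], rep_a[k])
            diff_b = math.dist(rep_b[i], rep_b[j]) - math.dist(rep_b[i], rep_b[k])
            if diff_a == 0 or diff_b == 0:
                continue  # assumption: ties carry no ordinal information
            total += 1
            agree += (diff_a > 0) == (diff_b > 0)
    return agree / total if total else float("nan")


# A uniformly rescaled copy preserves every distance ordering.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (7.0, 0.0)]
scaled = [(2.0 * x, 2.0 * y) for x, y in points]
score = triplet_agreement(points, scaled)
```

Because the score depends only on orderings, any monotone rescaling of distances leaves it unchanged, which is one intuition for why ordinal measures resist outliers. This brute-force loop is cubic in the number of points; the scalable computation claimed in the abstract is not reproduced here.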

2606.14954 2026-06-17 math.FA cs.LG math.OC stat.ML 新提交

Representation Costs in Data Science: Foundations and the Quasi-Banach Spaces of Deep Neural Networks

数据科学中的表示代价:基础与深度神经网络的拟巴拿赫空间

Greg Ongie, Rahul Parhi

AI总结 本文建立了一个统一框架,通过参数空间正则化子分析参数化数据拟合方法的表示代价,揭示了深度神经网络诱导的本征空间是拟巴拿赫空间,并证明了表示定理等自然结果。

详情
AI中文摘要

我们开发了一个通用框架,通过参数空间正则化子分析参数化数据拟合方法的表示代价。从这个抽象视角,我们定义了任意参数化模型的表示代价,并揭示了它们诱导的(本征)函数空间。这统一了最近数据拟合方法的函数空间观点。我们还证明了许多自然结果在这个抽象设置中成立,包括参数方法在其本征空间上的表示定理。该框架还严格地将参数化方法与其在充分过参数化下的等价非参数描述联系起来。经典方法及其本征空间,如核方法/再生核希尔伯特空间、小波/贝索夫空间和浅层神经网络/变分空间,都是我们抽象框架的特例。将表示代价研究“公理化”的一个副产品是,我们立即获得了深度神经网络的新结果:对于深度为$L$的前馈ReLU网络,其诱导的本征空间是$p$范数可拟的拟巴拿赫空间,其中$p = 2/L$。这揭示了深度神经网络的归纳偏置(由表示代价给出)在深度$L > 2$时无法被范数捕捉。

英文摘要

We develop a general framework for analyzing representation costs of parametric data-fitting methods through their parameter-space regularizers. From this abstract perspective, we define representation costs for arbitrary parametric models and reveal their induced (native) function spaces. This unifies recent function-space views of data-fitting methods. We also prove that many natural results hold in this abstract setting, including representer theorems for parametric methods on their native spaces. The framework also rigorously connects parametric methods with their equivalent nonparametric descriptions under sufficient overparameterization. Classical methods and their native spaces, such as kernel methods / reproducing kernel Hilbert spaces, wavelets / Besov spaces, and shallow neural networks / variation spaces emerge as special cases of our abstract framework. A byproduct of "axiomatizing" the study of representation costs is that we also immediately obtain new results for deep neural networks: For depth-$L$ feedforward ReLU networks, their induced native spaces are $p$-normable quasi-Banach spaces with $p = 2/L$. This reveals that the inductive bias of deep neural networks (as given by the representation cost) cannot be captured by norms for depths $L > 2$.

2605.29669 2026-06-17 stat.ML cs.LG math.PR math.ST 版本更新

Eigen-Spike Emergence and Quadratic Equivalents for Conjugate Kernels on Nonlinearly Separable Data

Eigen-Spike 涌现与共轭核在非线性可分数据上的二次等价

Collin Cranston, Zhichao Wang, Todd Kemp, Michael W. Mahoney

AI总结 针对非线性可分数据(XOR问题),通过共轭核矩阵的二次等价模型,分析异常特征值涌现及其与标签对齐的BBP型相变,揭示样本复杂度、信噪比、激活函数和预训练特征对非线性可学习性的影响。

Comments 81 pages, 8 figures

详情
AI中文摘要

近期随机矩阵理论(RMT)工作发展了确定性等价的概念:通常是线性代理模型,用于近似大型非线性随机矩阵(如神经网络中的非线性特征映射)的谱行为。一方面,这些确定性等价通过将复杂模型简化为具有经典RMT工具特性的更简单模型,使理论预测易于处理。然而,这留下了一个问题:在处理高维非线性可分数据(例如对非线性可分数据进行分类)时,这种理想化的线性等价是否仍然有意义。受此启发,我们考虑前馈神经网络的非线性特征映射——共轭核(CK),在典型的非线性可分数据集XOR问题上;我们利用CK中信息性异常特征值的研究及其对应特征向量是否渐近与XOR标签对齐,作为非线性可学习性的代理。我们开发了尖峰CK矩阵的稳健二次等价,从而能够精确分析随着修改机器学习实践中常见的各种旋钮(样本复杂度、信噪比、非线性激活选择以及预训练特征)时涌现的信息性尖峰。在每种情况下,我们推导出精确的BBP型相变,其中通过CK特征向量的线性分类变得可能。我们的分析有助于将RMT中确定性等价工具的力量转化为研究机器学习中实际相关的问题。

英文摘要

Recent work in random matrix theory (RMT) has developed the notion of deterministic equivalents: typically linear surrogate models that approximate the spectral behavior of large nonlinear random matrices, such as nonlinear feature maps in neural networks (NNs). Such equivalents make theoretical predictions tractable by reducing a complex model to a simpler one with properties that fall under the umbrella of classical RMT tools. However, this leaves open the question of whether this idealized linear equivalence remains meaningful for classification of high-dimensional nonlinearly separable data. Motivated by this, we consider the conjugate kernel (CK), which is the nonlinear feature map of a one-layer feedforward NN, under a canonical nonlinearly separable dataset for the XOR problem; and we use the study of informative outlier eigenvalues in the CK and whether their corresponding eigenvectors asymptotically align with XOR labels as a proxy for nonlinear learnability. We develop a robust quadratic equivalent of the CK matrix that enables a precise analysis of emergent informative spikes, as one modifies various knobs common in ML practice: sample complexity, signal-to-noise ratio (SNR), nonlinear activation choice, and pretrained features. We identify regimes in which these knobs move the CK beyond the linear equivalent and produce BBP-type transitions to label-aligned outlier eigenspaces. Our analysis helps bring deterministic-equivalence tools from RMT to bear on problems of practical relevance in ML.

2605.29200 2026-06-17 stat.ME 版本更新

Approximating full conformal prediction: distribution free guarantees via the tournament correction

近似全共形预测:通过锦标赛校正实现无分布保证

Aabesh Bhattacharyya, Boxuan Zhang, Rina Foygel Barber

AI总结 提出基于锦标赛思想的一类新近似方法,在保证边际覆盖率为1-2α的同时降低计算成本,并可在稳定性条件下收紧至约1-α。

Comments 23 pages, 2 figures

详情
AI中文摘要

共形预测是一个提供预测区间的框架,具有无分布有效性的保证,确保对来自任何分布的数据的预测覆盖率。它的两个主要变体是全共形预测和分裂共形预测(也称为转导和归纳)。全共形预测被广泛认为在统计上更有效(因为分裂共形预测需要数据分割,因此由于样本量的损失可能导致更宽的预测区间),但其实现计算上不可行,因为它需要对响应空间中的每个候选值重新拟合底层模型。现有的计算捷径,例如使用离散值网格来近似全共形预测构造,通常缺乏边际覆盖率的理论保证,并且可能在实际中失败。为了解决这一限制,我们引入了一类新的全共形预测方法近似,基于锦标赛思想,使得能够构建具有严格边际覆盖率保证为$1-2α$的预测集。在稳定性条件下,理论覆盖率保证收紧至约$1-α$。这个新框架推广了现有的留一法交叉共形预测方法,同时允许灵活使用各种现有的近似策略。

英文摘要

Conformal prediction is a framework for providing prediction intervals with distribution-free validity, guaranteeing predictive coverage for data drawn from any distribution. Its two main variants are full conformal prediction and split conformal prediction (also called transductive and inductive). Full conformal prediction is widely considered to be statistically more efficient (since split conformal prediction requires data splitting, and therefore can lead to wider prediction intervals due to the resulting loss in sample size), but its implementation is computationally prohibitive, as it requires the underlying model to be refit for every candidate value in the response space. Existing computational shortcuts, such as using a discrete grid of values to approximate the full conformal prediction construction, frequently lack theoretical guarantees on marginal coverage and can fail in practice. To address this limitation, we introduce a novel class of approximations to the full conformal prediction method, based on the idea of tournaments, which enables the construction of prediction sets with a rigorous marginal coverage guarantee of $1-2\alpha$. Under stability conditions, the theoretical coverage guarantee tightens to approximately $1-\alpha$. This new framework generalizes the existing method of leave-one-out cross-conformal prediction, while allowing for flexible use of various existing approximation strategies.
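The cost of full conformal prediction is easiest to see in code. The sketch below is the grid shortcut that the abstract describes as lacking guarantees, not the tournament method itself. The toy "model" (the sample mean), the absolute-residual score, and the name `full_conformal_grid` are assumptions chosen so that the example stays self-contained.

```python
def full_conformal_grid(y_train, grid, alpha):
    """Keep each candidate y whose conformal p-value exceeds alpha.

    Toy setting (assumption): the model is the mean of the augmented sample,
    so it must be "refit" once per candidate value in the grid.
    """
    n = len(y_train)
    kept = []
    for y in grid:
        augmented = list(y_train) + [y]
        mu = sum(augmented) / (n + 1)            # refit on the augmented data
        scores = [abs(v - mu) for v in augmented]
        test_score = scores[-1]
        # p-value: share of scores at least as large as the candidate's score
        p_value = sum(s >= test_score for s in scores) / (n + 1)
        if p_value > alpha:
            kept.append(y)
    return kept


prediction_set = full_conformal_grid([1.0, 2.0, 3.0, 4.0, 5.0], [3.0, 100.0], alpha=0.2)
```

Every grid point triggers one refit, so a real model with an expensive training step makes this loop prohibitive, and values between grid points are never examined.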

2604.18701 2026-06-17 cs.LG cs.AI stat.ML 版本更新

Curiosity-Critic: Cumulative Prediction Error Improvement as a Tractable Intrinsic Reward for World Model Training

Curiosity-Critic:累积预测误差改进作为世界模型训练的可处理内在奖励

Vin Bhaskara, Haicheng Wang

AI总结 提出Curiosity-Critic方法,通过可处理的每步替代项(当前预测误差与渐近误差基线的差值)作为内在奖励,利用共训练的评论家在线估计误差基线,有效分离可约与不可约预测误差,在随机网格世界实验中优于现有方法。

Comments Accepted to ICML 2026 Workshop on Epistemic Intelligence in Machine Learning (EIML@ICML 2026). Code: this https URL (https://github.com/vinbhaskara/Curiosity-Critic)

详情
AI中文摘要

基于局部预测误差的好奇心奖励仅关注当前转移,而不考虑世界模型在所有已访问转移上的累积预测误差。我们引入了Curiosity-Critic,其内在奖励基于这一累积目标的改进,并证明它有一个可处理的每步替代项:当前预测误差与当前状态转移的渐近误差基线之间的差值。我们通过一个与世界模型共同训练的评论家在线估计这一误差基线;由于评论家只需学习一个转移的预测难度,其对不可约噪声基线的估计在世界模型饱和之前就已收敛,从而将探索引导向可学习的转移。该奖励对可学习转移较高,而对随机转移趋近于零,从而在线分离认知(可约)和偶然(不可约)预测误差。从Schmidhuber(1991)到学习特征空间变体的先前预测误差好奇心公式,都作为该误差基线的特定近似特例出现。在随机网格世界上的实验表明,Curiosity-Critic在训练速度和最终世界模型准确性上优于基于预测误差、访问计数和随机网络蒸馏的方法。

英文摘要

Local prediction-error-based curiosity rewards focus on the current transition without considering the world model's cumulative prediction error across all visited transitions. We introduce Curiosity-Critic, which grounds its intrinsic reward in the improvement of this cumulative objective, and show that it admits a tractable per-step surrogate: the difference between the current prediction error and the asymptotic error baseline of the current state transition. We estimate this error baseline online with a learned critic co-trained alongside the world model; since the critic only has to learn how hard a transition is to predict, its estimate of the irreducible noise floor converges well before the world model saturates, redirecting exploration toward learnable transitions. The reward is higher for learnable transitions and collapses toward zero for stochastic ones, thereby separating epistemic (reducible) from aleatoric (irreducible) prediction error online. Prior prediction-error curiosity formulations, from Schmidhuber (1991) to learned-feature-space variants, emerge as special cases corresponding to specific approximations of this error baseline. Experiments on a stochastic grid world show that Curiosity-Critic outperforms prediction-error, visitation-count, and Random Network Distillation methods in training speed and final world model accuracy.

2411.08821 2026-06-17 stat.ML cs.LG stat.CO 版本更新

Conditional Local Importance by Quantile Expectations

基于分位数期望的条件局部重要性

Kelvyn K. Bladen, Adele Cutler, D. Richard Cutler, Kevin R. Moon

AI总结 提出模型无关的局部变量重要性方法CLIQUE,通过分位数期望捕获局部依赖关系,提升稳定性并直接适用于多类分类问题。

Comments 29 pages, 28 figures

详情
AI中文摘要

全局变量重要性度量通常用于解释机器学习模型的结果。局部变量重要性技术评估变量如何影响单个观测。当前流行的方法,包括LIME和SHAP,在预测空间中提供了有用的特征贡献度量,但在模型损失空间中改进局部结构表征方面仍有空间。此外,它们本身不适用于多类分类问题。我们提出了一种新的模型无关的局部变量重要性计算方法CLIQUE,它突出局部依赖关系,比基于置换的方法具有更好的稳定性,并且可以直接应用于多类分类问题。模拟和真实示例表明,CLIQUE强调局部依赖信息,捕获超出相关性可评估的交互行为,并在响应变量对变量变化不变的区域分配零重要性。

英文摘要

Global variable importance measures are commonly used to interpret the results of machine learning models. Local variable importance techniques assess how variables contribute to individual observations. Current popular methods, including LIME and SHAP, provide useful measures of feature contribution in the prediction space, while leaving opportunities for improved characterization of local structure in the model loss space. Additionally, they are not natively adapted for multi-class classification problems. We propose a new model-agnostic method for calculating local variable importance, CLIQUE, that highlights locally dependent relationships, provides improved stability over permutation-based methods, and can be directly applied to multi-class classification problems. Simulated and real-world examples show that CLIQUE emphasizes locally dependent information, captures interaction behavior beyond what can be evaluated by correlations, and assigns zero importance in regions where the response is invariant to changes in variables.

2603.08001 2026-06-17 cs.LG stat.ML 版本更新

Amortizing Maximum Inner Product Search with Learned Support Functions

通过学习支持函数摊销最大内积搜索

Theo X. Olausson, João Monteiro, Michal Klein, Marco Cuturi

AI总结 提出基于回归的摊销MIPS方法,通过训练神经网络直接预测最优键,利用支持函数的凸性加速搜索,在BEIR基准上显著提升IVF匹配率。

详情
AI中文摘要

最大内积搜索(MIPS)是机器学习中的关键子程序,需要从数据库(键)中识别出与给定查询最匹配的向量。我们提出摊销MIPS:一种基于回归的方法,训练神经网络直接预测MIPS解,从而摊销在固定键数据库上从已知分布中重复求解查询的MIPS成本。我们的关键洞察是,MIPS值函数是键集合的支持函数,这是一个经过充分研究的凸函数,其梯度给出最优键。这激发了两种互补的摊销模型:SupportNet,一个输入凸神经网络,用于回归支持函数;以及KeyNet,一个向量值网络,直接回归最优键。SupportNet可以作为聚类路由器,将查询引导到相关的数据库分区,而KeyNet可以作为原始查询的直接替代品,直接输入到现成的索引流水线中。我们在BEIR基准上的实验表明,对于文档嵌入,当考虑计算工作量(无论是FLOPs、探测次数还是挂钟时间)时,学习的SupportNet和KeyNet显著提高了IVF匹配率。我们的代码可在以下网址获取:this https URL。

英文摘要

Maximum inner product search (MIPS) is a crucial subroutine in machine learning, requiring the identification of a vector taken within a database (the keys) that best aligns with a given query. We propose amortized MIPS: a regression-based approach that trains neural networks to directly predict MIPS solutions, amortizing the cost of repeatedly solving MIPS for queries drawn from a known distribution over a fixed key database. Our key insight is that the MIPS value function is the support function of the set of keys, a well-studied convex function whose gradient yields the optimal key. This motivates two complementary amortized models: SupportNet, an input-convex neural network trained to regress the support function, and KeyNet, a vector-valued network that directly regresses the optimal key. SupportNet can serve as a cluster router, steering queries toward relevant database partitions, while KeyNet can be used as a drop-in replacement for the original query, fed directly to off-the-shelf indexing pipelines. Our experiments on the BEIR benchmark show that, for document embeddings, learned SupportNets and KeyNets significantly improve IVF match rates when accounting for compute effort, whether measured in FLOPs, number of probes, or wall-clock time. Our code is available at: this https URL.
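The link between MIPS and the support function can be shown with a brute-force sketch. This is the exact computation that the learned models are meant to amortize, not SupportNet or KeyNet themselves; the function name `support_function` and the toy keys are assumptions for illustration.

```python
def inner(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def support_function(query, keys):
    """Return (h(query), best_key) with h(q) = max over keys of <q, key>.

    h is convex in q as a maximum of linear functions; where the maximizer is
    unique, the gradient of h at q is the maximizing key itself.
    """
    best_key = max(keys, key=lambda k: inner(query, k))
    return inner(query, best_key), best_key


keys = [(1.0, 0.0), (0.0, 1.0), (-1.0, -1.0)]
value, best = support_function((2.0, 1.0), keys)   # value 2.0, best key (1.0, 0.0)

# Finite-difference check that the gradient along the first axis matches best[0].
eps = 1e-6
shifted_value, _ = support_function((2.0 + eps, 1.0), keys)
slope = (shifted_value - value) / eps
```

The scan over all keys costs one inner product per key for every query, which is the repeated cost that a regression model trained on a fixed key database could avoid.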

2602.23116 2026-06-17 cs.LG cs.GT stat.ML 版本更新

Provably Efficient Regularized Online RLHF with Generalized Bilinear Preferences

具有广义双线性偏好的可证明高效正则化在线RLHF

Junghyun Lee, Minju Hong, Kwang-Sung Jun, Chulhee Yun, Se-Young Yun

AI总结 研究在线RLHF中正则化最佳响应最大遗憾最小化问题,通过广义双线性偏好模型证明强凸性可导出多对数遗憾,表明快速遗憾不限于KL散度。

Comments 48 pages, 3 figures (ver3: major revisions; ver2: more colorful boxes, fixed some typos)

详情
AI中文摘要

我们考虑在一般偏好和bandit反馈下在线RLHF中的正则化最佳响应最大遗憾最小化问题。虽然各种正则化器被用于增强对齐的鲁棒性,但已知的多对数遗憾保证仍然高度特定于KL。为了研究这种快速速率是否扩展到KL之外,我们采用广义双线性偏好模型(GBPM)——通过一个秩为$2r$的斜对称矩阵捕获$d$维逐项特征上的非传递偏好——以隔离一般正则化的影响。关键地,在GBPM下,我们证明任何贪婪策略的对偶间隙受限于平方估计误差,该误差仅利用强凸性和斜对称性导出。在特征覆盖假设下,我们通过贪婪采样建立了$\tilde{\mathcal{O}}(\eta d^4 C_{\min}^{-1} (\log T)^2 \wedge d^2 C_{\min}^{-1/2} \sqrt{T})$的通用多对数遗憾,并通过探索后提交(Explore-Then-Commit)建立了$\tilde{\mathcal{O}}(C_{\min}^{-2} \sqrt{\eta r T} \wedge r^{1/3} C_{\min}^{-4/3} T^{2/3})$的维度改进遗憾(对于条件良好的臂集),其中$\eta^{-1}$是正则化系数,$T$是时间范围,$C_{\min}$是依赖于臂集的量。这表明“快速”遗憾并非KL特有,而是通用强凸几何的基本结果。

英文摘要

We consider the problem of regularized best-response max-regret minimization in online RLHF under general preferences and bandit feedback. While various regularizers are utilized to robustify alignment, known polylogarithmic regret guarantees remain heavily specific to KL. To investigate whether such fast rates extend beyond KL, we adopt the Generalized Bilinear Preference Model (GBPM), which captures intransitive preferences over $d$-dimensional item-wise features via a rank-$2r$ skew-symmetric matrix, to isolate the impact of generic regularization. Crucially, under GBPM, we prove that the dual gap of any greedy policy is bounded by the squared estimation error, derived using only strong convexity and skew-symmetry. Under a feature coverage assumption, we establish a generic polylogarithmic regret of $\tilde{\mathcal{O}}(\eta d^4 C_{\min}^{-1} (\log T)^2 \wedge d^2 C_{\min}^{-1/2} \sqrt{T})$ with Greedy Sampling, and a dimension-wise improved regret (for well-conditioned arm-sets) of $\tilde{\mathcal{O}}(C_{\min}^{-2} \sqrt{\eta r T} \wedge r^{1/3} C_{\min}^{-4/3} T^{2/3})$ with Explore-Then-Commit, where $\eta^{-1}$ is the regularization coefficient, $T$ is the time horizon, and $C_{\min}$ is an arm-set dependent quantity. This demonstrates that "fast" regrets are not KL-specific, but rather a fundamental consequence of generic strongly convex geometry.

2603.04198 2026-06-17 stat.ML cs.LG 版本更新

Stable and Steerable Sparse Autoencoders with Weight Regularization

基于权重正则化的稳定且可操控的稀疏自编码器

Piotr Jedryszek, Oliver M. Crook

AI总结 通过L1/L2权重正则化提高稀疏自编码器的跨种子特征一致性,并在语言模型上提升操控成功率,同时保持可解释性分数。

详情
AI中文摘要

稀疏自编码器(SAEs)被广泛用于从神经网络激活中提取人类可解释的特征,但其学习到的特征在不同随机种子和训练选择下可能差异很大。为了提高稳定性,我们研究了通过添加编码器和解码器权重的L1或L2惩罚进行权重正则化,并评估了正则化与常见SAE训练默认值的交互作用。在MNIST上,我们观察到L2权重正则化产生了一个高度对齐的特征核心,并且当与绑定初始化和单位范数解码器约束结合时,它显著提高了跨种子的特征一致性。对于在语言模型激活(Pythia-70M-deduped)上训练的TopK SAEs,添加小的L2权重惩罚增加了三个随机种子间共享特征的比例,并使操控成功率大致翻倍,同时自动可解释性分数的平均值基本保持不变。最后,在正则化设置下,激活操控成功与否能更好地由自动可解释性分数预测,这表明正则化可以使基于文本的特征解释与功能可控性对齐。

英文摘要

Sparse autoencoders (SAEs) are widely used to extract human-interpretable features from neural network activations, but their learned features can vary substantially across random seeds and training choices. To improve stability, we study weight regularization by adding L1 or L2 penalties on encoder and decoder weights, and evaluate how regularization interacts with common SAE training defaults. On MNIST, we observe that L2 weight regularization produces a core of highly aligned features and, when combined with tied initialization and unit-norm decoder constraints, dramatically increases cross-seed feature consistency. For TopK SAEs trained on language model activations (Pythia-70M-deduped), adding a small L2 weight penalty increases the fraction of features shared across three random seeds and roughly doubles steering success rates, while leaving the mean of automated interpretability scores essentially unchanged. Finally, in the regularized setting, activation steering success becomes better predicted by auto-interpretability scores, suggesting that regularization can align text-based feature explanations with functional controllability.

2602.17894 2026-06-17 stat.ML cs.LG math.ST 版本更新

Learning from Biased and Costly Data Sources: Minimax-optimal Data Collection under a Budget

从有偏且昂贵的数据源学习:预算下的极小极大最优数据收集

Michael O. Harding, Vikas Singh, Kirthevasan Kandasamy

AI总结 针对预算固定的多源数据收集问题,提出最大化有效样本量的采样方案,结合事后分层估计器,实现极小极大最优风险。

Comments COLT 2026

详情
AI中文摘要

数据收集是现代统计和机器学习流程的关键组成部分,特别是当必须从多个异质数据源收集数据以研究感兴趣的目标总体时。在许多用例中,如医学研究或政治民意调查,不同数据源产生不同的采样成本。观测通常具有相关的群体身份——例如健康指标、人口统计或政治派别——并且这些群体的相对组成可能在源总体之间以及源总体与目标总体之间存在显著差异。在这项工作中,我们研究在固定预算下的多源数据收集,重点关注总体均值和群体条件均值的估计。我们表明,朴素的数据收集策略(例如试图“匹配”目标分布)或依赖标准估计量(例如样本均值)可能高度次优。相反,我们开发了一种采样方案,该方案最大化有效样本量——总样本量除以 $D_{\chi^2}(q\mid\mid\overline{p}) + 1$,其中 $q$ 是目标分布,$\overline{p}$ 是聚合源分布,$D_{\chi^2}$ 是 $\chi^2$ 散度。我们将此采样方案与经典的事后分层估计器配对,并给出其风险的上界。我们提供了匹配的下界,证明我们的方法达到了预算下的极小极大最优风险。我们的技术也扩展到最小化超额风险的预测问题,为具有昂贵和异质数据源的多源学习提供了原则性方法。

英文摘要

Data collection is a critical component of modern statistical and machine learning pipelines, particularly when data must be gathered from multiple heterogeneous sources to study a target population of interest. In many use cases, such as medical studies or political polling, different sources incur different sampling costs. Observations often have associated group identities - for example, health markers, demographics, or political affiliations - and the relative composition of these groups may differ substantially, both among the source populations and between sources and target population. In this work, we study multi-source data collection under a fixed budget, focusing on the estimation of population means and group-conditional means. We show that naive data collection strategies (e.g. attempting to "match" the target distribution) or relying on standard estimators (e.g. sample mean) can be highly suboptimal. Instead, we develop a sampling plan which maximizes the effective sample size - the total sample size divided by $D_{\chi^2}(q\mid\mid\overline{p}) + 1$, where $q$ is the target distribution, $\overline{p}$ is the aggregated source distribution, and $D_{\chi^2}$ is the $\chi^2$-divergence. We pair this sampling plan with a classical post-stratification estimator and upper bound its risk. We provide matching lower bounds, establishing that our approach achieves the budgeted minimax optimal risk. Our techniques also extend to prediction problems when minimizing the excess risk, providing a principled approach to multi-source learning with costly and heterogeneous data sources.
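The effective sample size defined in the abstract, total sample size divided by $D_{\chi^2}(q\mid\mid\overline{p}) + 1$, is simple to evaluate for discrete group distributions. The sketch below assumes the aggregated source distribution is already given as a vector; how the paper's sampling plan chooses it under a budget is not reproduced, and the function names are illustrative.

```python
def chi2_divergence(q, p):
    """Chi-square divergence D(q || p) = sum_i q_i^2 / p_i - 1 over groups."""
    return sum(qi * qi / pi for qi, pi in zip(q, p)) - 1.0


def effective_sample_size(n, q, p_bar):
    """Total sample size n divided by D_chi2(q || p_bar) + 1."""
    return n / (chi2_divergence(q, p_bar) + 1.0)


target = [0.5, 0.5]                      # target group proportions q
matched = effective_sample_size(1000, target, [0.5, 0.5])
skewed = effective_sample_size(1000, target, [0.8, 0.2])
```

When the aggregated sources match the target, the divergence is zero and all 1000 samples count in full. With sources skewed to 80/20 against a 50/50 target, the denominator is 0.25/0.8 + 0.25/0.2 = 1.5625, so the same 1000 samples are worth about 640.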

2602.06014 2026-06-17 cs.LG cs.AI math.OC math.ST stat.ML 版本更新

Optimism Stabilizes Thompson Sampling for Adaptive Inference

乐观主义稳定自适应推断的汤普森采样

Shunxing Yan, Han Zhong

AI总结 本文通过引入乐观机制(如方差膨胀或均值奖励)稳定汤普森采样,使得各臂拉取次数收敛于确定性尺度,从而在K臂随机bandit中实现渐近有效的Wald推断,并解决了多最优臂的扩展问题。

Comments Accepted in part to COLT 2026

详情
AI中文摘要

汤普森采样(TS)广泛用于随机多臂老虎机,但其在自适应数据收集下的推断性质微妙。样本均值的经典渐近理论可能失效,因为臂特定样本量是随机的,并通过动作选择规则与奖励耦合。我们研究了具有高斯随机指数的K臂随机bandit中汤普森采样的自适应推断,其中奖励噪声为独立次高斯,并确定乐观主义是恢复稳定性的关键机制,即每个臂的拉取次数集中在确定性尺度附近。这种稳定性使得尽管自适应采样,仍能获得渐近有效的Wald推断。首先,我们证明方差膨胀的TS对任意K≥2是稳定的,包括多个臂最优的挑战性情况,对最优臂具有渐近均匀分配,对次优臂具有尖锐的对数拉取次数渐近性。这解决了Halder等人提出的K臂扩展问题,使用新的胜者图和Lyapunov漂移技术来控制多个最优臂之间的分配。其次,我们分析了一种替代的乐观修改,保持高斯指数方差不变但向指数中心添加显式均值奖励,并建立了类似的稳定性结论。总之,适当实施的乐观主义稳定了汤普森采样,并在多臂老虎机中实现了渐近有效的Wald推断,同时仅产生轻微额外的遗憾代价。

英文摘要

Thompson sampling (TS) is widely used for stochastic multi-armed bandits, yet its inferential properties under adaptive data collection are subtle. Classical asymptotic theory for sample means can fail because arm-specific sample sizes are random and coupled with the rewards through the action-selection rule. We study adaptive inference for Thompson sampling with Gaussian randomized indices in $K$-armed stochastic bandits with independent sub-Gaussian reward noises, and identify optimism as a key mechanism for restoring stability, meaning that each arm's pull count concentrates around a deterministic scale. This stability yields asymptotically valid Wald inference despite adaptive sampling. First, we prove that variance-inflated TS is stable for any $K \ge 2$, including the challenging regime where multiple arms are optimal, with asymptotically uniform allocation over optimal arms and sharp logarithmic pull-count asymptotics for suboptimal arms. This resolves the $K$-armed extension question raised by Halder et al., using new winner-map and Lyapunov-drift techniques to control allocation among multiple optimal arms. Second, we analyze an alternative optimistic modification that keeps the Gaussian index variance unchanged but adds an explicit mean bonus to the index center, and establish a similar stability conclusion. In summary, suitably implemented optimism stabilizes Thompson sampling and enables asymptotically valid Wald inference in multi-armed bandits, while incurring only a mild additional regret cost.
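A rough sketch of Thompson sampling with Gaussian randomized indices and an inflated index variance may help fix ideas. The constant `inflation` factor, the unit-variance Gaussian rewards, and the single initial pull per arm are assumptions made for this example; the paper's inflation schedule and its mean-bonus variant may differ.

```python
import random


def inflated_thompson(means, horizon, inflation=4.0, seed=0):
    """Run Gaussian-index Thompson sampling and return per-arm pull counts.

    Assumption: the index for arm a is drawn from
    N(sample_mean_a, inflation / n_a), with inflation > 1 widening the draw.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [1] * k
    sums = [rng.gauss(m, 1.0) for m in means]    # one initial pull per arm
    for _ in range(horizon - k):
        indices = [
            rng.gauss(sums[a] / counts[a], (inflation / counts[a]) ** 0.5)
            for a in range(k)
        ]
        arm = max(range(k), key=indices.__getitem__)
        sums[arm] += rng.gauss(means[arm], 1.0)  # observe a noisy reward
        counts[arm] += 1
    return counts


# Two optimal arms and one suboptimal arm.
pulls = inflated_thompson([0.0, 0.0, -1.0], horizon=2000)
```

Repeating the run over several seeds and comparing the split between the two optimal arms is one informal way to probe the stability property described in the abstract.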

2602.05790 2026-06-17 cs.IT cs.LG stat.ML 版本更新

Price of metric universality in vector quantization is at most 0.11 bit

向量量化中度量普适性的代价至多为0.11比特

Alina Harbuzova, Or Ordentlich, Yury Polyanskiy

AI总结 本文证明存在一个通用码本,对于所有可能的X统计量,在W为高斯时,其性能至少与速率每维度降低0.11比特的X自适应水填充码本相当。

Comments 41 pages, 1 figure

详情
AI中文摘要

快速计算矩阵乘积 $W^\top X$ 是现代大语言模型的核心操作。为了更高效地部署,一种流行的方法是使用低精度近似 $\widehat W$ 替代真实 $W$(“仅权重量化”)。信息论表明,降低 $W$ 精度的最优算法依赖于 $X$ 的(二阶)统计量,并且需要将向量量化码本与 $X$ 的 PCA 方向仔细对齐(称为“水填充分配”的过程)。然而,码本对 $X$ 统计量的依赖性非常不实用。本文证明存在一个通用码本,对于所有可能的 $X$ 统计量同时接近最优,其意义在于:当 $W$ 为高斯时,该通用码本至少与速率每维度降低 0.11 比特的 $X$ 自适应水填充码本一样好。这样的通用码本将是低精度存储格式的理想候选者,这是当前活跃研究的话题,但可惜存在性证明是非构造性的。等价地,我们的结果表明在 $\mathbb{R}^n$ 中存在一个网,它同时关于所有希尔伯特范数是球面的接近最优覆盖。

英文摘要

Fast computation of a matrix product $W^\top X$ is a workhorse of modern LLMs. To make their deployment more efficient, a popular approach is to use a low-precision approximation $\widehat W$ in place of the true $W$ ("weight-only quantization"). Information theory demonstrates that an optimal algorithm for reducing the precision of $W$ depends on the (second-order) statistics of $X$ and requires a careful alignment of the vector quantization codebook with the PCA directions of $X$ (a process known as "waterfilling allocation"). Dependence of the codebook on the statistics of $X$, however, is highly impractical. This paper proves that there exists a universal codebook that is simultaneously near-optimal for all possible statistics of $X$, in the sense of being at least as good as an $X$-adapted waterfilling codebook with rate reduced by 0.11 bit per dimension in the case when $W$ is Gaussian. Such a universal codebook would be an ideal candidate for the low-precision storage format, a topic of active modern research, but alas the existence proof is non-constructive. Equivalently, our result shows the existence of a net in $\mathbb{R}^n$ that is a nearly optimal covering of a sphere simultaneously with respect to all Hilbert norms.

2601.21455 2026-06-17 stat.ML cs.LG 版本更新

Questioning the Coverage-Length Metric in Conformal Prediction: When Shorter Intervals Are Not Better

质疑共形预测中的覆盖-长度度量:当更短的区间并不更好时

Yizhou Min, Yizhou Lu, Lanqi Li, Zhen Zhang, Jiaye Teng

AI总结 本文批判性检验共形预测中标准度量(覆盖率和区间长度)的充分性,揭示一种称为“偏见技巧”(PT)的反直觉方法可欺骗性地缩短区间长度而保持覆盖有效,并提出新度量“区间稳定性”以检测此类行为。

详情
AI中文摘要

共形预测(CP)已成为无分布不确定性量化的基石,通常通过其覆盖率和区间长度进行评估。本文批判性地检验了这些标准度量的充分性。我们证明,通过一种称为偏见技巧(PT)的反直觉方法,区间长度可能被欺骗性地改善,而覆盖率仍然有效。具体而言,对于任何给定的测试样本,PT 概率性地返回一个区间,该区间要么为空,要么使用调整后的置信水平构建,从而保持边际覆盖率。虽然 PT 可能产生欺骗性较低的区间长度,但它引入了实际漏洞:同一输入在算法的重复运行中可能产生完全不同的预测区间。我们正式推导了 PT 实现这些误导性改进的条件,并在各种回归和分类任务中提供了广泛的实证证据。此外,我们引入了一个新度量——区间稳定性,它有助于检测新的 CP 方法是否基于此类 PT 技术隐式地改善了长度。代码可在以下网址获取:this https URL。

英文摘要

Conformal prediction (CP) has become a cornerstone of distribution-free uncertainty quantification, conventionally evaluated by its coverage and interval length. This work critically examines the sufficiency of these standard metrics. We demonstrate that the interval length might be deceptively improved through a counter-intuitive approach termed Prejudicial Trick (PT), while the coverage remains valid. Specifically, for any given test sample, PT probabilistically returns an interval, which is either null or constructed using an adjusted confidence level, thereby preserving marginal coverage. While PT potentially yields a deceptively lower interval length, it introduces practical vulnerabilities: the same input can yield completely different prediction intervals across repeated runs of the algorithm. We formally derive the conditions under which PT achieves these misleading improvements and provide extensive empirical evidence across various regression and classification tasks. Furthermore, we introduce a new metric, interval stability, which helps detect whether a new CP method implicitly improves the length based on such PT-like techniques. Code is available at this https URL.
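One way to read the mechanism described above is sketched below. The specific adjustment is an assumption made for this illustration and is not taken from the paper: if an empty interval is returned with probability `p_null`, then building the remaining intervals at level (1 - alpha) / (1 - p_null) keeps the marginal coverage at 1 - alpha, since (1 - p_null) times that level equals 1 - alpha. The base interval function is a hypothetical stand-in for any valid CP method.

```python
import random


def adjusted_level(alpha, p_null):
    """Coverage level for non-empty intervals that preserves 1 - alpha overall."""
    if not 0.0 <= p_null <= alpha < 1.0:
        raise ValueError("need 0 <= p_null <= alpha < 1")
    return (1.0 - alpha) / (1.0 - p_null)


def prejudicial_interval(base_interval_fn, x, alpha, p_null, rng):
    """Return None (empty set) with probability p_null, else a wider interval."""
    if rng.random() < p_null:
        return None
    return base_interval_fn(x, adjusted_level(alpha, p_null))


def toy_interval(x, level):
    """Hypothetical base method: half-width grows as the level approaches 1."""
    half_width = 1.0 / (1.0 - level)
    return (x - half_width, x + half_width)


rng = random.Random(0)
repeated = [prejudicial_interval(toy_interval, 0.0, 0.1, 0.05, rng) for _ in range(1000)]
distinct_outputs = {r is None for r in repeated}
```

Calling the procedure repeatedly on the same input yields both empty and non-empty answers, which is the instability the abstract's proposed metric is meant to expose. Whether the average length actually shrinks depends on how quickly the base interval widens with the level, which is the kind of condition the paper derives formally.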

2512.21315 2026-06-17 cs.LG cs.CV stat.ML 版本更新

Does the Data Processing Inequality Reflect Practice? On the Utility of Low-Level Tasks

数据处理不等式是否反映实践?论低级任务的有用性

Roy Turgeman, Tom Tirer

AI总结 本文研究低级处理(如去噪、编码)如何提升分类性能,证明在有限样本下存在预处理可提高准确率,并通过实验验证理论趋势。

Comments ICLR 2026 (camera-ready). Code is available at https://github.com/serveroy/process-before-you-classify

详情
AI中文摘要

数据处理不等式是一个信息论原理,指出信号的信息内容不能通过处理观测数据而增加。特别地,它表明在解决分类问题之前,增强信号或对其进行编码没有益处。对于最优贝叶斯分类器,这一断言可以被证明是正确的。然而,在实践中,尽管现代深度神经网络具有强大的能力,但在高级下游任务之前执行“低级”任务仍然很常见。在本文中,我们旨在理解低级处理何时以及为何对分类有益。我们提出了一个二元分类设置的综合理论研究,其中我们考虑一个与最优贝叶斯分类器紧密相连的分类器,并随着训练样本数量的增加而收敛到它。我们证明,对于任何有限数量的训练样本,存在一种预分类处理可以提高分类准确率。我们还探讨了类分离、训练集大小和类平衡对该过程相对增益的影响。我们通过理论设置的经验研究来支持我们的理论。最后,我们进行了一项实证研究,调查去噪和编码对基准数据集上实际深度分类器性能的影响。具体来说,我们改变了训练集的大小和类别分布以及噪声水平,并展示了与我们的理论结果一致的趋势。

英文摘要

The data processing inequality is an information-theoretic principle stating that the information content of a signal cannot be increased by processing the observations. In particular, it suggests that there is no benefit in enhancing the signal or encoding it before addressing a classification problem. This assertion can be proven to be true for the case of the optimal Bayes classifier. However, in practice, it is common to perform "low-level" tasks before "high-level" downstream tasks despite the overwhelming capabilities of modern deep neural networks. In this paper, we aim to understand when and why low-level processing can be beneficial for classification. We present a comprehensive theoretical study of a binary classification setup, where we consider a classifier that is tightly connected to the optimal Bayes classifier and converges to it as the number of training samples increases. We prove that for any finite number of training samples, there exists a pre-classification processing that improves the classification accuracy. We also explore the effect of class separation, training set size, and class balance on the relative gain from this procedure. We support our theory with an empirical investigation of the theoretical setup. Finally, we conduct an empirical study where we investigate the effect of denoising and encoding on the performance of practical deep classifiers on benchmark datasets. Specifically, we vary the size and class distribution of the training set, and the noise level, and demonstrate trends that are consistent with our theoretical results.

2512.13853 2026-06-17 cs.LG cond-mat.stat-mech math.PR stat.ML 版本更新

Dropout Neural Network Training Viewed from a Percolation Perspective

从逾渗视角看待Dropout神经网络训练

Finley Devlin, Jaron Sanders

AI总结 本文研究使用dropout训练深度神经网络时的逾渗现象,建立新逾渗模型刻画网络拓扑与路径问题的关系,揭示dropout中的逾渗效应及其可能导致训练崩溃的机制。

Comments 21 pages, 14 figures

详情
AI中文摘要

在这项工作中,我们研究了使用dropout训练深度神经网络(NNs)时逾渗的存在和影响。Dropout方法是训练NNs的正则化技术,由G. Hinton等人(2012)首次提出。这些方法在训练的每个阶段随机临时移除NN中的连接,并用随机梯度下降(SGD)更新剩余子网络。随机从网络中移除连接的过程类似于逾渗,这是统计物理的一个范式模型。如果dropout移除足够多的连接,使得NN的输入和输出之间没有路径,那么NN就无法根据数据做出预测。我们研究了模拟NN中dropout的新逾渗模型,并刻画了网络拓扑与该路径问题之间的关系。该理论证明了dropout中存在逾渗效应。我们还表明,在使用dropout训练无偏置NN时,这种逾渗效应可能导致训练崩溃;并且我们启发式地论证了这种崩溃也扩展到有偏置的NN。

英文摘要

In this work, we investigate the existence and effect of percolation in training deep Neural Networks (NNs) with dropout. Dropout methods are regularisation techniques for training NNs, first introduced by G. Hinton et al. (2012). These methods temporarily remove connections in the NN, randomly at each stage of training, and update the remaining subnetwork with Stochastic Gradient Descent (SGD). The process of removing connections from a network at random is similar to percolation, a paradigm model of statistical physics. If dropout were to remove enough connections such that there is no path between the input and output of the NN, then the NN could not make predictions informed by the data. We study new percolation models that mimic dropout in NNs and characterise the relationship between network topology and this path problem. The theory shows the existence of a percolative effect in dropout. We also show that this percolative effect can cause a breakdown when training NNs without biases with dropout; and we argue heuristically that this breakdown extends to NNs with biases.
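The path problem mentioned in the abstract can be explored with a small Monte Carlo sketch. It assumes a fully connected layered network in which every connection is kept independently with probability `keep_prob`; the function names and this particular masking scheme are illustrative assumptions, not the percolation models studied in the paper.

```python
import random


def has_input_output_path(widths, keep_prob, rng):
    """Sample one dropout mask and report whether input and output stay connected."""
    reachable = widths[0]  # number of units reachable from the input layer
    for fan_out in widths[1:]:
        next_reachable = 0
        for _ in range(fan_out):
            # A unit stays reachable if at least one edge from a reachable
            # unit in the previous layer survives the mask.
            if any(rng.random() < keep_prob for _ in range(reachable)):
                next_reachable += 1
        if next_reachable == 0:
            return False  # the layer is cut off, so no path reaches the output
        reachable = next_reachable
    return True


def connection_probability(widths, keep_prob, trials, seed=0):
    """Monte Carlo estimate of the probability that a path survives."""
    rng = random.Random(seed)
    hits = sum(has_input_output_path(widths, keep_prob, rng) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    widths = [4] * 12  # a deep, narrow network (hypothetical sizes)
    for keep_prob in (0.1, 0.3, 0.5, 0.7, 0.9):
        print(keep_prob, connection_probability(widths, keep_prob, 500))
```

Because the layers are fully connected, only the count of reachable units matters, which keeps the simulation short. Varying the depth and width shows how topology changes the chance that a mask disconnects the network.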

2512.11784 2026-06-17 cs.LG stat.ML 版本更新

Softmax as Linear Attention in the Large-Prompt Regime: a Measure-based Perspective

大提示词机制下的Softmax作为线性注意力:基于测度的视角

Etienne Boursier, Claire Boyer

AI总结 提出基于测度的框架,证明在无限提示词极限下softmax注意力收敛到线性算子,并给出有限提示词下的非渐近集中界,从而将线性注意力的优化分析迁移到大提示词下的softmax注意力。

详情
AI中文摘要

Softmax注意力是Transformer架构的核心组成部分,但其非线性结构给理论分析带来了重大挑战。我们开发了一个统一的、基于测度的框架,用于研究有限和无限提示词下的单层softmax注意力。对于独立同分布的高斯输入,我们利用softmax算子在大提示词极限下收敛到作用于底层输入标记测度的线性算子这一事实。基于这一见解,我们建立了softmax注意力输出和梯度的非渐近集中界,量化了有限提示词模型接近其无限提示词对应模型的速度,并证明了在具有次高斯标记的一般上下文学习设置中,这种集中性在整个训练轨迹上保持稳定。在线性回归的上下文学习中,我们利用易处理的无限提示词动力学来分析有限提示词长度下的训练。我们的结果表明,当提示词足够长时,为线性注意力开发的优化分析可以直接迁移到softmax注意力上,表明大提示词下的softmax注意力继承了其线性对应物的分析结构。这反过来为研究大提示词机制下softmax注意力层的训练动力学和统计行为提供了一个有原则且广泛适用的工具包。

英文摘要

Softmax attention is a central component of transformer architectures, yet its nonlinear structure poses significant challenges for theoretical analysis. We develop a unified, measure-based framework for studying single-layer softmax attention under both finite and infinite prompts. For i.i.d. Gaussian inputs, we lean on the fact that the softmax operator converges in the infinite-prompt limit to a linear operator acting on the underlying input-token measure. Building on this insight, we establish non-asymptotic concentration bounds for the output and gradient of softmax attention, quantifying how rapidly the finite-prompt model approaches its infinite-prompt counterpart, and prove that this concentration remains stable along the entire training trajectory in general in-context learning settings with sub-Gaussian tokens. In the case of in-context linear regression, we use the tractable infinite-prompt dynamics to analyze training at finite prompt length. Our results allow optimization analyses developed for linear attention to transfer directly to softmax attention when prompts are sufficiently long, showing that large-prompt softmax attention inherits the analytical structure of its linear counterpart. This, in turn, provides a principled and broadly applicable toolkit for studying the training dynamics and statistical behavior of softmax attention layers in large prompt regimes.
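A toy numerical check can make the large-prompt limit concrete. The setup below is an assumption chosen for simplicity and is not the paper's framework: keys and values are both the raw tokens, the tokens are i.i.d. standard Gaussian, and all weight matrices are the identity. In that case exponential tilting of a standard Gaussian gives $\mathbb{E}[x e^{q^\top x}] / \mathbb{E}[e^{q^\top x}] = q$, so the attention output should approach the query itself, a linear map, as the prompt grows.

```python
import math
import random


def softmax_attention(query, tokens):
    """Single-query softmax attention with keys = values = tokens."""
    scores = [sum(q * t for q, t in zip(query, tok)) for tok in tokens]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(query)
    return [sum(w * tok[j] for w, tok in zip(weights, tokens)) / total
            for j in range(dim)]


if __name__ == "__main__":
    rng = random.Random(0)
    query = [0.5, -0.3]
    for n_tokens in (10, 100, 1000, 20000):
        tokens = [[rng.gauss(0.0, 1.0) for _ in query] for _ in range(n_tokens)]
        out = softmax_attention(query, tokens)
        err = math.sqrt(sum((o - q) ** 2 for o, q in zip(out, query)))
        print(n_tokens, err)  # distance to the linear limit
```

The printed distance is the finite-prompt deviation from the limiting linear operator, which is the quantity the abstract's non-asymptotic concentration bounds control in far greater generality.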

2407.02458 2026-06-17 math.ST stat.ML 版本更新

Statistical Advantages of Oblique Randomized Decision Trees and Forests

倾斜随机决策树与森林的统计优势

Eliza O'Reilly

AI总结 本文利用随机几何中的随机镶嵌理论,分析基于协变量线性组合特征的倾斜随机决策树与森林估计量的统计性质,证明其在高维多指标模型中的二次风险界与收敛速度,并揭示轴对齐方法的次优性。

Comments 45 pages, 2 figures

详情
AI中文摘要

本文研究了在随机决策树和森林回归算法中,使用由协变量的一般线性组合构成的特征来划分数据的统计含义。利用随机几何中的随机镶嵌理论,我们对一类高效生成的随机树和森林估计量进行了理论分析,这些估计量允许沿此类特征进行倾斜分割。我们将这些估计量称为倾斜Mondrian树和森林,因为树的生成过程是:首先从协变量的线性组合中选择一组特征,然后运行一个Mondrian过程,该过程沿这些特征层次地划分数据。对于降维的多指标模型的灵活函数类(其中输出假定依赖于输入域的低维相关特征子空间),我们得到了二次风险界和收敛速度。结果突出了这些估计量的风险如何依赖于特征的选择,并量化了风险相对于用于分割数据的选定特征与真实相关特征子空间之间误差的鲁棒性。渐近分析还提供了估计的相关特征集必须满足的收敛速度条件,以使倾斜Mondrian估计量相对于相关特征子空间的维度达到极小极大最优收敛速度。此外,我们得到了轴对齐Mondrian树(其中特征限制在协变量集合内)的风险下界,证明了无论用于在每个树节点划分数据的协变量分布如何加权,这些估计量对于一般的岭函数都是次优的。

英文摘要

This work studies the statistical implications of using features comprised of general linear combinations of covariates to partition the data in randomized decision tree and forest regression algorithms. Using random tessellation theory in stochastic geometry, we provide a theoretical analysis of a class of efficiently generated random tree and forest estimators that allow for oblique splits along such features. We call these estimators oblique Mondrian trees and forests, as the trees are generated by first selecting a set of features from linear combinations of the covariates and then running a Mondrian process that hierarchically partitions the data along these features. Quadratic risk bounds and convergence rates are obtained for the flexible function class of multi-index models for dimension reduction, where the output is assumed to depend on a low-dimensional relevant feature subspace of the input domain. The results highlight how the risk of these estimators depends on the choice of features and quantify how robust the risk is with respect to error between the selected features along which the data is split and the true relevant feature subspace. The asymptotic analysis also provides conditions on the convergence rate a set of estimated relevant features must satisfy for oblique Mondrian estimators to obtain minimax optimal rates of convergence with respect to the dimension of the relevant feature subspace. Additionally, a lower bound on the risk of axis-aligned Mondrian trees (where features are restricted to the set of covariates) is obtained, proving that these estimators are suboptimal for general ridge functions, no matter how the distribution over the covariates used to divide the data at each tree node is weighted.

2510.19528 2026-06-17 stat.ML cs.LG math.ST 版本更新

Learning Upper Lower Value Envelopes to Shape Online RL: A Principled Approach

学习上下值包络以塑造在线强化学习:一种原则性方法

Sebastian Reboul, Hélène Halconruy

AI总结 提出一种两阶段框架,利用离线数据学习值函数的上下界,并将其融入在线算法,通过解耦上下界实现更灵活紧致的近似,理论分析给出高概率遗憾界,实验表明显著降低遗憾。

Comments 35 pages, 5 figures

详情
AI中文摘要

我们研究了利用离线数据加速在线强化学习这一基本问题——该方向潜力巨大但理论基础有限。我们的研究聚焦于如何在此背景下\emph{学习}和\emph{应用}值包络。为此,我们引入了一个原则性的两阶段框架:第一阶段使用离线数据推导值函数的上下界,第二阶段将这些学习到的界融入在线算法。我们的方法通过解耦上下界扩展了先前工作,实现了更灵活和紧致的近似。与依赖固定塑形函数的方法不同,我们的包络是数据驱动的,并明确建模为随机变量,通过过滤论证确保各阶段的独立性。分析建立了由两个可解释量决定的高概率遗憾界,从而为离线预训练和在线微调之间提供了形式化的桥梁。在表格型MDP上的实验结果表明,与UCBVI和先前方法相比,我们的方法显著降低了遗憾,同时与相关方法保持竞争力。

英文摘要

We investigate the fundamental problem of leveraging offline data to accelerate online reinforcement learning - a direction with strong potential but limited theoretical grounding. Our study centers on how to \emph{learn} and \emph{apply} value envelopes within this context. To this end, we introduce a principled two-stage framework: the first stage uses offline data to derive upper and lower bounds on value functions, while the second incorporates these learned bounds into online algorithms. Our method extends prior work by decoupling the upper and lower bounds, enabling more flexible and tighter approximations. In contrast to approaches that rely on fixed shaping functions, our envelopes are data-driven and explicitly modeled as random variables, with a filtration argument ensuring independence across phases. The analysis establishes high-probability regret bounds determined by two interpretable quantities, thereby providing a formal bridge between offline pre-training and online fine-tuning. Empirical results on tabular MDPs demonstrate substantial regret reductions compared with both UCBVI and prior methods while remaining competitive with related approaches.

2508.19445 2026-06-17 cs.LG stat.ML 版本更新

On Surjectivity of Neural Networks: Can you elicit any behavior from your model?

论神经网络的满射性:你能从模型中诱导出任何行为吗?

Haozhe Jiang, Nika Haghtalab

AI总结 本文证明现代神经网络架构(如预层归一化和线性注意力模块)几乎总是满射,意味着任何输出(包括有害内容)原则上都可生成,揭示了模型在对抗攻击下的固有脆弱性。

Comments Blog: https://astro-eric.github.io/blogs/surjective/

详情
AI中文摘要

给定一个训练好的神经网络,是否可以通过某些输入生成任意指定的输出?等价地,该网络对应的函数是否是满射的?在生成模型中,满射性意味着任何输出,包括有害或不良内容,原则上都可以由网络生成,引发了对模型安全和越狱漏洞的担忧。在本文中,我们证明了现代神经架构的许多基本构建模块,例如具有预层归一化和线性注意力模块的网络,几乎总是满射的。作为推论,广泛使用的生成框架,包括GPT风格的Transformer和具有确定性ODE求解器的扩散模型,允许对任意输出进行逆映射。通过研究这些现代且常用的神经架构的满射性,我们提供了一个形式化方法,揭示了它们对广泛对抗攻击类别的不可避免的脆弱性。

英文摘要

Given a trained neural network, can any specified output be generated by some input? Equivalently, does the network correspond to a function that is surjective? In generative models, surjectivity implies that any output, including harmful or undesirable content, can in principle be generated by the networks, raising concerns about model safety and jailbreak vulnerabilities. In this paper, we prove that many fundamental building blocks of modern neural architectures, such as networks with pre-layer normalization and linear-attention modules, are almost always surjective. As corollaries, widely used generative frameworks, including GPT-style transformers and diffusion models with deterministic ODE solvers, admit inverse mappings for arbitrary outputs. By studying surjectivity of these modern and commonly used neural architectures, we contribute a formalism that sheds light on their unavoidable vulnerability to a broad class of adversarial attacks.

2502.18049 2026-06-17 stat.ML cs.LG 版本更新

Recursive Learning Without Collapse: A Weighting-Based Stabilization Framework

无崩溃的递归学习:基于加权的稳定化框架

Hengzhi He, Shirong Xu, Guang Cheng

AI总结 针对递归生成模型训练中的模型崩溃问题,提出基于加权的训练策略,在混合真实与合成数据场景下,理论推导出最优加权方案的统一表达式,揭示合成数据利用与模型性能间的权衡。

Comments This article has been accepted for publication in Journal of the Royal Statistical Society: Series B, published by Oxford University Press

详情
AI中文摘要

最近的研究发现了递归生成模型训练中的一个有趣现象,称为模型崩溃,即基于先前模型生成的数据训练的模型表现出严重的性能下降。解决这一问题并开发更有效的训练策略已成为生成模型研究的核心挑战。在本文中,我们在一个新框架下研究这一现象,其中生成模型在每一步迭代中基于新收集的真实数据和上一步的合成数据的组合进行训练。为了开发整合真实和合成数据的最优训练策略,我们评估了加权训练方案在各种场景下的性能,包括高斯分布估计、广义线性模型和非参数估计。我们从理论上刻画了合成数据的混合比例和加权方案对最终模型性能的影响。我们的关键发现是,在不同设置下,不同合成数据比例下的最优加权方案渐近地遵循一个统一表达式,揭示了利用合成数据与模型性能之间的基本权衡。在某些情况下,分配给真实数据的最优权重对应于黄金比例的倒数。最后,我们在大量模拟数据集和一个真实表格数据集上验证了我们的理论结果。

英文摘要

Recent studies identified an intriguing phenomenon in recursive generative model training known as model collapse, where models trained on data generated by previous models exhibit severe performance degradation. Addressing this issue and developing more effective training strategies have become central challenges in generative model research. In this paper, we investigate this phenomenon within a novel framework, where generative models are iteratively trained on a combination of newly collected real data and synthetic data from the previous training step. To develop an optimal training strategy for integrating real and synthetic data, we evaluate the performance of a weighted training scheme in various scenarios, including Gaussian distribution estimation, generalized linear models, and nonparametric estimation. We theoretically characterize the impact of the mixing proportion and weighting scheme of synthetic data on the final model's performance. Our key finding is that, across different settings, the optimal weighting scheme under different proportions of synthetic data asymptotically follows a unified expression, revealing a fundamental trade-off between leveraging synthetic data and model performance. In some cases, the optimal weight assigned to real data corresponds to the reciprocal of the golden ratio. Finally, we validate our theoretical results on extensive simulated datasets and a real tabular dataset.

8. 生物统计与医学统计 10 篇

2606.18146 2026-06-17 stat.ME 新提交

Spatial Disease Mapping and Disparity Detection Using Generative AI: An Amortized Bayesian Learning Framework

使用生成式AI的空间疾病映射与差异检测:一种摊销贝叶斯学习框架

Luca Aiello, Sudipto Banerjee

AI总结 提出一种摊销贝叶斯框架,通过神经网络近似后验分布,实现跨不同区域图的空间边界检测,并在呼吸疾病和肺癌数据中验证其有效性。

详情
AI中文摘要

我们引入了一个用于空间边界检测的摊销贝叶斯框架,该框架能够推广到具有不同区域数量和多样邻接结构的区域图上的后验推断。底层模型将泊松计数似然与协变量驱动的规则相结合,以中断跨不相似相邻区域的平滑,并利用有向无环图自回归(DAGAR)先验来捕捉残差空间依赖性。为了逼近目标后验分布,我们在模拟地图上训练了一个神经引擎:一个置换不变摘要网络编码观测计数、偏移量、协变量和邻接矩阵的图感知表示,而一个条件归一化流生成近似的后验样本。模拟研究证明了准确的参数恢复、接近名义水平的区间覆盖、良好校准的后验预测行为以及信息丰富的后验边界概率。与马尔可夫链蒙特卡洛(MCMC)的基准测试证实了在主要边界证据上的紧密一致性,而消融研究验证了包含模型引导的图摘要的有效性。最后,应用于格拉斯哥呼吸系统疾病和加利福尼亚肺癌数据表明,一个训练好的神经引擎可以无缝部署到具有不同图结构的真实世界地图上,产生的边界结论与已建立的局部平滑分析一致。

英文摘要

We introduce an amortized Bayesian framework for spatial boundary detection that generalizes posterior inference across areal graphs with varying numbers of regions and diverse adjacency structures. The underlying model couples a Poisson count likelihood with a covariate-driven rule to interrupt smoothing across dissimilar neighboring areas, utilizing a directed acyclic graph autoregressive (DAGAR) prior to capture residual spatial dependence. To approximate the target posterior distribution, a neural engine is trained on simulated maps: a permutation-invariant summary network encodes graph-aware representations of the observed counts, offsets, covariates, and adjacency matrices, while a conditional normalizing flow generates the approximate posterior draws. Simulation studies demonstrate accurate parameter recovery, near-nominal interval coverage, well-calibrated posterior predictive behavior, and informative posterior boundary probabilities. Benchmarking against Markov chain Monte Carlo (MCMC) confirms close agreement regarding primary boundary evidence, and an ablation study validates the inclusion of model-guided graph summaries. Finally, applications to Glasgow respiratory disease and California lung cancer data demonstrate that a single trained neural engine can be seamlessly deployed across real-world maps with distinct graph structures, yielding boundary conclusions consistent with established localized smoothing analyses.

2606.18139 2026-06-17 stat.ME 新提交

Bayesian Threshold-Aligned Joint Disease Progression Modeling for Alzheimer's Disease

贝叶斯阈值对齐的阿尔茨海默病联合疾病进展建模

Rong Wu, Duygu Tosun, Isabella Hausle, Margo Heston, Aaron Wolfe Scheffler

AI总结 提出贝叶斯阈值对齐联合疾病进展模型(B-TAJ DPM),通过半参数框架将生物标志物轨迹与认知障碍生存终点联合建模,并锚定于阳性阈值,以揭示异质性进展模式。

详情
AI中文摘要

阿尔茨海默病的特征是淀粉样蛋白-β和tau蛋白的逐渐积累,数年后出现认知障碍。尽管存在这一既定模式,但在病理进展年龄和认知症状发作方面存在显著的主体间变异性。为了理解这种变异的来源,需要通过联合建模疾病进展和认知障碍发生时间(参考标志性阳性阈值)的框架,将主体对齐到异质性疾病时间线上。现有的神经退行性疾病进展模型依赖于限制性参数形式,未能将疾病时间线锚定于阳性阈值,并且将生物标志物轨迹与认知生存终点分离。为了解决这些局限性,我们引入了贝叶斯阈值对齐联合疾病进展模型(B-TAJ DPM)。这个生成式半参数框架在潜在疾病时间线上建模多变量疾病进展轨迹,这些轨迹锚定于标志性阳性阈值。关键的是,该框架整合了一个生存模型,将病理进展与认知障碍联系起来。后验推断和对未见主体的后验预测在开源软件中实现。模拟研究显示出优异的估计精度和区间覆盖率。当应用于阿尔茨海默病神经影像学倡议数据时,B-TAJ DPM刻画了非线性进展模式,量化了主体间阳性年龄的变异性,并揭示了tau阳性年龄与认知障碍加速之间的联系。

英文摘要

Alzheimer's disease is characterized by the progressive accumulation of amyloid-$\beta$ and tau followed years later by cognitive impairment. Despite this established motif, substantial subject-level variability exists in the age of pathological progression and the onset of cognitive symptoms. To understand the source of this variation, subjects must be aligned across heterogeneous disease timelines via frameworks that jointly model disease progression and time to cognitive impairment with reference to landmark positivity thresholds. Existing neurodegenerative disease progression models rely on restrictive parametric forms, fail to anchor disease timelines to positivity thresholds, and decouple biomarker trajectories from cognitive survival endpoints. To address these limitations, we introduce the Bayesian Threshold-Aligned Joint Disease Progression Model (B-TAJ DPM). This generative, semi-parametric framework models multivariate disease progression trajectories over latent disease timelines anchored at landmark positivity thresholds. Crucially, the framework integrates a survival model to link pathological progression to cognitive impairment. Posterior inference and posterior predictions for unseen subjects are carried out in open-source software. Simulation studies demonstrate excellent estimation accuracy and interval coverage. When applied to Alzheimer's Disease Neuroimaging Initiative data, B-TAJ DPM characterizes non-linear progression patterns, quantifies subject-level variation in positivity age, and reveals links between age of tau positivity and acceleration of cognitive impairment.

2606.17923 2026-06-17 stat.ME 新提交

Spatial mixed models for assessing environmental exposure effects on the microbiome

评估环境暴露对微生物组影响的空间混合模型

Sooran Kim, Chan Wang, Soyoung Kwak, Fares Darawshy, Alexander Bain, Leopoldo N. Segal, Jiyoung Ahn, Huilin Li

AI总结 提出一种空间混合模型框架,利用条件自回归先验同时处理区域空间依赖和分类群生态依赖,在特征选择中实现高检测功率和低假阳性率,应用于PM2.5暴露研究识别相关菌属。

详情
AI中文摘要

环境暴露(如空气污染)对人类健康的影响日益受到重视。越来越多的证据表明,微生物组可能介导这些效应,从而解释环境与宿主生物学之间的关系。然而,环境暴露对微生物组的影响尚未完全明确,且该背景下的统计建模面临复杂依赖结构的挑战。具体而言,微生物组数据在采样区域间表现出空间依赖性,以及微生物分类群间的生态相关性,若忽略这些依赖,会显著降低检测能力,导致遗漏真实信号。我们提出了一种新颖的微生物组数据空间混合建模框架,该框架利用条件自回归先验同时考虑区域级空间依赖和分类群级生态依赖。通过模拟,我们证明该框架优于忽略此类依赖的现有方法,在特征选择中实现高检测功率,同时保持低假阳性率并降低估计均方误差。应用于两项真实研究——食品与微生物组纵向调查研究数据和肺微生物组数据集,其中涉及细颗粒物(PM2.5)暴露,我们的模型识别出已知与污染相关健康结果有关的菌属,以及可能介导宿主对空气污染反应的新分类群。这一新颖方法为揭示复杂环境数据中具有生物学意义的关联提供了强大而灵活的工具。

英文摘要

The influence of environmental exposures, such as air pollution, on human health has become increasingly recognized. A growing body of evidence suggests that the microbiome may mediate these effects, explaining the relationship between the environment and host biology. However, the impact of environmental exposures on the microbiome is not yet fully understood, and statistical modeling in this context is challenged by complex dependency structures. In particular, microbiome data exhibit spatial dependencies across sampling regions as well as ecological correlations among microbial taxa, which, if ignored, can substantially reduce detection power, leading to missed true signals. We introduce a novel spatial mixed modeling framework for microbiome data that accounts for both region-level spatial dependency and taxon-level ecological dependency using conditional autoregressive priors. Through simulations, we demonstrate that this framework outperforms existing methods that ignore such dependencies by achieving high detection power in feature selection while maintaining low false positive rates and reduced mean squared error in estimation. Applied to two real studies with fine particulate matter (PM$_{2.5}$) exposures (data from the Food and Microbiome Longitudinal Investigation study and a lung microbiome dataset), our model identified genera known to be involved in pollution-related health outcomes, as well as novel taxa that may mediate host responses to air pollution. This novel approach offers a powerful and flexible tool for uncovering biologically meaningful associations in complex environmental data.

2606.17841 2026-06-17 stat.ME 新提交

Subgroup analysis in randomized controlled trials with binary outcomes: dilution and logic-respecting properties

二元结局随机对照试验中的亚组分析:稀释与逻辑一致性性质

Long-Hao Xu, Yang Han, Tim Friede

AI总结 研究二元结局随机对照试验中亚组分析的比值比和相对响应的性质,证明比值比不适合作为疗效指标而相对响应合适,并阐明两者在逻辑一致性和稀释性质上的差异。

详情
AI中文摘要

亚组分析在随机对照试验中常规用于检验治疗效果在患者亚组间是否同质或由于治疗效应异质性而不同。本文研究了二元结局亚组分析中比值比和相对响应的性质,通过新的理论见解和方法学发展扩展了先前的工作。我们建立了几个新定理,描述了当两个亚组合并时,总体人群的比值比在大小和方向上如何变化。这些结果进一步证实了比值比不适合作为该亚组设置中的疗效指标,而相对响应是合适的。我们还提出了比值比和相对响应之间的正式关系,并阐明了它们在逻辑一致性性质(即总体疗效是否介于亚组疗效之间)和稀释性质(即混合亚组是否使总体比值比向1移动)方面的差异。尽管比值比通常不具有逻辑一致性,但在某些条件下它可能近似表现为具有逻辑一致性的疗效指标。为了说明我们的发现,我们基于临床试验数据给出了一个说明性示例,并讨论了其对随机对照试验中亚组分析的意义。

英文摘要

Subgroup analysis is routinely used in randomized controlled trials to examine whether treatment effects are homogeneous across patient subgroups or differ because of treatment-effect heterogeneity. In this paper, we investigate the properties of the odds ratio and the relative response in subgroup analyses with binary outcomes, extending previous work with new theoretical insights and methodological developments. We establish several new theorems that characterize how the odds ratio for the overall population changes in both magnitude and direction when two subgroups are combined. These results further confirm that the odds ratio is inappropriate as an efficacy measure in this subgroup setting, whereas the relative response is appropriate. We also present the formal relationship between the odds ratio and the relative response, and clarify their differences in terms of the logic-respecting property, that is, whether the overall efficacy lies between the subgroup efficacies, and the dilution property, that is, whether mixing subgroups moves the overall odds ratio toward 1. Although the odds ratio is generally not logic-respecting, it may behave approximately like a logic-respecting efficacy measure under certain conditions. To illustrate our findings, we present an illustrative example based on clinical trial data and discuss its implications for subgroup analysis in randomized controlled trials.
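A small numerical example can illustrate the logic-respecting property defined in the abstract. The response probabilities below are made up for illustration, and "relative response" is taken here to mean the ratio of response probabilities, which is an assumption about the paper's definition. Both subgroups are assumed to have the same share in each arm, as randomization would give in expectation.

```python
def odds_ratio(p_treat, p_ctrl):
    """Odds ratio of response, treatment versus control."""
    return (p_treat / (1.0 - p_treat)) / (p_ctrl / (1.0 - p_ctrl))


def relative_response(p_treat, p_ctrl):
    """Ratio of response probabilities (assumed definition)."""
    return p_treat / p_ctrl


def pooled(p_a, p_b, share_a):
    """Response probability in the mixture of subgroups A and B."""
    return share_a * p_a + (1.0 - share_a) * p_b


if __name__ == "__main__":
    # Hypothetical response probabilities: (treatment, control) per subgroup.
    sub_a = (0.5, 0.2)
    sub_b = (0.8, 0.5)
    share_a = 0.5  # same subgroup mix in both arms
    overall = (pooled(sub_a[0], sub_b[0], share_a),
               pooled(sub_a[1], sub_b[1], share_a))
    for name, (pt, pc) in (("A", sub_a), ("B", sub_b), ("overall", overall)):
        print(name, round(odds_ratio(pt, pc), 3), round(relative_response(pt, pc), 3))
```

In this example both subgroup odds ratios are about 4, yet the overall odds ratio is about 3.45, so it lies outside the subgroup range and has moved toward 1. The relative responses are 2.5 and 1.6 with an overall value of about 1.86, which falls between them.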

2605.19772 2026-06-17 stat.ME 版本更新

Assessing covariate-adjusted risk differences in small-sample clinical trials

评估小样本临床试验中协变量调整的风险差异

Martin Schnuerch, Alex Ocampo, Klaus Kähler Holst, Christian Stock

AI总结 针对小样本(N≤150)随机临床试验,通过模拟研究比较精确无条件检验、Mantel-Haenszel方法和g计算法在估计和检验风险差异时的表现,发现g计算法在极小样本中I类错误膨胀,而稳健或惩罚变体可改善错误控制但牺牲功效,经典方法稳健但效率较低,并基于结果提供方法选择建议。

Comments 22 pages, 3 figures

详情
AI中文摘要

二元终点在临床试验中很常见,传统上使用条件优势比来评估治疗效果。然而,优势比的解释困难,不可压缩,且依赖于强假设才能成为试验的相关总体汇总指标。作为替代,风险差异作为更可解释、临床意义更明确且假设更少的治疗效果度量,日益受到重视。这一转变也受到新监管指南的推动,该指南强调边际估计量的相关性并鼓励协变量调整。然而,风险差异的协变量调整推断,特别是在小样本中,存在方法学上的细微差别,且缺乏公认的最佳实践。我们进行了一项模拟研究,比较了在小样本(N≤150)随机临床试验中,存在预后分类基线协变量时,估计和检验风险差异的方法,重点关注精确无条件检验、Mantel-Haenszel方法和g计算(标准化)方法。我们发现,当应用标准Wald型推断时,几种g计算方法在极小样本中表现出I类错误膨胀,而稳健或惩罚变体以牺牲功效为代价改善了错误控制。经典方法如Mantel-Haenszel和Suissa-Shuster检验保持稳健,但可能放弃协变量调整带来的效率提升。总体而言,我们的结果表明,观察到的I类错误膨胀很大程度上反映了估计量与方差估计之间的错配,而非仅由小样本量导致。基于这些结果,我们提供了实用建议,以指导方法选择,使估计量、方差估计和推断目标保持一致。

英文摘要

Binary endpoints are common in clinical trials, and conditional odds ratios have traditionally been used to assess treatment effects. However, odds ratios are difficult to interpret, are non-collapsible, and rely on strong assumptions in order to be a relevant overall summary measure for the trial. As an alternative, risk differences have gained increasing prominence as a more interpretable, clinically meaningful and assumption-lean measure of treatment effects. This shift has also been motivated by new regulatory guidance, which emphasizes the relevance of marginal estimands and encourages covariate adjustment. Yet, covariate-adjusted inference for risk differences, particularly in smaller samples, has methodological subtleties and lacks well-established best practices. We conduct a simulation study comparing methods for estimating and testing risk differences in small-sample ($N \leq 150$) randomized clinical trials with prognostic categorical baseline covariates, focusing on exact unconditional tests, Mantel-Haenszel methods, and $g$-computation (standardization) approaches. We find that several $g$-computation approaches exhibit inflated Type-I error in very small samples when standard Wald-type inference is applied, whereas robust or penalized variants improve error control at the expense of power. Classical methods such as the Mantel-Haenszel and Suissa-Shuster tests remain robust but may forgo efficiency gains from covariate adjustment. Overall, our results indicate that much of the observed Type-I error inflation reflects misalignment between estimand and variance estimation rather than small sample size alone. Based on these results, we provide practical recommendations to guide method selection that align the estimand, variance estimation, and inferential target.
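Two of the point estimators named in the abstract can be sketched for a single categorical covariate. The counts are invented, and the sketch makes a simplifying assumption: with a saturated outcome model, $g$-computation reduces to direct standardization over the pooled covariate distribution, whereas the paper's variants may use other working models. Variance estimation, which the abstract identifies as the delicate part, is deliberately left out.

```python
def standardized_risk_difference(strata):
    """g-computation with a saturated model: stratum risk differences
    averaged over the pooled covariate distribution."""
    total = sum(n1 + n0 for _, n1, _, n0 in strata)
    rd = 0.0
    for e1, n1, e0, n0 in strata:
        rd += (n1 + n0) / total * (e1 / n1 - e0 / n0)
    return rd


def mantel_haenszel_risk_difference(strata):
    """Mantel-Haenszel risk difference with weights n1 * n0 / (n1 + n0)."""
    num = 0.0
    den = 0.0
    for e1, n1, e0, n0 in strata:
        w = n1 * n0 / (n1 + n0)
        num += w * (e1 / n1 - e0 / n0)
        den += w
    return num / den


if __name__ == "__main__":
    # Hypothetical data per stratum:
    # (events treated, n treated, events control, n control)
    strata = [(8, 20, 4, 20), (16, 40, 6, 20)]
    print(standardized_risk_difference(strata))
    print(mantel_haenszel_risk_difference(strata))
```

With 1:1 allocation in every stratum the two weightings coincide. The second stratum here is unbalanced so that the difference between the estimators shows up.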

2604.26272 2026-06-17 stat.ME 版本更新

TWICEBEE: A Two-stage Intra-patient Curve-free Bayesian Decision-Theoretic Dose Escalation Design

TWICEBEE: 一种两阶段患者内无曲线贝叶斯决策理论剂量递增设计

Dehua Bi, Katherine Ryan, Sabine Heitzeneder, Zina Good, John S. Tamaresis, Robert Lowsky, Michelle Monje, Crystal Mackall, Ying Lu

AI总结 针对多周期免疫治疗中毒性随周期递减的特点,提出两阶段患者内剂量递增设计,结合加速滴定与改进的无曲线贝叶斯决策理论框架,实现安全高效的剂量探索。

详情
AI中文摘要

我们提出了一种新颖的I期患者内剂量递增设计,专门针对多周期免疫治疗环境,其中固定剂量水平的毒性在临床预期中会随着后续治疗周期而降低。该设计源于一项CAR T细胞疗法的I期试验,这是一种新兴的细胞免疫疗法,已在癌症中确立应用,并在自身免疫性疾病中日益受到研究。该设计适用于临床上认为周期特异性毒性非递增假设合理的情况。具体而言,我们基于针对两药试验的改进无曲线贝叶斯决策理论(c-CFBD)设计的外推性质(Xu等人,2025),将治疗周期视为第二维度。通过重新定义偏序,c-CFBD框架可以适应跨周期的毒性降低。所提出的设计采用两阶段结构:初始加速滴定阶段以快速探索剂量水平,随后是c-CFBD阶段以提高安全性并估计周期特异性最大耐受剂量序列。跨多种场景的模拟研究显示了良好的操作特性。

英文摘要

We propose a novel Phase I intra-patient dose-escalation design tailored for multi-cycle immunotherapy settings, in which toxicity at a fixed dose level is clinically expected to decrease over successive treatment cycles. This design was motivated by a Phase I trial of CAR T cell therapy, an emerging cellular immunotherapy with established applications in cancer and growing investigation in autoimmune disease. The design is intended for settings in which a nonincreasing cycle-specific toxicity assumption is clinically justified. Specifically, we build on the extrapolation property of the modified curve-free Bayesian decision-theoretic (c-CFBD) design for two-agent trials (Xu et al., 2025), treating treatment cycle as a second dimension. By redefining the partial order, the c-CFBD framework can accommodate the reduction in toxicity across cycles. The proposed design adopts a two-stage structure: an initial accelerated titration stage to rapidly explore dose levels, followed by a c-CFBD stage to improve safety and estimate the cycle-specific maximum tolerated dose sequence. Simulation studies across a range of scenarios demonstrate favorable operating characteristics.

2301.07386 2026-06-17 q-bio.NC stat.AP 版本更新

Hierarchical Bayesian inference for community detection and connectivity of functional brain networks

功能脑网络社区检测与连接性的层次贝叶斯推断

Lingbin Bian, Nizhuan Wang, Leonardo Novelli, Jonathan Keith, Adeel Razi

AI总结 提出基于贝叶斯潜在块模型的多层社区检测方法,在个体和群体层面稳健检测加权功能网络社区结构,保留个体变异性,并通过模拟和真实fMRI数据验证其准确性和可靠性。

详情
AI中文摘要

大多数功能性磁共振成像研究依赖于对层级组织的功能脑网络的估计,这些网络的分隔与整合反映了人类的认知和行为变化。然而,现有的从个体和群体层面分析方法中估计网络社区结构的大多数方法并未考虑受试者之间的变异性。在本文中,我们开发了一种基于贝叶斯潜在块模型(LBM)的新型多层社区检测方法。该方法能够在个体和群体层面稳健地检测具有未知社区数量的加权功能网络的社区结构,并保留个体网络的变异性。为了验证,我们提出了一种新的基于社区结构的多元高斯生成模型来模拟合成信号。我们的模拟研究表明,通过层次贝叶斯推断估计的社区成员身份与生成模型中预定义的节点标签一致。该方法还通过使用人类连接组项目中100名无关健康受试者的工作记忆任务fMRI数据的分半可重复性进行了测试。使用合成数据和真实数据的分析表明,与常用的(多层)模块性模型相比,我们提出的方法更准确、更可靠。

英文摘要

Most functional magnetic resonance imaging studies rely on estimates of hierarchically organized functional brain networks whose segregation and integration reflect the cognitive and behavioral changes in humans. However, most existing methods for estimating the community structure of networks in both individual- and group-level analyses do not account for the variability between subjects. In this paper, we develop a new multilayer community detection method based on a Bayesian latent block model (LBM). The method can robustly detect the community structure of weighted functional networks with an unknown number of communities at both individual and group levels and retain the variability of the individual networks. For validation, we propose a new community structure-based multivariate Gaussian generative model to simulate synthetic signals. Our simulation study shows that the community memberships estimated by hierarchical Bayesian inference are consistent with the predefined node labels in the generative model. The method is also tested via split-half reproducibility using working memory task fMRI data of 100 unrelated healthy subjects from the Human Connectome Project. Analyses using both synthetic and real data show that our proposed method is more accurate and reliable compared with the commonly used (multilayer) modularity models.

2601.11735 2026-06-17 stat.ME 版本更新

Identifying Conditions Favouring Multiplicative Heterogeneity Models in Network Meta-Analysis

识别网络荟萃分析中支持乘性异质性模型的条件

Xinlei Xu, Caitlin H Daly, Audrey Béliveau

AI总结 通过比较加性随机效应与乘性效应模型在nmadb数据库中的表现,发现乘性模型在拟合优度上相当或更优,且对极端观测和发表偏倚更稳健。

详情
AI中文摘要

在网络荟萃分析(NMA)中,对研究间异质性进行显式建模对于确保有效推断和避免夸大精度至关重要。虽然加性随机效应(RE)模型是传统方法,但乘性效应(ME)模型仍未得到充分探索。ME模型通过加权最小二乘法估计的共同因子膨胀研究内方差,产生与固定效应模型相同的点估计,同时膨胀置信区间。我们基于nmadb数据库中具有显著异质性的两臂研究NMA,实证比较了RE和ME模型,并使用Akaike信息准则评估模型拟合。ME模型通常提供与RE模型相当或更好的拟合。案例研究进一步揭示,RE模型对极端和不精确的观测敏感,而ME模型对此类观测赋予较小权重,因此对发表偏倚表现出更大的稳健性。我们的结果表明,在NMA实践中,ME模型值得与常规RE模型一同考虑。

英文摘要

Explicit modelling of between-study heterogeneity is essential in network meta-analysis (NMA) to ensure valid inference and avoid overstating precision. While the additive random-effects (RE) model is the conventional approach, the multiplicative-effect (ME) model remains underexplored. The ME model inflates within-study variances by a common factor estimated via weighted least squares, yielding point estimates identical to those of a fixed-effect model while widening confidence intervals. We empirically compared RE and ME models across NMAs of two-arm studies with significant heterogeneity from the nmadb database, assessing model fit using the Akaike Information Criterion. The ME model often provided a fit comparable to or better than that of the RE model. Case studies further revealed that RE models are sensitive to extreme and imprecise observations, whereas ME models assign less weight to such observations and hence exhibit greater robustness to publication bias. Our results suggest that the ME model warrants consideration alongside the conventional RE model in NMA practice.
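The variance inflation described in the abstract can be sketched for the simplest case. The code assumes an ordinary pairwise meta-analysis with one comparison rather than a full network, and it estimates the common factor as Cochran's Q divided by its degrees of freedom. Both choices, the function names, and the data are illustrative assumptions. Some analysts constrain the factor to be at least 1, and this sketch does not.

```python
import math


def fixed_effect(effects, variances):
    """Inverse-variance weighted mean and its variance."""
    weights = [1.0 / v for v in variances]
    theta = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return theta, 1.0 / sum(weights)


def multiplicative_effect(effects, variances):
    """Same point estimate as the fixed-effect model, variance scaled by phi."""
    theta, var_fe = fixed_effect(effects, variances)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum((y - theta) ** 2 / v for y, v in zip(effects, variances))
    phi = q / (len(effects) - 1)  # common inflation factor (assumed estimator)
    return theta, phi * var_fe, phi


if __name__ == "__main__":
    effects = [0.1, 0.3, 0.8]       # hypothetical study estimates
    variances = [0.04, 0.04, 0.04]  # hypothetical within-study variances
    theta, var_me, phi = multiplicative_effect(effects, variances)
    half_width = 1.96 * math.sqrt(var_me)
    print(theta, phi, (theta - half_width, theta + half_width))
```

The point estimate matches the fixed-effect value exactly, and only the interval widens, by a factor of the square root of `phi`.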

2510.04421 2026-06-17 stat.ML cs.LG math.ST 版本更新

Learning Survival Models with Right-Censored Reporting Delays

学习带有右删失报告延迟的生存模型

Yuta Shikuri, Hironori Fujisawa

AI总结 针对报告延迟导致的生存数据右删失问题,联合建模事件和报告过程的参数风险,提出一致估计量和蒙特卡洛EM算法,并利用迁移学习提高行政删失下及时风险评估的准确性。

Comments 26 pages, 3 figures, 3 tables

AI中文摘要

生存分析提供了对事件发生时间进行建模的统计方法。当事件发生时间未在发生时被观察到,而是仅在报告时被揭示时,就会出现报告延迟。当由于行政删失导致观察窗口较短时,这一问题对于及时风险评估尤为关键。在本研究中,我们通过对事件和报告过程联合建模参数风险,纳入了右删失报告延迟。然后,我们为模型参数构建了一致估计量,并开发了蒙特卡洛期望最大化算法来计算它。为了应对行政删失带来的挑战,我们利用这些发现并提出了一种迁移学习程序。实验结果表明,我们的方法提高了行政删失下及时风险评估的准确性。

英文摘要

Survival analysis provides statistical methods to model the time until an event occurs. Reporting delays arise when event times are not observed at their occurrence but are only revealed upon reporting. This issue is particularly critical for timely risk evaluation when the observation window is short due to administrative censoring. In this study, we incorporate right-censored reporting delays by jointly modeling parametric hazards for the event and reporting processes. We then construct a consistent estimator for the model parameters and develop a Monte Carlo expectation-maximization algorithm to compute it. To address the challenges posed by administrative censoring, we leverage these findings and propose a transfer-learning procedure. Experimental results demonstrate that our method improves the accuracy of timely risk evaluation under administrative censoring.

2401.05343 2026-06-17 q-bio.NC stat.ME 版本更新

Spectral Topological Data Analysis of Brain Signals

脑信号的谱拓扑数据分析

Anass El-Yaagoubi, Shuhao Jiao, Moo K. Chung, Hernando Ombao

AI总结 提出一种保留频率信息的拓扑摘要方法,通过基于相干性的过滤构建谱景观,用于脑电信号的两样本检验,并在ADHD数据中检测到拓扑差异。

Comments 32 pages, 13 figures

AI中文摘要

脑功能连接的拓扑分析通常将每对通道简化为单个标量依赖(通常是皮尔逊相关),因此无法解析组织电生理学的频率特异性同步。我们提出一种保留频率信息的拓扑摘要。谱景观通过傅里叶频率索引Bubenik(2015)的持续景观,每个过滤基于相干性距离构建,因此它是过滤尺度和频率的函数。它在相干矩阵上是Lipschitz稳定的,并在选定频带上提供函数两样本检验,其极限零分布和一致性遵循标准函数数据论证。在模拟中,该检验在零假设下保持名义水平的同时,恢复了所在频带的拓扑差异。应用于53名对照和51名ADHD儿童的脑电图,全局检验在95%水平上拒绝了两组周期拓扑的相等性(p = 0.019);逐频带后续分析将差异定位到伽马和θ频带,尽管在此样本量下没有频带通过族系校正。该模式与这些频带在ADHD中的既定作用一致。

英文摘要

Topological analyses of brain functional connectivity usually reduce each pair of channels to a single scalar dependence, typically the Pearson correlation, and so cannot resolve the frequency-specific synchronisation that organises electrophysiology. We propose a topological summary that keeps the frequency information. The spectral landscape indexes the persistence landscape of Bubenik (2015) by Fourier frequency, building each filtration from a coherence-based distance, so that it is a function of both the filtration scale and the frequency. It is Lipschitz-stable in the coherence matrix and feeds a functional two-sample test over a chosen frequency band, whose limiting null distribution and consistency follow from standard functional-data arguments. In simulations the test recovers a topological difference in the band where it lives while holding its nominal level under the null. Applied to electroencephalography from 53 control and 51 ADHD children, a global test rejects equality of the two groups' cycle topology at the 95% level (p = 0.019); a band-by-band follow-up localises the difference to the gamma and theta bands, although none survives family-wise correction at this sample size. The pattern is consistent with the established role of these bands in ADHD.

9. 经济金融与社会科学统计 5 篇

2606.17723 2026-06-17 stat.AP 新提交

Tail Dependence in EU Carbon Markets: Graphical Models of Extremes for EUA Futures

欧盟碳市场中的尾部依赖:EUA期货的极值图模型

Jan Maciejowski, Manuele Leonelli

AI总结 应用Hüsler-Reiss极值图模型分析EU ETS第三、四阶段20个日度变量,发现尾部网络比平均依赖网络更密集、中心节点不同,且EUA期货在尾部网络中中心性最高,而股指和外汇对则相反。

AI中文摘要

理解极端价格波动如何在金融和能源市场间传播,对于欧盟排放交易体系(EU ETS)的风险管理和监管设计至关重要。我们将Hüsler-Reiss极值图模型应用于一个包含20个日度变量的系统,这些变量围绕EU ETS第三和第四阶段(2013-2025年)的EUA期货,并以高斯图模型作为平均依赖基线。尾部网络在结构上与平均依赖网络截然不同:密度显著更高,围绕不同的中心节点组织,并受部门内同质性支配,这种同质性比平均依赖水平更紧密地约束了部门边界。EUA期货在标准图模型中处于边缘位置,但在尾部网络中达到最高中心性,而股指和主要外汇对则呈现相反趋势。指数随机图模型确认了所有样本期内尾部网络中股票和外汇的边缘性,并识别出市场下行期间的三角闭合是第三阶段的现象,在第四阶段消失。阶段转变重构了尾部网络而未使其稀疏化:平均依赖急剧收缩,而尾部依赖持续存在,崩溃传染从聚集传播转变为扩散传播。这些发现对合规实体的对冲构建、监管机构的压力测试校准以及EU ETS市场系统性风险监测工具的设计具有直接意义。

英文摘要

Understanding how extreme price movements propagate across financial and energy markets is critical for risk management and regulatory design in the EU Emissions Trading System (EU ETS). We apply Hüsler-Reiss graphical models of extremes to a system of 20 daily variables centred on EU allowances futures across Phases 3 and 4 of the EU ETS (2013–2025), with a Gaussian graphical model as the average-dependence baseline. The tail networks are structurally distinct from the average dependence network: substantially denser, organized around different central nodes, and governed by within-sector homophily that binds sector boundaries more tightly than at the average-dependence level. EU allowances futures are peripheral in the standard graphical model but achieve the highest centrality in the tail networks, while equity indices and major FX pairs follow the opposite trajectory. Exponential random graph models confirm equity and FX peripherality in tail networks across all sample periods and identify triadic closure during market downturns as a Phase 3 phenomenon that vanishes in Phase 4. The phase transition restructures the tail network without thinning it: average dependence contracts sharply while tail dependence persists, and crash contagion shifts from clustered to diffuse propagation. These findings have direct implications for hedge construction by compliance entities, stress-test calibration by regulators, and the design of systemic-risk monitoring tools for EU ETS markets.

2502.17518 2026-06-17 cs.LG cs.AI q-fin.CP stat.ML 版本更新

Ensemble RL through Classifier Models: Enhancing Risk-Return Trade-offs in Trading Strategies

通过分类器模型进行集成强化学习:在交易策略中增强风险回报权衡

Zheli Xiong

AI总结 将A2C、PPO、SAC等强化学习算法与SVM、决策树、逻辑回归等分类器结合,构建用于交易策略的集成模型;集成方法在风险调整后收益上常优于基础模型,但其优势取决于方差阈值τ、分类器组、RL智能体组合和市场范围,并非在所有数据集上自动成立。

Comments 23 pages, 10 figures, 9 tables

AI中文摘要

本文提出了一项全面研究,探讨集成强化学习(RL)模型在金融交易策略中的应用,并利用分类器模型提升性能。通过将A2C、PPO和SAC等强化学习算法与支持向量机(SVM)、决策树和逻辑回归等传统分类器相结合,我们研究了如何整合不同分类器组以改善风险回报权衡。研究评估了各种集成方法的有效性,并在累计回报率、夏普比率(SR)、卡玛比率和最大回撤(MDD)等关键金融指标上将其与单个RL模型进行比较。我们最初的实验结果表明,集成方法在风险调整后收益方面常常优于基础模型,并能更好地控制回撤、提高整体稳定性。然而,原始分析和本版本新增的复现实验均表明,集成性能对方差阈值τ、分类器组、RL智能体组合和市场范围的选择敏感。复现证据进一步支持了分类器辅助的集成选择能够提高稳健性这一结论,同时也表明这种优势是有条件的,并非在所有数据集上自动成立。本研究强调了将强化学习与分类器结合用于自适应决策的价值,对金融交易、机器人和其他动态环境具有启示意义。

英文摘要

This paper presents a comprehensive study on the use of ensemble Reinforcement Learning (RL) models in financial trading strategies, leveraging classifier models to enhance performance. By combining RL algorithms such as A2C, PPO, and SAC with traditional classifiers like Support Vector Machines (SVM), Decision Trees, and Logistic Regression, we investigate how different classifier groups can be integrated to improve risk-return trade-offs. The study evaluates the effectiveness of various ensemble methods, comparing them with individual RL models across key financial metrics, including Cumulative Returns, Sharpe Ratios (SR), Calmar Ratios, and Maximum Drawdown (MDD). Our original experimental results demonstrate that ensemble methods often outperform base models in terms of risk-adjusted returns, providing better management of drawdowns and overall stability. However, both the original analysis and the additional reproduction reported in this version show that ensemble performance is sensitive to the choice of variance threshold \(\tau\), classifier group, RL-agent pair, and market universe. The reproduction evidence strengthens the conclusion that classifier-assisted ensemble selection can improve robustness, while also clarifying that the advantage is conditional rather than automatic across all datasets. This study emphasizes the value of combining RL with classifiers for adaptive decision-making, with implications for financial trading, robotics, and other dynamic environments.
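
The evaluation metrics named in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's evaluation code: returns are simple per-period returns, the risk-free rate is zero, annualisation uses a hypothetical `periods_per_year` parameter, and the Calmar ratio is taken as annualised return divided by maximum drawdown.

```python
import math


def cumulative_return(returns):
    """Compounded return over the whole series."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


def max_drawdown(returns):
    """Largest peak-to-trough loss of the equity curve, as a fraction."""
    equity = peak = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst


def sharpe_ratio(returns, periods_per_year=252):
    """Annualised mean/std of returns, assuming a zero risk-free rate."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)


def calmar_ratio(returns, periods_per_year=252):
    """Annualised return divided by maximum drawdown (inf if no drawdown)."""
    mdd = max_drawdown(returns)
    growth = 1.0 + cumulative_return(returns)
    annualised = growth ** (periods_per_year / len(returns)) - 1.0
    return math.inf if mdd == 0.0 else annualised / mdd


# Hypothetical three-period return series.
toy_returns = [0.1, -0.5, 0.2]
print(cumulative_return(toy_returns), max_drawdown(toy_returns))
```

Definitions of these ratios vary across the literature (sampling frequency, risk-free rate, drawdown window), so numbers computed this way are only comparable with results that use the same conventions.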

2602.19201 2026-06-17 econ.EM stat.ME 版本更新

Panel Quantile Regression with Common Shocks

面板分位数回归与共同冲击

Harold D. Chiang, Antonio F. Galvao, Chia-Min Wei

AI总结 本文发展了一种对共同冲击稳健的面板分位数回归渐近与推断理论,提出了在共同冲击存在下仍一致的协方差估计量,放宽了截面独立和T≫N的假设。

AI中文摘要

本文为固定效应面板分位数回归(FEQR)发展了一种渐近与推断理论,该理论对普遍存在的共同冲击具有稳健性。这种冲击引起截面依赖性,这在许多经济和金融面板中是核心问题,但在现有的FEQR理论中很大程度上被忽略,现有理论通常假设截面独立性并要求$T \gg N$。我们证明,在温和条件$(\log N)^2/T \to 0$下,标准FEQR估计量仍保持渐近正态,从而适应经验相关的情形,包括$T \ll N$的情况。我们进一步证明,共同冲击从根本上改变了渐近协方差结构,使得传统协方差估计量不一致,并提出了一个简单的协方差估计量,在存在和不存在共同冲击的情况下均保持一致。因此,所提出的程序提供了有效的稳健推断,无需事先了解依赖结构,从而大大扩展了FEQR方法在实际面板数据环境中的适用性。

英文摘要

This paper develops an asymptotic and inferential theory for fixed-effects panel quantile regression (FEQR) that delivers inference robust to pervasive common shocks. Such shocks induce cross-sectional dependence that is central in many economic and financial panels but largely ignored in existing FEQR theory, which typically assumes cross-sectional independence and requires $T \gg N$. We show that the standard FEQR estimator remains asymptotically normal under the mild condition $(\log N)^2/T \to 0$, thereby accommodating empirically relevant regimes, including those with $T \ll N$. We further show that common shocks fundamentally alter the asymptotic covariance structure, rendering conventional covariance estimators inconsistent, and we propose a simple covariance estimator that remains consistent both in the presence and absence of common shocks. The proposed procedure therefore provides valid robust inference without requiring prior knowledge of the dependence structure, substantially expanding the applicability of FEQR methods in realistic panel data settings.
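
As background for the quantile-regression setting above, the following minimal sketch shows the check (pinball) loss whose minimiser is a quantile. It covers only the scalar, covariate-free case; the fixed effects, common shocks, and the covariance estimator proposed in the paper are not represented, and the function names are hypothetical.

```python
def check_loss(residual, tau):
    """Quantile check loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def sample_quantile_by_check_loss(values, tau):
    """Return an observed value that minimises the empirical check loss.

    The objective is convex and piecewise linear with kinks at the data
    points, so a minimiser is always attained at one of the observations.
    """
    return min(values,
               key=lambda q: sum(check_loss(y - q, tau) for y in values))


# Hypothetical toy sample.
sample = [5, 1, 4, 2, 3]
median = sample_quantile_by_check_loss(sample, 0.5)
upper = sample_quantile_by_check_loss(sample, 0.9)
print(median, upper)
```

In a regression, the constant `q` is replaced by a linear predictor plus unit-specific intercepts, and the same loss is minimised over the coefficients.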

2604.14257 2026-06-17 econ.GN stat.AP 版本更新

Mapping the causal structure of price formation in Texas's transitioning electricity market

德克萨斯州转型电力市场中价格形成的因果结构映射

Shiva Madadkhani, Nils Sturma, Mathias Drton, Svetlana Ikonnikova

AI总结 采用因果发现方法研究德克萨斯州电力市场,发现风电已成为日前电价的主要因果驱动因素,其影响是天然气的三倍以上,但价格抑制效应在高峰时段减弱,且风电增长将阻塞成本重新分配给远距离负荷中心。

AI中文摘要

可再生能源的部署以及电气化和大型数字负荷带来的需求增长正在改变电力市场。然而,这些发展如何重塑电价动态仍知之甚少,导致系统规划者、容量投资者和市场参与者依赖于热力主导时代的假设,而这些假设可能不再成立。我们使用因果发现来研究正在经历快速转型的德克萨斯州的批发电价演变。我们的发现推翻了德克萨斯州是一个天然气价格驱动市场的观点,证明了风电已成为日前价格的主要因果驱动因素,其影响是天然气的三倍以上。然而,风电的价格抑制效应在高峰时段正在减弱,并且风电增长将阻塞成本重新分配给远距离负荷中心。此外,德克萨斯州南部和西部负荷的上升改变了系统价格和区域差异。通过揭示因果驱动因素的时空演变性质,我们的分析表明,新发电和大负荷的节奏、地理选址和相对规模将对未来的电价风险、基础设施需求和投资具有决定性作用。

英文摘要

Renewable deployment and rising demand from electrification and large digital loads are transforming electricity markets. However, how these developments reshape electricity price dynamics remains poorly understood, leaving system planners, capacity investors, and market participants reliant on assumptions from a thermal-dominated era that may no longer hold. We use causal discovery to study the evolution of wholesale electricity prices in Texas, which is undergoing rapid transformation. Our findings overturn the view of Texas as a gas-price-driven market, demonstrating that wind generation has become the dominant causal driver of day-ahead prices, with effects more than three times greater than those of natural gas. Yet wind's price-suppressing effect is weakening during peak periods, and wind growth redistributes congestion costs to distant load centres. Furthermore, rising load in South and West Texas alters system prices and regional differentials. Uncovering the evolving spatiotemporal nature of causal drivers, our analysis reveals that the pace, geographic siting, and relative scale of new generation and large loads will be decisive for future electricity price risks, infrastructure needs, and investments.

2603.27049 2026-06-17 stat.ML cs.LG 版本更新

Overcoming the Incentive Collapse Paradox

克服激励崩溃悖论

Qichuan Yin, Ziwei Su, Shuangning Li

AI总结 针对AI辅助任务中激励崩溃问题,提出哨兵审计支付机制,在有限成本下维持正人力努力,并构建激励感知的主动统计推断框架优化审计率与采样分配。

Comments Accepted to ICML 2026

AI中文摘要

AI辅助任务委派日益普遍,但此类系统中的人力成本高昂且通常不可观测。Bastani和Cachon (2025); Sambasivan等人 (2021) 的最新研究表明,基于准确度的支付方案存在激励崩溃:随着AI准确度提升,维持正向人力努力需要无界支付。我们在预算约束的委托-代理框架中研究这一现象,其中战略型人类代理的输出准确度取决于不可观测的努力。我们的第一个贡献是一般性不可能结果,表明激励崩溃不仅是简单线性支付的局限,而是任何仅基于观测任务结果的支付规则都会出现。为克服这一障碍,我们提出一种哨兵审计支付机制,该机制以有限成本强制执行严格为正且可控的人力努力水平,且与AI准确度无关。在此激励鲁棒的基础上,我们构建了一个激励感知的主动统计推断框架,联合优化(i)审计率和(ii)跨不同难度任务的主动采样与预算分配,以在单一预算下最小化最终统计损失。实验表明,相对于标准主动学习和仅审计基线,该方法改善了成本-误差权衡。

英文摘要

AI-assisted task delegation is increasingly common, yet human effort in such systems is costly and typically unobserved. Recent work by Bastani and Cachon (2025) and Sambasivan et al. (2021) shows that accuracy-based payment schemes suffer from incentive collapse: as AI accuracy improves, sustaining positive human effort requires unbounded payments. We study this phenomenon in a budget-constrained principal-agent framework with strategic human agents whose output accuracy depends on unobserved effort. Our first contribution is a general impossibility result showing that incentive collapse is not merely a limitation of simple linear payments, but arises for any payment rule based only on observed task outcomes. To overcome this barrier, we propose a sentinel-auditing payment mechanism that enforces a strictly positive and controllable level of human effort at finite cost, independent of AI accuracy. Building on this incentive-robust foundation, we develop an incentive-aware active statistical inference framework that jointly optimizes (i) the auditing rate and (ii) active sampling and budget allocation across tasks of varying difficulty to minimize the final statistical loss under a single budget. Experiments demonstrate improved cost-error tradeoffs relative to standard active learning and auditing-only baselines.

10. 数据隐私、稳健性与公平性 7 篇

2606.17995 2026-06-17 stat.ML cs.CR cs.LG 新提交

Differential Privacy of Gaussian Process Posterior Sampling

高斯过程后验采样的差分隐私

Tomasz Maciazek

发表机构 * School of Mathematics, University of Bristol(布里斯托大学数学学院)

AI总结 研究高斯过程后验样本路径的隐私性,通过Rényi-DP界分离后验均值与协方差泄露,揭示有效岭正则化的关键作用,并验证成员推断攻击与正则化的依赖关系。

Comments 8 pages of main text + 25 pages appendix

AI中文摘要

我们研究了当整个训练集(包括协变量和响应)是私有时,从高斯过程(GP)发布后验样本路径的隐私性。与添加外部噪声的标准差分隐私(DP)机制不同,后验采样在构造上是随机的。我们表明,这种内在随机性通过推导GP后验样本路径发布的显式Rényi-DP界来提供DP保证。这些界将后验均值泄露与数据相关的后验协方差泄露分开,表明有意义的隐私严重依赖于有效的岭正则化。我们应用成员推断攻击来表明经验泄露遵循对正则化、后验方差和发布的样本路径数量的预测依赖关系。在下游后验采样任务上的效用实验识别了噪声观测机制,其中隐私兼容的正则化以适度的效用损失保留了有用的决策。当需要更强的隐私时,可以通过添加校准的GP噪声来增强内在保证,提供显式的额外隐私调节旋钮。

英文摘要

We study the privacy of releasing posterior sample paths from a Gaussian process (GP) when the entire training set, including covariates and responses, is private. Unlike standard differential-privacy (DP) mechanisms that add external noise, posterior sampling is random by construction. We show that this intrinsic randomness yields DP guarantees by deriving explicit Rényi-DP bounds for GP posterior sample-path release. The bounds separate posterior-mean leakage from data-dependent posterior-covariance leakage, showing that meaningful privacy depends sharply on effective ridge regularisation. We apply membership-inference attacks to show that empirical leakage follows the predicted dependence on regularisation, posterior variance, and the number of released posterior sample paths. Utility experiments on downstream posterior-sampling tasks identify noisy-observation regimes where privacy-compatible regularisation preserves useful decisions with modest utility loss. When stronger privacy is needed, the intrinsic guarantee can be sharpened by adding calibrated GP noise, providing an explicit additional privacy knob.

2606.17684 2026-06-17 stat.ML cs.CY cs.LG 新提交

Geometrical fairness in graph neural networks

图神经网络中的几何公平性

Arturo Pérez-Peralta, Sandra Benítez-Peña, Blas Kolic, Rosa E. Lillo

发表机构 * Department of Statistics, University Carlos III of Madrid, Spain(西班牙马德里卡洛斯三世大学统计系); uc3m-Santander Big Data Institute(uc3m-桑坦德大数据研究所)

AI总结 针对图神经网络中公平性问题,通过修改拉普拉斯算子引入多种互补变换(子空间投影、频谱调整、频率滤波)来缓解偏差,理论分析并实验验证了公平性提升与竞争性能。

Comments 32 pages, 21 tables, 6 figures

AI中文摘要

基于图的学习方法因其在多种应用中的强大性能而日益突出。其中,基于扩散过程的最新框架提供了一个统一的视角,扩展了传统的图神经网络公式,同时解决了标准消息传递机制的局限性。尽管取得了这些进展,但此类模型的公平性问题仍然令人担忧,因为它们可能传播或放大数据中存在的偏差。在这项工作中,我们通过修改底层拉普拉斯算子,引入了一种基于图扩散的公平性感知适应方法。我们的方法结合了多种互补变换,包括子空间投影、频谱调整和基于频率的滤波,以减轻与偏差相关的成分。利用图扩散的内在平滑特性,我们对由此产生的行为进行了原则性分析,并建立了公平性属性的理论见解。我们在合成数据集和真实数据集上评估了所提出的框架,结果表明,在有限的计算成本下,它实现了具有竞争力的性能,同时提高了公平性指标。

英文摘要

Graph-based learning methods have become increasingly prominent due to their strong performance across diverse applications. Among these, recent frameworks grounded in diffusion processes provide a unifying perspective that extends traditional graph neural network formulations while addressing limitations of standard message-passing mechanisms. Despite these advances, concerns remain regarding the fairness of such models, as they may propagate or amplify biases present in the data. In this work, we introduce a fairness-aware adaptation of graph-based diffusion by modifying the underlying Laplacian operator. Our approach incorporates multiple complementary transformations, including subspace projections, spectral adjustments, and frequency-based filtering, to mitigate bias-related components. Leveraging the intrinsic smoothing properties of graph diffusion, we provide a principled analysis of the resulting behavior and establish theoretical insights into fairness properties. We evaluate the proposed framework on both synthetic and real-world datasets, demonstrating that it achieves competitive performance while improving fairness metrics with limited additional computational cost.

2507.20708 2026-06-17 cs.LG math.OC stat.AP 版本更新

Exposing the Illusion of Fairness: Auditing Vulnerabilities to Distributional Manipulation Attacks

揭露公平的幻象:审计对分布操纵攻击的脆弱性

Valentin Lafargue, Adriana Laurindo Monteiro, Emmanuelle Claeys, Laurent Risser, Jean-Michel Loubes

AI总结 研究恶意被审计方如何通过分布操纵制造公平假象,提出基于熵和最优传输的操纵策略,并评估统计检验的检测能力,为监管验证提供指导。

AI中文摘要

人工智能系统在高风险领域(包括欧盟AI法案(Regulation (EU) 2024/1689)归类为高风险的领域)的快速部署,加剧了对可靠合规审计的需求。对于二分类器,监管风险评估通常依赖于全局公平性指标,如差异影响比,该指标广泛用于评估潜在歧视。在典型的审计设置中,被审计方将其数据集的一个子集提供给审计方,而监管机构可能验证该子集是否代表完整的底层分布。在这项工作中,我们研究了恶意被审计方在多大程度上可以从一个不合规的原始分布中构建一个符合公平性且看似具有代表性的样本,从而制造公平的幻象。我们将该问题形式化为一个受约束的分布投影任务,并引入基于熵投影和最优传输投影、有数学依据的操纵策略。这些构造刻画了满足公平约束所需的最小分布偏移。为了对抗此类攻击,我们通过基于分布距离的统计检验形式化代表性,并系统评估其检测操纵样本的能力。我们的分析强调了公平性操纵在统计上未被检测到的条件,并为加强监管验证提供了实用指南。我们通过在用于偏差检测的标准表格数据集上进行实验来验证我们的理论发现。代码公开于 this https URL。

英文摘要

The rapid deployment of AI systems in high-stakes domains, including those classified as high-risk under the EU AI Act (Regulation (EU) 2024/1689), has intensified the need for reliable compliance auditing. For binary classifiers, regulatory risk assessment often relies on global fairness metrics such as the Disparate Impact ratio, widely used to evaluate potential discrimination. In typical auditing settings, the auditee provides a subset of its dataset to an auditor, while a supervisory authority may verify whether this subset is representative of the full underlying distribution. In this work, we investigate to what extent a malicious auditee can construct a fairness-compliant yet representative-looking sample from a non-compliant original distribution, thereby creating an illusion of fairness. We formalize this problem as a constrained distributional projection task and introduce mathematically grounded manipulation strategies based on entropic and optimal transport projections. These constructions characterize the minimal distributional shift required to satisfy fairness constraints. To counter such attacks, we formalize representativeness through statistical tests based on distributional distances and systematically evaluate their ability to detect manipulated samples. Our analysis highlights the conditions under which fairness manipulation can remain statistically undetected and provides practical guidelines for strengthening supervisory verification. We validate our theoretical findings through experiments on standard tabular datasets for bias detection. Code is publicly available at this https URL.
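
The Disparate Impact ratio mentioned in this abstract is the ratio of positive-prediction rates between two groups. A minimal sketch follows; the function name, the group encoding (0 for the protected group, 1 for the reference group), and the toy data are hypothetical, and the manipulation strategies and detection tests studied in the paper are not shown.

```python
def disparate_impact_ratio(predictions, groups, protected=0, reference=1):
    """Ratio of positive-prediction rates: protected group over reference."""
    def positive_rate(group):
        members = [p for p, g in zip(predictions, groups) if g == group]
        if not members:
            raise ValueError(f"no samples for group {group!r}")
        return sum(members) / len(members)

    reference_rate = positive_rate(reference)
    if reference_rate == 0:
        raise ValueError("reference group has no positive predictions")
    return positive_rate(protected) / reference_rate


# Hypothetical binary predictions for eight individuals in two groups.
toy_predictions = [1, 0, 0, 0, 1, 1, 1, 0]
toy_groups = [0, 0, 0, 0, 1, 1, 1, 1]
di = disparate_impact_ratio(toy_predictions, toy_groups)
print(di)
```

Because the ratio depends only on group-wise positive rates in the audited sample, it can change when the sample is re-selected, which is the kind of sensitivity the abstract is concerned with.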

2602.08470 2026-06-17 cs.LG stat.ML 版本更新

Learning Credal Ensembles via Distributionally Robust Optimization

通过分布鲁棒优化学习信度集成

Kaizheng Wang, Ghifari Adam Faza, Fabio Cuzzolin, Siu Lun Chau, David Moens, Hans Hallez

AI总结 提出CreDRO方法,通过分布鲁棒优化学习集成模型,捕获由训练与测试数据分布偏移导致的认知不确定性,在分布外检测和选择性分类任务上优于现有方法。

Comments Accepted by ICML 2026 as a Spotlight paper (https://icml.cc/virtual/2026/poster/62862)

AI中文摘要

信度预测器是能够感知认知不确定性并产生凸集概率预测的模型。它们提供了一种量化预测认知不确定性(EU)的原则性方法,并已被证明能在各种设置下提高模型鲁棒性。然而,大多数最先进的方法主要将EU定义为由随机训练初始化引起的不一致性,这主要反映对优化随机性的敏感性,而非来自更深层次来源的不确定性。为了解决这一问题,我们将EU定义为在训练数据和测试数据之间i.i.d.假设的不同松弛下训练的模型之间的不一致性。基于这一思想,我们提出CreDRO,通过分布鲁棒优化学习一个由合理模型组成的集成。因此,CreDRO不仅从训练随机性中捕获EU,还从由于训练和测试数据之间潜在分布偏移而产生的有意义的不一致性中捕获EU。实验结果表明,CreDRO在多个基准的分布外检测和医学应用中的选择性分类等任务上,始终优于现有的信度方法。

英文摘要

Credal predictors are models that are aware of epistemic uncertainty and produce a convex set of probabilistic predictions. They offer a principled way to quantify predictive epistemic uncertainty (EU) and have been shown to improve model robustness in various settings. However, most state-of-the-art methods mainly define EU as disagreement caused by random training initializations, which mostly reflects sensitivity to optimization randomness rather than uncertainty from deeper sources. To address this, we define EU as disagreement among models trained with varying relaxations of the i.i.d. assumption between training and test data. Based on this idea, we propose CreDRO, which learns an ensemble of plausible models through distributionally robust optimization. As a result, CreDRO captures EU not only from training randomness but also from meaningful disagreement due to potential distribution shifts between training and test data. Empirical results show that CreDRO consistently outperforms existing credal methods on tasks such as out-of-distribution detection across multiple benchmarks and selective classification in medical applications.

2602.06276 2026-06-17 cs.LG stat.ML 版本更新

Statistical Learning from Attribution Sets

从归因集合中进行统计学习

Lorne Applebaum, Robert Busa-Fekete, August Y. Chen, Claudio Gentile, Tomer Koren, Aryan Mokhtari

AI总结 针对隐私约束下广告点击与转化无法直接关联的问题,提出基于归因集合的无偏损失估计方法,实现经验风险最小化的泛化保证,并优于行业启发式方法。

Comments COLT 2026. 45 pages

AI中文摘要

我们解决了隐私约束下广告领域转化预测模型的训练问题,其中点击和转化之间缺乏直接链接。受隐私保护浏览器API和第三方cookie弃用的启发,我们研究了一种设置,其中学习器观察到一系列点击和一系列转化,但只能将转化与一组候选点击(归因集合)相关联,而不是唯一的来源。我们将此形式化为从由具有候选先验分布的无知对手生成的归因集合中进行学习。尽管缺乏显式标签,我们通过一种新颖的方法从这些粗粒度信号中构建了总体损失的无偏估计量。利用该估计量,我们表明经验风险最小化实现了泛化保证,该保证随先验的信息量而缩放,并且对先验的估计误差也具有鲁棒性,尽管归因集合之间存在复杂的依赖关系。在标准数据集上的简单实证评估表明,我们的无偏方法显著优于常见的行业启发式方法,特别是在归因集合较大或重叠的情况下。

英文摘要

We address the problem of training conversion prediction models in advertising domains under privacy constraints, where direct links between ad clicks and conversions are unavailable. Motivated by privacy-preserving browser APIs and the deprecation of third-party cookies, we study a setting where the learner observes a sequence of clicks and a sequence of conversions, but can only link a conversion to a set of candidate clicks (an attribution set) rather than a unique source. We formalize this as learning from attribution sets generated by an oblivious adversary equipped with a prior distribution over the candidates. Despite the lack of explicit labels, we construct an unbiased estimator of the population loss from these coarse signals via a novel approach. Leveraging this estimator, we show that Empirical Risk Minimization achieves generalization guarantees that scale with the informativeness of the prior and is also robust against estimation errors in the prior, despite complex dependencies among attribution sets. Simple empirical evaluations on standard datasets suggest our unbiased approach significantly outperforms common industry heuristics, particularly in regimes where attribution sets are large or overlapping.

2602.04155 2026-06-17 stat.ML cs.GT cs.LG 版本更新

Maximin Relative Improvement: Fair Learning as a Bargaining Problem

最大最小相对改进:将公平学习视为讨价还价问题

Jiwoo Han, Moulinath Banerjee, Yuekai Sun

AI总结 提出将群体公平解释为子群体间的讨价还价问题,通过相对改进指标恢复Kalai-Smorodinsky解,并给出公理化和有限样本收敛保证。

Comments Accepted at ICML 2026

AI中文摘要

当在多个子群体上部署单一预测器时,我们提出了一种根本不同的方法:将群体公平解释为子群体间的讨价还价问题。这种博弈论视角揭示了现有的鲁棒优化方法(如最小化最差群体损失或遗憾)对应于经典的讨价还价解,并体现了不同的公平原则。我们提出了相对改进,即实际风险降低相对于基线预测器潜在降低的比率,它恢复了Kalai-Smorodinsky解。与当群体具有不同潜在可预测性时可能不可比较的绝对尺度方法不同,相对改进提供了公理化理由,包括尺度不变性和个体单调性。我们在温和条件下建立了有限样本收敛保证。

英文摘要

When deploying a single predictor across multiple subpopulations, we propose a fundamentally different approach: interpreting group fairness as a bargaining problem among subpopulations. This game-theoretic perspective reveals that existing robust optimization methods such as minimizing worst-group loss or regret correspond to classical bargaining solutions and embody different fairness principles. We propose relative improvement, the ratio of actual risk reduction to potential reduction from a baseline predictor, which recovers the Kalai-Smorodinsky solution. Unlike absolute-scale methods that may not be comparable when groups have different potential predictability, relative improvement provides axiomatic justification including scale invariance and individual monotonicity. We establish finite-sample convergence guarantees under mild conditions.
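
The relative-improvement criterion described above can be sketched for a finite set of candidate predictors. This is only an illustration under assumptions: group risks are taken as known numbers, "potential reduction" is read as the gap between the baseline risk and the best risk achievable for that group alone, and all names and values are hypothetical.

```python
def relative_improvements(risks, baseline_risks, best_risks):
    """Per-group ratio of actual to potential risk reduction."""
    return [(base - risk) / (base - best)
            for risk, base, best in zip(risks, baseline_risks, best_risks)]


def maximin_relative_improvement(candidates, baseline_risks, best_risks):
    """Pick the candidate whose worst-off group improves the most."""
    return max(
        candidates,
        key=lambda name: min(
            relative_improvements(candidates[name], baseline_risks, best_risks)
        ),
    )


# Hypothetical per-group risks for two groups.
baseline = [1.0, 1.0]            # risk of the baseline predictor
group_best = [0.2, 0.6]          # best achievable risk for each group alone
candidate_risks = {
    "A": [0.2, 0.9],             # ideal for group 1, poor for group 2
    "B": [0.4, 0.7],             # balanced relative gains
}
chosen = maximin_relative_improvement(candidate_risks, baseline, group_best)
print(chosen)
```

Candidate "B" is selected because both groups realise about three quarters of their attainable improvement, whereas "A" leaves the second group with only a quarter. Equalising such ratios is the characteristic feature of the Kalai-Smorodinsky solution.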

2503.10945 2026-06-17 cs.LG cs.AI cs.CR stat.ML 版本更新

Gaussian DP for Reporting Differential Privacy Guarantees in Machine Learning

高斯差分隐私:机器学习中报告差分隐私保证的方法

Juan Felipe Gomez, Bogdan Kulynych, Georgios Kaissis, Flavio P. Calmon, Jamie Hayes, Borja Balle, Antti Honkela

AI总结 针对当前机器学习中差分隐私报告不完整的问题,提出使用非渐近高斯差分隐私(GDP)作为主要报告方式,通过数值会计和决策理论度量,证明GDP能无误差地捕获DP-SGD等算法的完整隐私特征。

Comments IEEE SatML 2026 (position paper track)

AI中文摘要

当前报告机器学习算法(如DP-SGD)的差分隐私(DP)保证的做法提供了不完整且可能误导的图景。例如,如果仅知道机制的一个$(\varepsilon, \delta)$,标准分析表明可能存在针对训练数据记录的高精度推理攻击,而更仔细的分析发现,对于大多数实际机制,这种精确攻击并不存在。在这篇立场论文中,我们主张使用_非渐近_高斯差分隐私(GDP)作为机器学习中传达DP保证的主要手段,以避免这些潜在缺点。利用DP文献中的两个最新进展:(i)能够以任意精度计算DP-SGD的隐私配置文件和$f$-DP曲线的开源数值会计,以及(ii)关于DP表示的决策理论度量,我们展示了如何使用数值会计提供GDP的非渐近界,并表明GDP能够以几乎无误差的方式捕获DP-SGD及相关算法的整个隐私配置文件(由该度量量化)。为了支持我们的主张,我们研究了最先进的DP大规模图像分类以及美国十年人口普查的TopDown算法的隐私配置文件,观察到GDP在所有情况下都与其配置文件拟合得非常好。最后,我们讨论了这种方法的优缺点,并探讨了哪些其他隐私机制可以从GDP中受益。

英文摘要

Current practices for reporting differential privacy (DP) guarantees for machine learning (ML) algorithms such as DP-SGD provide an incomplete and potentially misleading picture. For instance, if only a single $(\varepsilon, \delta)$ is known about a mechanism, standard analyses show that there could exist highly accurate inference attacks against training data records, when, upon a more careful analysis, such accurate attacks do not exist for most practical mechanisms. In this position paper, we argue that using _non-asymptotic_ Gaussian Differential Privacy (GDP) as the primary means of communicating DP guarantees in ML avoids these potential downsides. Using two recent developments in the DP literature: (i) open-source numerical accountants capable of computing the privacy profile and $f$-DP curves of DP-SGD to arbitrary accuracy, and (ii) a decision-theoretic metric over DP representations, we show how to provide non-asymptotic bounds on GDP using numerical accountants, and show that GDP can capture the entire privacy profile of DP-SGD and related algorithms with virtually no error, as quantified by the metric. To support our claims, we investigate the privacy profiles of state-of-the-art DP large-scale image classification, and the TopDown algorithm for the U.S. Decennial Census, observing that GDP fits their profiles remarkably well in all cases. We conclude with a discussion on the strengths and weaknesses of this approach, and discuss which other privacy mechanisms could benefit from GDP.
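
To make the relation between a GDP guarantee and the familiar $(\varepsilon, \delta)$ pairs concrete, the sketch below evaluates the standard conversion for a $\mu$-GDP mechanism, $\delta(\varepsilon) = \Phi(-\varepsilon/\mu + \mu/2) - e^{\varepsilon}\,\Phi(-\varepsilon/\mu - \mu/2)$, which comes from the Gaussian DP literature. The function name `gdp_delta` is hypothetical, and the numerical accountants and the decision-theoretic metric discussed in the abstract are not implemented here.

```python
import math
from statistics import NormalDist


def gdp_delta(mu, epsilon):
    """Smallest delta such that a mu-GDP mechanism is (epsilon, delta)-DP."""
    phi = NormalDist().cdf  # standard normal CDF
    return (phi(-epsilon / mu + mu / 2.0)
            - math.exp(epsilon) * phi(-epsilon / mu - mu / 2.0))


# One GDP parameter describes a whole curve of (epsilon, delta) pairs.
privacy_profile = [(eps, gdp_delta(1.0, eps)) for eps in (0.0, 0.5, 1.0, 2.0)]
for eps, delta in privacy_profile:
    print(f"epsilon={eps:.1f}  delta={delta:.6f}")
```

A single value of $\mu$ therefore fixes the entire privacy profile, which is why reporting it conveys more than one isolated $(\varepsilon, \delta)$ pair.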

11. 数据集、软件与应用 4 篇

2606.18113 2026-06-17 stat.ME 新提交

Undocumented Behavior in the gsynth R package and its Consequences for Three Published Studies

gsynth R包中的未记录行为及其对三项已发表研究的影响

Beniamino Green, P. M. Aronow

AI总结 研究发现gsynth包在特定选项组合下因实现错误严重低估标准误,导致假阳性率升高,并影响三篇APSR论文的结论。

AI中文摘要

在2025年12月CRAN上的1.3.1版本更新之前,gsynth(一个用于估计交互固定效应(IFE)模型的流行R包)可能严重且系统地低估标准误。当两个估计选项(inference = "parametric" 和 EM = TRUE)同时使用时,会出现这种低估,此时该包会对Gobillon和Magnac(2016)的IFE-EM估计量应用参数自助法。该包在2025年12月停止支持这种组合,最新文档现在描述参数自助法因理论上的不兼容性而不适用于IFE-EM估计量。我们的重点是在gsynth的1.3.1之前版本中发现的实现错误:当EM = TRUE时使用的参数自助法与Xu(2017)提出的算法不匹配,使用了样本内残差而非样本外误差。我们证明,仅此实现错误就可能导致低估数个数量级。我们进行了一项实证蒙特卡洛研究,在一系列州级面板数据集上随机分配安慰剂处理,并表明gsynth在现实环境中可能产生高假阳性率。我们识别出三篇发表在《美国政治科学评论》上的论文受到此行为的影响。重新分析这些论文的相关部分,我们表明:(i)纠正实现错误后,大多数发现变得不显著;(ii)使用Xu(2017)的广义合成控制方法替代IFE-EM后,所有发现均变得不显著。

英文摘要

Prior to the version 1.3.1 update on CRAN in December 2025, gsynth, a popular R package for estimating Interactive Fixed Effects (IFE) models, could drastically and systematically underestimate standard errors. This underestimation would occur when two estimation options (inference = "parametric", and EM = TRUE) were used together, in which case the package would apply a parametric bootstrap procedure to Gobillon and Magnac (2016)'s IFE-EM estimator. The package ceased supporting this combination in December 2025, and the latest documentation now describes the parametric bootstrap as not suitable for use with the IFE-EM estimator due to a theoretical incompatibility. Our focus is an implementation error we identified in the pre-1.3.1 versions of gsynth: the parametric bootstrap used when EM = TRUE did not match the algorithm proposed in Xu (2017), using in-sample residuals instead of out-of-sample errors. We show that this implementation error alone can cause underestimation by orders of magnitude. We conduct an empirical Monte Carlo study using randomly assigned placebo treatments on a series of state-level panel datasets, and show that gsynth could yield high false positive rates in realistic settings. We identify three papers published in the American Political Science Review that are affected by this behavior. Reanalyzing the relevant sections of these papers, we show that (i) correcting the implementation error renders most findings insignificant, and (ii) using Xu (2017)'s Generalized Synthetic Control method in place of IFE-EM renders every finding insignificant.

2606.17261 2026-06-17 cs.PF cs.SE stat.AP 新提交

The Right Call for Software Benchmarking: Consistent Decisions in Stateful Environments

软件基准测试的正确判断:有状态环境下的一致决策

Gábor Melis

AI总结 针对有状态环境下基准测试偏差问题,提出基于对比估计量的实验设计,消除程序特定偏差,实现渐近正确决策。

AI中文摘要

在对性能的不懈追求中,现代计算系统越来越依赖有状态机制来适应工作负载和物理环境的动态变化,这提高了效率,但使基准测试以及软件优化变得困难。事实上,自适应机制本质上会在测量之间引入时间依赖性,并导致对单个程序性能的朴素估计产生偏差。注意到纠正此类偏差需要对系统动态进行推测性假设,我们呼吁优先考虑性能差异而非绝对度量,并将软件基准测试形式化为识别最快程序的决策问题,对此相对知识就足够了。为此,我们提出了简单的实验设计,允许对比的一致估计,从而使程序特定偏差在可接受的假设下抵消。这些设计渐近地产生正确的决策,并为有状态环境下的有限预算基准测试提供了一种稳健的方法,对性能敏感软件的开发具有广泛的影响。

英文摘要

In the perpetual pursuit of performance, modern computing systems rely ever more on stateful mechanisms to accommodate the dynamics of workloads and physical environments, bolstering efficiency but confounding benchmarking and thereby the optimization of software. Indeed, by their nature, adaptive mechanisms introduce temporal dependencies between measurements and render naive estimators of individual program performance biased. Observing that rectifying such biases necessitates speculative assumptions about system dynamics, we call for prioritizing performance differentials over absolute measures and formalize software benchmarking as the decision problem of identifying the fastest program, for which relative knowledge suffices. To this end, we propose simple experiment designs admitting consistent estimators of contrasts, whereby program-specific biases cancel under tenable assumptions. These designs asymptotically yield the correct decision and afford a robust methodology for finite-budget benchmarking in stateful environments, bearing broad implications for the development of performance-sensitive software.
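
The idea that biases can cancel in a contrast even when individual estimates are biased can be seen in a deterministic toy. The sketch below is not one of the paper's designs, which the abstract does not spell out: it assumes a purely additive, linear drift in measured time and compares a blocked schedule with a counterbalanced interleaved one. All names and numbers are hypothetical.

```python
def measure(true_time, step, drift_per_step=0.1):
    """Toy measurement: true run time plus a drift that grows with time."""
    return true_time + drift_per_step * step


def mean(values):
    return sum(values) / len(values)


def estimated_contrast(schedule, true_times):
    """Estimate mean(B) - mean(A) from runs executed in schedule order."""
    samples = {name: [] for name in true_times}
    for step, name in enumerate(schedule):
        samples[name].append(measure(true_times[name], step))
    return mean(samples["B"]) - mean(samples["A"])


true_times = {"A": 10.0, "B": 10.5}   # true difference is 0.5
blocked = estimated_contrast("AAAABBBB", true_times)
interleaved = estimated_contrast("ABBAABBA", true_times)
print(blocked, interleaved)
```

In the blocked schedule all runs of B happen later and absorb more drift, so the difference is overstated. In the counterbalanced schedule both programs see the same average drift, so each mean is still biased but the contrast is not.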

2605.24003 2026-06-17 cs.CV cs.AI stat.AP 版本更新

Remote sensing data imputation using deep learning for multispectral imagery

基于深度学习的多光谱遥感数据插补

Shuang Liu, Fiona Johnson, Rohitash Chandra

发表机构 * Water Research Centre, University of New South Wales(新南威尔士大学水研究中心); ARC ITTC Data Analytics for Resources and Environments, University of New South Wales(新南威尔士大学ARC资源与环境数据分析产业转型培训中心); Transitional Artificial Intelligence Research Group, School of Mathematics and Statistics, University of New South Wales(新南威尔士大学数学与统计学院过渡人工智能研究组)

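AI summary: To address gaps in optical satellite data caused by cloud cover, this study compares linear interpolation with several deep learning models (CNN, Inception Resnet, Autoencoder, and their LSTM combinations) for reconstructing missing spectral bands across four lakes with historical records of algal blooms. The deep learning models substantially outperform the baseline, with CNN performing best on most lakes, and algal bloom indices (Green/Red and NDCI) derived from the imputed imagery are evaluated against observed data.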

AI中文摘要

近年来,遥感技术在水体应用中得到越来越多的利用。使用光学卫星数据的一个常见挑战是由于云覆盖导致的观测缺失。这些数据缺口可能导致错过对水资源管理部门高度关注的湖泊中关键事件(如藻华)的检测。因此,提高光学卫星数据集的完整性对于改善藻华的监测和预测至关重要。在本研究中,我们比较了传统数据插补方法(即线性插值)与深度学习模型在四个有藻华历史记录的湖泊中重建缺失光谱波段的效果。采用的深度学习模型包括基于CNN的架构(即CNN、Inception Resnet和Autoencoder)以及基于CNN-LSTM的架构(即CNN-LSTM、Resnet-LSTM和Autoencoder-LSTM)。我们的结果表明,在人工掩膜区域内插补光谱波段值时,深度学习模型显著优于基线线性插值方法。在这些模型中,CNN在大多数湖泊中表现最佳。此外,我们通过将插补图像与观测数据进行比较,评估了基于插补图像的藻华指数(即Green/Red和NDCI)的性能。我们的结果表明,深度学习模型对于插补PlanetScope SuperDove影像中的缺失数据是有效的,从而能够实现更可靠的水体监测应用。

英文摘要

Remote sensing techniques have been increasingly utilised in aquatic applications in recent years. A common challenge in using optical satellite data is the presence of missing observations due to cloud cover. These data gaps can lead to missed detection of critical events, such as algal blooms, in lakes of high interest to water authorities. As a result, enhancing the completeness of optical satellite datasets is crucial for improving the monitoring and prediction of algal blooms. In this study, we compared a traditional data imputation method (i.e., linear interpolation) with deep learning models for reconstructing missing spectral bands across four lakes with historical records of algal blooms. The deep learning models adopted include CNN-based architectures (i.e., CNN, Inception Resnet, and Autoencoder) and CNN-LSTM-based architectures (i.e., CNN-LSTM, Resnet-LSTM, and Autoencoder-LSTM). Our results demonstrated that deep learning models substantially outperformed the baseline linear interpolation method in imputing spectral band values within artificially masked regions. Among these models, CNN delivered the best performance across most lakes. Furthermore, we evaluated the performance of algal bloom indices (i.e., Green/Red and NDCI) derived from the imputed imagery by comparing them with the observed data. Our results demonstrate that deep learning models are effective for imputing missing data in PlanetScope SuperDove imagery, enabling more reliable applications in water monitoring.

2408.04327 2026-06-17 stat.ME stat.AP stat.CO 版本更新

BayesFBHborrow: An R Package for Bayesian borrowing for time-to-event data from a flexible baseline hazard

BayesFBHborrow: 基于灵活基线风险的贝叶斯借用方法用于时间-事件数据的R包

Darren Scott, Sophia Axillus, Alex Lewin, Grant Izmirlian

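AI summary: Proposes a semiparametric Bayesian borrowing model that flexibly models the baseline hazard through smoothing priors, uses a "lump-and-smear" prior to improve robustness to non-exchangeable historical data, and provides an R package implementing covariate-adjusted borrowing and marginal hazard ratio estimation.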

AI中文摘要

利用外部试验信息加速药物开发的统计方法越来越受欢迎。贝叶斯方法促进了动态借用,其中响应的相似性决定了使用多少信息。我们提出了一种用于时间-事件数据的半参数贝叶斯借用模型,采用平滑先验,通过集成平均允许基线风险取任何形式 \citep{Scott2024}。通过精确建模基线风险,而不是通过固定分段区间近似其形式,当参数可交换性的借用假设成立时,可以提高功效并减少估计治疗效应的偏倚。一种“块-涂抹”借用先验通过增加借用对先验-数据冲突存在的敏感性,使模型对不可交换的历史数据具有鲁棒性,从而减少I类错误膨胀的可能性。我们介绍了BayesFBHborrow,一个实现带有历史对照的半参数贝叶斯借用模型的R包。我们演示了如何选择最优借用超参数。该模型支持协变量调整借用,当结果差异可归因于协变量分布变化时,可以减少先验-数据冲突并提高功效。由于治疗效应估计量不可折叠,可以通过贝叶斯G计算估计边际风险比,同时仍允许进行校正分析以考虑对照组漂移。我们在模拟和真实数据集上展示了贝叶斯灵活基线风险模型,针对边际估计量,进行了未调整和调整分析。

英文摘要

Statistical methods that leverage external trial information to help accelerate drug development are becoming increasingly popular. Bayesian methods facilitate dynamic borrowing, where the similarity of the response guides how much information is used. We have proposed a semiparametric Bayesian borrowing model for time-to-event data, with smoothing priors that allow the baseline hazard to take any form via an ensemble average \citep{Scott2024}. By accurately modelling the baseline hazard, rather than approximating its form via fixed piecewise intervals, power is improved and the bias of the estimated treatment effect is reduced when the borrowing assumption of parameter exchangeability holds. A "lump-and-smear" borrowing prior makes the model robust to non-exchangeable historical data by increasing the sensitivity of borrowing to the presence of prior-data conflict, reducing the potential for type I error inflation. We present BayesFBHborrow, an R package that implements our semiparametric Bayesian borrowing model with a historical control. We demonstrate how to select the optimal borrowing hyperparameters. The model supports covariate-adjusted borrowing, which can reduce prior-data conflict and improve power when differences in outcomes are attributable to changes in the covariate distribution. As the treatment effect estimator is non-collapsible, the marginal hazard ratio can be estimated via Bayesian G-computation, while still permitting an adjusted analysis to account for control group drift. We illustrate the Bayesian flexible baseline hazard model on a simulated and a real dataset with a marginal estimand, for both unadjusted and adjusted analyses.

12. 其他/综合统计 2 篇

2604.07336 2026-06-17 astro-ph.CO astro-ph.IM physics.data-an stat.AP 版本更新

The Non-Gaussian Weak-Lensing Likelihood: A Multivariate Copula Construction and Impact on Cosmological Constraints

非高斯弱引力透镜似然:多元Copula构建及其对宇宙学约束的影响

Veronika Oehl, Tilman Tröster

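AI summary: Constructs a non-Gaussian likelihood for two-point correlation functions using a copula approach. It agrees better with simulated sampling distributions than the Gaussian likelihood on large scales, but its impact on Stage-IV surveys is negligible.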

Comments 16 pages, 5 figures in the main text. Published in the Open Journal of Astrophysics

AI中文摘要

我们提出了一个计算两点相关函数的非高斯似然的框架。非高斯性在Stage-IV弱引力透镜巡天将精确测量的大尺度上最为显著。我们展示了如何通过Copula方法构建并高效评估这种多元似然,该方法结合了精确的一维边缘分布和来自精确多元似然的依赖结构。发现Copula似然与相关函数的模拟抽样分布比高斯似然更一致,尤其是在大尺度上。此外,我们研究了非高斯Copula似然对后验推断的影响,包括对当代弱引力透镜分析的全参数空间采样。我们发现对于$1 \ 000 \ \mathrm{deg}^2$巡天,$S_8$可能存在约一个标准差的参数偏移,但对于$10 \ 000 \ \mathrm{deg}^2$区域偏移可忽略,表明高斯似然对于Stage-IV巡天是足够的,尽管结果依赖于详细的掩膜几何和数据向量结构。

英文摘要

We present a framework to compute non-Gaussian likelihoods for two-point correlation functions. The non-Gaussianity is most pronounced on large scales that will be well-measured by stage-IV weak-lensing surveys. We show how such a multivariate likelihood can be constructed and efficiently evaluated using a copula approach by incorporating exact one-dimensional marginals and a dependence structure derived from the exact multivariate likelihood. The copula likelihood is found to be in better agreement with simulated sampling distributions of correlation functions than Gaussian likelihoods, particularly on large scales. We furthermore investigate the effect of the non-Gaussian copula likelihood on posterior inference, including sampling the full parameter space of contemporary weak-lensing analyses. We find potential parameter shifts in $S_8$ on the order of one standard deviation for $1 \ 000 \ \mathrm{deg}^2$ surveys but negligible shifts for areas of $10 \ 000 \ \mathrm{deg}^2$, suggesting Gaussian likelihoods are sufficient for stage-IV surveys, though results depend on the detailed mask geometry and data-vector structure.

2601.18252 2026-06-17 cs.CV cs.AI cs.LG stat.ML 版本更新

Co-PLNet: A Collaborative Point-Line Network for Prompt-Guided Wireframe Parsing

Co-PLNet: 一种用于提示引导的线框解析的协作点线网络

Chao Wang, Xuanying Li, Cheng Dai, Jinglei Feng, Yuxiang Luo, Hao Qin, Yuqi Ouyang

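AI summary: Proposes Co-PLNet, a point-line collaborative framework that exchanges spatial cues through a Point-Line Prompt Encoder and strengthens point-line consistency with a Cross-Guidance Line Decoder, improving the accuracy and robustness of wireframe parsing on the Wireframe and YorkUrban datasets.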

AI中文摘要

线框解析旨在恢复线段及其连接点,以形成结构化的几何表示,用于同时定位与地图构建(SLAM)等下游任务。现有方法分别预测线和点,并在事后进行调和,导致不匹配和鲁棒性降低。我们提出Co-PLNet,一个点线协作框架,在两个任务之间交换空间线索,其中早期检测通过点线提示编码器(PLP-Encoder)转换为空间提示,该编码器将几何属性编码为紧凑且空间对齐的图。交叉引导线解码器(CGL-Decoder)随后通过基于互补提示的稀疏注意力细化预测,强制点线一致性和效率。在Wireframe和YorkUrban上的实验显示,准确性和鲁棒性持续改进,同时具有有利的实时效率,证明了我们在结构化几何感知中的有效性。我们的代码可在该 https URL 获取。

英文摘要

Wireframe parsing aims to recover line segments and their junctions to form a structured geometric representation useful for downstream tasks such as Simultaneous Localization and Mapping (SLAM). Existing methods predict lines and junctions separately and reconcile them post-hoc, causing mismatches and reduced robustness. We present Co-PLNet, a point-line collaborative framework that exchanges spatial cues between the two tasks, where early detections are converted into spatial prompts via a Point-Line Prompt Encoder (PLP-Encoder), which encodes geometric attributes into compact and spatially aligned maps. A Cross-Guidance Line Decoder (CGL-Decoder) then refines predictions with sparse attention conditioned on complementary prompts, enforcing point-line consistency and efficiency. Experiments on Wireframe and YorkUrban show consistent improvements in accuracy and robustness, together with favorable real-time efficiency, demonstrating our effectiveness for structured geometry perception. Our code is available at this https URL.