arXivDaily: arXiv daily academic digest, updated Monday to Friday

1. Statistical Theory and Methods (10 papers)

2606.20406 2026-06-19 stat.ME stat.CO New submission

Flexible modeling of bimodal distributions via skewed-$t$ mixtures

Marco Bee, Flavio Santi

AI summary: Proposes a mixture model based on the skewed-t distribution of Fernández and Steel (1998). Maximum likelihood estimation uses the EM algorithm, and a likelihood ratio test is developed for fitting bimodal, skewed and heavy-tailed data. Bimodality is confirmed for the Standard & Poor's 500 distortion.

English abstract

We propose a mixture of location-scale skewed-$t$ distributions to fit bimodal, skewed and heavy-tailed data. In particular, the mixture is based on the skewed-$t$ distribution by Fernández and Steel (1998), so that the model-building procedure can be easily extended to mixtures of other symmetric distributions. After studying the properties of the mixture, we develop a maximum likelihood estimation approach via the EM algorithm and a likelihood ratio test of the null hypothesis of no skewness in any given component. A simulation-based comparison to a recently proposed mixture of g-and-h distributions suggests that the performance of the proposed model is excellent, in terms of both estimation precision in well-specified setups and modeling capability in mis-specified frameworks. Fitting the model to the Standard & Poor's 500 distortion allows us to confirm the bimodality of its distribution, with the implication that the US stock market has historically been in bearish or bullish conditions, rather than near its fundamental value.
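The Fernández and Steel (1998) construction skews a symmetric density $g$ by rescaling its two halves with a parameter $\gamma>0$: $f(x\mid\gamma)=\frac{2}{\gamma+1/\gamma}\{g(x/\gamma)\mathbf{1}(x\ge 0)+g(\gamma x)\mathbf{1}(x<0)\}$. Below is a minimal stdlib-only Python sketch of that density with a Student-$t$ base and of a two-component location-scale mixture. The function names and the example parameter values are hypothetical, and the sketch does not implement the authors' EM algorithm or their likelihood ratio test.

```python
import math

def t_pdf(x, nu):
    """Standard Student-t density with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))

def fs_skewed_t_pdf(x, nu, gamma):
    """Fernandez-Steel skewed-t density; gamma > 1 skews right, gamma = 1 is the plain t."""
    norm = 2.0 / (gamma + 1.0 / gamma)
    if x >= 0:
        return norm * t_pdf(x / gamma, nu)   # right half evaluated at x / gamma
    return norm * t_pdf(gamma * x, nu)       # left half evaluated at gamma * x

def mixture_pdf(x, weights, locs, scales, nus, gammas):
    """Finite mixture of location-scale Fernandez-Steel skewed-t components."""
    total = 0.0
    for w, mu, s, nu, g in zip(weights, locs, scales, nus, gammas):
        total += w * fs_skewed_t_pdf((x - mu) / s, nu, g) / s
    return total

def integrate(f, lo, hi, n=100_000):
    """Midpoint rule, used only to sanity-check that densities integrate to one."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

# Hypothetical bimodal example: a left-skewed and a right-skewed component.
weights, locs, scales = [0.4, 0.6], [-2.0, 2.0], [1.0, 0.8]
nus, gammas = [5.0, 5.0], [0.7, 1.5]
total_mass = integrate(
    lambda x: mixture_pdf(x, weights, locs, scales, nus, gammas), -100, 100)
```

The probability mass to the right of the mode is $\gamma^2/(1+\gamma^2)$, so $\gamma=2$ places 80% of the mass above zero. This gives a quick way to check an implementation.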

2606.20226 2026-06-19 stat.ME stat.CO New submission

Analysis of uncertain fixed-effects model for Latin square designs

Yaru Cheng, Zhiming Li

AI summary: Targets uncertain experimental data that lack frequency stability. The paper builds an uncertain fixed-effects model for Latin square designs, proposes three estimation methods with confidence intervals, and performs uncertain homogeneity tests and common tests. The model is validated through numerical simulations and real examples.

English abstract

Uncertain data without frequency stability often arises in experimental design. Classical fixed-effects models can only analyze precise experimental data. Based on an uncertain measure, this paper establishes uncertain fixed-effects models for Latin-square designs. First, we propose three uncertainty-based methods to estimate the treatment and block effects and construct their confidence intervals. Then, uncertain homogeneity and common tests are conducted to assess the significance of treatment effects. In the numerical simulations, the three estimation methods are compared based on bias, mean squared error, mean absolute error, overall standard deviation, coverage probability, and average interval length. Several examples are given to illustrate the estimation and hypothesis-testing procedures. Finally, the uncertain fixed-effects model is applied to real education data, demonstrating its practical value.
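The classical fixed-effects model that the uncertain model generalizes is $y_{ij}=\mu+\alpha_i+\beta_j+\tau_{k(i,j)}+\varepsilon_{ij}$ for a $p\times p$ Latin square. Its least-squares effect estimates are marginal means minus the grand mean. Below is a minimal Python sketch of these classical estimates only. It does not implement the paper's uncertain-measure estimators, and the function name and toy data are hypothetical.

```python
def latin_square_effects(y, design):
    """Classical least-squares effect estimates for a p x p Latin square.

    y[i][j]      -- response in row i, column j
    design[i][j] -- treatment index (0..p-1) applied in that cell
    Because every treatment appears once in each row and column, the
    estimates are simply marginal means minus the grand mean.
    """
    p = len(y)
    grand = sum(sum(row) for row in y) / (p * p)
    row_eff = [sum(y[i]) / p - grand for i in range(p)]
    col_eff = [sum(y[i][j] for i in range(p)) / p - grand for j in range(p)]
    treat_sum = [0.0] * p
    for i in range(p):
        for j in range(p):
            treat_sum[design[i][j]] += y[i][j]
    treat_eff = [t / p - grand for t in treat_sum]
    return grand, row_eff, col_eff, treat_eff

# Hypothetical 3 x 3 cyclic design and noise-free data built from known effects.
design = [[0, 1, 2], [1, 2, 0], [2, 0, 1]]
alpha, beta, tau = [1.0, 0.0, -1.0], [0.5, -0.5, 0.0], [2.0, -1.0, -1.0]
y = [[10.0 + alpha[i] + beta[j] + tau[design[i][j]] for j in range(3)]
     for i in range(3)]
grand, rows, cols, treats = latin_square_effects(y, design)
```

With noise-free data and effects that sum to zero, the estimates recover $\mu=10$ and the three effect vectors exactly. This is the property that the orthogonality of the Latin square guarantees.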

2606.20069 2026-06-19 stat.ME New submission

A minimum-risk and cost-efficient two-sample sequential testing framework for the shifted exponential models with application to precipitation data

Ashwani Rajput, Neeraj Joshi

AI summary: Proposes a double sequential sampling framework that tests the difference between the location parameters of two shifted exponential models. It controls the type I error probability while minimizing a loss function that combines the type II error probability and the sampling cost. The procedure enjoys first-order efficiency, second-order efficiency and second-order risk efficiency.

English abstract

This paper investigates the problem of comparing the location parameters of two shifted exponential models through a novel double sequential sampling framework. The proposed hypothesis testing procedure is developed by controlling the type I error probability at a preassigned level while minimizing a loss function that incorporates both the type II error probability and the associated sampling cost. The corresponding optimal fixed-sample-size expressions are shown to depend on unknown scale parameters, rendering the desired testing accuracies unattainable in practice under fixed-sample designs. To overcome this difficulty, a double sequential sampling procedure is proposed to test the difference between location parameters when the scale parameters are unknown and unequal. The proposed methodology is shown to possess desirable asymptotic properties, including first-order efficiency, second-order efficiency, and second-order risk efficiency. Extensive simulation studies and a real-data application that involves heavy precipitation episodes at meteorological stations demonstrate the practical effectiveness and applicability of the proposed procedure.

2606.19737 2026-06-19 stat.ME stat.ML New submission

Calibration without labels in multiple testing

Adway S. Wadekar, Jake A. Soloff

AI summary: In multiple testing the true labels are never observed. Pseudo-labels built from spacings of ordered p-values allow local false discovery rates to be calibrated, and they reveal that q-values in the psychology and neuroscience literature can be severely miscalibrated.

English abstract

Large-scale hypothesis testing supports probability claims about individual hypotheses, as in empirical Bayes methods for estimating local false discovery rates. We study how such claims can be interpreted as approximately calibrated forecasts of the null hypothesis, yielding interpretable error probabilities even under model misspecification. Our approach draws conceptual inspiration from probabilistic forecasting but addresses a different challenge: unlike forecasting, where labels are eventually observed, in multiple testing the ground truth is never revealed, so calibration must be assessed stochastically and established indirectly. We address this challenge by constructing a set of pseudo-labels, derived from the spacings of ordered $p$-values, which have the local false discovery rate as their regression target. Our construction unlocks existing tools for assessing and performing post-hoc calibration in multiple testing. Notably, we find on a large-scale empirical survey of published psychology and neuroscience literature that the $q$-value, a popular error measure based on the false discovery rate, can be severely miscalibrated.
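The $q$-values examined in the survey are commonly computed from the Benjamini-Hochberg step-up adjustment. When the null proportion $\pi_0$ is set to 1, Storey's $q$-value coincides with the BH-adjusted $p$-value. Below is a minimal Python sketch of that computation, assuming $\pi_0=1$. The function name is hypothetical, and the paper's spacing-based pseudo-labels are not reproduced here.

```python
def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values with pi0 = 1), in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])   # indices by increasing p
    q = [0.0] * m
    running_min = 1.0
    # Step down from the largest p-value so the q-values are monotone in p.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q

q = bh_qvalues([0.01, 0.04, 0.03])   # approximately [0.03, 0.04, 0.04]
```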

2606.19580 2026-06-19 stat.ME stat.ML New submission

Machine Learning Integrated in Wavelet Shrinkage (MLShrink)

Dixon Vimalajeewa, Vijini Lakmini, Brani Vidakovic

AI summary: Proposes MLShrink, which combines wavelet shrinkage with machine learning. The method uses two thresholds and classifies the coefficients in the intermediate band data-adaptively, while preserving the simplicity of classical thresholding. The rule is proved to be nonexpansive and oracle-consistent, and it performs well on non-smooth signals.

English abstract

Data encountered in practice are frequently contaminated by additive noise, and wavelet shrinkage remains a fundamental tool for recovering underlying signals in nonparametric estimation. Classical procedures such as hard and soft thresholding decide whether to retain a wavelet coefficient almost entirely from its magnitude. Although effective in many settings, these rules can be too rigid for coefficients whose magnitudes fall in an intermediate region where the distinction between signal and noise is uncertain. We propose MLShrink, a two-threshold wavelet denoising procedure that combines wavelet shrinkage with machine learning. Coefficients below a lower threshold are discarded, coefficients above an upper threshold are retained, and coefficients in the intermediate band are classified using local wavelet-domain features. In this way, MLShrink preserves the simplicity of classical thresholding away from the decision boundary while allowing data-adaptive decisions for ambiguous coefficients. The paper also develops a theoretical framework tailored to this architecture. We show that MLShrink is a nonexpansive support-selection rule, derive an oracle-based risk decomposition showing that excess denoising risk is determined by classification errors on the undecided band, and establish an oracle-consistency result under suitable assumptions on classifier performance. Simulation experiments on standard benchmark signals indicate that MLShrink is competitive with several established wavelet shrinkage methods and is especially effective for signals with irregular, edge-rich, or non-smooth structure. These findings suggest that learned decisions on the intermediate threshold band provide a useful and interpretable connection between classical wavelet denoising and modern statistical learning.
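The two-threshold decision rule described above fits in a few lines. Coefficients below the lower threshold are zeroed, coefficients above the upper threshold are kept, and the rest are passed to a classifier. Below is a minimal Python sketch, not the authors' code. The `classify` callable, the neighbour-energy feature, and the use of the universal threshold $\sigma\sqrt{2\log n}$ as a candidate upper threshold are all illustrative assumptions.

```python
import math

def two_threshold_shrink(coeffs, lower, upper, classify):
    """Keep-or-kill rule with an undecided band handled by a classifier.

    coeffs   -- detail coefficients at one level (list of floats)
    classify -- hypothetical callable(features) -> True to keep the coefficient
    The output never has larger magnitude than the input (each entry is d or 0).
    """
    out = []
    for i, d in enumerate(coeffs):
        a = abs(d)
        if a <= lower:
            out.append(0.0)            # clearly noise: discard
        elif a >= upper:
            out.append(d)              # clearly signal: keep unchanged
        else:
            # Illustrative local feature: energy of the two neighbouring coefficients.
            left = coeffs[i - 1] if i > 0 else 0.0
            right = coeffs[i + 1] if i + 1 < len(coeffs) else 0.0
            feats = (a, left * left + right * right)
            out.append(d if classify(feats) else 0.0)
    return out

def universal_threshold(sigma, n):
    """Donoho-Johnstone universal threshold sigma * sqrt(2 log n)."""
    return sigma * math.sqrt(2.0 * math.log(n))

coeffs = [0.1, 5.0, 1.5, 0.2, 1.5, 0.0]
keep_if_busy = lambda feats: feats[1] > 1.0   # hypothetical stand-in classifier
shrunk = two_threshold_shrink(coeffs, lower=0.5, upper=3.0, classify=keep_if_busy)
```

Both coefficients of size 1.5 fall in the undecided band. The one next to the large coefficient 5.0 is kept and the isolated one is discarded, which is the kind of context-dependent decision that a magnitude-only rule cannot make.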

2606.19572 2026-06-19 stat.ME New submission

SCOPE Shrinkage: A Unified Framework for Wavelet Denoising

Dixon Vimalajeewa, Vijini Lakmini, Malith Premarathna, Fabrizio Ruggeri, Brani Vidakovic

AI summary: Proposes the SCOPE shrinkage family, built from cumulative distribution functions of symmetric unimodal distributions. Two interpretable parameters separate scale and shape effects, which balances strong local shrinkage against asymptotic unbiasedness. In wavelet denoising the family combines good performance with interpretability.

English abstract

We introduce Symmetric CDF Oriented Probability Enhanced (SCOPE) shrinkage, a unified family of sign-preserving shrinkage rules constructed from centered cumulative distribution functions of symmetric unimodal distributions. The proposed framework generates a broad class of attenuation profiles that interpolate between strong local shrinkage near zero and asymptotically unbiased behavior in the tails. A general formulation is developed that separates scale and shape effects through two interpretable parameters, allowing effective threshold location and transition sharpness to be controlled independently. Under explicit regularity assumptions, structural properties of SCOPE shrinkage are established, including oddness, monotonicity, continuity, contractivity, and a mixture representation that connects the rules to softened thresholding operators. A Bayesian and penalized likelihood interpretation is also developed: SCOPE rules admit even penalty representations that are nondecreasing in coefficient magnitude, and suitable subclasses arise as exact maximum a posteriori estimators under proper symmetric unimodal priors. Representative examples based on logistic, uniform, and Cauchy distributions illustrate how probabilistic shape governs shrinkage behavior. Data-driven parameter selection for smooth subclasses is discussed via Stein-type unbiased risk estimation. Oracle-calibrated simulation studies on standard Donoho-Johnstone test functions show that SCOPE shrinkage performs competitively with several established wavelet denoising methods, while retaining a high degree of interpretability and structural flexibility. The results highlight centered distribution functions as a natural and versatile design principle for shrinkage in wavelet denoising and related estimation problems.
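A centered CDF can be turned into a sign-preserving shrinkage rule by multiplying a coefficient by $2F(|d|/\lambda)-1$. This factor is near 0 for small $|d|$ and tends to 1 in the tails. Below is a minimal Python sketch of that idea using logistic and Cauchy CDFs. For the logistic CDF, $2F(x)-1=\tanh(x/2)$. The one-parameter form and the function names are illustrative assumptions; this is not the paper's two-parameter SCOPE parametrization.

```python
import math

def logistic_cdf(x):
    """Logistic CDF written via tanh, which avoids overflow for large |x|."""
    return 0.5 * (1.0 + math.tanh(0.5 * x))

def cauchy_cdf(x):
    return 0.5 + math.atan(x) / math.pi

def cdf_shrink(d, lam, cdf=logistic_cdf):
    """Sign-preserving rule d * (2 F(|d| / lam) - 1) for a symmetric CDF F.

    The factor 2F(|d|/lam) - 1 lies in [0, 1]. It is near 0 for small |d|
    (strong shrinkage) and near 1 for large |d| (asymptotically unbiased),
    so the rule is odd and never increases |d|.
    """
    return d * (2.0 * cdf(abs(d) / lam) - 1.0)
```

Swapping `cdf` changes only the shape of the transition. The Cauchy version approaches 1 more slowly than the logistic one, so it shrinks moderately large coefficients more.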

2606.18933 2026-06-19 cs.LG cs.IR stat.ME New submission

Zero-Shot Active Feature Acquisition via LLM-Elicitation

Binyamin Perets, Natalie Mendelson, Shiran Vainberg, Yehuda Chowers, Shai Shen-Orr, Shie Mannor

Affiliations: Faculty of EE, Technion; Faculty of Medicine, Technion; CytoReason; NVIDIA

AI summary: Proposes a zero-shot active feature acquisition framework that elicits the sufficient statistics of a Markov random field from an LLM, which addresses the scarcity of labeled data. The framework outperforms existing methods in diagnosing IBD patients.

English abstract

Active feature acquisition (AFA) sequentially selects which features to observe to reach a classification or ranking decision. Its central limitation is reliance on large amounts of labeled data to fit probabilistic models guiding acquisition. Large language models (LLMs) supply unsupervised domain knowledge, but are poor sequential planners. Asking one to both know and decide conflates capabilities best kept separate. Here, we develop a framework for zero-shot AFA through disciplined elicitation: asking the LLM only for what it can be trusted to return, the unary deviations and pairwise co-variations that are the sufficient statistics of a Markov random field (MRF). We apply our framework to two settings: binary classification and top-$k$ identification. In practice, the LLM reliably returns only discriminative statistics, what distinguishes the classes rather than each class in isolation, which precludes classical AFA. We apply a maximum-entropy closure that resolves this gauge ambiguity. We evaluate on a cohort of Inflammatory Bowel Disease (IBD) patients, an active clinical setting where diagnostic ambiguity and patient heterogeneity obstruct stable treatment strategies. Our framework outperforms the LLM both on real labels and on its own extracted beliefs. Where it matters most, on the hardest patients, our top-$k$ acquisition policy markedly outperforms all existing methods.

2412.17470 2026-06-19 math.ST econ.EM stat.ME stat.TH Updated version

A Necessary and Sufficient Condition for Size Controllability of Heteroskedasticity Robust Test Statistics

Benedikt M. Pötscher, David Preinerstorfer

AI summary: For testing a single restriction in regression models, gives a necessary and sufficient condition for the size controllability of heteroskedasticity-robust test statistics. This improves on existing results, which provide only a sufficient condition.

Comments: Clarification in Footnote 15 added

English abstract

We revisit size controllability results in Pötscher and Preinerstorfer (2025) concerning heteroskedasticity robust test statistics in regression models. For the special, but important, case of testing a single restriction (e.g., a zero restriction on a single coefficient), we provide a necessary and sufficient condition for size controllability, whereas the condition in Pötscher and Preinerstorfer (2025) is, in general, only sufficient (even in the case of testing a single restriction).
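For a single zero restriction on a slope, a heteroskedasticity-robust test statistic has the form $t=\hat\beta/\sqrt{\hat V}$, with an Eicker-White (HC0) sandwich variance as one common choice for $\hat V$. Below is a minimal pure-Python sketch for simple regression with an intercept. HC0 is used only for illustration and is one of several robust variance estimators. The paper's size-controllability condition itself is not implemented, and the function name is hypothetical.

```python
import math

def hc0_slope_test(x, y):
    """OLS slope in y = a + b x + e and its HC0-robust t statistic for H0: b = 0."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    # b = sum_i w_i y_i with w_i = (x_i - xbar) / sxx, so the HC0 sandwich
    # variance of b is sum_i w_i^2 e_i^2.
    v_hc0 = sum((xi - xbar) ** 2 * e * e for xi, e in zip(x, resid)) / sxx ** 2
    return b, v_hc0, b / math.sqrt(v_hc0)

b, v, t = hc0_slope_test([0, 1, 2, 3], [1, 3, 2, 5])   # b = 1.1, v = 0.0566
```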

2512.19187 2026-06-19 stat.ME Updated version

Interpolated Quantile Estimation: A Unified Framework Bridging Quantiles and the Mean

Saïd Maanan, Azzouz Dermoune, Ahmed El Ghini

AI summary: Proposes three families of estimators that interpolate continuously between classical quantiles and the sample mean, within a unified M-estimation framework built on a smoothed L1 loss. The paper proves consistency and asymptotic normality, and shows that efficiency behaves differently under light-tailed and heavy-tailed distributions.

English abstract

This paper develops and analyzes three families of estimators that continuously interpolate between classical quantiles and the sample mean. The construction begins with a smoothed version of the $L_1$ loss, indexed by a location parameter $z$ and a smoothing parameter $h \ge 0$, whose minimizer $\hat q(z,h)$ yields a unified $M$-estimation framework. Depending on how $(z, h)$ is specified, this framework generates three distinct classes of estimators: fixed-parameter smoothed quantile estimators, plug-in estimators of fixed quantiles, and a new continuum of mean-estimating procedures. For all three families we establish consistency and asymptotic normality via a uniform asymptotic equicontinuity argument. The limiting variances admit closed forms, allowing a transparent comparison of efficiency across families and smoothing levels. A geometric decomposition of the parameter space shows that, for fixed quantile level $\tau$, admissible pairs $(z, h)$ lie on straight lines along which the estimator targets the same population quantile while its asymptotic variance evolves. The theoretical analysis reveals two efficiency regimes. Under light-tailed distributions (e.g., Gaussian), smoothing yields a monotone variance reduction. Under heavy-tailed distributions (e.g., Laplace), a finite smoothing parameter $h^{*}(\tau) > 0$ strictly improves efficiency for quantile estimation. Numerical experiments, based on simulated data and real financial returns, validate these conclusions and show that, both asymptotically and in finite samples, the mean-estimating family does not improve upon the sample mean.
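Smoothing can bridge a quantile and the mean. Write the check loss as $\tfrac12\{|r|+(2\tau-1)r\}$ and replace $|r|$ by $\sqrt{r^2+h^2}$. With $c=2\tau-1$, the minimizer then solves $\frac1n\sum_i (x_i-q)/\sqrt{(x_i-q)^2+h^2} = -c$. Letting $h\to 0$ recovers the $\tau$-quantile, and for $c=0$, letting $h\to\infty$ recovers the sample mean. Below is a minimal Python sketch of this idea. The pseudo-Huber smoothing and the parameter name `c` are illustrative assumptions; the paper's smoothed $L_1$ loss and its $(z,h)$ indexing may differ.

```python
import math

def smoothed_quantile(xs, c, h, iters=200):
    """Minimise sum_i sqrt((x_i - q)^2 + h^2) + c * (x_i - q) for |c| < 1, h >= 0.

    The score g(q) = mean((x_i - q) / sqrt((x_i - q)^2 + h^2)) + c is
    non-increasing in q, so its root is found by bisection.
    """
    def score(q):
        total = 0.0
        for x in xs:
            r = x - q
            denom = math.sqrt(r * r + h * h)
            total += r / denom if denom > 0 else 0.0   # subgradient 0 at a kink
        return total / len(xs) + c

    lo, hi = min(xs) - 1.0, max(xs) + 1.0
    width = hi - lo
    while score(lo) < 0:      # widen the bracket until g(lo) >= 0
        lo -= width
        width *= 2
    while score(hi) > 0:      # ... and until g(hi) <= 0
        hi += width
        width *= 2
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

data = [1.0, 2.0, 3.0, 4.0, 100.0]
near_median = smoothed_quantile(data, c=0.0, h=1e-9)   # close to the median, 3
near_mean = smoothed_quantile(data, c=0.0, h=1e6)      # close to the mean, 22
```

The outlier at 100 barely moves the lightly smoothed estimate but pulls the heavily smoothed one toward the mean. This is the trade-off between robustness and efficiency that the smoothing parameter controls.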

2309.15769 2026-06-19 math.ST cs.LG stat.ME stat.TH Updated version

Benign overfitting beyond prediction: The ordinary least squares interpolator

Dennis Shen, Dogyoon Song, Peng Ding, Jasjeet S. Sekhon

Affiliations: Department of Data Sciences & Operations, University of Southern California; Department of Statistics, University of California, Davis; Department of Statistics, University of California, Berkeley; Google DeepMind

AI summary: Studies the parameter estimation and inference properties of the minimum ℓ2-norm OLS interpolator in overparameterized linear models. The paper derives overparameterized versions of the leave-k-out formulas, the omitted variable bias formula and the Frisch-Waugh-Lovell theorem, and extends the Gauss-Markov theorem.

Comments: This work is accepted for publication in Biometrika

English abstract

Recent advances in deep learning have highlighted the phenomenon of benign overfitting in overparameterized statistical models, sparking significant interest in understanding its foundations. Owing to its simplicity and practical relevance, the ordinary least squares (OLS) interpolator has become a key object of study for gaining theoretical insight into this phenomenon. While the properties of OLS are well understood in classical underparameterized settings, its behavior in the overparameterized regime, unlike that of ridge regression or the lasso, remains comparatively less explored. We contribute to this growing literature by deriving new algebraic and statistical results for the minimum $\ell_2$-norm OLS interpolator. In contrast to much of the existing work, which focuses on prediction risk, we center our analysis on parameter estimation and inference, which are fundamental for many statistics and causal inference applications. Specifically, we establish overparameterized analogues of (i) the leave-$k$-out formulas, (ii) the omitted variable bias formula, and (iii) the Frisch-Waugh-Lovell theorem. Under the Gauss-Markov model, we further extend the Gauss-Markov theorem and analyze variance estimation under homoskedasticity in the overparameterized setting. Collectively, these results provide a systematic framework for studying parameter estimation and inference in overparameterized linear models, offering a novel perspective on benign overfitting beyond its implications for prediction.
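When $p>n$ and the design $X$ has full row rank, the minimum $\ell_2$-norm OLS interpolator is $\hat\beta = X^\top (XX^\top)^{-1} y$, which equals $X^{+}y$. Below is a minimal pure-Python sketch that computes it with a small Gaussian-elimination solver and checks that it interpolates the data. The helper names are hypothetical, and the solver uses partial pivoting only.

```python
def solve(a, b):
    """Solve a x = b for a square matrix a by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                 # back substitution
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x

def min_norm_interpolator(X, y):
    """beta = X^T (X X^T)^{-1} y for an n x p design with p > n and full row rank."""
    n, p = len(X), len(X[0])
    gram = [[sum(X[i][k] * X[j][k] for k in range(p)) for j in range(n)]
            for i in range(n)]
    alpha = solve(gram, y)
    return [sum(X[i][k] * alpha[i] for i in range(n)) for k in range(p)]

X = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]   # n = 2 observations, p = 3 features
y = [1.0, 2.0]
beta = min_norm_interpolator(X, y)        # [0.0, 1.0, 1.0]
```

Every interpolating solution has the form $\hat\beta + v$ with $Xv=0$. Here $v \propto (1,1,-1)$, and $\hat\beta$ is orthogonal to that direction, which is why it has the smallest norm.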

2. Bayesian Statistics and Probabilistic Modeling (4 papers)

2606.19540 2026-06-19 stat.ME stat.CO stat.ML New submission

Overfitted high-dimensional matrix factorizations via adaptive spectral shrinkage

Lorenzo Mauri, David B. Dunson

AI summary: Proposes EigenBayes, which uses spectral estimation and adaptive empirical Bayes calibration of hyperparameters to fit overfitted factor models quickly with uncertainty quantification. It outperforms existing methods in numerical experiments and a genomics application.

English abstract

Factor models are popular approaches for analyzing high-dimensional data to extract low-rank signals and estimate covariances. They decompose the covariance matrix as the sum of low-rank and diagonal components. A key issue is how to choose the latent dimension $k$, which is particularly challenging when the factor model only holds approximately and in low signal-to-noise scenarios. Bayesian overfitted factor models specify an upper bound on $k$ and rely on structured shrinkage priors to effectively remove extra components. Such approaches are popular and effective, but computationally expensive. We propose a much faster EigenBayes approach that provides valid uncertainty quantification, based on spectral estimation of latent factors and adaptive empirical Bayes calibration of key hyperparameters. The resulting posterior distribution factorizes across outcomes and is analytically tractable, bypassing Markov chain Monte Carlo. We show that EigenBayes adapts to the signal-to-noise ratio of each outcome and latent dimension, while shrinking superfluous latent components to zero. We establish favorable asymptotic properties and demonstrate strong empirical performance in numerical experiments and a genomics application, where EigenBayes outperforms state-of-the-art alternatives.

2606.19643 2026-06-19 stat.ML cs.LG New submission

Variational Consensus Monte Carlo for Bayesian Mixture

Julie Fendler, Francesca L. Crowe, Tom Marshall, Sylvia Richardson, Paul D. W. Kirk

AI summary: Extends variational consensus Monte Carlo to overfitted Bayesian mixture models. With new cluster-matching algorithms and aggregation strategies, it infers the number of clusters and all parameters in a federated learning setting. The method is validated on simulated data and real electronic health record data.

English abstract

Motivated by the privacy, sensitivity and sharing limitations of health data, we present a comprehensive pipeline for inference of Bayesian mixture models within a federated learning setting, i.e. when data cannot be fully shared or pooled across compute nodes. We adopt a Consensus Monte Carlo (CMC) approach, in which an MCMC algorithm is run independently within each data silo to estimate local posterior distributions, which are then aggregated to approximate the posterior over the full data. The variational CMC approach of Rabinovich, Angelino and Jordan (2015) [1] frames the aggregation step as a variational inference problem, but their application to mixtures assumes the number of clusters and key mixture parameters to be known. Our main methodological contributions are: (i) an extension of variational CMC to over-fitted Bayesian mixture models that infer the number of clusters and all model parameters, without requiring conjugacy; (ii) novel cluster-matching algorithms suitable for cross-silo settings in which not every cluster appears in each local dataset; (iii) a number of inference strategies for the aggregation step, matched to different federated learning constraints; and (iv) guidelines for choosing among these in practice. A comprehensive simulation study validates the framework and allows us to compare to state-of-the-art federated learning alternatives. Notably, we show that when the composition of local datasets reflects the underlying clustering structure in the data, our approach can recover small clusters with greater accuracy than standard MCMC applied to the pooled data. We illustrate the framework on large-scale electronic health record data, identifying multi-morbidity patterns in a British geriatric population.
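The aggregation step that variational CMC generalizes is, in the original Consensus Monte Carlo algorithm of Scott et al., a precision-weighted average of subposterior draws. Below is a minimal scalar Python sketch of that baseline rule, with weights equal to the inverse sample variances of the subposteriors. It does not implement the variational aggregation or the cluster matching described above, and the function name and toy draws are hypothetical.

```python
from statistics import variance

def consensus_draws(subposterior_draws):
    """Combine S lists of scalar draws (all the same length) into consensus draws.

    Draw t of the consensus chain is sum_s w_s * theta_s[t] / sum_s w_s,
    with w_s = 1 / (sample variance of subposterior s).
    """
    weights = [1.0 / variance(draws) for draws in subposterior_draws]
    total = sum(weights)
    n_draws = len(subposterior_draws[0])
    return [sum(w * draws[t] for w, draws in zip(weights, subposterior_draws)) / total
            for t in range(n_draws)]

sub1 = [0.0, 2.0]   # hypothetical draws from silo 1 (sample variance 2)
sub2 = [1.0, 1.5]   # hypothetical draws from silo 2 (sample variance 0.125)
combined = consensus_draws([sub1, sub2])
```

This rule is exact only when every subposterior is Gaussian. Label switching across silos breaks it for mixtures, which is why the cluster-matching step matters.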

2606.20480 2026-06-19 math.ST stat.ML stat.TH New submission

Leveraging tails for adaptation

Sergios Agapiou, Ismaël Castillo, Paul Egels

AI summary: Studies posterior contraction rates in nonparametric Bayes under priors with p-exponential tails. Smaller p yields faster contraction, and adaptation to smoothness is obtained as p→0. Applications include white noise regression and ReLU neural networks.

Comments: 59 pages, 3 figures

English abstract

We consider contraction of Bayesian posterior distributions in nonparametric settings where coefficients of a function over a basis or dictionary are given priors with $p$-exponential tails, including Laplace tails $(p=1)$ and heavier tails $(p<1)$. It is shown that contraction rates improve as $p$ decreases and that full adaptation to smoothness, up to logarithmic factors, is obtained in an appropriate $p\to 0$ regime. As applications, we consider both series priors in white noise regression and shallow ReLU neural networks in random design regression. In particular, we show that overparametrised shallow ReLU networks can adapt to any regularity $0\le \beta\le 2$. Through a simulation study, we show strong empirical agreement with the behavior predicted by our theory.
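One concrete density with $p$-exponential tails is $\frac{p}{2\Gamma(1/p)}e^{-|x|^p}$. It is the Laplace density for $p=1$ and a Gaussian-type density for $p=2$. If $G\sim\mathrm{Gamma}(1/p,1)$ and $S$ is an independent random sign, then $SG^{1/p}$ has this density. Below is a minimal Python sketch under that parametrization. The paper may scale its coefficient priors differently, and the function names are hypothetical.

```python
import math
import random

def p_exp_logpdf(x, p):
    """Log density of p / (2 Gamma(1/p)) * exp(-|x|^p)."""
    return math.log(p / 2.0) - math.lgamma(1.0 / p) - abs(x) ** p

def p_exp_sample(p, rng=random):
    """Draw from the same density via |X|^p ~ Gamma(1/p, 1) and a random sign."""
    g = rng.gammavariate(1.0 / p, 1.0)
    sign = 1.0 if rng.random() < 0.5 else -1.0
    return sign * g ** (1.0 / p)

rng = random.Random(0)
heavy_draws = [p_exp_sample(0.5, rng) for _ in range(100_000)]   # p < 1: heavier than Laplace
```

Because $|X|^p$ is $\mathrm{Gamma}(1/p,1)$, the Monte Carlo average of $|X|^p$ should be close to $1/p$. That makes a quick check on the sampler.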

2604.06464 2026-06-19 cs.LG physics.app-ph stat.ML Updated version

Weighted Bayesian Conformal Prediction

Xiayin Lou, Peng Luo

Affiliations: Technical University of Munich; Massachusetts Institute of Technology

AI summary: Proposes Weighted Bayesian Conformal Prediction (WBCP), which generalizes Bayesian conformal prediction to importance-weighted settings through a weighted Dirichlet prior. Theory shows that the effective sample size determines the posterior variance. The method provides richer uncertainty information about conditional coverage.

English abstract

Conformal prediction provides distribution-free prediction intervals with finite-sample coverage guarantees, and recent work by Snell & Griffiths reframes it as Bayesian Quadrature (BQ-CP), yielding powerful data-conditional guarantees via Dirichlet posteriors over thresholds. However, BQ-CP fundamentally requires the i.i.d. assumption. Meanwhile, weighted conformal prediction handles distribution shift via importance weights but remains frequentist, producing only point-estimate thresholds. We propose Weighted Bayesian Conformal Prediction (WBCP), which generalizes BQ-CP to arbitrary importance-weighted settings by replacing the uniform Dirichlet $\mathrm{Dir}(1,\ldots,1)$ with a weighted Dirichlet $\mathrm{Dir}(n_{\mathrm{eff}} \cdot \tilde{w}_1, \ldots, n_{\mathrm{eff}} \cdot \tilde{w}_n)$, where $n_{\mathrm{eff}}$ is Kish's effective sample size. We prove four theoretical results: (1) $n_{\mathrm{eff}}$ is the unique concentration parameter matching frequentist and Bayesian variances; (2) posterior standard deviation decays as $O(1/\sqrt{n_{\mathrm{eff}}})$; (3) BQ-CP's stochastic dominance guarantee extends to per-weight-profile data-conditional guarantees; (4) the HPD threshold provides $O(1/\sqrt{n_{\mathrm{eff}}})$ improvement in conditional coverage. We instantiate WBCP for spatial prediction as Geographical BQ-CP, where kernel-based spatial weights yield per-location posteriors with interpretable diagnostics. Experiments on synthetic and real-world spatial datasets demonstrate that WBCP maintains coverage guarantees while providing substantially richer uncertainty information.
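The weighted Dirichlet needs two ingredients: normalized weights $\tilde w_i = w_i/\sum_j w_j$ and Kish's effective sample size $n_{\mathrm{eff}} = (\sum_i w_i)^2/\sum_i w_i^2$. Below is a minimal Python sketch that computes the concentration parameters $n_{\mathrm{eff}}\tilde w_i$ and draws one Dirichlet sample from normalized Gamma variates. With equal weights it reduces to $\mathrm{Dir}(1,\ldots,1)$. The function names are hypothetical, positive weights are assumed, and the paper's construction of a threshold posterior from these draws is not reproduced.

```python
import random

def kish_ess(weights):
    """Kish effective sample size (sum w)^2 / sum w^2."""
    s = sum(weights)
    return s * s / sum(w * w for w in weights)

def weighted_dirichlet_params(weights):
    """Concentration parameters n_eff * w_i / sum(w), one per calibration point."""
    s = sum(weights)
    n_eff = kish_ess(weights)
    return [n_eff * w / s for w in weights]

def sample_dirichlet(alphas, rng=random):
    """One Dirichlet draw: independent Gamma(alpha_i, 1) variates, normalised (alpha_i > 0)."""
    g = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(g)
    return [gi / total for gi in g]

params = weighted_dirichlet_params([3.0, 1.0])   # n_eff = 1.6 -> [1.2, 0.4]
draw = sample_dirichlet(params, random.Random(1))
```

The parameters always sum to $n_{\mathrm{eff}}$. Uneven weights therefore yield a flatter Dirichlet and a wider posterior than equal weights on the same number of points.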

3. Causal Inference and Experimental Design (10 papers)

2606.20148 2026-06-19 stat.ME New submission

A case study of causal mediation using Bayesian nonparametrics and semiparametric corrections

使用贝叶斯非参数和半参数修正的因果中介分析案例研究

Yuhua Zhang, Michael J. Daniels

AI summary: Proposes a truncated enriched Dirichlet process mixture model to estimate natural direct and indirect effects. It combines an efficient MCMC algorithm with a one-step posterior correction based on the efficient influence function, giving reliable inference for causal estimands in Bayesian nonparametrics.

English abstract

We propose a Bayesian nonparametric approach using a truncated Enriched Dirichlet Process mixture (EDPM) model to estimate natural direct (NDE) and indirect (NIE) effects in causal mediation analyses in the presence of post-treatment confounders. We introduce an efficient cluster reallocation Metropolis-Hastings algorithm to improve mixing in the blocked Gibbs sampler. We implement a one-step posterior correction based on the efficient influence function for our setting. This post-processing step solves a critical problem in Bayesian nonparametrics: how to obtain reliable estimates and posteriors for a specific causal estimand of interest (the NDE and NIE) with excellent frequentist properties, such as correct coverage, from a model designed for complex joint distributions. We conduct simulation studies to assess our method's performance and apply it to evaluate causal mediation effects in a weight management clinical trial.

2606.20078 2026-06-19 stat.OT New submission

A Law of Iterated Expectation Primer for Causal Inference

Ashley I. Naimi, Razieh Nabi, Lindsay J. Collin, Paul N. Zivich, Stephen R. Cole

AI summary: Introduces the law of iterated expectation and its role in identifying causal effects. The mathematical intuition is developed through two nonparametrically equivalent forms of the g-formula (NICE and ICE) and three numerical examples.

English abstract

The g-formula is a foundational tool for identifying causal effects in observational data. This tool is based on the law of iterated expectation, a key mathematical identity in statistics. However, the notation with which the law of iterated expectation and the g-formula are expressed can be opaque to those with little background in statistics. We provide a primer introducing the law of iterated expectation, the integration notation used to express it, and its role for causal effect identification via the g-formula. Under the assumptions of causal consistency, positivity, and conditional exchangeability, the law of iterated expectation can be rewritten as a causal standardization formula (the g-formula) in two nonparametrically equivalent forms: a non-iterative conditional expectation (NICE) form involving a single weighted average of conditional outcome means, and an iterative conditional expectation (ICE) form involving nested expectations. We illustrate both forms using three progressively complex numerical examples: a time-fixed example with a single binary confounder, a time-fixed example with discrete and continuous confounders, and a time-varying example with two timepoints. We provide clarity on what the law of iterated expectation is, how it is related to the g-formula, and how to gain intuition of its mathematical formulations in actual data examples that can be generalized to a range of settings.
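In the simplest time-fixed case with one binary confounder $L$, the two forms look like this. The NICE form is $\psi(a)=\sum_{l}E[Y\mid A=a,L=l]\,P(L=l)$. The ICE form averages the conditional mean $E[Y\mid A=a,L]$ over each individual's observed $L$. Below is a minimal nonparametric Python sketch on a hypothetical toy dataset. It shows that the two forms agree when each stratum mean is estimated by a sample mean. The function names and data are illustrative, and the sketch assumes every $(a,l)$ cell is non-empty, which is the positivity condition.

```python
def cond_mean(rows, a, l):
    """Sample mean of Y among rows with A == a and L == l (rows are (L, A, Y))."""
    ys = [y for (li, ai, y) in rows if ai == a and li == l]
    return sum(ys) / len(ys)

def g_formula_nice(rows, a):
    """Non-iterative form: sum_l E[Y | A=a, L=l] * P(L=l)."""
    n = len(rows)
    levels = sorted({l for (l, _, _) in rows})
    return sum(cond_mean(rows, a, l) * sum(1 for r in rows if r[0] == l) / n
               for l in levels)

def g_formula_ice(rows, a):
    """Iterated form: average of E[Y | A=a, L=L_i] over every individual i."""
    return sum(cond_mean(rows, a, l) for (l, _, _) in rows) / len(rows)

# Hypothetical toy data as (L, A, Y) triples.
rows = [(0, 0, 0), (0, 0, 1), (0, 1, 1), (0, 1, 1),
        (1, 0, 0), (1, 1, 0), (1, 1, 1), (1, 1, 1)]
risk_difference = g_formula_nice(rows, 1) - g_formula_nice(rows, 0)   # 5/6 - 1/4 = 7/12
```

The ICE form pays off in the time-varying case, where the inner conditional mean is itself modeled and then averaged step by step. In this time-fixed example the two forms reduce to the same arithmetic.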

2606.20206 2026-06-19 stat.ML cs.LG New submission

Off-Policy Evaluation for Missingness-Aware Policies in MDPs with Rewards Missing Not at Random

Ziheng Wei, Annie Qu, Rui Miao

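AI summary: For offline reinforcement learning with rewards missing not at random, proposes an identification strategy that uses future states as shadow variables. A bridge function and a min-max estimator then recover the conditional mean reward, which enables off-policy evaluation of missingness-aware policies.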

Comments Accepted at ICML 2026. 31 pages, 6 figures

English abstract

In offline reinforcement learning, immediate rewards in logged batch data are often unobserved due to sparse or irregular record-keeping, or are censored beyond certain reward values. This issue arises in practical settings, including health care and marketing. We investigate off-policy evaluation (OPE) in finite-horizon Markov decision processes when rewards are missing not at random (MNAR), which breaks ignorability and induces selection bias even after conditioning on states and actions. To address this, we formalize a reward-dependent propensity model and use future states as shadow variables to identify the full-data conditional mean reward. We further introduce a bridge function that recovers the conditional mean reward without explicitly modeling the MNAR mechanism, and estimate it via a min-max procedure to avoid double sampling. Building upon these identification results, we propose a Fitted-Q-Evaluation-style estimator that propagates the recovered rewards while allowing target policies to depend on past missingness indicators. Finally, we establish consistency and finite-sample error bounds for our OPE estimator, and show through experiments on simulated and MIMIC-III sepsis data that our method performs strongly compared with existing methods.

2606.17308 2026-06-19 stat.ME stat.ML New submission

Kernel-Based Functional Balancing for Causal Inference with Compositional Treatments

Sungbum Kim, Jiayi Wang

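AI summary: For causal effect estimation with compositional treatments (exposures on a simplex), proposes a kernel-based covariate functional balancing method. Weights are constructed by minimizing the worst-case balancing error over a reproducing kernel Hilbert space, and an augmented weighted estimator built on them achieves $\sqrt{n}$-consistency.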

Comments 40 pages, 3 figures

English abstract

We study causal effect estimation with compositional treatments, where the exposure lies on a simplex and the estimand is defined over compositions rather than scalar or binary values. By considering a projection of the average potential outcome onto the treatment space, a kernel-based covariate functional balancing approach is adopted for weight construction. The weights are obtained by directly minimizing a worst-case balancing error over a reproducing kernel Hilbert space (RKHS) defined on the joint space of treatments and covariates, instead of being estimated under a treatment assignment model. Building on these weights, an augmented weighted estimator (AWE) is proposed, where the outcome function is estimated via kernel ridge regression and combined with a marginal augmentation over the covariate distribution. Despite the complex structure of the resulting objective, a finite-dimensional convex optimization problem is formulated via a representer theorem and a low-rank approximation. The proposed estimator achieves $\sqrt{n}$-consistency without requiring consistent estimation or smoothness of the weights. An asymptotic normality result is established around a sample-specific target. Empirical performance is demonstrated through simulation studies and a real data application.

2606.17165 2026-06-19 stat.ME cs.AI econ.EM math.ST stat.TH New submission

Statistical Foundations of LLM-based A/B Testing: A Surrogacy Framework for Human Causal Inference

Joel Persson, Mårten Schultzberg, Sebastian Ankargren

Affiliation: Spotify USA, Inc.

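AI summary: Proposes a surrogacy framework showing that calibrated LLM outputs identify the average treatment effect under conditions weaker than distributional equivalence. It also analyzes the bias and variance introduced by LLM stochasticity.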

English abstract

Organizations and researchers show increasing interest in using large language models (LLMs) in place of human participants in A/B tests, in the hope of experimenting faster and at lower cost. We study when a treatment effect estimated on LLM outcomes can recover the effect that would have been measured on the human population of interest. Distributional equivalence between LLM and human outcomes would make any standard estimator valid but is unrealistic. We therefore develop a statistical framework that adapts surrogate endpoint theory to LLMs, showing that calibrating LLM outcomes to human outcomes identifies the average treatment effect under surrogacy and comparability conditions that are jointly weaker than distributional equivalence. We present a falsification test for surrogacy and a bound on the worst-case bias from limited overlap between the LLM and human samples. We further show that the stochasticity inherent to LLMs can weaken surrogacy for identification while also introducing bias and variance during estimation, but that using an average over multiple LLM draws per unit as the surrogate mitigates these issues. Simulations validate the results, and an empirical application to A/B tests on Upworthy headlines shows that raw LLM predictions recover only 39% of the human treatment effect while nonparametric calibration closes the gap. A central takeaway is that A/B testing on LLMs yields correct results only by assumption, whereas A/B testing on humans is correct by design, and that the required assumptions are hardest to justify precisely where A/B testing on LLMs promises the greatest benefit. We discuss the role of LLM choice, prompting, and temperature as design variables, the compounded challenge posed by long-term outcomes, and how to size human pilot studies for validation.

2603.19745 2026-06-19 stat.ME Version update

Invariant quantile regression for heterogeneous environments

Bo Fu, Dandan Jiang

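AI summary: Proposes an invariant quantile regression framework for multi-environment datasets. A kernel-smoothed estimator exploits cross-environment invariance to enable causal discovery and overcome endogeneity.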

Comments 25 pages, 4 figures

English abstract

In this paper, we propose an invariant quantile regression (IQR) framework specifically designed for multi-environment datasets, which captures the invariance across different environments. This framework is closely related to transfer learning, causal inference, and fair machine learning, and is motivated by scenarios in which the conditional probability of the response given covariates varies, while certain key variables remain invariant. This perspective differs notably from previous works that restrict attention to the conditional mean, which is often insufficient to capture the full causal relationships between covariates and the response in heterogeneous environments. In contrast, quantile-based invariance naturally accommodates heterogeneity, and aligns more closely with structural causal models, in which variables invariant across environments at one or multiple quantile levels directly indicate potential and stable causal variables. Moreover, we show that IQR may yield a larger set of endogenous variables compared to the conditional mean framework, which in turn promotes more effective exclusion of spurious (non-causal) variables. To achieve this, we introduce a Kernel-Smoothed Invariant Quantile Regression (KS-IQR) estimator, which leverages the underlying invariance structure and heterogeneity among environments, ensuring stable estimation across multiple environments. We establish the causal discovery properties of our method, demonstrate its ability to overcome the "curse of endogeneity", and derive an $\ell_2$ error bound for our estimator, all in a non-asymptotic framework. We apply our method to real data for causal discovery and obtain biologically meaningful relationships, recovering known signaling pathways and revealing additional quantile-specific effects.

2506.06267 2026-06-19 stat.ME Version update

A causal framework for evaluating the total effect of strategies aiming to expand screening and to improve outcomes

Joy Zora Nakato, Janice Litunya, Brian Beesiga, Jane Kabami, James Ayieko, Moses R. Kamya, Gabriel Chamie, Laura B. Balzer

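AI summary: Addresses the multilevel, missing-data and mediation structure of cluster randomized trials. The total effect is defined through counterfactual strata effects, and two-stage targeted minimum loss-based estimation (TMLE) is extended for its identification and estimation.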

Comments 20 pages, 3 figures, accepted at "Statistics in Medicine"

English abstract

For many health conditions, there are highly efficacious treatment and prevention products. Maximizing their impact requires strategies that improve the reach of health screening in order to establish who could benefit. For example, HIV prevention strategies aim to expand risk screening and to improve uptake of pre-exposure prophylaxis (PrEP) among those experiencing risk. Often, these strategies induce changes at the group level (e.g., health clinics or communities) and are evaluated through cluster randomized trials. This scenario creates a complex, multilevel-mediation-missing data problem for the following reasons. First, the strategy is delivered at the cluster level, while health screening and outcomes are at the individual level. Second, the strategy improves health outcomes directly and indirectly through improved health screening. Third, everyone has an underlying status, which is only observed among those screened. To formally define the total effect in such settings, we use Counterfactual Strata Effects: causal estimands where the outcome is only relevant for a group whose membership is subject to missingness and/or impacted by the exposure of interest. To identify and estimate the corresponding statistical estimand, we propose a novel extension of Two-Stage targeted minimum loss-based estimation (TMLE). Simulations demonstrate the practical performance of our approach as well as the limitations of existing approaches.

2506.18808 2026-06-19 stat.AP Version update

A Practical Introduction to Regression-based Causal Inference in Meteorology (I): All confounders measured

Caren Marzban, Yikun Zhang, Nicholas Bond, Michael Richman

AI summary: Introduces causal inference by matching in non-temporal settings, with a meteorological application and R code.

English abstract

Whether a variable is the cause of another, or simply associated with it, is often an important scientific question. Causal Inference is the name associated with the body of techniques for addressing that question in a statistical setting. Although assessing causality is relatively straightforward in the presence of temporal information, outside that setting (the situation considered here) causal effects are more difficult to assess. The development of the field of causal inference has involved concepts from a wide range of topics, thereby limiting its adoption across some fields, including meteorology. However, at its core, the requisite knowledge for causal inference involves little more than basic probability theory and regression, topics familiar to most meteorologists. By focusing on these core areas, this and a companion article provide a stepping stone for the meteorology community into the field of (non-temporal) causal inference. Although some theoretical foundations are presented, the main goal is the application of a specific method, called matching, to a problem in meteorology. The data for the application are in the public domain, and R code is provided as well, forming an easy path for meteorology students and researchers to enter the field.
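
The matching idea can be sketched in a few lines. The snippet below makes several substitutions for illustration:

- It is written in Python rather than the authors' R.
- It uses simulated (not meteorological) data in which a single measured confounder `z` drives both treatment and outcome.
- It estimates the effect on the treated by 1-nearest-neighbor matching with replacement.

The data-generating model and all function names are assumptions made for this example.

```python
import random

def simulate(n, seed=0):
    """Hypothetical data: confounder z raises both treatment odds and outcome; true effect is 2."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.random()                              # measured confounder in [0, 1)
        t = 1 if rng.random() < z else 0             # larger z -> more likely treated
        y = 2.0 * t + 3.0 * z + rng.gauss(0.0, 0.5)  # assumed outcome model
        rows.append((z, t, y))
    return rows

def naive_difference(rows):
    """Unadjusted difference in mean outcome between treated and control units."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

def att_nearest_neighbor(rows):
    """Effect on the treated via 1-nearest-neighbor matching on z, with replacement."""
    controls = [(z, y) for z, t, y in rows if t == 0]
    effects = []
    for z, t, y in rows:
        if t == 1:
            # closest control in z; its outcome stands in for the missing counterfactual
            _, y_match = min(controls, key=lambda c, z=z: abs(c[0] - z))
            effects.append(y - y_match)
    return sum(effects) / len(effects)

rows = simulate(2000, seed=1)
print(round(naive_difference(rows), 2), round(att_nearest_neighbor(rows), 2))
```

Under this hypothetical model, the naive difference in means is pulled toward 3 (the true effect 2 plus confounding bias), while the matched estimate should land close to 2.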

2506.18652 2026-06-19 stat.AP Version update

A Practical Introduction to Regression-based Causal Inference in Meteorology (II): Unmeasured confounders

Caren Marzban, Yikun Zhang, Nicholas Bond, Michael Richman

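AI summary: Introduces regression-based estimation of causal effects with instrumental variables when confounders are unmeasured. Meteorological data illustrate the importance of the choice of instrument.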

English abstract

One obstacle to "elevating" correlation to causation is the phenomenon of confounding, i.e., when a correlation between two variables exists because both variables are in fact caused by a third variable, called a confounder. The situation where the confounders are measured is examined in an earlier, accompanying article. Here, it is shown that even when the confounding variables are not measured, under certain conditions it is still possible to estimate the causal effect via a regression-based method that uses the notion of instrumental variables. Using a meteorological data set, similar to that in the sister article, a number of different estimates of the causal effect are compared and contrasted. It is shown that the instrumental-variable estimates of causal effect depend on the choice of the instrumental variable, and that meteorological considerations are important in resolving the ambiguity. R code is provided for generating all of the results, and numerous directions for future work are outlined.
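
Here is a minimal sketch of the instrumental-variable idea with one instrument. It assumes a linear model simulated in Python, not the authors' R code or data. In the simulation, `u` plays the unmeasured confounder, and `z` shifts `x` without affecting `y` directly. With a single instrument and a single regressor, the two-stage least squares estimate reduces to the ratio $\mathrm{cov}(z, y)/\mathrm{cov}(z, x)$.

```python
import random

def simulate(n, seed=0):
    """Hypothetical linear model: u confounds x and y; z moves x only. True effect of x is 1.5."""
    rng = random.Random(seed)
    zs, xs, ys = [], [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)                        # instrument
        u = rng.gauss(0.0, 1.0)                        # unmeasured confounder
        x = z + u + rng.gauss(0.0, 1.0)                # exposure
        y = 1.5 * x + 2.0 * u + rng.gauss(0.0, 1.0)    # outcome
        zs.append(z)
        xs.append(x)
        ys.append(y)
    return zs, xs, ys

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

z, x, y = simulate(20000, seed=2)
ols = cov(x, y) / cov(x, x)  # regression slope; absorbs the backdoor path through u
iv = cov(z, y) / cov(z, x)   # Wald ratio, equal to 2SLS with one instrument
print(round(ols, 2), round(iv, 2))
```

Under this simulated model, the OLS slope is biased toward $6.5/3 \approx 2.17$, whereas the IV ratio should recover the assumed effect of 1.5. The estimate is only as good as the assumption that `z` affects `y` solely through `x`.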

2405.00118 2026-06-19 math.ST stat.ME stat.TH Version update

Causal Inference with High-dimensional Discrete Covariates

Zhenghao Zeng, Sivaraman Balakrishnan, Yanjun Han, Edward H. Kennedy

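AI summary: Studies causal effect estimation with high-dimensional discrete covariates and shows that the mean squared error of commonly used estimators is bounded by $d^2/n^2 + 1/n$. A minimax lower bound matching this rate up to log factors is given. New estimators that exploit effect homogeneity and prior knowledge of the covariate distribution converge faster.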

Comments 74 pages, 9 figures

English abstract

When estimating causal effects from observational studies, researchers often need to adjust for many covariates to deconfound the non-causal relationship between exposure and outcome, among which many covariates are discrete. The behavior of commonly used estimators in the presence of many discrete covariates is not well understood since their properties are often analyzed under structural assumptions including sparsity and smoothness, which do not apply in discrete settings. In this work, we study the estimation of causal effects in a model where the covariates required for confounding adjustment are discrete but high-dimensional, meaning the number of categories $d$ is comparable with or even larger than sample size $n$. Specifically, we show the mean squared error of commonly used regression, weighting and doubly robust estimators is bounded by $\frac{d^2}{n^2}+\frac{1}{n}$. We then prove the minimax lower bound for the average treatment effect is of order $\frac{d^2}{n^2 \log^2 n}+\frac{1}{n}$, which characterizes the fundamental difficulty of causal effect estimation in the high-dimensional discrete setting, and shows the estimators mentioned above are rate-optimal up to log-factors. We further consider additional structures that can be exploited, namely effect homogeneity and prior knowledge of the covariate distribution, and propose new estimators that enjoy faster convergence rates of order $\frac{d}{n^2} + \frac{1}{n}$, which achieve consistency in a broader regime. The results are illustrated empirically via simulation studies.
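
A quick worked comparison shows why the faster rate extends consistency to a broader regime. This is our own arithmetic on the rates quoted above, not a result stated in the abstract.

```latex
% Regime d = n (as many covariate categories as observations):
\frac{d^2}{n^2} + \frac{1}{n} = 1 + \frac{1}{n} \not\to 0,
\qquad
\frac{d}{n^2} + \frac{1}{n} = \frac{2}{n} \to 0 .
% In general, d^2/n^2 \to 0 requires d = o(n), whereas d/n^2 \to 0 only requires d = o(n^2).
```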

4. High-Dimensional Statistics and Regularization (1 paper)

2606.20514 2026-06-19 stat.ME New submission

Hypergraph Variable Selection with False Discovery Rate Control

Sarah Organ, Toby Kenney, Hong Gu

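AI summary: Complex dependence among predictors reduces the power of variable selection methods. To address this, the paper proposes a hypergraph-based selection method that improves power while controlling the false discovery rate.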

Comments 28 pages, 4 figures

English abstract

Variable selection methods that control the false discovery rate often lose power when predictors exhibit complex dependence structures. We previously showed that selecting hierarchically clustered groups of predictors can mitigate this issue while maintaining false discovery rate control. When correlations are less structured, however, overlapping predictor sets may be more effective. We introduce a generalized false discovery rate for hypotheses defined on sets of predictors and propose a hypergraph-based selection method. This approach achieves higher power across diverse settings while preserving rigorous false discovery rate control.
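
For reference, the classical Benjamini-Hochberg step-up procedure, which controls the false discovery rate for individual hypotheses, is sketched below in Python. It is a baseline for comparison only. The authors' generalized FDR over predictor sets and their hypergraph-based selection are not reproduced here, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Indices (in input order) rejected by the Benjamini-Hochberg step-up rule at FDR level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # hypotheses by increasing p-value
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k_max = rank  # step-up: remember the largest rank that passes its threshold
    return sorted(order[:k_max])

print(benjamini_hochberg([0.03, 0.031, 0.032, 0.033, 0.9]))  # [0, 1, 2, 3]
```

In this example, the smallest p-value (0.03) fails its own threshold of 0.01. Four hypotheses are still rejected because 0.033 passes the rank-4 threshold of 0.04; that is the step-up behavior.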

5. Time Series and Spatial Statistics (1 paper)

2202.03332 2026-06-19 stat.ME econ.EM stat.AP Version update

Practical Forecasting of Environmental Maps: A Functional Data Approach

Alexander Gleim, Nazarii Salish

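AI summary: Proposes a functional data analysis approach for forecasting environmental data observed over a geographic region through time. The approach produces forecast surfaces that account for spatial and temporal dependence, and is illustrated by forecasting ground-level ozone concentration across Germany.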

English abstract

Environmental problems are receiving increasing attention in socio-economic and health studies, fostering advances in recording and data collection of related real-life processes. However, traditional tools for data processing are often found too restrictive as they do not account for the rich nature of such data sets. In this paper, we propose a simple statistical perspective on forecasting environmental data collected sequentially over time across some predefined geographic region. We treat such data sets as surface (or functional) time series with a possibly complicated geographical domain. Using techniques from functional data analysis, we develop a forecasting methodology that accounts for both geographic and temporal dependencies. This methodology allows traditional multivariate techniques to be integrated to provide forecast surfaces. We demonstrate the practical value of our approach with a forecasting example of ground-level ozone concentration across Germany, showcasing its effectiveness and potential for broad application.

6. Computational Statistics and MCMC (12 papers)

2606.20191 2026-06-19 stat.ML stat.ME New submission

AK-MCS-C2 : Active Kriging Monte Carlo Simulation method with conformal certification for failure probability estimation

Edgar Jaber, Vincent Chabridon, Mathilde Mougeot

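AI summary: Proposes an active learning framework that combines Active Kriging Monte Carlo simulation with conformal prediction. An adaptive cross-conformal strategy and the J+GP conformal estimator provide distribution-free guarantees on prediction errors from few samples. This makes classification of samples near the limit-state surface more reliable and improves the accuracy and robustness of failure probability estimates.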

English abstract

We introduce a novel active-learning framework for failure probability estimation in structural reliability analysis that integrates Active Kriging Monte Carlo simulation with conformal prediction. The proposed approach employs an adaptive cross-conformal strategy specifically designed for small-sample settings and kriging surrogate models using the J+GP conformal estimator. Unlike standard AK-MCS methods, the proposed framework provides distribution-free guarantees on prediction errors, leading to more reliable classification of samples near the limit-state surface. This improved uncertainty quantification enhances both the accuracy and robustness of failure probability estimates, especially for rare-event regimes where such efficiency is crucial. Reproducible numerical results illustrate the effectiveness of the method and also compare it to classical approaches on well-established benchmarks.
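
To make the quantities concrete, here is a minimal Python sketch of two AK-MCS ingredients under stated assumptions:

- The crude Monte Carlo estimator $\hat P_f = \frac{1}{N}\sum_i \mathbf{1}\{g(\mathbf{x}_i) \le 0\}$, applied to a hypothetical linear limit-state function with standard normal inputs, for which the exact value is $P_f = \Phi(-3)$.
- The classical U learning function $U(\mathbf{x}) = |\mu(\mathbf{x})|/\sigma(\mathbf{x})$, used to pick the next point to evaluate from kriging predictions.

The kriging surrogate itself and the paper's J+GP conformal layer are not reproduced. The `(mu, sigma)` pairs are made-up stand-ins for surrogate predictions.

```python
import math
import random

def limit_state(x1, x2):
    """Hypothetical linear limit-state function; failure when g <= 0."""
    return 3.0 - (x1 + x2) / math.sqrt(2.0)

def crude_mc_failure_probability(n, seed=0):
    """Crude Monte Carlo estimate of P(g(X) <= 0) with X ~ N(0, I_2)."""
    rng = random.Random(seed)
    failures = sum(limit_state(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) <= 0.0
                   for _ in range(n))
    return failures / n

def u_learning(mu, sigma):
    """AK-MCS U function: a small U means the sign of g at this point is uncertain."""
    return abs(mu) / sigma

def next_point(predictions):
    """Pick the candidate with minimal U from (mu, sigma) surrogate predictions.

    Returns (index, converged), where converged applies the usual min U >= 2 rule.
    """
    u_values = [u_learning(mu, sigma) for mu, sigma in predictions]
    best = min(range(len(u_values)), key=u_values.__getitem__)
    return best, u_values[best] >= 2.0

exact = 0.5 * math.erfc(3.0 / math.sqrt(2.0))  # Phi(-3), about 1.35e-3
print(crude_mc_failure_probability(400_000, seed=3), exact)
print(next_point([(1.5, 0.2), (0.1, 0.5), (-2.0, 0.4)]))  # (1, False)
```

The threshold `min U >= 2` is the usual heuristic from the original AK-MCS method and relies on the Gaussian-process variance being trustworthy. The conformal approach described in the abstract instead targets distribution-free guarantees on prediction errors.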

2606.20141 2026-06-19 stat.CO New submission

DASH: A Dimensionality Reduction Method for Large-scale Convex MIQP with Applications in Subset Portfolio Selection

Pinzhang Cheng

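AI summary: Proposes DASH, a dimensionality reduction method that improves the performance of solvers for large-scale convex MIQPs through a decreasing hierarchy of active sets. In subset portfolio selection, it markedly improves incumbent solutions on problems that Gurobi finds hard to solve.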

English abstract

Subset selection problems formulated as MIPs (Mixed Integer Programs) are NP-hard. For large-scale problems, finding globally optimal solutions in a reasonable time is infeasible, so in practice MIP solvers are used to seek good-quality incumbent solutions. This paper proposes DASH (Decreasing Active Set Hierarchy), a dimensionality reduction method that improves MIP solver performance for a subclass of best subset selection problems that can be formulated as MIQPs (Mixed Integer Quadratic Programs). We develop and evaluate the performance of DASH in the subset portfolio selection problem with comparison to Gurobi, a commercial MIP solver. In addition to the problem size, the difficulty of a problem is related to the condition number of the covariance matrix and the box constraint on portfolio weights. An extensive set of numerical experiments with varying problem configurations shows that DASH offers consistent and significant improvement of incumbent solutions when the problem is difficult for Gurobi to solve. In particular, the magnitude and duration of improvement by DASH scale with the difficulty of the problem.

2606.19909 2026-06-19 stat.CO math.PR stat.ME New submission

Establishing an $\Omega(\sqrt{d})$ complexity lower bound for PDMP samplers and how to break it: a sub-$\sqrt{d}$ algorithm for Gaussian-tailed targets

Augustin Chevallier

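AI summary: Proves that piecewise deterministic Markov process samplers have an $\Omega(\sqrt{d})$ complexity lower bound in the standard setup. By relaxing the assumption that the target density is invariant at all continuous times, it proposes a new scheme that achieves an empirical complexity of $\mathcal{O}(d^\alpha)$, with $\alpha \in [0.2, 0.3]$, for Gaussian-tailed targets.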

English abstract

Despite the theoretical appeal of their non-reversibility, to date no Piecewise Deterministic Markov Process (PDMP) sampler has been developed whose computational complexity scales better than $\mathcal{O}(\sqrt{d})$ in the target dimension $d$. We prove that this is a fundamental limitation by establishing an $\Omega(\sqrt{d})$ lower bound on the algorithmic complexity of PDMP samplers in a standard setup. By relaxing the assumption that the target density must remain invariant at all continuous times, we then demonstrate how to bypass this barrier. Specifically, we introduce a novel PDMP sampling scheme and show that, for Gaussian-tailed targets, it achieves an empirical complexity of $\mathcal{O}(d^\alpha)$ with $\alpha \in [0.2, 0.3]$. In addition, this PDMP scheme is locally adaptive in both trajectory length and distance between velocity updates.
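
For readers unfamiliar with PDMPs, below is a minimal Python sketch of a standard PDMP sampler: the one-dimensional Zig-Zag process targeting $N(0,1)$. The particle moves at unit speed and flips its velocity at the events of a Poisson process with rate $\max(0, v\,x)$. For this target, the event times can be drawn exactly by inverting the integrated rate. The sketch illustrates only the kind of standard sampler the lower bound concerns, not the author's new sub-$\sqrt{d}$ scheme. The function name is hypothetical.

```python
import math
import random

def zigzag_gaussian_1d(n_events, seed=0):
    """1D Zig-Zag sampler for N(0, 1); returns time averages of x and x**2 along the path."""
    rng = random.Random(seed)
    x, v = 0.0, 1.0
    total_time = int_x = int_x2 = 0.0
    for _ in range(n_events):
        a = v * x  # along the current ray the switching rate is max(0, a + s)
        e = rng.expovariate(1.0)  # Exp(1) level for the integrated rate
        # exact inversion of the integrated rate, valid for both signs of a
        tau = -a + math.sqrt(max(a, 0.0) ** 2 + 2.0 * e)
        # exact integrals of x(s) = x + v*s and x(s)**2 over [0, tau], using v**2 == 1
        int_x += x * tau + v * tau ** 2 / 2.0
        int_x2 += x * x * tau + x * v * tau ** 2 + tau ** 3 / 3.0
        total_time += tau
        x += v * tau  # deterministic linear motion up to the event
        v = -v        # the event flips the velocity
    return int_x / total_time, int_x2 / total_time

mean_x, mean_x2 = zigzag_gaussian_1d(200_000, seed=1)
print(round(mean_x, 3), round(mean_x2, 3))
```

Because the process runs in continuous time, expectations are estimated by time averages along the piecewise-linear path, computed here exactly segment by segment. Both averages should approach the targets $E[x] = 0$ and $E[x^2] = 1$.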

2606.19655 2026-06-19 stat.CO math.ST stat.TH New submission

A Flat Connection: The Pooling Factor and the Geometry of Centring in Hierarchical MCMC

Aidan D. Bindoff

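AI summary: Studies the geometric cause of the centring/non-centring obstruction in hierarchical MCMC. It proves that the connection induced by the Fisher information is flat, so the obstruction comes from the statistical pooling factor $\pi_j$, and proposes diagnostics on that basis.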

Comments 39 pages, 9 figures, accompanying R package

English abstract

Standard MCMC diagnostics ($\hat{R}$, effective sample size, divergence counts) detect whether a chain has mixed, but not why it has not. We ask whether the centring/non-centring obstruction in hierarchical models has a geometric cause beyond the metric. The joint parameter space is a fiber bundle (hyperparameters the base, group-level parameters the fibers), and the Fisher information metric induces an Ehresmann connection $A = -G_{FF}^{-1}G_{BF}$; the natural hypothesis is that the obstruction is its curvature, felt by the sampler as holonomy. We prove this false. The connection is flat for any smooth hierarchical posterior, not only the Gaussian case, because its horizontal leaves are the level sets of the fiber score $\partial_\alpha \log p$: there is no geometric obstruction above the metric. What remains is statistical, not geometric, and the flat connection identifies it as a single quantity: the conditional dependence of fiber on base, governed per group by the prior fraction $\pi_j$, the classical pooling factor. From it the framework recovers the established picture, that prior-dominated groups mix slowly and that the optimal per-group non-centring weight follows in closed form, and a simulation study separates this base-fiber coupling from the funnel, a distinct base-space pathology, by their opposite dependence on the hierarchical variance. A direct attribution test confirms that NUTS does not transport the fiber: the chain-level footprint is excess conditional autocorrelation in prior-dominated groups, exactly as $\pi_j$ predicts. Genuine, even rotational, curvature does appear, but only for connections built from a sampler's working metric (a fixed mass matrix), where holonomy re-enters as an algorithmic rather than geometric phenomenon. The prior-fraction diagnostic is distributed as the R package fibr, with the geometric methods as accompanying reproduction code.
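
We assume that, in the Gaussian case, the per-group prior fraction reduces to the textbook normal-normal quantity: the prior's share of the conditional posterior precision of $\theta_j$, $\pi_j = \tau^{-2}/(\tau^{-2} + \sigma_j^{-2}) = \sigma_j^2/(\sigma_j^2 + \tau^2)$. Under that assumption, it can be computed as below. This is a hedged Python sketch, not the fibr package's API. The function names and group standard errors are hypothetical.

```python
def prior_fraction(sigma_j, tau):
    """pi_j = prior precision / (prior precision + data precision) for one group."""
    prior_precision = 1.0 / tau ** 2
    data_precision = 1.0 / sigma_j ** 2
    return prior_precision / (prior_precision + data_precision)

def conditional_posterior_mean(y_j, sigma_j, mu, tau):
    """E[theta_j | y_j, mu, tau] in the normal-normal model: shrink y_j toward mu by pi_j."""
    pi_j = prior_fraction(sigma_j, tau)
    return pi_j * mu + (1.0 - pi_j) * y_j

# Hypothetical per-group standard errors sigma_j, with between-group scale tau = 1.
for sigma_j in (0.5, 2.0, 10.0):
    print(sigma_j, round(prior_fraction(sigma_j, tau=1.0), 3))
```

Groups with $\pi_j$ close to 1 (large $\sigma_j$ relative to $\tau$) are the prior-dominated groups that, according to the abstract, mix slowly.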

2606.19361 2026-06-19 cs.LG cs.AI cs.NA math.NA stat.CO stat.ME stat.ML New submission

Computational Identifiability

Lucius E. J. Bynum, Rajesh Ranganath, Kyunghyun Cho

Affiliation: New York University

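AI summary: Proposes a "computational identifiability" framework in which a finite computational search procedure looks for an empirical estimator within a specified error tolerance. This addresses shortcomings of theoretical identifiability in practical settings such as small finite samples and ambiguous graphical criteria.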

English abstract

Identification conditions describe the computability of a target query or parameter of interest as a function of the type and amount of information available. In causal identification, this information is often expressed in the form of a causal graph, and data are observed or collected for some subset of variables in the graph. Target queries may be for a single effect alone or for a class of effects in a given model. The derivation of an identification algorithm then defines mathematically the process by which the desired causal effect(s) can be uniquely determined, theoretically, in expectation. Identifiability in expectation, or 'theoretical identifiability,' generally assumes asymptotic properties, infinite data, or other mathematically idealized conditions. In this paper, we explore a fundamental distinction between this theoretical, idealized notion of identifiability and a proposed alternative that is computation-bound. The framework we propose, 'computational identifiability,' instead defines a finite computational search procedure for an empirical estimator. If this process finds an estimator empirically, within a desired error tolerance, then identifiability is satisfied, conditional on the specified assumptions of the search (i.e., a prior distribution over the parameters) and conditional on the search procedure itself. Through several experiments, we demonstrate how this framework allows us to answer fine-grained, practical identification questions, such as identification with small finite samples, with ambiguous graphical criteria, with mixed observational-interventional data, and across counterfactual data and estimands. Code is available at https://github.com/lbynum/metadentify.

2606.04307 2026-06-19 cs.LG stat.CO stat.ME Version update

Folded Transport MCMC: Eliminating Label Switching by Sampling on a Fundamental Domain

Jun Hu

Affiliation: Wuhan University of Technology

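AI summary: Redundant multimodality in symmetric Bayesian models degrades MCMC convergence diagnostics. To address this, the paper proposes Folded Transport MCMC, which performs inference directly on the quotient posterior by building an independence sampler on a fundamental domain of the symmetry group. The LCNF oscillation certification framework then gives provable certified lower bounds under the quotient metric.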

Comments 50 pages (including supplementary material), 5 figures, 6 tables. Submitted to Journal of Computational and Graphical Statistics

English abstract

In Bayesian mixture models and other exchangeable-component models, the posterior is invariant under permutation of component labels, creating $m!$ equivalent modes (the label-switching problem). Standard MCMC methods either mix poorly across these modes or rely on post-hoc relabelling that cannot guarantee the sampler has converged. We propose Folded Transport MCMC (FolT-MCMC), which eliminates label switching before sampling by restricting the Markov chain to a fundamental domain, a sorted or reflected subspace containing exactly one representative from each symmetric mode. The proposal is a learned normalising flow whose density is symmetrised over the group orbits, ensuring correct targeting on the reduced space. We show that this construction preserves a computable convergence diagnostic based on the oscillation of the log-density ratio, and that the diagnostic becomes sharper on the fundamental domain whenever the original-space flow under-covers one or more symmetric modes. Experiments on Gaussian mixtures ($d = 2$ to $20$), label-switching targets (up to 24 equivalent modes), a standard Bayesian three-component mixture posterior, and real accelerometer data from a supertall building show improvement ratios of 2x to 145x, with the folded diagnostic stable across dimensions while the unfolded diagnostic collapses.
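
The "sorted" fundamental domain mentioned above can be illustrated with the folding map alone: every relabeling of a mixture's components is sent to the representative whose component means increase. Below is a minimal Python sketch, assuming three components with distinct means and hypothetical parameter values. The learned normalizing flow and the symmetrized proposal density of FolT-MCMC are not reproduced.

```python
from itertools import permutations

def fold_to_sorted_domain(means, weights, scales):
    """Relabel mixture components so their means increase: one representative per label orbit."""
    order = sorted(range(len(means)), key=lambda k: means[k])
    return (tuple(means[k] for k in order),
            tuple(weights[k] for k in order),
            tuple(scales[k] for k in order))

# Hypothetical three-component parameters (means, mixing weights, scales).
means, weights, scales = [2.0, -1.0, 0.5], [0.2, 0.5, 0.3], [1.0, 0.7, 1.3]
folded = {
    fold_to_sorted_domain([means[k] for k in p], [weights[k] for k in p], [scales[k] for k in p])
    for p in permutations(range(3))
}
print(len(folded))  # 1: all 3! = 6 relabelings map to the same point
```

Ties between means occur with probability zero under a continuous posterior, which is why sorting yields a unique representative almost surely.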

2603.20022 2026-06-19 stat.ME Updated version

Q-approximation of operating characteristics of clinical trial designs

临床试验设计操作特性的Q-近似

Susanna Gentile, Daniel E. Schwartz, Riddhiman Saha, Lorenzo Trippa

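AI summary: This paper proposes the Q-approximation, which replaces simulation of full trial datasets with simulation of quadratic approximations of the likelihood. This allows clinical-trial operating characteristics to be evaluated quickly. To reach comparable accuracy, standard Monte Carlo simulation needs 150 to 1,900 times the computing budget.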

AI-generated Chinese abstract

设计临床试验需要评估多个操作特性(OCs),例如早期停止决策的可能性、检测治疗效应的概率以及I类错误率。在大多数情况下,这些评估基于计算密集型的蒙特卡罗模拟。随着临床试验复杂性和适应性设计使用的增加,计算负担可能迅速变得难以承受。我们引入了一种快速近似OCs的策略,称为Q-近似。我们的方法基于对数似然的二次近似和渐近论证。主要思想是用模拟决定试验中期和最终决策的近似似然函数来替代完整试验数据集的模拟。Q-近似方法可应用于任何使用与似然原理一致的数据分析方法的试验设计,包括具有早期停止的多阶段设计、自适应随机化设计以及利用外部数据的设计。我们通过几个例子说明了该方法,并表明它在减少计算时间的同时提供了重要OCs的准确近似。特别是,在我们的实验中,要达到相当的精度水平,标准蒙特卡罗近似OCs所需的计算预算比Q-近似高150到1900倍。通过实现快速的OC评估,Q-近似可以支持在应用试验规划和方法学开发中更广泛地使用创新试验设计。

English abstract

Designing clinical trials requires evaluating multiple operating characteristics (OCs), such as the likelihood of an early stopping decision, the probability of detecting a treatment effect, and the Type I error rate. In most cases, these evaluations are based on computationally intensive Monte Carlo simulations. As the complexity of clinical trials and the use of adaptive designs increase, the computational burden can quickly become prohibitive. We introduce a strategy for rapidly approximating OCs, called the Q-approximation. Our approach is based on quadratic approximations of the log-likelihood and asymptotic arguments. The main idea is to replace simulation of full trial datasets with simulation of the approximate likelihood functions that determine the trial's interim and final decisions. The Q-approximation approach can be applied to any trial design that uses data analysis methods coherent with the likelihood principle, including multistage designs with early stopping, adaptively randomized designs, and designs that leverage external data. We illustrate the approach with several examples and show that it provides an accurate approximation of important OCs while reducing the computation time compared to Monte Carlo simulations. In particular, in our experiments, the standard Monte Carlo approximation of OCs requires 150 to 1,900 times greater computing budget than Q-approximations to achieve comparable levels of accuracy. By enabling fast OC evaluations, Q-approximations can support the broader use of innovative trial designs in both applied trial planning and methodological development.
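The following is a minimal sketch of the idea, not the authors' implementation. It assumes a two-stage single-arm design with Gaussian outcomes of known variance, where the log-likelihood is exactly quadratic, and uses hypothetical efficacy boundaries `c1` and `c2`. Instead of simulating every patient, it simulates only the maximum-likelihood estimate and Fisher information at each analysis. A full-data Monte Carlo baseline is included for comparison.

```python
import numpy as np


def q_approx_reject_prob(theta, n1, n2, sigma, c1, c2, n_sims, rng):
    """P(reject H0: theta <= 0) in a two-stage design, simulating only the
    (estimate, information) pairs that define a quadratic log-likelihood."""
    info1 = n1 / sigma**2            # Fisher information after stage 1
    info_inc = n2 / sigma**2         # information added by stage 2
    info2 = info1 + info_inc
    est1 = rng.normal(theta, 1 / np.sqrt(info1), n_sims)
    est_inc = rng.normal(theta, 1 / np.sqrt(info_inc), n_sims)  # independent increment
    est2 = (info1 * est1 + info_inc * est_inc) / info2          # pooled MLE
    z1, z2 = est1 * np.sqrt(info1), est2 * np.sqrt(info2)
    return float(np.mean((z1 > c1) | ((z1 <= c1) & (z2 > c2))))


def full_data_reject_prob(theta, n1, n2, sigma, c1, c2, n_sims, rng):
    """Brute-force baseline: simulate every patient outcome."""
    y = rng.normal(theta, sigma, (n_sims, n1 + n2))
    z1 = y[:, :n1].mean(axis=1) * np.sqrt(n1) / sigma
    z2 = y.mean(axis=1) * np.sqrt(n1 + n2) / sigma
    return float(np.mean((z1 > c1) | ((z1 <= c1) & (z2 > c2))))


rng = np.random.default_rng(0)
args = dict(theta=0.2, n1=50, n2=50, sigma=1.0, c1=2.8, c2=1.98, n_sims=20_000, rng=rng)
p_q = q_approx_reject_prob(**args)
p_mc = full_data_reject_prob(**args)
print(p_q, p_mc)
```

The saving comes from drawing two numbers per simulated trial instead of `n1 + n2` outcomes. With non-Gaussian outcomes, the quadratic likelihood is only an asymptotic approximation.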

2602.01929 2026-06-19 math.DS stat.CO stat.ML Updated version

Probabilistic function-on-function nonlinear autoregressive model for emulation and reliability analysis of stochastic dynamical systems

概率函数对函数非线性自回归模型用于随机动力系统的仿真与可靠性分析

Zhouzhou Song, Marcos A. Valdebenito, Styfen Schär, Stefano Marelli, Bruno Sudret, Matthias G. R. Faes

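AI summary: This paper proposes the F2NARX model, which reformulates the NARX approach from a function-on-function regression perspective. It combines PCA with Gaussian process regression to produce probabilistic predictions. It then uses active learning to estimate first-passage failure probabilities efficiently.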

AI-generated Chinese abstract

在许多工程领域,构建准确且计算高效的代理模型(或仿真器)用于预测动力系统响应至关重要,但由于外部激励和系统参数到系统响应的强非线性和高维映射,这仍然具有挑战性。本文引入了一种新颖的函数对函数非线性自回归外生输入模型(F2NARX),该模型从函数对函数回归的角度重新表述了最近提出的$\mathcal{F}$-NARX方法。所提出的框架在保持高精度的同时显著提高了预测效率。通过将主成分分析与高斯过程回归相结合,F2NARX进一步通过无迹变换以自回归方式实现动力响应的概率预测。这种概率预测能力进一步促进了首次穿越概率评估的主动学习。通过不同复杂度的案例研究证明了该方法的有效性。结果表明,F2NARX在效率上比最先进的NARX模型高出几个数量级,同时通常达到更高的精度。此外,主动学习方法能够仅使用少量训练时间历程准确估计动力系统的首次穿越失效概率。

English abstract

Constructing accurate and computationally efficient surrogate models (or emulators) for predicting dynamical system responses is critical in many engineering domains, yet remains challenging due to the strongly nonlinear and high-dimensional mapping from external excitations and system parameters to system responses. This work introduces a novel Function-on-Function Nonlinear AutoRegressive model with eXogenous inputs (F2NARX), which reformulates the recently proposed $\mathcal{F}$-NARX method from a function-on-function regression perspective. The proposed framework substantially improves predictive efficiency while maintaining high accuracy. By combining principal component analysis with Gaussian process regression, F2NARX further enables probabilistic predictions of dynamical responses via the unscented transform in an autoregressive manner. Such probabilistic prediction capabilities further facilitate active learning for first-passage probability evaluation. The effectiveness of the method is demonstrated through case studies of varying complexity. Results show that F2NARX outperforms the state-of-the-art NARX model by orders of magnitude in efficiency while generally achieving higher accuracy. Meanwhile, the active learning approach enables accurate estimation of first-passage failure probabilities for dynamical systems using only a small number of training time histories.
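F2NARX propagates uncertainty through its autoregressive predictions with the unscented transform. The sketch below shows a generic unscented transform in Python. The function `f` is an arbitrary stand-in for the PCA plus Gaussian-process surrogate. The choice $\kappa = 3 - n$ is a common heuristic and may not match the paper's setting.

```python
import numpy as np


def unscented_transform(f, mean, cov, kappa=None):
    """Propagate a Gaussian N(mean, cov) through f using 2n + 1 sigma points.

    Returns the mean and covariance of f(X) implied by those points.
    """
    mean = np.atleast_1d(np.asarray(mean, dtype=float))
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    n = mean.size
    if kappa is None:
        kappa = 3.0 - n                           # common heuristic choice
    root = np.linalg.cholesky((n + kappa) * cov)  # matrix square root
    points = (
        [mean]
        + [mean + root[:, i] for i in range(n)]
        + [mean - root[:, i] for i in range(n)]
    )
    weights = np.full(2 * n + 1, 1.0 / (2 * (n + kappa)))
    weights[0] = kappa / (n + kappa)
    ys = np.array([np.atleast_1d(f(p)) for p in points])
    y_mean = weights @ ys
    centred = ys - y_mean
    y_cov = (weights[:, None] * centred).T @ centred
    return y_mean, y_cov


# Hypothetical one-step map standing in for a trained surrogate.
m, v = unscented_transform(lambda x: x**2, mean=[1.0], cov=[[0.25]])
print(m, v)
```

In an autoregressive rollout, the output mean and covariance at one step become the input Gaussian for the next step.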

2601.23173 2026-06-19 stat.ME Updated version

Robust, partially alive particle Metropolis-Hastings via the Frankenfilter

鲁棒的、部分存活的粒子Metropolis-Hastings算法:基于Frankenfilter

Chris Sherlock, Andrew Golightly, Anthony Lee

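AI summary: In hidden Markov models, the conditional likelihood can be zero, which makes particle filters fail. To address this, the paper proposes the Frankenfilter, which fixes lower and upper bounds on the number of simulations while targeting a user-defined amount of success. This gives robust and efficient likelihood estimation. Within pseudo-marginal Metropolis-Hastings, it is typically at least 2-3 times more efficient than the standard particle filter.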

AI-generated Chinese abstract

当隐马尔可夫模型允许给定隐藏过程的观测条件似然为零时,从一个观测时间到下一个观测时间的所有粒子模拟可能产生零值。如果是这样,滤波分布无法估计,且估计的参数似然为零。存活粒子滤波器通过为每个观测间隔模拟随机数量的粒子来解决这个问题,在达到目标数量的非零条件似然后停止。对于异常观测或较差的参数值,非零结果可能极不可能发生,计算成本过高。我们引入了Frankenfilter,一种有原则的、部分存活的粒子滤波器,它在固定模拟次数上下限的同时,针对用户定义的成功量。Frankenfilter产生似然的无偏估计,适用于伪边际Metropolis-Hastings(PMMH)。我们证明,与使用标准粒子滤波器的PMMH相比,使用Frankenfilter的PMMH对异常值和错误指定的初始参数值更加鲁棒,并且通常效率至少提高2-3倍。我们还提供了选择成功量的建议。在n个精确观测的情况下,这特别简单:目标为n次成功。

English abstract

When a hidden Markov model permits the conditional likelihood of an observation given the hidden process to be zero, all particle simulations from one observation time to the next could produce zeros. If so, the filtering distribution cannot be estimated and the estimated parameter likelihood is zero. The alive particle filter addresses this by simulating a random number of particles for each inter-observation interval, stopping after a target number of non-zero conditional likelihoods. For outlying observations or poor parameter values, a non-zero result can be extremely unlikely, and computational costs prohibitive. We introduce the Frankenfilter, a principled, partially alive particle filter that targets a user-defined amount of success whilst fixing lower and upper bounds on the number of simulations. The Frankenfilter produces unbiased estimators of the likelihood, suitable for pseudo-marginal Metropolis--Hastings (PMMH). We demonstrate that PMMH with the Frankenfilter is more robust to outliers and mis-specified initial parameter values than PMMH using standard particle filters, and is typically at least 2-3 times more efficient. We also provide advice for choosing the amount of success. In the case of n exact observations, this is particularly simple: target n successes.
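The Frankenfilter builds on the alive stopping rule: keep simulating until a target number of non-zero outcomes is reached. This rule can be illustrated for a single observation interval. The minimal sketch below assumes a Bernoulli success with hypothetical probability `p_true`. It shows the standard negative-binomial estimator $(N-1)/(T-1)$, which is unbiased for $N \ge 2$. It also shows why the expected cost $N/p$ explodes for small $p$. The Frankenfilter's lower and upper simulation bounds are not implemented here.

```python
import numpy as np


def alive_estimate(simulate_success, n_successes, rng):
    """Estimate p = P(success) by simulating until `n_successes` successes.

    This is negative-binomial stopping; (N - 1) / (T - 1) is unbiased for N >= 2.
    Returns the estimate and the number of trials T that were needed.
    """
    if n_successes < 2:
        raise ValueError("need at least 2 target successes")
    trials = successes = 0
    while successes < n_successes:
        trials += 1
        successes += bool(simulate_success(rng))
    return (n_successes - 1) / (trials - 1), trials


rng = np.random.default_rng(1)
p_true = 0.2
draws = [alive_estimate(lambda r: r.random() < p_true, 5, rng) for _ in range(20_000)]
estimates = np.array([d[0] for d in draws])
mean_trials = np.mean([d[1] for d in draws])
print(estimates.mean(), mean_trials)  # should be near p_true and N / p_true
```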

2512.17473 2026-06-19 eess.SP cs.LG math.OC stat.ML Updated version

Alternating Direction Method of Multipliers for Nonlinear Matrix Decompositions

非线性矩阵分解的交替方向乘子法

Atharva Awari, Nicolas Gillis, Arnaud Vandaele

Affiliation: University of Mons

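AI summary: This paper proposes an algorithm based on the alternating direction method of multipliers (ADMM) for nonlinear matrix decomposition (NMD). The algorithm supports a range of nonlinear functions and loss functions. Its applicability and efficiency are validated on real-world datasets.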

Comments 16 pages, 7 figures. v3: Revised version: added new experiments and comparisons. Code available from https://gitlab.com/Atharva05/admm-for-nmd

AI-generated Chinese abstract

我们提出了一种基于交替方向乘子法(ADMM)的算法,用于求解非线性矩阵分解(NMD)。给定输入矩阵 $X \in \mathbb{R}^{m \times n}$ 和分解秩 $r \ll \min(m, n)$,NMD 寻求矩阵 $W \in \mathbb{R}^{m \times r}$ 和 $H \in \mathbb{R}^{r \times n}$,使得 $X \approx f(WH)$,其中 $f$ 是逐元素非线性函数。我们在几个代表性非线性模型上评估了我们的方法:适用于非负稀疏数据近似的修正线性单元激活 $f(x) = \max(0, x)$,适用于概率电路表示的逐分量平方 $f(x) = x^2$,以及适用于推荐系统的 MinMax 变换 $f(x) = \min(b, \max(a, x))$。所提出的框架灵活支持多种损失函数,包括最小二乘、$\ell_1$ 范数和 Kullback-Leibler 散度,并且可以轻松扩展到其他非线性和度量。我们在真实世界数据集上展示了该方法的适用性、效率和适应性,突出了其在广泛应用中的潜力。

English abstract

We present an algorithm based on the alternating direction method of multipliers (ADMM) for solving nonlinear matrix decompositions (NMD). Given an input matrix $X \in \mathbb{R}^{m \times n}$ and a factorization rank $r \ll \min(m, n)$, NMD seeks matrices $W \in \mathbb{R}^{m \times r}$ and $H \in \mathbb{R}^{r \times n}$ such that $X \approx f(WH)$, where $f$ is an element-wise nonlinear function. We evaluate our method on several representative nonlinear models: the rectified linear unit activation $f(x) = \max(0, x)$, suitable for nonnegative sparse data approximation, the component-wise square $f(x) = x^2$, applicable to probabilistic circuit representation, and the MinMax transform $f(x) = \min(b, \max(a, x))$, relevant for recommender systems. The proposed framework flexibly supports diverse loss functions, including least squares, $\ell_1$ norm, and the Kullback-Leibler divergence, and can be readily extended to other nonlinearities and metrics. We illustrate the applicability, efficiency, and adaptability of the approach on real-world datasets, highlighting its potential for a broad range of applications.
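The ReLU case $X \approx \max(0, WH)$ can be explored with a simple latent-variable alternating scheme. First, fill in the zeros of $X$ with non-positive values. Then take a best rank-$r$ approximation. The Python sketch below uses that naive scheme as a swapped-in stand-in. It is not the ADMM algorithm of the paper, and `relu_nmd_naive` is a hypothetical name.

```python
import numpy as np


def relu_nmd_naive(X, r, iters=50, seed=0):
    """Hypothetical sketch of ReLU-NMD, X ~ max(0, W @ H), via a latent
    variable Z and alternating projections (not the paper's ADMM)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    theta = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
    positive = X > 0
    losses = []
    for _ in range(iters):
        # Z must equal X on positive entries and be <= 0 on zero entries.
        # The closest such Z to theta is obtained entry-wise.
        Z = np.where(positive, X, np.minimum(0.0, theta))
        # Best rank-r approximation of Z (truncated SVD, Eckart-Young).
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        theta = (U[:, :r] * s[:r]) @ Vt[:r]
        losses.append(float(np.linalg.norm(Z - theta)))
    W, H = U[:, :r] * s[:r], Vt[:r]
    return W, H, losses


rng = np.random.default_rng(1)
X = np.maximum(0.0, rng.standard_normal((30, 3)) @ rng.standard_normal((3, 40)))
W, H, losses = relu_nmd_naive(X, r=3, iters=100)
rel_err = np.linalg.norm(X - np.maximum(0.0, W @ H)) / np.linalg.norm(X)
print(losses[0], losses[-1], rel_err)
```

Each half-step minimises $\|Z - WH\|_F$ over one block, so the recorded loss never increases. This scheme can be slow in practice, which is the kind of inefficiency an ADMM formulation aims to address.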

2508.13313 2026-06-19 stat.ML cs.LG math.OC Updated version

Flow Matching for Efficient and Scalable Data Assimilation

用于高效可扩展数据同化的流匹配

Taos Transue, Bohan Chen, So Takao, Bao Wang

Affiliations: Computing and Mathematical Sciences Department, California Institute of Technology; Department of Mathematics and Scientific Computing and Imaging Institute, University of Utah

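AI summary: This paper proposes the ensemble flow filter (EnFF), a training-free method based on flow matching. It uses Monte Carlo estimators and localized guidance to speed up high-dimensional nonlinear data assimilation. It improves the cost-accuracy tradeoff and scalability relative to existing methods.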

Comments revamp presentation, add experiments

AI-generated Chinese abstract

数据同化(DA)从含噪声观测中估计动态系统的状态。最近的生成模型如集成得分滤波器(EnSF)改进了高维非线性设置下的DA,但计算成本高。我们引入集成流滤波器(EnFF),一种基于流匹配(FM)的无训练框架,加速采样并提供流设计灵活性。EnFF使用边际流场的蒙特卡洛估计器、用于观测同化的局部化引导,并利用一种利用贝叶斯DA公式的新型流路径。它推广了经典滤波器如自举粒子滤波器和集成卡尔曼滤波器。在高维基准上的实验证明了EnFF改进的成本-精度权衡和可扩展性,突显了FM在高效、可扩展DA中的潜力。代码见 https://github.com/Utah-Math-Data-Science/Data-Assimilation-Flow-Matching。

English abstract

Data assimilation (DA) estimates a dynamical system's state from noisy observations. Recent generative models like the ensemble score filter (EnSF) improve DA in high-dimensional nonlinear settings but are computationally expensive. We introduce the ensemble flow filter (EnFF), a training-free, flow matching (FM)-based framework that accelerates sampling and offers flexibility in flow design. EnFF uses Monte Carlo estimators for the marginal flow field, localized guidance for observation assimilation, and utilizes a novel flow path that exploits the Bayesian DA formulation. It generalizes classical filters such as the bootstrap particle filter and ensemble Kalman filter. Experiments on high-dimensional benchmarks demonstrate EnFF's improved cost-accuracy tradeoffs and scalability, highlighting FM's potential for efficient, scalable DA. Code is available at https://github.com/Utah-Math-Data-Science/Data-Assimilation-Flow-Matching.
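The Monte Carlo estimator of the marginal flow field can be sketched for the common linear path $x_t = t x_1 + (1-t) x_0$ with $x_0 \sim \mathcal{N}(0, I)$, treating the ensemble as an empirical target distribution. This illustration rests on several assumptions. EnFF's localized observation guidance and its Bayesian flow path are not shown, and all names are hypothetical.

```python
import numpy as np


def marginal_velocity(x, t, ensemble):
    """Monte Carlo estimate of the marginal velocity u_t(x) for the path
    x_t = t * x1 + (1 - t) * x0, with x0 ~ N(0, I) and x1 drawn from the ensemble."""
    # Posterior weights of ensemble members given x_t = x:
    # p(x | x1_i) = N(x; t * x1_i, (1 - t)^2 I), computed stably in log space.
    sq = np.sum((x - t * ensemble) ** 2, axis=1)
    logw = -0.5 * sq / (1.0 - t) ** 2
    w = np.exp(logw - logw.max())
    w /= w.sum()
    # Weighted average of the conditional velocities (x1_i - x) / (1 - t).
    return w @ (ensemble - x) / (1.0 - t)


def sample(ensemble, n_samples, steps=200, t_end=0.99, seed=0):
    """Integrate the ODE dx/dt = u_t(x) from noise with forward Euler."""
    rng = np.random.default_rng(seed)
    xs = rng.standard_normal((n_samples, ensemble.shape[1]))
    dt = t_end / steps
    for k in range(steps):
        t = k * dt
        xs = xs + dt * np.array([marginal_velocity(x, t, ensemble) for x in xs])
    return xs


ensemble = np.array([[-3.0, 0.0], [3.0, 0.0]])  # hypothetical forecast ensemble
samples = sample(ensemble, n_samples=50)
```

No network is trained: the velocity field is evaluated directly from the ensemble, which is what "training-free" refers to here.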

2503.11479 2026-06-19 stat.CO math.PR math.ST stat.ME stat.TH Updated version

Towards practical PDMP sampling: Metropolis adjustments, locally adaptive step-sizes, and NUTS-based time lengths

走向实用的PDMP采样:Metropolis调整、局部自适应步长和基于NUTS的时间长度

Augustin Chevallier, Sam Power, Matthew Sutton

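AI summary: PDMP sampling normally requires model-specific bounds. To remove this requirement, the paper proposes a Metropolis-adjusted approximation, an adaptive step-size mechanism, and a NUTS-inspired choice of path length. Combined, these give a doubly adaptive PDMP sampler with improved robustness and efficiency.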

AI-generated Chinese abstract

分段确定性马尔可夫过程(PDMP)在从复杂概率分布中采样方面具有重要前景。然而,其实践应用受到需要计算模型特定界限的限制。相反,虽然哈密顿蒙特卡洛(HMC)提供了一种普遍有效的采样方法,但其无法自适应调整步长,导致在采样漏斗形等复杂分布时性能受损。为解决这些限制,我们引入了三个创新概念:(a) 一种Metropolis调整的PDMP模拟近似,无需显式界限且不破坏不变测度;(b) 一种与Metropolis校正兼容的自适应步长机制;(c) 一种受无U型转弯采样器(NUTS)启发的方案,用于动态选择PDMP中的路径长度。这三个想法可以无缝集成到一个单一的“双重自适应”PDMP采样器中,具有良好的鲁棒性和效率特性。

English abstract

Piecewise-Deterministic Markov Processes (PDMPs) hold significant promise for sampling from complex probability distributions. However, their practical implementation is hindered by the need to compute model-specific bounds. Conversely, while Hamiltonian Monte Carlo (HMC) offers a generally efficient approach to sampling, its inability to adaptively tune step sizes impedes its performance when sampling complex distributions like funnels. To address these limitations, we introduce three innovative concepts: (a) a Metropolis-adjusted approximation for PDMP simulation that eliminates the need for explicit bounds without compromising the invariant measure, (b) an adaptive step size mechanism compatible with the Metropolis correction, and (c) a No U-Turn Sampler (NUTS)-inspired scheme for dynamically selecting path lengths in PDMPs. These three ideas can be seamlessly integrated into a single, "doubly adaptive" PDMP sampler with favourable robustness and efficiency properties.
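Consider the 1-D Zig-Zag process targeting $\mathcal{N}(0,1)$ to see what a model-specific bound involves. Its switching rate is simple enough that event times can be drawn by exact inversion. The Python sketch below is a hypothetical toy example that does exactly that. For general targets no such closed form exists, and that is the gap the paper's Metropolis-adjusted approximation is meant to fill.

```python
import numpy as np


def zigzag_gaussian_1d(total_time, seed=0):
    """Simulate a 1-D Zig-Zag process targeting N(0, 1) with exact event times.

    Returns time averages of x and x**2 along the continuous path.
    """
    rng = np.random.default_rng(seed)
    x, v, t = 0.0, 1.0, 0.0
    int_x = int_x2 = 0.0
    while t < total_time:
        # The potential is U(x) = x**2 / 2, so the switching rate along the ray
        # x + v*s is max(0, v * (x + v*s)) = max(0, a + s), with a = v*x and v = +-1.
        a = v * x
        e = rng.exponential()
        # Solve integral_0^tau max(0, a + s) ds = e in closed form.
        tau = -a + np.sqrt(max(a, 0.0) ** 2 + 2.0 * e)
        tau = min(tau, total_time - t)
        # Exact integrals of x and x**2 over the linear segment.
        int_x += x * tau + v * tau**2 / 2
        int_x2 += x**2 * tau + x * v * tau**2 + tau**3 / 3
        x += v * tau
        t += tau
        v = -v  # velocity flip at the switching event
    return int_x / total_time, int_x2 / total_time


m1, m2 = zigzag_gaussian_1d(20_000.0)
print(m1, m2)  # time averages approximating E[X] = 0 and E[X^2] = 1
```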

7. Statistical Foundations of Machine Learning (20 papers)

2606.20451 2026-06-19 stat.ML cs.LG stat.AP stat.CO New submission

SSH-Net: A Deep Neural Network for Predicting Failure Time Distribution Functions under Competing Risks with Application to GPU Data

SSH-Net: 一种用于竞争风险下预测失效时间分布函数的深度神经网络及其在GPU数据上的应用

Jie Min, Yueyao Wang, Mengkun Chen

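AI summary: This paper proposes the Structured Segmented Hazard Deep Neural Network (SSH-Net), which predicts failure-time distribution functions under a competing-risks framework. It ties the network structure to the data structure so that different covariate groups affect the prediction through separate sub-networks. Simulations and GPU data confirm its accuracy.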

AI-generated Chinese abstract

竞争风险在工程领域常见,当应用场景复杂时会给时间事件数据建模带来挑战。近年来,深度神经网络因其灵活性和高学习能力在竞争风险预测中受到广泛关注。然而,神经网络结构的复杂性使得基于不同数据输入的超参数调优更加困难。此外,当工程系统具有多层级的复杂物理结构时,将所有结构层级视为单一输入组可能无法捕捉关键信息。为解决这些问题,我们提出了一种结构化分段风险深度神经网络(SSH-Net),用于在特定原因竞争风险框架下预测失效时间。我们的方法将神经网络结构与数据结构相关联,并允许不同的协变量组通过分离的子网络影响失效预测。神经网络基于特定原因竞争风险模型构建。SSH-Net输出特定原因风险函数,并采用惩罚对数似然作为损失函数。通过评估Brier分数、接收者操作特征曲线下面积(AUC)和预测的特定原因累积发生函数的均方根误差(RMSE),仿真研究验证了SSH-Net的预测准确性。我们进一步使用Titan GPU失效时间数据展示了模型预测失效时间分布函数的能力。

English abstract

Competing risks are commonly observed in engineering fields and can bring challenges to time-to-event data modeling when the application scenarios are complicated. Recently, deep neural networks have received great attention for prediction with competing risks, due to their flexibility and high learning capability. However, the complexity of neural network structure brings extra difficulty in hyperparameter tuning based on different data inputs. Additionally, when an engineered system has complex physical structures with multiple hierarchical levels, treating all structural levels as a single group of inputs may fail to capture critical information. To address these issues, we propose a Structured Segmented Hazard Deep Neural Network (SSH-Net) for failure time prediction under a cause-specific competing risks framework. Our approach associates neural network structure with data structures, and allows different covariate groups to impact the failure prediction through separate sub-networks. The neural network is constructed based on a cause-specific competing risks model. SSH-Net outputs cause-specific hazard functions and uses the penalized log-likelihood as the loss function. The prediction accuracy of SSH-Net is validated through simulation studies by evaluating the Brier score, the area under receiver operating characteristic curves (AUC), and the root mean square error (RMSE) of the predicted cause-specific cumulative incidence function. We further demonstrate the model's ability to predict failure time distribution functions using the Titan GPU failure time data.
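SSH-Net outputs cause-specific hazards, and the evaluation metrics above are computed on the implied cumulative incidence functions. Assuming discrete time intervals, the conversion from hazards to cumulative incidence can be sketched as follows. The hazard values are hypothetical, and the network itself is not reproduced.

```python
import numpy as np


def cumulative_incidence(hazards):
    """Convert discrete-time cause-specific hazards to cumulative incidence.

    hazards[t, k] is the probability of failing from cause k during interval t,
    given survival to the start of interval t.
    """
    hazards = np.asarray(hazards, dtype=float)
    overall = hazards.sum(axis=1)                 # all-cause hazard per interval
    surv_end = np.cumprod(1.0 - overall)          # S(t): survival past interval t
    surv_start = np.concatenate(([1.0], surv_end[:-1]))
    cif = np.cumsum(surv_start[:, None] * hazards, axis=0)
    return cif, surv_end


cif, surv = cumulative_incidence([[0.10, 0.05], [0.20, 0.10], [0.30, 0.20]])
print(cif)
print(surv)
```

At every time point, the incidence of all causes plus overall survival sums to one. This makes a quick consistency check on any hazard model's output.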

2606.19714 2026-06-19 stat.ML cs.AI cs.LG stat.CO stat.ME New submission

AURA: Adaptive Uncertainty-aware Refinement for LLM-as-a-Judge Auditing

AURA: 用于LLM作为评判审计的自适应不确定性感知精炼

Zilong Zhang, Yi-Ting Hung, Weiyi He, Junxi Zhang, Lei Ding, Chi-Kuang Yeh

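AI summary: This paper proposes AURA, an adaptive uncertainty-aware refinement framework. With a small amount of human verification, it iteratively learns a human-consistency signal and prioritizes uncertain comparisons for review. This improves the reliability of LLM-as-a-judge evaluation.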

AI-generated Chinese abstract

大型语言模型(LLM)越来越多地被用作开放式生成的评判者,因为大规模人工评估通常昂贵且难以扩展,但它们的偏好仍然是人类判断的不完美代理。现有的审计流程通常假设事先存在可靠的示例子集或干净的监督信号,例如来自人工注释、启发式过滤或强评判者的输出。在LLM评估中,这一假设是脆弱的:初始分割可能继承评判者偏差,而人工验证通常过于稀缺,无法在规模上定义稳定组。我们提出AURA,一种自适应不确定性感知精炼框架,用于在选定的人工验证下审计成对LLM作为评判的决策。AURA迭代学习人类一致性信号,传播可靠证据,并优先将不确定的比较提交人工审核。关键思想是将对评判者的信任视为一个潜在量,随着证据积累逐步精炼。我们提供了紧凑的公式、稳定的精炼过程,以及在合成和真实成对LLM答案数据上的全面评估。

English abstract

Large language models (LLMs) are increasingly used as judges for open-ended generation, as large-scale human evaluation is often expensive and difficult to scale, yet their preferences remain imperfect proxies for human judgment. Existing auditing pipelines often assume that a reliable subset of examples or clean supervision signals are available beforehand, for example from human annotation, heuristic filtering, or the outputs of strong judges. In LLM evaluation, this assumption is fragile: the initial split may inherit judge bias, while human verification is typically too scarce to define stable groups at scale. We propose AURA, an adaptive uncertainty-aware refinement framework for auditing pairwise LLM-as-a-judge decisions under selected human verification. AURA iteratively learns a human-consistency signal, propagates reliable evidence, and prioritizes uncertain comparisons for human review. The key idea is to treat trust in a judge as a latent quantity that is progressively refined as evidence accumulates. We provide a compact formulation, a stable refinement procedure, and a comprehensive evaluation on both synthetic and real pairwise LLM-answer data.
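The step "prioritizes uncertain comparisons for human review" can be illustrated with generic uncertainty sampling plus a Beta-Bernoulli trust update. This minimal sketch assumes each comparison already carries a predicted probability that the judge agrees with humans. It is meant for intuition only and is not AURA's actual refinement procedure.

```python
import numpy as np


def binary_entropy(p):
    p = np.clip(np.asarray(p, dtype=float), 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))


def select_for_review(p_agree, budget, reviewed=()):
    """Return the indices of the `budget` not-yet-reviewed comparisons whose
    predicted judge-human agreement is most uncertain (highest entropy)."""
    scores = binary_entropy(p_agree)
    if reviewed:
        scores[list(reviewed)] = -np.inf  # never re-request a reviewed item
    order = np.argsort(-scores, kind="stable")
    return [int(i) for i in order[:budget] if np.isfinite(scores[i])]


def update_trust(alpha, beta, human_agrees):
    """Beta-Bernoulli update of a latent trust level in the judge."""
    k = int(np.sum(human_agrees))
    return alpha + k, beta + len(human_agrees) - k


p_agree = [0.9, 0.5, 0.1, 0.55]
print(select_for_review(p_agree, budget=2))  # most uncertain first
print(update_trust(1.0, 1.0, human_agrees=[1, 0, 1]))
```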

2606.19587 2026-06-19 stat.ML cs.LG New submission

A Solver-Free Training Method for Predict-then-Optimize

一种无求解器的预测后优化训练方法

Beichen Wan, Mo Liu

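AI summary: This paper proposes a decision-focused learning pipeline based on a measure transformation. It uses a solver-free surrogate loss to train the prediction model in predict-then-optimize efficiently. Fisher consistency is proven, and training time drops by orders of magnitude.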

Comments Accepted by ICML 2026

AI-generated Chinese abstract

我们提出了一种可扩展的方法,用于在预测后优化范式中训练预测(机器学习)模型,其中模型输出作为后续线性优化任务的系数。直接最小化经验决策遗憾对于线性规划和组合优化是不可行的,因为决策映射是分段常数,且梯度几乎处处为零。虽然现有方法通过平滑微分过程来解决这一问题,但它们存在可扩展性问题,因为每次梯度评估都需要调用计算昂贵的求解器。为了解决这个问题,我们提出了一种基于测度变换原理的决策聚焦学习管道,该管道在训练期间产生一个完全无优化求解器的新代理损失。我们建立了理论保证,包括Fisher一致性和超额风险界。实验上,我们的方法在实现与最先进方法相当的决策质量的同时,将训练时间减少了数个数量级。

English abstract

We propose a scalable method for training prediction (machine learning) models in the predict-then-optimize paradigm, where model outputs serve as coefficients for a subsequent linear optimization task. Directly minimizing the empirical decision regret is intractable for linear programming and combinatorial optimization since the decision mapping is piecewise constant, and the gradients are zero almost everywhere. While existing methods address this by smoothing the differentiation process, they suffer from scalability issues, since a computationally expensive solver call is required for every gradient evaluation. To address this, we propose a decision-focused learning pipeline based on a measure transformation principle, which yields a new surrogate loss that is completely optimization-solver-free during training. We establish theoretical guarantees, including Fisher consistency and excess risk bounds. Empirically, our method achieves decision quality competitive with state-of-the-art methods while reducing training time by orders of magnitude.
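A tiny example shows why the regret gradient vanishes. It assumes a hypothetical feasible set of three options (the vertices of a simplex), so the "solver" is an argmin. Small changes to the predicted costs leave the decision unchanged, and hence the regret too. This only illustrates the problem the paper addresses, not its surrogate loss.

```python
import numpy as np

# Hypothetical feasible set: choose exactly one of three options, i.e. the
# vertices of the probability simplex, so the "solver" is an argmin.
VERTICES = np.eye(3)


def decide(cost):
    """Optimal vertex for a given cost vector (the downstream LP)."""
    return VERTICES[np.argmin(VERTICES @ np.asarray(cost, dtype=float))]


def regret(pred_cost, true_cost):
    """Extra true cost incurred by acting on the predicted costs."""
    true_cost = np.asarray(true_cost, dtype=float)
    return float(true_cost @ decide(pred_cost) - np.min(VERTICES @ true_cost))


true_cost = [1.0, 2.0, 3.0]
pred_cost = [2.5, 2.0, 3.0]
base = regret(pred_cost, true_cost)
bumped = regret([2.5 + 1e-3, 2.0, 3.0], true_cost)
print(base, (bumped - base) / 1e-3)  # positive regret, zero finite-difference slope
```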

2606.19410 2026-06-19 stat.ML cs.LG New submission

The Representational Limit of Scalar Interactions: An Interventional Decomposition

标量交互的表征限制:一种干预分解

Potito Aghilar, Sabino Roccotelli, Stanislao Fidanza, Vito Walter Anelli, Sebastiano Stramaglia, Tommaso Di Noia

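AI summary: This paper shows that scalar interaction scores conflate uniqueness, redundancy, and synergy. It proposes Stochastic Hi-Fi, which decomposes each feature's U/R/S profile via interventional masked inference. On tabular and image tasks, the method recovers structure that scalar baselines miss.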

AI-generated Chinese abstract

有符号的成对交互指标从根本上混淆了唯一性(U)、冗余性(R)和协同性(S)。我们在一个最小的3路XOR结构因果模型上证明了这一点:忠实的指标如Shapley-Taylor对每对返回零,而投影指标如Shapley Interaction将三阶效应扩散到混淆三种机制的成对标量中。我们引入了Stochastic Hi-Fi,一种事后、无需重新训练的可预测性分解方法,通过干预掩码推理估计每个特征的U/R/S轮廓。该估计器提供精确的干预语义、有限样本蒙特卡洛界限、耦合菱形采样带来的严格方差减少以及均匀的有限词汇收敛。在表格SCM上,Stochastic Hi-Fi恢复了被标量基线遗漏的结构(交互幅度恢复比高达411倍)。它还在GPT-2 IOI电路中分离了冗余和协同头。在NIH ChestX-ray14上,Stochastic Hi-Fi在Pointing Game中匹配GradCAM,并在Deletion AUC上显著改进。

English abstract

Signed pairwise interaction scores fundamentally conflate uniqueness (U), redundancy (R), and synergy (S). We prove this on a minimal 3-way XOR structural causal model: faithful indices such as Shapley-Taylor return zero per pair, whereas projective indices such as Shapley Interaction spread the third-order effect into pair scalars that conflate the three mechanisms. We introduce Stochastic Hi-Fi, a post-hoc, retraining-free predictability decomposition that estimates per-feature U/R/S profiles by interventional masked inference. The estimator provides exact interventional semantics, finite-sample Monte Carlo bounds, strict variance reduction from coupled diamond sampling, and uniform finite-vocabulary convergence. Across tabular SCMs, Stochastic Hi-Fi recovers structure missed by scalar baselines (up to 411x larger interaction-magnitude recovery ratios). It also separates redundant and synergistic heads in the GPT-2 IOI circuit. On NIH ChestX-ray14, Stochastic Hi-Fi matches GradCAM on Pointing Game and improves substantially on Deletion AUC.
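The 3-way XOR example can be checked by exact enumeration. The sketch below assumes three independent fair bits and $Y = X_1 \oplus X_2 \oplus X_3$. It computes mutual information rather than any interaction index. Every single feature and every pair turns out to be uninformative about $Y$, while the triple determines it. That is the synergy a pairwise scalar cannot express.

```python
import itertools
import math
from collections import Counter

STATES = list(itertools.product([0, 1], repeat=3))  # (X1, X2, X3), each with prob 1/8


def mutual_information(subset):
    """I(Y; X_subset) in bits for Y = X1 xor X2 xor X3 with fair, independent bits."""
    p = 1 / len(STATES)
    joint, px, py = Counter(), Counter(), Counter()
    for x in STATES:
        key, y = tuple(x[i] for i in subset), x[0] ^ x[1] ^ x[2]
        joint[(key, y)] += p
        px[key] += p
        py[y] += p
    return sum(pj * math.log2(pj / (px[k] * py[y])) for (k, y), pj in joint.items())


for subset in [(0,), (1,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]:
    print(subset, mutual_information(subset))
```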

2606.19883 2026-06-19 cs.LG stat.ML New submission

Matching Markets meet Cumulative Prospect Theory: Towards Optimal and Adversarially Robust Learning

匹配市场遇上累积前景理论:迈向最优和对抗鲁棒学习

Ananya Kunisetty, Avishek Ghosh

Affiliation: Indian Institute of Technology Bombay

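AI summary: This paper studies a multi-agent multi-armed bandit problem in competitive two-sided matching markets under cumulative prospect theory (CPT). It proposes algorithms with optimal regret bounds and extends them to adversarial markets.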

Comments Accepted at ECML-PKDD 2026, Naples, Italy

AI-generated Chinese abstract

我们研究了一个在竞争性设置下具有双边匹配市场的多智能体多臂赌博机问题,该问题基于以人为中心的决策模型。为了捕捉人类偏好,我们使用累积前景理论(CPT),该理论通过一个(α-Hölder连续)权重函数以非线性方式加权智能体的行动。CPT已被广泛用于行为经济学和风险敏感机器学习中,以模拟人类偏好。我们分析了带有CPT权重扭曲奖励的最先进学习算法,并获得了玩家最优遗憾界为$\mathcal{O}(K\log T \left(\frac{1}{\Delta}\right)^{2/\alpha})$,其中$K$表示臂数,$T$是学习时间,$\Delta$表示(适当定义的)玩家的最小偏好差距。注意到对$\Delta$的依赖是次优的,我们通过明智地选择探索期间的活跃臂集进一步改进了这一遗憾,从而在主导项中消除了对$K$的依赖,并在臂数$K$显著大于玩家数$N$的设置中实现了改进的(最优)遗憾保证。此外,我们考虑了对抗性市场,其中智能体的观测奖励可能被破坏。我们提出并分析了在已知和未知总破坏预算两种设置下,以CPT作为风险敏感度量的鲁棒市场算法,并在两种情况下建立了对数级别的玩家最优遗憾保证。

English abstract

We study a multi-agent multi-armed bandit problem in the competitive setup with two-sided matching markets under a human-centric decision-making model. To capture human preferences, we use cumulative prospect theory (CPT), which weighs the actions of the agent in a nonlinear fashion using an ($\alpha$-Hölder continuous) weight function. CPT has been widely used in behavioral economics and risk-sensitive machine learning to emulate human preferences. We analyze the state-of-the-art learning algorithm with CPT weight-distorted rewards and obtain a player-optimal regret of $\mathcal{O}\left(K\log T \left(\frac{1}{\Delta}\right)^{2/\alpha}\right)$, where $K$ denotes the number of arms, $T$ is the learning horizon, and $\Delta$ represents the (suitably defined) players' minimum preference gap. Noting that this dependence on $\Delta$ is sub-optimal, we further improve this regret by judiciously selecting the active set of arms during exploration, which removes the dependence on $K$ in the dominant term and achieves an improved (optimal) regret guarantee in the setting where the number of arms $K$ is significantly larger than the number of players $N$. In addition, we consider adversarial markets where the observed rewards of the agents may be corrupted. We propose and analyze algorithms for robust markets with CPT as a risk-sensitive measure, both when the total corruption budget is known and when it is unknown, and establish logarithmic player-optimal regret guarantees in both cases.
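The CPT value of a nonnegative reward can be estimated from samples by sorting them and applying a probability weighting function. This minimal sketch assumes the Tversky-Kahneman weighting function with an illustrative parameter `gamma`. It does not check whether this particular weight satisfies the paper's exact Hölder assumption.

```python
import numpy as np


def tk_weight(p, gamma=0.61):
    """Tversky-Kahneman probability weighting function w(p)."""
    p = np.asarray(p, dtype=float)
    return p**gamma / (p**gamma + (1 - p) ** gamma) ** (1 / gamma)


def cpt_value_gains(samples, gamma=0.61):
    """Empirical CPT value of nonnegative rewards (gains only).

    This estimates integral_0^inf w(P(X > z)) dz: sort the samples, then
    weight the i-th smallest by w((n - i + 1) / n) - w((n - i) / n).
    """
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)
    decision_weights = tk_weight((n - i + 1) / n, gamma) - tk_weight((n - i) / n, gamma)
    return float(x @ decision_weights)


rewards = [1.0, 2.0, 3.0, 4.0]
print(cpt_value_gains(rewards), np.mean(rewards))  # distorted value vs. plain mean
```

With `gamma=1` the weight is the identity and the CPT value reduces to the sample mean. That limit is a handy sanity check.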

2606.19607 2026-06-19 cs.AI stat.AP New submission

Which Pairs to Compare for LLM Post-Training?

LLM后训练中应比较哪些对?

Jiangze Han, Vineet Goyal, Will Ma

Affiliation: Columbia University

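AI summary: This paper studies how to select the most informative comparison pairs in preference-based post-training, framing comparison curation as a sampling-design problem. A theoretical analysis of DPO training yields an optimization criterion. Experiments show that the resulting designs improve sample efficiency.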

AI-generated Chinese abstract

基于偏好的后训练已成为对齐语言模型的核心范式。常见的数据收集策略是为每个提示生成少量补全并标注生成的比较对。然而,人工偏好标签通常比生成额外补全昂贵得多,这提示了相同标注预算的不同使用方式:生成更大的补全集,但只标注最具信息量的比较对。本文研究在基于偏好的后训练中应比较哪些对。我们将比较策展形式化为一个采样设计问题,并通过基于偏好的后训练目标下的最终策略质量来评估设计。我们针对直接偏好优化(DPO)实例化该框架,分析标注对的选择如何通过DPO训练传播到下游策略性能。我们的主要结果为DPO训练策略的后训练最优性差距提供了匹配的上界和下界。这些界限表明,比较选择通过一个单一的设计相关信息矩阵影响下游性能,该矩阵将标签分配与参数估计误差和策略次优性联系起来。这为预算受限的比较策展提供了显式优化准则,并激发了从大型生成补全池中选择信息对的实际采样设计。在合成设置和语言模型后训练基准上的实验表明,所提出的设计在样本效率上持续优于常见的比较选择启发式方法。

English abstract

Preference-based post-training has become a central paradigm for aligning language models. A common data-collection strategy is to generate a small set of completions for each prompt and label the resulting comparison pairs. However, human preference labels are often much more expensive than generating additional completions, suggesting a different use of the same labeling budget: generate a larger pool of completions, but label only the most informative comparison pairs. This paper studies which pairs should be compared in preference-based post-training. We formulate comparison curation as a sampling-design problem and evaluate designs by the quality of the final policy under the preference-based post-training objective. We instantiate this framework for Direct Preference Optimization (DPO), analyzing how the choice of labeled pairs propagates through DPO training to downstream policy performance. Our main results provide matching upper and lower bounds on the post-training optimality gap of the DPO-trained policy. The bounds show that comparison selection affects downstream performance through a single design-dependent information matrix, which links label allocation to parameter estimation error and policy suboptimality. This yields an explicit optimization criterion for budgeted comparison curation and motivates practical sampling designs for selecting informative pairs from large generated completion pools. Experiments on synthetic settings and language-model post-training benchmarks show that the proposed designs consistently improve sample efficiency over common comparison-selection heuristics.
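"Selecting informative pairs through an information matrix" can be pictured as classical greedy D-optimal design under a Bradley-Terry model with linear features $\phi$. In that setting, each labeled pair adds $d d^\top$ with $d = \phi_i - \phi_j$. This minimal sketch assumes hypothetical random features and ignores the $p(1-p)$ weights, which are constant at the initial parameter. It uses a textbook design criterion, which may differ from the one derived in the paper.

```python
import numpy as np


def greedy_pair_design(features, budget, ridge=1e-3):
    """Greedy D-optimal choice of completion pairs (i, j).

    Adding a pair with difference d = phi_i - phi_j to the information matrix A
    raises log det(A) by log(1 + d^T A^{-1} d), so each step picks the unused
    pair with the largest d^T A^{-1} d.
    """
    n, dim = features.shape
    candidates = [(i, j) for i in range(n) for j in range(i + 1, n)]
    info = ridge * np.eye(dim)  # a small ridge keeps A invertible at the start
    chosen = []
    for _ in range(min(budget, len(candidates))):
        inv = np.linalg.inv(info)

        def gain(pair):
            d = features[pair[0]] - features[pair[1]]
            return d @ inv @ d

        best = max((c for c in candidates if c not in chosen), key=gain)
        d = features[best[0]] - features[best[1]]
        info += np.outer(d, d)
        chosen.append(best)
    return chosen, info


rng = np.random.default_rng(0)
phi = rng.standard_normal((6, 3))  # hypothetical completion embeddings
pairs, info = greedy_pair_design(phi, budget=4)
print(pairs, np.linalg.slogdet(info)[1])
```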

2606.19491 2026-06-19 cs.LG stat.ML New submission

Algebraic Dead Directions in LayerNorm Transformers: A Forward-Pass-Only Diagnostic at LLM Scale

LayerNorm Transformer 中的代数死方向:一种仅需前向传播的大语言模型规模诊断方法

Tejas Pradeep Shirodkar, P. J. Narayanan

Affiliation: IIIT Hyderabad

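AI summary: This paper shows that the LayerNorm inverse-scale direction is an exact algebraic kernel of the post-final-norm centred activation covariance. As a result, a dead direction can be read from the parameters alone, without any forward or backward pass. The finding is validated on 14 pretrained models.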

Comments 34 pages, 7 figures, 6 tables. Empirical companion to arXiv:2606.05957

AI-generated Chinese abstract

预训练 Transformer 位于损失函数的奇异极小值附近,此时 Fisher 信息度量沿死方向退化:参数空间中方向性 Fisher 为零的方向。通常定位这样的方向需要一次前向传播和激活矩阵的特征分解,或基于采样的复杂度估计;没有一种方法能仅从网络参数计算方向。我们针对 LayerNorm Transformer 给出了一个这样的方向。LayerNorm 仿射的逆尺度方向 $\gamma^{-1}/\|\gamma^{-1}\|$ 是后最终归一化中心激活协方差矩阵的精确代数核,适用于任何输入分布,并在参数空间中诱导出相应的死方向。它仅从 LN 尺度参数读取,无需前向或后向传播,无需特征分解:这是针对 LayerNorm 的最廉价死方向读取方法。我们在 14 个预训练 Transformer(9 个 LayerNorm,5 个 RMSNorm;160M-35B;语言和视觉目标)上进行了测试。在随机初始化时,预测方向与测量的底部奇异方向(一次前向传播,直接 SVD)在 9/9 的 LayerNorm 模型上匹配到小数点后四位,并在 5/5 的 RMSNorm 模型上正确缺失,后者缺乏产生该方向的均值减法投影器。在训练后的检查点上,沿该方向的协方差特征值加深约 ${\sim}10^3$ 倍,并打开更多死方向;随机初始化到训练后的差距是一次前向传播、每检查点沿预测坐标的奇异结构读出。由此得出两个闭式结论:残差流的最小奇异值在 13/14 个 Transformer 上逐块保持不变(在其自身输入分布上测量),唯一的例外(Gemma$4$-$31$B)是一个真正的死方向,同一读出可精确定位;核方向的存在从参数本身即可对 Transformer 的归一化进行分类。

English abstract

Pretrained transformers sit near singular minima of the loss, where the Fisher information metric degenerates along dead directions: directions in parameter space along which the directional Fisher vanishes. Locating such a direction normally needs a forward pass and an eigendecomposition of activations, or a sampling-based complexity estimate; none returns a direction computable from the network's parameters alone. We give one, for LayerNorm transformers. The inverse-scale direction $\gamma^{-1}/\|\gamma^{-1}\|$ of the LayerNorm affine is an exact algebraic kernel of the post-final-norm centred activation covariance, for any input distribution, and induces a corresponding dead direction in parameter space. It is read from the LN scale parameter alone, with no forward or backward pass and no eigensolve: the cheapest dead-direction read, specific to LayerNorm. We test it on $14$ pretrained transformers ($9$ LayerNorm, $5$ RMSNorm; $160$M-$35$B; language and vision objectives). At random initialisation the predicted direction matches the measured bottom singular direction (one forward pass, direct SVD) to four decimal places on $9/9$ LayerNorm models, and is correctly absent on $5/5$ RMSNorm models, which lack the mean-subtraction projector that creates it. On the trained checkpoint the covariance eigenvalue along this direction deepens by ${\sim}10^3\times$ and further dead directions open; the random-init-to-trained gap is a one-forward-pass, per-checkpoint readout of singular structure along the predicted coordinate. Two consequences follow in closed form: the residual stream's smallest singular value is preserved block-to-block on $13/14$ transformers measured on their own input distribution, with the one exception (Gemma$4$-$31$B) being a genuine dead direction that the same read pinpoints; and the kernel direction's presence classifies a transformer's normalisation from the parameters alone.
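The algebraic claim can be checked numerically. After LayerNorm, $y = \gamma \odot n + \beta$ with $\sum_i n_i = 0$ for every input. Therefore $\gamma^{-1} \cdot (y - \beta) = \sum_i n_i = 0$, so $\gamma^{-1}$ lies in the kernel of the covariance of $y$. The sketch below uses hypothetical random parameters and inputs. RMSNorm, which has no mean subtraction, serves as the contrast.

```python
import numpy as np


def layer_norm(x, gamma, beta, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta


def rms_norm(x, gamma, eps=1e-5):
    return gamma * x / np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + eps)


rng = np.random.default_rng(0)
d = 16
gamma = rng.uniform(0.5, 2.0, d)  # hypothetical LN scale parameter
beta = rng.normal(size=d)         # hypothetical LN shift parameter
x = rng.normal(size=(5000, d)) * rng.uniform(0.1, 3.0, d) + rng.normal(size=d)

# Predicted dead direction, read from the scale parameter alone.
direction = (1.0 / gamma) / np.linalg.norm(1.0 / gamma)

y_ln = layer_norm(x, gamma, beta)
y_rms = rms_norm(x, gamma)
for name, y in [("LayerNorm", y_ln), ("RMSNorm", y_rms)]:
    cov = np.cov(y, rowvar=False)
    print(name, np.linalg.norm(cov @ direction))  # ~0 only for LayerNorm
```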

2606.20557 2026-06-19 cs.LG math.ST stat.ML stat.TH New submission

Optimal Deterministic Multicalibration and Omniprediction

最优确定性多校准与全预测

Georgy Noarov, Aaron Roth

Affiliation: University of Pennsylvania

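AI summary: This paper gives a deterministic algorithm that achieves the minimax-optimal sample complexity for multicalibration and extends it to outcome indistinguishability. This resolves the open question of whether randomization is necessary for optimal sample complexity.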

AI-generated Chinese abstract

一个模型在一组群体权重 $G$ 上是多校准的,如果它是校准的——即即使以其预测为条件也是无偏的——不仅整体上,而且在通过每个 $g \in G$ 对上下文重新加权后也是如此。这对于许多下游应用是一个有用的性质,也是可信机器学习的基本要求。在这项工作之前,所有已知达到 $\varepsilon$-多校准的极小化最优 $\widetilde O(\varepsilon^{-3})$ 样本复杂度的预测器都是随机化的,而确定性预测器仅以更差的样本复杂度已知。多校准中随机化对于最优样本复杂度是否必要的问题由 [CLNR26] 明确提出,并在之前的几项工作中隐含提出。我们通过给出一个输出确定性预测器的极小化最优多校准算法解决了这个开放问题。然后我们将该算法推广到产生满足关于有限或有限覆盖测试集合的结果不可区分性(OI)的最优确定性预测器。作为一个应用,这也给出了具有最优样本复杂度的确定性全预测器和泛预测器,解决了 [OKK25] 和 [BHHLZ25] 提出的开放问题。

English abstract

A model is multicalibrated on a collection of group weights $G$ if it is calibrated -- i.e. unbiased even conditional on its prediction -- not just overall, but also after reweighting contexts by each $g \in G$. It is a useful property for many downstream applications and is a basic desideratum of trustworthy machine learning. Before this work, all predictors known to attain the minimax-optimal $\widetilde O(\varepsilon^{-3})$ sample complexity rate for $\varepsilon$-multicalibration were randomized, while deterministic predictors were known only with substantially worse sample complexity. Whether randomization is necessary for optimal sample complexity in multicalibration was explicitly asked by [CLNR26] and implicitly in several prior works. We resolve this open problem by giving a minimax-optimal multicalibration algorithm that outputs a deterministic predictor. We then generalize the algorithm to produce optimal deterministic predictors that satisfy outcome indistinguishability (OI) with respect to finite or finitely covered collections of tests. As an application, this also gives deterministic omnipredictors and panpredictors with optimal sample complexity, resolving open problems posed by [OKK25] and [BHHLZ25].
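The definition can be turned into an empirical check. Bin the predictions, then for every group weight $g$ and bin, measure the weighted bias $\mathbb{E}[g(x)(y - f(x))\,\mathbf{1}\{f(x) \in \text{bin}\}]$. The minimal sketch below uses hypothetical synthetic data. The binning and the $\ell_\infty$ aggregation are illustrative choices, and definitions in the literature differ in how they normalise.

```python
import numpy as np


def multicalibration_error(preds, outcomes, group_weights, n_bins=10):
    """Largest weighted calibration bias over groups g and prediction bins:
    max_{g, v} | mean( g(x) * (y - f(x)) * 1{f(x) in bin v} ) |."""
    preds = np.asarray(preds, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    bins = np.minimum((preds * n_bins).astype(int), n_bins - 1)
    worst = 0.0
    for g in group_weights:  # each g is an array of weights in [0, 1]
        for v in range(n_bins):
            mask = bins == v
            bias = np.mean(g * (outcomes - preds) * mask)
            worst = max(worst, abs(bias))
    return worst


rng = np.random.default_rng(0)
n = 20_000
group = rng.integers(0, 2, n)
p = np.where(group == 1, 0.2, 0.8)  # true outcome probabilities
y = (rng.random(n) < p).astype(float)
groups = [group.astype(float), 1.0 - group, np.ones(n)]
err_calibrated = multicalibration_error(p, y, groups)
err_constant = multicalibration_error(np.full(n, 0.5), y, groups)
print(err_calibrated, err_constant)
```

The constant predictor is calibrated overall, because its average matches the base rate. It fails badly once the data are reweighted by the group indicator, which is exactly the gap multicalibration is designed to catch.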

2606.20022 2026-06-19 stat.ML cs.LG math.OC New submission

Stochastic Linear Contextual Bandits with Bounded Noise: A Set-Membership Approach

具有有界噪声的随机线性上下文赌博机:一种集合成员方法

Haonan Xu, Yingying Li

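AI summary: This paper studies stochastic linear contextual bandits with bounded reward noise. It proposes the SME-OFU algorithm, based on set-membership estimation and the optimism principle, which achieves an $O(\log T)$ regret bound. This improves on the $\tilde{O}(\sqrt{T})$ rate that is optimal under the weaker sub-Gaussian assumption.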

Comments 23 pages, 1 figure

AI-generated Chinese abstract

本文考虑具有有界奖励噪声的随机线性上下文赌博机(SLCB)。现有工作通常假设次高斯奖励噪声和有界期望奖励,在此条件下最优遗憾界关于时间T为$\tilde{O}(\sqrt{T})$。然而,在许多应用中,实现/观测到的奖励也自然有界,这意味着奖励噪声有界。有界噪声比次高斯条件更具信息性,但在SLCB文献中尚未被明确利用。本文通过利用一种称为集合成员估计(SME)的不确定性量化方法,并应用面对不确定性的乐观原则(OFU),提出了一种新颖的算法SME-OFU。我们的算法享有改进的遗憾界$O(\log T)$。注意,这并不与次高斯噪声下现有的最优界$\tilde{O}(\sqrt{T})$矛盾,因为有界噪声是更强的条件。最后,仿真表明,当奖励噪声有界时,SME-OFU相对于为次高斯噪声设计的基准算法在经验上有所改进。

English abstract

This paper considers stochastic linear contextual bandits (SLCB) with bounded reward noise. Existing works typically assume sub-Gaussian reward noise and bounded expected rewards, under which the optimal regret bound scales as $\tilde{O}(\sqrt{T})$ in terms of horizon $T$. However, in many applications, realized/observed rewards are also naturally bounded, implying bounded reward noise. Bounded noise is more informative than the sub-Gaussian condition but has not been leveraged explicitly in the SLCB literature. In this paper, we propose a novel algorithm SME-OFU by utilizing an uncertainty quantification method called set-membership estimation (SME) and applying the principle of optimism in the face of uncertainty (OFU). Our algorithm enjoys an improved regret bound $O(\log T)$. Notice that this does not contradict the existing optimal bound $\tilde{O}(\sqrt{T})$ for sub-Gaussian noise because bounded noise is a stronger condition. Finally, simulations show empirical improvements of SME-OFU over a benchmark algorithm designed for sub-Gaussian noise when the reward noise is bounded.
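In one dimension, set-membership estimation reduces to interval intersection. Each observation $y = a\theta + w$ with $|w| \le B$ confines $\theta$ to an interval, and OFU picks the arm with the best reward over the surviving set. The minimal sketch below works under those assumptions, with a fixed hypothetical arm set rather than changing contexts. It is not the paper's SME-OFU for general dimension.

```python
import numpy as np


class IntervalSME:
    """Set-membership estimate of a scalar theta from observations
    y = a * theta + w with |w| <= noise_bound (1-D sketch)."""

    def __init__(self, lo, hi, noise_bound):
        self.lo, self.hi, self.bound = lo, hi, noise_bound

    def update(self, a, y):
        if a == 0:
            return  # a zero action carries no information about theta
        l, h = sorted(((y - self.bound) / a, (y + self.bound) / a))
        self.lo, self.hi = max(self.lo, l), min(self.hi, h)

    def optimistic_arm(self, arms):
        # Optimism: score each arm by its best-case reward over the feasible set.
        scores = [max(a * self.lo, a * self.hi) for a in arms]
        return int(np.argmax(scores))


rng = np.random.default_rng(0)
theta, bound = 0.7, 0.3
arms = [-1.0, 0.5, 1.0]
sme = IntervalSME(lo=-2.0, hi=2.0, noise_bound=bound)
for _ in range(200):
    a = arms[sme.optimistic_arm(arms)]
    y = a * theta + rng.uniform(-bound, bound)
    sme.update(a, y)
print(sme.lo, sme.hi)
```

Unlike a sub-Gaussian confidence ellipsoid, the feasible set is guaranteed to contain the true parameter. Its width can also shrink faster than $1/\sqrt{t}$ when the noise actually reaches its bounds.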

2606.19878 2026-06-19 cs.LG math.OC stat.ML New submission

On the Oracle Complexity of Interpolation-Based Gradient Descent

基于插值的梯度下降的预言复杂度

Dongmin Lee, William Lu, Anuran Makur

Affiliation: Purdue University

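AI summary: This paper proposes piecewise polynomial interpolation-based gradient descent (PPI-GD), which approximates the full gradient by querying the first-order oracle at equidistant points in the data domain and building polynomial interpolants. It analyzes the oracle complexity for strongly convex and non-convex losses. PPI-GD outperforms several GD variants when the data dimension grows at most polylogarithmically in the sample size and the loss is sufficiently smooth.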

Comments 16 pages, 2 figures

AI-generated Chinese abstract

最近关于经验风险最小化(ERM)的一阶优化器的工作表明,可以利用ERM损失函数在训练数据中的光滑性(而非优化参数中的光滑性)来改进梯度下降(GD)方法的预言复杂度。在本文中,我们提出了一种不精确梯度方法——分段多项式插值梯度下降(PPI-GD),该方法通过在数据域中的等距点处查询一阶预言来近似每次迭代中的全梯度,从而在数据域的适当大小的块上构造所得梯度样本的多项式插值。我们分析了PPI-GD在强凸和非凸损失函数下的预言复杂度,其中数据空间维数以训练样本数量的多对数函数为界,并发现当损失函数足够光滑时,PPI-GD在关键区域优于几种GD变体。此外,我们的分析将双三次样条插值误差分析中的几种技术扩展到$d$变量张量积多项式插值的设置中,这可能对插值分析具有独立意义。

英文摘要

Recent work on first-order optimizers for empirical risk minimization (ERM) has suggested that smoothness of ERM loss functions in the training data, rather than in the optimization parameters, can be leveraged to improve the oracle complexity of gradient descent (GD) methods. In this paper, we propose an inexact gradient method, piecewise polynomial interpolation-based gradient descent (PPI-GD), which approximates the full gradient in each iteration by querying the first-order oracle at equidistant points in the data domain to construct polynomial interpolants of the resulting gradient samples over appropriately sized patches of the data domain. We analyze the oracle complexity of PPI-GD for strongly convex and non-convex loss functions when the data space dimension is bounded by a polylogarithmic function of the number of training samples, and find it to outperform several GD variants in key regimes when the loss function is sufficiently smooth. Furthermore, our analysis extends several techniques from the error analysis of bicubic spline interpolants to the setting of $d$-variate tensor product polynomial interpolants which may be of independent interest in interpolation analysis.
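The core idea can be sketched in one dimension. The gradient oracle is queried only at equidistant nodes on each patch of the data domain, and the resulting polynomial interpolant stands in for the oracle at every training point in that patch. This is a minimal sketch, assuming scalar data in [0, 1], a scalar parameter and a hypothetical loss `0.5 * (theta - sin z)**2`. It does not implement the paper's d-variate tensor-product construction or its choice of patch sizes.

```python
import math

def lagrange_eval(nodes, values, z):
    """Evaluate at z the polynomial interpolating the points (nodes[k], values[k])."""
    total = 0.0
    for j, (zj, vj) in enumerate(zip(nodes, values)):
        basis = 1.0
        for k, zk in enumerate(nodes):
            if k != j:
                basis *= (z - zk) / (zj - zk)
        total += vj * basis
    return total

def ppi_gradient(grad_oracle, theta, data, n_patches=4, degree=3):
    """Approximate (1/n) * sum_i grad_oracle(theta, z_i) for 1-D data z_i in [0, 1].

    Returns (approximate full gradient, number of oracle queries used).
    """
    total, queries = 0.0, 0
    for p in range(n_patches):
        a, b = p / n_patches, (p + 1) / n_patches
        nodes = [a + (b - a) * k / degree for k in range(degree + 1)]  # equidistant nodes
        values = [grad_oracle(theta, z) for z in nodes]                # the only oracle calls
        queries += len(nodes)
        last = p == n_patches - 1
        in_patch = [z for z in data if a <= z < b or (last and z == b)]
        total += sum(lagrange_eval(nodes, values, z) for z in in_patch)
    return total / len(data), queries

# Hypothetical loss 0.5 * (theta - sin z)^2; its theta-gradient is smooth in the data z.
grad_oracle = lambda theta, z: theta - math.sin(z)
data = [i / 999 for i in range(1000)]
approx, queries = ppi_gradient(grad_oracle, 0.3, data)
exact = sum(grad_oracle(0.3, z) for z in data) / len(data)
print(queries, abs(approx - exact))   # 16 oracle calls instead of 1000
```

The saving comes from smoothness in the data rather than in the parameter. The oracle budget depends on the number of patches and the polynomial degree, not on the sample size.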

2606.20356 2026-06-19 math.OC cs.AI cs.LG math.PR stat.ML 新提交

Robust $Q$-learning for mean-field control under Wasserstein uncertainty in common noise

公共噪声Wasserstein不确定性下的平均场控制鲁棒$Q$-学习

Mathieu Laurière, Ariel Neufeld, Kyunghyun Park

AI总结 提出一种针对公共噪声分布Wasserstein不确定性的离散时间平均场控制鲁棒$Q$-学习算法,结合量化投影与Wasserstein对偶,证明同步和异步学习的收敛性及有限时间界,并在系统风险和流行病模型中验证鲁棒性-性能权衡。

详情
AI中文摘要

在本文中,我们提出了一种鲁棒$Q$-学习算法,用于求解公共噪声分布存在Wasserstein不确定性时的离散时间平均场控制问题。该算法将量化投影方案与公共噪声空间上的Wasserstein对偶重构相结合。我们建立了其收敛性以及同步和异步学习方案的有限时间迭代界。关于系统风险和流行病模型的数值实验将异步实现与理想化的Bellman迭代进行了比较,说明了在公共噪声误设下的鲁棒性-性能权衡,并报告了异步$Q$-学习算法实际观测到的收敛行为。

英文摘要

In this article, we present a robust $Q$-learning algorithm for discrete-time mean-field control problems under Wasserstein uncertainty in the common noise law. The algorithm combines a quantization-and-projection scheme with a Wasserstein dual reformulation on the common-noise space. We establish its convergence together with finite-time iteration bounds for both synchronous and asynchronous learning schemes. Numerical experiments on systemic risk and epidemic models compare the asynchronous implementation with an idealized Bellman iteration, illustrate the robustness-performance tradeoff under common-noise misspecification, and report the observed convergence behavior of the asynchronous $Q$-learning algorithm.
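The Wasserstein dual reformulation replaces a supremum over distributions with a one-dimensional infimum over a multiplier `lam`. For a Wasserstein-1 ball of radius `eps` around the empirical law of samples s_i, the dual reads sup_Q E_Q[f] = inf_{lam >= 0} lam * eps + mean_i sup_xi [f(xi) - lam * |xi - s_i|].

The sketch below evaluates this on the real line, with the inner supremum restricted to a finite quantization grid and `lam` searched over a grid. The samples, grids and `f` are all hypothetical. The paper embeds such a dual inside a quantized robust Bellman update, which is not shown here.

```python
def worst_case_expectation(f, samples, eps, support, lambdas):
    """Dual estimate of sup_{Q : W1(Q, P_hat) <= eps} E_Q[f(xi)] on the real line.

    The sup over xi is restricted to the finite grid `support`; the inf over
    lam >= 0 is restricted to the finite grid `lambdas`.
    """
    best = float("inf")
    for lam in lambdas:
        inner = sum(max(f(xi) - lam * abs(xi - s) for xi in support) for s in samples)
        best = min(best, lam * eps + inner / len(samples))
    return best

samples = [0.0, 0.5, 1.0]                      # hypothetical common-noise observations
support = [i / 10 for i in range(-30, 31)]     # hypothetical quantization grid
lambdas = [k / 100 for k in range(0, 1001)]    # search grid for the dual multiplier
# For f(xi) = xi, the worst case moves mass right by eps: 0.5 + 0.2 = 0.7.
robust = worst_case_expectation(lambda xi: xi, samples, 0.2, support, lambdas)
nominal = worst_case_expectation(lambda xi: xi, samples, 0.0, support, lambdas)
print(robust, nominal)
```

With `eps = 0` the dual recovers the empirical mean. Increasing `eps` can only raise the worst-case value, which is the robustness-performance tradeoff the abstract refers to.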

2606.20299 2026-06-19 stat.ML cs.LG hep-ph physics.data-an 新提交

Statistical Properties of Training & Generalization

训练与泛化的统计特性

Itay Lavie, Noam Levi, Yonatan Kahn

AI总结 从物理学角度研究深度学习的关键特征和意外现象,回顾神经缩放定律及其与物理问题中约束和归纳偏置的相互作用。

Comments 32 pages, 3 figures. Part of the VERaiPHY initiative

详情
AI中文摘要

深度学习成功规避了经典统计学的众多直觉,在多个现实任务中取得了前所未有的性能。本文从物理学视角研究深度学习的关键特征和意外现象,指出构建深度学习模型时固有的诸多选择,并尽可能论证其合理性。特别地,我们回顾了神经缩放定律的现象,并讨论了它们与在物理问题中应用机器学习时可能存在的约束和归纳偏置之间的相互作用。

英文摘要

Deep learning has managed to evade numerous intuitions from classical statistics to achieve unprecedented performance on a number of real-world tasks. In this article, we investigate the key features and surprises of deep learning from a physics-informed perspective, taking care to point out and justify where possible the many choices inherent in constructing a deep learning model. In particular, we review the phenomenon of neural scaling laws and discuss their interplay with the constraints and inductive biases which may be present when applying machine learning to problems in physics.
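Neural scaling laws are commonly stated as power laws, for example test loss L(N) ≈ a * N^(-alpha) in model or dataset size N. The sketch below shows one common way such an exponent is estimated: ordinary least squares on log-log axes. It assumes a pure power law with no irreducible-loss offset, which the review's own formulations may include. The data are synthetic.

```python
import math

def fit_power_law(sizes, losses):
    """Fit L(N) = a * N**(-alpha) by least squares on log L = log a - alpha * log N."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(v) for v in losses]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return math.exp(my - slope * mx), -slope   # (prefactor a, exponent alpha)

sizes = [1e3, 1e4, 1e5, 1e6]
losses = [5.0 * n ** -0.3 for n in sizes]   # synthetic, noise-free power law
a, alpha = fit_power_law(sizes, losses)
print(a, alpha)
```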

2605.02989 2026-06-19 cs.IT eess.SP math.IT stat.ML 版本更新

Information Theory and Statistical Learning

信息论与统计学习

Abbas El Gamal

AI总结 本文是Cover & Thomas《信息论基础》第三版的章节预印本,系统介绍了散度度量在模型训练中的作用,涵盖线性回归、生成扩散模型等,并给出了扩散模型更系统的推导。

详情
AI中文摘要

本手稿包含一章的预印本,该章正考虑收入即将出版的《Cover and Thomas信息论基础》第三版,经Wiley许可发布。新版的目录EIT-3 ToC可在 https://docs.google.com/document/d/1L-m4oQEJw1PJhoxBeMwrrBD8S_HmvzMEkPbYvS24980/edit?usp=sharing 找到。反馈请联系 abbas@ee.stanford.edu。学习与信息论在模型训练和基本性能极限的表征中均有交叉。本手稿对第一个交叉点进行了简洁易懂的处理,仅需高年级本科生或一年级研究生水平的信息论和统计学基础知识。章末习题使材料既适合课堂使用也适合自学。本章重点讨论散度度量在模型训练中的作用,示例涵盖从线性回归、逻辑回归到自回归模型、变分自编码器、扩散模型、生成对抗网络和基于分数的模型。介绍了证据下界(ELBO)、f-散度和Fisher散度。特别是,对生成扩散模型的处理提供了比文献中更系统、更明确的推导。

英文摘要

This manuscript contains a preprint of a chapter under consideration for inclusion in the forthcoming third edition of Cover and Thomas's Elements of Information Theory, posted with permission from Wiley. The table of contents of the new edition (EIT-3 ToC) can be found at https://docs.google.com/document/d/1L-m4oQEJw1PJhoxBeMwrrBD8S_HmvzMEkPbYvS24980/edit?usp=sharing . For feedback, please contact abbas@ee.stanford.edu. Learning and information theory intersect in both model training and the characterization of fundamental performance limits. This manuscript provides a concise and accessible treatment of the first intersection, requiring only basic background in information theory and statistics at the senior undergraduate or first-year graduate level. End-of-chapter exercises make the material well suited for classroom use as well as self-study. The chapter focuses on the role of divergence measures in model training, with examples ranging from linear and logistic regression to autoregressive models, variational autoencoders, diffusion models, generative adversarial networks, and score-based models. It introduces the evidence lower bound (ELBO), f-divergences, and the Fisher divergence. In particular, the treatment of the generative diffusion model provides a more systematic and explicit derivation than is typical in the literature.
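For discrete distributions, an f-divergence is D_f(P || Q) = sum_x q(x) f(p(x) / q(x)) for a convex f with f(1) = 0. Choosing f(t) = t log t gives the KL divergence, and f(t) = |t - 1| / 2 gives the total variation distance. The minimal sketch below assumes q(x) > 0 everywhere; the distributions are illustrative.

```python
import math

def f_divergence(p, q, f):
    """D_f(P || Q) = sum_x q(x) * f(p(x) / q(x)) for discrete P, Q with q(x) > 0."""
    return sum(qx * f(px / qx) for px, qx in zip(p, q))

def kl_generator(t):
    return t * math.log(t) if t > 0 else 0.0   # convention 0 * log 0 = 0

def tv_generator(t):
    return 0.5 * abs(t - 1.0)

p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]
kl = f_divergence(p, q, kl_generator)
tv = f_divergence(p, q, tv_generator)
direct_kl = sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)
print(kl, direct_kl, tv)   # KL here equals 0.25 * log 2; TV equals 0.25
```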

2605.18315 2026-06-19 math.OC stat.ML 版本更新

Attention-based PCA

基于注意力的PCA

Rodrigo Maulen-Soto, Claire Boyer

AI总结 本文研究了注意力机制在无监督问题PCA中的表现,证明在高斯数据上训练时,softmax和线性注意力层学习的参数与协方差矩阵的主特征向量对齐,建立了与PCA的直接联系,并扩展到上下文设置中。

详情
AI中文摘要

我们从主成分分析(PCA)这一经典无监督问题的视角研究注意力机制。我们证明,当在高斯数据上训练时,softmax和线性注意力层学习的参数与协方差矩阵的主特征向量对齐,从而建立了与PCA的直接且明确的联系。我们的分析涵盖了有限和无限提示范围。在无限提示极限下,我们证明收敛到与主谱方向对齐的全局最优解;而在有限提示设置中,我们表明在相差采样效应的意义下会出现相同的行为。我们进一步将分析扩展到具有尖峰(spiked)Wishart协方差的上下文学习设置中,其中注意力成功地恢复了底层信号方向。这些结果表明,在无监督目标下,注意力本质上执行类似于PCA的计算,为其表示学习能力提供了理论基础。

英文摘要

We study attention mechanisms through the lens of a canonical unsupervised problem: principal component analysis (PCA). We show that, when trained on Gaussian data, both softmax and linear attention layers learn parameters that align with the principal eigenvectors of the covariance matrix, thereby establishing a direct and explicit connection with PCA. Our analysis covers both finite and infinite prompt regimes. In the infinite-prompt limit, we prove convergence to globally optimal solutions aligned with the leading spectral direction, while in the finite-prompt setting we show that the same behavior emerges up to sampling effects. We further extend the analysis to an in-context setting with spiked Wishart covariances, where attention successfully recovers the underlying signal direction. These results demonstrate that attention inherently performs PCA-like computations under unsupervised objectives, providing a theoretical foundation for its representation-learning capabilities.
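The PCA reference direction in this claim can be computed directly. For data with a spiked covariance I + beta v v^T (the hypothetical `beta` and `v` below), the leading eigenvector of the sample covariance recovers v up to sign. An `alignment` score like the one below is one way to check the abstract's claim for trained attention parameters. Training the attention layer itself is not shown, and the dimensions are illustrative.

```python
import numpy as np

def leading_eigvec(X):
    """Leading eigenvector of the sample covariance of the rows of X."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / len(X)
    _, eigvecs = np.linalg.eigh(cov)        # eigh returns eigenvalues in ascending order
    return eigvecs[:, -1]

def alignment(u, v):
    """|cos(angle)| between u and v; eigenvectors are only defined up to sign."""
    return float(abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)
d, n, beta = 10, 2000, 5.0
v = np.zeros(d)
v[0] = 1.0                                   # hypothetical planted signal direction
cov = np.eye(d) + beta * np.outer(v, v)      # spiked covariance I + beta * v v^T
X = rng.multivariate_normal(np.zeros(d), cov, size=n)
score = alignment(leading_eigvec(X), v)
print(score)
```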

2604.21097 2026-06-19 stat.ML cs.LG 版本更新

Learning to Emulate Chaos: Adversarial Optimal Transport Regularization

学习模拟混沌:对抗最优传输正则化

Gabriel Melo, Leonardo Santiago, Peter Y. Lu

发表机构 * Department of Mechanical and Aerospace Engineering, North Carolina State University, Raleigh, NC(北卡罗来纳州立大学机械与航空航天工程系) Department of Electrical and Computer Engineering, Tufts University, Medford, MA(塔夫茨大学电气与计算机工程系) Work performed while at the University of Campinas(在坎皮纳斯大学工作期间)

AI总结 针对混沌动力学模拟中长程统计保真度低的问题,提出基于对抗最优传输的目标函数,联合学习高质量汇总统计量和物理一致的模拟器,理论分析与实验验证了Sinkhorn散度和WGAN对偶形式的有效性。

详情
AI中文摘要

混沌出现在许多复杂动力系统中,从天气到电网,但使用机器学习模拟器等数据驱动方法难以准确建模。虽然模拟器是加速模拟和解决逆问题的有前途的工具,但它们仍然难以学习混沌动力学,其中对初始条件的敏感性使得精确的长期预测不可行,尤其是在给定噪声数据的情况下。最近的工作转而训练模拟器以匹配混沌吸引子的统计特性,但这些方法通常依赖于手工制作的汇总统计量或大型、多样的多环境数据集。在这项工作中,我们提出了一类对抗最优传输目标,可以从单个噪声轨迹中联合学习高质量的汇总统计量和物理一致的模拟器。我们从理论上分析并实验验证了我们的方法的Sinkhorn散度公式(2-Wasserstein)和WGAN风格的对偶公式(1-Wasserstein)。在各种混沌系统(包括具有高维时空混沌的系统)上的数值实验表明,使用我们提出的目标训练的模拟器具有显著改善的长期统计保真度。

英文摘要

Chaos arises in many complex dynamical systems, from weather to power grids, but is difficult to accurately model with data-driven methods such as machine learning emulators. While emulators are promising tools for accelerating simulations and solving inverse problems, they still struggle to learn chaotic dynamics, where sensitivity to initial conditions renders exact long-term forecasts infeasible, especially given noisy data. Recent work instead trains emulators to match the statistical properties of chaotic attractors, but these approaches often rely on handcrafted summary statistics or large, diverse multi-environment datasets. In this work, we propose a family of adversarial optimal transport objectives that can jointly learn high-quality summary statistics and a physically consistent emulator from a single noisy trajectory. We theoretically analyze and experimentally validate a Sinkhorn divergence formulation (2-Wasserstein) and a WGAN-style dual formulation (1-Wasserstein) of our approach. Numerical experiments across a variety of chaotic systems, including ones with high-dimensional spatiotemporal chaos, show that emulators trained using our proposed objectives have significantly improved long-term statistical fidelity.
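The Sinkhorn-divergence formulation can be illustrated on fixed one-dimensional samples. The sketch below uses the squared-distance ground cost (the 2-Wasserstein case), log-domain Sinkhorn iterations on the dual potentials, and the debiased divergence OT(x, y) - OT(x, x)/2 - OT(y, y)/2. It is not the paper's adversarial objective: there are no learned summary statistics and no emulator, and `eps` and `n_iter` are illustrative.

For a pure translation under squared cost, the debiased divergence equals the squared shift exactly. The cross term is linear in each variable and cancels under the marginal constraints, so the second printed value should be close to 1.

```python
import numpy as np

def _logsumexp(m, axis):
    mx = m.max(axis=axis, keepdims=True)
    return (mx + np.log(np.exp(m - mx).sum(axis=axis, keepdims=True))).squeeze(axis)

def entropic_ot(x, y, eps=0.2, n_iter=2000):
    """Entropic OT cost min_P <P, C> + eps * KL(P | a x b) for uniformly weighted 1-D samples.

    After each g-update the plan's column marginals are exact, so <f, a> + <g, b>
    equals the current dual objective.
    """
    a = np.full(len(x), 1.0 / len(x))
    b = np.full(len(y), 1.0 / len(y))
    C = (x[:, None] - y[None, :]) ** 2        # squared-distance ground cost
    f, g = np.zeros(len(x)), np.zeros(len(y))
    for _ in range(n_iter):
        f = -eps * _logsumexp(np.log(b)[None, :] + (g[None, :] - C) / eps, axis=1)
        g = -eps * _logsumexp(np.log(a)[:, None] + (f[:, None] - C) / eps, axis=0)
    return float(a @ f + b @ g)

def sinkhorn_divergence(x, y, eps=0.2):
    """Debiased Sinkhorn divergence OT(x, y) - OT(x, x) / 2 - OT(y, y) / 2."""
    return entropic_ot(x, y, eps) - 0.5 * entropic_ot(x, x, eps) - 0.5 * entropic_ot(y, y, eps)

x = np.linspace(0.0, 1.0, 30)
print(sinkhorn_divergence(x, x))        # identical samples
print(sinkhorn_divergence(x, x + 1.0))  # unit translation under squared cost
```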

2604.03146 2026-06-19 stat.ML cs.LG 版本更新

Characterization of Gaussian Universality Breakdown in High-Dimensional Empirical Risk Minimization

高维经验风险最小化中高斯普适性破坏的表征

Chiheb Yaakoubi, Cosme Louart, Malik Tiomoko, Zhenyu Liao

发表机构 * School of Data Science, The Chinese University of Hong Kong, Shenzhen, China Huawei Noah's Ark Lab, Huawei Technologies, Paris, France School of Electronic Information Communications, Huazhong University of Science & Technology, China

AI总结 通过将凸高斯极小极大定理推广到非高斯数据,刻画了高维经验风险最小化估计量的渐近分布,揭示了高斯普适性的适用范围与局限。

Comments 28 pages, 5 figures, 1 table

Journal ref ICML 2026

详情
AI中文摘要

我们研究了一般非高斯数据设计下的高维凸经验风险最小化(ERM)。通过启发式地将凸高斯极小极大定理(CGMT)扩展到非高斯设置,我们推导出关键统计量的渐近极小极大表征,从而能够近似ERM估计量 $\hat{\theta}$ 的均值 $\mu_{\hat{\theta}}$ 和协方差 $C_{\hat{\theta}}$。具体地,在数据矩阵的集中假设以及损失和正则化子的标准正则性条件下,我们证明:对于独立于训练数据的测试协变量 $x$,投影 $\hat{\theta}^\top x$ 近似服从 $\mu_{\hat{\theta}}^\top x$ 的一般非高斯分布与一个独立的中心化高斯变量(方差为 $\mathrm{tr}(C_{\hat{\theta}} \mathbb{E}[xx^\top])$)的卷积。这一结果阐明了ERM高斯普适性的范围和局限。此外,我们证明任何 $\mathcal{C}^2$ 正则化子渐近等价于一个二次型,该二次型仅由正则化子在零点处的Hessian矩阵和在 $\mu_{\hat{\theta}}$ 处的梯度确定。我们提供了跨不同损失和模型的数值模拟,以验证我们的理论预测和定性见解。

英文摘要

We study high-dimensional convex empirical risk minimization (ERM) under general non-Gaussian data designs. By heuristically extending the Convex Gaussian Min-Max Theorem (CGMT) to non-Gaussian settings, we derive an asymptotic min-max characterization of key statistics, enabling approximation of the mean $\mu_{\hat{\theta}}$ and covariance $C_{\hat{\theta}}$ of the ERM estimator $\hat{\theta}$. Specifically, under a concentration assumption on the data matrix and standard regularity conditions on the loss and regularizer, we show that for a test covariate $x$ independent of the training data, the projection $\hat{\theta}^\top x$ approximately follows the convolution of the generally non-Gaussian distribution of $\mu_{\hat{\theta}}^\top x$ with an independent centered Gaussian variable of variance $\mathrm{tr}(C_{\hat{\theta}} \mathbb{E}[xx^\top])$. This result clarifies the scope and limits of Gaussian universality for ERMs. Additionally, we prove that any $\mathcal{C}^2$ regularizer is asymptotically equivalent to a quadratic form determined solely by its Hessian at zero and gradient at $\mu_{\hat{\theta}}$. Numerical simulations across diverse losses and models are provided to validate our theoretical predictions and qualitative insights.
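The variance of the Gaussian component follows from a one-line identity. If z ~ N(0, C) is independent of x, then E[(z^T x)^2] = E[x^T C x] = tr(C E[xx^T]), and this holds even when x itself is non-Gaussian. The Monte Carlo sketch below checks that identity with a hypothetical covariance `C` standing in for $C_{\hat{\theta}}$ and sign-randomized test covariates. It does not reproduce the CGMT-based derivation of $C_{\hat{\theta}}$.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 5, 200_000
A = rng.standard_normal((d, d))
C = A @ A.T / d                         # hypothetical stand-in for the estimator covariance
scales = np.sqrt(np.linspace(0.5, 2.0, d))
second_moment = np.diag(scales ** 2)    # E[x x^T] for the test covariates below

# Non-Gaussian test covariates: independent random signs times fixed scales.
x = rng.choice([-1.0, 1.0], size=(n, d)) * scales
# Gaussian fluctuation z ~ N(0, C), drawn independently of x.
z = rng.multivariate_normal(np.zeros(d), C, size=n)

predicted = float(np.trace(C @ second_moment))
empirical = float(np.mean(np.sum(z * x, axis=1) ** 2))
print(predicted, empirical)
```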

2603.10184 2026-06-19 stat.ML cs.LG 版本更新

Stabilizing Bandits using Regularization: Precise Regret and A Quantitative Central Limit Theorem

使用正则化稳定赌博机:精确遗憾与定量中心极限定理

Budhaditya Halder, Ishan Sengupta, Koustav Chowdhury, Samya Praharaj, Koulik Khamaru

发表机构 * Department of Statistics, Rutgers University(罗格斯大学统计系) Indian Statistical Institute, Kolkata(加尔各答印度统计研究所)

AI总结 本文提出一种精细的稳定性条件,证明正则化随机镜像下降算法满足该条件,并推导出自适应采样下经验奖励估计的非渐近Berry-Esseen界、匹配的遗憾上下界,以及抗腐败下的渐近正态性,同时揭示正则化是有效推断的必要代价。

Comments Updated rate of convergence and precise regret in version 2

详情
AI中文摘要

由于自适应采样违反了经典渐近理论所依赖的独立性假设,使用赌博机数据进行统计推断面临根本性挑战。近期工作将稳定性(Lai and Wei, 1982)确定为自适应采样下有效推断的充分条件。本文首先提出一个精细的稳定性条件,用在线算法的迭代值表述,并证明一大类正则化随机镜像下降类算法满足该条件。这一精细条件使我们能够在多个方面加强 Lai 和 Wei(1982)的渐近结果。首先,我们推导出自适应采样下经验奖励估计的非渐近Berry-Esseen界。其次,我们推导出所提算法遗憾的匹配非渐近上下界,从而精确刻画其遗憾。第三,我们证明这些正则化算法在给定水平的对抗性腐败下保持渐近正态性和有效推断。最后,我们表明正则化是必要的而非偶然的:Lai-Wei稳定性与最优的$O(\sqrt{T})$遗憾率(如EXP3等非正则化算法所达到的)不相容,因此受控的多对数级遗憾膨胀是有效推断的代价。

英文摘要

Statistical inference with bandit data presents fundamental challenges owing to adaptive sampling, which violates the independence assumptions underlying classical asymptotic theory. Recent work has identified stability (Lai and Wei, 1982) as a sufficient condition for valid inference under adaptivity. This paper first provides a refined stability condition, stated in terms of the iterates of an online algorithm, and shows that a large class of regularized stochastic-mirror-descent-style algorithms satisfy it. This refined condition allows us to strengthen the asymptotic results of Lai and Wei (1982) in several ways. First, we derive a non-asymptotic Berry-Esseen bound for the empirical reward estimates under adaptive sampling. Second, we derive matching non-asymptotic upper and lower bounds on the regret of the proposed algorithm, yielding a precise characterization of its regret. Third, we show that these regularized algorithms preserve asymptotic normality and valid inference under a prescribed level of adversarial corruption. Finally, we show that regularization is necessary rather than incidental: Lai-Wei stability is incompatible with the optimal $O(\sqrt{T})$ regret rate (the rate attained by unregularized algorithms such as EXP3), so that a controlled, polylogarithmic inflation in regret is the price of valid inference.

2601.14430 2026-06-19 stat.ML cs.LG 版本更新

Meta Flow Maps enable scalable reward alignment

元流映射实现可扩展的奖励对齐

Peter Potaptchik, Adhi Saravanan, Abbas Mammadov, Alvaro Prat, Michael S. Albergo, Yee Whye Teh

发表机构 * University of Oxford(牛津大学) Harvard University(哈佛大学) Kempner Institute(凯普纳研究所)

AI总结 提出元流映射(MFMs)框架,通过可微分的单步后验采样实现高效价值函数估计,从而无需轨迹模拟即可进行推理时引导和离策略微调,显著降低计算成本。

详情
AI中文摘要

控制生成模型在计算上是昂贵的。这是因为与奖励函数的最优对齐——无论是通过推理时引导还是微调——都需要估计价值函数。这一任务需要访问条件后验 $p_{1|t}(x_1|x_t)$,即与中间状态 $x_t$ 一致的干净数据 $x_1$ 的分布,这一要求通常迫使方法诉诸昂贵的轨迹模拟。为了解决这一瓶颈,我们引入了元流映射(MFMs),这是一个将一致性模型和流映射扩展到随机机制的框架。MFMs 被训练为执行随机单步后验采样,从任意中间状态生成任意多个独立同分布的干净数据 $x_1$ 样本。关键在于,这些样本提供了一个可微分的重参数化,从而解锁了高效的价值函数估计。我们利用这一能力解决了两种范式中的瓶颈:实现无需内部展开的推理时引导,并促进对一般奖励的无偏、离策略微调。实验上,我们的单粒子引导 MFM 采样器在 ImageNet 上以极少的计算量在多个奖励上优于 Best-of-1000 基线。

英文摘要

Controlling generative models is computationally expensive. This is because optimal alignment with a reward function, whether via inference-time steering or fine-tuning, requires estimating the value function. This task demands access to the conditional posterior $p_{1|t}(x_1|x_t)$, the distribution of clean data $x_1$ consistent with an intermediate state $x_t$, a requirement that typically compels methods to resort to costly trajectory simulations. To address this bottleneck, we introduce Meta Flow Maps (MFMs), a framework extending consistency models and flow maps into the stochastic regime. MFMs are trained to perform stochastic one-step posterior sampling, generating arbitrarily many i.i.d. draws of clean data $x_1$ from any intermediate state. Crucially, these samples provide a differentiable reparametrization that unlocks efficient value function estimation. We leverage this capability to solve bottlenecks in both paradigms: enabling inference-time steering without inner rollouts, and facilitating unbiased, off-policy fine-tuning to general rewards. Empirically, our single-particle steered-MFM sampler outperforms a Best-of-1000 baseline on ImageNet across multiple rewards at a fraction of the compute.
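One-step posterior draws turn value estimation into a plain Monte Carlo average. The sketch below assumes the value is the soft value V(x_t) = log E[exp(r(x_1)) | x_t] used in KL-regularized reward alignment, with temperature 1; the abstract does not state this form. The trained MFM is replaced by a hypothetical Gaussian sampler `toy_posterior` that draws reparametrized samples x_t + s * noise, which is what would make the estimate differentiable in an autodiff framework (not used here). The Gaussian case has a closed form to compare against.

```python
import numpy as np

def soft_value_estimate(sample_posterior, reward, x_t, num_samples, rng):
    """Monte Carlo estimate of V(x_t) = log E[exp(r(x_1)) | x_t] from one-step posterior draws."""
    x1 = sample_posterior(x_t, num_samples, rng)
    r = reward(x1)
    m = r.max()
    return float(m + np.log(np.mean(np.exp(r - m))))   # numerically stable log-mean-exp

def toy_posterior(x_t, k, rng, s=0.5):
    # Hypothetical stand-in for a trained MFM: reparametrized draws x_1 = x_t + s * noise.
    return x_t + s * rng.standard_normal(k)

def reward(x):
    return -x ** 2                                     # hypothetical reward

rng = np.random.default_rng(0)
estimate = soft_value_estimate(toy_posterior, reward, 0.5, 200_000, rng)
# Closed form for x_1 ~ N(m, s^2) and r(x) = -x^2.
m, s = 0.5, 0.5
exact = -m ** 2 / (1 + 2 * s ** 2) - 0.5 * np.log(1 + 2 * s ** 2)
print(estimate, exact)
```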

2509.15822 2026-06-19 stat.ML cs.LG math.PR math.ST stat.TH 版本更新

Phase Transition for Stochastic Block Model with more than $\sqrt{n}$ Communities

具有多于 $\sqrt{n}$ 个社区的随机块模型的相变

Alexandra Carpentier, Christophe Giraud, Nicolas Verzelen

发表机构 * Institut für Mathematik – Universität Potsdam, Potsdam, Germany(波茨坦大学数学研究所,德国波茨坦) Laboratoire de Mathématiques d’Orsay, Université Paris-Saclay, CNRS, France(奥赛数学实验室,巴黎-萨克雷大学,法国 CNRS) INRAE, Institut Agro, MISTEA, Univ. Montpellier, France(国家农业研究院,蒙彼利埃大学,法国)

AI总结 本文证明在随机块模型中,当社区数 $K\geq \sqrt{n}$ 时,低度多项式在 Chin 等人提出的阈值以下无法恢复社区,而通过计数特定子图可在多项式时间内实现恢复,支持了新相变阈值的猜想。

详情
AI中文摘要

统计物理的预测表明,在随机块模型(SBM)中,当社区数 $K$ 固定时,社区恢复在 Kesten-Stigum (KS) 阈值以上(且仅在其以上)可以在多项式时间内实现。这一猜想催生了丰富的文献,证明在 KS 阈值以上的 SBM 中,非平凡社区恢复确实是可能的。只要 $K\ll \sqrt{n}$(其中 $n$ 是观测图中的节点数),KS 阈值以下低度多项式(LDP)的失败也已被证明。当 $K\geq \sqrt{n}$ 时,Chin 等人(2025)最近证明,在稀疏机制中,通过计数非回溯路径,可以在 KS 阈值以下的多项式时间内实现社区恢复。这一突破使他们提出了多社区机制 $K\geq \sqrt{n}$ 的新阈值。在这项工作中,我们为他们的猜想提供了证据:
1- 我们证明,对于任意图密度,LDP 无法在 Chin 等人(2025)提出的阈值以下恢复社区;
2- 我们证明,在所提出的阈值以上,不仅是在 Chin 等人(2025)考虑的稀疏机制中,而且在适度稀疏机制中,通过计数受 LDP 分析启发的某些特定子图,可以在多项式时间内实现社区恢复。
特别地,计数长度为 $\log(n)$ 的自避路径(这与基于非回溯算子的谱算法密切相关)仅在稀疏机制中是最优的。在更密集的机制中,必须考虑基于环的膨胀(blow-up)的更复杂子图。

英文摘要

Predictions from statistical physics postulate that recovery of the communities in the Stochastic Block Model (SBM) with a fixed number $K$ of communities is possible in polynomial time above, and only above, the Kesten-Stigum (KS) threshold. This conjecture has given rise to a rich literature, proving that non-trivial community recovery is indeed possible in SBM above the KS threshold. Failure of low-degree polynomials (LDP) below the KS threshold was also proven, as long as $K\ll \sqrt{n}$, where $n$ is the number of nodes in the observed graph. When $K\geq \sqrt{n}$, Chin et al. (2025) recently proved that, in a sparse regime, community recovery in polynomial time is possible below the KS threshold by counting non-backtracking paths. This breakthrough led them to postulate a new threshold for the many-communities regime $K\geq \sqrt{n}$. In this work, we provide evidence supporting their conjecture:
1- We prove that, for any graph density, LDP fail to recover communities below the threshold postulated by Chin et al. (2025);
2- We prove that community recovery is possible in polynomial time above the postulated threshold, not only in the sparse regime considered in Chin et al. (2025), but also in moderately sparse regimes, by counting occurrences of some specific motifs inspired by the LDP analysis.
In particular, counting self-avoiding paths of length $\log(n)$, which is closely related to spectral algorithms based on the Non-Backtracking operator, is optimal only in the sparse regime. More complex motifs based on the blow-up of a cycle must be considered in denser regimes.
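The path-counting statistic rests on a simple intuition: node pairs inside the same community tend to be joined by more short self-avoiding paths than pairs in different communities. The brute-force depth-first count below makes that concrete on a tiny hypothetical graph. Its cost grows exponentially with the path length, so it is only an illustration. It is not the polynomial-time motif-counting procedure of the paper, nor the blow-up-of-a-cycle motifs used in denser regimes.

```python
def count_self_avoiding_paths(adj, u, v, length):
    """Number of self-avoiding paths with exactly `length` edges from u to v (brute force)."""
    count = 0

    def dfs(node, visited, steps):
        nonlocal count
        if steps == length:
            count += node == v           # adds 1 only when the walk ends at v
            return
        for nxt in adj[node]:
            if nxt not in visited:       # self-avoiding: never revisit a node
                visited.add(nxt)
                dfs(nxt, visited, steps + 1)
                visited.remove(nxt)

    dfs(u, {u}, 0)
    return count

# Two hypothetical "communities" {0, 1, 2, 3} and {4, 5, 6, 7}, dense inside, one bridge edge 3-4.
adj = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2, 4],
       4: [3, 5, 6, 7], 5: [4, 6, 7], 6: [4, 5, 7], 7: [4, 5, 6]}
print(count_self_avoiding_paths(adj, 0, 1, 3), count_self_avoiding_paths(adj, 0, 5, 3))
```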

2104.08928 2026-06-19 stat.ML cs.CL cs.LG 版本更新

Group-Sparse Matrix Factorization for Transfer Learning of Word Embeddings

面向词嵌入迁移学习的组稀疏矩阵分解

Kan Xu, Xuanyi Zhao, Hamsa Bastani, Osbert Bastani

发表机构 * W. P. Carey School of Business, Arizona State University(亚利桑那州立大学凯瑞商学院) University of Pennsylvania(宾夕法尼亚大学) Wharton School, University of Pennsylvania(宾夕法尼亚大学沃顿商学院)

AI总结 提出一种基于组稀疏惩罚的两阶段估计器,通过结合大规模语料和少量领域数据高效迁移学习领域特定的词嵌入,并证明了其泛化误差界和非凸目标函数的局部最优与全局最优统计等价。

详情
AI中文摘要

非结构化文本为许多领域的决策者提供了丰富的数据源,从零售中的产品评论到医疗保健中的护理记录。为了利用这些信息,单词通常通过无监督学习算法(如矩阵分解)转化为词嵌入——编码单词之间语义关系的向量。然而,从训练数据有限的新领域学习词嵌入可能具有挑战性,因为在新领域中含义/用法可能不同,例如,单词“positive”通常具有积极情感,但在医疗记录中通常具有消极情感,因为它可能意味着患者检测出疾病阳性。在实践中,我们预计只有少数领域特定的单词可能具有新含义。我们提出了一种直观的两阶段估计器,通过组稀疏惩罚利用这种结构,通过结合大规模文本语料库(如维基百科)和有限的领域特定文本数据,高效地迁移学习领域特定的词嵌入。我们限定了迁移学习估计器的泛化误差,证明当只有少量嵌入在领域间改变时,它可以用显著更少的领域特定数据实现高精度。此外,我们证明了在标准正则化条件下,由非凸目标函数识别的所有局部最小值与全局最小值在统计上不可区分,这意味着我们的估计器可以高效计算。我们的结果首次给出了组稀疏矩阵分解的界限,这可能具有独立意义。我们通过与自然语言处理中最先进的微调启发式方法进行实证比较来评估我们的方法。

英文摘要

Unstructured text provides decision-makers with a rich data source in many domains, ranging from product reviews in retail to nursing notes in healthcare. To leverage this information, words are typically translated into word embeddings (vectors that encode the semantic relationships between words) through unsupervised learning algorithms such as matrix factorization. However, learning word embeddings from new domains with limited training data can be challenging, because the meaning/usage may be different in the new domain, e.g., the word "positive" typically has positive sentiment, but often has negative sentiment in medical notes since it may imply that a patient tested positive for a disease. In practice, we expect that only a small number of domain-specific words may have new meanings. We propose an intuitive two-stage estimator that exploits this structure via a group-sparse penalty to efficiently transfer learn domain-specific word embeddings by combining large-scale text corpora (such as Wikipedia) with limited domain-specific text data. We bound the generalization error of our transfer learning estimator, proving that it can achieve high accuracy with substantially less domain-specific data when only a small number of embeddings are altered between domains. Furthermore, we prove that all local minima identified by our nonconvex objective function are statistically indistinguishable from the global minimum under standard regularization conditions, implying that our estimator can be computed efficiently. Our results provide the first bounds on group-sparse matrix factorization, which may be of independent interest. We empirically evaluate our approach compared to state-of-the-art fine-tuning heuristics from natural language processing.
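A group-sparse penalty on the per-word change between source and domain embeddings, lam * sum_w ||delta_w||_2, has a closed-form proximal step: row-wise soft-thresholding. Words whose change is small keep their source embedding exactly, while words like "positive" that genuinely shift meaning are allowed to move. The sketch shows only this proximal step on hypothetical embeddings. It is not the paper's two-stage matrix-factorization estimator.

```python
import numpy as np

def group_soft_threshold(delta, lam):
    """Proximal operator of lam * sum_w ||delta_w||_2 over the rows delta_w of `delta`.

    Rows with norm <= lam are set exactly to zero; other rows are shrunk by lam in norm.
    """
    norms = np.linalg.norm(delta, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return scale * delta

source = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])      # hypothetical source embeddings
proposal = np.array([[1.05, 0.02], [0.0, 1.0], [-1.0, 1.0]])  # hypothetical domain-fitted embeddings
domain = source + group_soft_threshold(proposal - source, lam=0.3)
print(domain)   # rows 0 and 1 stay at the source; row 2 moves, shrunk toward the source
```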

8. 生物统计与医学统计 8 篇

2606.20341 2026-06-19 stat.ME stat.AP 新提交

Anchors Away: Navigating Unanchored Indirect Comparisons with Multilevel Unanchored Meta-Regression (ML-UMR)

锚定之外:使用多层次非锚定元回归(ML-UMR)导航非锚定间接比较

Conor Chandler, Jack Ishak

AI总结 针对随机证据缺失时的非锚定治疗比较,提出多层次非锚定元回归(ML-UMR),通过贝叶斯框架联合建模个体与汇总数据,估计多治疗、多研究及目标人群的边际和条件效应,并明确识别假设与可转移性假设。

Comments 20 pages (excluding supplementary material), 5 figures

详情
AI中文摘要

当随机证据不可用时,使用单臂研究或断开证据的非锚定间接治疗比较越来越多地用于卫生技术评估(HTA)。现有方法,包括匹配调整间接比较(MAIC)和模拟治疗比较(STC),通常局限于成对设置,并且通常估计比较研究人群中的边际效应,这可能与决策相关人群不同。我们提出多层次非锚定元回归(ML-UMR),一种用于综合来自完全断开证据的个体患者数据和汇总数据的贝叶斯回归框架。ML-UMR通过在一个统一似然中联合建模个体水平和汇总水平数据,将多层次网络元回归(ML-NMR)扩展到非锚定设置,从而能够估计跨多个治疗、研究和目标人群的治疗特异性结果以及边际和条件效应。ML-UMR区分了识别治疗效应所需的假设与将结果转移到目标人群所需的假设。与所有非锚定比较一样,有效推断依赖于强且通常不可验证的假设,包括条件可交换性、结果模型的正确设定以及跨治疗假设(例如,共享预后因素假设(SPFA))。ML-UMR并未减轻这些要求,而是在统一框架内使其明确,并促进敏感性分析。在模拟研究中,ML-UMR对比较人群效应产生了低偏差和名义覆盖。向其他人群的可转移性关键取决于识别假设:在强效应修饰下,违反SPFA导致偏差,而纳入亚组信息则恢复了近乎无偏的估计和名义覆盖。

英文摘要

Unanchored indirect treatment comparisons using single-arm studies or disconnected evidence are increasingly used in health technology assessment (HTA) when randomized evidence is unavailable. Existing methods, including matching-adjusted indirect comparison (MAIC) and simulated treatment comparison (STC), are generally limited to pairwise settings and typically estimate marginal effects in the comparator study population, which may differ from the decision-relevant population. We propose multilevel unanchored meta-regression (ML-UMR), a Bayesian regression framework for synthesizing individual patient data and aggregate data from fully disconnected evidence. ML-UMR extends multilevel network meta-regression (ML-NMR) to unanchored settings by jointly modeling individual- and aggregate-level data within a unified likelihood, enabling estimation of treatment-specific outcomes and both marginal and conditional effects across multiple treatments, studies, and target populations. ML-UMR distinguishes assumptions required to identify treatment effects from those required to transport results to target populations. As with all unanchored comparisons, valid inference relies on strong and often unverifiable assumptions, including conditional exchangeability, correct specification of the outcome model, and cross-treatment assumptions (e.g., shared prognostic factor assumption (SPFA)). ML-UMR does not lessen these requirements but makes them explicit within a unified framework and facilitates sensitivity analyses. In simulation studies, ML-UMR produced low bias and nominal coverage for comparator-population effects. Transportability to alternative populations depended critically on identifying assumptions: violations of SPFA led to bias under strong effect modification, whereas incorporating subgroup information restored near-unbiased estimation and nominal coverage.
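ML-UMR builds on ML-NMR. In the ML-NMR construction (as we understand it), the likelihood for a study reporting only aggregate data uses the individual-level model averaged over that study's covariate distribution, rather than evaluated at the mean covariate. The sketch below illustrates that integration step by Monte Carlo. It assumes, for illustration only, a logistic outcome model with a single covariate drawn from a normal distribution with the reported mean and SD. The Bayesian fitting, multiple treatments and transport to target populations are not shown.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def aggregate_event_probability(beta0, beta1, cov_mean, cov_sd, n_draws=100_000, seed=0):
    """Population-average event probability implied by an individual-level logistic model.

    Integrates expit(beta0 + beta1 * x) over an assumed Normal(cov_mean, cov_sd^2)
    covariate distribution, as needed for a study that reports only covariate summaries.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(cov_mean, cov_sd, size=n_draws)
    return float(np.mean(expit(beta0 + beta1 * x)))

integrated = aggregate_event_probability(1.0, 2.0, 0.0, 1.0)
plug_in = float(expit(1.0 + 2.0 * 0.0))   # naive: plug in the mean covariate instead
print(integrated, plug_in)                # they differ because the logit link is non-collapsible
```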

2606.19982 2026-06-19 stat.ME 新提交

Built-in Selection Bias in Proportional Hazards Models with Omitted Covariates: Simulation Evidence and Alternative Approaches

省略协变量的比例风险模型中的内置选择偏倚:模拟证据与替代方法

Ayoub Bifenzi, Helene Jacqmin-Gadda

AI总结 本文通过模拟和实际数据,证明在随机试验中,即使省略的协变量与处理独立,仍会导致Cox比例风险模型估计的处理风险比存在偏倚,并比较了脆弱模型、加速失效时间模型和Kaplan-Meier曲线等替代方法的稳健性。

详情
AI中文摘要

在时间-事件分析中,来自Cox比例风险(PH)模型的风险比(HR)是评估治疗效果最常用且广泛报告的指标。然而,由于风险比固有地依赖于每个时间点的生存条件,它们具有非可压缩性。因此,当存在因省略重要协变量导致的未测量异质性时,即使这些协变量在基线时与主要暴露独立(如随机对照试验中),风险比也会受到内置选择偏倚的影响。本文旨在概述文献中关于未观测异质性(由影响结局的省略协变量引起)如何在标准比例风险模型中偏倚治疗风险比估计的关键发现,即使在处理分配独立于这些协变量的随机试验中也是如此。通过模拟,我们评估了半参数Cox PH模型和参数PH模型在各种未测量异质性场景下的偏倚程度。然后,我们将这些标准模型与替代方法进行比较,这些方法要么解决了这一问题,要么被认为对此具有稳健性。这些替代方法包括来自脆弱模型的风险比、来自加速失效时间(AFT)模型的回归参数,以及使用Kaplan-Meier曲线非参数估计或基于具有时变暴露效应的Cox模型估计的治疗组间生存差异。我们通过一个来自放射治疗肿瘤学组(RTOG 9202)的随机对照试验的实际数据应用,说明了所探索替代方法的实际相关性。

英文摘要

In time-to-event analysis, the hazard ratio (HR) derived from the Cox proportional hazards (PH) model is the most commonly used and widely reported measure for assessing treatment effects. However, hazard ratios are non-collapsible due to their inherent conditioning on survival up to each time point. As a result, they are subject to built-in selection bias in the presence of unmeasured heterogeneity arising from omitted important covariates, even when these covariates are independent of the main exposure at baseline, as is the case in randomized controlled trials. This article aims to provide an overview of key findings from the literature on how unobserved heterogeneity, due to omitted covariates that affect the outcome, can bias the estimation of the treatment hazard ratio in standard proportional hazards models, even in randomized trials where treatment is assigned independently of such covariates. Through simulations, we evaluate the extent of bias in the semi-parametric Cox PH model and parametric PH model under various scenarios of unmeasured heterogeneity. We then compare these standard models to alternative approaches that either account for this issue or are considered robust to it. These alternatives include the hazard ratio estimated from frailty models, regression parameters from an Accelerated Failure Time (AFT) model, and survival differences between treatment groups estimated nonparametrically using Kaplan-Meier curves or based on a Cox model with time-dependent effect of the exposure. We illustrate the practical relevance of the explored alternatives through a real data application to a randomized controlled trial from the Radiation Therapy Oncology Group (RTOG 9202).
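The built-in selection bias can be reproduced analytically in the simplest case. The sketch assumes, for illustration, a conditional proportional hazards model with constant baseline hazard, a true conditional hazard ratio of 2, and an omitted binary covariate Z ~ Bernoulli(0.5) that is independent of treatment, as under randomization. The marginal hazard of each arm is that of a two-component exponential mixture. Because high-risk treated subjects are depleted faster, the marginal hazard ratio equals 2 at time 0 but falls below it at intermediate times, even though the conditional ratio never changes.

```python
import math

def marginal_hazard_ratio(t, log_hr, gamma, p=0.5, base_hazard=1.0):
    """Marginal treated/control hazard ratio at time t when a binary Z is omitted.

    Conditional model: hazard(t | A, Z) = base_hazard * exp(log_hr * A + gamma * Z),
    with Z ~ Bernoulli(p) independent of treatment A.
    """
    def marginal_hazard(a):
        weights = [1 - p, p]
        rates = [base_hazard * math.exp(log_hr * a + gamma * z) for z in (0, 1)]
        surv = [w * math.exp(-r * t) for w, r in zip(weights, rates)]
        # For a mixture of exponentials, h(t) = -S'(t) / S(t) = sum(r * S_z) / sum(S_z).
        return sum(r * s for r, s in zip(rates, surv)) / sum(surv)
    return marginal_hazard(1) / marginal_hazard(0)

for t in (0.0, 0.25, 0.5, 1.0, 3.0):
    print(t, round(marginal_hazard_ratio(t, math.log(2.0), gamma=2.0), 3))
```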

2606.19892 2026-06-19 stat.ME 新提交

The Ghosh-Lin and Fine-Gray models for a mix of administrative and random censoring

混合行政删失与随机删失下的Ghosh-Lin和Fine-Gray模型

Thomas H. Scheike, Christian Mirian, Isao Yokota, Giuliana Cortese

AI总结 针对同时存在行政删失和随机删失的数据,提出结合风险集调整和逆概率删失加权的方法,使Ghosh-Lin和Fine-Gray模型得到一致估计。

详情
AI中文摘要

复发事件或竞争风险回归模型通常应用于生物医学领域,两者都可视为边际模型。在存在右删失的情况下,需要调整这些模型以获得一致估计量。当删失是行政性时,边际回归模型特别容易估计。然而,当删失是随机作用时,通常考虑逆概率删失加权(IPCW)调整来获得参数估计。该技术通过正确的删失模型进行删失权重调整,但对于行政删失,只需修改风险集即可正确调整。在实践中,对于大型中央登记处或某些临床试验,所有受试者的行政删失时间已知,但通常也会有一定比例的受试者被随机删失。在这项工作中,我们考虑两种常用的回归方法:用于带有终止事件的复发事件的Ghosh-Lin模型和用于竞争事件的Fine-Gray模型。对于这两种情况,当同时存在行政删失和随机删失时,我们展示了如何通过处理这两种不同类型删失的组合,在最小化建模假设的基础上获得正确估计。

英文摘要

Recurrent events or competing risks regression models are often applied in the biomedical setting, and both can be considered as marginal models. In the presence of right-censoring, such models need to be adjusted to give consistent estimators. When censoring is administrative, marginal regression models are particularly easy to estimate. However, when censoring instead acts randomly, inverse probability of censoring weighting (IPCW) adjustments are typically used to obtain parameter estimates. This technique relies on a censoring-weights adjustment via a correctly specified censoring model, whereas for administrative censoring the adjustment is done correctly simply by modifying the risk set. In practice, for large central registries or some clinical trials, the administrative censoring time will be known for all subjects, but there will typically also be a proportion of subjects who are censored at random. In this work, we consider two frequently used regression approaches: the Ghosh-Lin model for recurrent events with terminal events and the Fine-Gray model for competing events. For these two settings, when both administrative and random censoring are present, we demonstrate how to obtain correct estimates by handling the combination of the two types of censoring while relying on a minimum of modeling assumptions.
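For the random-censoring part, IPCW weights are built from an estimate G of the censoring survival function. A standard choice is the Kaplan-Meier estimate with the roles of events and censorings swapped, giving each observed failure the weight 1 / G(T-). This minimal sketch shows only that generic ingredient. It is not the paper's combined procedure, in which administrative censoring is handled through the risk set instead. It also assumes no ties between failures and censorings, and the data are hypothetical.

```python
def censoring_survival(times, is_censored):
    """Kaplan-Meier estimate of the censoring survival function G(t) = P(C > t).

    Censorings play the role of events and failures are treated as censored.
    Ties between failures and censorings are not specially handled here.
    """
    steps, g = [], 1.0
    for t in sorted(set(times)):
        at_risk = sum(1 for s in times if s >= t)
        d = sum(1 for s, c in zip(times, is_censored) if s == t and c)
        if d:
            g *= 1.0 - d / at_risk
            steps.append((t, g))

    def G(t, left_limit=False):
        value = 1.0
        for s, gs in steps:
            if s < t or (s == t and not left_limit):
                value = gs
        return value
    return G

def ipcw_weight(time, is_censored, G):
    """Weight 1 / G(T-) for an observed failure, 0 for a randomly censored subject."""
    return 0.0 if is_censored else 1.0 / G(time, left_limit=True)

times = [1.0, 2.0, 3.0, 4.0]
is_censored = [False, True, False, True]      # hypothetical random censoring at t = 2 and 4
G = censoring_survival(times, is_censored)
weights = [ipcw_weight(t, c, G) for t, c in zip(times, is_censored)]
print(weights)
```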

2606.19760 2026-06-19 stat.AP 新提交

Covariate-Adjusted Functional Principal Components Analysis for Modeling Hazard Rates of Physical Activity in the US Population

协变量调整的功能主成分分析用于建模美国人口体力活动的风险率

Md Rokibul Hasan, Pratim Guha Niyogi

AI总结 提出基于风险函数的分布分析方法,利用功能主成分分析(FPCA)从腕部加速度计数据中刻画个体活动强度分布变异,优于均值摘要。

详情
AI中文摘要

体力活动在人类健康中起着至关重要的作用。其整体分布因人而异。常用的汇总指标无法描述这种分布模式。我们提出了一种基于分布的分析方法,通过从腕部加速度计数据中导出的风险函数来建模个体活动强度模式,从而描述体力活动。我们分析了2011-2012年国家健康与营养调查(NHANES)中4297名连续佩戴设备7天的成年人的分钟级独立于监测器的运动摘要(MIMS)数据。我们使用基于生存分析的方法,为每个个体在共同强度网格上导出非参数活动强度风险,并将原始MIMS尺度和对数变换MIMS尺度上的风险曲线都视为功能对象。我们在MIMS的两个尺度上使用功能主成分分析(FPCA)来表征活动强度分布的主要变异模式。组均值风险函数在低强度水平上差异很小,而在高强度水平上我们观察到显著差异。我们的结果表明,基于风险的功能表示方法能够捕捉个体间体力活动强度分布的差异,提供了一种灵活且可解释的方式来表征异质性。该方法优于基于均值的摘要,并支持对人口亚组之间体力活动模式进行有原则的比较。

英文摘要

Physical activity plays a vital role in human health, and its entire distribution differs among people. Commonly used summary measures cannot describe this distributional pattern. We present a distribution-based analytical approach to describe physical activity by modeling individual-level activity-intensity patterns through hazard functions derived from wrist-worn accelerometer data. We analyzed minute-level Monitor-Independent Movement Summary (MIMS) data of 4297 adults with seven continuous days of device wear from the 2011-2012 National Health and Nutrition Examination Survey (NHANES). For each individual, we derived a nonparametric activity-intensity hazard on a common intensity grid using a survival-based approach, treating the hazard curves computed on both the original and the log-transformed MIMS scales as functional objects. We used functional principal component analysis (FPCA) on both scales to characterize the dominant modes of variation in activity-intensity distributions. Group-wise mean hazard functions differed little at lower intensity levels but substantially at higher intensity levels. Our results demonstrate that hazard-based functional representations capture differences in physical activity intensity distributions across individuals and offer a flexible and interpretable way to characterize heterogeneity. This approach improves on mean-based summaries and supports principled comparisons of physical activity patterns across population subgroups.
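The survival-based hazard treats activity intensity as "time". On a fixed grid u_1 < ... < u_K, the hazard at u_k is the share of minutes at or above u_k that fall below u_{k+1}. The curves from all individuals can then be decomposed by FPCA, here via an SVD of the centered subjects-by-grid matrix. This is a minimal sketch assuming that binning convention, a hypothetical grid and simulated intensities. It ignores the covariate adjustment in the title and the log-MIMS scale.

```python
import numpy as np

def intensity_hazard(minutes, grid):
    """Discrete hazard h_k = #{minutes in [u_k, u_{k+1})} / #{minutes >= u_k}.

    The last grid point collects every minute at or above it, so its hazard is 1
    whenever anyone is at risk; NaN marks grid points with nobody at risk.
    """
    minutes = np.asarray(minutes, dtype=float)
    hazard = np.full(len(grid), np.nan)
    for k, u in enumerate(grid):
        upper = grid[k + 1] if k + 1 < len(grid) else np.inf
        at_risk = np.sum(minutes >= u)
        if at_risk > 0:
            hazard[k] = np.sum((minutes >= u) & (minutes < upper)) / at_risk
    return hazard

def fpca(curves, n_components=2):
    """FPCA for curves on a common grid: SVD of the centered (subjects x grid) matrix."""
    mean = curves.mean(axis=0)
    U, s, Vt = np.linalg.svd(curves - mean, full_matrices=False)
    return mean, Vt[:n_components], U[:, :n_components] * s[:n_components]

print(intensity_hazard([0, 0, 1, 2, 2, 3], grid=[0, 1, 2, 3]))

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 60.0, 13)   # hypothetical intensity grid
# Hypothetical minute-level intensities for 20 people, one day (1440 minutes) each.
curves = np.array([intensity_hazard(rng.gamma(2.0, 10.0, size=1440), grid) for _ in range(20)])
mean_curve, components, scores = fpca(curves)
```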

2606.19743 2026-06-19 stat.ME stat.AP 新提交

A Bayesian spatio-temporal nearest neighbor Gaussian process model for pooled genetic data

一种用于汇总遗传数据的贝叶斯时空最近邻高斯过程模型

Imke Botha, Tianxiao Hao, Lucinda E. Harrison, Nick Golding, Daniel J. Weiss, Jennifer A. Flegg

AI总结 提出最近邻高斯过程模型,结合序贯蒙特卡洛平方算法,高效推断汇总遗传数据中的单倍型频率,并应用于非洲抗疟药物耐药性遗传数据分析。

详情
AI中文摘要

大规模遗传数据集通常汇总不同遗传标记的总等位基因计数。从这些汇总数据中推断单倍型频率(即多标记等位基因的频率)是一个挑战。由于计算成本,先前在此背景下的时空建模仅限于3个标记。在这项工作中,我们提出了一种最近邻高斯过程(NNGP)模型,以改善随标记和观测数量扩展的规模。为了推断模型参数,我们开发了一种新颖的序贯蒙特卡洛平方算法,该算法使用带有祖先抽样的粒子吉布斯来变异NNGP函数值。后者在观测数量和NNGP数量上具有线性成本,并可应用于广泛的NNGP模型。作为案例研究,我们分析了与非洲抗疟药物耐药性相关的遗传数据,并在3和6个遗传标记数据集上实证展示了我们的扩展结果。

英文摘要

Large-scale genetic datasets often aggregate the total allele counts of distinct genetic markers. Inferring haplotype frequencies (i.e., the frequencies of multimarker alleles) from these pooled data is a challenge. Previous spatio-temporal modelling in this context has been limited to 3 markers due to the computational cost. In this work, we propose a nearest neighbor Gaussian process (NNGP) model to improve scaling with the number of markers and observations. To infer the parameters of our model, we develop a novel sequential Monte Carlo squared algorithm, which uses particle Gibbs with ancestor sampling to mutate the NNGP function values. The latter has a linear cost in the number of observations and the number of NNGPs, and can be applied to a broad range of NNGP models. As a case study, we analyse genetic data relating to antimalarial drug resistance in Africa, and demonstrate the scaling empirically on datasets with 3 and 6 genetic markers.
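The NNGP prior can be sketched as a Vecchia-type factorization. Each function value is conditioned only on its m nearest previously ordered locations, which keeps the cost linear in the number of locations for fixed m. The sketch assumes 1-D locations, a zero mean and a hypothetical exponential covariance (`sigma2`, `rho`). The paper's space-time model for pooled allele counts and its SMC² sampler are not shown. With m = n - 1 every earlier location is conditioned on, so the chain rule recovers the exact Gaussian density; smaller m gives the approximation.

```python
import numpy as np

def exp_cov(a, b, sigma2=1.0, rho=0.5):
    """Exponential covariance sigma2 * exp(-|a - b| / rho) between 1-D location arrays."""
    return sigma2 * np.exp(-np.abs(a[:, None] - b[None, :]) / rho)

def nngp_logpdf(w, locs, m):
    """NNGP (Vecchia) log density: each w_i conditions on its <= m nearest earlier locations."""
    total = 0.0
    for i in range(len(locs)):
        prev = np.arange(i)
        nbrs = prev[np.argsort(np.abs(locs[prev] - locs[i]))[:m]]
        var = exp_cov(locs[i:i + 1], locs[i:i + 1])[0, 0]
        mean = 0.0
        if len(nbrs) > 0:
            c_in = exp_cov(locs[i:i + 1], locs[nbrs])[0]
            coef = np.linalg.solve(exp_cov(locs[nbrs], locs[nbrs]), c_in)
            mean, var = coef @ w[nbrs], var - coef @ c_in   # Gaussian conditional moments
        total += -0.5 * (np.log(2 * np.pi * var) + (w[i] - mean) ** 2 / var)
    return float(total)

def exact_logpdf(w, locs):
    C = exp_cov(locs, locs)
    _, logdet = np.linalg.slogdet(C)
    return float(-0.5 * (len(w) * np.log(2 * np.pi) + logdet + w @ np.linalg.solve(C, w)))

rng = np.random.default_rng(0)
locs = rng.uniform(0.0, 5.0, size=8)          # hypothetical locations, in sampled order
w = rng.multivariate_normal(np.zeros(8), exp_cov(locs, locs))
print(nngp_logpdf(w, locs, m=2), nngp_logpdf(w, locs, m=7), exact_logpdf(w, locs))
```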

2606.20489 2026-06-19 q-bio.PE nlin.CG physics.bio-ph stat.AP New submission

West Nile virus outbreak in Italy modelled with the quantum Game of Life

意大利西尼罗病毒疫情用量子生命游戏建模

Andrea Fontana, Simone Tambascia, Ciro Di Carluccio, Andrea Esposito, Bernardo Spagnolo, Andrea M. Chiariello

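AI summary: Uses a quantum Game of Life cellular automaton to simulate West Nile virus transmission in Italy in summer 2025; by optimizing only the mosquito birth and removal rates, the model accurately fits cumulative infection curves at the local and regional-average levels and is used to assess the impact of environmental changes.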

AI-generated Chinese abstract

近年来,意大利观察到西尼罗病毒(WNV)异常高传播,特别是在拉齐奥南部、坎帕尼亚和威尼托地区感染高峰显著。WNV的主要病媒是库蚊,通过叮咬传播人类感染。本文通过基于量子版本的生命游戏(GOL)细胞自动机模型的计算方法,研究2025年夏季意大利西尼罗热疫情的扩散。具体而言,人类动力学根据GOL规则演化,而病媒(即蚊子)的随机动力学及其与人类的相互作用同时发生。我们表明,该模型在局部和平均区域水平上以高精度拟合累计感染个体曲线,仅需优化蚊子出生率和移除率参数。此外,利用模型的灵活性,我们表明模型参数值的变化阐明了系统对环境变化的响应。例如,我们量化了蚊子传播控制措施或由于气候和生态变化导致的蚊子突然增加的影响。总体而言,我们提供了意大利WNV感染传播的一般定量描述,可作为测试不同环境情景的支持工具,并有助于决策者制定监测病媒动力学和控制病毒传播的策略。

English abstract

In recent years, an anomalously high spread of West Nile virus (WNV) has been observed in Italy, with particularly high infection peaks in southern Lazio, Campania and Veneto. The main vector of WNV is the Culex pipiens mosquito, which transmits infection to humans through its bites. Here, we investigate the spread of the WNV fever epidemic in Italy during the summer of 2025 with a computational approach based on a quantum version of the Game of Life (GOL) cellular automaton. Specifically, human dynamics evolve according to the GOL rules, while the stochastic dynamics of the disease vectors (mosquitoes) and their interactions with humans run simultaneously. We show that, by optimizing only the mosquito birth and removal rate parameters, the model fits the cumulative curves of infected individuals with high accuracy at both the local and the regional-average level. Furthermore, leveraging the model's flexibility, we show how changes in parameter values reveal the system's response to environmental variations. For instance, we quantify the impact of measures to contain mosquito spread and of sudden increases in mosquito abundance driven by climatic and ecological changes. Overall, we provide a general, quantitative description of WNV infection spread in Italy that could serve as a supporting tool for testing different environmental scenarios and help decision makers devise strategies to monitor vector dynamics and control the resulting spread of the virus.

2606.19041 2026-06-19 stat.ME New submission

Efficient Cumulative Incidence Estimation in Biobank Studies Using All Prevalent and Incident Events

利用所有现患和发病事件在生物库研究中进行高效累积发病率估计

David M. Zucker, Malka Gorfine

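AI summary: For biobank data containing both individuals with disease onset before recruitment (prevalent cases) and individuals with onset during follow-up, proposes a new cumulative incidence function estimator that integrates all cases and handles diseases with early onset and long post-onset survival; asymptotic properties are established, and simulations and UK Biobank cancer data demonstrate its advantages.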

AI-generated Chinese abstract

基于人群的生物库已在许多国家建立,为大规模研究各种疾病的发病率提供了机会。生物库数据通常是在特定日历期内招募的研究队列中收集的,受试者在年龄介于$R_L$和$R_U$之间时进入研究。本研究关注包含两类个体的生物库数据:在招募前已发生目标疾病(称为现患病例)的个体,以及最初招募时无病但在随访期间发病的个体。我们提出一种新的累积发病率函数(CIF)估计量,它超越了现有方法,因为它整合了所有疾病病例,无论是现患还是发病,无论其后续生命历程如何。特别是,新方法可以处理涉及在年轻年龄发生且发病后生存期长的疾病的情况。建立了新方法的渐近性质,并进行了模拟研究以检验该方法的性能。我们通过将方法应用于英国生物库的癌症数据,说明了该方法的使用,并强调了其相对于现有方法的优势。

English abstract

Population-based biobanks, now established in many countries, offer opportunities for large-scale studies investigating the incidence of various diseases. Biobank data is typically collected from a study cohort recruited over a defined calendar period, with subjects entering the study at various ages falling between $R_L$ and $R_U$. This work focuses on biobank data that includes individuals in whom onset of the disease of interest occurred before recruitment, termed prevalent cases, along with individuals initially recruited as disease-free in whom disease onset occurred during the follow-up period. We propose a novel cumulative incidence function (CIF) estimator that goes beyond existing methods in that it incorporates all disease cases, both prevalent and incident, irrespective of their subsequent life course. In particular, the new method can handle situations involving diseases that can occur at young ages with long survival after disease onset. Asymptotic properties of the new method are established and a simulation study is presented examining the performance of the method. We illustrate the use of the method and highlight its advantages over existing methods with an application to cancer data from the UK biobank.

2406.01557 2026-06-19 stat.ME stat.AP Updated version

Flexible aggregation of compositional predictors with shared effects for microbiome association analysis

共享效应组合预测因子的灵活聚合用于微生物组关联分析

Satabdi Saha, Liangliang Zhang, Michele Guindani, Kim-Anh Do, Christine B. Peterson

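AI summary: Proposes BRACE, which uses a spike-and-cluster prior and a projection-based constrained Gaussian prior to perform data-adaptive clustering and variable selection on microbiome data, identifying key features with shared effects on the outcome.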

AI-generated Chinese abstract

微生物组分析的最新进展为微生物群落的分子动态提供了前所未有的见解,激发了揭示微生物组在人类健康中关键作用的兴趣。然而,由于微生物组数据的高维、稀疏和组成性,识别与临床结果相关的微生物特征仍然具有挑战性。此外,许多微生物分类群虽然被分类为不同的,但可能共享功能角色,使传统的变量选择方法复杂化。为了克服这些障碍,我们引入了具有聚合组成效应的贝叶斯回归(BRACE),这是一种新方法,使用结合伯努利活动指标的尖峰-聚类先验、有限活动集上的Ewens可交换分割先验以及聚类效应上的投影约束高斯先验,进行数据自适应聚类和变量选择。我们工作的方法论创新在于如何将Ewens分割先验与聚类原子上的投影约束高斯相结合,以强制执行总和为零的约束。BRACE将具有相似效应的微生物分类群分组,产生更可解释的模型,同时实现有效的降维。通过综合模拟和一项检查口腔微生物组组成对胰岛素抵抗影响的真实应用,我们证明了BRACE在识别具有共享效应的关键特征方面优于现有方法。

English abstract

Ongoing advancements in microbiome profiling have provided unprecedented insights into the molecular dynamics of microbial communities, sparking a surge of interest in uncovering the microbiome's critical role in human health. Identifying microbial features linked to clinical outcomes, however, remains challenging due to the high-dimensional, sparse, and compositional nature of microbiome data. Additionally, many microbial taxa, although classified as distinct, may share functional roles, complicating traditional variable selection methods. To overcome these obstacles, we introduce Bayesian Regression with Agglomerated Compositional Effects (BRACE), a novel approach using a spike-and-cluster prior combining Bernoulli activity indicators, an Ewens exchangeable partition prior on the finite active set, and a projection-based constrained Gaussian prior on cluster effects to perform data-adaptive clustering and variable selection. The methodological innovation of our work lies in how we combine the Ewens partition prior with a projection-based constrained Gaussian on the cluster atoms to enforce the sum-to-zero constraint. BRACE groups microbial taxa with similar effects on the outcome, yielding more interpretable models while enabling effective dimension reduction. Through comprehensive simulations and a real-world application examining the influence of oral microbiome composition on insulin resistance, we demonstrate BRACE's superior performance over existing methods, particularly in identifying key features with shared effects on outcomes.
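The sum-to-zero constraint arises because coefficients on log-transformed compositions must sum to zero for the model to be invariant to rescaling of the composition. A minimal sketch, assuming all active taxa in cluster k share one effect and inactive taxa have coefficient zero: projecting an unconstrained Gaussian draw of the cluster atoms onto the constraint hyperplane yields a Gaussian supported on that hyperplane. The helper names and cluster labels are hypothetical, and this is not claimed to be BRACE's exact prior construction.

```python
import random


def project_sum_to_zero(theta, sizes):
    """Euclidean projection of cluster effects onto {t : sum_k sizes[k] * t[k] = 0}.

    If cluster k holds sizes[k] active taxa that all share effect t[k], this
    makes the taxon-level coefficients sum to zero.
    """
    dot = sum(n * t for n, t in zip(sizes, theta))
    norm2 = sum(n * n for n in sizes)
    return [t - n * dot / norm2 for n, t in zip(sizes, theta)]


def taxon_coefficients(theta, labels):
    """Expand cluster effects to taxa; label None marks an inactive taxon (coefficient 0)."""
    return [0.0 if k is None else theta[k] for k in labels]


rng = random.Random(1)
labels = [0, 0, None, 1, 1, 1, 2]             # hypothetical cluster labels for 7 taxa
sizes = [labels.count(k) for k in range(3)]   # active taxa per cluster: [2, 3, 1]
raw = [rng.gauss(0.0, 1.0) for _ in sizes]    # unconstrained Gaussian draw of cluster atoms
beta = taxon_coefficients(project_sum_to_zero(raw, sizes), labels)
print(sum(beta))  # zero up to rounding error
```

Because the projection is linear, the projected draw is still Gaussian, just degenerate along the constrained direction.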

9. Statistics in Economics, Finance and Social Sciences (9 papers)

2606.20240 2026-06-19 econ.EM stat.AP New submission

Two-Sample IV: Efficient Two-Step Estimation and Tests for Overidentification and Weak-Instruments

两样本IV:高效两步估计及过度识别与弱工具变量检验

Fatima Kasenally, Ruoxi Guan, Frank Windmeijer

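AI summary: For two-sample IV estimation, proposes a two-step efficient estimator and an overidentification test that are robust to heteroskedasticity and sample heterogeneity and require only summary statistics from linear regressions; also extends weak-instrument tests.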

AI-generated Chinese abstract

两样本IV是一种流行的估计方法,当结果变量和处理变量在不同样本中可用,而工具变量在两个样本中都可用时。标准估计量是两样本两阶段最小二乘估计量,在同方差和样本同质性下是有效的。我们开发了一个稳健的两步程序,用于在一般异方差和样本异质性下进行有效估计,并提出了相关的两样本Hansen过度识别检验。我们方法的一个关键特征是只需要两个样本中简化形式和第一阶段的线性回归的汇总统计量。这些是估计系数向量的六个对象,以及同方差和异方差稳健的估计方差矩阵。我们进一步表明,在同方差和同质性下,处理样本中的第一阶段F统计量可以按标准方式用作弱工具变量检验,这里的相对偏差是比例偏差。我们提出了Montiel-Olea和Pflueger (2013)的有效F统计量的扩展,用于异方差情况,遵循Windmeijer (2025)的推广。我们在Marshall (2019)研究教育对投票行为影响的应用中说明了估计量和检验,并进行了聚类稳健推断。

English abstract

Two-sample IV is a popular estimation method when the outcome and treatment variables are observed in different samples, while the instruments are observed in both. The standard estimator is the two-sample two-stage least squares estimator, which is efficient under homoskedasticity and homogeneity of the samples. We develop a robust two-step procedure for efficient estimation under general heteroskedasticity and heterogeneity of the samples, and propose a related two-sample Hansen overidentification test. A key feature of our approach is that it needs only summary statistics from the linear reduced-form and first-stage regressions in the two samples: six objects in total, namely the two estimated coefficient vectors and, for each, the homoskedastic and the heteroskedasticity-robust estimated variance matrices. We further show that, under homoskedasticity and homogeneity, the first-stage F-statistic in the treatment sample can be used as a test for weak instruments in the standard way, with the relative bias here being a proportional bias. For the heteroskedastic case, we propose an extension of the effective F-statistic of Montiel-Olea and Pflueger (2013), following the generalization in Windmeijer (2025). We illustrate the estimators and tests, with cluster-robust inference, in an application from Marshall (2019) studying the effect of education on voting behavior.
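To show how an estimator can be assembled from summary statistics alone, here is a sketch of a generic two-step minimum-distance version. It assumes independent samples, a single endogenous treatment, and the moment condition γ = βπ linking the reduced-form coefficients γ (outcome sample) to the first-stage coefficients π (treatment sample). The identity first-step weight, the function names, and the inputs are assumptions; this is not the paper's exact procedure, and step 1 could instead use the two-sample 2SLS estimate.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def two_step_two_sample_iv(gamma, V_gamma, pi, V_pi):
    """Two-step minimum-distance IV estimate from summary statistics.

    gamma, V_gamma: reduced-form coefficients (outcome sample) and their variance matrix.
    pi, V_pi:       first-stage coefficients (treatment sample) and their variance matrix.
    Assumes gamma = beta * pi and independent samples.
    """
    k = len(pi)
    # Step 1: preliminary estimate with identity weighting.
    beta1 = dot(pi, gamma) / dot(pi, pi)
    # Variance of gamma_hat - beta * pi_hat when the two samples are independent.
    omega = [[V_gamma[i][j] + beta1 ** 2 * V_pi[i][j] for j in range(k)]
             for i in range(k)]
    # Step 2: efficient weighting with omega^{-1}.
    omega_inv_pi = solve(omega, pi)
    beta2 = dot(gamma, omega_inv_pi) / dot(pi, omega_inv_pi)
    se = dot(pi, omega_inv_pi) ** -0.5
    # Hansen-type J statistic; approximately chi-squared with k - 1 df under the null.
    resid = [g - beta2 * p for g, p in zip(gamma, pi)]
    j_stat = dot(resid, solve(omega, resid))
    return beta2, se, j_stat


# Hypothetical summary statistics for two instruments.
pi = [1.0, 0.5]
gamma = [2.0, 1.0]
V_gamma = [[0.1, 0.0], [0.0, 0.2]]
V_pi = [[0.05, 0.0], [0.0, 0.05]]
print(two_step_two_sample_iv(gamma, V_gamma, pi, V_pi))
```

Passing heteroskedasticity-robust variance matrices as `V_gamma` and `V_pi` carries that robustness into the second-step weighting; the J statistic is informative only when there are more instruments than treatments (k > 1).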

2606.20420 2026-06-19 q-fin.CP stat.AP New submission

Advanced Calibration Analysis and Tools: Identifying Influential Observations in Stochastic Interest Rate Model Calibration

高级校准分析与工具:识别随机利率模型校准中的有影响观测值

Philipp Mahler, Peter Ruckdeschel

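AI summary: Embeds the calibration problem in nonlinear regression theory, shows that minimizing RMSRE is equivalent to weighted least squares, and develops a diagnostic framework (weighted hat matrix, influence functions, functional delta method); empirically finds boundary-dominated leverage, losses of effective dimensionality, and a post-2022 shift in parameter stability, concluding that a low RMSRE is not sufficient to validate a calibration.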

Comments 47 pages, 9 figures, 1 table

AI-generated Chinese abstract

利率模型的准确校准对于市场一致性估值和经济情景生成器(ESGs)至关重要。多因子模型(如G2++模型)的传统校准方法通常依赖于点估计,忽略了特定市场数据的影响和估计不确定性的量化。本文开发了一个诊断框架,将校准问题嵌入非线性回归理论。研究表明,行业常见的均方根相对误差(RMSRE)最小化等价于加权最小二乘(WLS)问题。这一等价关系导出了诊断工具的相应公式,包括用于杠杆分析的加权帽子矩阵、用于局部敏感性诊断的影响函数,以及用于局部、边界置信区间的泛函Delta方法。实现中采用了高效的雅可比矩阵分解,利用了平价(ATM)上限的解析可处理性。该框架应用于2016-2025年期间的欧元ATM上限数据集。我们的实证分析揭示了边界主导的杠杆分布、由于参数约束活跃导致的重复有效维度损失,以及2022年后市场转型中局部参数稳定性的诊断机制转变。对精算模型治理的启示是:低RMSRE不足以验证校准。最后,我们讨论了该框架对一般最小二乘问题的适用性,同时指出了对于缺乏闭式梯度的工具(如互换期权)的计算挑战。

English abstract

The accurate calibration of interest rate models is central to market-consistent valuation and Economic Scenario Generators (ESGs). Traditional calibration methods for multi-factor models such as the G2++ model often rely on point estimates, neglecting the influence of specific market data and the quantification of estimation uncertainty. This paper develops a diagnostic framework embedding the calibration problem into non-linear regression theory. It shows that the common industry practice of minimizing the Root Mean Squared Relative Error (RMSRE) is equivalent to a Weighted Least Squares (WLS) problem. This equivalence yields the corresponding formulations for diagnostic tools, including the Weighted Hat Matrix for leverage analysis, Influence Functions for local sensitivity diagnostics, and the Functional Delta Method for local, boundary-respecting confidence intervals. The implementation uses an efficient Jacobian factorization that exploits the analytical tractability of At-The-Money (ATM) caps. The framework is applied to a dataset of Euro ATM caps covering the period 2016--2025. Our empirical analysis reveals a boundary-dominated leverage profile, repeated losses of effective dimensionality due to active parameter constraints, and a diagnostic regime shift in local parameter stability around the post-2022 market transition. The resulting message for actuarial model governance is that low RMSRE is not sufficient for calibration validation. We conclude by discussing the framework's applicability to general least-squares problems while highlighting the computational challenges for instruments lacking closed-form gradients, such as swaptions.
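The RMSRE/WLS equivalence can be checked directly when relative errors are taken with respect to market prices: n times the squared RMSRE equals a weighted sum of squares with weights 1/m_i², so both criteria share the same minimiser. The sketch below assumes that convention (relative to model prices, the weights would depend on the parameters), uses a hypothetical two-parameter Jacobian for brevity (the G2++ model has five parameters), and computes the leverages h_ii of the weighted hat matrix.

```python
import math


def rmsre(model, market):
    """Root mean squared relative error of model prices against market prices."""
    return math.sqrt(sum(((p - m) / m) ** 2 for p, m in zip(model, market)) / len(market))


def wls_objective(model, market):
    """Weighted sum of squares with weights w_i = 1 / market_i**2."""
    return sum((m - p) ** 2 / m ** 2 for p, m in zip(model, market))


def leverages(jac, market):
    """Diagonal of the weighted hat matrix H = W^(1/2) J (J'WJ)^(-1) J' W^(1/2).

    jac holds one row [dp/dtheta1, dp/dtheta2] per instrument (two parameters).
    """
    w = [1.0 / m ** 2 for m in market]
    a = sum(wi * r[0] * r[0] for wi, r in zip(w, jac))
    b = sum(wi * r[0] * r[1] for wi, r in zip(w, jac))
    d = sum(wi * r[1] * r[1] for wi, r in zip(w, jac))
    det = a * d - b * b
    inv = ((d / det, -b / det), (-b / det, a / det))  # (J'WJ)^(-1)
    return [wi * (j0 * (inv[0][0] * j0 + inv[0][1] * j1)
                  + j1 * (inv[1][0] * j0 + inv[1][1] * j1))
            for wi, (j0, j1) in zip(w, jac)]


market = [0.8, 1.1, 1.9, 3.2, 4.1]       # hypothetical cap prices
model = [0.82, 1.05, 1.95, 3.1, 4.2]     # hypothetical fitted model prices
jac = [[1.0, 0.5], [1.0, 1.0], [1.0, 2.0], [1.0, 5.0], [1.0, 10.0]]  # hypothetical sensitivities
print(len(market) * rmsre(model, market) ** 2, wls_objective(model, market))
print(leverages(jac, market))
```

Leverages sum to the number of parameters; an instrument with h_ii close to 1 pins down the fit almost on its own, which is the kind of influence that leverage analysis is meant to flag.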

2606.19789 2026-06-19 math.OC stat.ME New submission

Dynamic Core Allocation for Malleable Jobs with Unknown Speed-up Parameters

具有未知加速参数的可变作业的动态核心分配

S. A. Bodas, J. L. Dorsman, M. Mandjes, L. Ravner

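AI summary: For malleable jobs with unknown speed-up parameters in multicore systems, proposes an iterative learning-and-control framework that estimates the unknown parameters by maximum likelihood and solves a Markov decision process to update the allocation policy, minimizing the long-run mean number of jobs.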

AI-generated Chinese abstract

我们研究了具有固定数量处理核心和可变形作业流的多核计算系统中的动态资源分配问题。每个作业可以在执行期间调整其并行度,从而允许在并发活动作业之间自适应地重新分配资源。作业属于两个可观测类别之一,每个类别由具有未知参数的独特加速函数表征。目标是学习一种核心分配策略,以最小化系统中长期平均作业数,即稳态下的平均响应时间。为了解决这种不确定性,我们开发了一个迭代学习与控制框架。系统在根据观察到的作业完成情况估计未知加速参数和求解相关马尔可夫决策过程以更新分配策略之间交替。在每个作业类别内,核心在活动作业之间平均共享;分配给每个类别的容量比例来自文献[17]的MDP公式,并在当前参数估计下进行评估。我们基于状态相关的离开时间构建了最大似然估计器,并证明了在固定分配策略下其强一致性。我们进一步提出了两种学习算法,将该估计步骤与基于动态规划的策略更新相结合,并通过数值实验说明了它们的性能。

English abstract

We study dynamic resource allocation in a multicore computing system with a fixed number of processing cores and a stream of malleable jobs. Each job may adjust its level of parallelism during execution, allowing resources to be adaptively redistributed across concurrently active jobs. Jobs belong to one of two observable classes, each characterized by a distinct speed-up function with unknown parameters. The objective is to learn a core-allocation policy that minimizes the long-run mean number of jobs in the system, or equivalently the mean response time in steady state. To address this uncertainty, we develop an iterative learning-and-control framework. The system alternates between estimating the unknown speed-up parameters from observed job completions and solving the associated Markov decision process (MDP) to update the allocation policy. Within each job class, cores are shared equally among active jobs; the fraction of capacity assigned to each class is obtained from the MDP formulation of Berg et al. (2017), evaluated at the current parameter estimates. We construct a maximum likelihood estimator based on state-dependent inter-departure times and prove its strong consistency under a fixed allocation policy. We further propose two learning algorithms that combine this estimation step with dynamic-programming-based policy updates, and illustrate their performance through numerical experiments.

2605.15896 2026-06-19 stat.ME stat.AP Updated version

A Model-Agnostic Bootstrap for Macro-Level Claims Reserving Under the Conditioning Principle

基于条件原理的宏观层面赔款准备金模型无关自助法

Robin Van Oirbeek, Tim Verdonck

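AI summary: Proposes a bootstrap for macro-level claims reserving that satisfies the conditioning principle; built on a Dirichlet-Gamma hierarchy, it satisfies the principle exactly and addresses the coverage error of existing bootstraps.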

Comments 23 pages, v2: correction of the interpretation of the $\kappa$ parameter

AI-generated Chinese abstract

正确的推断对象是条件预测分布p(R|D,θ̂),其中D是观察到的三角形并保持固定。我们称之为条件原理。所有现有自助法都违反这一原理,因为它们在预测循环中对D的函数进行重采样,产生O(1)的覆盖误差,且该误差不随三角形增大而消失。Dirichlet-Gamma层次结构允许一种精确满足该原理的自助法:S^{IBNP}_i = X^{obs}_i (1-W_i)/W_i,其中W_i ~ Beta(cF̂_{I-i}, c(1-F̂_{I-i}))直接从其预测分布中采样。仅模拟分配比例W_i;观察到的三角形保持固定。因此该方法继承了任何开发比例方法(链梯法、Bornhuetter-Ferguson、Cape Cod或其他)的校准,使其与模型无关。覆盖缺陷为O(I^{-1/2}),与开发期数量无关。在复合泊松数据生成过程下,该自助法对于每个F_{I-i} ∈ (0,1)都是保守的:预测标准差在解析上超过真实值,超出因子为1/√F_{I-i}。ODP自助法通过两种方向相反的机制违反该原理:在ODP数据生成过程下,重新估计会膨胀自助方差;而在脆弱性数据生成过程下,缺失的事故年脆弱性会缩小它。由此产生的覆盖差异为Ω(1),与I无关,这为Meyers (2015)记录的跨投资组合误校准异质性提供了结构性解释。链梯法、Bornhuetter-Ferguson和Cape Cod分别在扩散先验、信息先验和池化先验下作为可信度估计量出现,对计数和金额具有相同结构。集中度参数c可作为诊断:ĉ < 30表明开发过程非平稳。

English abstract

The correct inferential object in claims reserving is the conditional predictive distribution $p(R \mid \mathcal{D}, \hat{\theta})$, where $\mathcal{D}$ is the observed triangle held fixed. We refer to this as the conditioning principle. All existing bootstraps violate it by resampling functions of $\mathcal{D}$ inside the predictive loop, producing an $O(1)$ coverage error that does not vanish as the triangle grows. The Dirichlet-Gamma hierarchy admits a bootstrap that satisfies the principle exactly: $S^{IBNP}_i = X^{obs}_i (1-W_i)/W_i$ with $W_i \sim \mathrm{Beta}(c\hat{F}_{I-i}, c(1-\hat{F}_{I-i}))$ sampled directly from its predictive distribution. Only the allocation proportion $W_i$ is simulated; the observed triangle is held fixed. It thus inherits calibration from any development-proportion method (Chain-Ladder, Bornhuetter-Ferguson, Cape Cod, or other), making it model-agnostic. The coverage deficit is $O(I^{-1/2})$, independent of the number of development periods. Under compound Poisson data-generating processes the bootstrap is conservative for every $F_{I-i} \in (0,1)$: the predictive standard deviation analytically exceeds the true value by the factor $1/\sqrt{F_{I-i}}$. The ODP bootstrap violates the principle through two mechanisms in opposite directions: re-estimation inflates bootstrap variance under the ODP DGP, while missing accident-year frailty deflates it under frailty DGPs. The resulting coverage discrepancy is $\Omega(1)$ regardless of $I$, providing a structural explanation for the cross-portfolio miscalibration heterogeneity documented by Meyers (2015). Chain-Ladder, Bornhuetter-Ferguson and Cape Cod emerge as credibility estimators under diffuse, informative and pooling priors respectively, with identical structure for counts and amounts. The concentration $c$ serves as a diagnostic: $\hat{c} < 30$ signals non-stationary development.
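Because only W_i is simulated, the bootstrap step for one accident year takes a few lines. A sketch under assumptions: `x_obs` stands for the observed amount X^obs_i, `f_hat` for the estimated developed proportion F̂_{I-i} (from chain-ladder or any other development-proportion method), and `c` for the concentration; the numbers are hypothetical.

```python
import random


def ibnp_reserve_draws(x_obs, f_hat, c, n_sims=100_000, seed=0):
    """Draws of S = x_obs * (1 - W) / W with W ~ Beta(c * f_hat, c * (1 - f_hat)).

    Only the allocation proportion W is simulated; the observed amount x_obs
    (and hence the triangle) stays fixed.
    """
    rng = random.Random(seed)
    a, b = c * f_hat, c * (1.0 - f_hat)
    draws = []
    for _ in range(n_sims):
        w = rng.betavariate(a, b)
        draws.append(x_obs * (1.0 - w) / w)
    return draws


# Hypothetical accident year: 1000 observed, 60% developed, concentration c = 50.
draws = sorted(ibnp_reserve_draws(1000.0, 0.6, 50.0))
mean = sum(draws) / len(draws)
q995 = draws[int(0.995 * len(draws))]   # e.g. a 99.5% reserve quantile
cl_point = 1000.0 * (1 - 0.6) / 0.6     # chain-ladder-style point estimate
print(round(mean, 1), round(q995, 1), round(cl_point, 1))
```

For Beta(a, b) with a > 1, E[(1 − W)/W] = b/(a − 1), so the draws average to x_obs·c(1 − F̂)/(cF̂ − 1) (about 689.7 here), which approaches the chain-ladder-style point x_obs(1 − F̂)/F̂ as c grows.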

2605.15811 2026-06-19 stat.ME stat.AP Updated version

The Negative Binomial Chain-Ladder: A Full Likelihood Model for Claim Count Reserving

负二项链梯法:一种完整的似然模型用于赔款准备

Robin Van Oirbeek

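AI summary: Proposes a negative binomial Chain-Ladder model in which the negative binomial distribution arises naturally from a Poisson-Gamma construction, giving a clearer generative interpretation and unifying the Chain-Ladder family; simulations check the model's robustness.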

Comments 35 pages, 3 figures, v2: correction of the interpretation of the $\kappa$ parameter

AI-generated Chinese abstract

链梯法仍是非寿险赔款准备的主要宏观技术,但其经典形式缺乏一致的概率基础。现有随机扩展,包括Mack模型和过分散泊松(ODP)框架,提供不确定性度量但依赖二阶矩假设或准似然方差结构。本文开发了一种负二项链梯(NB-CL)模型,将链梯方法嵌入完整的似然框架中。关键贡献是微观层面推导,显示负二项分布自然源于泊松-伽马构造:索赔按具有伽马分布事故年异质性的泊松过程到达,聚合产生负二项增量计数。此推导赋予分散参数κ结构解释,即事故年异质性,而非随意的过分散调整。NB-CL模型推广了泊松链梯模型,并在κ→∞极限下退化为后者,与ODP模型共享点估计但方差函数不同(二次vs线性),并在单个概率层级内统一链梯家族。开发了参数Bootstrap程序以纳入过程和参数不确定性。模拟研究证实,在正确设定下,当分散参数经过偏差校正后,覆盖率接近名义水平;在模型误设情况下表现出受控退化。对索赔计数数据(澳大利亚机动车身体伤害)和已付金额(Taylor-Ashe)的实证研究证实了κ的结构解读,以及该模型在金额情况下作为工作近似的地位。

English abstract

The Chain-Ladder (CL) method remains the dominant macro-level technique for claims reserving in non-life insurance, yet its classical formulation lacks a coherent probabilistic foundation. Existing stochastic extensions, including the Mack model and the Over-Dispersed Poisson (ODP) framework, provide measures of uncertainty but rely on second-moment assumptions or quasi-likelihood variance structures without clear generative interpretations. This paper develops a Negative Binomial Chain-Ladder (NB-CL) model that embeds the CL method within a full likelihood-based framework. The key contribution is a micro-level derivation showing that the negative binomial distribution arises naturally from a Poisson-Gamma construction: claims arrive according to a Poisson process with Gamma-distributed accident-year heterogeneity, and aggregation yields negative binomial incremental counts. This derivation gives the dispersion parameter $\kappa$ a structural interpretation as accident-year heterogeneity, rather than an ad-hoc overdispersion adjustment. The NB-CL model generalises the Poisson Chain-Ladder model, which it recovers in the limit $\kappa \to \infty$; it shares the point estimates of the ODP model while differing in its variance function (quadratic vs. linear), and it unifies the Chain-Ladder family within a single probabilistic hierarchy. A parametric bootstrap procedure is developed to incorporate both process and parameter uncertainty. Simulation studies confirm near-nominal coverage under correct specification once the dispersion parameter is bias-corrected, and a controlled degradation under model misspecification. Empirical illustrations on claim count data (Australian motor bodily injury) and paid amounts (Taylor-Ashe) document both the structural reading of $\kappa$ and the working-approximation status of the model in the amounts case.
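The Poisson-Gamma construction is easy to check by simulation. In this sketch (parameter values hypothetical), counts are drawn as N ~ Poisson(Λ) with Λ ~ Gamma(shape κ, scale μ/κ); their mean and variance should match the negative binomial values μ and μ + μ²/κ, and the closed-form pmf uses the same (μ, κ) parametrisation. Letting κ grow shrinks the extra variance μ²/κ towards zero, recovering the Poisson case.

```python
import math
import random


def poisson_draw(rng, lam):
    """Poisson draw by Knuth's multiplication method (fine for moderate lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def poisson_gamma_counts(mu, kappa, n, seed=0):
    """N ~ Poisson(Lambda) with Lambda ~ Gamma(shape=kappa, scale=mu/kappa), so E[N] = mu."""
    rng = random.Random(seed)
    return [poisson_draw(rng, rng.gammavariate(kappa, mu / kappa)) for _ in range(n)]


def nb_pmf(k, mu, kappa):
    """Negative binomial pmf with mean mu and variance mu + mu**2 / kappa."""
    log_p = (math.lgamma(k + kappa) - math.lgamma(kappa) - math.lgamma(k + 1)
             + kappa * math.log(kappa / (kappa + mu)) + k * math.log(mu / (kappa + mu)))
    return math.exp(log_p)


counts = poisson_gamma_counts(mu=10.0, kappa=2.0, n=100_000)
m = sum(counts) / len(counts)
v = sum((x - m) ** 2 for x in counts) / (len(counts) - 1)
print(m, v)  # close to 10 and 10 + 10**2 / 2 = 60
```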

2604.03076 2026-06-19 stat.AP Updated version

Carbon cost pass-through rate in power system: evidence from Italy under the EU ETS

电力系统中碳成本传导率:来自欧盟排放交易体系下意大利的证据

Pierdomenico Duttilo, Francesco Lisi

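AI summary: Studies the pass-through of carbon costs to the Italian electricity market under the EU ETS using 2016-2024 data and a linear regression model with autoregressive dynamics; finds a national pass-through rate of about 32%, with substantial heterogeneity across market zones.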

AI-generated Chinese abstract

本文研究了欧盟排放交易体系(EU ETS)下碳定价对意大利电力市场的影响,重点关注第三和第四阶段(2016-2024年)各市场区域的碳成本传导率(CPTR)。利用日度数据,研究采用基于自回归动态线性回归模型的计量经济学框架,估计碳成本在批发电力价格中的反映程度。进一步通过稳健性检验和分位数回归,评估CPTR在不同燃料价差水平下的变化。结果表明,碳成本正向且显著地传导至电力价格,证实了碳定价作为关键市场驱动因素的相关性。然而,传导不完全,CPTR值始终低于100%。在国家层面,传导率估计约为32%,第三阶段和第四阶段之间无统计显著变化。各市场区域出现显著异质性:在北部、中北部和撒丁岛,第四阶段传导率上升,而在中南部和西西里岛则下降,反映了发电结构、碳强度和市场条件的差异。总体而言,研究结果强调了市场区域因素在塑造电力市场碳定价有效性中的重要性。

English abstract

This paper investigates the impact of carbon pricing under the EU Emissions Trading System (EU ETS) on the Italian electricity market, focusing on the carbon cost pass-through rate (CPTR) across market zones during Phases 3 and 4 (2016-2024). Using daily data, the study applies an econometric framework based on a linear regression model with autoregressive dynamics to estimate the extent to which carbon costs are reflected in wholesale electricity prices. It further incorporates robustness checks and quantile regression to assess how the CPTR varies across different fuel spread levels. The results show that carbon costs are positively and significantly transmitted to electricity prices, confirming the relevance of carbon pricing as a key market driver. However, pass-through is incomplete, with CPTR values consistently below 100%. At the national level, the pass-through estimate is around 32%, with no statistically significant change between Phase 3 and Phase 4. Substantial heterogeneity emerges across market zones: pass-through increases in the North, Centre-North, and Sardinia during Phase 4, while it declines in the Centre-South and Sicily, reflecting differences in generation mix, carbon intensity, and market conditions. Overall, the findings highlight the importance of zone-specific factors in shaping the effectiveness of carbon pricing in electricity markets.

2603.06820 2026-06-19 econ.EM stat.OT Updated version

Hippocratic Utility and Status Quo Bias

希波克拉底效用与现状偏见

Tomasz Strzalecki

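AI summary: Uses a simple example to show that a utility function placing more weight on lives lost than on lives saved has a much narrower scope of applicability than it first appears.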

AI-generated Chinese abstract

一种效用函数被提出,它更重视失去的生命而非被拯救的生命。我不质疑这种不对称背后的伦理动机。然而,我通过一个简单例子表明,这种决策标准的适用范围比最初看起来要有限得多。

English abstract

A utility function has been proposed that places more weight on lives lost than on lives saved. I do not dispute the ethical motivation behind this kind of asymmetry. However, I show with a simple example that the scope of applicability of such a decision criterion is considerably more limited than it may first appear.

2410.19333 2026-06-19 econ.GN physics.soc-ph q-fin.EC stat.AP Updated version

Swiss-system chess tournaments and unfairness

瑞士制国际象棋锦标赛与不公平性

László Csató, Alex Krumer

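AI summary: Studies the unfairness caused by an odd number of rounds in Swiss-system chess tournaments, finds that players with one extra game with the white pieces score significantly more points, and recommends an even number of rounds with a balanced colour-assignment mechanism.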

Comments 13 pages, 4 tables

AI-generated Chinese abstract

瑞士制是一种日益流行的比赛形式,因为它提供了比赛场次与排名准确性之间的有利权衡。然而,关于瑞士制国际象棋锦标赛在奇数轮次下潜在的不公平性,尚无实证研究。为了分析这一问题,我们的论文比较了比赛中多执白一局的选手与少执白一局的选手的得分。利用28个高知名度赛事的数据,我们发现多执白一局的选手得分显著更高。特别是在四个Grand Swiss赛事中,这一优势超过了平局的价值。解决这种不公平性的一种潜在方案是组织偶数轮次的瑞士制国际象棋锦标赛,并使用最近提出的配对机制保证所有选手的颜色分配平衡。

English abstract

The Swiss system is an increasingly popular competition format as it provides a favourable trade-off between the number of matches and ranking accuracy. However, there is no empirical study on the potential unfairness of Swiss-system chess tournaments if an odd number of rounds is played. To analyse this issue, our paper compares the number of points scored in the tournament between players who played one game more with the white pieces and players who played one game fewer with the white pieces. Using data from 28 highly prestigious competitions, we find that players with an extra white game score significantly more points. In particular, the advantage exceeds the value of a draw in the four Grand Swiss tournaments. A potential solution to this unfairness could be organising Swiss-system chess tournaments with an even number of rounds, and guaranteeing a balanced colour assignment for all players using a recently proposed pairing mechanism.

2512.02203 2026-06-19 econ.EM stat.AP Updated version

Statistical Inference in Large Multi-way Networks

大规模多路网络中的统计推断

Lucas Resende, Guillaume Lecué, Lionel Wilner, Philippe Choné

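AI summary: Proposes a classification-based method for estimating structural parameters in multi-way networks that is agnostic to the number and structure of fixed effects and avoids the incidental parameter problem; in sparse networks it is faster than PPML and yields more reliable confidence intervals. It is applied to the causal effect of a French health care policy reform.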

Comments Working paper

AI-generated Chinese abstract

我们提出了一种新方法,用于在多路网络中估计结构参数,同时控制丰富的固定效应结构。该方法基于一系列分类任务,对固定效应的数量和结构均不敏感。与完全最大似然方法相比,我们的估计量不会受到 incidental parameter 问题的影响。对于稀疏连接的网络,它在计算上也比 PPML 更快。我们提供的经验证据表明,我们的估计量比 PPML 及其偏差修正策略产生更可靠的置信区间。即使在模型误设下,这些改进仍然成立,并且在稀疏设置中更为显著。虽然 PPML 在密集、低维数据中仍具有竞争力,但我们的方法为多路模型提供了一种稳健的替代方案,能够随稀疏性高效扩展。该方法被应用于研究政策改革对法国医疗空间可达性的因果效应。

English abstract

We propose a new method to estimate structural parameters in multi-way networks while controlling for rich structures of fixed effects. The method is based on a series of classification tasks and is agnostic to both the number and structure of fixed effects. In contrast to full maximum likelihood approaches, our estimator does not suffer from the incidental parameter problem. For sparsely connected networks, it is also computationally faster than PPML. We provide empirical evidence that our estimator yields more reliable confidence intervals than PPML and its bias-correction strategies. These improvements hold even under model misspecification and are more pronounced in sparse settings. While PPML remains competitive in dense, low-dimensional data, our approach offers a robust alternative for multi-way models that scales efficiently with sparsity. The method is applied to study the causal effect of a policy reform on spatial accessibility to health care in France.

10. Data Privacy, Robustness and Fairness (2 papers)

2606.20427 2026-06-19 math.ST stat.ME stat.TH New submission

Private Rate-Double-Robust Inference

私有率双稳健推断

Máté Kormos, Aad van der Vaart

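AI summary: Protects individual privacy by injecting noise through a local privacy mechanism while exploiting rate-double-robustness to achieve unbiased and semiparametrically efficient inference on target parameters; also develops private nonparametric and parametric nuisance estimators.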

AI-generated Chinese abstract

我们协调了隐私保护和率双稳健推断。个体隐私通过局部隐私机制得到保护:向敏感数据注入噪声,仅揭示用于推断的噪声数据。因此,隐私保护阻碍了推断。相比之下,当目标参数的估计量的大样本偏差由另外两个 nuisance 参数的估计误差之间的权衡表征时,该参数的推断是率双稳健的。因此,率双稳健性促进了推断。我们协调的起点是一类由无限维线性索引和低维非线性回归索引的率双稳健目标参数。这包括因果参数等。为了私有地推断这些目标,我们展示了合适的隐私机制如何将敏感数据模型的半参数性质转移到私有设置中。率双稳健性被转移,从而实现了对目标参数的局部私有、无偏和半参数有效推断。最后,我们将一般的非参数 nuisance 估计量转化为私有估计量,这些估计量继承了其非私有对应物的收敛性质。对于参数 nuisance 模型,我们开发了一种私有矩估计方法及其大样本推断理论。

English abstract

We reconcile privacy protection and rate-double-robust inference. The privacy of individuals is protected by a local privacy mechanism: injecting noise into their sensitive data, revealing only the noisy data for inference. Hence, privacy protection hinders inference. In contrast, the inference of a target parameter is rate-double-robust when the large-sample bias of an estimator of the parameter is characterised by a trade-off between the estimation errors of two other, nuisance, parameters. Hence, rate-double-robustness facilitates inference. Our starting point of reconciliation is a class of rate-double-robust target parameters indexed linearly by an infinite-dimensional and nonlinearly by a low-dimensional regression. Among others, this includes causal parameters. To infer these targets privately, we show how suitable privacy mechanisms transfer the semiparametric properties of the sensitive-data model to the private setting. Rate-double-robustness is transferred, enabling locally-private, unbiased and semiparametrically efficient inference of our target parameters. Finally, we transform general nonparametric nuisance estimators into private ones, which inherit convergence properties of their nonprivate counterparts. For parametric nuisance models, we develop a private method-of-moments estimator and its large-sample inference theory.
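The paper's privacy mechanisms are tailored to its target parameters and are not reproduced here. As a generic illustration of why noise injection hinders inference, this sketch applies the textbook ε-local-differential-privacy Laplace mechanism to one bounded value per individual; the bounds, ε, and data are assumptions.

```python
import random


def laplace_privatize(x, lower, upper, epsilon, rng):
    """epsilon-locally-private release of one bounded value via the Laplace mechanism."""
    x = min(max(x, lower), upper)            # clip to the assumed bounds
    scale = (upper - lower) / epsilon        # sensitivity / epsilon
    # The difference of two iid exponentials with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return x + noise


rng = random.Random(0)
sensitive = [rng.random() for _ in range(100_000)]   # hypothetical data in [0, 1]
released = [laplace_privatize(x, 0.0, 1.0, epsilon=1.0, rng=rng) for x in sensitive]
true_mean = sum(sensitive) / len(sensitive)
private_mean = sum(released) / len(released)         # unbiased for true_mean, but noisier
print(true_mean, private_mean)
```

The private mean stays unbiased, but each released value carries extra variance 2((upper − lower)/ε)², which is 2 here, so far more data are needed for the same precision.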

2601.02322 2026-06-19 stat.ME cs.LG Updated version

Environment-Adaptive Covariate Selection: Learning When to Use Spurious Correlations for Out-of-Distribution Prediction

环境自适应协变量选择:学习何时利用虚假相关进行分布外预测

Shuozhi Zuo, Yixin Wang

Affiliation: Department of Statistics, University of Michigan, Ann Arbor

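AI summary: Addresses covariate selection for out-of-distribution prediction with an environment-adaptive algorithm that chooses covariate sets based on environment characteristics, outperforming static methods on simulated and real data.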

AI-generated Chinese abstract

一种常见的分布外预测方法将模型限制为因果或不变协变量,以避免可能随环境变化的虚假关联。尽管具有理论吸引力,但当仅观察到结果的部分因果父节点时,该策略可能不如经验风险最小化。在这种情况下,非因果协变量可以作为未观察到的因果父节点的代理,当代理关系稳定时改善预测,但当变化破坏这种关系时则有害。因此,最优协变量集可能取决于所遇到的具体变化。由于不同的变化会在未标记的协变量分布中留下特征,我们提出了一种环境自适应协变量选择算法,该算法将环境级摘要映射到特定于环境的协变量集。这些摘要可以是手工制作的,也可以从多环境数据中学习,并且先验因果知识可以作为约束条件纳入。在模拟和应用数据集中,所提出的方法在各种变化下优于静态因果、不变和其他非自适应规则。

English abstract

A common approach to out-of-distribution prediction restricts models to causal or invariant covariates to avoid spurious associations that may change across environments. Despite its theoretical appeal, this strategy can underperform empirical risk minimization when only a subset of the causal parents of the outcome is observed. In such settings, non-causal covariates can serve as proxies for unobserved causal parents and improve prediction when the proxy relationship is stable, but they can hurt when shifts disrupt that relationship. Thus, the optimal covariate set can depend on the specific shift encountered. Because different shifts leave signatures in the unlabeled covariate distribution, we propose an environment-adaptive covariate selection algorithm that maps environment-level summaries to environment-specific covariate sets. These summaries may be hand-crafted or learned from multi-environment data, and prior causal knowledge can be incorporated as constraints. Across simulations and applied datasets, the proposed method improves over static causal, invariant, and other non-adaptive rules under diverse shifts.

11. Datasets, Software and Applications (8 papers)

2606.20114 2026-06-19 stat.ME stat.AP New submission

Community detection in small-sample ordinal regimes: A benchmarking framework for Delphi data

小样本有序情境下的社区检测:德尔菲数据的基准测试框架

Yuri Calleo, Simone Di Zio, Fabrizio Maturo

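AI summary: To address the rank deficiency of high-dimensional, small-sample Delphi data, proposes moving from variable-centric covariance models to network-centric connectivity models, using community detection algorithms to identify latent thematic structures and achieve structurally stable dimensionality reduction.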

AI-generated Chinese abstract

德尔菲数据共识的统计建模面临一个关键瓶颈:问卷项目的高维性与专家小组有限样本量之间的矛盾。这种秩亏导致传统潜变量模型(如主成分分析)结构不稳定且易过拟合。为弥补这一方法论空白,本研究提出从变量中心协方差模型转向网络中心连接模型。通过将项目相关性映射到加权图拓扑,我们提出了一个基于模拟的基准测试,利用社区检测算法识别潜在主题结构,有效解决了高维小样本情境下典型的谱不稳定性和秩亏问题。该研究系统评估了基于结构密度、信息流和谱划分的拓扑方法在合成数据集上的鲁棒性,这些数据集旨在复制共识数据的病理条件,包括有序量表和系统噪声。核心方法论贡献在于证明专家判断间的共线性——传统上被视为需要正则化的统计冗余——可以有效地重新解释为凝聚的拓扑信号。该框架为研究人员提供了一种结构化的自动降维程序,确保即使在标准因子分析失效的小样本情境下也能保持结构稳定性和心理测量一致性。

English abstract

The statistical modeling of consensus in Delphi data faces a critical bottleneck: the high dimensionality of questionnaire items relative to the limited sample size of expert panels. This rank deficiency leads traditional latent variable models, such as Principal Component Analysis, to be structurally unstable and prone to overfitting. Addressing this methodological gap, this study proposes a transition from variable-centric covariance models to network-centric connectivity models. By mapping item correlations onto a weighted graph topology, we present a simulation-based benchmark that utilizes community detection algorithms to identify latent thematic structures, effectively addressing the spectral instability and rank deficiency typical of high-dimensional, low-sample-size regimes. The research systematically evaluates the robustness of topological approaches based on structural density, information flow, and spectral partitioning against synthetic datasets designed to replicate the pathological conditions of consensus data, including ordinal scales and systemic noise. The central methodological contribution lies in demonstrating that collinearity among expert judgments - traditionally treated as statistical redundancy to be regularized - can be effectively reinterpreted as a topological signal of cohesion. This framework provides researchers with a structured and automated procedure for dimensionality reduction, ensuring structural stability and psychometric consistency even in small-sample regimes where standard factor analysis breaks down.

2606.19775 2026-06-19 cs.SI stat.AP stat.OT New submission

Rethinking Sampling Strategy in Link Prediction

重新思考链接预测中的采样策略

Yilin Bi, Zhenyu Deng, Xinshan Jiao, Tao Zhou

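AI summary: Proposes a β-sampling scheme to study how two-stage sampling affects link prediction performance; finds that the structural characteristics of missing links strongly affect prediction accuracy and that the second-stage sampling strategy matters.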

Comments 19 pages, 5 figures, 3 tables

AI-generated Chinese abstract

许多现实世界的网络是不完整的,使得链接预测成为网络科学中的一个基本挑战。为了训练参数和评估算法,观察到的链接通常被划分为三个子集,即训练集、验证集和探测集。这种划分隐含地涉及两个采样过程:第一阶段采样产生探测集,第二阶段采样获得验证集。迄今为止,我们对这两个采样过程如何影响算法性能的理解仍然非常有限。为了解决这个问题,我们提出了一种称为β-采样的采样方案,其中链接的采样概率与其两个端点的度数乘积的β次幂成正比。在45个真实网络上的实验表明,通过改变探测集模拟的缺失链接的结构特征显著影响预测精度。当缺失链接倾向于连接高度数节点时,这类链接可以很容易地被准确预测。此外,即使探测集固定,第二阶段采样仍然对预测精度产生显著影响。值得注意的是,最优的第二阶段采样策略不同于随机采样(随机选择链接形成验证集)和一致采样(保证验证集和探测集中的链接具有相同的结构特征)。

English abstract

Many real-world networks are incomplete, making link prediction a fundamental challenge in network science. To train parameters and evaluate algorithms, observed links are usually divided into three subsets, namely training, validation, and probe sets. This division implicitly involves two sampling processes: first-stage sampling yields the probe set, and second-stage sampling yields the validation set. To date, our understanding of how these two sampling processes affect algorithm performance remains quite limited. To address this issue, we propose a sampling scheme called $\beta$-sampling, in which the sampling probability of a link is proportional to the product of the degrees of its two endpoints raised to the power $\beta$. Experiments on 45 real-world networks reveal that the structural characteristics of missing links, as simulated by varying the probe set, substantially affect prediction accuracy. Missing links that tend to connect high-degree nodes are easy to predict accurately. Furthermore, even with a fixed probe set, second-stage sampling still has a significant influence on prediction accuracy. Notably, the optimal second-stage sampling strategy differs from both \textit{random sampling} (which randomly selects links to form the validation set) and \textit{consistent sampling} (which guarantees that links in the validation and probe sets share identical structural characteristics).
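A minimal sketch of β-sampling for the two stages, assuming degrees are computed on the network being sampled from and using Efraimidis-Spirakis keys for weighted sampling without replacement (a standard technique, not necessarily the authors' implementation). Setting β = 0 reduces to uniform random sampling; β > 0 favours links between high-degree nodes and β < 0 favours links between low-degree nodes. The toy edge list is hypothetical.

```python
import random
from collections import Counter


def beta_sample(edges, m, beta, seed=0):
    """Draw m distinct links; link (u, v) has weight (deg(u) * deg(v)) ** beta.

    Degrees are computed on `edges` itself. Keeping the m largest keys
    U ** (1 / w) is equivalent to successive weighted draws without replacement.
    """
    rng = random.Random(seed)
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    keyed = []
    for u, v in edges:
        w = (deg[u] * deg[v]) ** beta
        keyed.append((rng.random() ** (1.0 / w), (u, v)))
    keyed.sort(reverse=True)
    return [edge for _, edge in keyed[:m]]


edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (5, 6)]   # toy network
probe = beta_sample(edges, m=2, beta=1.0)                  # first stage: probe set
rest = [e for e in edges if e not in probe]
validation = beta_sample(rest, m=1, beta=0.0, seed=1)      # second stage: uniform here
training = [e for e in rest if e not in validation]
print(probe, validation, training)
```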

2606.19642 2026-06-19 physics.ao-ph stat.AP stat.ML new submission

Rigorous uncertainty quantification of probabilistic AI weather forecasts with conformal prediction

Anna Asch, Raphael Rossellini, Pedram Hassanzadeh, Rebecca Willett

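AI summary Addresses the insufficient calibration of probabilistic AI weather forecasts (especially for extreme events) with conformal prediction, which guarantees coverage without distributional assumptions; applied to temperature and precipitation forecasts from three global models (GenCast, NeuralGCM, AIFS-ENS), it achieves calibrated uncertainty without sacrificing other probabilistic metrics.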

Abstract

Probabilistic weather forecasting is undergoing rapid transformation with artificial intelligence (AI). In traditional numerical weather prediction, computing power can limit how well ensemble forecasts approximate the unknown statistical distribution of future states. AI models facilitate larger ensembles and are trained with probabilistic considerations, ideally leading to better uncertainty quantification. Forecasts from these state-of-the-art models are often considered well-calibrated. However, here we show that the statistical coverage of such models, the ultimate measure of calibration, can struggle, especially on extreme events. To address this shortcoming, we employ conformal prediction, a class of statistical methods that mathematically guarantees coverage under no distributional assumptions, unlike previous post-processing techniques. We apply online conformal prediction to temperature and precipitation forecasts (including extremes) of three leading global weather models, GenCast, NeuralGCM, and AIFS-ENS, ensuring calibrated uncertainty at no expense to other probabilistic metrics. This post-processing method can be applied to any forecasting model.
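As a rough illustration of online conformal calibration, here is a sketch of a simple quantile-tracking update (one member of the online conformal family, not necessarily the variant used in the paper) applied to point forecasts such as an ensemble mean. The absolute-residual score, the learning rate `lr`, and the function name are assumptions. Each miss widens the next interval and each hit narrows it slightly, which pushes the long-run miscoverage rate toward `alpha` without assumptions on the error distribution.

```python
def online_conformal_intervals(forecasts, observations, alpha=0.1, lr=0.05, q0=0.0):
    """Symmetric intervals forecast +/- q_t, with q_t updated online after each observation.

    Returns the list of intervals and a list of miss indicators (1.0 = not covered).
    """
    q = q0
    intervals, misses = [], []
    for forecast, obs in zip(forecasts, observations):
        lo, hi = forecast - q, forecast + q
        miss = 0.0 if lo <= obs <= hi else 1.0
        intervals.append((lo, hi))
        misses.append(miss)
        # Quantile-tracking step: grow q after a miss, shrink it slightly after a hit.
        q += lr * (miss - alpha)
    return intervals, misses
```

Because the updates telescope, `sum(miss - alpha) = (q_T - q_0) / lr`, so after `T` steps the empirical miscoverage is within `(|q_T| + |q_0|) / (lr * T)` of `alpha` whenever the threshold stays bounded.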

2606.18544 2026-06-19 stat.AP new submission

Chess Signatures of Play

Christian Turk, Nicholas Polson

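AI summary Uses the signature transform from rough-path theory to extract invariant features that capture the order and interaction of events within a game, builds a signature-kernel two-sample test and an anytime-valid cheating-detection method, and substantially improves detection power while controlling error rates.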

Abstract

A game of chess is a stream: a time-ordered sequence of moves, each carrying an engine evaluation, a measure of accuracy, a measure of position complexity, and a clock reading. We model a game as a multivariate path and apply the signature transform of rough-path theory to obtain a reparametrization-invariant, graded feature set that records the order and interaction of in-game events without a parametric likelihood. We show that a player's law of play is identifiable from the expected signature up to tree-like equivalence, construct a signature-kernel two-sample test on path space, and recast cheating detection as an anytime-valid sequential test: a signature conformance score becomes an e-process whose error is controlled for every sample size at once by Ville's inequality, with fluctuations calibrated on the moderate-deviation scale. The discriminating information lives in the signature's Lévy areas, which measure whether accuracy rises precisely when positions become hard: the fingerprint of engine assistance, which aggregate match-rate statistics discard. In a controlled study, the test maintains exact type-I error control, and detection power rises from negligible for subtle assistance to 0.98 for blatant assistance, with a median detection time matching the growth-rate prediction. Calibrated to Magnus Carlsen's documented elite accuracy, the monitor does not flag world-champion-level play. We also exhibit cheating strategies that leave every aggregate statistic, including the best-move-frequency z-score of the Regan system, unchanged yet are caught cleanly by the signature, which makes precise how an order-aware, anytime-valid test strengthens the prevailing approach to chess anti-cheating.
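The Lévy area mentioned above is the antisymmetric part of the level-2 signature. A minimal sketch for a piecewise-linear path built from two per-move channels (for instance, a position-complexity series and an accuracy series; the channel choice and the function name are illustrative assumptions, not the authors' feature set). With this sign convention, the area is positive when the first channel tends to move before the second, an ordering that averages cannot see.

```python
def levy_area(xs, ys):
    """Signed Levy area of the piecewise-linear path through the points (xs[k], ys[k]).

    Equals 0.5 * (S^{xy} - S^{yx}) for the level-2 signature terms S of the path.
    """
    if len(xs) != len(ys):
        raise ValueError("channels must have equal length")
    area = 0.0
    for k in range(len(xs) - 1):
        dx = xs[k + 1] - xs[k]
        dy = ys[k + 1] - ys[k]
        # Exact contribution of one linear segment, measured relative to the start point.
        area += (xs[k] - xs[0]) * dy - (ys[k] - ys[0]) * dx
    return 0.5 * area
```

Two paths with the same start and end points can have opposite areas: `levy_area([0, 1, 1], [0, 0, 1])` is `0.5` (first channel moves first), while `levy_area([0, 0, 1], [0, 1, 1])` is `-0.5`.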

2606.18436 2026-06-19 stat.ML cs.LG new submission

Pointwise is Pointless? A Multimodal Ablation Study for Precipitation Nowcasting with Graph Neural Networks

Ophélia Miralles, Máté Mile, Christoffer Artturi, Thomas Nipen, Ivar Seierstad

Affiliations * Norwegian Meteorological Institute

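AI summary Uses a multimodal graph neural network system to ablate the effects of radar, numerical weather prediction, surface observations, satellite data, and the training loss on precipitation nowcasting; each modality improves a different aspect, and point observations improve local skill but must be combined with a suitable loss function and uncertainty representation to improve radar fields.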

Abstract

Sparse point observations are increasingly available for precipitation nowcasting, but it is unclear how much they improve dense radar-field forecasts. We partially address this question with a multimodal graph neural network nowcasting system over the Nordic radar domain. The model predicts rain rate every five minutes up to two hours ahead and is trained with different combinations of radar history, MEPS numerical weather prediction, Netatmo surface observations, MSG satellite channels, stochastic noise, and CRPS-based ensemble losses. The study is designed as an ablation of operationally relevant information sources and training objectives. We compare radar-only, NWP-informed, station-informed, satellite-informed, noise-augmented, and CRPS-based configurations using complementary diagnostics on the radar grid, at station locations, for rain onset, and through oracle, displacement, and amplitude scores. The results show that each source improves a different part of the forecast problem. MEPS stabilises radar-only extrapolation, Netatmo observations improve local station and onset diagnostics, and satellite predictors reduce some station-level biases but may activate rain too early when used deterministically. CRPS-based configurations provide the most consistent radar-grid gains, while the combined satellite and CRPS setup gives the best overall oracle/DAS score. These results do not support the conclusion that point observations are uninformative for nowcasting, but they show that local observational skill and spatially coherent radar-field skill are distinct targets. The practical implication is that sparse observations can provide useful local constraints, but their benefit for radar-like fields depends on the training loss, uncertainty representation, and how observation support is encoded in the model.
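The CRPS-based ensemble losses referred to above build on the standard ensemble estimator of the continuous ranked probability score. A minimal pure-Python sketch for a single scalar target; the function name is hypothetical, and how the paper aggregates over grid points and lead times, or whether it uses a "fair" small-ensemble correction, is not stated here and is left out.

```python
def ensemble_crps(members, obs):
    """CRPS of an empirical ensemble for one scalar observation (lower is better).

    CRPS = mean|X_i - y| - 0.5 * mean|X_i - X_j|, with both means taken over the ensemble.
    """
    m = len(members)
    if m == 0:
        raise ValueError("ensemble must have at least one member")
    skill = sum(abs(x - obs) for x in members) / m
    spread = sum(abs(a - b) for a in members for b in members) / (m * m)
    return skill - 0.5 * spread
```

For a one-member ensemble the score reduces to the absolute error `|x - y|`; for the two-member ensemble `[0.0, 2.0]` and observation `1.0` it is `0.5`.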

2606.18611 2026-06-19 cs.SD cs.AI cs.LG stat.ML new submission

QC-GAN: A Parameter-Efficient Quaternion Conformer GAN for High-Fidelity Speech Enhancement

Shogo Yamauchi, Hideaki Tamori, Makoto Sakai, Yosuke Yamano, Tohru Nitta

Affiliations * The Asahi Shimbun Company; Tokyo Woman's Christian University

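AI summary Proposes the parameter-efficient QC-GAN, which combines a quaternion Conformer generator with MetricGAN training and uses Hamilton-product weight sharing to reduce the parameter count; on VoiceBank+DEMAND it reaches a PESQ of 3.48 with 0.89M parameters, comparable to models more than twice its size.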

Comments 10 pages, 6 figures and 5 tables. Accepted at Interspeech 2026

Abstract

We propose a parameter-efficient speech enhancement framework, Quaternion Conformer GAN (QC-GAN), which combines a Quaternion Conformer generator with MetricGAN-based training. The Hamilton product encodes the magnitude and phase via structured weight sharing, reducing the number of layer parameters while preserving their interdependencies. A metric-learning discriminator was employed to maximize perceptual quality by optimizing approximations of perceptual evaluation scores. On the VoiceBank+DEMAND dataset, QC-GAN achieved a Perceptual Evaluation of Speech Quality (PESQ) score of 3.48 with only 0.89M parameters, delivering performance comparable to state-of-the-art models at less than half their size. A 35K-parameter variant achieved a PESQ score of 3.23, surpassing conventional methods with significantly fewer parameters. Evaluation on the DNS-Challenge 3 dataset further confirmed generalization to real-world conditions.
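The weight sharing behind the Hamilton product can be seen in a quaternion linear layer: four real matrices `r, i, j, k` of shape `(n_out, n_in)` are reused in a structured 4x4 block pattern, so a map from `4 * n_in` to `4 * n_out` real features needs a quarter of the parameters of an unconstrained dense layer. A NumPy sketch; the function names and the layout of the four components as contiguous blocks are assumptions, not QC-GAN's actual code.

```python
import numpy as np


def quaternion_linear(x, r, i, j, k):
    """Apply W (x) q for quaternion weights W = r + i*I + j*J + k*K.

    x has shape (..., 4 * n_in), laid out as [real | i | j | k] blocks;
    r, i, j, k each have shape (n_out, n_in). Returns shape (..., 4 * n_out).
    """
    # Real matrix representation of left multiplication by W (the Hamilton product).
    w = np.block([
        [r, -i, -j, -k],
        [i, r, -k, j],
        [j, k, r, -i],
        [k, -j, i, r],
    ])
    return x @ w.T


def parameter_counts(n_in, n_out):
    """(quaternion layer, dense layer) parameter counts for 4*n_in -> 4*n_out features."""
    return 4 * n_out * n_in, (4 * n_out) * (4 * n_in)
```

With 1x1 weights `r, i, j, k = 1, 2, 3, 4` and input `[5, 6, 7, 8]`, the layer reproduces the quaternion identity `(1 + 2i + 3j + 4k)(5 + 6i + 7j + 8k) = -60 + 12i + 30j + 24k`.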

2508.14009 2026-06-19 stat.OT updated version

Understanding Pedagogical Content Knowledge of Introductory Data Science Instructors: An Inaugural Framework

Sinem Demirci, Mine Doğucu, Andrew Zieffler, Joshua M. Rosenberg

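AI summary Interviews 14 introductory data science instructors and analyzes their syllabi to explore the key components of their pedagogical content knowledge (PCK), offering insights for instructor development and an initial PCK framework for IDS.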

Comments 67 pages, 4 tables

Abstract

As data science emerges as a distinct academic discipline, introductory data science (IDS) courses play a key role in shaping students' foundational understanding. Often taught by instructors without formal training in data science or pedagogy, these courses present a unique and globally relevant context for examining pedagogical content knowledge (PCK). Drawing on semi-structured interviews with 14 IDS instructors and their course syllabi, this study explores how IDS instructors describe and make sense of their teaching practices, which are analyzed through the lens of PCK. The findings highlight key components of PCK about IDS and offer insights into supporting instructor development. This work contributes to expanding the scope of PCK research into new interdisciplinary domains and to ongoing global efforts to build capacity in data science education. It could serve as a starting point for developing a PCK framework specific to IDS.

2502.06866 2026-06-19 cs.LG cs.AI econ.EM stat.AP stat.ML updated version

Global Ease of Living Index: a machine learning framework for longitudinal analysis of major economies

Arun Kumar Selvaraj, Tanay Panat, Rohitash Chandra

Affiliations * Transitional Artificial Intelligence Research Group, School of Mathematics and Statistics; Centre for Artificial Intelligence and Innovation; Pingla Institute

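AI summary Proposes a Global Ease of Living Index that combines socio-economic and infrastructural factors, uses machine learning to handle missing data, and applies dimensionality reduction via principal component analysis and factor analysis, giving policymakers a practical tool for improving quality of life.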

Abstract

The drastic changes in the global economy, geopolitical conditions, and disruptions such as the COVID-19 pandemic have impacted the cost of living and quality of life. It is essential to comprehend the long-term implications of the cost of living and quality of life in major economies. A transparent and comprehensive living index must include multiple dimensions of living conditions. In this study, we present an approach to quantifying the quality of life through the Global Ease of Living Index that combines various socio-economic and infrastructural factors into a single composite score. Our index utilises economic indicators that define living standards, which could help in targeted interventions to improve specific areas. We present a machine learning framework to address missing data for certain economic indicators in specific countries. We then curate and update the data and use a dimensionality reduction approach (Principal Component Analysis and Factor Analysis) to create the Ease of Living Index for major economies since 1970. Our work significantly adds to the literature by offering a practical tool for policymakers to identify areas needing improvement, such as healthcare systems, employment opportunities, and public safety. Our approach with open data and code can be easily reproduced and applied to various contexts, providing transparency and accessibility for ongoing research and policy development in quality-of-life assessment.
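The PCA step described above can be sketched as taking the first principal component of standardized indicators as a composite score. This NumPy version assumes complete data (the paper's machine-learning imputation is not shown), that every indicator is already oriented so that larger means better, and that no indicator is constant; the function name is hypothetical.

```python
import numpy as np


def pca_composite_index(X):
    """Composite score = projection of standardized indicators on the first principal axis.

    X: array-like of shape (n_observations, n_indicators), no missing values.
    Returns (scores, loadings, share of variance explained by the first component).
    """
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    if np.any(std == 0):
        raise ValueError("constant indicators cannot be standardized")
    Z = (X - X.mean(axis=0)) / std
    # Rows of vt are the principal axes of the standardized data.
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    loadings = vt[0]
    # A principal axis is only defined up to sign; orient it so the loadings sum positive.
    if loadings.sum() < 0:
        loadings = -loadings
    explained = s[0] ** 2 / np.sum(s ** 2)
    return Z @ loadings, loadings, explained
```

Flipping the sign is a design choice so that higher scores read as "easier living"; factor analysis, which the paper also uses, would require a separate model.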

12. Other/General Statistics, 17 papers

2606.19859 2026-06-19 cs.IT cs.LG math.IT math.PR math.ST stat.TH new submission

Doeblin Curves

Dongmin Lee, William Lu, Anuran Makur, Japneet Singh

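AI summary Introduces Doeblin curves, which quantify the contraction behavior of Markov kernels at given levels of divergence and power, and applies them to finer-grained contraction analyses in noisy iterative optimization, reliable computation with noisy circuits, and differential privacy.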

Comments 42 pages, 2 figures

Journal ref IEEE Transactions on Information Theory, vol. 72, no. 6, pp. 3556-3596, June 2026

Abstract

Recent research on Doeblin coefficients has shed light on their usefulness as a multi-way generalization of the Dobrushin contraction coefficient for TV distance, in a separate vein from their classic role in the theory of Markov chain ergodicity. However, strong conditions, such as being bounded away from 0, are typically necessary for Doeblin coefficients to establish the existence of information contraction. Building on recently formulated concepts of nonlinear information contraction, we aim to propose a finer-grained Doeblin-based characterization of multi-way contraction behavior which yields non-vacuous contraction guarantees even for channels whose Doeblin coefficient is 0. To this end, we introduce the notion of a Doeblin curve -- a nonlinear function which quantifies the contraction behavior of a Markov kernel on collections of input distributions at specific levels of divergence and power. Through the course of our analysis, we develop a new variational characterization of Doeblin coefficients, present several properties of Doeblin curves, define several versions of power-constrained Doeblin curves, and derive upper and lower bounds using our aforementioned variational characterization. We then utilize these results in diverse areas, including generalization bounds for noisy iterative optimization, error bounds for reliable computation with noisy circuits, and differential privacy guarantees for online iterative algorithms. In particular, we extend results in these areas to broader domains or group settings, leveraging Doeblin curves to reveal finer-grained contraction phenomena than Doeblin coefficients.
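For a finite Markov kernel written as a row-stochastic matrix `W` (rows are inputs, columns are outputs), the Doeblin coefficient is `tau(W) = sum_y min_x W(y|x)` and the Dobrushin coefficient `eta_TV(W)` is the largest total-variation distance between two rows; since `TV(W_x, W_x') = 1 - sum_y min(W(y|x), W(y|x'))`, they satisfy `eta_TV(W) <= 1 - tau(W)`. The sketch below uses hypothetical function names and does not implement the Doeblin curve itself. Its second example is a kernel whose Doeblin coefficient is 0 but which still contracts TV distance, the situation the paper targets.

```python
import numpy as np


def _check_kernel(W):
    W = np.asarray(W, dtype=float)
    if W.ndim != 2 or np.any(W < 0) or not np.allclose(W.sum(axis=1), 1.0):
        raise ValueError("W must be a row-stochastic matrix")
    return W


def doeblin_coefficient(W):
    """tau(W) = sum over outputs y of the smallest probability any input assigns to y."""
    W = _check_kernel(W)
    return float(W.min(axis=0).sum())


def dobrushin_coefficient(W):
    """eta_TV(W) = max over input pairs of the TV distance between their output rows."""
    W = _check_kernel(W)
    n = W.shape[0]
    return max(0.5 * float(np.abs(W[a] - W[b]).sum()) for a in range(n) for b in range(n))
```

For a binary symmetric channel with crossover probability 0.1, the two coefficients are 0.2 and 0.8, so the bound is tight; for the cyclic kernel `[[0.5, 0.5, 0], [0, 0.5, 0.5], [0.5, 0, 0.5]]`, the Doeblin coefficient is 0 while the Dobrushin coefficient is 0.5.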

2606.19726 2026-06-19 math.ST stat.TH new submission

A Laplace equation approach to the Behrens--Fisher problem

Nagananda K G, Jong Sung Kim

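AI summary For two independent normal samples with unknown and unequal variances, proposes a partial differential equation formulation that uses an orthogonal decomposition and spherical wedge probabilities to turn the distributional problem into a Laplace-Dirichlet boundary value problem, deriving exact finite-sample representations of the cumulative distribution function and probability density as well as tail expansions.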

Comments 31 pages, 4 figures

Abstract

We develop a partial differential equation formulation of the Behrens-Fisher problem for two independent normal samples with unknown and unequal variances. An orthogonal decomposition separates mean and residual components (corresponding to the centered within-sample variation left after removal of the mean directions) and recasts the studentized difference of sample means as a scale-invariant geometric constraint. This reduction transforms the distributional problem into the evaluation of spherical wedge probabilities, which are identified with harmonic measure and with the value at the origin of a Laplace-Dirichlet boundary value problem. From this framework, we derive exact finite-sample representations for the cumulative distribution function and the probability density function in terms of beta functions, with dependence only on the sample sizes and the variance ratio. These representations place the Behrens-Fisher law in a standard special-function form that is directly accessible in widely available commercial software, including Microsoft Excel, thereby facilitating distributional evaluation and quantile computation. We also obtain a Gegenbauer separation-of-variables expansion for the associated harmonic extension and its threshold derivative, with coefficients in closed Beta-Gamma form, and derive sharp tail expansions with explicit leading constants and higher-order corrections.
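A Monte Carlo estimate of the Behrens-Fisher null distribution is a convenient cross-check for any exact representation. The NumPy sketch below simulates the studentized difference of means under equal means; the function name, the parametrization `var_ratio = sigma_1^2 / sigma_2^2` (with `sigma_2 = 1`), and the replication count are assumptions, and it does not reproduce the paper's beta-function formulas. As in the abstract, the simulated law depends only on `n1`, `n2`, and the variance ratio.

```python
import numpy as np


def behrens_fisher_cdf_mc(t, n1, n2, var_ratio, reps=100_000, seed=0):
    """Monte Carlo estimate of P(T <= t) for the studentized difference of means.

    T = (mean(x) - mean(y)) / sqrt(s1^2/n1 + s2^2/n2), with x ~ N(0, var_ratio)
    and y ~ N(0, 1) drawn independently, i.e. under the null of equal means.
    """
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two observations")
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((reps, n1)) * np.sqrt(var_ratio)
    y = rng.standard_normal((reps, n2))
    se = np.sqrt(x.var(axis=1, ddof=1) / n1 + y.var(axis=1, ddof=1) / n2)
    stat = (x.mean(axis=1) - y.mean(axis=1)) / se
    return float(np.mean(stat <= t))
```

By symmetry the estimate at `t = 0` should be close to 0.5 for any sample sizes, and for large samples it approaches the standard normal CDF.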

2606.11171 2026-06-19 cs.LG cond-mat.stat-mech cs.IT math.IT math.OC math.ST stat.TH new submission

Indexed Bellman Information Complexity

Yunbei Xu

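AI summary Develops indexed Bellman information complexity, a representation-level theory of interactive decision making that unifies the earlier indexed algorithmic information ratio (AIR) framework; UCB, E2D/DEC, and AMS/EBO methods appear as relaxations of a single Bellman identity, lower bounds are obtained from posterior-reference trajectories, and kernel bandits serve as the main illustration.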

Abstract

We develop indexed Bellman information complexity, a representation-level theory of interactive decision making centered on information indices and reference histories. The representation strips away problem-specific syntax and retains only the ingredients needed for dynamic programming and information accounting, thereby unifying the earlier framework of indexed algorithmic information ratios (AIR). On the upper-bound side, regret is controlled by Bellman supersolutions or potential identities whose gradient bracket is paid for by indexed information. Upper-confidence-bound (UCB), estimation-to-decision/decision-estimation-coefficient (E2D/DEC), and adaptive-minimax-sampling or exploration-by-optimization (AMS/EBO) methods appear as three relaxations of this same identity. On the lower-bound side, the posterior-reference trajectory supplies both the information telescope and the ghost quantile of small-regret trajectories. The resulting critical radius in the lower bound is an effective-dimension-scale quantity, as in Fano and local-prior-mass lower bounds, rather than the constant radius of a two-point Le Cam argument. The examples show that DEC is best viewed as a one-step relaxation of indexed Bellman information complexity, not as a universally tight conversion mechanism. We illustrate the framework through several applications, with particular emphasis on kernel bandits. In this setting, the active action marginal provides a concrete basis for comparing UCB, E2D, and AMS/EBO.

2605.20541 2026-06-19 math.ST math.PR stat.TH updated version

Finite-Sample Bounds for Expected Signature Estimation under Weak Dependence

Bryson Schenck

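AI summary Studies finite-sample bounds for estimating the expected signature from a single long dependent trajectory under weak dependence, proves a non-asymptotic mean-squared error bound for a block-averaging estimator, and examines convergence across different Hurst indices.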

Comments 59 pages, 1 figure

Abstract

The expected signature uniquely determines the law of a random rough path under a moment-growth condition, yet finite-sample bounds for estimating its truncations from a single long dependent trajectory remain unavailable. We study a strictly stationary stochastic process equipped with a geometric rough-path lift, observed in non-overlapping blocks of equally-spaced samples, and prove a non-asymptotic mean-squared error (MSE) bound for the block-averaging estimator of its truncated expected signature. Under moment and stationarity assumptions together with a direct covariance-decay condition on block signatures (strictly weaker than $α$-mixing and applicable to long-range-dependent processes), the error separates into a discretization term and a fluctuation term, with rates determined respectively by path regularity and dependence strength. A levelwise rough-factorial variance analysis keeps finite-truncation constants explicit and yields an optimal allocation rule under a fixed observation budget. We verify the assumptions for independent-coordinate fractional Ornstein--Uhlenbeck processes in three regimes: short-range (Hurst $1/4<H<1/2$), semimartingale ($H=1/2$), and long-range ($H>1/2$); in all three, the block-signature covariance is summable, so the fluctuation term decays at the same rate as in the independent-block case, even under long memory at $H>1/2$. Monte Carlo experiments show empirical slopes steeper than the guaranteed upper-bound rates.
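A minimal NumPy sketch of a block-averaging estimator for the expected signature truncated at level 2, assuming linear interpolation between samples and non-overlapping blocks of `block_len` consecutive observations. The function names are hypothetical, and the paper's estimator may differ in how blocks are formed and in the truncation level. The level-2 term uses the exact iterated integrals of a piecewise-linear path.

```python
import numpy as np


def signature_level2(path):
    """Level-1 and level-2 signature terms of the piecewise-linear path through `path`.

    path: array of shape (n_points, d); a 1-D array is treated as d = 1.
    """
    path = np.asarray(path, dtype=float)
    if path.ndim == 1:
        path = path[:, None]
    inc = np.diff(path, axis=0)  # segment increments
    disp = path[:-1] - path[0]  # X_{t_k} - X_0 at each segment start
    level1 = path[-1] - path[0]
    # Exact iterated integrals for linear segments:
    # S^{ij} = sum_k [(X^i_{t_k} - X^i_0) dX^j_k + 0.5 * dX^i_k dX^j_k]
    level2 = disp.T @ inc + 0.5 * inc.T @ inc
    return level1, level2


def block_expected_signature(series, block_len):
    """Average the truncated signatures of non-overlapping blocks of block_len samples.

    Leftover samples that do not fill a complete block are ignored.
    """
    series = np.asarray(series, dtype=float)
    n_blocks = len(series) // block_len
    if block_len < 2 or n_blocks == 0:
        raise ValueError("need block_len >= 2 and at least one full block")
    sigs = [signature_level2(series[b * block_len:(b + 1) * block_len]) for b in range(n_blocks)]
    mean_level1 = np.mean([s[0] for s in sigs], axis=0)
    mean_level2 = np.mean([s[1] for s in sigs], axis=0)
    return mean_level1, mean_level2
```

A quick sanity check on any level-2 implementation is the shuffle identity `l2 + l2.T == np.outer(l1, l1)`, which this construction satisfies exactly for every block.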

2604.02336 2026-06-19 math.FA math.ST stat.TH updated version

The Shift Operator Calculus for Stationary Time Series Analysis

Anand Ganesh, Babhrubahan Bose, Anand Rajagopalan

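AI summary Establishes a rigorous shift operator calculus for stationary time series modeling, proves existence and isometry of transfer function operators for different families of functions, and unifies stationary-process invertibility with operator invertibility of the transfer function.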

Comments 7 pages

Abstract

The article establishes a rigorous shift operator calculus for stationary time series modeling, addressing a gap in the literature. For different families of functions $f$, it provides proofs of existence and isometry for the transfer function operators $f(B)$ and $f(T)$, where $B$ is the bilateral shift operator and $T$ is the unilateral shift operator. The article establishes convergence of the power series of $f(B)$ and $f(T)$ under the operator norm for $f$ in the Wiener algebra $\mathbb{W}_+$, and convergence under the strong operator topology for $f$ in $H^{\infty}$, based on the use of Abel sums. Based on this calculus, it unifies the notion of stationary process invertibility with the operator invertibility of the transfer function $f(T)$.
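Concretely, for a power series `f(z) = sum_k a_k z^k`, the operator `f(B)` acts on a sequence as the linear filter `(f(B)x)_t = sum_k a_k x_{t-k}`. The pure-Python sketch below applies a truncated series to a finite sequence, assuming pre-sample values are zero (a simplification the paper does not make, since it works with operators on full sequence spaces); the function names are hypothetical. It inverts the AR(1) operator `1 - phi*B` with coefficients `phi**k`, which are absolutely summable for `|phi| < 1`, so `1 / (1 - phi*z)` lies in the Wiener algebra.

```python
def apply_transfer_function(coeffs, x):
    """Return y with y[t] = sum_k coeffs[k] * x[t - k], treating x[t - k] = 0 for t - k < 0."""
    return [
        sum(c * x[t - k] for k, c in enumerate(coeffs) if t - k >= 0)
        for t in range(len(x))
    ]


def ar1_inverse_coeffs(phi, n_terms):
    """First n_terms coefficients of 1 / (1 - phi * z) = sum_k phi**k z**k."""
    if abs(phi) >= 1:
        raise ValueError("need |phi| < 1 for absolutely summable coefficients")
    return [phi ** k for k in range(n_terms)]
```

With zero pre-sample values and at least `len(x)` terms, applying `[1, -phi]` and then `ar1_inverse_coeffs(phi, len(x))` recovers `x` exactly, a finite analogue of `f(T)` having a bounded inverse.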

2602.04550 2026-06-19 quant-ph math.ST stat.TH updated version

Locally Gentle State Certification for High Dimensional Quantum Systems

Cristina Butucea, Jan Johannes, Henning Stein

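AI summary Studies the information cost of non-destructive measurements in locally gentle quantum state certification, derives a sample complexity of Θ(d³/(ε²α²)), and shows that the α-gentleness penalty scales linearly with the Hilbert-space dimension d.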

Abstract

Standard approaches to quantum statistical inference rely on measurements that induce a collapse of the wave function, effectively consuming the quantum state to extract information. In this work, we investigate the fundamental limits of \emph{locally-gentle} quantum state certification, where the learning algorithm is constrained to perturb the state by at most $α$ in trace norm, thereby allowing for the reuse of samples. We analyze the hypothesis testing problem of distinguishing whether an unknown state $ρ$ is equal to a reference $ρ_0$ or $ε$-far from it. We derive the minimax sample complexity for this problem, quantifying the information-theoretic price of non-destructive measurements. Specifically, by constructing explicit measurement operators, we show that the constraint of $α$-gentleness imposes a sample size penalty of $\frac{d}{α^2}$, yielding a total sample complexity of $n = Θ(\frac{d^3}{ε^2 α^2})$. Our results clarify the trade-off between information extraction and state disturbance, and highlight deep connections between physical measurement constraints and privacy mechanisms in quantum learning. Crucially, we find that the sample size penalty incurred by enforcing $α$-gentleness scales linearly with the Hilbert-space dimension $d$ rather than the number of parameters $d^2-1$ typical for high-dimensional private estimation.

2504.09564 2026-06-19 math.ST stat.TH updated version

The weak-feature-impact effect on the NPMLE in monotone binary regression

Dario Kieffer, Angelika Rohde

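AI summary Studies the limiting distribution of the nonparametric maximum likelihood estimator in monotone binary regression under a weak feature-label relationship, finding a new distribution that continuously interpolates between the two extremal cases and improves small-sample approximations.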

Comments Added Theorem 3.3 and several visualizations

Abstract

Statistical literature provides pointwise limiting distributions of the nonparametric maximum likelihood estimator (NPMLE) in monotone binary regression for the two extremal cases: If the feature-label relation is strictly monotone and sufficiently smooth, it converges at a cube-root-$n$ rate with scaled Chernoff-type limiting distribution, and it converges at the parametric $\sqrt{n}$-rate if the underlying relation is flat. In this article, we provide the complete picture of the distributional metamorphosis of the NPMLE, revealing a new limiting distribution which provides a significantly better distributional approximation for small samples in case of a weak feature-label relationship. It is shown to continuously interpolate between the two extremal cases. The innovative way to determine this distribution is to generate it as a limit of the NPMLE in the newly introduced weak-feature-impact triangular array for a particular parameter-sample-size constellation. Moreover, the phase transition is likewise observed for the suitably rescaled $L^{1}$-error in this weak-feature-impact scenario. As a by-product, its limiting distribution for flat regression functions is obtained, which was unknown before. The proof develops a completely new strategy, notably not based on the switch relation. A novel type of local minimax lower bounds accompanies these results.
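For reference, the NPMLE of a non-decreasing regression function with binary labels can be computed as the isotonic least-squares fit of the labels ordered by feature value, via the pool-adjacent-violators algorithm (PAVA). A pure-Python sketch, assuming distinct feature values (ties would need to be pooled first); the function names are hypothetical.

```python
def pava(y, w=None):
    """Non-decreasing least-squares fit to y via pool-adjacent-violators."""
    if w is None:
        w = [1.0] * len(y)
    blocks = []  # each block: [mean, total weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit


def monotone_binary_npmle(x, labels):
    """Fitted P(label = 1 | x) at each observation, in the original order."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    fitted_sorted = pava([labels[i] for i in order])
    fitted = [0.0] * len(x)
    for rank, i in enumerate(order):
        fitted[i] = fitted_sorted[rank]
    return fitted
```

For example, `pava([0, 1, 0, 1])` returns `[0, 0.5, 0.5, 1]`: the violating middle pair is pooled into a single block.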

2507.15475 2026-06-19 eess.SP math.PR stat.AP

On the Distribution of a Two-Dimensional Random Walk with Restricted Angles

Karl-Ludwig Besser

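AI summary Studies the distribution of a two-dimensional random walk with restricted step angles, deriving the joint and marginal distributions for two steps, numerical solutions for a general number of steps, approximations for a large number of steps, and an exact characterization of the support.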

Comments 14 pages, 14 figures

Journal ref IEEE Transactions on Signal Processing, vol. 74, pp. 2316-2330, 2026

Abstract

In this paper, we derive the distribution of a two-dimensional (complex) random walk in which the angle of each step is restricted to a subset of the circle. This setting appears in various domains, such as in over-the-air computation in signal processing. In particular, we derive the exact joint and marginal distributions for two steps, numerical solutions for a general number of steps, and approximations for a large number of steps. Furthermore, we provide an exact characterization of the support for an arbitrary number of steps. The results in this work provide a reference for future work involving such problems.
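A Monte Carlo sketch of such a walk, assuming unit-magnitude steps whose angles are i.i.d. uniform on a single symmetric arc `[-half_width, half_width]`; the paper allows more general angle subsets and its exact step model may differ, and the function name is hypothetical. Simulated endpoints like these can be used to check analytical densities or support descriptions.

```python
import cmath
import random


def restricted_angle_walk(n_steps, half_width, n_samples, seed=None):
    """Endpoints of 2-D random walks with unit steps e^{i*theta}, theta ~ U[-half_width, half_width]."""
    rng = random.Random(seed)
    return [
        sum(cmath.exp(1j * rng.uniform(-half_width, half_width)) for _ in range(n_steps))
        for _ in range(n_samples)
    ]
```

For `half_width < pi`, every step has real part at least `cos(half_width)`, so each endpoint has real part at least `n_steps * cos(half_width)`, and each step has mean `sin(half_width) / half_width`; both are quick checks on simulated output.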

2506.23396 2026-06-19 stat.ML cs.LG

AICO: Feature Significance Tests for Supervised Learning

Kay Giesecke, Enguerrand Horel, Chartsiri Jirachotkulthorn

Affiliations * Stanford University, Department of Management Science and Engineering and Institute for Computational and Mathematical Engineering; Upstart, Inc.; Stanford University, Institute for Computational and Mathematical Engineering

Abstract

Machine learning is central to modern science, industry, and policy, yet its predictive power often comes at the cost of transparency: we rarely know which input features truly drive a model's predictions. Without such understanding, researchers cannot draw reliable conclusions, practitioners cannot ensure fairness or accountability, and policymakers cannot trust or govern model-based decisions. Existing tools for assessing feature influence are limited; most lack statistical guarantees, and many require costly retraining or surrogate modeling, making them impractical for large modern models. We introduce AICO, a broadly applicable framework that turns model interpretability into an efficient statistical exercise. AICO tests whether each feature genuinely improves predictive performance by masking its information and measuring the resulting change. The method provides exact, finite-sample feature p-values and confidence intervals for feature importance through a simple, non-asymptotic hypothesis testing procedure. It requires no retraining, surrogate modeling, or distributional assumptions, making it feasible for large-scale algorithms. In both controlled experiments and real applications, from credit scoring to mortgage-behavior prediction, AICO reliably identifies the variables that drive model behavior, providing a scalable and statistically principled path toward transparent and trustworthy machine learning.
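To make the masking idea concrete, here is a deliberately simple stand-in, not the AICO procedure: mask one feature with a fixed fill value, record the per-example change in squared loss on held-out data, and apply an exact one-sided sign test to those changes. The fill value, the squared loss, and the helper names are assumptions. The sign-test p-value is exact in finite samples only if the differences are independent and, under the null, equally likely to be positive or negative.

```python
from math import comb


def masked_loss_differences(predict, X, y, feature, fill_value):
    """Per-example increase in squared loss when `feature` is replaced by fill_value."""
    diffs = []
    for row, target in zip(X, y):
        masked = list(row)
        masked[feature] = fill_value
        full_loss = (predict(row) - target) ** 2
        masked_loss = (predict(masked) - target) ** 2
        diffs.append(masked_loss - full_loss)
    return diffs


def sign_test_pvalue(diffs):
    """Exact one-sided sign-test p-value for 'masking increases loss' (zeros dropped)."""
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    if n == 0:
        return 1.0
    k = sum(1 for d in nonzero if d > 0)
    # P(Binomial(n, 1/2) >= k)
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
```

No retraining is involved: the fitted model is only queried twice per example, which mirrors the abstract's emphasis on avoiding refits, although AICO's actual test statistic and masking scheme may differ.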

2412.20298 2026-06-19 cs.LG cs.CY stat.ML

An Experimental Study on Fairness-aware Machine Learning for Credit Scoring Problems

Huyen Giang Thi Thu, Thang Viet Doan, Ha-Bang Ban, Tai Le Quy

Affiliations * Banking Academy of Vietnam; Vietnam Academy of Science and Technology; Hanoi University of Science and Technology; University of Koblenz

Comments The manuscript has been submitted to a Springer Nature journal

Abstract

The digitalization of credit scoring has become essential for financial institutions and commercial banks, especially in the era of digital transformation. Machine learning techniques are commonly used to evaluate customers' creditworthiness. However, the predicted outcomes of machine learning models can be biased toward protected attributes, such as race or gender. Numerous fairness-aware machine learning models and fairness measures have been proposed. Nevertheless, their performance in the context of credit scoring has not been thoroughly investigated. In this paper, we present a comprehensive experimental study of fairness-aware machine learning in credit scoring. The study explores key aspects of credit scoring, including financial datasets, predictive models, and fairness measures. We also provide a detailed evaluation of fairness-aware predictive models and fairness measures on widely used financial datasets. The experimental results show that fairness-aware models achieve a better balance between predictive accuracy and fairness compared to traditional classification models.
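Two common group-fairness measures of the kind such studies evaluate are statistical parity difference and equal opportunity difference. A pure-Python sketch, assuming binary predictions and labels and a binary group indicator with `1` marking the protected group; sign conventions vary across libraries and papers, and the function names are hypothetical. A value of 0 means parity on that measure.

```python
def _rate(values):
    return sum(values) / len(values) if values else float("nan")


def statistical_parity_difference(y_pred, group):
    """P(y_hat = 1 | unprotected) - P(y_hat = 1 | protected)."""
    unprot = [p for p, g in zip(y_pred, group) if g == 0]
    prot = [p for p, g in zip(y_pred, group) if g == 1]
    return _rate(unprot) - _rate(prot)


def equal_opportunity_difference(y_true, y_pred, group):
    """True-positive rate of the unprotected group minus that of the protected group."""
    def tpr(g_value):
        hits = [p for t, p, g in zip(y_true, y_pred, group) if g == g_value and t == 1]
        return _rate(hits)
    return tpr(0) - tpr(1)
```

Statistical parity ignores the true labels, whereas equal opportunity compares approval rates only among truly creditworthy applicants, so a model can satisfy one measure while violating the other.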

2510.05013 2026-06-19 stat.ML cs.LG

Curiosity-Driven Development of Action and Language in Robots Through Self-Exploration

Theodore Jerome Tinker, Kenji Doya, Jun Tani

Affiliation Okinawa Institute of Science and Technology

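AI summary Through curiosity-driven self-exploration by robots, with active inference implemented via Q-learning, this study reveals compositional generalization, faster learning, rote pairing that precedes composition, and a U-shaped developmental pattern caused by exception handling, offering an explanation for efficient human language acquisition.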

Comments 27 pages, 22 pages of supplementary material

English abstract

Infants acquire language with generalization from minimal experience, whereas large language models require billions of training tokens. What underlies efficient development in humans? We investigated this problem through experiments wherein robotic agents learn to perform actions associated with imperative sentences (e.g., push red cube) via curiosity-driven self-exploration. Our approach amortizes active inference using Q-learning, enabling intrinsically motivated developmental learning. The simulations reveal key findings corresponding to observations in developmental psychology. i) Generalization improves drastically as the scale of compositional elements increases. ii) Curiosity-driven exploration enables faster learning. iii) Rote pairing of sentences and actions precedes compositional generalization. iv) Exception-handling induces U-shaped developmental performance, a pattern like representational redescription in child language learning. These results suggest that curiosity-driven active inference accounts for how intrinsically motivated sensorimotor-linguistic learning supports scalable compositional generalization and exception handling in humans and artificial agents.
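The abstract does not spell out the learning rule. As a rough illustration only, the sketch below shows a tabular Q-learning update whose reward is augmented by an intrinsic "curiosity" bonus: the surprise of a simple count-based forward model. This is one common way to implement curiosity in reinforcement learning and is an assumption here. It is not the authors' amortized active-inference objective, and the class and parameter names (`CuriousQAgent`, `eta`) are hypothetical.

```python
from collections import defaultdict


class CuriousQAgent:
    """Tabular Q-learning with a prediction-error (surprise) curiosity bonus."""

    def __init__(self, n_actions, alpha=0.5, gamma=0.9, eta=1.0):
        self.n_actions = n_actions
        self.alpha = alpha  # learning rate
        self.gamma = gamma  # discount factor
        self.eta = eta      # weight of the intrinsic reward
        self.q = defaultdict(float)  # (state, action) -> value estimate
        # Count-based forward model: (state, action) -> {next_state: count}
        self.counts = defaultdict(lambda: defaultdict(int))

    def curiosity(self, s, a, s_next):
        """Surprise = 1 - predicted probability of the observed next state."""
        seen = self.counts[(s, a)]
        total = sum(seen.values())
        p = seen[s_next] / total if total else 0.0
        return 1.0 - p

    def update(self, s, a, r_ext, s_next):
        """One Q-learning step on extrinsic plus intrinsic reward."""
        r = r_ext + self.eta * self.curiosity(s, a, s_next)
        best_next = max(self.q[(s_next, b)] for b in range(self.n_actions))
        td_error = r + self.gamma * best_next - self.q[(s, a)]
        self.q[(s, a)] += self.alpha * td_error
        self.counts[(s, a)][s_next] += 1  # refine the forward model afterwards
        return td_error
```

A novel transition earns a bonus of 1, so it is valued highly at first. Once it becomes predictable the bonus vanishes and its value decays, which pushes the agent toward unexplored actions.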

2505.01318 2026-06-19 stat.ME

Modeling Large Nonstationary Spatial Data with the Full-Scale Basis Graphical Lasso

Matthew LeDuc, William Kleiber, Tomoko Matsuo

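AI summary Proposes a new approach for modeling large nonstationary spatial data that combines a latent low-rank process with a sparse covariance model. The low-rank component coefficients are given a flexible graphical Gaussian Markov random field model, and combining the full-scale approximation with the basis graphical lasso yields the full-scale basis graphical lasso (FSBGL). Estimation uses a graphical lasso-penalized likelihood optimized with a difference-of-convex scheme. Validated on synthetic fields and a high-resolution thermosphere simulation dataset, the method captures salient features of thermospheric temperature fields better than existing spatial models, even with limited training data.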

English abstract

We propose a new approach for modeling large datasets of nonstationary spatial processes that combines a latent low-rank process and a sparse covariance model. The low-rank component coefficients are endowed with a flexible graphical Gaussian Markov random field model. The use of a low-rank and compactly supported covariance structure combines the full-scale approximation and the basis graphical lasso; we term this new approach the full-scale basis graphical lasso (FSBGL). Estimation employs a graphical lasso-penalized likelihood, which is optimized using a difference-of-convex scheme. We illustrate the proposed approach on synthetic fields as well as on a challenging high-resolution simulation dataset of the thermosphere. In a comparison against state-of-the-art spatial models, the FSBGL performs better at capturing salient features of the thermospheric temperature fields, even with limited available training data.
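For readers unfamiliar with the full-scale approximation, the sketch below shows only that generic ingredient, in 1-D. A low-rank "predictive process" part is built from a few knots, and the leftover residual covariance is multiplied elementwise by a compactly supported taper, so the residual is exactly zero beyond a cutoff. The exponential covariance, the Wendland taper, the knot placement, and all function names are assumptions. FSBGL goes further and models the low-rank coefficients with a graphical-lasso-estimated GMRF, which is not shown.

```python
import numpy as np


def exp_cov(a, b, scale=0.3):
    """Exponential covariance exp(-|a - b| / scale) between 1-D location arrays."""
    return np.exp(-np.abs(a[:, None] - b[None, :]) / scale)


def wendland_taper(a, b, radius=0.1):
    """Wendland taper (1 - h)^4 (1 + 4h) for h = d / radius < 1, else 0."""
    h = np.abs(a[:, None] - b[None, :]) / radius
    return np.where(h < 1.0, (1.0 - h) ** 4 * (1.0 + 4.0 * h), 0.0)


def full_scale_cov(x, knots, radius=0.1):
    """Low-rank part from knots plus a tapered (sparse) residual."""
    c_xk = exp_cov(x, knots)
    low_rank = c_xk @ np.linalg.solve(exp_cov(knots, knots), c_xk.T)
    residual = exp_cov(x, x) - low_rank  # covariance not explained by the knots
    return low_rank + residual * wendland_taper(x, x, radius)
```

Because the residual is a conditional covariance and the taper is positive definite, the elementwise (Schur) product keeps the result positive semidefinite. The taper equals 1 at distance zero, so the marginal variances are preserved.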

2408.15920 2026-06-19 math.ST math.PR stat.TH

Nonlinear Filtering and Spatial Asymptotic Consistency for SPDEs Observed via Spatio-Temporal Point Processes

Jan Szalankiewicz, Cristina Martinez-Torres, Wilhelm Stannat

Comments Fixed several typos throughout the manuscript, substantially revised Section 4 with improved theoretical bounds, and updated simulations with corresponding code base improvements

Journal ref Stoch PDE: Anal Comp (2026)

English abstract

In this paper, we develop the mathematical framework for filtering problems arising from biophysical applications in which data are collected from confocal laser scanning microscopy recordings of the space-time evolution of intracellular wave dynamics of biophysical quantities. In these applications, signals are described by stochastic partial differential equations (SPDEs), and observations can be modelled as functionals of marked point processes whose intensities depend on the underlying signal. We derive both the unnormalized and normalized filtering equations for these systems, and we establish asymptotic consistency and approximation results for finite-dimensional observation schemes and partial observations, respectively. Our theoretical results are validated through extensive simulations using synthetic and real data. These findings contribute to a deeper understanding of filtering with point process observations and provide a robust framework for future research in this area.
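As a finite-dimensional toy analogue only, the sketch below filters a scalar Ornstein-Uhlenbeck signal (standing in for an SPDE) from Poisson counts per time bin whose intensity depends on the signal. It uses a bootstrap particle filter. The paper instead derives normalized and unnormalized filtering equations for SPDE signals observed through marked point processes. The exponential link, all parameter values, and the function names are assumptions.

```python
import numpy as np


def simulate(n_steps=500, dt=0.1, theta=1.0, sigma=0.5, base=200.0, seed=0):
    """Toy data: OU signal x_t (Euler steps) and Poisson counts per bin
    with intensity base * exp(x_t) (an assumed link function)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n_steps)
    for t in range(1, n_steps):
        x[t] = (x[t - 1] - theta * x[t - 1] * dt
                + sigma * np.sqrt(dt) * rng.standard_normal())
    counts = rng.poisson(base * np.exp(x) * dt)
    return x, counts


def particle_filter(counts, n_particles=2000, dt=0.1, theta=1.0,
                    sigma=0.5, base=200.0, seed=1):
    """Bootstrap particle filter; returns filtered means E[x_t | counts up to t]."""
    rng = np.random.default_rng(seed)
    particles = np.zeros(n_particles)  # signal starts at 0, as in simulate()
    means = np.empty(len(counts))
    for t, n in enumerate(counts):
        if t > 0:  # propagate particles with the signal dynamics
            particles = (particles - theta * particles * dt
                         + sigma * np.sqrt(dt) * rng.standard_normal(n_particles))
        lam = base * np.exp(particles) * dt  # expected count in this bin
        logw = n * np.log(lam) - lam         # Poisson log-likelihood up to a constant
        w = np.exp(logw - logw.max())        # subtract max for numerical stability
        w /= w.sum()
        means[t] = np.sum(w * particles)
        idx = rng.choice(n_particles, size=n_particles, p=w)  # multinomial resampling
        particles = particles[idx]
    return means
```

Usage: `x, counts = simulate()` followed by `particle_filter(counts)` tracks `x` much more closely than the unconditional mean 0 does.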

2307.06655 2026-06-19 stat.ME

Stochastic Reaction-Diffusion Systems in Biophysics: Towards a Toolbox for Quantitative Model Evaluation

Gregor Pasemann, Carsten Beta, Wilhelm Stannat

Journal ref In: Stich, M., Carballido-Landeira, J. (eds) Nonlinear Dynamics for Biological Systems. SEMA SIMAI Springer Series, vol 40, 2025, Springer, Cham

English abstract

We develop a statistical toolbox for the quantitative evaluation of stochastic reaction-diffusion models of the space-time evolution of biophysical quantities at the intracellular level. Starting from space-time data $X_N(t,x)$, such as those provided by fluorescence microscopy recordings, we discuss basic modelling principles for the conditional mean trend and fluctuations within the class of stochastic reaction-diffusion systems, and subsequently develop statistical inference methods for parameter estimation. With a view towards application to real data, we discuss estimation errors and confidence intervals, in particular their dependence on the spatial resolution of the measurements, and investigate the impact of misspecified reaction terms and noise coefficients. We also briefly touch on implementation issues of the statistical estimators. As a proof of concept, we apply our toolbox to statistical inference on intracellular actin concentration in the social amoeba Dictyostelium discoideum.
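To give a feel for this kind of parameter estimation, here is a toy sketch of the simplest linear case: the stochastic heat equation $dX = \theta \Delta X\,dt + \sigma\,dW$ on $[0,1]$ with Dirichlet boundary conditions, with no reaction term. It assumes the first few sine (Fourier) modes are observed rather than pixel data. It uses the standard spectral estimator of the diffusivity, $\hat\theta = -\sum_k \lambda_k \int x_k\,dx_k \big/ \sum_k \lambda_k^2 \int x_k^2\,dt$ with $\lambda_k = (\pi k)^2$, discretized on the time grid. This estimator comes from the SPDE statistics literature and is not necessarily what the toolbox implements. The function names are hypothetical.

```python
import numpy as np


def simulate_modes(theta=0.5, sigma=1.0, n_modes=20, T=1.0, dt=1e-4, seed=0):
    """Euler-Maruyama for the first n_modes sine modes; mode k follows
    dx_k = -theta * lam_k * x_k dt + sigma dw_k with lam_k = (pi k)^2."""
    rng = np.random.default_rng(seed)
    lam = (np.pi * np.arange(1, n_modes + 1)) ** 2
    n_steps = int(round(T / dt))
    x = np.zeros((n_steps + 1, n_modes))
    for i in range(n_steps):
        noise = rng.standard_normal(n_modes)
        x[i + 1] = x[i] - theta * lam * x[i] * dt + sigma * np.sqrt(dt) * noise
    return x, lam


def estimate_theta(x, lam, dt):
    """Discretized spectral estimator of theta, pooled over all modes."""
    dx = np.diff(x, axis=0)
    xs = x[:-1]
    num = -np.sum(lam * np.sum(xs * dx, axis=0))
    den = dt * np.sum(lam ** 2 * np.sum(xs ** 2, axis=0))
    return num / den
```

The noise level $\sigma$ cancels from the estimator. Each additional observed mode adds Fisher information roughly proportional to $\lambda_k$, so finer spatial resolution (more modes) tightens the estimate.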

1812.05678 2026-06-19 stat.ME

Objective-Driven Ensembles: Bridging the Gap Between Interpretable Sparsity and Algorithmic Prediction

Anthony Christidis, Stefan Van Aelst, Ruben Zamar

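AI summary Presents objective-driven ensembles, which generalize best subset selection into a joint mathematical optimization problem to produce interpretable ensembles. It shows theoretically that penalizing predictor overlap bounds prediction covariance and mitigates the effect of finite-sample spurious correlations, combining machine-learning-level accuracy with the interpretability of sparse models.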

English abstract

Sparse methods (e.g., Best Subset Selection, Elastic Net) are the standard approach for obtaining interpretable models, but they can suffer from high variance and vulnerability to spurious correlations. Alternatively, algorithmic ensembles (e.g., Random Forests, Gradient Boosting) achieve high prediction accuracy but yield uninterpretable black boxes driven by randomization or sequential residual fitting. In recent years, a unifying paradigm has emerged: Objective-Driven Ensembles. By generalizing best subset selection into a joint mathematical optimization problem, this approach generates interpretable ensembles by optimally splitting predictors across a small number of diverse models. In this paper, we synthesize this growing body of literature and illustrate the statistical principles driving its empirical success. Specifically, we utilize finite-sample bounds to demonstrate how penalizing predictor overlap controls ensemble covariance and provides a mathematical hedge against spurious correlations. We evaluate these mechanics using an exact combinatorial oracle, and review how recent computational approximations have successfully scaled this framework to a variety of domains, including high-dimensional data, classification tasks, and settings with casewise or cellwise contamination, achieving machine-learning-level accuracy while retaining the interpretability of sparse models.
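The core idea of splitting predictors across models with an overlap penalty can be illustrated by brute force on a tiny problem, in the spirit of an exact combinatorial oracle. In the sketch below, the objective is assumed to be the total residual sum of squares of the member OLS models plus `lam` times the number of predictors shared between pairs of models. The published methods use their own objectives and scalable approximations, and the function names are hypothetical.

```python
import itertools

import numpy as np


def rss(X, y, subset):
    """Residual sum of squares of an OLS fit (with intercept) on `subset`."""
    A = np.column_stack([np.ones(len(y)), X[:, list(subset)]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ coef
    return float(r @ r)


def split_predictors(X, y, size=2, n_models=2, lam=1.0):
    """Exhaustive search over n_models subsets of `size` predictors,
    minimizing total RSS + lam * (shared predictors summed over model pairs)."""
    candidates = list(itertools.combinations(range(X.shape[1]), size))
    cache = {s: rss(X, y, s) for s in candidates}
    best, best_obj = None, np.inf
    for combo in itertools.combinations_with_replacement(candidates, n_models):
        overlap = sum(len(set(a) & set(b)) for a, b in itertools.combinations(combo, 2))
        obj = sum(cache[s] for s in combo) + lam * overlap
        if obj < best_obj:
            best, best_obj = combo, obj
    return best
```

With `lam=0` every member collapses onto the same best subset. A large `lam` forces the models onto disjoint predictors, which is what diversifies the ensemble (whose prediction could then be, for example, the average of the members).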

1909.03488 2026-06-19 math.AT cs.CG math.PR math.ST stat.TH

Probabilistic Convergence and Stability of Random Mapper Graphs

Adam Brown, Omer Bobrowski, Elizabeth Munch, Bei Wang

English abstract

We study the probabilistic convergence between the mapper graph and the Reeb graph of a topological space $\mathbb{X}$ equipped with a continuous function $f: \mathbb{X} \rightarrow \mathbb{R}$. We first give a categorification of the mapper graph and the Reeb graph by interpreting them in terms of cosheaves and stratified covers of the real line $\mathbb{R}$. We then introduce a variant of the classic mapper graph of Singh et al. (2007), referred to as the enhanced mapper graph, and demonstrate that such a construction approximates the Reeb graph of $(\mathbb{X}, f)$ when it is applied to points randomly sampled from a probability density function concentrated on $(\mathbb{X}, f)$. Our techniques are based on the interleaving distance of constructible cosheaves and topological estimation via kernel density estimates. Following Munch and Wang (2018), we first show that the mapper graph of $(\mathbb{X}, f)$, a constructible $\mathbb{R}$-space (with a fixed open cover), approximates the Reeb graph of the same space. We then construct an isomorphism between the mapper of $(\mathbb{X},f)$ and the mapper of a super-level set of a probability density function concentrated on $(\mathbb{X}, f)$. Finally, building on the approach of Bobrowski et al. (2017), we show that, with high probability, we can recover the mapper of the super-level set given a sufficiently large sample. Our work is the first to consider the mapper construction using the theory of cosheaves in a probabilistic setting. It is part of an ongoing effort to combine sheaf theory, probability, and statistics to support topological data analysis with random data.
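For orientation, here is a minimal sketch of the classic mapper construction of Singh et al. (2007). It is not the enhanced mapper graph studied in the paper. The range of $f$ is covered by overlapping intervals, each preimage is clustered, and clusters that share points are joined by an edge. The uniform interval cover, the overlap fraction, the ε-single-linkage clustering, and the function names are assumptions.

```python
import math
from itertools import combinations


def clusters_eps(points, idx, eps):
    """Connected components of the eps-neighborhood graph on points[idx]
    (single-linkage clustering cut at height eps)."""
    remaining = set(idx)
    comps = []
    while remaining:
        seed = remaining.pop()
        comp, stack = {seed}, [seed]
        while stack:
            i = stack.pop()
            near = [j for j in remaining if math.dist(points[i], points[j]) <= eps]
            for j in near:
                remaining.remove(j)
                comp.add(j)
                stack.append(j)
        comps.append(frozenset(comp))
    return comps


def mapper_graph(points, f, n_intervals=5, overlap=0.3, eps=0.1):
    """Cover range(f) with overlapping intervals, cluster each preimage,
    and connect clusters (nodes) that share at least one point."""
    lo, hi = min(f), max(f)
    w = (hi - lo) / n_intervals
    g = overlap * w  # each interval is widened by g on both sides
    nodes = []
    for i in range(n_intervals):
        a, b = lo + i * w - g, lo + (i + 1) * w + g
        idx = [k for k, v in enumerate(f) if a <= v <= b]
        nodes.extend(clusters_eps(points, idx, eps))
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]}
    return nodes, edges
```

Take 200 evenly spaced points on the unit circle, with $f$ the height $y$. The resulting graph has 8 nodes and 8 edges forming a single loop, which matches the Reeb graph of the circle under the height function.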

1406.0214 2026-06-19 eess.SY cs.SY math.AT stat.ML

Topological and Statistical Behavior Classifiers for Tracking Applications

Paul Bendich, Sang Chin, Jesse Clarke, Jonathan deSena, John Harer, Elizabeth Munch, Andrew Newman, David Porter, David Rouse, Nate Strawn, Adam Watkins

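AI summary Proposes a unified theory of target tracking based on Multiple Hypothesis Tracking, topological data analysis, and machine learning. Topological features encode behavioral information, statistical models are fitted to the distributions of these features, and target type classification methods use the resulting likelihoods to improve tracking.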

English abstract

We introduce the first unified theory for target tracking using Multiple Hypothesis Tracking, Topological Data Analysis, and machine learning. Our innovations are as follows: 1) robust topological features are used to encode behavioral information; 2) statistical models are fitted to distributions over these topological features; and 3) the target type classification methods of Wigren and Bar Shalom et al. are employed to exploit the resulting likelihoods for topological features within the tracking procedure. To demonstrate the efficacy of our approach, we test our procedure on synthetic vehicular data generated by the Simulation of Urban Mobility package.
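The abstract does not say which topological features are used. As an assumed example of a robust topological feature of a behavior signal (say, a target's speed over time), the sketch below computes 0-dimensional persistence of the sublevel-set filtration of a 1-D signal. It uses union-find and the elder rule. Persistence diagrams are known to be stable under small perturbations of the signal. The function name is hypothetical, and how such pairs would be summarized and fed to a likelihood model is left open.

```python
def sublevel_persistence_0d(values):
    """(birth, death) pairs of connected components in the sublevel-set
    filtration of a 1-D signal, via union-find and the elder rule."""
    n = len(values)
    parent = list(range(n))
    birth = list(values)  # birth value of the component rooted at each index
    active = [False] * n

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    pairs = []
    for i in sorted(range(n), key=lambda k: values[k]):
        active[i] = True  # sample i enters the sublevel set
        for j in (i - 1, i + 1):
            if 0 <= j < n and active[j]:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: the younger component (later birth) dies here
                old, young = (ri, rj) if birth[ri] <= birth[rj] else (rj, ri)
                if birth[young] < values[i]:  # skip zero-length pairs
                    pairs.append((birth[young], values[i]))
                parent[young] = old
    roots = {find(i) for i in range(n)}
    pairs.extend((birth[r], float("inf")) for r in roots)
    return pairs
```

For the signal `[1, 3, 0, 2]`, the local minimum at 1 merges into the deeper basin at height 3, giving the pair (1, 3), and the global minimum survives as (0, inf).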