2606.10123 2026-06-10 stat.ME New submission

Methods for adjusting for covariate measurement error in flexible modelling of functional form: results of a blinded, controlled neutral comparison simulation study

Mohammed Sedki, Aris Perperoglou, Anne C. M. Thiébaut, Steve Ferreira Guerra, Paul Gustafson, Frank E. Harrell, Willi Sauerbrei, Michal Abrahamowicz, Laurence S. Freedman

AI summary: A blinded, multi-stage neutral comparison simulation study evaluates six families of measurement error correction methods combined with four flexible regression models for estimating non-linear associations. Pointwise SIMEX is the most accurate and robust, followed by Bayesian methods and regression calibration; multiple imputation performs less well, and B-splines perform worst.

Abstract

Covariate measurement error is pervasive in epidemiological research and distorts estimated exposure-outcome associations, yet correction methods have been studied almost exclusively under linear modelling assumptions. Their behaviour when the underlying association is non-linear and is itself estimated with flexible regression remains poorly characterised. We report a blinded, multi-stage neutral comparison simulation study, conducted within the STRATOS initiative, evaluating measurement error correction coupled with flexible modelling of functional form. Six families of correction methods (pointwise and coefficient-based Simulation Extrapolation [SIMEX], Bayesian inference on the logit and risk scales, Multiple Imputation [MI], and Regression Calibration [RC]) were each combined with B-splines (BS), penalised splines (PS), fractional polynomials (FP), and natural splines (NS), yielding 23 analytic methods. Methods were applied to case-control data generated under five functional forms (J-shape, linear, two threshold models, and saturation) across simulated datasets spanning varying sample sizes, replication substudy sizes, error magnitudes, and error distributions, with classical additive error and a replication substudy for error calibration. Performance was assessed by the log mean squared error of the estimated function over the central 95% of the exposure distribution. Pointwise SIMEX was the most accurate and most robust approach overall, followed by Bayesian methods and RC when paired with PS, FP, or NS; MI performed less well, and Bayesian estimation with unpenalised BS performed worst. PS, FP, and NS were near-equivalent, whereas BS was consistently inferior. No single method dominated across all scenarios, underscoring the value of sensitivity analyses.

2606.10096 2026-06-10 stat.ME New submission

Estimating the Wasserstein barycenter of one-dimensional distributions under sparse sampling

James Peng, Florian Stijven, Linbo Wang, Peter B. Gilbert

AI summary: For data in which each unit's one-dimensional distribution is observed only through a small i.i.d. sample, the paper proposes the marginal-constructed barycenter (MCB) estimator, which estimates the distribution of latent quantiles using binomial mixture methods, overcomes the bias of the empirical Wasserstein barycenter under sparse sampling, and is shown to be consistent and asymptotically normal.

Abstract

We study distributional data under sparse sampling where each unit is represented by a probability distribution on the real line observed only through a small i.i.d. sample. A natural notion of central tendency for one-dimensional distributional data is the Wasserstein barycenter, whose quantile function is the pointwise average of the unit-level quantile functions. We focus on pointwise estimation of the Wasserstein barycenter quantile function: at a given quantile level, the target is the population mean of the corresponding unit-level quantiles. A naive plug-in estimator is the empirical Wasserstein barycenter, which treats observed unit-level empirical distributions as the true latent unit-level distributions. Under sparse sampling, however, this estimator can be severely biased. We propose an approach that avoids directly estimating either the unit-level distributions or the full population law of distributions. We start with the more ambitious goal of characterizing the distribution of latent unit-level quantiles at a given quantile level. We show that this distribution can be written in terms of the marginal distributions of the unit-level CDF values, which can be estimated using binomial mixture methods. This motivates our estimator, the marginal-constructed barycenter (MCB) estimator, obtained by taking the mean of the estimated distribution of latent unit-level quantiles. We establish conditions under which the MCB estimator is pointwise consistent and asymptotically normal, and show through simulations that it can substantially outperform the empirical Wasserstein barycenter under sparse sampling. We illustrate the method in an analysis of HIV-1 sequence data from the HVTN 502/503 vaccine efficacy trials, using the barycenter to summarize and compare within-participant distributions of viral sequence features when only a small number of sequences are available per participant.
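
The naive plug-in estimator mentioned in the abstract can be sketched in a few lines. This is only an illustration of the empirical Wasserstein barycenter quantile, not the MCB estimator; the function names and the choice of the left-continuous empirical quantile are assumptions made for this example.

```python
import math


def empirical_quantile(sample, level):
    """Left-continuous empirical quantile: the smallest order statistic
    x_(k) such that k / n >= level (an assumed convention)."""
    xs = sorted(sample)
    n = len(xs)
    k = max(1, math.ceil(level * n))
    return xs[k - 1]


def empirical_barycenter_quantile(samples, level):
    """Plug-in barycenter quantile: average the unit-level empirical
    quantiles at a single quantile level."""
    return sum(empirical_quantile(s, level) for s in samples) / len(samples)


# Two hypothetical units, each observed through three draws.
units = [[0.0, 1.0, 2.0], [10.0, 11.0, 12.0]]
print(empirical_barycenter_quantile(units, 0.5))  # 6.0
```

With only a few draws per unit, each unit-level empirical quantile is a single order statistic, which is one way to see why the plug-in average can be a poor proxy for the mean of the latent quantiles.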

2606.09906 2026-06-10 stat.ME q-bio.PE New submission

An information-geometric framework for mapping maximum potential biodiversity

Shinto Eguchi

AI summary: Proposes an information-geometric framework that defines a potential composition and a diversity gap through a constrained variational principle, treats Hill-type diversity and Rao's quadratic entropy in a unified way, and provides benchmark comparisons for ecological conservation.

Comments 22 pages, 1 figure

Abstract

Biodiversity measures are often used descriptively: one computes a diversity index from an observed or estimated community composition and maps the resulting values across space. Conservation planning, however, also requires a site-specific benchmark against which the observed community can be compared. This chapter develops an information-geometric framework for such \emph{potential diversity} and the associated \emph{diversity gap}. The central object is a pair of probability vectors on the species simplex: an observed or realized composition $p^{\mathrm{obs}}$, and a potential composition $p^{\mathrm{pot}}$ obtained by a constrained variational principle. The gap is then defined by comparing a diversity functional at these two compositions. The framework is developed for both Hill-type diversity, which measures abundance and evenness, and Rao's quadratic entropy, which incorporates trait, phylogenetic, or ecological dissimilarities among species. A spatial point-process interpretation clarifies how local ecological capacities can be defined before passing to the simplex. Escort constraints, capacity constraints, and divergence projections then provide a unified way to define nontrivial benchmarks beyond the uniform distribution. The resulting formulation separates two distinct questions: how diverse a community is, and how far it is from a locally admissible potential benchmark. It also connects the ecological idea of dark diversity with a continuous, abundance-weighted comparison on the probability simplex. We also outline a dynamic extension in which capacities, species migration, and climate-driven shifts vary over time. Empirical implementation with large-scale citizen-science biodiversity data and trait databases is left for future work.
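
The two diversity functionals named in the abstract have standard closed forms, so a small sketch may help fix ideas. The gap is written here as a plain difference against the uniform distribution, which the abstract treats as the trivial benchmark; both the difference form and the helper names are assumptions, and the paper's constrained variational construction of $p^{\mathrm{pot}}$ is not reproduced.

```python
import math


def hill_number(p, q):
    """Hill number of order q for a composition p on the species simplex."""
    positive = [x for x in p if x > 0]
    if q == 1:
        # Limit q -> 1: exponential of the Shannon entropy.
        return math.exp(-sum(x * math.log(x) for x in positive))
    return sum(x ** q for x in positive) ** (1.0 / (1.0 - q))


def rao_quadratic_entropy(p, d):
    """Rao's quadratic entropy: sum over i, j of d[i][j] * p[i] * p[j]."""
    n = len(p)
    return sum(d[i][j] * p[i] * p[j] for i in range(n) for j in range(n))


def diversity_gap(diversity, p_obs, p_pot):
    """Hypothetical gap: diversity at the benchmark minus observed diversity."""
    return diversity(p_pot) - diversity(p_obs)


p_obs = [0.5, 0.25, 0.25]
p_uniform = [1 / 3, 1 / 3, 1 / 3]  # trivial benchmark, for illustration only
print(diversity_gap(lambda p: hill_number(p, 2), p_obs, p_uniform))
```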

2606.10770 2026-06-10 stat.ME cs.LG New submission

Correcting Variable Importance Scored by Random Forests

Guancheng Zhou, Haiping Xu, Jason Liu, Donghui Yan

Affiliations: Computer and Information Science; Mathematics and Data Science; University of Massachusetts, Dartmouth, MA; The Rivers School, Weston, MA

AI summary: To address the effect of correlations among variables on random forest variable importance, the paper proposes a correction that groups variables by conditional correlation; experiments show that both computationally efficient schemes correct variable importance effectively.

Comments 22 pages, 10 figures

Abstract

Variable importance produced by Random Forests (RF) is widely used in statistical data analysis and plays an important role in tasks such as model interpretation, model selection and diagnosis, and cost-bounded learning. However, the calculation of variable importance in RF does not take into account the correlations among variables: a variable that is correlated with many other variables tends to receive a lower importance index, or to be completely masked (i.e., given an importance index near zero) by other strongly correlated variables. To prevent unwanted correlated variables from influencing the calculation of variable importance, we propose to group variables by their conditional correlations (conditional on the response variable). We explore two computationally efficient options: one groups variables individually and then separates the variable of interest from all correlated variables, while the other uses clustering to group variables according to their pairwise conditional correlations. Our experiments show that both lead to sensible corrections to the importance of variables.

2606.11136 2026-06-10 math.ST stat.ME stat.ML stat.TH New submission

Conformal Prediction for Dyadic Regression Under Complex Missingness

Robert Lunde, Minjie Yang, Elizaveta Levina, Ji Zhu

AI summary: For dyadic regression under complex missingness mechanisms, the paper proposes a conformal prediction framework that replaces exchangeability with distributional invariance conditions, handles samples that are random subsets via a bijection argument, and proposes several conformal prediction procedures, including a graphon-weighted method that achieves asymptotic conditional validity.

Abstract

We develop a framework for conformal prediction in dyadic regression problems under complex missingness mechanisms. At the theoretical level, we establish super-uniformity of conformal prediction under distributional invariance conditions weaker than exchangeability. A key result handles the case where the sample itself is a random subset of the index set, a setting not covered by existing theory, via a novel bijection argument that constructs an explicit measure-preserving correspondence between events. In addition, we propose conformal prediction procedures for jointly exchangeable arrays, including full conformal, split conformal, a row-column approach exploiting similarities within rows and columns, and a selective conformal procedure achieving mask-conditional validity. For missing elements, we establish asymptotic validity of a graphon-weighted conformal procedure under a nonparametric graphon model for the missingness mechanism. We further establish conditional validity results for both continuous and discrete responses; to the best of our knowledge, this is the first formal proof of asymptotic conditional validity for weighted conformal prediction under a missing-not-at-random assumption. The proposed methods are illustrated on synthetic and real network data.

2606.07762 2026-06-10 stat.ME stat.OT Version update

Probabilistic Win Ratio Method For Hierarchical Composite Endpoints With Coarsened Outcomes

Lei Li, Jing Lei, Yuexiao Dong

AI summary: Proposes the probabilistic win ratio (PWR), which replaces deterministic comparisons with conditional probabilities to handle censored and missing data, improving efficiency and reducing bias; it reduces to the standard win ratio when outcomes are fully observed.

Abstract

The win ratio is increasingly used to analyze prioritized composite endpoints in clinical trials, but standard implementations rely on deterministic pairwise comparisons and can perform poorly in the presence of censoring and endpoint-specific missingness. In such settings, unresolved comparisons are often treated as ties, leading to loss of efficiency and potentially biased inference, particularly when lower-priority outcomes are incompletely observed. We propose the probabilistic win ratio (PWR), a framework for estimating the classical win ratio under coarsened observation. The PWR replaces deterministic pairwise decisions with conditional probabilities of win, loss, or tie given the observed data, allowing partially observed comparisons to contribute fractionally while being explicitly penalized according to their uncertainty. Comparisons with greater coarsening receive smaller effective weight, whereas fully observed comparisons contribute as in the classical analysis, preserving the clinical priority structure. When outcomes are fully observed, the PWR reduces exactly to the standard win ratio estimator. Simulation studies show that the PWR maintains low bias and mean squared error across a range of censoring and missingness scenarios. Two clinical trial case studies illustrate complementary data regimes, demonstrating calibration in near-complete data and stability under substantial right censoring.
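
For orientation, the classical deterministic win ratio that the abstract uses as its reference point can be sketched as follows. This covers only the fully observed case and is not the PWR itself; the representation of each participant as a tuple of endpoints in priority order, with larger values being better, is an assumption made for the example.

```python
def compare(treated, control):
    """Compare one treated and one control participant endpoint by endpoint,
    in priority order. Returns 1 for a win, -1 for a loss, 0 for a tie."""
    for t, c in zip(treated, control):
        if t > c:
            return 1
        if t < c:
            return -1
    return 0


def win_ratio(treated_group, control_group):
    """Classical win ratio: total wins divided by total losses over all
    treated-control pairs (fully observed outcomes only)."""
    wins = losses = 0
    for t in treated_group:
        for c in control_group:
            result = compare(t, c)
            wins += result == 1
            losses += result == -1
    return wins / losses if losses else float("inf")


# Hypothetical outcomes: (higher-priority endpoint, lower-priority endpoint).
treated = [(5, 1), (3, 2)]
control = [(3, 1), (4, 0)]
print(win_ratio(treated, control))  # 3.0
```

In this sketch a pair that ties on every endpoint contributes to neither count, which is the handling of unresolved comparisons that the abstract identifies as a source of efficiency loss under censoring.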

2606.06482 2026-06-10 stat.ME math.ST stat.TH Version update

Two-Sample Hypothesis Testing for Subspace Equality in Network Data

Rajdeep Brahma, Joshua Agterberg, Yuguo Chen

AI summary: For the null hypothesis that two networks share the same subspace (for example, community structure), the paper proposes a test statistic based on the Frobenius norm of the difference of projection matrices, proves its asymptotic normality when the average expected degree grows logarithmically, and provides mean and variance estimators and the local power.

Abstract

In many settings one is often interested in determining whether two networks share some joint structural connectivity patterns such as communities. However, while communities may be shared across networks, edge probabilities may differ significantly. Therefore, in this paper we consider testing a general null hypothesis that two networks have the same underlying subspace, which in particular includes the setting that communities are the same for either stochastic blockmodels or mixed-membership stochastic blockmodels (even if edge probabilities are different). We propose a test statistic based on the Frobenius norm of the difference of the leading subspace projection matrices, and we prove that our test statistic, after appropriate centering and scaling, converges in distribution to a Gaussian random variable as long as the average expected degree grows at least logarithmically in the number of vertices. We then provide estimators for the asymptotic mean and variance and show consistency under a stronger signal condition, and we give the local power of our test when the networks are sufficiently dense. Our theoretical results are based on a limit theorem for the projection difference of empirical and true eigenvectors which can also be viewed as the one-sample version of our test statistic, and this result may be of independent interest. We demonstrate our results through numerical simulations and an application to US Flight data.

2602.01509 2026-06-10 hep-ph hep-ex stat.ME Version update

HDSense: An efficient method for ranking observable sensitivity

Benoît Assi, Christian Bierlich, Rikab Gambhir, Phil Ilten, Tony Menzo, Stephen Mrenna, Manuel Szewc, Michael K. Wilkinson, Jure Zupan

AI summary: Proposes the HDSense score, which uses one-dimensional histograms to efficiently rank observable sets by their power to constrain model parameters, profiles over unknown correlations within the Fisher information framework, balances information content against redundancy, and is validated on parameter estimation for the Lund string hadronization model.

Comments 26+11 pages, 9 figures, code available at: https://gitlab.com/pythia8-contrib/packages/hdsense. Updated version with minor revision recommended by SciPost Physics

Abstract

Identifying which observables most effectively constrain model parameters can be computationally prohibitive when considering full likelihoods of many correlated observables. This is especially important for, e.g., hadronization models, where high precision is required to interpret the results of collider experiments. We introduce the High-Dimensional Sensitivity (HDSense) score, a computationally efficient metric for ranking observable sets using only one-dimensional histograms. Derived by profiling over unknown correlations in the Fisher information framework, the score balances total information content against redundancy between observables. We apply HDSense to rank a set of observables in terms of their constraining power with respect to five parameters of the Lund string model of hadronization implemented in Pythia using simulated leptonic collider events at the $Z$ pole. Validation against machine-learning-based full-likelihood approximations demonstrates that HDSense successfully identifies near-optimal observable subsets. The framework naturally handles data from multiple experiments with different acceptances and incorporates detector effects. While demonstrated on hadronization models, the methodology applies broadly to generic parameter estimation problems where correlations are unknown or difficult to model.

2406.10473 2026-06-10 stat.ME Version update

Robust Design-Based Estimation and Inference for Stratified Randomized Trials with Varying Cluster Sizes

Xinhe Wang, Ben B. Hansen

AI summary: For stratified randomized trials with heterogeneous cluster sizes, the paper shows that stratum-averaged estimators can be inconsistent, studies the Hájek ratio estimator as a robust alternative, and develops a design-based variance estimator.

Abstract

Clustered randomized controlled trials are often stratified or pair-matched to improve covariate balance and efficiency. Sample average treatment effects (SATEs) are commonly estimated by averaging stratum-level treatment-control mean contrasts -- an approach that is natural and widely used. We show that, in stratified clustered trials with heterogeneous cluster sizes, such estimators need not be consistent for the SATE. They can converge to the wrong limit even under correct randomization and without model misspecification. The source is a covariance between cluster sizes and treatment effects: stratumwise averaging mis-weights clusters in a way that produces bias of constant order, regardless of sample size. We study the Hájek (ratio) estimator as a robust alternative. By aggregating outcomes within treatment groups before taking their difference, it remains consistent in clustered trials that grow by increasing strata sizes or the number of strata. Despite that, its use in design-based analyses of clustered trials has been limited by the lack of variance estimators. We develop a design-based variance estimator that applies to any number of strata of any size, and show that it is asymptotically conservative, a property that holds even when some strata contain only a single treated or control unit. We also present tests improving the coverage of Wald tests when the number of clusters is moderate. The framework extends naturally to covariate-adjusted estimators via a variance orthogonality property.
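
The idea of aggregating outcomes within treatment groups before taking the difference can be sketched as a ratio estimator. This is a rough illustration under stated assumptions, not the authors' exact estimator: each cluster is a hypothetical record with its stratum, arm, size, and outcome total, and clusters are weighted by the inverse of the share of their stratum's clusters assigned to their arm.

```python
def hajek_estimate(clusters):
    """Ratio (Hájek-type) estimate of the average treatment effect.

    clusters: list of dicts with keys "stratum", "treated" (bool),
    "size" (number of individuals) and "total" (sum of their outcomes).
    """
    # Count treated and total clusters in each stratum.
    counts = {}
    for c in clusters:
        n_treated, n = counts.get(c["stratum"], (0, 0))
        counts[c["stratum"]] = (n_treated + c["treated"], n + 1)

    # Weighted outcome totals and weighted sizes, pooled within each arm.
    numerator = {True: 0.0, False: 0.0}
    denominator = {True: 0.0, False: 0.0}
    for c in clusters:
        n_treated, n = counts[c["stratum"]]
        share = n_treated / n if c["treated"] else (n - n_treated) / n
        weight = 1.0 / share  # assumed inverse-probability weight
        numerator[c["treated"]] += weight * c["total"]
        denominator[c["treated"]] += weight * c["size"]

    return numerator[True] / denominator[True] - numerator[False] / denominator[False]


# Hypothetical trial: two strata, one treated and one control cluster each.
trial = [
    {"stratum": "A", "treated": True, "size": 2, "total": 4.0},
    {"stratum": "A", "treated": False, "size": 4, "total": 4.0},
    {"stratum": "B", "treated": True, "size": 10, "total": 30.0},
    {"stratum": "B", "treated": False, "size": 10, "total": 10.0},
]
print(hajek_estimate(trial))
```

Because sizes enter both the numerator and the denominator, large clusters count in proportion to the individuals they contain, rather than each stratum contrast receiving a fixed weight.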

2606.10409 2026-06-10 stat.ME New submission

Robust Bayesian Predictive Model Selection using Bregman Divergence

Jongwoo Choi, Neil A. Spencer, Dipak K. Dey

AI summary: Because the log-score-based ELPD is sensitive to outliers and tail mismatch, the paper proposes a generalized ELPD framework based on Bregman divergence, using the β-divergence family to control the influence of low-density observations and achieve robust model selection.

Abstract

Predictive Bayesian model comparison often relies on leave-one-out (LOO) cross-validation criteria such as the expected log predictive density (ELPD). However, model rankings can be overly sensitive to outliers and tail mismatch because ELPD is based on the log score. We propose a score-matched generalized ELPD framework that replaces the log score by a Bregman scoring rule to update model parameters through a generalized posterior and to evaluate LOO predictive utility. Candidate posterior predictive distributions are ranked by out-of-sample utility under the chosen scoring rule, yielding a direct proper-score generalization of standard ELPD. We focus especially on the $β$-divergence family, where $β$ controls the sensitivity of predictive comparison to low-density observations. Under model misspecification, the procedure asymptotically selects the model whose predictive distribution is closest to the data-generating process under the chosen Bregman divergence. A simulation study and applications to microbial and forensic data show that the generalized ELPD can change the selected model through reduced sensitivity to low-density observations.

2606.11183 2026-06-10 math.ST math.DG stat.ME stat.TH New submission

Nonparametric Riemannian Empirical Bayes, and Denoising Measurements on Manifolds

Adam Quinn Jaffe, Leonardo V. Santoro, Bodhisattva Sen

AI summary: For denoising when latent variables and measurements lie on a manifold, the paper proposes a tangential Bayes denoiser based on a Tweedie-Eddington formula, builds a data-driven approximation using the Laplace-Beltrami operator, and shows that it nearly attains the Bayes risk in low noise, with convergence rates slower than in the Euclidean case.

Comments 56 pages, 11 figures. Abstract shortened to meet arXiv requirements. Comments welcome!

Abstract

We initiate the study of nonparametric empirical Bayes denoising methods in the setting where both the latent variables and their measurements lie on a compact Riemannian manifold, and where the likelihood is a Riemannian Gaussian distribution. Our starting point is a novel Tweedie-Eddington formula for Riemannian Gaussian mixture models which identifies a certain surrogate oracle denoiser in terms of the marginal distribution of the measurements; it avoids the explicit computation of the posterior Fréchet mean (as required by the Bayes denoiser) via a first-order approximation, hence we refer to it as the "tangential" Bayes denoiser. We show that this surrogate oracle achieves nearly the Bayes risk in a low-noise regime, we construct a fully data-driven approximation of it using the spectral theory of the Laplace-Beltrami operator, and we establish finite-sample rates of convergence for the distance between the surrogate oracle and its approximation. Contrasting the nearly-parametric rates from the Euclidean setting, the rates in the Riemannian setting are slower due to the singularities of the Riemannian Gaussian density at the cut locus of its Fréchet mean; in the special case of the circle we establish matching lower bounds which show that our proposed denoiser is minimax-optimal, and that the denoising problem exhibits a genuinely nonparametric rate of convergence. Lastly, we implement our methodology in two scientific applications: in astronomy, the sphere-valued problem of denoising the locations of gamma ray bursts; in structural biology, the torus-valued problem of denoising pairs of torsion angles of adjacent amino acids in a protein (i.e., the Ramachandran plot).

2606.10256 2026-06-10 physics.data-an hep-ex stat.AP New submission

Confidence, Statistical Evidence and Relative Belief with Applications to a Problem in Particle Physics

Michael Evans, Siqi Zheng

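AI summary: Proposes relative belief inference for constructing uncertainty quantification intervals in the Poisson signal-plus-background model, compares them with Feldman-Cousins intervals, and satisfies likelihood ordering and frequentist requirements.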

2605.19163 2026-06-10 stat.ME Version update

Progression to the mean: A comparison of Bayesian clinical prediction models outputting the posterior mean versus conventional plug-in predictions

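Chinese-listing title (translated): Progression to the mean: a practical Bayesian workflow for developing and deploying clinical prediction models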

Mohsen Sadatsafavi, Richard D. Riley

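AI summary: Proposes a practical Bayesian workflow for developing and deploying clinical prediction models, using shrinkage priors and decisions based on individual posterior means to improve predictive performance and uncertainty quantification.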

Comments 26 pages, 6 tables, 5 figures

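AI Chinese abstract (translated)

Clinical prediction models provide a prediction (for example, an estimated risk) for each individual, usually expressed as a point estimate from a deterministic function such as a logistic regression equation. Such 'plug-in' predictions hide inherent uncertainty. In contrast, Bayesian methods offer a mechanism for uncertainty quantification based on individual-specific posterior risk distributions. However, Bayesian prediction models are rarely used because of perceived subjectivity, computational cost, and implementation complexity. We therefore propose a practical Bayesian workflow for producing and deploying prediction models. Its main components are (i) shrinkage priors that lead to posterior distributions of the regression coefficients based on a Laplace/normal approximation, which avoids Monte Carlo sampling; and (ii) using the individual's posterior mean for decision-making, which is supported from an expected-utility perspective. For (i), we suggest priors with complementary features (simplicity, user input, automatic shrinkage). For (ii), we suggest exact and approximate methods for computing the posterior mean, including quadrature integration, the MacKay approximation, and an adaptation of projection predictive mapping that yields a simple logistic equation approximating the mean. Using examples and simulations, we show that the Bayesian workflow often matches or exceeds plug-in predictions in predictive performance while enabling uncertainty quantification with suitable coverage. In most simulations, posterior mean predictions gave higher clinical utility than plug-in predictions, sometimes substantially so. In conclusion, a Bayesian approach to clinical prediction modelling and deployment is both practical and clinically advantageous, and is therefore highly recommended.

Abstract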

Clinical prediction models provide predictions for individuals, typically expressed as point estimates derived from a deterministic function, such as a logistic regression equation. Such 'plug-in' predictions hide inherent uncertainty. In contrast, Bayesian methods offer a coherent mechanism for uncertainty propagation, and allow the computation of the posterior mean as the measure of centrality of choice for clinical decision-making. However, Bayesian methods are not widely utilised in predictive analytics for healthcare. We investigated the feasibility and performance of a Bayesian adaptation of the commonly used frequentist framework for risk prediction modelling. We assessed (i) the use of shrinkage priors with complementary features (simplicity, user input, and automatic shrinkage) that enable Laplace/normal approximation of the posterior, and (ii) exact and approximate methods for efficient computation of the posterior mean. Using examples and simulations, we demonstrate that this Bayesian approach is feasible and improves predictive performance, while enabling uncertainty quantification with suitable coverage. In small-to-medium sample sizes, the gain in clinical utility by using the posterior mean over plug-in predictions was equivalent to the gain from using a noticeably larger sample size. Adapting the widely used parametric regression methods to an approximate Bayesian framework for prediction modelling is both pragmatic and clinically advantageous.

2606.11013 2026-06-10 stat.ME New submission

Empirical stratification for treatment effect heterogeneity with post-treatment variables

Chao Cheng, Rui Wang, Yichi Zhang

AI summary: Proposes an assumption-lean empirical stratification framework that defines empirical scores from potential post-treatment variable responses predicted from baseline covariates, constructs identifiable empirical-stratum treatment effects, and connects them to principal causal effects.

Abstract

Post-treatment variables (PVs), such as treatment noncompliance, behavioral responses, and intercurrent events, often modify the ultimate treatment effect on the primary outcome. However, existing methods provide limited tools for studying treatment effect heterogeneity with respect to PVs. Conventional heterogeneous treatment effect estimands condition on baseline covariates, but similarly conditioning on the observed PV can induce endogenous selection bias in the treatment effect estimation. Principal stratification offers a rigorous framework for studying principal causal effects across principal strata, but principal strata are latent and their identification often requires stringent assumptions. This paper develops an assumption-lean empirical stratification framework for characterizing treatment effect heterogeneity with respect to PVs. We define empirical scores using the predicted potential PV responses based on baseline covariates, and use the empirical scores to construct empirically accessible subgroups. The resulting empirical-stratum treatment effects (ETEs) are identifiable under standard causal assumptions. We connect the proposed framework to principal stratification by showing that the average ETE recovers principal causal effects under the principal ignorability assumption, but remains informative under violations of this assumption. We further introduce projected ETE curves and develop efficient influence function-based estimators for semiparametric inference. We illustrate the proposed framework with two real-world applications.

2606.09892 2026-06-10 cs.LG stat.ME New submission

LMT: A Bayesian Framework for Causal Discovery from Textual Alarm Records in Manufacturing Systems

Xiaofeng Xiao, Jianhong Chen, Qiuzhuang Sun, Naichen Shi, Xubo Yue

Affiliations: Department of Mechanical & Industrial Engineering, Northeastern University, Boston, MA, USA; College of Integrative Studies, Singapore Management University, Singapore; Department of Industrial Engineering and Management Sciences, Department of Mechanical Engineering, Northwestern University, IL, USA

AI summary: Proposes the LMT framework, which combines semantic signals extracted by large language models with Poisson-process-based temporal evidence in a Bayesian approach to discover causal graphs from textual alarm records, and performs well in small-sample scenarios.

Comments 19 pages

Abstract

Textual event records, such as alarm logs, have become an increasingly common data source in engineering and manufacturing systems. Beyond identifying correlations or recurring patterns, engineers are often interested in understanding which types of events causally trigger or influence other events during system operation. Textual event descriptions may contain semantic clues about such causal relationships, and recent large language models (LLMs) provide a promising tool for extracting these signals. However, relying solely on LLM-encoded textual information is insufficient for accurate causal discovery, since semantic patterns do not directly reveal causal mechanisms and may confuse causation with correlation or frequent sequential patterns. To address these challenges, we propose \textbf{LMT}, a Bayesian causal discovery framework for engineering event data that jointly leverages textual descriptions and timestamps. Specifically, LMT first uses LLMs to extract semantic causal signals from event descriptions and constructs a prior distribution over causal graphs among event types or event clusters. It then incorporates temporal evidence through a Poisson-process-based likelihood, allowing the LLM-informed prior to be refined by timestamp-based statistical evidence. By integrating the textual and temporal information, LMT produces a causal graph that is both interpretable and data-supported. Simulation studies show that the proposed framework is effective across different settings and is especially advantageous in small-sample alarm-event scenarios.

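2606.10497 2026-06-10 stat.ME math.ST stat.TH (new submission)

Minimum free energy randomized design to improve covariate balance

Haolin Chen, Jun Yu

AI summary: Proposes a minimum free energy randomized design that trades off covariate balance against entropy maximization and pairs it with an efficient dynamic allocation algorithm, improving statistical efficiency and robustness.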

Comments 30 pages, 2 figures

Abstract

"Block what you can and randomize what you cannot" is the core principle for treatment effect estimation in randomized controlled trials. Although a wealth of allocation strategies has been developed, an explicit trade-off between the covariate balance achieved by blocking and the robustness guaranteed by randomization is seldom quantified. Motivated by the second law of thermodynamics, this work posits a new criterion that lowers the covariate imbalance while maximizing the entropy that quantifies contrast and allocation diversity. The resulting optimal strategy, termed the minimum free energy randomized design, is then derived, thereby formally achieving such a trade-off. To facilitate practical implementation, we further develop a computationally efficient dynamic allocation algorithm with theoretical guarantees. Using a finite-sample variance decomposition, the proposed randomization strategy is shown to control covariate imbalance while preventing unobserved heterogeneity from dominating the mean squared error, thus retaining minimax efficiency under the prescribed design constraints. Extensive numerical simulations demonstrate that our method achieves superior statistical efficiency and greater robustness than existing approaches.
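
As a rough illustration of the balance-versus-entropy trade-off described in this abstract, the textbook free-energy identity may help. It is a standard variational result, not necessarily the paper's exact criterion. The symbols are assumptions introduced here: $w$ ranges over a finite set of allocations, $D(w)$ is a hypothetical imbalance measure, and $T > 0$ is a hypothetical temperature.

```latex
\[
F(p) = \sum_{w} p(w)\, D(w) - T\, H(p),
\qquad
H(p) = -\sum_{w} p(w) \log p(w),
\]
\[
p^\star(w) = \frac{\exp\{-D(w)/T\}}{\sum_{w'} \exp\{-D(w')/T\}},
\qquad
F(p^\star) = -T \log \sum_{w} \exp\{-D(w)/T\}.
\]
```

Under these assumptions, small $T$ concentrates $p^\star$ on the best-balanced allocations, while large $T$ pushes it toward uniform randomization.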

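2501.17835 2026-06-10 stat.ME stat.AP (updated version)

An Estimator-Robust Design for Augmenting Randomized Controlled Trials with External Real-World Data

Sky Qiu, Jens Tarp, Andrew Mertens, Mark van der Laan

AI summary: Uses adaptive targeted maximum likelihood estimation (A-TMLE) together with a matching-based sampling strategy: the average treatment effect is decomposed into a pooled effect and a bias effect, and external patients are matched on the trial enrollment score and the propensity score in the external data, improving estimator robustness and confidence interval coverage.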

Abstract

Augmenting randomized controlled trials (RCTs) with external real-world data (RWD) has the potential to improve the finite sample efficiency of treatment effect estimators. We describe using adaptive targeted maximum likelihood estimation (A-TMLE) for estimating the average treatment effect (ATE) by decomposing the ATE estimand into two components: a pooled-ATE estimand that combines data from both the RCT and external sources, and a bias estimand that captures the conditional effect of RCT enrollment on the outcome. This approach views the RCT data as the reference and corrects for inconsistencies of any kind between the RCT and the external data source. Given the growing abundance of external RWD from modern electronic health records, determining the optimal strategy to select candidate external patients for data integration remains an open yet critical problem. In this work, we begin by studying the robustness property of the A-TMLE estimator and then propose a matching-based sampling strategy that attempts to improve the robustness of the estimator with respect to the target estimand. Our proposed strategy is outcome-blind and involves matching based on two one-dimensional scores: the trial enrollment score and the propensity score in the external data. We demonstrate in simulations that our sampling strategy improves the coverage and narrows the widths of confidence intervals produced by A-TMLE. We illustrate our method with a case study of augmenting the DEVOTE cardiovascular safety trial by using the Optum Clinformatics claims database.

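2606.10593 2026-06-10 stat.ME stat.CO (new submission)

Leave a window out: modifying the jackknife for predictive inference in time series

Hanyang Jiang, Rina Foygel Barber, Ashwin Pananjady, Yao Xie

Affiliations: Schools of Industrial and Systems Engineering and Electrical and Computer Engineering; Department of Statistics, University of Chicago

AI summary: To handle non-exchangeable data and predictors with memory in time series, proposes the leave-a-window-out (LWO) method, a modification of the jackknife that achieves valid coverage and yields narrower intervals than split conformal prediction.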

Comments 40 pages, 8 figures

Abstract

Conformal prediction methods enjoy strong theoretical and empirical predictive inference performance, provided the data is exchangeable and is treated symmetrically during training. However, these assumptions are impractical in many settings, such as time series, where temporal dependence violates exchangeability and it is preferable to use predictors that leverage dependence by treating data asymmetrically. Recent work shows that split conformal prediction is robust to these issues, but sample splitting can reduce accuracy, motivating the study of methods that do not rely on data splitting in the time series setting. In this work, we show that the vanilla leave-one-out jackknife can suffer arbitrary loss of coverage even in canonical time series models with mild temporal dependence. As a remedy, we propose a modification tailored to such settings, which we term the leave-a-window-out (LWO) method, and show that it can achieve valid coverage provided that the model-fitting procedure satisfies mild stability properties. Our proofs are based on quantifying the degree to which the data departs from cyclic exchangeability, which we introduce new coefficients to measure. Experiments on time series demonstrate that our method often enjoys valid coverage when the vanilla jackknife fails to cover, while producing much narrower intervals than split conformal prediction.

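2512.08232 2026-06-10 stat.ME math.PR math.ST stat.AP stat.TH (updated version)

Wishart kernel density estimation for strongly mixing time series on the cone of positive definite matrices

Léo R. Belzile, Christian Genest, Frédéric Ouimet, Donald Richards

AI summary: Proposes a Wishart kernel density estimator for density estimation on the cone of positive definite matrices. The estimator is boundary-aware and mitigates boundary bias; its mean squared error, uniform strong consistency, and asymptotic normality are established under mixing conditions, and a simulation study and a real-data example show it outperforms the alternatives.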

Comments 43 pages, 4 figures, 2 tables

Abstract

A Wishart kernel density estimator (KDE) is introduced for density estimation in the cone of positive definite matrices. The estimator is boundary-aware and mitigates the boundary bias suffered by conventional KDEs, while remaining simple to implement. Its mean squared error, uniform strong consistency on expanding compact sets, and asymptotic normality are established under the Lebesgue measure and suitable mixing conditions. This work represents the first study of density estimation for dependent data on this space under any metric. For independent observations, an asymptotic upper bound on the mean absolute error is also derived. A simulation study compares the performance of the Wishart KDE with that of the log-Gaussian KDE, another boundary-aware estimator based on the matrix-variate lognormal distribution proposed by Schwartzman [Int. Stat. Rev., 2016, 84(3), 456--486], and with the naive Gaussian KDE on the ambient Euclidean space. When estimating the stationary marginal density of a Wishart autoregressive process for several autoregressive coefficient matrices and innovation covariance matrices, the Wishart KDE exhibits the best overall accuracy and stability. The practical utility of the Wishart KDE is illustrated by estimating the marginal density of a one-year time series of realized covariance matrices computed from 5-minute intra-day returns on Amazon Corp. shares and on the Standard & Poor's 500 exchange-traded fund. All code is publicly available via the R package ksm to facilitate implementation of the method and reproducibility of the findings.

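2508.13972 2026-06-10 econ.EM stat.ME (updated version)

A Flexible Approach to Augmenting a Bayesian VAR with Nonlinear Factors

Todd Clark, Florian Huber, Gary Koop

AI summary: Proposes a vector autoregression augmented with nonlinear factors modeled nonparametrically by regression trees; the factor approach captures nonlinearities parsimoniously, reduces the risk of misspecification, permits efficient Bayesian computation, and accommodates the identification of structural shocks.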

Abstract

This paper proposes a vector autoregression augmented with nonlinear factors that are modeled nonparametrically using regression trees. There are four main advantages of our model. First, the use of factor methods ensures that departures from linearity are modeled parsimoniously. In particular, they exhibit functional pooling where a small number of nonlinear factors are used to model common nonlinearities across variables. Second, modeling potential nonlinearities nonparametrically lessens the risk of misspecification. Third, Bayesian computation using MCMC is straightforward even in very high-dimensional models, allowing for efficient, equation-by-equation estimation, thus avoiding computational bottlenecks that arise in popular alternatives such as the time-varying parameter VAR. Fourth, existing methods for identifying structural economic shocks in linear factor models can be adapted for the nonlinear case in a straightforward fashion using our model. Exercises involving artificial and macroeconomic data illustrate the properties of our model and its usefulness for forecasting and structural economic analysis.

2501.04339 2026-06-10 stat.ML cs.LG physics.app-ph (updated version)

Interpretable deep convolutional model for nonlinear multivariate time series in complex systems

Domjan Baric, Davor Horvatic

Affiliations: Department of Physics, Faculty of Science, University of Zagreb

AI summary: Proposes the DCIts architecture, which splits into Focuser and Modeler components to learn locally interpretable interaction structure in nonlinear multivariate time series, recovering stable signed, lag-specific interaction patterns while maintaining forecasting accuracy.

Comments 40 pages, 13 figures

DOI: 10.1063/5.0325209
Journal ref: Chaos 36, 063116 (2026)

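A sketch-and-project analysis of subsampled natural gradient algorithms

Gil Goldshlager, Jiang Hu, Lin Lin

Affiliations: University of California, Berkeley

AI summary: By viewing subsampled natural gradient descent (SNG) as a sketch-and-project method, introduces a new theoretical proxy based on squared volume sampling, shows that with a single mini-batch the expected SNG direction equals a preconditioned gradient descent step, derives global convergence guarantees with explicit rates, and attributes the advantage of SNG over SGD to more effective exploitation of spectral decay in the model Jacobian.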

Comments 26 pages, 7 figures

Abstract

Subsampled natural gradient descent (SNG) has been used to enable high-precision scientific machine learning, but standard analyses based on stochastic preconditioning fail to provide insight into realistic small-sample settings. We overcome this limitation by instead analyzing SNG as a sketch-and-project method. Motivated by this lens, we discard the usual theoretical proxy which decouples gradients and preconditioners using two independent mini-batches, and we replace it with a new proxy based on squared volume sampling. Under this new proxy we show that the expectation of the SNG direction becomes equal to a preconditioned gradient descent step even in the presence of coupling, leading to (i) global convergence guarantees when using a single mini-batch of any size, and (ii) an explicit characterization of the convergence rate in terms of quantities related to the sketch-and-project structure. These findings in turn yield new insights into small-sample settings, for example by suggesting that the advantage of SNG over SGD is that it can more effectively exploit spectral decay in the model Jacobian. We also extend these ideas to explain a popular structured momentum scheme for SNG, known as SPRING, by showing that it arises naturally from accelerated sketch-and-project methods.
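
The sketch-and-project viewpoint mentioned in this abstract is perhaps easiest to see on a linear system, where randomized Kaczmarz is its simplest instance. The following is a minimal sketch of that textbook special case only. It is not the SNG algorithm or the squared-volume-sampling proxy from the paper, and the function name and toy system are illustrative assumptions.

```python
import random


def randomized_kaczmarz(A, b, iters=2000, seed=0):
    """Solve a consistent system A x = b with one randomly sketched row per step."""
    rng = random.Random(seed)
    row_norms = [sum(v * v for v in row) for row in A]  # squared row norms
    x = [0.0] * len(A[0])
    for _ in range(iters):
        # Sketch: sample a single row with probability proportional to its squared norm.
        i = rng.choices(range(len(A)), weights=row_norms, k=1)[0]
        residual = b[i] - sum(a * xj for a, xj in zip(A[i], x))
        step = residual / row_norms[i]
        # Project: move x onto the hyperplane {z : A[i] . z = b[i]}.
        x = [xj + step * a for xj, a in zip(x, A[i])]
    return x


# Toy consistent system (hypothetical data) with known solution x_true.
A = [[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]]
x_true = [1.0, -2.0]
b = [sum(a * t for a, t in zip(row, x_true)) for row in A]
x_hat = randomized_kaczmarz(A, b)
```

Each iteration touches only one row of the system, which is the sense in which a small random sketch can stand in for the full problem.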

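2601.22814 2026-06-10 stat.CO (updated version)

Wasserstein Geometry of Information Loss in Nonlinear Dynamical Systems

Yiting Duan, Zhikun Zhang, Yi Guo

AI summary: Addresses the multi-valued evolution that arises when the time-delay reconstruction map of a nonlinear system is non-injective: a measure-theoretic framework quantifies the resulting ambiguity, an intrinsic stochasticity measure is introduced, and a $k$-nearest-neighbor estimator makes it computable at finite resolution.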

Abstract

Time-delay embedding is a powerful technique for reconstructing the dynamics of nonlinear systems. However, the reconstruction map is not always an embedding, a condition rarely verified in practice. When the reconstruction map is non-injective, multiple latent states may map to the same reconstructed state, leading to multi-valued $n$-step evolution. Consequently, the induced system no longer admits a deterministic closure, and the dispersion of future trajectories leads to ambiguity. In this work, we establish a measure-theoretic framework to quantify the ambiguity induced by multi-valued evolution and introduce intrinsic stochasticity to quantify the ambiguity over a finite horizon. For numerical implementation, we use the $k$-nearest-neighbor estimator to approximate intrinsic stochasticity under finite-resolution and finite-sampling settings. Numerical experiments on the synthetic and real-world datasets are consistent with the expectation: reconstructions closer to deterministic closure tend to produce lower scores, and deterministic predictors that take reconstructions with lower empirical closure scores as input are associated with lower rollout errors, suggesting that intrinsic stochasticity provides a new perspective for understanding failures of reconstruction and serves as a diagnostic for selecting reconstruction maps.

2506.03672 2026-06-10 stat.ML cs.LG math.OC (updated version)

Latent Guided Sampling for Combinatorial Optimization

Sobihan Surendran, Adeline Fermanian, Sylvain Le Corff

Affiliations: Sorbonne Université and Université Paris Cité, CNRS, Laboratoire de Probabilités, Statistique et Modélisation, F-75005 Paris, France; LOPF, Califrais' Machine Learning Lab, Paris, France

AI summary: Proposes LGS-Net, a latent space model, together with Latent Guided Sampling, an inference method combining Markov chain Monte Carlo and stochastic approximation, and reports state-of-the-art performance on routing tasks.

Journal ref: International Conference on Machine Learning, Jul 2026, Seoul, South Korea

Abstract

Combinatorial Optimization problems are widespread in domains such as logistics, manufacturing, and drug discovery, yet their NP-hard nature makes them computationally challenging. Recent Neural Combinatorial Optimization (NCO) methods leverage deep learning to learn policies for constructing solutions, trained via Supervised or Reinforcement Learning. While promising, these approaches often rely on task-specific augmentations, perform poorly on out-of-distribution instances, and lack robust inference mechanisms. Moreover, existing latent space models either require labeled data or use an instance-independent latent distribution. In this work, we propose LGS-Net, a novel latent space model that conditions on problem instances, and introduce an efficient inference method, Latent Guided Sampling (LGS), based on Markov Chain Monte Carlo and Stochastic Approximation. We show that the iterations of our method form a time-inhomogeneous Markov Chain and provide rigorous theoretical convergence guarantees. Empirical results on benchmark routing tasks show that our method achieves state-of-the-art performance among NCO baselines.

2606.11156 2026-06-10 stat.ML cs.LG (new submission)

Itô maps for any-step SDEs

Zhengkai Pan, Peter Potaptchik, Wenxi Yao, Michael S. Albergo, Jakiw Pidstrigach

Affiliations: Harvard University; University of Oxford; Kempner Institute

AI summary: Proposes Itô maps, any-step stochastic flow maps that predict future states in a single forward pass, enabling exact distillation of stochastic dynamics and supporting inference-time control and posterior sampling.

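2606.11044 2026-06-10 stat.ML cs.LG (new submission)

Generalized Conformal Predictive Systems Under Distributional Shifts

Jef Jonkers, Johanna Ziegel

Affiliations: IDLab, Department of Electronics and Information Systems, Ghent University; Seminar for Statistics, ETH Zurich, Switzerland

AI summary: To handle distributional shift, extends generalized conformal predictive systems by encoding the shift in observation-specific permutation weights, yielding shift-aware predictive systems, and introduces weight-uncertainty boxes to build robust envelopes of conformal predictive systems with finite-sample or asymptotic confidence guarantees.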

Comments 27 pages, 10 figures

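2606.10906 2026-06-10 stat.ML cs.AI cs.LG (new submission)

Human-AI Teaming Through the Lens of Calibration

Eric Nalisnick, Chi Zhang, Sophia Qian, Yixin Wang

Affiliations: Department of Computer Science, Johns Hopkins University; Department of Statistics, University of Michigan

AI summary: Analyzes human-AI teaming models through the lens of statistical calibration, finding that combination methods do not preserve the human's degree of calibration, whereas delegation methods shift the calibration burden onto the rejector meta-model, a burden that becomes unattainable when the human relies on information the system cannot observe.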

Comments 19 pages, 5 figures (including appendix)

Abstract

We study models for human-AI teaming through the lens of statistical calibration. We assume the team consists of an AI model and human -- both of which are calibrated with respect to some partitioning of the feature space -- and expose how the calibration assumptions propagate into the teaming framework. In particular, we consider frameworks that either (i) combine human and model predictions or (ii) delegate prediction responsibility to either a human or model. We show via theoretical and empirical results that existing methods for combination do not preserve the human's degree of calibration. Methods for delegation (by the very act of delegation) preserve calibration of the downstream predictors but shift the burden onto the rejector meta-model that decides who predicts. The rejector must be calibrated finely enough to locate where each member is superior, a demand that grows with the human's expertise and becomes unattainable when the human relies on information the system cannot observe.
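
One simple reading of "calibrated with respect to a partition" is that, within each cell of the partition, the mean predicted probability matches the observed outcome frequency. The sketch below checks that reading on toy data. It is an illustrative assumption rather than the paper's formal definition, and the function name, cell labels, and numbers are hypothetical.

```python
from collections import defaultdict


def calibration_gaps(cells, probs, outcomes):
    """Per-cell gap between mean predicted probability and observed frequency."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # [sum of probs, sum of outcomes, count]
    for cell, p, y in zip(cells, probs, outcomes):
        acc = sums[cell]
        acc[0] += p
        acc[1] += y
        acc[2] += 1
    return {cell: abs(sp / n - sy / n) for cell, (sp, sy, n) in sums.items()}


# Hypothetical predictor: well calibrated on cell "a", overly cautious on cell "b".
cells = ["a", "a", "a", "a", "b", "b"]
probs = [0.75, 0.75, 0.75, 0.75, 0.5, 0.5]
outcomes = [1, 1, 1, 0, 1, 1]
gaps = calibration_gaps(cells, probs, outcomes)
```

A gap of zero in every cell corresponds to calibration at the resolution of that partition; a finer partition is a stricter requirement.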

2606.10361 2026-06-10 stat.ML cs.LG (new submission)

Near-Exponential Convergence Rates for kNN Classification based on Boltzmann Margin

Luyuan Yang, Shayan Shafaei, Chao Lan

Affiliations: School of Computer Science, University of Oklahoma

AI summary: Introduces the Boltzmann margin condition, which lies between the Tsybakov and Massart margin conditions, and gives the first proof that kNN classifiers can attain near-exponential convergence rates.

Comments Conference on Uncertainty in Artificial Intelligence (UAI)

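2606.10187 2026-06-10 stat.ML cs.LG (new submission)

Decision-Calibrated Conformal Uncertainty for Pacing Decisions in Streaming Advertising

Prashant Shekhar, Caroline Howard

Affiliations: Department of Mathematics, Embry-Riddle Aeronautical University

AI summary: Proposes a decision-calibrated conformal framework that calibrates uncertainty by the largest impact of forecast error on the policies that could actually be deployed, proves that the resulting score is the smallest valid uncertainty measure protecting all deployable pacing policies, and substantially reduces uncertainty radii on public datasets.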

Abstract

We develop a decision-calibrated conformal framework for pacing decisions in streaming advertising. Pacing depends on uncertain future inventory, demand pressure, incremental response, and member-experience load. Instead of calibrating a generic forecast residual, the framework measures forecast error by its largest impact on the policies that could actually be deployed. The main theorem shows that the proposed score is the smallest valid uncertainty measure that uniformly protects all deployable pacing policies. Geometrically, it is the support function of the signed policy sensitivity set. Split conformal calibration gives finite-sample coverage for this score. A high-dimensional separation theorem shows that traditional residual calibration can be arbitrarily more conservative by paying for nuisance inventory dimensions, and a robust pacing result combines inventory, response, and experience uncertainty. On public-data-calibrated pacing replays built from Criteo Uplift and KuaiRand datasets, traditional conformal pacing remains unresolved with high residual radii of 7236.7 on Criteo and 4629.4 on KuaiRand. With the proposed decision calibration approach, the uncertainty radii are reduced to 18.4 and 278.6 respectively, with separate margins for value, delivery, budget, and member load. On Criteo, the proposed method certifies a less aggressive pacing policy than the point-forecast baseline, and reduces held-out any-violation rate from 16.7% to 3.3%, with zero budget and member-load violations. On KuaiRand, the choice remains unresolved. In a nutshell, the paper establishes that forecasts, response estimates, and member-experience models should be judged by whether they shrink the uncertainty that the pacing decision uses, as this leads to confident decisions that are not overly conservative.
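
Split conformal calibration, which this abstract relies on for finite-sample coverage, reduces to taking a particular order statistic of held-out calibration scores. The sketch below shows that generic step only. It uses plain absolute residuals as scores rather than the paper's decision-calibrated score, and the function name and data are illustrative assumptions.

```python
import math


def split_conformal_radius(cal_scores, alpha):
    """Return the conformal quantile of calibration scores at miscoverage level alpha."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the order statistic to use
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(cal_scores)[k - 1]


# Hypothetical absolute residuals |y - prediction| on a held-out calibration set.
cal_scores = [0.3, 1.2, 0.7, 0.1, 2.5, 0.9, 0.4, 1.8, 0.6]
radius = split_conformal_radius(cal_scores, alpha=0.2)
```

Under exchangeability, a new score falls at or below `radius` with probability at least `1 - alpha`, so a smaller score distribution translates directly into a tighter radius.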

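2606.10125 2026-06-10 stat.ML cs.DB cs.LG (new submission)

Robust Active Learning for Few-Shot Example Selection in Text-to-SQL

Arash Pourhabib

Affiliations: NVIDIA

AI summary: For few-shot example selection in text-to-SQL, proposes a robust active learning method in which a stratified greedy algorithm maximizes a heteroscedastic mutual information objective, achieving a constant-factor approximation guarantee on the embedding manifold and substantially reducing annotation cost.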

Comments 31 pages, 4 figures, 5 tables

Abstract

Few-shot example retrieval is the dominant paradigm for grounding large language models (LLMs) in domain-specific text-to-SQL systems. However, the quality of the annotated example bank directly governs system accuracy, and expert annotation is prohibitively expensive. We formalize the active selection of these examples as a constrained experimental design problem over the intrinsic, low-dimensional manifold of semantic query embeddings. Unlike standard active learning frameworks, our setting introduces three critical challenges: varying, query-dependent annotation reliability (heteroscedasticity), strict requirements for spatial diversity across semantic topics (partition matroid constraints), and the inherent reality that the true covariance structure of the embedding space is unknown (misspecification). To address these, we propose a stratified greedy algorithm that maximizes a heteroscedastic mutual information objective. We prove that this objective remains submodular and approximately monotonic on the intrinsic manifold, yielding a theoretical constant-factor approximation guarantee. We establish a spectral bound demonstrating that this approximation guarantee degrades gracefully, rather than catastrophically, when the assumed surrogate kernel diverges from the true underlying data-generating process. Empirical results demonstrate that the proposed strategy significantly reduces labeling effort while maintaining high text-to-SQL retrieval accuracy.

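2606.11057 2026-06-10 cs.LG q-bio.BM stat.ML (new submission)

Flexible Kernels for Protein Property Prediction

Martin Jankowiak, Yerdos Ordabayev, Rudraksh Tuwani, Henry N. Ward, Hunter Nisonoff, James M. McFarland, Gevorg Grigoryan

Affiliations: University of Cambridge

AI summary: Proposes sequence kernels that exploit evolutionary substitution matrices and local linearity, combined with Gaussian processes for data-efficient protein property prediction, and incorporates structural information for multi-task learning.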

Comments 50 pages; to appear at ICML 2026

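2606.10913 2026-06-10 cs.LG stat.ML (new submission)

Conservation Laws from Data Symmetry in Neural Networks

Jakob Galley, Vahid Shahverdi, Axel Flinth

Affiliations: Umeå University

AI summary: Studies whether symmetries of the training data give rise to conserved quantities under gradient-flow training. For analytic, non-polynomial loss functions, data symmetry generically yields no additional conserved quantities; for the mean squared error loss, data augmentation can yield additional ones, a phenomenon described using the framework of tensorizable networks.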

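2606.10734 2026-06-10 cs.LG stat.ME stat.ML (new submission)

SPACR: Single-Pass Adaptive Training of Uncertainty-Aware Conformal Regressors

Soundouss Messoudi, Sylvain Rousseau, Sébastien Destercke

Affiliations: Heudiasyc - UMR CNRS 7253, Université de Technologie de Compiègne

AI summary: Proposes SPACR, which trains an uncertainty-aware regressor directly with a differentiable loss that jointly optimizes efficiency and validity, without batch splitting or a predefined confidence level. A single model supports prediction intervals at multiple confidence levels at inference time, and experiments show narrower intervals, a better coverage-efficiency trade-off, and lower computational cost.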

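2606.09885 2026-06-10 cs.LG stat.ML (new submission)

TENP: Trapezoidal Expert Neuron Pruning For Mixture-of-Experts

Jiangyang He, Shaolin Zhu, Deyi Xiong

Affiliations: TJUNLP Lab, School of Computer Science and Technology, Tianjin University

AI summary: Proposes the TENP framework, which identifies important experts and applies neuron pruning to the remaining experts, retaining parameters in a trapezoidal pattern. At 40% routing expert sparsity and an average of 63.76% activated expert parameters, the DeepSeek model loses only 1 point of accuracy and improves by 10% on code generation tasks.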

Abstract

Mixture-of-Experts large language models (LLMs) scale efficiently through sparse activation, yet their deployment is fundamentally constrained by the large static parameter footprint of experts. Existing compression approaches either remove entire experts, disrupting routing topology and harming performance, or rely on unstructured weight pruning with limited practical efficiency. To address these limitations, we propose TENP, a structured Trapezoidal Expert Neuron Pruning framework. Using a few samples, we identify and retain important experts, while applying expert neuron pruning (ENP) to less important experts, reserving model parameters in a trapezoidal pattern from shallow to deep layers. When evaluating expert importance, we jointly consider both the magnitude of the expert output and its ability to change the direction of the input vector. For ENP, we measure each neuron's projected contribution to the expert output to identify and retain important neurons. We conduct extensive experiments on the Qwen and DeepSeek models. Under a routing expert sparsity of 40% and an average of 63.76% activated expert parameters, the DeepSeek model suffers only a 1-point drop in accuracy compared to the full-parameter model. Moreover, it outperforms the full-parameter model by 10% on code generation tasks.

2606.09875 2026-06-10 cs.LG cs.AI stat.ML (new submission)

Integrating Local and Global Entropy for Uncertainty Quantification in LLMs

Johanne Medina, Tianyi Zhou, Keivin Isufaj, Aristides Gionis, Sanjay Chawla

AI summary: Proposes GLU, which quantifies LLM uncertainty by fusing hidden-state geometric entropy (global) with token-level entropy (local), capturing the confident-but-wrong failure mode without additional training.

Comments 17 pages, 2 figures

Abstract

Large language models hallucinate confidently, making uncertainty quantification (UQ) essential for reliable deployment. Existing methods rely predominantly on token-level signals, leaving the geometric structure of intermediate hidden states underused. In this paper, we take the geometric complexity of hidden-state matrices as a measure of the global uncertainty of LLMs, while treating token-level uncertainty estimation as a local metric. We show that hidden-state geometric entropy (global uncertainty) and token-level entropy (local uncertainty) are statistically near-orthogonal, capturing distinct failure regimes for reliability prediction. In particular, global geometry recovers the confident-but-wrong failure mode that local signals systematically miss. Building on this, we propose Global-Local Uncertainty (GLU), an unsupervised, single-pass score that fuses the two signals via a multiplicative gate. Across three model families and six benchmarks, GLU matches or outperforms all unsupervised baselines while requiring only a single forward pass and remaining length-normalized and architecture-agnostic.

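2606.09856 2026-06-10 cs.CL cs.AI cs.LG stat.ML (new submission)

Using Probabilistic Programs to Train Inductive Reasoning in Large Language Models

Liyi Zhang, Akshay K. Jagadish, Brenden M. Lake, Thomas L. Griffiths

AI summary: Proposes Program-based Posterior Training (PPT): an LLM generates scenarios as probabilistic programs, probabilistic inference produces distributional targets, and the model is fine-tuned on them to improve inductive reasoning accuracy, alignment with human judgments, and calibration.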

Comments 20 pages, 5 figures

Abstract

Post-training Large Language Models (LLMs) for reasoning typically focuses on deductive tasks such as mathematics and coding where correctness is verifiable. Yet, many real-world reasoning problems are inductive: agents must infer uncertain beliefs from sparse, ambiguous observations. There are challenges to using standard fine-tuning methods for inductive reasoning, including difficulties in curating large-scale, high-quality labeled datasets and in handling targets that are inherently distributional. In this work, we introduce a novel approach, called Program-based Posterior Training (PPT), to address these limitations: we use an LLM to generate diverse open-world scenarios as probabilistic programs, run probabilistic inference to produce distributional target responses to queries, and then fine-tune on these probabilistic soft labels. Using this approach, we fine-tune LLMs on 10,000 programmatically generated scenarios and evaluate on held-out motifs, human-labeled judgments, and external benchmarks. Overall, PPT substantially improves estimation accuracy on held-out inductive tasks, increases alignment with human judgments, and transfers to external benchmarks for estimation and calibration. Additionally, the gains in raw calibration are not subsumed by post-hoc temperature scaling, showing that the models have more deeply internalized uncertainty compared to output rescaling. Together, these results suggest that probabilistic-program-mediated fine-tuning is a promising approach for post-training LLMs to reliably perform approximate inductive inference.

2606.10944 2026-06-10 cs.LG cs.DS math.ST stat.ME stat.ML stat.TH 新提交

Express Language Modeling

Express 语言建模

Albert Gong, Annabelle Michael Carrell, Raaz Dwivedi, Lester Mackey

AI总结提出 Express 工具，将非因果注意力近似转换为因果近似，结合 Thinformer 实现最优因果注意力保证，并加速语言建模中的四个资源瓶颈。

2606.10916 2026-06-10 stat.ML cs.LG math.ST stat.ME stat.TH 新提交

Range Penalization: Theoretical Insights with Applications in Federated Learning

范围惩罚：理论洞见及其在联邦学习中的应用

Yiyuan She, Zhaojun Hu, Yifan Sun

发表机构 * University of California, Berkeley（加州大学伯克利分校）

AI总结提出范围正则化方法，通过极值聚类实现跨客户端正则化，并开发非渐近统计精度与模式恢复的新证明技术，以及利用局部强凸性的快速优化算法。

2606.10295 2026-06-10 stat.ML cs.LG math.ST stat.TH 新提交

$k$-Nearest Neighbors in Gromov–Wasserstein Space

Gromov–Wasserstein空间中的$k$-最近邻

Kaitlyn Hohmeier, Nicolas Fraiman, Caroline Moosmueller

发表机构 * University of North Carolina at Chapel Hill, Department of Mathematics（北卡罗来纳大学教堂山分校数学系）； University of North Carolina at Chapel Hill, Department of Statistics and Operations Research（北卡罗来纳大学教堂山分校统计与运筹学系）

AI总结本文在Gromov-Wasserstein距离框架下实现k-最近邻分类，证明了度量测度空间和图上分类器的普适一致性，并通过实验验证了其有效性。

详情

AI中文摘要

非线性最小二乘中基于学习特征几何的泛化性

Ayub Kharel, Ilja Kuzborskij, Patrick Rebeschini, Yasin Abbasi-Yadkori

发表机构 * University of Oxford（牛津大学）； Google DeepMind（谷歌DeepMind）； Sapient Intelligence（智睿科技）

AI总结通过算法稳定性分析岭正则化非线性最小二乘的泛化误差，利用经验雅可比Gram矩阵和残差曲率项定义数据依赖的有效维度，并证明其与内在维度而非参数数量相关。

Comments Preprint, under review

详情

AI中文摘要

我们通过平均算法稳定性研究了岭正则化非线性最小二乘模型的泛化性，推导了局部极小值点的误差界，该误差界依赖于数据依赖的有效维度，该维度通过经验雅可比Gram矩阵和残差-曲率项反映了训练参数处梯度模型的几何结构。在线性情况下，曲率项消失，这恢复了雅可比核协方差的经典有效维度，但评估的是训练后的模型而非初始化时的模型（如神经正切核分析中常见）。我们进一步通过梯度特征的覆盖复杂度来界定该有效维度，从而得到依赖于学习几何而非参数数量的保证。特别地，对于流形支持的数据和分段Lipschitz雅可比矩阵，界限随内在维度缩放；而对于单隐层ReLU网络，该机制可通过激活稳定区域的数量显式表达。在合成流形、聚类分布和基准数据集上的实验展示了训练后雅可比矩阵的压缩、残差-曲率线性化的紧致性，以及稳定性界限与观测泛化差距的一致性。我们界限的一个关键特征是推导的简洁性，它基于强对数凹噪声下的Brascamp-Lieb不等式从第一性原理得出。

英文摘要

We study the generalization of ridge-regularized nonlinear least-squares models via on-average algorithmic stability, deriving error bounds for local minimizers in terms of a data-dependent effective dimension that reflects the geometry of the gradient model at the trained parameters, through the empirical Jacobian Gram matrix and a residual-curvature term. In the linear case, where the curvature term vanishes, this recovers the classical effective dimension of the Jacobian kernel covariance, but evaluated at the trained model rather than at initialization as is typical in neural tangent kernel analyses. We further bound this effective dimension via covering complexity of the gradient features, leading to guarantees that depend on learned geometry rather than parameter count. In particular, for manifold-supported data and piecewise Lipschitz Jacobians, the bounds scale with intrinsic dimension, while for one-hidden-layer ReLU networks, the mechanism can be made explicit through counts of activation-stable regions. Experiments on synthetic manifolds, clustered distributions, and benchmark datasets illustrate trained-Jacobian compression, the tightness of the residual-curvature linearization, and agreement between the stability bound and observed generalization gaps. A key feature of our bounds is the simplicity of their derivation, which follows from first principles using the Brascamp-Lieb inequality under strongly log-concave noise.

2606.04212 2026-06-10 cs.LG stat.ML 版本更新

Edge of Stability Selectively Shapes Learning Across the Data Distribution

稳定性边缘选择性地塑造数据分布上的学习

Shauna Kwag, Anakha Ganesh, Tomaso Poggio, Pierfrancesco Beneventano

发表机构 * MIT（麻省理工学院）

AI总结本文发现优化中的稳定性边缘（EoS）具有选择性，通过分支干预因果证明了EoS在训练数据子集间重新分配学习，并识别了受益组需满足的两个条件：梯度与Hessian主特征向量对齐，以及梯度幅度持续非零。

Comments ICML HiLD 2026; 27 pages, 22 figures

详情

AI中文摘要

现有对稳定性边缘（EoS）的分析将其视为优化的全局属性。我们表明它也具有选择性：稳定性约束在训练分布的各个子集之间重新分配学习，放大某些组上的进展，同时抑制其他组上的进展。通过从同一训练状态进入或退出EoS区间的分支干预，我们以因果方式证明了这种权衡，并识别了一个组受益的两个必要条件。首先，其聚合梯度必须与Hessian的主特征向量（最大特征值对应的特征向量）对齐。我们通过一个受控扰动隔离了这一机制：该扰动保持距离但随机化方向，从而破坏对齐并消除优势。其次，该组必须随时间保持不消失的梯度幅度。在交叉熵损失下，梯度饱和使被高置信度分类的组解耦，将优势转移到输出离群样本上，后者的梯度持续存在。总之，这些结果表明EoS不仅是稳定性边界，也是支配学习在数据分布上如何分配的机制。

英文摘要

Existing analyses of the edge of stability (EoS) treat it as a global property of optimization. We show that it is also selective: the stability constraint redistributes learning across subsets of the training distribution, amplifying progress on some groups while suppressing progress on others. Using a branching intervention that enters or exits the EoS regime from the same training state, we causally demonstrate this trade-off and identify two necessary conditions for a group to benefit. First, its aggregate gradient must align with the top Hessian eigenvector. We isolate this mechanism with a controlled perturbation that preserves distance but randomizes direction, destroying alignment and eliminating the advantage. Second, the group must sustain non-vanishing gradient magnitude over time. Under cross-entropy loss, gradient saturation decouples confidently classified groups, shifting the advantage to output-outliers, whose gradients persist. Together, these results show that EoS functions not only as a stability boundary, but as a mechanism governing the allocation of learning across the data distribution.

2605.17189 2026-06-10 stat.ML cs.IT cs.LG math.IT math.ST stat.TH 版本更新

Sample-efficient inductive matrix completion with noise and inexact side-information

具有噪声和不精确侧信息的样本高效归纳矩阵补全

Yuepeng Yang, Cong Ma

发表机构 * Yale Department of Statistics and Data Sciences, Yale University（耶鲁大学统计与数据科学系）； UChicago Department of Statistics, University of Chicago（芝加哥大学统计系）

AI总结本文研究了在存在噪声和不精确侧信息的情况下，通过非凸投影梯度下降算法实现样本高效的归纳矩阵补全，提出了一个适用于有效问题规模的正则性条件，实现了线性收敛和估计误差仅依赖于有效问题规模的结论。

详情

AI中文摘要

低秩矩阵补全是一个广泛研究的问题，具有许多变体。归纳矩阵补全（IMC）结合了行和列的侧信息以显著缩小搜索空间。先前的工作分为两个领域：利用这种结构实现减少样本复杂度的方法，但仅适用于无噪声环境；以及处理噪声但需要样本复杂度与环境矩阵维度相匹配的方法，从而放弃了侧信息应提供的样本效率。在本文中，我们通过研究具有噪声的IMC并使用非凸投影梯度下降算法进行谱初始化来填补这一差距。我们的主要技术贡献是建立一个适用于由有效问题规模决定的减少样本复杂度的IMC损失函数的正则性条件，其规模与侧信息维度而非环境维度成比例。这直接导致了线性收敛和估计误差仅依赖于有效问题规模而非环境矩阵维度。我们进一步将分析扩展到不精确侧信息设置，证明减少的样本复杂度得以保持，并且估计误差在不精确性方面是最佳的。广泛的模拟和在MovieLens数据集上的实际实验验证了我们的理论发现。

英文摘要

Inductive matrix completion (IMC) is a variant of low-rank matrix completion that incorporates row and column side-information. In principle, it can reduce the effective dimension of the recovery problem from the ambient matrix size to the dimension of the side-information features. Existing theory, however, does not fully realize this advantage in the noisy setting: sample-efficient guarantees only apply to noiseless recovery, while noisy guarantees require sample sizes comparable to ordinary matrix completion. This paper closes this gap for noisy IMC. We analyze a nonconvex projected gradient descent algorithm with spectral initialization and prove that, under exact side-information, it achieves linear convergence and stable recovery at a sample complexity governed by the effective side-information dimension rather than the ambient matrix dimension. The key technical ingredient is a local regularity condition for the IMC loss that holds at this reduced sample size, despite the mismatch between the observation pattern and the side-information subspaces. We further extend the analysis to inexact side-information, showing that the same reduced sample complexity is preserved and that the estimation error degrades optimally with the level of subspace misspecification. Motivated by this trade-off, we also propose a penalized interpolation between IMC and ordinary matrix completion that balances sample efficiency against robustness to imperfect side-information. Simulations and experiments on the MovieLens dataset support the theoretical findings and illustrate the practical benefits of exploiting side-information in low-sample regimes.

2603.02673 2026-06-10 stat.ML cs.LG 版本更新

神经算子混合体降低算子学习中的主动复杂度

Anastasis Kratsios, Takashi Furuya, Jose Antonio Lara Benitez, Matti Lassas, Maarten de Hoop

发表机构 * McMaster University and Vector Institute（麦克马斯特大学和向量研究所）； Shimane University（岛根大学）； Rice University（莱斯大学）； University of Helsinki（赫尔辛基大学）

AI总结通过路由混合神经算子（MoNO）与固定单神经算子构造的比较，证明MoNO在主动专家规模上具有更优的深度、宽度和秩缩放，且对Lipschitz目标这些量以O(ε^{-1})为界。

详情

AI中文摘要

算子学习系统并非仅由总参数数量决定；对于一次查询，相关瓶颈可能是必须加载和评估的模型。我们通过路由混合神经算子（MoNO）与固定单神经算子构造之间的建设性比较，在紧致Sobolev子集上研究了经典神经算子的这一区别。该比较涉及相对于基线的专家主动复杂度，其中总存储大小和路由搜索分别考虑。MoNO将每个输入函数通过树路由到一个专家。我们的主要定理表明，在近似集上，每个具有有界输出Sobolev半径的标量一致连续非线性算子都存在一个MoNO近似，其主动专家具有比所分析的单神经算子构造更小的深度、宽度和秩缩放；对于Lipschitz目标，这些专家量以$\mathcal{O}(\varepsilon^{-1})$为界。该定理将局部化转化为主动专家大小、路由深度和专家数量的算子级核算。我们还证明了底层神经算子架构的定量通用近似定理，明确依赖于紧集直径和连续模。

英文摘要

Operator-learning systems are not governed solely by total parameter count; for one query, the relevant bottleneck can be the model that must be loaded and evaluated. We study this distinction for classical neural operators on compact Sobolev subsets through a constructive comparison between routed mixtures of neural operators (MoNOs) and a fixed single-neural-operator construction. The comparison concerns expert-active complexity relative to that baseline, with total stored size and routing search accounted separately. A MoNO routes each input function through a tree to one expert. Our main theorem shows that every scalar uniformly continuous nonlinear operator with bounded output Sobolev radius on the approximation set admits a MoNO approximation whose active expert has smaller depth, width, and rank scaling than the analyzed single-neural-operator construction; for Lipschitz targets these expert quantities are bounded by $\mathcal{O}(\varepsilon^{-1})$. The theorem turns localization into an operator-level accounting of active expert size, routing depth, and number of experts. We also prove a quantitative universal approximation theorem for the underlying neural-operator architecture, with explicit dependence on compact-set diameter and modulus of continuity.

2503.11553 2026-06-10 math.OC stat.ML 版本更新

Infinity-norm-based Input-to-State-Stable Long Short-Term Memory networks: a thermal systems perspective

基于无穷范数的输入到状态稳定的长短期记忆网络：热系统视角

Stefano De Carli, Davide Previtali, Leandro Pitturelli, Mirko Mazzoleni, Antonio Ferramosca, Fabio Previdi

AI总结本文提出基于无穷范数的输入到状态稳定性条件，改进LSTM网络稳定性，通过惩罚项和早停策略提升热系统建模性能，优于物理模型和GRU网络。

Comments Accepted for publication in the proceedings of the European Control Conference 2025 (ECC25). 8 pages, 3 figures and 1 table

详情

DOI: 10.23919/ECC65951.2025.11187211
Journal ref: 2025 European Control Conference (ECC), 2025, pp. 911-916

AI中文摘要

递归神经网络（RNNs）在系统辨识中表现出色，尤其在非线性动力学系统如热过程方面。然而，稳定性在实际应用中仍是一个关键挑战：尽管底层过程可能本质上是稳定的，但所得RNN模型可能无法保证捕捉这种行为。本文通过推导基于无穷范数（ISS∞）的输入到状态稳定性条件，解决了稳定性问题。所获得的条件依赖于比先前工作更少的网络参数。开发了ISS∞促进的训练策略，将惩罚项纳入损失函数以促进稳定性，并采用一种自定义的早停方法。通过热系统案例研究验证了训练后的LSTM模型质量，其中ISS∞促进的LSTM网络在性能上优于物理模型和ISS∞促进的门控循环单元（GRU）网络，同时优于非ISS∞促进的LSTM和GRU RNN。

英文摘要

Recurrent Neural Networks (RNNs) have shown remarkable performance in system identification, particularly in nonlinear dynamical systems such as thermal processes. However, stability remains a critical challenge in practical applications: although the underlying process may be intrinsically stable, there may be no guarantee that the resulting RNN model captures this behavior. This paper addresses the stability issue by deriving a sufficient condition for Input-to-State Stability based on the infinity-norm (ISS$_{\infty}$) for Long Short-Term Memory (LSTM) networks. The obtained condition depends on fewer network parameters compared to prior works. An ISS$_{\infty}$-promoted training strategy is developed, incorporating a penalty term in the loss function that encourages stability and an ad hoc early stopping approach. The quality of LSTM models trained via the proposed approach is validated on a thermal system case study, where the ISS$_{\infty}$-promoted LSTM outperforms both a physics-based model and an ISS$_{\infty}$-promoted Gated Recurrent Unit (GRU) network while also surpassing non-ISS$_{\infty}$-promoted LSTM and GRU RNNs.

2509.17251 2026-06-10 stat.ML cs.LG 版本更新

Risk Comparisons in Linear Regression: Implicit Regularization Dominates Explicit Regularization

线性回归中的风险比较：隐式正则化主导显式正则化

Jingfeng Wu, Peter L. Bartlett, Sham M. Kakade, Jason D. Lee, Bin Yu

发表机构 * University of California, Berkeley（加州大学伯克利分校）； Harvard University（哈佛大学）； Google DeepMind（谷歌DeepMind）

AI总结本文通过实例比较线性回归中梯度下降、岭回归和随机梯度下降的有限样本风险，发现梯度下降优于岭回归，但与随机梯度下降不可比，且在某些问题中梯度下降可能更差。

Comments Accepted for presentation at the Conference on Learning Theory (COLT) 2026

详情

AI中文摘要

现有理论表明，对于按容量条件和源条件分类的线性回归问题，梯度下降（GD）始终是极小极大最优的，而岭回归和在线随机梯度下降（SGD）在其中某些类别的问题上是多项式次优的。本文超越极小极大理论，对任何设定正确（well-specified）的线性回归问题，给出了这些算法有限样本风险的逐实例比较。我们的分析得出三个关键发现。首先，GD 占优于岭回归：在正则化程度相当时，GD 的过剩风险始终不超过岭回归过剩风险的常数倍，而岭回归即使经过最优调参也可能多项式地更差。其次，GD 与 SGD 不可比。虽然已知在某些问题上 GD 可以多项式地优于 SGD，但反之亦然：我们受良性过拟合理论启发构造了一些问题，在这些问题上最优停止的 GD 多项式地更差。最后，对于一类重要的子问题（协方差谱快速且连续衰减的问题），GD 占优于 SGD，这包括所有满足标准容量条件的问题。

英文摘要

Existing theory suggests that for linear regression problems categorized by capacity and source conditions, gradient descent (GD) is always minimax optimal, while both ridge regression and online stochastic gradient descent (SGD) are polynomially suboptimal for certain categories of such problems. Moving beyond minimax theory, this work provides instance-wise comparisons of the finite-sample risks for these algorithms on any well-specified linear regression problem. Our analysis yields three key findings. First, GD dominates ridge regression: with comparable regularization, the excess risk of GD is always within a constant factor of that of ridge, but ridge can be polynomially worse even when tuned optimally. Second, GD is incomparable with SGD. While it is known that for certain problems GD can be polynomially better than SGD, the reverse is also true: we construct problems, inspired by benign overfitting theory, where optimally stopped GD is polynomially worse. Finally, GD dominates SGD for a significant subclass of problems -- those with fast and continuously decaying covariance spectra -- which includes all problems satisfying the standard capacity condition.

2505.11702 2026-06-10 cs.LG stat.ML 版本更新

Post-Training Augmentation Invariance

训练后增强不变性

Keenan Eikenberry, Lizuo Liu, Yoonsang Lee

发表机构 * Department of Mathematics, Dartmouth College（达特茅斯学院数学系）

AI总结提出训练后增强不变性框架，通过轻量级MLP适配器网络在预训练模型潜空间上实现近似不变性，无需微调且保持原始特征。

详情

AI中文摘要

本文开发了一个训练后增强不变性的框架，其目标是为预训练网络添加不变性属性，同时不改变其在原始非增强输入分布上的行为。我们精确定义了这一概念，并引入了增强编码器，这是一种概率编码器，形式化了基于增强的编码过程，并作为我们的基本研究对象。我们提出了两种增强编码器的损失函数，即马尔可夫-瓦瑟斯坦最小化和瓦瑟斯坦相关性最大化，并通过实验证明，这两种损失函数可用于训练轻量级的单隐藏层MLP适配器网络$E_{\theta}$，当将其附加到预训练网络$F$的潜空间时，确实能实现（近似）训练后增强不变性。例如，在STL10上使用$F=\text{DINO}$特征时，复合网络$C\circ E_{\theta}\circ F$（其中$C$是线性分类器，$E_{\theta}$是我们提出的适配器网络之一）在任意旋转图像上达到94%的分类准确率，而没有适配器$E_{\theta}$的$C\circ F$网络则降至71%。类似地，我们可以将噪声不变分类结果从58%提升至86%。重要的是，我们无需微调即可获得这些结果（$F$的权重全程冻结），并且我们的方法对原始特征的破坏很小，因为$E_{\theta}$在非增强潜分布上几乎等距作用。相比之下，我们展示了使用替代候选损失函数（特别是SimCLR和HSIC最大化）训练的适配器网络产生了不具竞争力的分类结果，并从根本上破坏了原始潜空间。代码见 https://github.com/keenan-eikenberry/augmentation_invariance

英文摘要

This work develops a framework for post-training augmentation invariance, in which our goal is to add invariance properties to a pretrained network without altering its behavior on the original, non-augmented input distribution. We define this notion precisely and additionally introduce augmented encoders, which are probabilistic encoders that formalize augmentation-based encoding processes and that serve as our fundamental object of study. We introduce two losses for augmented encoders, namely, Markov-Wasserstein minimization and Wasserstein correlation maximization, and we demonstrate empirically that both losses can be used to train lightweight, one-hidden-layer MLP adapter networks $E_{\theta}$ that, when appended to the latent space of a pretrained network $F$, do indeed lead to (approximate) post-training augmentation invariance. For example, on STL10 with $F=\text{DINO}$ features, the composite network $C\circ E_{\theta}\circ F$, where $C$ is a linear classifier and where $E_{\theta}$ is one of our proposed adapter networks, achieves 94% classification accuracy on arbitrarily rotated images, whereas a network of the form $C\circ F$ without the adapter $E_{\theta}$ drops to 71% accuracy. Similarly, we can boost noise-invariant classification results from 58% up to 86%. Significantly, we obtain these results with no fine-tuning (the weights of $F$ remain frozen throughout), and our methods introduce little corruption to the original features, since $E_{\theta}$ acts nearly isometrically on the non-augmented latent distribution. In contrast, we show that adapter networks trained with alternative candidate losses, specifically SimCLR and HSIC maximization, produce uncompetitive classification results and fundamentally corrupt the original latent space. Code available at https://github.com/keenan-eikenberry/augmentation_invariance

2503.20272 2026-06-10 stat.ML cs.LG 版本更新

An $(\epsilon,\delta)$-accurate level set estimation with a stopping criterion

一个具有停止准则的 $(\epsilon,\delta)$-精确水平集估计

Hideaki Ishibashi, Kota Matsui, Kentaro Kutsukake, Hideitsu Hino

发表机构 * Kyushu Institute of Technology（九州工业大学）； Nagoya University / RIKEN AIP（名古屋大学 / RIKEN AIP）； The Institute of Statistical Mathematics / RIKEN AIP（统计数理研究所 / RIKEN AIP）

AI总结提出一种带停止准则的水平集估计获取策略，理论上证明满足 $\epsilon$-精确度和 $1-\delta$ 置信水平，减少不必要的函数评估，实验验证了其有效性。

详情

AI中文摘要

水平集估计问题旨在识别候选点集内未知且评估代价高昂的函数值超过指定阈值的区域，为全面评估函数值提供了一种高效替代方案。传统方法通常采用序列优化策略来寻找 $\epsilon$-精确解，该解允许在阈值轮廓周围留有余量，但往往缺乏有效的停止准则，导致过度探索和效率低下。本文引入了一种带有停止准则的水平集估计获取策略，确保算法在进一步探索不太可能带来改进时停止，从而减少不必要的函数评估。我们从理论上证明，该方法在 $1-\delta$ 的置信水平下满足 $\epsilon$-精确度，弥补了现有方法的一个关键空白。此外，我们表明这还带来了对 F-score 等性能指标下限的保证。数值实验表明，所提出的获取函数在达到与现有方法相当的精确度的同时，确认了停止准则在充分探索后有效终止算法。

英文摘要

The level set estimation problem seeks to identify regions within a set of candidate points where an unknown and costly to evaluate function's value exceeds a specified threshold, providing an efficient alternative to exhaustive evaluations of function values. Traditional methods often use sequential optimization strategies to find $\epsilon$-accurate solutions, which permit a margin around the threshold contour but frequently lack effective stopping criteria, leading to excessive exploration and inefficiencies. This paper introduces an acquisition strategy for level set estimation that incorporates a stopping criterion, ensuring the algorithm halts when further exploration is unlikely to yield improvements, thereby reducing unnecessary function evaluations. We theoretically prove that our method satisfies $\epsilon$-accuracy with a confidence level of $1 - \delta$, addressing a key gap in existing approaches. Furthermore, we show that this also leads to guarantees on the lower bounds of performance metrics such as F-score. Numerical experiments demonstrate that the proposed acquisition function achieves comparable precision to existing methods while confirming that the stopping criterion effectively terminates the algorithm once adequate exploration is completed.

2606.10866 2026-06-10 stat.ME stat.AP stat.CO 新提交

Addressing Separation: A Firth-corrected Joint Model for Longitudinal and Time-to-event Data with an Application on Dropout from Vocational Training

解决分离问题：纵向与时间-事件数据的Firth校正联合模型及其在职业培训辍学中的应用

Sophie Potts, Viola Deutscher, Elisabeth Bergherr

AI总结针对联合模型中分类协变量分离导致估计偏差的问题，引入Firth校正到极大似然估计中，通过EM算法实现参数估计，模拟和实际数据表明该方法能降低偏差，并应用于德国职业培训辍学影响因素分析。

详情

AI中文摘要

纵向与时间-事件数据的联合模型常用于建模内源性纵向协变量与时间-事件结局的关系。然而，该类模型继承了生存子模型的一些局限性，包括分类协变量每个类别必须非分离。因此，我们将Firth校正引入联合模型的频率学派估计过程，使模型类适用于存在分离情况的数据集。我们推导了校正项所需的量，并在联合模型的参数估计中将其实现于期望最大化算法。我们的模拟研究表明，在存在分离问题的数据情境下，Firth校正估计过程产生更少偏差的估计，且相应系数趋近于非分离情况下观察到的估计值。在关于职业培训满意度和辍学数据集上的应用展示了Firth校正联合模型在真实世界分离数据集中的优势。结果通过明确建模社会经济和培训特定因素对辍学风险的直接效应以及它们通过培训满意度的间接贡献，补充了德国职业培训辍学研究的文献。

英文摘要

Joint models for longitudinal and time-to-event data are frequently used to model endogenous longitudinal covariates alongside a time-to-event outcome. However, the model class inherits some limitations of the survival submodels, including the requirement of non-separation for each category of categorical covariates. We therefore incorporate Firth's correction into the frequentist estimation procedure of joint models in order to make the model class applicable in settings with separation. We derive the quantities needed for the correction term and implement it in the Expectation-Maximization algorithm for parameter estimation in joint models. Our simulation study shows that, in data situations with separation issues, the Firth-corrected estimation procedure yields less biased estimates, and the respective coefficients approach the estimates observed in the non-separation cases. The application to a data set on satisfaction with and dropout from vocational training demonstrates the advantages of the Firth-corrected joint model in a real-world data set with separation. The results add to the literature on dropout from vocational training in Germany by explicitly modeling direct effects of socioeconomic and training-specific factors on the risk of dropout as well as their indirect contribution via satisfaction with the training.
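For orientation, the generic form of Firth's correction can be written as below. This is the standard penalized-likelihood formulation for a parameter vector $\theta$ with log-likelihood $\ell(\theta)$, score $U(\theta)$, and Fisher information $I(\theta)$. How the authors derive and embed the correction term in the EM algorithm for the joint model is not stated in the abstract, so this should be read only as the textbook starting point.

```latex
% Firth's penalized log-likelihood (Jeffreys-prior penalty)
\ell^{*}(\theta) \;=\; \ell(\theta) \;+\; \tfrac{1}{2}\,\log\det I(\theta)

% Corresponding modified score for the r-th component of \theta
U^{*}_{r}(\theta) \;=\; U_{r}(\theta)
  \;+\; \tfrac{1}{2}\,\operatorname{tr}\!\left\{ I(\theta)^{-1}\,
  \frac{\partial I(\theta)}{\partial \theta_{r}} \right\}
```

Under separation the unpenalized likelihood has no finite maximizer for the affected coefficient. The penalty term tends to $-\infty$ as that coefficient diverges, which is what keeps the penalized estimate finite.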

2606.10574 2026-06-10 stat.AP stat.ME 新提交

Two-stage imputation of longitudinal anthropometric data with cross-reference harmonisation: a simulation study

纵向人体测量数据的二阶段插补与交叉参考协调：一项模拟研究

Flavia Alves

AI总结提出一种二阶段方法，通过线性插补和基于LMS方法的生长参考插补，解决纵向数据中缺失的人体测量值，并显式处理不同参考标准，模拟显示误差小且无偏。

详情

AI中文摘要

目标。纵向数据集经常缺失体重和身高测量值，而合并数据源的研究可能针对不同的生长参考标准（例如WHO参考和CDC图表）对测量值进行索引。我们描述并评估了一种可复现的二阶段方法，该方法在将参考标准的选择作为显式参数的同时，对缺失的人体测量数据进行插补。方法。阶段1在访视日期之间应用受试者内线性插值（仅内部间隙，无外推）。阶段2使用LMS方法，通过估计每个受试者的百分位数，在受试者内向前和向后携带该百分位数，当受试者从未被测量时默认使用第50百分位数，并从访视年龄的参考标准中读取期望值，从而从年龄和性别特异性生长参考中插补剩余值。可以为每个数据源提供不同的参考标准，以便记录和审计所应用的标准。我们通过掩盖并重新插补随机20%的观测值来评估恢复准确性。所有评估均使用计算机生成的合成数据。结果。在合成数据（n=60名受试者，288次访视，30%缺失）上，该方法将缺失率解决为100%完整。掩盖值恢复的体重平均绝对误差为1.78 kg（平均绝对百分比误差3.5%），身高为2.84 cm（2.0%），偏差可忽略。受试者内插值恢复的值比从生长参考恢复的值更准确，符合预期，支持二阶段顺序。结论。该方法提供了一种简单、无依赖且可审计的人体测量插补方法，显式处理不同的参考标准和每个值的来源。在用于实质性分析之前，下一步必要的工作是应用于实证数据并将插补不确定性传播到下游模型中。

英文摘要

Objective. Longitudinal datasets frequently contain missing weight and height measurements, and studies that combine data sources may index measurements against different growth reference standards (e.g., the WHO reference and CDC charts). We describe and evaluate a reproducible two-stage method that imputes missing anthropometry while making the choice of reference standard an explicit parameter. Methods. Stage 1 applies within-subject linear interpolation across visit dates (interior gaps only, no extrapolation). Stage 2 imputes remaining values from an age- and sex-specific growth reference using the LMS method by estimating each subject's centile, carrying it forward and backwards within the subject, defaulting to the 50th centile when a subject is never measured, and reading the expected value off the reference at the visit age. Different references can be supplied per data source so that the standard applied is recorded and auditable. We assessed recovery accuracy by masking and re-imputing a random 20% of observed values. All evaluations used computer-generated synthetic data. Results. On synthetic data (n = 60 subjects, 288 visits, 30% missing), the method resolved missingness to 100% completeness. Masked-value recovery gave a mean absolute error of 1.78 kg for weight (3.5% mean absolute percentage error) and 2.84 cm for height (2.0%), with negligible bias. Values recovered by within-subject interpolation were more accurate than those recovered from the growth reference, as expected, supporting the two-stage ordering. Conclusion. The method offers a simple, dependency-free, and auditable approach to anthropometric imputation, with explicit handling of differing reference standards and per-value provenance. Application to empirical data and propagation of imputation uncertainty into downstream models are the necessary next steps before use in substantive analyses.
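The two stages described in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: age in years stands in for visit date as the time axis, `toy_reference` is a made-up LMS lookup (not WHO or CDC values), and all function names are hypothetical. The LMS transforms themselves are the standard ones: $z = ((y/M)^L - 1)/(LS)$ and its inverse.

```python
import math

def interpolate_interior(ages, values):
    """Stage 1: within-subject linear interpolation, interior gaps only (no extrapolation)."""
    out = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            frac = (ages[i] - ages[left]) / (ages[right] - ages[left])
            out[i] = values[left] + frac * (values[right] - values[left])
    return out

def lms_zscore(y, L, M, S):
    """Measurement -> z-score under the LMS method."""
    if L == 0:
        return math.log(y / M) / S
    return ((y / M) ** L - 1.0) / (L * S)

def lms_value(z, L, M, S):
    """z-score -> expected measurement under the LMS method."""
    if L == 0:
        return M * math.exp(S * z)
    return M * (1.0 + L * S * z) ** (1.0 / L)

def impute_from_reference(ages, values, reference):
    """Stage 2: carry the subject's z-score forward, else backward, else use z = 0 (50th centile)."""
    observed = [i for i, v in enumerate(values) if v is not None]
    out = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        earlier = [j for j in observed if j < i]
        later = [j for j in observed if j > i]
        src = earlier[-1] if earlier else (later[0] if later else None)
        z = 0.0 if src is None else lms_zscore(values[src], *reference(ages[src]))
        out[i] = lms_value(z, *reference(ages[i]))
    return out

def toy_reference(age_years):
    """Hypothetical (L, M, S) lookup for weight in kg; illustrative numbers only."""
    return (1.0, 10.0 + 2.0 * age_years, 0.1)

ages = [1, 2, 3, 4, 5]
weights = [None, 13.0, None, 17.0, None]
stage1 = interpolate_interior(ages, weights)               # fills only the gap at age 3
final = impute_from_reference(ages, stage1, toy_reference)  # fills the two edge gaps
never_measured = impute_from_reference([1, 2], [None, None], toy_reference)
```

Passing the reference as a function argument mirrors the abstract's point that the reference standard is an explicit parameter: a different lookup can be supplied per data source without changing the imputation logic.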

2606.10093 2026-06-10 stat.AP stat.ME 新提交

Predicting Hospitalization from a Whole-Person Health Score with Incomplete Electronic Health Records Data: A Case Study

从不完整的电子健康记录数据中的全人健康评分预测住院：一项案例研究

Grayson E. Weavil, Joseph Rigdon, Sarah C. Lotspeich

AI总结本研究利用统计建模和机器学习，从不完整的电子健康记录中计算适应负荷指数（allostatic load index，ALI），并评估其预测住院的能力，发现模式子模型方法在样本内表现最佳（AUC=0.73），但交叉验证效果较差（AUC=0.63）。

Comments 13 pages, 5 figures, 2 tables, R code and simulated dataset available on GitHub

详情

AI中文摘要

将标准化的全人健康测量嵌入电子健康记录（EHR）可能对预防性护理大有帮助。适应负荷指数（allostatic load index，ALI）由三个身体系统的十个应激成分计算得出，提供了整体健康的有前景的快照。ALI可以从EHR数据计算，但许多成分缺失，因为并非所有患者都接受所有检测。本研究使用统计建模和机器学习，基于某大型学术健康系统$1000$名患者的EHR数据，在控制年龄和性别的情况下，由ALI预测住院（作为计数或二元变量）。评估了多种方法来填补患者缺失的ALI成分所造成的信息空白，包括将各成分合并的汇总度量，以及单独使用各成分的方法。性能通过受试者工作特征（ROC）曲线和相应的ROC曲线下面积（AUC）来衡量。住院的计数建模并未优于二元建模，逻辑回归优于随机森林。总体而言，各汇总度量表现相近，其中完整病例比例（即非缺失成分中“不健康”成分所占的比例）表现最佳（AUC $= 0.64$），但领先幅度$\leq 0.01$。当单独使用各成分时，模式子模型方法在样本内对住院的预测最准确（AUC $= 0.73$），但交叉验证效果不及样本内（AUC $= 0.63$）。所有汇总度量表现相近。然而，当单独纳入ALI各成分时，为具有相同缺失数据模式的患者子集分别定制模型的做法表现最佳。下一步包括在EHR中实施该方法，以实现预测并大规模支持临床医生决策。

英文摘要

Embedding a standardized whole-person health measure in electronic health records (EHR) could be instrumental to preventative care. The allostatic load index (ALI), calculated from ten component stressors across three body systems, offers a promising snapshot of holistic health. The ALI can be calculated from EHR data, but many components are missing, since not all patients undergo all tests. Using statistical modeling and machine learning, EHR data for $1000$ patients from a large academic health system were used to predict in-patient hospitalization (as a count or binary) from ALI, controlling for age and sex. Various methods were evaluated to fill in information gaps for patients' missing ALI components, including summary measures combining components or using them separately. Performance was measured using receiver operating characteristic (ROC) curves and corresponding areas under the ROC curve (AUC). Count modeling of hospitalization did not improve upon binary, and logistic regression beat random forest. Overall, summary measures performed similarly, with the complete-case proportion (i.e., the proportion of non-missing components that were "unhealthy") performing best (AUC $= 0.64$) but by $\leq 0.01$. When using components separately, the pattern submodel approach most accurately predicted hospitalization (AUC $= 0.73$) in sample, but did not cross-validate as well (AUC $= 0.63$). All summary measures performed similarly. However, when including the ALI components separately, tailoring models to subsets of patients with the same missing data pattern performed best. Next steps include EHR implementation to enable prediction and support clinician decision-making at scale.
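Two ingredients named in the abstract lend themselves to a short sketch: the complete-case proportion (the share of non-missing components flagged "unhealthy") and the grouping of patients by missing-data pattern that a pattern submodel approach relies on. The encoding (1 = unhealthy, 0 = healthy, `None` = missing), the function names, and the four-component toy records are assumptions for illustration; the study uses ten components and fits a model within each pattern group, which is not shown here.

```python
from collections import defaultdict

def complete_case_proportion(components):
    """Proportion of non-missing components that are unhealthy; None if all are missing."""
    observed = [c for c in components if c is not None]
    if not observed:
        return None
    return sum(observed) / len(observed)

def missing_pattern(components):
    """Tuple of booleans marking which components are missing."""
    return tuple(c is None for c in components)

def group_by_pattern(patients):
    """Map each missing-data pattern to the ids of patients who share it."""
    groups = defaultdict(list)
    for patient_id, components in patients.items():
        groups[missing_pattern(components)].append(patient_id)
    return dict(groups)

# Hypothetical records with four components each (the ALI in the study has ten).
patients = {
    "p1": [1, 0, None, 1],
    "p2": [0, 0, None, 0],
    "p3": [1, None, None, 0],
}
scores = {pid: complete_case_proportion(c) for pid, c in patients.items()}
groups = group_by_pattern(patients)
```

In a pattern submodel analysis, a separate prediction model would then be fitted within each group using only the components observed for that pattern, which is one reason small groups can fit well in sample yet cross-validate poorly.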

2606.11144 2026-06-10 cs.LG q-bio.GN q-bio.QM stat.AP 新提交

OncoTraj: a public benchmark for longitudinal resistance prediction in EGFR-mutant non-small-cell lung cancer on osimertinib

OncoTraj：EGFR突变非小细胞肺癌奥希替尼耐药纵向预测的公共基准

Abhijoy Sarkar, Aarchi Singh Thakur

发表机构 * Span AI

AI总结针对EGFR突变非小细胞肺癌一线奥希替尼耐药预测缺乏公共基准的问题，提出OncoTraj基准，整合813名患者数据，定义三项任务，并发现单时间点组织NGS特征导致所有模型性能接近随机，而TP53共突变与进展率升高相关。

Comments 24 pages, 7 figures, 4 tables. Code, data, and trained model weights: https://github.com/span-ai-labs/oncotraj. Python package: pip install oncotraj. Dataset: https://huggingface.co/datasets/span-ai-labs/oncotraj-v1

详情

AI中文摘要

EGFR突变非小细胞肺癌（NSCLC）对一线奥希替尼的耐药是治疗压力下可预测克隆演化的典型例子，但目前尚无用于训练或评估相应纵向患者轨迹计算模型的公共基准。我们推出OncoTraj，这是一个来自三个真实世界临床基因组数据源（MSK-CHORD（672名患者）、AACR Project GENIE BPC NSCLC（34名患者）和FLAURA分子耐药补充（107名患者））的813名接受一线奥希替尼治疗的EGFR突变NSCLC患者的公共基准。OncoTraj定义了三个锁定任务：（A）固定12个月标志点的进展二元分类，（B）首次进展时间（天）的回归，以及（C）主要耐药机制的六类分类。我们发布了统一的数据集、经过审计的无泄漏保证的患者级训练/验证/测试划分、一个开源评估框架，以及六个参考基线，涵盖多数类预测器、逻辑回归、随机森林、XGBoost、LSTM和多任务Transformer。使用v1的单时间点快照特征，所有模型在干净的源内评估中均未超过随机水平：这种天花板在不同模型类别中的一致性表明限制在于输入模态（单快照组织NGS而非连续ctDNA），而非算法。该基准确实恢复了可重复的、与文献一致的关联：TP53共突变使整个队列的12个月进展率从29%提高到59%。OncoTraj建立了一个可重复、经泄漏审计的基线，并将模态限制转化为针对富集连续ctDNA的v2的具体设计要求。

英文摘要

Resistance to first-line osimertinib in EGFR-mutant non-small-cell lung cancer (NSCLC) is the canonical example of predictable clonal evolution under therapeutic pressure, yet no public benchmark exists for training or evaluating computational models on the corresponding longitudinal patient trajectories. We introduce OncoTraj, a public benchmark of 813 EGFR-mutant NSCLC patients receiving first-line osimertinib, harmonized from three real-world clinical-genomic sources: MSK-CHORD (672 patients), AACR Project GENIE BPC NSCLC (34 patients), and the FLAURA molecular-resistance supplement (107 patients). OncoTraj defines three locked tasks: (A) binary classification of progression by a fixed 12-month landmark, (B) regression of time-to-first-progression in days, and (C) six-class classification of the dominant resistance mechanism. We release the harmonized dataset, patient-level train/validation/test splits with an audited no-leakage guarantee, an open-source evaluation harness, and six reference baselines spanning a majority-class predictor, logistic regression, random forest, XGBoost, an LSTM, and a multi-task transformer. With v1's single-timepoint snapshot features, no task clears chance on clean within-source evaluation: the uniformity of this ceiling across every model class localizes the limit to the input modality (single-snapshot tissue NGS rather than serial ctDNA), not the algorithm. The benchmark does recover a reproducible literature-consistent association: TP53 co-mutation raises the 12-month progression rate from 29% to 59% cohort-wide. OncoTraj establishes a reproducible, leakage-audited baseline and converts the modality limit into concrete design requirements for a serial-ctDNA-enriched v2.

2606.09860 2026-06-10 cs.LG cs.AI stat.AP stat.ML 新提交

Conformal Risk Prediction for Non-Alcoholic Fatty Liver Disease Using Gradient Boosting with Distribution-Free Coverages

基于梯度提升与无分布覆盖的非酒精性脂肪肝病共形风险预测

Xinze Zhang

AI总结提出结合梯度提升决策树与共形预测的机器学习框架Method，实现非酒精性脂肪肝病个体风险的无分布校准覆盖预测，在中国多中心队列中AUROC达0.912，优于多种方法。

详情

AI中文摘要

非酒精性脂肪肝病（NAFLD）影响全球约25%的成年人，带来显著的肝脏和心血管风险。然而，人群层面的筛查工具仍不充分。我们提出Method，一种用于NAFLD风险预测的机器学习框架，将梯度提升决策树与共形预测相结合，以在个体风险估计上产生校准的、无分布的覆盖保证。它集成了基于互信息的稳定性选择过程，通过自助重采样识别紧凑、临床可解释的特征子集，构建预测集，其边际覆盖可证明超过用户指定的置信水平。我们在中国广州的多中心队列（主要n=2,187；外部验证n=412）上评估了Method，使用了涵盖人口统计学、代谢生物标志物和生活方式因素的78个候选特征。Method内部AUROC为0.912，外部为0.891，优于深度神经网络、TabNet、支持向量机和逻辑回归。共形预测集在90%名义水平下达到91.3%的经验覆盖。从这些分数得出的三层风险分层将人群分为不同组别，高风险亚组的12个月进展率是低风险组的4.7倍。选定的特征——特别是腰围、ALT、GGT、甘油三酯、空腹血糖和BMI——与已建立的代谢风险因素一致，提供了生物学合理性。

英文摘要

Non-alcoholic fatty liver disease (NAFLD) affects roughly 25% of global adults, posing substantial hepatic and cardiovascular risks. Yet, population-level screening tools remain inadequate. We present Method, a machine-learning framework for NAFLD risk prediction coupling gradient-boosted decision trees with conformal prediction to yield calibrated, distribution-free coverage guarantees on individual risk estimates. It integrates a mutual-information-based stability selection procedure to identify a compact, clinically interpretable feature subset via bootstrap resampling, constructing prediction sets whose marginal coverage provably exceeds a user-specified confidence level. We evaluated Method on a multicenter cohort from Guangzhou, China (primary n=2,187; external validation n=412) using 78 candidate features across demographics, metabolic biomarkers, and lifestyle factors. Method achieves an AUROC of 0.912 internally and 0.891 externally, outperforming deep neural networks, TabNet, support vector machines, and logistic regression. Conformal prediction sets achieve 91.3% empirical coverage at the 90% nominal level. A three-tier risk stratification derived from these scores separates the population into distinct groups, with the high-risk subgroup showing a 12-month progression rate 4.7 times that of the low-risk tier. The selected features -- notably waist circumference, ALT, GGT, triglycerides, fasting glucose, and BMI -- align with established metabolic risk factors, providing biological plausibility.
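The abstract does not say which conformal variant is used, so the following is a sketch of plain split conformal prediction for a binary outcome, assuming a fitted classifier that outputs a probability for the positive class. The nonconformity score (one minus the probability of the true label), the function names, and the calibration values are illustrative assumptions.

```python
import math

def conformal_threshold(cal_probs_true, alpha):
    """Conformal quantile of the scores 1 - p(true label) on a held-out calibration set."""
    scores = sorted(1.0 - p for p in cal_probs_true)
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # 1-based rank with finite-sample correction
    if k > n:
        return float("inf")  # too few calibration points: every label is kept
    return scores[k - 1]

def prediction_set(prob_positive, qhat):
    """All labels whose nonconformity score does not exceed the threshold."""
    probs = {0: 1.0 - prob_positive, 1: prob_positive}
    return {label for label, p in probs.items() if 1.0 - p <= qhat}

# Hypothetical probabilities that a model assigned to the TRUE label of 7 calibration patients.
cal_probs_true = [0.9, 0.8, 0.95, 0.6, 0.7, 0.85, 0.99]
qhat = conformal_threshold(cal_probs_true, alpha=0.25)

confident = prediction_set(0.9, qhat)   # only the positive label survives
uncertain = prediction_set(0.5, qhat)   # neither label is plausible enough: empty set
```

If calibration and test points are exchangeable, sets built this way contain the true label with probability at least $1 - \alpha$ on average over patients. That is a marginal guarantee, which is the kind of coverage the abstract reports (91.3% empirical at the 90% nominal level), not a per-patient one.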

2606.07129 2026-06-10 stat.AP 版本更新

Collaborative estimation and evaluation of SARS-CoV-2 variant nowcasting in the United States

Isaac MacArthur, Thomas Robacker, Bren Case, Spencer J. Fox, Dylan H. Morris, Evan L. Ray, Benjamin Rogers, Becky Sweger, Natalie M. Linton, John Huddleston, Andrew Magee, Zachary Susswein, Jover Lee, Trevor Bedford, Marlin D. Figgins, Ehsan Suez, Rajath Prabhakar, Tomas Leon, Brent Siegel, Mugdha Thakur, Christopher M. Hoover, Rahil Ryder, Jesse Elder, Michael Kupperman, Ruian Ke, Emma Goldberg, Sebastian Funk, Maryclare Griffin, Nicholas G. Reich, Kaitlyn E. Johnson

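AI summary: Describes the creation of the U.S. SARS-CoV-2 Variant Nowcast Hub and evaluates five individual models and a baseline model over the 2024-2025 respiratory virus season; the baseline model performs well overall, and locations with lower sequencing volumes show greater variability in model performance.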

Comments 32 pages, 9 figures

English abstract

The ability to estimate and predict pathogen variant dynamics can inform public health responses, including planning for increased transmission or severity, shifts in population immunity, or changes to vaccine or therapeutic effectiveness. The COVID-19 pandemic demonstrated the importance of monitoring SARS-CoV-2 variant evolution through viral genome sequencing, enabling predictive models to estimate variant frequencies in the recent past, present, and short-term future. Collaborative forecasting Hubs provided a valuable way to centralize predictive modeling of epidemiological indicators such as cases, hospitalizations, and deaths during the pandemic; however, none existed for variant dynamics. Here, we discuss the creation of the United States SARS-CoV-2 Variant Nowcast Hub, designed to solicit estimates of the relative abundance of a specified set of SARS-CoV-2 variants at the U.S. state level. We discuss the design decisions and challenges in building the Hub and its scoring procedures. Using submissions from the Hub's first respiratory virus season (nowcast dates October 9th, 2024 to June 4th, 2025), we evaluate five individual models and a baseline model. We found that the baseline model, which pools sequences across the U.S., performs well overall, with most individual models performing similarly or slightly worse. Locations with lower sequencing volumes exhibited greater variability in model performance. Models submitted for a single location outperformed those submitted for all locations, potentially due to greater timeliness and magnitude of local data. Much remains to be investigated regarding relative model performance across different phases of variant emergence, and we conclude by proposing future directions within and beyond this Hub.

2603.01374 2026-06-10 stat.AP version update

Multi-pathogen situational assessment and forecasting of respiratory disease in Aotearoa New Zealand

M. J. Plank, A. R. Young, K. L. Senior, R. J. Tobin, M. O'Hara-Wild, F. Callaghan, F. Shearer, O. Eales

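AI summary: Presents models run on real-time surveillance data for three pathogens (SARS-CoV-2, influenza, and RSV) to assess epidemic trends and produce 28-day-ahead forecasts in support of public health planning.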

English abstract

Real-time analysis of epidemic trends and forecasts can help support public health planning and the response to seasonal respiratory disease. Here, we present two models that were used in a 2025 New Zealand winter situational assessment programme for three respiratory pathogens: SARS-CoV-2, influenza and respiratory syncytial virus (RSV). Data on SARS-CoV-2 were obtained from the national Covid-19 surveillance system; data on influenza and RSV were limited to a sentinel hospital surveillance programme. Models were run weekly from May to October 2025 on these real-time disease surveillance data and provided a quantitative representation of the current epidemic trend, along with estimates of the epidemic growth rate and 28-day ahead forecasts of case incidence. Model results and interpretation were provided in weekly reports to public health partners as part of a trans-Tasman winter programme run by the Australia--Aotearoa Consortium for Epidemic Forecasting and Analytics (ACEFA). We compare in-season results that were included in these reports to a retrospective analysis of the complete data for the season. We conclude that real-time analyses performed reasonably well, and identify some areas for improvement in future winter situational assessment programmes.

2512.13629 2026-06-10 stat.ME version update

Empirical comparison of win ratio and joint frailty models for recurrent event endpoints with applications in oncology and cardiology

Adrien Orué, Derek Dinart, Laurent Billot, Carine Bellera, Virginie Rondeau

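AI summary: Compares the joint frailty model (JFM) with the last-event assisted recurrent-event win ratio (LWR) for analysing composite endpoints; the JFM offers higher statistical power and more reliable inference, while the LWR provides a directional summary measure.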

DOI: 10.1002/sim.70627

English abstract

Composite endpoints that combine recurrent non-fatal events with a terminal event are increasingly used in randomized clinical trials, yet conventional time-to-first-event analyses may obscure clinically relevant information. We compared two statistical frameworks tailored to such endpoints: the joint frailty model (JFM) and the last-event assisted recurrent-event win ratio (LWR). The JFM specifies proportional hazards for the recurrent and terminal events linked through a shared frailty, yielding covariate-adjusted, component-specific hazard ratios that account for informative recurrences and dependence with death. The LWR is a nonparametric, prioritized pairwise comparison that incorporates all observed events over follow-up and summarizes a population-level benefit of treatment while respecting a pre-specified hierarchy between death and recurrences. We first assessed the performance of the methods using simulations that varied both the gamma-frailty variance and the event rates. We next illustrated both approaches using two clinical application examples in oncology and cardiology, highlighting how conclusions depend on whether treatment primarily affects recurrent events, mortality, or both. The JFM provided component-specific estimates, while the LWR yielded a directional summary measure of treatment effect. Power was systematically higher with the JFM, which thus appeared to be the more reliable approach for inference and sample size estimation. Methodological extensions of the LWR to appropriately handle censoring and to formalize causal estimands remain a promising direction for future research.

2506.22349 2026-06-10 stat.ME version update

Measuring frailty in the elderly: an indicator based on a super-classifier

Sara Rebottini, Margherita Silan, Pietro Belloni

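AI summary: Proposes a composite indicator built from administrative healthcare data that quantifies frailty in older adults by combining the likelihoods of multi-outcome logistic classifiers, allowing outcome-specific sets of frailty determinants.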

English abstract

Identifying frail older adults in an ageing population is essential for improving healthcare services. This study proposes a composite indicator to assess individual frailty levels using administrative healthcare data. Given the complex and multidimensional nature of frailty, a multi-outcome approach is adopted. Following an extensive literature review, a set of adverse health events is selected as proxies for frailty. These events are modelled using logistic classifiers, with frailty determinants (associated with adverse health events and selected using gradient tree boosting) serving as covariates. The sensitivity and specificity of each classifier are used to compose their combined likelihood. From this, we derive an indicator capable of quantifying frailty across the population. The indicator shows robust performance across multiple outcomes and over time. Its primary innovation lies in allowing the use of diverse and outcome-specific sets of frailty determinants without any structural constraint. Overall, we offer an effective tool for quantifying frailty among older adults, potentially supporting health authorities in the prevention of frailty-related adverse events.

2410.12936 2026-06-10 stat.AP version update

Development of COVID-19 Booster Vaccine Policy by Microsimulation and Q-learning

Guoxuan Ma, Sicong Xie, Lili Zhao, Jian Kang

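AI summary: Proposes a framework combining tabular Q-learning with microsimulation, using an RNN digital-twin environment to learn vaccine policies safely; the learned COVID-19 booster policy outperforms current practice.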

DOI: 10.1080/01621459.2026.2682540

English abstract

The COVID-19 pandemic highlighted the urgent need for effective vaccine policies, but traditional clinical trials often lack sufficient data to capture the diverse population characteristics necessary for comprehensive public health strategies. Ethical concerns around randomized trials during a pandemic further complicate policy development for public health. Reinforcement Learning (RL) offers a promising alternative for vaccine policy development. However, direct online RL exploration in real-world scenarios can result in suboptimal and potentially harmful decisions. This study proposes a novel framework combining tabular Q-learning with microsimulation, where a Recurrent Neural Network (RNN) serves as a digital twin environment simulator of the target population. This digital twin captures temporal associations between infection and patient characteristics to generate realistic individual disease trajectories, enabling safe and efficient policy learning without real-world interaction. Our tabular Q-learning model produces an interpretable policy table that balances the risks of severe infection against vaccination side effects. Applied to COVID-19 booster policies, the learned Q-learning-based policy outperforms current practices, offering a path toward more effective vaccination strategies. A project webpage introducing our work, including links to the software, a brief introductory video, and a step-by-step tutorial video, is available at https://public.websites.umich.edu/~jiankang/software/dtpl_website_umich/index.html.
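
The tabular Q-learning component mentioned in the abstract rests on a standard update rule, sketched below. The state labels, action names, reward, and learning-rate values are hypothetical placeholders, and the RNN digital-twin simulator that would generate the transitions is not reproduced here.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, actions, lr=0.1, gamma=0.9):
    """One tabular Q-learning step: move Q(s, a) toward the bootstrapped target."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += lr * (target - q[(state, action)])
    return q[(state, action)]

actions = ["boost", "wait"]   # hypothetical action set
q = defaultdict(float)        # unseen (state, action) pairs start at 0

# One simulated transition with made-up state labels and reward.
value = q_update(q, "age65+_6mo", "boost", reward=1.0,
                 next_state="age65+_0mo", actions=actions, lr=0.5)

# The interpretable policy table is the greedy action per state.
policy = {"age65+_6mo": max(actions, key=lambda a: q[("age65+_6mo", a)])}
```

Because the learned values live in a plain lookup table, the resulting policy can be read off state by state, which is the interpretability property the abstract emphasizes.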

2606.10772 2026-06-10 stat.AP new submission

Structural Under-Representation of Women in News: Nonparametric Bayesian Mixtures Capture Time-Dependent Dynamics

Isabella Habereder, Thomas Kneib, Isao Echizen, Timo Spinde

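AI summary: Applies a time-dependent Bayesian mixture model to Canadian news data, revealing structural under-representation of women in quote shares across all clusters, with more than 85% of topic-region time series showing no improvement.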

English abstract

The under-representation of women as sources cited in news media is one prominent representation of gender bias. Understanding where gender bias concentrates and how it evolves is essential for targeted mitigation. Because gender representation varies across topics, time, and reported-on regions, creating complex dependencies that are difficult to capture parametrically, we employ a nonparametric model to uncover latent cluster structures and temporal dynamics. We combine time-dependent Bayesian mixture modeling techniques with a Beta mixture kernel tailored to female quote shares, bounded between 0 and 1. Fitted on Canadian news articles from 2019 to 2024, the model reveals structural under-representation of women across all clusters, with news topic driving differences in female quote shares more strongly than the reported-on region. More than 85% of topic-region time series show no improvement toward gender parity over the observation period. Dynamic density estimation confirms that the aggregate distribution of female quote shares remains stable between 2019 and 2024. Our application demonstrates that advanced probabilistic models not only reproduce findings in gender bias research but also reveal latent dependencies and structural patterns that simpler approaches miss, encouraging future adoption of model-based frameworks for studying media bias.

2606.10342 2026-06-10 stat.AP new submission

Binomial Smoothing for Inventory and Information Control in Supply Chains

Rene Caldentey, Avi Giloni, Clifford Hurvich, Prem Talwai, Yichen Zhang

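AI summary: Addresses the trade-off between retailer order smoothing and upstream forecasting in decentralized supply chains by proposing Binomial Smoothing policies, which minimize the manufacturer's forecast error while remaining invertible and achieve a constant-factor approximation to an optimal policy.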

Comments 59 pages, 7 figures, 4 tables

English abstract

In many decentralized supply chains, upstream firms do not observe market demand directly and instead infer downstream conditions from the order stream. A retailer's replenishment policy therefore plays a dual role: it governs inventory replenishment and shapes the information available for upstream forecasting. This creates a fundamental trade-off. Smoother orders improve upstream predictability, but delaying the response to demand can increase downstream inventory costs. We study how a retailer should optimally smooth demand in a two-tier supply chain with one retailer and one manufacturer when the manufacturer forecasts future orders from the retailer's order history. We propose Binomial Smoothing, a class of replenishment policies that implements delayed demand response by spreading each unit of demand over a finite horizon using binomial weights. The class is interpretable, easy to calibrate, and analytically tractable. Under weakly stationary Gaussian demand satisfying mild regularity conditions, we show that, for any fixed smoothing horizon, the Binomial policy minimizes the manufacturer's forecast error among all policies with the same degree of smoothing. It remains invertible, so the manufacturer can recover demand history from observed orders. More generally, Binomial Smoothing achieves a constant-factor approximation guarantee relative to an optimal policy. Our results yield a broader insight: replenishment policies should be designed not merely to reduce order variance, as in the traditional bullwhip measure, but to reduce the unpredictable component of orders. Carefully designed smoothing can improve supply-chain performance and partially substitute for information sharing, providing a concrete mechanism for coordination without collaboration.
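
The idea of spreading each unit of demand over a finite horizon with binomial weights can be sketched as follows. The abstract does not give the exact parameterization, so the Binomial(horizon, p) weights, the parameter `p`, and both function names are assumptions made for illustration rather than the paper's policy class.

```python
from math import comb

def binomial_weights(horizon, p=0.5):
    """Binomial(horizon, p) probabilities for lags 0..horizon (they sum to 1).

    `p` is a hypothetical tuning parameter; the abstract does not state
    how the weights are parameterized.
    """
    return [comb(horizon, k) * p**k * (1 - p) ** (horizon - k)
            for k in range(horizon + 1)]

def smoothed_orders(demand, horizon, p=0.5):
    """Order at time t is a weighted sum of current and past demand."""
    w = binomial_weights(horizon, p)
    return [sum(w[k] * demand[t - k] for k in range(horizon + 1) if t - k >= 0)
            for t in range(len(demand))]

weights = binomial_weights(2)                       # [0.25, 0.5, 0.25]
orders = smoothed_orders([4, 0, 0, 0], horizon=2)   # a single demand spike
```

In this toy run a spike of 4 units is released as orders of 1, 2, and 1 over three periods: total volume is preserved while the order stream becomes smoother and the response to demand is delayed.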

2606.10330 2026-06-10 cs.GT cs.CY stat.AP new submission

Symmetrization of Loss Functions for Robust Training of Neural Networks in the Presence of Noisy Labels

Alexandre Lemire Paquin, Brahim Chaib-Draa, Philippe Giguère

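Affiliations * Department of Computer Science and Software Engineering

AI summary: Studies the design of robust loss functions by symmetrizing the cross-entropy loss, proposes a multi-class symmetric loss function, and demonstrates its effectiveness under label noise.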

Comments 28 pages, 1 figure, 4 tables. v2: Added relevant prior-work citations and revised the related-work discussion and Section 5.2. Minor wording corrections

English abstract

Labeling a training set is often expensive and susceptible to errors, making the design of robust loss functions for label noise an important problem. The symmetry condition provides theoretical guarantees for robustness to such noise. In this work, we study a symmetrization method arising from the unique decomposition of any multi-class loss function into a symmetric component and a class-insensitive term. In particular, symmetrizing the cross-entropy loss leads to a linear multi-class extension of the unhinged loss. Unlike in the binary case, the multi-class version must have specific coefficients in order to satisfy the symmetry condition. Under suitable assumptions, we show that this multi-class unhinged loss is the unique convex multi-class symmetric loss. We also show that it has a fundamental local role: the linear approximation of any symmetric loss around score vectors with equal components is equivalent to the multi-class unhinged loss. We then introduce SGCE and alpha-MAE, two loss functions that interpolate between the multi-class unhinged loss and the Mean Absolute Error while allowing control of the beta-smoothness of the loss. Experiments on standard noisy-label benchmarks show competitive performance compared with existing robust loss functions.

2606.10673 2026-06-10 stat.OT cs.LG new submission

ClusBench: The Clustering Benchmark Data Resource You've All Been Waiting For (?)

David P. Hofmeyr

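Affiliations * School of Mathematical Sciences, Lancaster University

AI summary: Generates nearly 3,000 synthetic datasets from more than 200 public datasets by fitting flexible nonparametric distributions, enabling large-scale evaluation of clustering methods while preserving the nuances of real data.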

2606.09874 2026-06-10 cs.LG stat.ML new submission

Disjoint or Overlapping? Inference Windowing for Reconstruction-Based Time Series Anomaly Detection

Guillaume Coulaud, Reza Akbarinia, Florent Masseglia

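Affiliations * University of Montpellier, Inria, CNRS, LIRMM

AI summary: Studies how the inference stride (overlapping windows) affects reconstruction-based time series anomaly detection and proposes a unified evaluation protocol; overlapping windows yield average relative gains of up to 28% and can change method rankings.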

English abstract

Reconstruction-based methods are widely used for time series anomaly detection, where models are trained to reconstruct subsequences, and anomalies are identified through reconstruction errors. However, reported results are often hard to compare due to heterogeneous evaluation practices and underspecified inference procedures. In this paper, we revisit reconstruction-based anomaly detection in the univariate offline setting and study the role of the inference stride, which controls whether subsequences are processed as disjoint windows or with overlap. We propose a unified training, tuning, and multi-seed evaluation protocol on the curated TSB-AD benchmark, and study how overlapping inference affects anomaly detection performance for a range of reconstruction models, including PCA-based baselines, DLinear, an AutoEncoder, TimesNet, and Transformer variants. The results show that across all models, overlapping windows yield consistent improvements, with average relative gains of up to +28%, and can alter method rankings. We further analyze variability across datasets, random seeds, and hyperparameter configurations. Finally, we complement the benchmark study with an evaluation on the full UCR archive using localization criteria aligned with sliding-window reconstruction. Overall, our results highlight that reconstruction-based anomaly detection performance depends not only on model architecture and training, but also on inference choices, motivating a clear and reproducible protocol. They also show that reconstruction-based baselines achieve strong performance on both the TSB-AD and UCR benchmarks, supporting them as competitive and practical approaches for univariate time series anomaly detection.
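
The difference between disjoint and overlapping inference can be made concrete with a small sketch. Everything here is an illustrative assumption: the abstract does not say how errors from overlapping windows are aggregated, so the per-point average used below is one plausible choice, and the window-mean "model" is a toy stand-in for a trained reconstructor. The sketch also assumes the series is at least one window long and that the stride does not exceed the window length.

```python
def window_starts(n, window, stride):
    """Start indices of inference windows; stride == window means disjoint."""
    starts = list(range(0, n - window + 1, stride))
    if starts[-1] != n - window:
        starts.append(n - window)  # extra window so the tail is covered
    return starts

def pointwise_scores(series, window, stride, reconstruct):
    """Average squared reconstruction error over all windows covering a point."""
    n = len(series)
    total, count = [0.0] * n, [0] * n
    for s in window_starts(n, window, stride):
        segment = series[s:s + window]
        for i, (x, r) in enumerate(zip(segment, reconstruct(segment))):
            total[s + i] += (x - r) ** 2
            count[s + i] += 1
    return [t / c for t, c in zip(total, count)]

def mean_reconstruction(segment):
    """Toy stand-in for a trained model: predict the window mean everywhere."""
    m = sum(segment) / len(segment)
    return [m] * len(segment)

series = [0, 0, 0, 9, 0, 0]  # one spike at index 3
disjoint = pointwise_scores(series, window=3, stride=3, reconstruct=mean_reconstruction)
overlapping = pointwise_scores(series, window=3, stride=1, reconstruct=mean_reconstruction)
```

With a stride of 1, each interior point is scored by several windows in which it occupies different positions, so its score depends less on where the window boundaries happen to fall.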

2606.06742 2026-06-10 cs.LG stat.ML version update

TorchKM: A GPU-Oriented Library for Kernel Learning and Model Selection

Yikai Zhang, Gaoxiang Jia, Jie Ding, Boxiang Wang

Affiliations * University of Iowa; University of Minnesota; Individual Researcher; AIScientists, Inc. (MorphMind); Department of Statistics and Actuarial Science, University of Iowa

AI summary: Presents TorchKM, a GPU-accelerated kernel learning library that speeds up training and model selection for models such as SVMs and kernel logistic regression by reusing matrix computations, outperforming standard baselines.

Comments 14 pages, 2 figures

2603.29730 2026-06-10 stat.ML cs.LG version update

mlr3mbo: Bayesian Optimization in R

Marc Becker, Lennart Schneider, Martin Binder, Lars Kotthoff, Bernd Bischl

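Affiliations * Department of Statistics, LMU Munich; Munich Center for Machine Learning (MCML); University of St Andrews

AI summary: Introduces mlr3mbo, a modular toolbox for Bayesian optimization in R supporting single- and multi-objective optimization, multi-point proposals, and parallelization; a coordinate descent search and benchmarks show performance on par with established optimizers.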

English abstract

We present mlr3mbo, a modular toolbox for Bayesian optimization in R. mlr3mbo supports single- and multi-objective optimization, multi-point proposals, batch and asynchronous parallelization, and robust error handling. While it can be used for many standard Bayesian optimization variants in applied settings, researchers can also construct custom Bayesian optimization algorithms from its flexible building blocks. In addition to an introduction to the software, its design principles, and its building blocks, the paper presents two extensive empirical evaluations on the surrogate-based benchmark suite YAHPO Gym. To identify robust default configurations for both numeric and mixed-hierarchical optimization regimes, and to gain further insights into the respective impacts of individual settings, we run a coordinate descent search over the mlr3mbo configuration space and analyze its results. Furthermore, we benchmark mlr3mbo against a wide range of established optimizers, including HEBO, SMAC3, Ax, and Optuna, and find that it performs on par with state-of-the-art.

2603.08924 2026-06-10 stat.AP cs.AI cs.IR version update

Quantifying Uncertainty in AI Visibility: A Statistical Framework for Generative Search Measurement

Ronald Sielinski

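Affiliations * IQRush

AI summary: Addresses the stochasticity of visibility measurement in AI generative search by treating citation metrics as sample estimators, using repeated sampling and bootstrap confidence intervals to expose measurement noise, and offering sample-size guidance.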

Comments 39 pages, 13 figures

English abstract

AI-powered answer engines are inherently non-deterministic: identical queries submitted at different times can produce different responses and cite different sources. Despite this stochastic behavior, current approaches to measuring domain visibility in generative search typically rely on single-run point estimates of citation share and prevalence, implicitly treating them as fixed values. This paper argues that citation visibility metrics should be treated as sample estimators of an underlying response distribution rather than fixed values. We conduct an empirical study of citation variability across three generative search platforms--Perplexity Search, OpenAI SearchGPT, and Google Gemini--using repeated sampling across three consumer product topics. Two sampling regimes are employed: daily collections over nine days and high-frequency sampling at ten-minute intervals. We show that citation distributions follow a power-law form and exhibit substantial variability across repeated samples. Bootstrap confidence intervals reveal that many apparent differences between domains fall within the noise floor of the measurement process. Distribution-wide rank stability analysis further demonstrates that citation rankings are unstable across samples, not only among top-ranked domains but throughout the frequently cited domain set. These findings demonstrate that single-run visibility metrics provide a misleadingly precise picture of domain performance in generative search. We argue that citation visibility must be reported with uncertainty estimates and provide practical guidance for sample sizes required to achieve interpretable confidence intervals.
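
Treating a citation metric as a sample estimator, as the abstract argues, amounts to attaching an interval to it. The sketch below uses a percentile bootstrap on made-up data; the 0/1 encoding of "domain cited in this run", the function name, and the defaults are assumptions for illustration, not the paper's procedure.

```python
import random

def bootstrap_ci(observations, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean of `observations`.

    observations: 1 if the domain was cited in a sampled response, else 0
    (a hypothetical encoding of citation prevalence for one domain).
    """
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(observations)
    means = sorted(
        sum(rng.choices(observations, k=n)) / n for _ in range(n_boot)
    )
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = min(n_boot - 1, int((1 + level) / 2 * n_boot))
    return means[lo_idx], means[hi_idx]

runs = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]  # made-up outcomes of ten repeated queries
low, high = bootstrap_ci(runs)
```

If the intervals for two domains overlap heavily, a difference in their single-run citation shares may sit inside the noise floor the abstract describes.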

2601.07532 2026-06-10 stat.CO version update

Population-Adjusted Indirect Treatment Comparison with the outstandR Package in R

Nathan Green

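AI summary: For indirect treatment comparisons when head-to-head trials are lacking, presents the R package outstandR, which performs population adjustment via G-computation and Multiple Imputation Marginalization within a unified framework for robust evidence synthesis.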

Comments 35 pages

English abstract

Indirect treatment comparisons (ITCs) are essential in Health Technology Assessment (HTA) when head-to-head clinical trials are absent. A common challenge arises when attempting to compare a treatment with available individual patient data (IPD) against a competitor with only reported aggregate-level data (ALD), particularly when trial populations differ in effect modifiers. While methods such as Matching-Adjusted Indirect Comparison (MAIC) exist to adjust for these cross-trial differences, they are increasingly being superseded by regression-based marginalization methods. Historically, software implementations for these methods have often been fragmented or limited in scope. This article introduces outstandR, an R package designed to provide a comprehensive and unified framework for population-adjusted indirect comparison (PAIC). outstandR implements advanced G-computation methods, within both maximum likelihood and Bayesian frameworks, and Multiple Imputation Marginalization (MIM) to address non-collapsibility. By streamlining the workflow of covariate simulation, model standardization, and contrast estimation, outstandR enables robust and compatible evidence synthesis in complex decision-making scenarios.

2510.04514 2026-06-10 cs.AI cs.CE cs.CL cs.CV stat.ME version update

ChartAgent: A Multimodal Agent for Visually Grounded Reasoning in Complex Chart Question Answering

Rachneet Kaur, Nishan Srishankar, Zhen Zeng, Sumitra Ganesh, Manuela Veloso

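Affiliations * J.P. Morgan AI Research

AI summary: Proposes the ChartAgent framework, which iteratively decomposes queries into visual subtasks and reasons in the chart's spatial domain using chart-specific vision tools (such as drawing annotations and cropping regions), achieving state-of-the-art performance on ChartBench and ChartX with especially large gains on unannotated charts.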

Comments Accepted at ACL 2026 (Main Conference). Also presented as an oral paper at the NeurIPS 2025 Multimodal Algorithmic Reasoning Workshop (https://marworkshop.github.io/neurips25/)

English abstract

Recent multimodal LLMs have shown promise in chart-based visual question answering, but their performance declines sharply on unannotated charts, those requiring precise visual interpretation rather than relying on textual shortcuts. To address this, we introduce ChartAgent, a novel agentic framework that explicitly performs visual reasoning directly within the chart's spatial domain. Unlike textual chain-of-thought reasoning, ChartAgent iteratively decomposes queries into visual subtasks and actively manipulates and interacts with chart images through specialized actions such as drawing annotations, cropping regions (e.g., segmenting pie slices, isolating bars), and localizing axes, using a library of chart-specific vision tools to fulfill each subtask. This iterative reasoning process closely mirrors human cognitive strategies for chart comprehension. ChartAgent achieves state-of-the-art accuracy on the ChartBench and ChartX benchmarks, surpassing prior methods by up to 16.07% absolute gain overall and 17.31% on unannotated, numerically intensive queries. Furthermore, our analyses show that ChartAgent (a) is effective across diverse chart types, (b) achieves the highest scores across varying visual and reasoning complexity levels, and (c) serves as a plug-and-play framework that boosts performance across diverse underlying LLMs. Our work is among the first to demonstrate visually grounded reasoning for chart understanding using tool-augmented multimodal agents.

2510.08906 2026-06-10 stat.ML cs.LG physics.chem-ph version update

Gradient-Guided Furthest Point Sampling for Robust Training Set Selection

Morris Trestman, Stefan Gugler, Felix A. Faber, O. A. von Lilienfeld

Affiliations * Berlin Institute for the Foundations of Learning and Data; Chemical Physics Theory Group, Department of Chemistry, University of Toronto, St. George Campus, Toronto, ON, Canada; Department of Materials Science and Engineering, University of Toronto, St. George Campus, Toronto, ON, Canada; Vector Institute for Artificial Intelligence, Toronto, ON, Canada; Department of Physics, University of Toronto, St. George Campus, Toronto, ON, Canada; Acceleration Consortium, University of Toronto, Toronto, ON, Canada

AI summary: Proposes Gradient Guided Furthest Point Sampling (GGFPS), which uses molecular force norms to guide sampling of configurational space, improving data efficiency and model robustness over FPS and random sampling on the MD17 dataset.

Comments 41 pages, 43 figures, 2 algorithms; journal article with supplementary information appended

DOI: 10.1088/2632-2153/ae68b8
Journal ref: Machine Learning: Science and Technology 7, 035047 (2026)

AI中文摘要

训练集采样方法用于提高机器学习问题中与化学相关的模型性能并降低数据成本。我们引入了梯度引导最远点采样（GGFPS），这是最远点采样（FPS）的一个简单扩展，利用分子力范数指导分子构型空间的高效采样。针对一个玩具系统（Styblinski-Tang函数）以及来自MD17数据集的分子动力学轨迹，提供了数值证据。我们的数值结果表明，与FPS、均匀随机采样（URS）以及已有的监督式FPS风格选择器PCov-FPS和PCov-CUR相比，使用GGFPS时数据效率和模型鲁棒性更优。对MD17数据的分布分析表明，FPS系统性地欠采样平衡几何结构，导致松弛结构测试误差较大。GGFPS纠正了这一缺陷，并且（i）在二维Styblinski-Tang系统中，与FPS相比，在不牺牲预测精度的情况下，训练成本可降低两倍；（ii）系统性地降低了MD17中平衡以及应变结构的预测误差；（iii）在所有MD17构型空间中系统性地降低了预测误差方差。这些结果表明，梯度感知采样方法作为有效的训练集选择工具具有很大潜力，而简单使用FPS可能导致训练不平衡和预测结果不一致。

英文摘要

Training set sampling methods are used to improve model performance and lower data costs in machine learning problems relevant to chemistry. We introduce Gradient Guided Furthest Point Sampling (GGFPS), a simple extension of Furthest Point Sampling (FPS) that leverages molecular force norms to guide efficient sampling of configurational spaces of molecules. Numerical evidence is presented for a toy system (the Styblinski-Tang function) as well as for molecular dynamics trajectories from the MD17 dataset. Our numerical results indicate superior data efficiency and model robustness when using GGFPS compared to FPS and uniform random sampling (URS), as well as established supervised FPS-style selectors, PCov-FPS and PCov-CUR. Distribution analysis of the MD17 data suggests that FPS systematically under-samples equilibrium geometries, resulting in large test errors for relaxed structures. GGFPS cures this artifact and (i) enables up to twofold reductions in training cost without sacrificing predictive accuracy compared to FPS in the 2-dimensional Styblinski-Tang system, (ii) systematically lowers prediction errors for equilibrium as well as strained structures in MD17, and (iii) systematically decreases prediction error variances across all of the MD17 configuration spaces. These results suggest that gradient-aware sampling methods hold great promise as effective training set selection tools, and that naive use of FPS may result in imbalanced training and inconsistent prediction outcomes.
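
As a rough illustration of the sampling scheme this abstract describes, the following Python sketch implements greedy furthest point sampling with an optional per-point weight. The weighting rule is an assumption made purely for illustration: the abstract states that GGFPS uses force norms to guide FPS but does not say how, and the names `furthest_point_sampling` and `weights` are hypothetical.

```python
import math

def furthest_point_sampling(points, k, weights=None, start=0):
    """Greedy FPS: repeatedly pick the point furthest from the selected set.

    `weights` is a hypothetical hook: each candidate's distance score is
    multiplied by its weight (e.g. a force norm). With weights=None this
    is plain, unweighted FPS.
    """
    n = len(points)
    if weights is None:
        weights = [1.0] * n
    selected = [start]
    # Distance from every point to its nearest selected point.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < k:
        candidates = (i for i in range(n) if i not in selected)
        nxt = max(candidates, key=lambda i: weights[i] * min_dist[i])
        selected.append(nxt)
        for i in range(n):
            min_dist[i] = min(min_dist[i], math.dist(points[i], points[nxt]))
    return selected
```

Passing uniform weights recovers ordinary FPS, so the same function can be used to compare a weighted selection against the unweighted baseline on the same data.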


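2510.02405 2026-06-10 stat.ME math.ST stat.ML stat.TH Updated version

Minimum-distortion quantization with a specified output distribution

Aolin Xu

AI summary: Derives the quantizer that minimizes the mean-square error subject to a specified output distribution, of the form $X=\sigma\big(F_{\sigma^{-1}(X)}^{-1}(F_W(W))\big)$, and shows that it reduces to $X=F_X^{-1}(F_W(W))$ in the uniform cases; minimum distortion is achieved by optimizing over permutations and using cumulative distribution functions.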

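AI abstract

We derive the optimal quantizer of a real-valued random variable $W$ with distribution $P_W$ such that 1) the quantized output $X$, which takes $k$ values, follows an arbitrary specified distribution $P_X$ on $\{1,\ldots,k\}$, and 2) the minimum mean-square error (MMSE) of estimating $W$ from $X$ is minimized. The optimal quantizer is shown to have the form $X=\sigma\big(F_{\sigma^{-1}(X)}^{-1}(F_W(W))\big)$, where $\sigma$ is the optimal permutation, i.e. the one that minimizes the MMSE among all permutations of $\{1,\ldots,k\}$, and $F$ denotes a cumulative distribution function. When $P_W$ is uniform on an interval or $P_X$ is uniform on $\{1,\ldots,k\}$, the quantizer reduces to $X=F_{X}^{-1}(F_W(W))$. The notion of majorization plays a key role in the optimality proof. Specifying the output distribution is useful for designing quantizers with explicitly controlled output entropy, for maximizing input-output mutual information, for tailoring the output distribution to the input requirements of a communication channel, and for data anonymization.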
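
To make the simplified form $X=F_X^{-1}(F_W(W))$ concrete, here is a minimal Python sketch of that mapping for a source that is uniform on an interval. It only illustrates the inverse-CDF composition; the function names `quantize` and `uniform_cdf` are hypothetical, and $F_X^{-1}(u)$ is taken to be the smallest $j$ with $F_X(j) \ge u$, which is an assumed convention.

```python
from bisect import bisect_left
from itertools import accumulate

def uniform_cdf(w, lo=0.0, hi=1.0):
    """CDF of the uniform distribution on [lo, hi]."""
    return min(max((w - lo) / (hi - lo), 0.0), 1.0)

def quantize(w, cdf_w, p_x):
    """Map w to X = F_X^{-1}(F_W(w)), with outputs labelled 1..k.

    p_x[j-1] is the specified probability of output j.
    """
    cum = list(accumulate(p_x))   # F_X(1), ..., F_X(k)
    u = cdf_w(w)                  # F_W(w) lies in [0, 1]
    j = bisect_left(cum, u)       # first index with cum[j] >= u
    return min(j, len(p_x) - 1) + 1   # clamp guards against rounding in cum
```

Feeding evenly spread values of $W$ through `quantize` reproduces the specified output probabilities, which is the constraint the quantizer is required to meet.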

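Sharp convergence rates for MMD empirical estimation with power kernels

Francesco Colasanto, Matteo Focardi, Massimo Fornasier, Francesco Mattesini

AI summary: Studies convergence rates for the empirical estimation of probability measures in the Maximum Mean Discrepancy (MMD) with power kernels, and proves that, for measures satisfying an Ahlfors regularity condition, the best empirical approximation decays at the rate $N^{-\frac{1}{2}(1+q/β)}$.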

Comments References update and typos correction. Comments very welcome!

Abstract

We establish quantitative rates of convergence for the empirical estimation of probability measures by means of the Maximum Mean Discrepancy (MMD) with power kernel $K_q(x,y) = -|x-y|^q$, $q \in (0,2)$. The resulting discrepancy is the classical energy distance $$\mathcal E_q^2(μ, ω) = -\frac{1}{2}\iint_{\mathbb{R}^d \times \mathbb{R}^d} |x-y|^q \, d(μ- ω)(x)\, d(μ- ω)(y),$$ and we ask how fast the best $N$-point empirical approximation $\inf_{μ_N \in \mathcal{P}^N}\mathcal{E}_q(μ_N,ω)$ decays as $N \to \infty$. Given a probability measure $ω$ on $\mathbb{R}^d$ with compact support satisfying an Ahlfors regularity condition of exponent $β\in (0,d]$, we prove that the sharp two-sided bound $$\mathcal E_q(μ_N, ω) \asymp N^{-\frac{1}{2}\left(1 + \frac{q}{β}\right)}$$ holds both for the worst-case empirical measure $μ_N$ (lower bound, holding for every configuration of $N$ points) and for an optimally chosen empirical measure $μ_N$ (upper bound). This complements the qualitative consistency result of Fornasier and Hütter, who proved narrow convergence of the minimizers of $\mathcal E_q^2(\cdot, ω)$ over empirical measures without quantitative rates.
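
For two empirical measures, the energy distance defined in this abstract expands into three averages of pairwise distances: the cross term minus half of each within-sample term. The Python sketch below evaluates that expansion directly. It illustrates the definition only, not the paper's optimal point configurations or rates, and the name `energy_distance_sq` is hypothetical.

```python
import math

def energy_distance_sq(xs, ys, q=1.0):
    """Squared energy distance E_q^2 between two empirical measures.

    xs and ys are lists of points (tuples of equal dimension), each
    carrying uniform weights. Expanding -1/2 * double integral of
    |x-y|^q against (mu - omega) x (mu - omega) gives
        E|X-Y|^q - 1/2 E|X-X'|^q - 1/2 E|Y-Y'|^q.
    """
    def mean_pow(a, b):
        total = sum(math.dist(u, v) ** q for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_pow(xs, ys) - 0.5 * mean_pow(xs, xs) - 0.5 * mean_pow(ys, ys)
```

The double loops cost a number of operations quadratic in the sample sizes, which is acceptable for a demonstration but would need a faster scheme for large point sets.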


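2601.06688 2026-06-10 cs.IT math.IT math.ST stat.TH Updated version

The Sample Complexity of Lossless Data Compression

Terence Viaud, Ioannis Kontoyiannis

AI summary: Proposes a non-asymptotic framework for studying the fundamental limits of lossless compression, defines the sample complexity as the smallest blocklength needed under given rate and excess-rate probability constraints, shows that the sample complexity of memoryless sources is determined by the Rényi entropy of order 1/2, and extends the results to Markov sources and universal compression.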

Comments Several minor revisions and reviewer comments taken into account, additional content on the "actual compression rate" and asymmetric formulation for general target rates

Abstract

A new framework is introduced for examining and evaluating the fundamental limits of lossless data compression that emphasizes genuinely non-asymptotic results. The sample complexity of compressing a given source is defined as the smallest blocklength at which it is possible to compress that source at a specifically constrained rate and to within a specified excess-rate probability. This formulation parallels corresponding developments in statistics and computer science, and it facilitates the use of existing results on the sample complexity of various hypothesis testing problems. For arbitrary sources, the sample complexity of general variable-length compressors is shown to be tightly coupled with the sample complexity of prefix-free codes and fixed-length codes. For memoryless sources, it is shown that the sample complexity is characterized not by the source entropy, but by its Rényi entropy of order $1/2$. Nonasymptotic bounds on the sample complexity are obtained, with explicit constants. Generalizations to Markov sources are established, showing that the sample complexity is determined by the source's Rényi entropy rate of order $1/2$. Finally, bounds on the sample complexity of universal data compression are developed for families of memoryless sources. There, the sample complexity is characterized by the minimum Rényi divergence of order $1/2$ between elements of the family and the uniform distribution. The connection of this problem with identity testing and with the associated separation rates is explored and discussed.
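
The quantity this abstract singles out, the Rényi entropy of order $1/2$, is simple to compute from the standard definition $H_\alpha(P) = \frac{1}{1-\alpha}\log\sum_i p_i^\alpha$, which for $\alpha = 1/2$ becomes $2\log\sum_i \sqrt{p_i}$. The Python sketch below evaluates it in nats and compares it with Shannon entropy. The function name `renyi_entropy` is hypothetical, and the sketch does not reproduce any of the paper's bounds.

```python
import math

def renyi_entropy(p, alpha):
    """Rényi entropy of order alpha (in nats) of a probability vector p.

    alpha == 1 is treated as the Shannon entropy, its limiting case.
    """
    if alpha == 1:
        return -sum(x * math.log(x) for x in p if x > 0)
    return math.log(sum(x ** alpha for x in p if x > 0)) / (1 - alpha)
```

On a uniform distribution every order gives the same value, while on a skewed distribution the order-$1/2$ entropy exceeds the Shannon entropy, so the two characterizations genuinely differ.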


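2604.09317 2026-06-10 math.ST stat.TH Updated version

Testing axial symmetry around an unspecified direction

Alejandro Cholaquidis, Juan Cuesta-Albertos, Ricardo Fraiman, Manuel Hernández-Banadik

AI summary: For the problem of testing axial symmetry of a multivariate distribution about an unknown direction, uses a simple-spectrum assumption on the covariance matrix to reduce the candidate directions to a finite set, constructs Kolmogorov-Smirnov-type statistics from projected data and sample splitting, and establishes their asymptotic distribution and the validity of the bootstrap.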

Comments 22 pages, 4 figures

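2603.12785 2026-06-10 cs.LG math.ST stat.TH Updated version

Upper Bounds for Local Learning Coefficients of Three-Layer Neural Networks

Yuki Kurumadani

Affiliations: sigmath.es.osaka-u.ac.jp (Osaka University)

AI summary: For singular parameter points of three-layer neural networks, proposes a counting rule based on budget, demand, and supply constraints to derive upper bounds on the local learning coefficient; the result covers activation functions such as swish and agrees with the known exact values for one-dimensional input.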

Abstract

Three-layer neural networks are known to form singular learning models, and their Bayesian asymptotic behavior is governed by the learning coefficient, or real log canonical threshold. Although this quantity has been clarified for regular models and for some special singular models, broadly applicable methods for evaluating it in neural networks remain limited. Recently, a formula for the local learning coefficient of semiregular models was proposed, yielding an upper bound on the learning coefficient. However, this formula applies only to nonsingular points in the set of realization parameters and cannot be used at singular points. In particular, for three-layer neural networks, the resulting upper bound has been shown to differ substantially from learning coefficient values already known in some cases. In this paper, we derive a formula for an upper bound on local learning coefficients at a class of singular realization parameters in three-layer neural networks. This formula can be interpreted as a counting rule under budget, demand, and supply constraints. In the non-polynomial real-analytic case, the formula applies in general settings, whereas in the polynomial case it applies under the restriction that the true distribution has no hidden units. In particular, our result covers activation functions such as the swish function and also includes polynomial activation functions under the above restriction, thereby extending previous results to a broader class of activation functions. We further show that, when the input dimension is one, the numerical value given by the right-hand side of our upper-bound formula agrees with the previously known learning coefficient, thereby providing a useful comparison with known exact results. Our result also provides a systematic perspective on how the weight parameters of three-layer neural networks affect the learning coefficient.


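2512.18084 2026-06-10 econ.EM math.ST stat.TH Updated version

Inference in partially identified moment models via regularized optimal transport

Grigory Franguridi, Laura Liu

AI summary: Proposes an inference method for partially identified GMM models based on regularized optimal transport: the support function is approximated with entropic regularization and computed efficiently with the Sinkhorn algorithm, a CLT for entropic OT is established, valid critical values are obtained via the bootstrap, and performance is checked in Monte Carlo simulations and in a panel logit model of happiness.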

Abstract

Many statistical and econometric problems involve parameters defined by moments of a joint distribution when only marginal distributions are observed, leading naturally to partial identification. We develop a methodology for identification, estimation, and inference in the corresponding partially identified GMM model. We characterize the sharp identified set for the parameter of interest via a support-function/optimal-transport (OT) representation. To estimate the identified set, we employ entropic regularization, which yields a smooth approximation to the classical OT problem that can be computed efficiently using the Sinkhorn algorithm. We also propose a test statistic for hypothesis testing and the construction of confidence regions for the identified set. To derive its asymptotic distribution, we establish a novel central limit theorem for the entropic OT value under general smooth cost functions. We then obtain valid critical values using the bootstrap for directionally differentiable functionals of Fang and Santos (2019). The resulting testing procedure controls size locally uniformly, including at parameter values on the boundary of the identified set. We demonstrate good finite-sample performance of our methodology in Monte Carlo simulations. Finally, as an empirical illustration, we estimate a panel logit model of self-reported happiness with attrition and refreshment, using data from the Understanding America Study.
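
The entropic regularization step mentioned in this abstract rests on the Sinkhorn algorithm, which alternately rescales the rows and columns of a Gibbs kernel until the transport plan has the required marginals. The Python sketch below shows that textbook iteration for discrete marginals. It is not the paper's estimator or test statistic, and the names `sinkhorn`, `eps`, and `iters` are hypothetical.

```python
import math

def sinkhorn(a, b, cost, eps=0.1, iters=500):
    """Entropic OT plan between discrete marginals a and b.

    cost[i][j] is the transport cost and eps the regularization strength.
    Returns the plan P with P[i][j] = u[i] * K[i][j] * v[j].
    """
    n, m = len(a), len(b)
    # Gibbs kernel induced by the cost matrix.
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Rescale rows to match a, then columns to match b.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
```

Smaller values of `eps` bring the plan closer to the unregularized OT solution but make the kernel entries tiny, so practical implementations usually work in the log domain.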


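2406.03360 2026-06-10 math.ST math.PR stat.TH Updated version

On determinantal point processes with nonsymmetric kernels

Poinas Arnaud

AI summary: Uses the theory of $P_0$ matrices to give necessary and sufficient conditions for DPPs with nonsymmetric kernels to be well defined, generalizes common results, and then constructs attractive couplings of regular DPPs with symmetric kernels to model attraction between points with different marks.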

DOI: 10.1214/26-EJP1550
Journal ref: Electronic Journal of Probability 2026, Vol. 31, paper no. 94, 1-32

Abstract

Determinantal point processes (DPPs for short) are a class of repulsive point processes. They have found some statistical applications to model spatial point pattern datasets with repulsion between close points. In the case of DPPs on finite sets, they are defined by a matrix called the DPP kernel which is usually assumed to be symmetric. While there are a few known examples of DPPs with nonsymmetric kernels, not much is known on how this affects their usual properties. In this paper, we demonstrate how to adapt the results on $P_0$ matrices to the DPP setting in order to get necessary and sufficient conditions for the well-definedness of DPPs with nonsymmetric kernels. We also generalize various common results on DPPs. We then show how to use these results to construct attractive couplings of regular DPPs with symmetric kernels in order to model spatial marked point patterns with repulsion between points of the same mark and attraction between points of different marks.


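2503.05588 2026-06-10 math.PR math.ST stat.TH Updated version

Optimal linear filtering of partially observed polynomial processes in discrete and continuous time

Jan Kallsen, Ivo Richert

AI summary: For partially observed polynomial processes, exploits their indistinguishability from Gaussian processes at the level of second moments to construct an equivalent Gaussian process and compute the optimal linear filter, predictor, and smoother explicitly.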

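2310.13668 2026-06-10 math.PR math.ST stat.TH Updated version

Variance Inequalities for Transformed Fréchet Means in Hadamard Spaces

Christof Schötz

AI summary: Studies variance inequalities for transformed Fréchet means in Hadamard spaces, covering the Fréchet median, the Fréchet mean, and Huber loss-induced means; describes how the expected transformed distance grows away from a minimizer and characterizes uniqueness of the Fréchet median.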

Abstract

The Fréchet mean (or barycenter) generalizes the expectation of a random variable to metric spaces by minimizing the expected squared distance to the random variable. Similarly, the median can be generalized by its property of minimizing the expected absolute distance. We consider the class of transformed Fréchet means with nondecreasing, convex transformations that have a concave derivative. This class includes the Fréchet median, the Fréchet mean, the Huber loss-induced Fréchet mean, and other statistics related to robust statistics in metric spaces. We study variance inequalities for these transformed Fréchet means. These inequalities describe how the expected transformed distance grows when moving away from a minimizer, i.e., from a transformed Fréchet mean. Variance inequalities are useful in the theory of estimation and numerical approximation of transformed Fréchet means. Our focus is on variance inequalities in Hadamard spaces - metric spaces with globally nonpositive curvature. Notably, some results are new also for Euclidean spaces. Additionally, we are able to characterize uniqueness of transformed Fréchet means, in particular of the Fréchet median.


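2403.17469 2026-06-10 math.ST cs.DB cs.DM math.CO stat.TH Updated version

Geometric planted matchings beyond the Gaussian model

Lucas da Rocha Schwengber, Roberto Imbuzeiro Oliveira

AI summary: Studies recovery of an unknown matching between a random point set and its perturbation, derives minimax lower bounds via matchings in random geometric graphs, and shows that the estimator minimizing the sum of squared Euclidean distances is optimal in fixed dimension and makes no mistakes with high probability under high-dimensional conditions.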

Comments 36 pages, 2 figures

Abstract

We consider the problem of recovering an unknown matching between a set of $n$ randomly placed points in $\mathbb{R}^d$ and random perturbations of these points. This can be seen as a model for particle tracking and more generally, entity resolution. We use matchings in random geometric graphs to derive minimax lower bounds for this problem that hold under great generality. Using these results we show that for a fixed $d$, as long as the noise distribution has finite $d$-th moment, and both initial positions and noise have bounded continuous densities, the minimax rate for the problem scales as $Θ(n^2σ^d \wedge n)$. Under the stronger assumptions that the tail of the noise is sub-Gaussian, we show that the order of the number of mistakes made by an estimator that minimizes the sum of squared Euclidean distances is minimax optimal when $d$ is fixed and is optimal up to $n^{o(1)}$ factors when $d = o(\log n)$. In the high-dimensional regime we consider a setup where both initial positions and perturbations have independent sub-Gaussian coordinates. In this setup we give sufficient conditions under which the same estimator makes no mistakes with high probability. We prove an analogous result for an adapted version of this estimator that incorporates information on the covariance matrix of the perturbations.
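
The estimator analysed in this abstract picks the matching that minimizes the sum of squared Euclidean distances between original and perturbed points. The Python sketch below finds that matching by brute force over all permutations, which is only feasible for a handful of points; for realistic sizes one would use a linear assignment solver such as the Hungarian algorithm. The name `best_matching` is hypothetical.

```python
from itertools import permutations

def best_matching(xs, ys):
    """Return perm such that xs[i] is matched to ys[perm[i]].

    Minimizes the sum of squared Euclidean distances by exhaustive
    search over all n! permutations (tiny n only).
    """
    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    n = len(xs)
    return min(
        permutations(range(n)),
        key=lambda perm: sum(sq_dist(xs[i], ys[perm[i]]) for i in range(n)),
    )
```

With small perturbations and well-separated points the recovered permutation coincides with the planted one, which is the low-noise regime in which the abstract reports no mistakes.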


2502.03942 2026-06-10 stat.ME

A framework for joint assessment of a terminal event and a score existing only in the absence of the terminal event

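Klaus Kähler Holst, Andreas Nordland, Julie Funch Furberg, Lars Holm Damgaard, Christian Bressen Pipper

AI summary: Proposes a framework for jointly assessing a terminal event and a score that exists only in the absence of the terminal event, using semiparametric methods to estimate the risk and the score and a closed testing procedure to assess the treatment effect.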

DOI: 10.1080/10543406.2026.2670526

Abstract

Analysis of data from randomized controlled trials in vulnerable populations requires special attention when assessing treatment effect by a score measuring, e.g., disease stage or activity together with onset of prevalent terminal events. In reality, it is impossible to disentangle a disease score from the terminal event, since the score is not clinically meaningful after this event. In this work, we propose to assess treatment interventions simultaneously on the terminal event and the disease score in the absence of a terminal event. Our proposal is based on a natural data-generating mechanism, respecting that a disease score does not exist beyond the terminal event. We use modern semi-parametric statistical methods to provide robust and efficient estimation of the risk of terminal event and expected disease score conditional on no terminal event at a pre-specified landmark time. We also use the simultaneous asymptotic behaviour of our estimators to develop a powerful closed testing procedure for confirmatory assessment of treatment effect on both onset of terminal event and level of disease score in the absence of a terminal event. A simulation study mimicking a large-scale outcome trial in chronic kidney patients as well as an analysis of that trial is provided to assess performance.


2510.03844 2026-06-10 cs.LG stat.AP stat.ME

On Using Large Language Models to Enhance Clinically-Driven Missing Data Recovery Algorithms in Electronic Health Records

Sarah C. Lotspeich, Abbey Collins, Brian J. Wells, Ashish K. Khanna, Joseph Rigdon, Lucy D'Agostino McGowan

Affiliations: Department of Statistical Sciences, Wake Forest University; Wake Forest University; Wake Forest University School of Medicine; Department of Psychology, North Carolina State University; Department of Biostatistics and Data Science, Wake Forest University School of Medicine; Department of Anesthesiology, Division of Critical Care Medicine, Wake Forest University School of Medicine; Outcomes Research Consortium

AI summary: Explores using large language models to improve the accuracy and scalability of missing-data recovery algorithms for electronic health records; roadmaps refined jointly by clinical experts and an LLM achieve data recovery similar to expert review.

DOI: 10.1093/jamiaopen/ooag080
Journal ref: 2026

Abstract

Objective: Electronic health records (EHR) data are prone to missingness and errors. Previously, we devised an "enriched" chart review protocol where a "roadmap" of auxiliary diagnoses (anchors) was used to recover missing values in EHR data (e.g., a diagnosis of impaired glycemic control might imply that a missing hemoglobin A1c value would be considered unhealthy). Still, chart reviews are expensive and time-intensive, which limits the number of patients whose data can be reviewed. Now, we investigate the accuracy and scalability of a roadmap-driven algorithm, based on ICD-10 codes (International Classification of Diseases, 10th revision), to mimic expert chart reviews and recover missing values. Materials and Methods: In addition to the clinicians' original roadmap from our previous work, we consider new versions that were iteratively refined using large language models (LLM) in conjunction with clinical expertise to expand the list of auxiliary diagnoses. Using chart reviews for 100 patients from the EHR at an extensive learning health system, we examine algorithm performance with different roadmaps. Using the larger study of 1000 patients, we applied the final algorithm, which used a roadmap with clinician-approved additions from the LLM. Results: The algorithm recovered as much, if not more, missing data as the expert chart reviewers, depending on the roadmap. Discussion: Clinically-driven algorithms (enhanced by LLM) can recover missing EHR data with similar accuracy to chart reviews and can feasibly be applied to large samples. Extending them to monitor other dimensions of data quality (e.g., plausibility) is a promising future direction.


1. Statistical theory and methods (12 papers)

A Functional Data Framework For Analyzing Shapes and Textures in Images

Two-Sample Homogeneity Test via Entropic Optimal Transport

Predicting Current Outcomes From Historical Survey Data With Weighted Conformal Prediction

Methods for adjusting for covariate measurement error in flexible modelling of functional form: results of a blinded, controlled neutral comparison simulation study

Estimating the Wasserstein barycenter of one-dimensional distributions under sparse sampling

An information-geometric framework for mapping maximum potential biodiversity

Correcting Variable Importance Scored by Random Forests

Conformal Prediction for Dyadic Regression Under Complex Missingness

Probabilistic Win Ratio Method For Hierarchical Composite Endpoints With Coarsened Outcomes

Two-Sample Hypothesis Testing for Subspace Equality in Network Data

HDSense: An efficient method for ranking observable sensitivity

Robust Design-Based Estimation and Inference for Stratified Randomized Trials with Varying Cluster Sizes

2. Bayesian statistics and probabilistic modelling (4 papers)

Robust Bayesian Predictive Model Selection using Bregman Divergence

Nonparametric Riemannian Empirical Bayes, and Denoising Measurements on Manifolds

Confidence, Statistical Evidence and Relative Belief with Applications to a Problem in Particle Physics

Progression to the mean: A comparison of Bayesian clinical prediction models outputting the posterior mean versus conventional plug-in predictions

3. Causal inference and experimental design (4 papers)

Empirical stratification for treatment effect heterogeneity with post-treatment variables

LMT: A Bayesian Framework for Causal Discovery from Textual Alarm Records in Manufacturing Systems

Minimum free energy randomized design to improve covariate balance

An Estimator-Robust Design for Augmenting Randomized Controlled Trials with External Real-World Data

4. High-dimensional statistics and regularization (3 papers)

Data compression for fast dimension reduction and clustering of high-dimensional discrete data

Distributionally Robust PCA with Data-Adaptive Wasserstein Geometry

Symmetry-Aware Convex Shrinkage for High-Dimensional Covariance Estimation

5. Time series and spatial statistics (8 papers)

Spatial Prediction of Local Soil Erosion Distribution in the Wasserstein Space

Stochastic weather generators for high-frequency wind vector time series

Leave a Window Out: Modifying the Jackknife for Predictive Inference in Time Series

Wishart kernel density estimation for strongly mixing time series on the cone of positive definite matrices

A Flexible Approach to Augmenting a Bayesian VAR with Nonlinear Factors

Interpretable deep convolutional model for nonlinear multivariate time series in complex systems

Multiple change-point detection for Poisson point processes

Time series forecasting from partial observations via Non-negative Matrix Factorization

6. Computational statistics and MCMC (7 papers)

Nonlinear Estimator: Dual Bayesian Affine Estimators for Parameter Learning

Deterministic Denominator Design for Localized Tamed Stochastic-Gradient Langevin Dynamics

Data assimilation for subsurface flow using latent diffusion model parameterization: performance of ensemble-Kalman and Monte Carlo techniques

Rare Event Analysis via Stochastic Optimal Control

A Sketch-and-Project Analysis of Subsampled Natural Gradient Algorithms

Wasserstein Geometry of Information Loss in Nonlinear Dynamical Systems

Latent Guided Sampling for Combinatorial Optimization

7. Statistical foundations of machine learning (30 papers)

Itô maps for any-step SDEs

Generalized Conformal Predictive Systems Under Distributional Shifts

Human-AI Teaming Through the Lens of Calibration

Near-Exponential Convergence Rates for kNN Classification based on Boltzmann Margin

Decision-Calibrated Conformal Uncertainty for Pacing Decisions in Streaming Advertising

Robust Active Learning for Few-Shot Example Selection in Text-to-SQL

Flexible Kernels for Protein Property Prediction

Conservation Laws from Data Symmetry in Neural Networks

SPACR: Single-Pass Adaptive Training of Uncertainty-Aware Conformal Regressors

TENP: Trapezoidal Expert Neuron Pruning For Mixture-of-Experts

Integrating Local and Global Entropy for Uncertainty Quantification in LLMs

Using Probabilistic Programs to Train Inductive Reasoning in Large Language Models

Express Language Modeling

Range Penalization: Theoretical Insights with Applications in Federated Learning

$k$-Nearest Neighbors in Gromov--Wasserstein Space

Convergence Rates for Neural-Network Estimation with Current-Status Data

A Mean-Field Analysis of Multi-Head Self-Attention under Cross-Entropy Training

Rank Collapse, Fixed Points, and the Renormalization Group Structure of MLP Residual Networks

Generalization in Nonlinear Least Squares via Learned Feature Geometry

Edge of Stability Selectively Shapes Learning Across the Data Distribution

Sample-efficient inductive matrix completion with noise and inexact side-information

Exact Functional ANOVA Decomposition for Categorical Inputs Models

Blind denoising diffusion models and the blessings of dimensionality

Informed Asymmetric Actor-Critic: Leveraging Privileged Signals Beyond Full-State Access

Deeper or Wider: A Perspective from Optimal Generalization Error with Sobolev Loss

Mixtures of Neural Operators Reduce Active Complexity in Operator Learning

Infinity-norm-based Input-to-State-Stable Long Short-Term Memory networks: a thermal systems perspective

Risk Comparisons in Linear Regression: Implicit Regularization Dominates Explicit Regularization

Post-Training Augmentation Invariance

An $(ε,δ)$-accurate level set estimation with a stopping criterion

8. Biostatistics and medical statistics (10 papers)

Adressing Separation: A Firth-corrected Joint Model for Longitudinal and Time-to-event Data with an Application on Dropout from Vocational Training

Two-stage imputation of longitudinal anthropometric data with cross-reference harmonisation: a simulation study

Predicting Hospitalization from a Whole-Person Health Score with Incomplete Electronic Health Records Data: A Case Study

OncoTraj: a public benchmark for longitudinal resistance prediction in EGFR-mutant non-small-cell lung cancer on osimertinib