Divide-and-shrink: An efficient and heterogeneity-agnostic approach for transfer estimation using summary statistics
Ruoyu Wang, Xihong Lin
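AI summary: Proposes the divide-and-shrink method, which estimates target parameters in closed form from summary statistics of the target and external populations, is guaranteed to outperform the target-only estimator under arbitrary heterogeneity, and requires neither a model nor tuning parameters.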
Knowledge transfer across data sources holds great promise for improving the estimation of target population parameters by leveraging the growing availability of data from different sources. However, the effectiveness of knowledge transfer is often challenged by the complex and pervasive heterogeneity between data sources and by the lack of access to individual-level data. This paper proposes the divide-and-shrink (dShrink) method, a transfer estimation method that estimates target population parameters in closed form using summary statistics from a target population and some external source populations while accounting for population heterogeneity. The dShrink estimator is guaranteed to outperform the estimator based solely on the target population in terms of expected quadratic error under arbitrary population heterogeneity. The gain can be substantial when the target and source populations are similar, or when the underlying true parameter values are near zero. Notably, dShrink is model-free, requires no user-specified tuning parameters, is robust to various types of heterogeneity between data sources, and applies to a broad range of parameter estimation problems. dShrink remains effective even when the covariance matrix of the external summary statistics is not accessible, and it offers flexibility in incorporating side information and summary statistics from multiple source populations. Simulations and real data analyses demonstrate the superior performance of the dShrink estimator and its potential as a robust tool for transfer estimation.
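The abstract does not state dShrink's closed form, so the following is not the authors' estimator. As a rough illustration of the general idea (shrinking a target-only estimate toward an external one without doing worse in expected squared error), here is a minimal sketch of the classical positive-part James-Stein estimator centered at a source estimate. It assumes the target estimate is Gaussian with known isotropic variance `sigma2`, that there are at least three parameters, and that the source estimate is independent of the target data. The function name `shrink_toward_source` and all argument names are hypothetical.

```python
import numpy as np


def shrink_toward_source(target_est, source_est, sigma2):
    """Positive-part James-Stein shrinkage of target_est toward source_est.

    Illustrative only; NOT the dShrink estimator from the paper.
    Assumptions: target_est ~ N(theta, sigma2 * I) with known sigma2,
    dimension p >= 3, and source_est independent of target_est.
    """
    target_est = np.asarray(target_est, dtype=float)
    source_est = np.asarray(source_est, dtype=float)
    if target_est.shape != source_est.shape or target_est.ndim != 1:
        raise ValueError("estimates must be 1-D arrays of equal length")
    p = target_est.size
    if p < 3:
        raise ValueError("James-Stein shrinkage needs at least 3 parameters")

    diff = target_est - source_est
    dist2 = float(diff @ diff)
    if dist2 == 0.0:
        # The two estimates coincide, so there is nothing to shrink.
        return source_est.copy()

    # Weight on the target-minus-source difference, truncated at zero
    # (the "positive part"): far-apart estimates give a weight near 1,
    # close estimates give a weight near 0.
    weight = max(0.0, 1.0 - (p - 2) * sigma2 / dist2)
    return source_est + weight * diff


if __name__ == "__main__":
    target = np.array([0.0, 0.0, 0.0, 0.0])
    distant_source = np.array([10.0, 0.0, 0.0, 0.0])
    print(shrink_toward_source(target, distant_source, sigma2=1.0))
```

When the source estimate lies far from the target estimate, the weight approaches 1 and the target estimate is returned nearly unchanged. When the two are close, the weight drops toward 0 and the source estimate is borrowed heavily. This mirrors, in a much simpler setting, the behavior the abstract describes: large gains under similarity and no loss under heterogeneity.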