arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.10540 2026-06-10 eess.SP New submission

Complex VAE with Heavy-Tailed Likelihood for Radar Target Detection in Sea Clutter

Ting Bai, Jun Tang, Yuxin Xu

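AI summary: To address the heavy-tailed, spiky nature of sea clutter and the scarcity of target labels, the paper proposes an unsupervised complex-valued variational autoencoder that uses a Student-t negative log-likelihood to capture heavy-tailed reconstruction errors and adds a time-domain amplitude error constraint, enabling radar target detection at a constant false-alarm rate.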

English abstract

To address the heavy-tailed, spike-prone nature of sea clutter and the scarcity of labeled target data, an unsupervised complex-valued variational autoencoder (VAE) for maritime radar target detection is proposed. In implementation, each complex baseband slow-time sequence is represented by its in-phase and quadrature components, and the model learns their joint reconstruction from clutter-only data. A Student-\(t\) negative log-likelihood is adopted to capture heavy-tailed reconstruction errors while reducing sensitivity to outliers during clutter learning. In addition, a time-domain amplitude error constraint is introduced to penalize slow-time magnitude mismatch in the reconstruction. At inference, reconstruction deviation is used as the detection statistic, and the decision threshold is set via an empirical quantile estimated from a clutter-only validation set to enforce a constant false-alarm rate (CFAR). Experiments on measured sea-clutter data show that detection performance is consistently improved over MF, AMF, and a real-valued \(β\)-VAE under CFAR constraints.
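
Illustrative sketch (not from the paper): the two ingredients named in the abstract, a Student-t negative log-likelihood and a CFAR threshold taken as an empirical quantile of clutter-only validation scores, can be written in a few lines. The function names, the degrees of freedom `nu`, and the scale `sigma` are assumptions made for illustration; the abstract does not state the authors' exact loss or quantile rule.

```python
import math

def student_t_nll(residual, nu=3.0, sigma=1.0):
    """Negative log-likelihood of one residual under a Student-t(nu) with scale sigma."""
    z2 = (residual / sigma) ** 2
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi) - math.log(sigma))
    return -(log_norm - 0.5 * (nu + 1) * math.log1p(z2 / nu))

def cfar_threshold(clutter_scores, pfa):
    """Empirical (1 - pfa) quantile of clutter-only validation scores."""
    ordered = sorted(clutter_scores)
    # Pick the order statistic that leaves at most a fraction pfa of scores above it.
    k = math.ceil((1.0 - pfa) * len(ordered))
    k = min(max(k, 1), len(ordered))
    return ordered[k - 1]

def detect(score, threshold):
    """Declare a target when the reconstruction deviation exceeds the threshold."""
    return score > threshold
```

Because the Student-t penalty grows only logarithmically in the residual, a single clutter spike contributes far less to the loss than it would under a Gaussian likelihood, which is the robustness property the abstract refers to.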

2606.10464 2026-06-10 eess.AS New submission

GC-LoRA: Gated Convolutional LoRA for Parameter-Efficient Acoustic Adaptation

Natarajan Balaji Shankar, Zilai Wang, Kaiyuan Zhang, Mohan Shi, Abeer Alwan

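AI summary: Proposes the GC-LoRA adapter architecture, which injects Conformer-style local convolutional processing into pretrained Transformer encoders to capture local acoustic dependencies efficiently, achieving word error rate reductions of up to 10.9% across several acoustically mismatched domains.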

Comments Accepted for publication at Interspeech 2026

English abstract

Transformer-based Speech Foundation Models excel in most Automatic Speech Recognition tasks but often suffer performance degradation when applied to domains with mismatched acoustic characteristics. While Parameter-Efficient Fine-Tuning (PEFT) methods, such as Low-Rank Adaptation (LoRA), adjust global attention, they lack the local context modeling crucial for capturing domain-specific variations. We propose GC-LoRA, a novel adapter architecture that injects Conformer-style local convolutional processing into pretrained Transformer encoders. By integrating a lightweight adapter into encoder attention output projections, our method efficiently captures local acoustic dependencies without disrupting pretrained global representations. Experiments across diverse datasets (acoustically degraded, bandlimited, dialectal, child speech) demonstrate the efficacy of our approach, achieving Word Error Rate (WER) reductions of up to 10.9% compared to baselines while adding minimal trainable parameters.

2606.10240 2026-06-10 eess.IV New submission

Laplace-Mixture Dipole Inversion for Quantitative Susceptibility Mapping

Shuai Huang, James J. Lah, Jason W. Allen, Deqiang Qiu

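AI summary: Proposes LAMDI, an automatic dipole inversion method based on a Laplace mixture prior that preserves fine anatomical structure in quantitative susceptibility mapping without manual parameter tuning, with performance comparable to existing methods.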

English abstract

Purpose: To develop an automatic dipole inversion method for quantitative susceptibility mapping (QSM) that preserves fine anatomical structures without the need for manual regularization-parameter tuning. Theory: The original approximate message passing with parameter estimation (AMP-PE) framework models image gradients with a single Laplace prior, which does not fully capture the heavy-tailed gradient distribution of brain susceptibility maps. This prior mismatch can lead to over-regularization and blocky reconstructions. We address this limitation by modeling the gradients with a two-component Laplace mixture prior. Methods: We propose a Laplace-Mixture Dipole Inversion (LAMDI) method by incorporating a two-component Laplace mixture prior into the AMP-PE framework with automatic parameter estimation. LAMDI was evaluated on a public in vivo dataset. Its performance was compared with FANSI, MEDI, and AMP-PE with a single-Laplace prior (AMP-PE-L1) under both standard default and reference-tuned settings. Results: On a public multi-orientation QSM dataset, LAMDI achieved NRMSE and SSIM comparable to AMP-PE-L1 while substantially reducing HFEN, suggesting improved preservation of high-frequency anatomical detail. Under reference-based tuning, FANSI and MEDI achieved the best performance for some metrics, but LAMDI remained competitive without requiring reference maps or manual regularization tuning. Conclusion: LAMDI provides an effective and automatic parameter-estimation alternative for QSM dipole inversion by combining competitive reconstruction accuracy with improved preservation of fine anatomical detail.

2606.10190 2026-06-10 eess.SP New submission

Optimal Illumination via Joint Movement and Phase Optimization for Movable Antenna-RIS Configuration

Yan Zhang, Nicola Marchetti, Indrakshi Dey

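AI summary: Proposes a movable-antenna-enhanced RIS architecture that models antenna motion with stochastic differential equations and optimizes long-term SNR through a two-timescale framework, achieving up to 36 dB steady-state SNR and up to 16 times higher energy efficiency.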

English abstract

Reconfigurable intelligent surfaces (RIS) enable programmable control of wireless propagation but remain vulnerable to persistent deep fades in static deployments. This paper introduces a Movable Antenna-enhanced RIS (MA-RIS) architecture where antenna elements physically reposition to sample independent spatial channels, enabling mobility-induced diversity. We model antenna motion using a Stochastic Differential Equation (SDE) framework capturing controlled drift and environmental diffusion. Itô calculus-based analysis characterizes steady-state antenna distributions, spatial decorrelation, and outage probability, revealing fundamental trade-offs between control strength and mobility randomness. To maximize long-term SNR while accounting for control overhead, we propose an overhead-aware two-timescale framework separating slow antenna trajectory control from fast phase adaptation. The stochastic optimal control problem is solved via predictive approximation of the Hamilton-Jacobi-Bellman (HJB) formulation, enabling real-time implementation. Simulations validate theoretical predictions: the two-timescale strategy achieves up to 36 dB steady-state SNR with remarkable stability, outperforming position-only control by up to 15 dB and uncontrolled baselines by over 30 dB. Despite experiencing a lower SNR than Active RIS, the proposed approach delivers up to 16 times higher energy efficiency (EE) across varying system scales, establishing a new paradigm of mobility-enabled channel adaptation for resilient wireless systems.

2606.10164 2026-06-10 eess.SP New submission

Curved Beam Enabled Wireless Communications: Modeling, Analysis and Optimization

Jiawei Yao, Xiaoren Xu, Walid Saad, Mingzhe Chen

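AI summary: For scenarios with a blockage, proposes using a continuous aperture array to generate curved beams that improve wireless communication performance; models beam control and a segmented channel, and designs an iterative algorithm based on fractional programming and enhanced block coordinate ascent to optimize the weighted sum-rate.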

English abstract

In this paper, the problem of using curved beams to improve wireless communication performance in the presence of a blockage is studied. In particular, a transmitter equipped with a continuous aperture array can generate curved beams to serve multiple receivers by allowing signals to propagate along both straight and curved paths. To optimize the weighted sum-rate, a curved beam model is developed for controlling the beam steering, beam focusing, and beam curving functions, along with a segmented channel model to characterize practical channels induced by the blockage. Based on the introduced curved beam model, an optimization problem is posed with the goal of maximizing the weighted sum-rate of all users under a transmit power budget and physical constraints of curved beams. To solve this problem, the continuous aperture is first converted into finite summations via a discrete sampling of the continuous coordinate. Then, the performance gap between the ideal continuous aperture design and its practical discrete aperture approximation is analyzed. Based on the above discrete approximation, an iterative algorithm is developed to optimize curved beam control parameters. In particular, the original problem is reformulated into a tractable form via fractional programming (FP). The transformed problem is then solved by designing an enhanced block coordinate ascent (BCA) method, which determines a surrogate-construction point by leveraging the local descent from previous iterations, thereby accelerating convergence. Next, a proximal regularization term is added to the surrogate function to control the update magnitude and suppress aggressive updates, thereby improving update stability. Finally, the beam amplitudes are computed based on the effective channel gains. Simulation results show that the proposed method can improve the weighted sum-rate compared to using only straight beams.

2606.10048 2026-06-10 eess.SP New submission

Human Walking Sensing and Pose Estimation in the 6 GHz Band Using Amplitude and Phase CSI

Zhaorui Yin, Mattia Brambilla, Monica Nicoli

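AI summary: Studies indoor human pose estimation from the amplitude and phase CSI of 6 GHz OFDM signals, designing a processing pipeline and adapting four deep learning models; experiments show that amplitude-only CSI performs comparably to joint amplitude-phase processing and that phase information is more useful as a complementary feature.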

English abstract

This paper investigates human pose estimation from Orthogonal Frequency-Division Multiplexing (OFDM) signals in an indoor multistatic wireless network operating in the 6 GHz band. We design and validate a processing pipeline that exploits both the amplitude and phase of the Channel State Information (CSI) from multiple radio links to estimate the human body pose. Four deep learning architectures from the literature, namely DT-Pose, MetaFi++, HPE-Li, and VST-Pose, are adapted to the OFDM CSI structure and extended to jointly exploit the amplitude and phase information. The models estimate the pose of a human walking within the network coverage area. Performance evaluation is conducted on an open-access dataset using standard pose-estimation metrics such as Procrustes-aligned Mean Per-Joint Position Error (PA-MPJPE) and Bone Length Loss (BLL). Results indicate that reliable human pose reconstruction can be achieved from 6 GHz OFDM CSI measurements, with DT-Pose providing the best overall accuracy. On average, amplitude-only CSI yields performance comparable to joint amplitude-phase processing, whereas phase information is more beneficial as a complementary feature rather than as a standalone input.

2606.11013 2026-06-10 stat.ME New submission

Empirical stratification for treatment effect heterogeneity with post-treatment variables

Chao Cheng, Rui Wang, Yichi Zhang

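AI summary: Proposes an assumption-lean empirical stratification framework that defines empirical scores from potential post-treatment variable responses predicted from baseline covariates, constructs identifiable empirical-stratum treatment effects, and connects them to principal causal effects.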

English abstract

Post-treatment variables (PVs), such as treatment noncompliance, behavioral responses, and intercurrent events, often modify the ultimate treatment effect on the primary outcome. However, existing methods provide limited tools for studying treatment effect heterogeneity with respect to PVs. Conventional heterogeneous treatment effect estimands condition on baseline covariates, but similarly conditioning on the observed PV can induce endogenous selection bias in treatment effect estimation. Principal stratification offers a rigorous framework for studying principal causal effects across principal strata, but principal strata are latent and their identification often requires stringent assumptions. This paper develops an assumption-lean empirical stratification framework for characterizing treatment effect heterogeneity with respect to PVs. We define empirical scores using the predicted potential PV responses based on baseline covariates, and use the empirical scores to construct empirically accessible subgroups. The resulting empirical-stratum treatment effects (ETEs) are identifiable under standard causal assumptions. We connect the proposed framework to principal stratification by showing that the average ETE recovers principal causal effects under the principal ignorability assumption, but remains informative under violations of this assumption. We further introduce projected ETE curves and develop efficient influence function-based estimators for semiparametric inference. We illustrate the proposed framework with two real-world applications.

2606.10969 2026-06-10 stat.ME New submission

A Functional Data Framework For Analyzing Shapes and Textures in Images

Issam-Ali Moindjié

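AI summary: Proposes an image representation for star-shaped domains based on functional data analysis that reduces dimensionality and computational cost, and applies it to supervised classification.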

English abstract

Images represent objects characterized by contours and textures. From a statistical perspective, these features can be defined as observations of continuous random functions. However, most existing approaches rely on pixel-based discretizations, which lead to high-dimensional representations and heavy computational costs. In this note, we introduce an alternative, more frugal representation. This representation assumes that the object has a star-shaped domain interior. Under this condition, we explore the analysis of images from a functional data analysis perspective. The proposed framework is illustrated on a real-data supervised image classification problem.

2606.10866 2026-06-10 stat.ME stat.AP stat.CO New submission

Addressing Separation: A Firth-corrected Joint Model for Longitudinal and Time-to-event Data with an Application on Dropout from Vocational Training

Sophie Potts, Viola Deutscher, Elisabeth Bergherr

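AI summary: To address biased estimates caused by separation in categorical covariates of joint models, introduces Firth's correction into maximum likelihood estimation, implemented through the EM algorithm; simulations and real data show reduced bias, and the method is applied to analyze factors influencing dropout from vocational training in Germany.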

English abstract

Joint models for longitudinal and time-to-event data are frequently used to model endogenous longitudinal covariates alongside a time-to-event outcome. However, the model class inherits some limitations of the survival submodels, including the requirement of non-separation for each category of categorical covariates. We therefore incorporate Firth's correction into the frequentist estimation procedure of joint models in order to make the model class applicable in settings with separation. We derive the quantities needed for the correction term and implement it in the Expectation-Maximization algorithm for parameter estimation in joint models. Our simulation study shows that, in data situations with separation issues, the Firth-corrected estimation procedure yields less biased estimates and the respective coefficients approach the estimated values observed in the non-separation cases. The application to a data set on satisfaction with and dropout from vocational training demonstrates the advantages of the Firth-corrected joint model in a real-world data set with separation. The results add to the literature on dropout from vocational training in Germany by explicitly modeling direct effects of socioeconomic and training-specific factors on the risk of dropout as well as their indirect contribution via satisfaction with the training.

2606.10772 2026-06-10 stat.AP New submission

Structural Under-Representation of Women in News: Nonparametric Bayesian Mixtures Capture Time-Dependent Dynamics

Isabella Habereder, Thomas Kneib, Isao Echizen, Timo Spinde

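AI summary: Fits a time-dependent Bayesian mixture model to Canadian news data, revealing structural under-representation of women in quote shares across all clusters, with more than 85% of topic-region time series showing no improvement.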

English abstract

The under-representation of women as sources cited in news media is one prominent representation of gender bias. Understanding where gender bias concentrates and how it evolves is essential for targeted mitigation. Because gender representation varies across topics, time, and reported-on regions, creating complex dependencies that are difficult to capture parametrically, we employ a nonparametric model to uncover latent cluster structures and temporal dynamics. We combine time-dependent Bayesian mixture modeling techniques with a Beta mixture kernel tailored to female quote shares, bounded between 0 and 1. Fitted on Canadian news articles from 2019 to 2024, the model reveals structural under-representation of women across all clusters, with news topic driving differences in female quote shares more strongly than the reported-on region. More than 85% of topic-region time series show no improvement toward gender parity over the observation period. Dynamic density estimation confirms that the aggregate distribution of female quote shares remains stable between 2019 and 2024. Our application demonstrates that advanced probabilistic models not only reproduce findings in gender bias research but also reveal latent dependencies and structural patterns that simpler approaches miss, encouraging future adoption of model-based frameworks for studying media bias.

2606.10767 2026-06-10 stat.ME New submission

Two-Sample Homogeneity Test via Entropic Optimal Transport

Yiming Ma, Hang Liu, Weiwei Zhuang

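AI summary: Proposes a two-sample homogeneity test based on entropic optimal transport maps, using the squared L2 distance as the test statistic; proves identifiability, a functional central limit theorem, and local asymptotic power, and calibrates the null distribution with a weighted multiplier bootstrap.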

English abstract

This paper proposes a two-sample homogeneity test based on entropic optimal transport (EOT) maps from a common reference distribution -- the uniform law on the unit ball. The test statistic is the squared $L^2$-distance between the two empirical EOT maps. For a fixed entropic regularization parameter, we prove that the population map discrepancy is identifiable, derive a functional central limit theorem for the empirical map difference under the null, and establish the Gaussian quadratic-form null limit. We also prove consistency against fixed alternatives and characterize local asymptotic power under contiguous alternatives. A weighted multiplier bootstrap is proposed to calibrate the non-pivotal null distribution, and its validity is established. Extensive simulations demonstrate that the proposed EOT-map test has reliable finite-sample size control and exhibits competitive power compared with other existing methods. The method is particularly powerful for location alternatives and, beyond a single scalar discrepancy, it provides additional diagnostic information on how the two distributions differ. Finally, a real data application concludes the paper.

2606.10593 2026-06-10 stat.ME stat.CO New submission

Data compression for fast dimension reduction and clustering of high-dimensional discrete data

Silvia D'Angelo, Michael Fop

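AI summary: Proposes a deterministic dimension-reduction framework that compresses high-dimensional discrete observations into low-dimensional continuous representations through weighted sums defined by a scaled positional encoding, with guarantees of injectivity, approximate Gaussianity, and preserved separation of cluster centroids; the method is computationally efficient and applies to several data types.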

English abstract

High-dimensional discrete data arise in many contemporary applications, including genomics, microbiome research, survey studies, and digital behavioral analysis. Clustering such data remains challenging because existing methods are often computationally demanding, sensitive to sparsity and discreteness, or designed for specific data types. We propose a deterministic dimension-reduction framework for clustering high-dimensional discrete observations. The method compresses each observation into a low-dimensional continuous representation through weighted sums defined by a scaled positional encoding, yielding a numerically stable transformation applicable to binary, categorical, and count-valued data. We establish several theoretical properties of the proposed compression. The mapping is injective, ensuring that distinct observations remain distinct after compression. Under mild regularity conditions, the compressed variables admit an approximate Gaussian representation, providing a theoretical basis for model-based clustering in the compressed space. We further show that separation between cluster centroids is preserved under compression, implying that location-driven cluster structure remains identifiable after dimension reduction. Extensive simulation studies demonstrate accurate cluster recovery across a wide range of realistic settings. The proposed approach is also computationally efficient, providing substantial speed improvements over dimension-reduction techniques commonly used in conjunction with clustering. Applications to Irish baby-name records and microbiome data further illustrate its practical utility. The proposed framework offers a scalable, computationally efficient, and broadly applicable approach to clustering high-dimensional discrete data.

2606.10574 2026-06-10 stat.AP stat.ME New submission

Two-stage imputation of longitudinal anthropometric data with cross-reference harmonisation: a simulation study

Flavia Alves

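AI summary: Proposes a two-stage method that fills missing anthropometric values in longitudinal data by within-subject linear interpolation followed by LMS-based growth-reference imputation, with explicit handling of differing reference standards; simulations on synthetic data show small errors and negligible bias.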

English abstract

Objective. Longitudinal datasets frequently contain missing weight and height measurements, and studies that combine data sources may index measurements against different growth reference standards (e.g., the WHO reference and CDC charts). We describe and evaluate a reproducible two-stage method that imputes missing anthropometry while making the choice of reference standard an explicit parameter. Methods. Stage 1 applies within-subject linear interpolation across visit dates (interior gaps only, no extrapolation). Stage 2 imputes remaining values from an age- and sex-specific growth reference using the LMS method by estimating each subject's centile, carrying it forward and backwards within the subject, defaulting to the 50th centile when a subject is never measured, and reading the expected value off the reference at the visit age. Different references can be supplied per data source so that the standard applied is recorded and auditable. We assessed recovery accuracy by masking and re-imputing a random 20% of observed values. All evaluations used computer-generated synthetic data. Results. On synthetic data (n = 60 subjects, 288 visits, 30% missing), the method resolved missingness to 100% completeness. Masked-value recovery gave a mean absolute error of 1.78 kg for weight (3.5% mean absolute percentage error) and 2.84 cm for height (2.0%), with negligible bias. Values recovered by within-subject interpolation were more accurate than those recovered from the growth reference, as expected, supporting the two-stage ordering. Conclusion. The method offers a simple, dependency-free, and auditable approach to anthropometric imputation, with explicit handling of differing reference standards and per-value provenance. Application to empirical data and propagation of imputation uncertainty into downstream models are the necessary next steps before use in substantive analyses.
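
Illustrative sketch (not the author's code): the two stages described above can be outlined as follows, assuming the standard LMS transform, z = ((y/M)^L - 1)/(L·S) with the logarithmic form at L = 0. The helper `lms_at_age`, which returns an (L, M, S) triple for a given age, is hypothetical and stands in for a real sex-specific WHO or CDC reference table. Carrying the z-score of the nearest observed visit is one reading of "carrying it forward and backwards".

```python
import math

def interpolate_interior(times, values):
    """Stage 1: fill None entries between two observed visits; no extrapolation."""
    out = list(values)
    observed = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(observed, observed[1:]):
        for i in range(left + 1, right):
            frac = (times[i] - times[left]) / (times[right] - times[left])
            out[i] = values[left] + frac * (values[right] - values[left])
    return out

def lms_z(y, L, M, S):
    """Measurement -> z-score under the LMS method."""
    return math.log(y / M) / S if L == 0 else ((y / M) ** L - 1.0) / (L * S)

def lms_value(z, L, M, S):
    """z-score -> expected measurement under the LMS method."""
    return M * math.exp(S * z) if L == 0 else M * (1.0 + L * S * z) ** (1.0 / L)

def impute_from_reference(ages, values, lms_at_age):
    """Stage 2: carry the nearest observed z-score; use z = 0 (50th centile) if never measured."""
    observed = [i for i, v in enumerate(values) if v is not None]
    out = list(values)
    for i, v in enumerate(values):
        if v is None:
            if observed:
                j = min(observed, key=lambda k: abs(k - i))
                z = lms_z(values[j], *lms_at_age(ages[j]))
            else:
                z = 0.0
            out[i] = lms_value(z, *lms_at_age(ages[i]))
    return out
```

Running stage 1 first means stage 2 only has to fill leading and trailing gaps, or subjects with no measurements at all, which matches the ordering the abstract argues for.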

2606.10563 2026-06-10 stat.ME New submission

Predicting Current Outcomes From Historical Survey Data With Weighted Conformal Prediction

Chihoon Lee, Sungkyu Jung, Hyokyung G. Hong

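AI summary: For large-scale surveys in which some outcomes are measured only in selected years, proposes a weighted conformal prediction framework that estimates likelihood ratios between historical and target covariate distributions to enable valid population-level prediction with coverage guarantees.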

Comments Submitted to Journal of the Royal Statistical Society Series B. 89 pages, 14 figures. Includes supplementary material

English abstract

In large-scale complex surveys such as the National Health and Nutrition Examination Survey (NHANES), some outcomes are measured only in selected years, leaving incomplete records across survey waves. We develop a weighted conformal prediction framework that enables valid population-level prediction of unobserved outcomes using information from earlier surveys. The method accommodates covariate shift, where both continuous and categorical covariate distributions evolve over time while survey design affects representativeness. It integrates subgroup-specific density ratio and subgroup-proportion estimation to approximate likelihood ratios between the historical and target covariate distributions, and we establish coverage guarantees for the resulting prediction sets. Simulation studies and an application predicting low-density lipoprotein cholesterol (LDL-C) for the current U.S. population show that the proposed approach achieves coverage close to the nominal level and improved efficiency over existing methods, particularly when covariate distributions are complex or unknown.
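
Illustrative sketch (not the authors' estimator): in generic weighted split conformal prediction under covariate shift, each calibration score is weighted by an estimated likelihood ratio between target and historical covariate distributions, and the threshold is a weighted quantile with the test point's own mass placed at infinity. The function below takes the weights as given; how they are estimated (the subgroup-specific density ratios and subgroup proportions in the abstract) is outside this sketch, and all names are assumptions.

```python
import math

def weighted_conformal_quantile(scores, weights, test_weight, alpha):
    """Weighted (1 - alpha) quantile of calibration scores.

    scores      : nonconformity scores on the historical (calibration) sample
    weights     : likelihood-ratio weights w(x_i) for the calibration points
    test_weight : likelihood-ratio weight w(x) at the target covariate value
    """
    total = sum(weights) + test_weight
    cumulative = 0.0
    for score, weight in sorted(zip(scores, weights)):
        cumulative += weight / total
        if cumulative >= 1.0 - alpha:
            return score
    # The remaining mass belongs to the test point, which sits at +infinity.
    return math.inf
```

With absolute-residual scores, the prediction set at a target covariate value is the fitted value plus or minus the returned quantile; with equal weights the rule reduces to ordinary split conformal prediction.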

2606.10409 2026-06-10 stat.ME New submission

Robust Bayesian Predictive Model Selection using Bregman Divergence

Jongwoo Choi, Neil A. Spencer, Dipak K. Dey

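AI summary: Because the log-score-based ELPD is sensitive to outliers and tail mismatch, proposes a generalized ELPD framework based on Bregman divergences, using the β-divergence family to control the influence of low-density observations for robust model selection.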

English abstract

Predictive Bayesian model comparison often relies on leave-one-out (LOO) cross-validation criteria such as the expected log predictive density (ELPD). However, model rankings can be overly sensitive to outliers and tail mismatch because ELPD is based on the log score. We propose a score-matched generalized ELPD framework that replaces the log score by a Bregman scoring rule to update model parameters through a generalized posterior and to evaluate LOO predictive utility. Candidate posterior predictive distributions are ranked by out-of-sample utility under the chosen scoring rule, yielding a direct proper-score generalization of standard ELPD. We focus especially on the $β$-divergence family, where $β$ controls the sensitivity of predictive comparison to low-density observations. Under model misspecification, the procedure asymptotically selects the model whose predictive distribution is closest to the data-generating process under the chosen Bregman divergence. A simulation study and applications to microbial and forensic data show that the generalized ELPD can change the selected model through reduced sensitivity to low-density observations.
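
Illustrative sketch (not from the paper): one common form of the score associated with the β-divergence (density power divergence) is S(p, y) = p(y)^β/β - ∫p(x)^{1+β}dx/(1+β), which behaves like the log score up to a constant as β approaches 0. The scaling convention and the Gaussian predictive used here are assumptions chosen so that the integral has a closed form; the paper may use a different normalization.

```python
import math

def normal_pdf(y, mu, sigma):
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def beta_score_normal(y, mu, sigma, beta):
    """Density-power (beta) score of a N(mu, sigma^2) predictive at y; larger is better."""
    # Closed form of the integral of pdf^(1 + beta) for a Gaussian density.
    integral = (2 * math.pi * sigma ** 2) ** (-beta / 2) / math.sqrt(1 + beta)
    return normal_pdf(y, mu, sigma) ** beta / beta - integral / (1 + beta)

def log_score_normal(y, mu, sigma):
    """Log score, the building block of the standard ELPD."""
    return math.log(normal_pdf(y, mu, sigma))
```

For an observation far in the tail, the log score falls without bound while the β-score levels off at minus the integral term, so a single low-density point cannot dominate a sum of β-scores the way it can dominate a sum of log scores.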

2606.10342 2026-06-10 stat.AP New submission

Binomial Smoothing for Inventory and Information Control in Supply Chains

Rene Caldentey, Avi Giloni, Clifford Hurvich, Prem Talwai, Yichen Zhang

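AI summary: For the trade-off between retailer order smoothing and upstream forecasting in decentralized supply chains, proposes a Binomial Smoothing policy that minimizes the manufacturer's forecast error while remaining invertible and achieves a constant-factor approximation to an optimal policy.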

Comments 59 pages, 7 figures, 4 tables

English abstract

In many decentralized supply chains, upstream firms do not observe market demand directly and instead infer downstream conditions from the order stream. A retailer's replenishment policy therefore plays a dual role: it governs inventory replenishment and shapes the information available for upstream forecasting. This creates a fundamental trade-off. Smoother orders improve upstream predictability, but delaying the response to demand can increase downstream inventory costs. We study how a retailer should optimally smooth demand in a two-tier supply chain with one retailer and one manufacturer when the manufacturer forecasts future orders from the retailer's order history. We propose Binomial Smoothing, a class of replenishment policies that implements delayed demand response by spreading each unit of demand over a finite horizon using binomial weights. The class is interpretable, easy to calibrate, and analytically tractable. Under weakly stationary Gaussian demand satisfying mild regularity conditions, we show that, for any fixed smoothing horizon, the Binomial policy minimizes the manufacturer's forecast error among all policies with the same degree of smoothing. It remains invertible, so the manufacturer can recover demand history from observed orders. More generally, Binomial Smoothing achieves a constant-factor approximation guarantee relative to an optimal policy. Our results yield a broader insight: replenishment policies should be designed not merely to reduce order variance, as in the traditional bullwhip measure, but to reduce the unpredictable component of orders. Carefully designed smoothing can improve supply-chain performance and partially substitute for information sharing, providing a concrete mechanism for coordination without collaboration.
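
Illustrative sketch (not from the paper): "spreading each unit of demand over a finite horizon using binomial weights" can be pictured as a moving-average filter whose coefficients are binomial probabilities. The parameterization below, with weights C(n, k) p^k (1-p)^(n-k) for lags k = 0..n and a free parameter `p`, is an assumption; the abstract does not give the paper's exact policy class, and no claim is made that this particular filter has the invertibility or optimality properties the authors prove.

```python
from math import comb

def binomial_weights(horizon, p=0.5):
    """Weights over lags 0..horizon; they are nonnegative and sum to 1."""
    return [comb(horizon, k) * p ** k * (1 - p) ** (horizon - k)
            for k in range(horizon + 1)]

def smoothed_orders(demand, horizon, p=0.5):
    """Order at time t = sum_k w_k * demand[t - k]; demand before t = 0 is treated as 0."""
    w = binomial_weights(horizon, p)
    return [sum(w[k] * demand[t - k] for k in range(horizon + 1) if t - k >= 0)
            for t in range(len(demand))]
```

For example, a one-off demand of 4 units with horizon 2 and p = 0.5 is ordered as 1, 2, 1 units over three periods: total orders still match total demand, but the order stream is smoother than the demand stream.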

2606.10224 2026-06-10 stat.ME stat.AP new submission

Spatial Prediction of Local Soil Erosion Distribution in the Wasserstein Space

Wasserstein空间中局部土壤侵蚀分布的空间预测

Jiaming Qiu, Xiongtao Dai, Zhengyuan Zhu, Shuiqing Yin

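AI summary Proposes a new spatial prediction method that treats local erosion distributions as objects in the Wasserstein space, represents them via basis expansion as a multivariate random field, and combines local regression with Kriging. The method outperforms existing approaches in simulations and is illustrated with real data from Shaanxi province.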

Comments To appear in the Annals of Applied Statistics

Details
AI Chinese abstract

获取精确的侵蚀测量需要昂贵的实地工作,使得直接调查大范围区域(如省或流域)不可行。为了将实地结果扩展到如此广阔的区域,我们提出了一种新颖的空间预测方法,将局部侵蚀分布视为Wasserstein空间中的对象。这些分布被映射为平方可积轨迹,并通过基展开表示,形成捕捉空间依赖性的多元随机场。通过在这种表示中应用局部回归和克里金法,我们的方法灵活地建模和预测任意位置的侵蚀分布。该框架改进了对分布泛函(如均值和超越概率)的预测。模拟研究表明,所提出的方法优于错误指定的参数替代方法和现有的Fréchet回归方法。我们通过中国陕西省的详细侵蚀分析说明了该方法,其中将来自调查流域的局部测量结果扩展到使用土地利用和海拔等协变量预测整个省的侵蚀分布。

English abstract

Obtaining precise erosion measurements requires costly fieldwork, making it infeasible to directly survey large domains such as a province or river basin. To extend fieldwork results across such extensive domains, we propose a novel spatial prediction method that treats local erosion distributions as objects in the Wasserstein space. These distributions are mapped into square-integrable trajectories and represented via basis expansion, forming a multivariate random field that captures spatial dependence. By applying local regression and Kriging in this representation, our approach flexibly models and predicts erosion distributions at arbitrary locations. This framework improves prediction for functionals of the distribution, such as the mean and exceedance probabilities. Simulation studies demonstrate that the proposed method outperforms a misspecified parametric alternative and existing Fréchet regression approaches. We illustrate the approach with a detailed erosion analysis in Shaanxi province, China, where local measurements from surveyed watersheds are extended to predict erosion distributions across the entire province using covariates such as land use and elevation.

2606.10123 2026-06-10 stat.ME new submission

Methods for adjusting for covariate measurement error in flexible modelling of functional form: results of a blinded, controlled neutral comparison simulation study

在函数形式的灵活建模中调整协变量测量误差的方法:一项盲法、受控中性比较模拟研究的结果

Mohammed Sedki, Aris Perperoglou, Anne C. M. Thiébaut, Steve Ferreira Guerra, Paul Gustafson, Frank E. Harrell, Willi Sauerbrei, Michal Abrahamowicz, Laurence S. Freedman

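AI summary A blinded, multi-stage neutral comparison simulation study evaluates six families of measurement error correction methods combined with four flexible regression models for estimating non-linear associations. Pointwise SIMEX was the most accurate and robust, followed by Bayesian methods and regression calibration; multiple imputation performed less well, and B-splines were consistently inferior.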

Details
AI Chinese abstract

协变量测量误差在流行病学研究中普遍存在,并扭曲估计的暴露-结果关联,然而校正方法几乎仅在线性建模假设下研究。当潜在关联是非线性且本身通过灵活回归估计时,这些方法的行为仍不清楚。我们报告了一项在STRATOS倡议内进行的盲法、多阶段中性比较模拟研究,评估了测量误差校正与函数形式灵活建模的结合。六类校正方法(点态和基于系数的模拟外推[SIMEX]、对数尺度和风险尺度的贝叶斯推断、多重插补[MI]和回归校准[RC])分别与B样条(BS)、惩罚样条(PS)、分数多项式(FP)和自然样条(NS)结合,产生了23种分析方法。这些方法应用于在五种函数形式(J形、线性、两种阈值模型和饱和模型)下生成的病例对照数据,跨越不同样本量、重复子研究规模、误差幅度和误差分布的数据集,采用经典加性误差和用于误差校准的重复子研究。性能通过暴露分布中心95%范围内估计函数的对数均方误差进行评估。点态SIMEX总体最准确且最稳健,其次是贝叶斯方法和与PS、FP或NS配对的RC;MI表现较差,而使用无惩罚BS的贝叶斯估计表现最差。PS、FP和NS几乎等效,而BS始终较差。没有单一方法在所有场景中占主导地位,强调了敏感性分析的价值。

English abstract

Covariate measurement error is pervasive in epidemiological research and distorts estimated exposure-outcome associations, yet correction methods have been studied almost exclusively under linear modelling assumptions. Their behaviour when the underlying association is non-linear and is itself estimated with flexible regression remains poorly characterised. We report a blinded, multi-stage neutral comparison simulation study, conducted within the STRATOS initiative, evaluating measurement error correction coupled with flexible modelling of functional form. Six families of correction methods (pointwise and coefficient-based Simulation Extrapolation [SIMEX], Bayesian inference on the logit and risk scales, Multiple Imputation [MI], and Regression Calibration [RC]) were each combined with B-splines (BS), penalised splines (PS), fractional polynomials (FP), and natural splines (NS), yielding 23 analytic methods. Methods were applied to case-control data generated under five functional forms (J-shape, linear, two threshold models, and saturation) across simulated datasets spanning varying sample sizes, replication substudy sizes, error magnitudes, and error distributions, with classical additive error and a replication substudy for error calibration. Performance was assessed by the log mean squared error of the estimated function over the central 95% of the exposure distribution. Pointwise SIMEX was the most accurate and most robust approach overall, followed by Bayesian methods and RC when paired with PS, FP, or NS; MI performed less well, and Bayesian estimation with unpenalised BS performed worst. PS, FP, and NS were near-equivalent, whereas BS was consistently inferior. No single method dominated across all scenarios, underscoring the value of sensitivity analyses.
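
To make the SIMEX idea concrete, here is a deliberately simplified sketch for a single linear slope under classical additive error. It is not the pointwise SIMEX with flexible regression evaluated in the study: the linear model, the known error standard deviation `sigma_u`, the grid of `lambdas`, and the quadratic extrapolant are all assumptions chosen for illustration.

```python
import numpy as np

def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    return np.polyfit(x, y, 1)[0]

def simex_slope(w, y, sigma_u, lambdas=(0.5, 1.0, 1.5, 2.0), n_sim=50, seed=0):
    """SIMEX for a linear slope with classical additive error of known SD sigma_u.

    Simulation step: add extra noise with variance lambda * sigma_u^2 and refit.
    Extrapolation step: fit a quadratic in lambda and evaluate it at lambda = -1,
    the hypothetical level at which the measurement error would vanish.
    """
    rng = np.random.default_rng(seed)
    lam_grid = [0.0]
    slopes = [ols_slope(w, y)]  # naive estimate at lambda = 0
    for lam in lambdas:
        sims = [
            ols_slope(w + np.sqrt(lam) * sigma_u * rng.standard_normal(w.size), y)
            for _ in range(n_sim)
        ]
        lam_grid.append(lam)
        slopes.append(np.mean(sims))
    coefs = np.polyfit(lam_grid, slopes, 2)
    return float(np.polyval(coefs, -1.0))

# Toy data: true slope 1, exposure x observed as w = x + u.
rng = np.random.default_rng(1)
n = 20000
x = rng.standard_normal(n)
sigma_u = np.sqrt(0.5)
w = x + sigma_u * rng.standard_normal(n)
y = 1.0 * x + 0.5 * rng.standard_normal(n)

naive = ols_slope(w, y)
corrected = simex_slope(w, y, sigma_u)
```

In this toy setting the naive slope is attenuated toward zero, and the quadratic extrapolation recovers part, though not all, of the attenuation.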

2606.10096 2026-06-10 stat.ME new submission

Estimating the Wasserstein barycenter of one-dimensional distributions under sparse sampling

稀疏采样下一维分布的Wasserstein重心估计

James Peng, Florian Stijven, Linbo Wang, Peter B. Gilbert

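AI summary For data in which each unit's one-dimensional distribution is observed only through a small i.i.d. sample, proposes the marginal-constructed barycenter (MCB) estimator, which estimates the distribution of latent unit-level quantiles using binomial mixture methods. The estimator overcomes the bias of the empirical Wasserstein barycenter under sparse sampling and is shown to be consistent and asymptotically normal.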

Details
AI Chinese abstract

我们研究稀疏采样下的分布数据,其中每个单元由实直线上的概率分布表示,仅通过少量独立同分布样本观测。一维分布数据的一个自然中心趋势概念是Wasserstein重心,其分位数函数是单元级分位数函数的逐点平均。我们关注Wasserstein重心分位数函数的逐点估计:在给定分位数水平下,目标是相应单元级分位数的总体均值。一个朴素的插件估计量是经验Wasserstein重心,它将观测到的单元级经验分布视为真实的潜在单元级分布。然而,在稀疏采样下,该估计量可能存在严重偏差。我们提出了一种避免直接估计单元级分布或分布总体分布的方法。我们从更宏大的目标开始:刻画给定分位数水平下潜在单元级分位数的分布。我们证明该分布可以用单元级CDF值的边际分布表示,而后者可以通过二项混合方法估计。这激发了我们的估计量——边际构造重心(MCB)估计量,通过取估计的潜在单元级分位数分布的均值得到。我们建立了MCB估计量逐点一致且渐近正态的条件,并通过模拟表明,在稀疏采样下它能够显著优于经验Wasserstein重心。我们在HVTN 502/503疫苗效力试验的HIV-1序列数据分析中说明了该方法,当每个参与者只有少量序列可用时,使用重心来总结和比较参与者内部病毒序列特征的分布。

English abstract

We study distributional data under sparse sampling, where each unit is represented by a probability distribution on the real line observed only through a small i.i.d. sample. A natural notion of central tendency for one-dimensional distributional data is the Wasserstein barycenter, whose quantile function is the pointwise average of the unit-level quantile functions. We focus on pointwise estimation of the Wasserstein barycenter quantile function: at a given quantile level, the target is the population mean of the corresponding unit-level quantiles. A naive plug-in estimator is the empirical Wasserstein barycenter, which treats observed unit-level empirical distributions as the true latent unit-level distributions. Under sparse sampling, however, this estimator can be severely biased. We propose an approach that avoids directly estimating either the unit-level distributions or the full population law of distributions. We start with the more ambitious goal of characterizing the distribution of latent unit-level quantiles at a given quantile level. We show that this distribution can be written in terms of the marginal distributions of the unit-level CDF values, which can be estimated using binomial mixture methods. This motivates our estimator, the marginal-constructed barycenter (MCB) estimator, obtained by taking the mean of the estimated distribution of latent unit-level quantiles. We establish conditions under which the MCB estimator is pointwise consistent and asymptotically normal, and show through simulations that it can substantially outperform the empirical Wasserstein barycenter under sparse sampling. We illustrate the method in an analysis of HIV-1 sequence data from the HVTN 502/503 vaccine efficacy trials, using the barycenter to summarize and compare within-participant distributions of viral sequence features when only a small number of sequences are available per participant.
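
The naive plug-in estimator mentioned above can be sketched in a few lines. This shows only the empirical Wasserstein barycenter that the abstract describes as biased under sparse sampling, not the proposed MCB estimator; the function name and the toy data are hypothetical.

```python
import numpy as np

def empirical_barycenter_quantile(samples_per_unit, level):
    """Plug-in estimate of the barycenter quantile at `level`.

    Each unit's empirical quantile is computed from its own sample, and the
    unit-level quantiles are then averaged. With very few observations per
    unit, the unit-level quantiles are noisy, which is the source of the bias
    discussed in the abstract.
    """
    unit_quantiles = [
        np.quantile(np.asarray(sample, dtype=float), level)
        for sample in samples_per_unit
    ]
    return float(np.mean(unit_quantiles))

# Two units with three observations each (toy data).
units = [[0.0, 1.0, 2.0], [10.0, 11.0, 12.0]]
median_barycenter = empirical_barycenter_quantile(units, 0.5)
```

Here the unit medians are 1 and 11, so the barycenter median is their average, 6.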

2606.10093 2026-06-10 stat.AP stat.ME new submission

Predicting Hospitalization from a Whole-Person Health Score with Incomplete Electronic Health Records Data: A Case Study

从不完整的电子健康记录数据中的全人健康评分预测住院:一项案例研究

Grayson E. Weavil, Joseph Rigdon, Sarah C. Lotspeich

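AI summary Uses statistical modeling and machine learning to compute the allostatic load index (ALI) from incomplete electronic health records and evaluates its ability to predict hospitalization. The pattern submodel approach performed best in sample (AUC = 0.73) but cross-validated less well (AUC = 0.63).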

Comments 13 pages, 5 figures, 2 tables, R code and simulated dataset available on GitHub

Details
AI Chinese abstract

将标准化的全人健康测量嵌入电子健康记录(EHR)可能对预防性护理至关重要。非稳态负荷指数(ALI)由三个身体系统的十个压力源成分计算得出,提供了整体健康的有前景的快照。ALI可以从EHR数据计算,但许多成分缺失,因为并非所有患者都接受所有测试。使用统计建模和机器学习,来自大型学术健康系统的$1000$名患者的EHR数据被用于从ALI预测住院(作为计数或二元变量),并控制年龄和性别。评估了各种方法来填补患者缺失的ALI成分的信息空白,包括结合成分或单独使用它们的汇总度量。性能通过受试者工作特征(ROC)曲线和相应的ROC曲线下面积(AUC)来衡量。住院的计数建模并未优于二元建模,逻辑回归优于随机森林。总体而言,汇总度量表现相似,其中完整病例比例(即“不健康”的非缺失成分比例)表现最佳(AUC $= 0.64$),但差异$\leq 0.01$。当单独使用成分时,模式子模型方法在样本内最准确地预测了住院(AUC $= 0.73$),但交叉验证效果不佳(AUC $= 0.63$)。所有汇总度量表现相似。然而,当单独包含ALI成分时,为具有相同缺失数据模式的患者子集定制模型表现最佳。下一步包括在EHR中落地实施,以实现预测并大规模支持临床医生决策。

English abstract

Embedding a standardized whole-person health measure in electronic health records (EHR) could be instrumental to preventative care. The allostatic load index (ALI), calculated from ten component stressors across three body systems, offers a promising snapshot of holistic health. The ALI can be calculated from EHR data, but many components are missing, since not all patients undergo all tests. Using statistical modeling and machine learning, EHR data for $1000$ patients from a large academic health system were used to predict in-patient hospitalization (as a count or binary) from ALI, controlling for age and sex. Various methods were evaluated to fill in information gaps for patients' missing ALI components, including summary measures combining components or using them separately. Performance was measured using receiver operating characteristic (ROC) curves and corresponding areas under the ROC curve (AUC). Count modeling of hospitalization did not improve upon binary, and logistic regression beat random forest. Overall, summary measures performed similarly, with the complete-case proportion (i.e., the proportion of non-missing components that were "unhealthy") performing best (AUC $= 0.64$) but by $\leq 0.01$. When using components separately, the pattern submodel approach most accurately predicted hospitalization (AUC $= 0.73$) in sample, but did not cross-validate as well (AUC $= 0.63$). All summary measures performed similarly. However, when including the ALI components separately, tailoring models to subsets of patients with the same missing data pattern performed best. Next steps include EHR implementation to enable prediction and support clinician decision-making at scale.
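
The complete-case proportion defined in the abstract (the proportion of non-missing components that were "unhealthy") is simple enough to sketch directly. The encoding below (`True` for unhealthy, `False` for healthy, `None` for missing) and the function name are assumptions for illustration; the authors' R code on GitHub is the reference for what was actually done.

```python
from typing import Optional, Sequence

def complete_case_proportion(components: Sequence[Optional[bool]]) -> Optional[float]:
    """Proportion of observed (non-missing) components flagged as unhealthy.

    Returns None when every component is missing, since the proportion is
    undefined in that case.
    """
    observed = [c for c in components if c is not None]
    if not observed:
        return None
    return sum(observed) / len(observed)

# Ten ALI components for one hypothetical patient; six are missing.
patient = [True, None, False, True, None, None, False, None, None, None]
score = complete_case_proportion(patient)
```

For this patient, two of the four observed components are unhealthy, so the score is 0.5 regardless of how many components are missing.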

2606.10805 2026-06-10 q-fin.PM new submission

Asymmetric Nonlinear Return Extrapolation and Optimal Portfolio Choice under Stochastic Volatility

随机波动率下的非对称非线性回报外推与最优投资组合选择

Dong Yan, Wenrui Ye, Zhiyue Zong, Wenting Chen

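AI summary Extends return extrapolation to asymmetric, nonlinear belief updating and solves the optimal portfolio problem of a CRRA investor under Heston stochastic volatility, finding that saturation acts as an endogenous correction mechanism that reduces welfare loss.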

Details
AI Chinese abstract

我们将Atmaz (2022)的回报外推框架扩展,纳入线性基准中缺失的两个行为现实特征:信念更新的饱和性以及收益与损失之间的非对称性。我们引入一个平滑、非线性、非对称的外推函数,并将Heston (1993)随机波动率下CRRA投资者的最优投资组合刻画为情绪扭曲的投机需求、方差对冲需求和情绪对冲需求之和。由此产生的半线性Hamilton-Jacobi-Bellman方程通过两种独立数值方法求解:带时间步策略迭代的有限差分ADI格式和深度学习驱动的迭代格式。该模型产生了四个投资者层面的行为异常:对收益和损失的非对称反应、极端情况下的反应减弱、过度交易量以及随外推强度增加的福利损失,每个异常都与已记录的经验模式相对应。其核心发现是饱和效应作为一种内生修正机制:在原点处相同局部斜率下,非对称非线性外推者比线性外推者承受更小的福利损失。

English abstract

We extend the return extrapolation framework of Atmaz (2022) to incorporate two behaviorally realistic features absent from the linear benchmark: saturation in belief updating and asymmetry between gains and losses. We introduce a smooth, nonlinear, asymmetric extrapolation function and characterize the optimal portfolio of a CRRA investor under Heston (1993) stochastic volatility as the sum of a sentiment-distorted myopic demand, a variance hedging demand, and a sentiment hedging demand. The resulting semilinear Hamilton-Jacobi-Bellman equation is solved by two independent numerical methods, a finite-difference ADI scheme with time-step policy iteration and a deep learning-driven iterative scheme. The model generates four investor-level behavioral anomalies: asymmetric responses to gains and losses, attenuated reactions at extremes, excess trading volume, and welfare loss rising with the strength of extrapolation, each of which maps onto documented empirical patterns. Its central finding is that saturation acts as an endogenous correction mechanism: at the same local slope at the origin, the asymmetric nonlinear extrapolator carries a smaller welfare loss than a linear one.

2606.10664 2026-06-10 econ.GN q-fin.EC new submission

Commitment and the dynamics of household labor supply: new tests and evidence from Europe

承诺与家庭劳动力供给的动态:来自欧洲的新检验与证据

Pierre-Andre Chiappori, Alexandros Theloudis, Jorge Velilla, Jose Ignacio Gimenez-Nadal, Jose Alberto Molina

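AI summary Using a lifecycle collective model, proposes new tests that distinguish full, limited, and no commitment based on the dynamic impact of wage shocks on household labor supply, and implements them in 15 European countries, finding evidence broadly consistent with limited commitment.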

Details
AI Chinese abstract

配偶对未来行为的承诺能力对资源在夫妻间及跨时期的分配具有重要意义。利用家庭行为的生命周期集体模型,我们基于工资冲击对家庭劳动力供给的动态影响,提出了区分完全承诺、有限承诺和无承诺的新检验。我们方法的一个新颖之处在于,除了其他两种类型外,它还能正式拒绝有限承诺,利用理论上的符号限制。我们使用2005-2019年欧盟收入与生活条件统计(EU-SILC)的数据,在15个欧洲国家实施了这些检验。我们发现,帕累托权重对有利的过去工资的弹性通常为正,这与有限承诺下的讨价还价一致。因此,过去的工资冲击会对劳动力供给产生讨价还价效应,增强受薪配偶的权力,削弱伴侣的权力。形式上,我们在除4个国家外的所有国家拒绝了完全承诺和无承诺,但未能拒绝有限承诺。

English abstract

The ability of spouses to commit to future behavior has important implications for the allocation of resources between them and over time. Using a lifecycle collective model for household behavior, we propose new tests that distinguish between full, limited, and no commitment, based on the dynamic impact of wage shocks on household labor supply. A novelty of our approach is its ability to formally reject limited commitment, in addition to the other two types, exploiting sign restrictions from theory. We implement our tests across 15 European countries, drawing data from the EU-SILC over the years 2005-2019. We find that the elasticity of the Pareto weight with respect to favorable past wages is generally positive, consistent with bargaining under limited commitment. Past wage shocks thus induce bargaining effects on labor supply, empowering the recipient spouse and weakening the partner. Formally, we reject full and no commitment in all but 4 countries, but fail to reject limited commitment.

2606.10245 2026-06-10 q-fin.CP new submission

A Fast Implied Volatility Method with Expansions

一种带展开的快速隐含波动率方法

Alper Hekimoglu, Ismail Hakki Gokgoz

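AI summary Proposes an implied volatility solver that combines analytical seeds derived from the asymptotic structure of the Black-Scholes price with Householder iteration, reaching machine precision in fewer than two iterations on average with a 1.73-1.85× throughput gain over the state-of-the-art benchmark.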

Details
AI Chinese abstract

我们提出一种区域分裂的Black-Scholes隐含波动率求解器,其中每个初始种子都是完全闭式解析表达式,源自Black-Scholes价格在其自然域中的渐近结构。在平价处,精确高斯恒等式的级数反演产生一个四阶种子,误差为$\mathcal{O}(s^8)$。在适度价外区域,逐次增加阶数的高斯CDF近似产生显式初始种子公式,其精度通过数值证明,在种子阶段无需迭代或数值反演。在深度价外区域,高斯尾部抵消恒等式——Mills比率——揭示了Black-Scholes价格的渐近结构,并激发了一个比率校正种子,对于大货币性实现了接近机器精度的初始化。所有区域边界均从CDF截断容差和数值求解器理论误差界限解析推导,无需经验调谐常数。然后,一个通用的四阶Householder抛光器将所有区域驱动至机器精度,在标准和细粒度基准网格上的平均更新迭代次数严格低于两次——达到并超越了文献中最高精度参考实现(Jäckel, 2015)所设定的两次迭代目标。在相同硬件和编译器条件下,所得C实现比最先进基准(Jäckel, 2015)实现了1.73-1.85倍的吞吐量增益,最大绝对误差为$\mathcal{O}(10^{-14})$,在不同网格配置下稳定。Python/Numba实现验证了可移植性。所有源代码公开可用。

English abstract

We present a regime-split Black-Scholes implied volatility solver in which every initial seed is a fully closed-form analytical expression, derived from the asymptotic structure of the Black-Scholes price in its natural domain. At the money, series reversion of an exact Gaussian identity yields a fourth-order seed with error $\mathcal{O}(s^8)$. In the moderate out-of-the-money region, successive Gaussian CDF approximations of increasing order produce explicit initial seed formulas whose accuracy is proved numerically, with no iteration or numerical inversion at the seed stage. In the deep out-of-the-money region, a Gaussian tail cancellation identity (the Mills ratio) reveals the asymptotic structure of the Black-Scholes price and motivates a ratio-corrected seed that achieves near-machine-precision initialisation for large moneyness. All regime boundaries are derived analytically from CDF truncation tolerances and numerical solver theoretical error bounds, with no empirically tuned constants. A universal fourth-order Householder polisher then drives all regimes to machine precision, with mean update iterations strictly below two on both standard and granular benchmark grids, meeting and surpassing the two-iteration target established by the highest-accuracy reference implementation in the literature (Jäckel, 2015). The resulting C implementation achieves a $1.73$ to $1.85\times$ throughput gain over the state-of-the-art benchmark (Jäckel, 2015) under identical hardware and compiler conditions, with maximum absolute error $\mathcal{O}(10^{-14})$, stable across grid configurations. A Python/Numba implementation confirms portability. All source code is publicly available.
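
The seed-then-polish structure described above can be illustrated with a much simpler stand-in. The sketch below is not the paper's solver: it swaps in the classical Brenner-Subrahmanyam at-the-money approximation as the seed and plain Newton iteration (the first-order member of the Householder family) as the polisher, and all function names are hypothetical.

```python
from math import erf, exp, log, pi, sqrt

def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(S: float, K: float, T: float, r: float, sigma: float) -> float:
    """Black-Scholes price of a European call."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

def bs_vega(S: float, K: float, T: float, r: float, sigma: float) -> float:
    """Derivative of the call price with respect to sigma."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    return S * sqrt(T) * exp(-0.5 * d1**2) / sqrt(2.0 * pi)

def implied_vol(price: float, S: float, K: float, T: float, r: float,
                tol: float = 1e-12, max_iter: int = 50) -> float:
    """Closed-form seed followed by Newton polishing.

    Seed: Brenner-Subrahmanyam approximation sigma ~ sqrt(2*pi/T) * price / S,
    which is accurate only near the money (an assumption of this sketch).
    """
    sigma = sqrt(2.0 * pi / T) * price / S
    for _ in range(max_iter):
        diff = bs_call(S, K, T, r, sigma) - price
        if abs(diff) < tol:
            break
        sigma -= diff / bs_vega(S, K, T, r, sigma)
    return sigma

atm_price = bs_call(100.0, 100.0, 1.0, 0.0, 0.2)
recovered = implied_vol(atm_price, 100.0, 100.0, 1.0, 0.0)
```

A higher-order polisher would use further derivatives of the price with respect to volatility to cut the iteration count; Newton is used here only to keep the example short.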

2606.10191 2026-06-10 q-fin.MF new submission

On regularity of finite-maturity American put options in the Heston model

Heston模型中有限期美式看跌期权的正则性

Khai Nguyen, Huy Chau

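AI summary Using PDE techniques, establishes $C^{1,2}$ regularity of the value function of finite-maturity American put options in the Heston model in the exercise domain, together with the smooth-fit principle.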

Details
AI Chinese abstract

本文研究了Heston模型中有限期美式价值函数的正则性。尽管Heston算子在波动率为零时退化,但我们能够利用PDE技术建立美式价值函数在行权区域的$C^{1,2}$正则性以及光滑拟合原则。

English abstract

This paper studies the regularity of finite-maturity American value functions in the Heston model. Although the Heston operator is degenerate when the volatility is zero, we are able to establish $C^{1,2}$ regularity of the American value functions in the exercise domain and the smooth-fit principle, using PDE techniques.

2606.10070 2026-06-10 econ.GN q-fin.EC new submission

Introduction to gravity model for beginners

初学者引力模型导论

Luigi Capoani

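AI summary A didactic introduction to the gravity model and its conceptual translation from classical physics to international economics, emphasizing that economic mass (GDP) attracts trade flows while geographic distance generates spatial resistance, and tracing the evolution of the literature.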

Details
AI Chinese abstract

本文对引力模型及其从经典物理学到国际经济学的概念转换进行了教学性和初学者友好的综述。在简要介绍之后,首先建立了牛顿万有引力定律与经济引力方程之间的结构和数学平行关系,展示了经济质量(GDP)如何吸引贸易流,而地理距离则作为空间阻力的来源。然后,本文考察了将刚性自然法则应用于集体人类行为所需的确定性哲学框架。最后,追溯了文献的时间演变,强调了早期人口学家和Walter Isard发展的基于物理学的方法与后来出现的基于效用的计量经济学适应之间的历史分歧。本文最终表明,尽管全球化,空间摩擦仍然是塑造国际贸易和地缘政治互动的重要且可测量的力量。

English abstract

This paper provides a didactic and beginner-friendly review of the gravity model and its conceptual translation from classical physics into international economics. After a brief introduction, it begins by establishing the structural and mathematical parallels between Newton's law of universal gravitation and the economic gravity equation, demonstrating how economic mass (GDP) attracts trade flows while geographic distance acts as a source of spatial resistance. The paper then examines the deterministic philosophical framework required to apply rigid natural laws to collective human behavior. Finally, it traces the chronological evolution of the literature, highlighting the historical divergence between the physics-rooted approach developed by early demographers and Walter Isard and the utility-based econometric adaptations that later emerged. The paper ultimately shows that, despite globalization, spatial friction remains a significant and measurable force shaping international trade and geopolitical interactions.
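
The parallel the abstract refers to can be written out in standard textbook notation. The symbols below are assumptions of this note rather than the paper's own: $T_{ij}$ is the trade flow between countries $i$ and $j$, $Y_i$ and $Y_j$ are their GDPs, $D_{ij}$ is the distance between them, and $A$, $\alpha$, $\beta$, $\gamma$ are parameters to be estimated.

```latex
% Newton's law of universal gravitation and the economic gravity equation
F_{ij} = G\,\frac{m_i\, m_j}{d_{ij}^{2}}
\qquad\longleftrightarrow\qquad
T_{ij} = A\,\frac{Y_i^{\alpha}\, Y_j^{\beta}}{D_{ij}^{\gamma}}

% Taking logarithms gives a form that is linear in the parameters
\ln T_{ij} = \ln A + \alpha \ln Y_i + \beta \ln Y_j - \gamma \ln D_{ij}
```

The log-linear form is the one usually taken to data, with $\gamma > 0$ capturing the spatial resistance that distance imposes on trade.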

2606.09918 2026-06-10 econ.GN q-fin.EC new submission

An economic geography dataset of U.S. skill specialization, relatedness, and complexity

美国技能专业化、关联性和复杂性的经济地理数据集

Anthony Howell, Maryann Feldman, Lauren Lanahan, Nikhil Kalathil, Evan Johnson

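AI summary From 433.6 million job postings between 2010 and 2024, constructs economic geography variables (skill specialization, relatedness, diversity, and complexity) covering 3,194 counties, further decomposed by employer entity type.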

Details
AI Chinese abstract

我们发布了一个新的美国技能专业化、关联性和复杂性的数据集,该数据集源自2010年至2024年间的4.336亿条职位发布。该面板数据覆盖了15年间的3194个县,并报告了201个变量,这些变量描述了职位发布的数量(例如,劳动力需求)、工作的形式与性质(例如,远程工作比例、实习比例)以及按类别划分的雇主技能需求结构(例如,专业化、软件和通用技能)。我们开发了一套经济地理变量:基于技能的县专业化、关联性、多样性、复杂性和动态性指标。这些指标进一步按雇主实体类型(企业、大学、政府和联邦实验室)分解,并包含实体对的匹配度、重叠度和定向技能差距指标。一个配套的交互式仪表板支持学术研究和实际应用,其功能包括时空可视化、县排名与趋势、成对县比较以及单个县概况。

English abstract

We release a new dataset of U.S. skill specialization, relatedness, and complexity, derived from 433.6 million job postings between 2010 and 2024. The panel covers 3,194 counties across 15 years and reports 201 variables that describe the volume of job postings (e.g., labor demand), the modality and nature of work (e.g., remote share, internship share), and the structure of employer skill demand by category (e.g., specialized, software, and common). We develop a suite of economic geography variables: skill-based measures of county specialization, relatedness, diversity, complexity, and dynamics. These measures are further decomposed by employer entity type (corporate, university, government, and federal lab), along with entity-pair measures of alignment, overlap, and directional skill gaps. An accompanying interactive dashboard supports both academic research and applied use, with features including spatiotemporal visualization, county rankings and trends, pairwise county comparisons, and individual county profiles.

2606.09906 2026-06-10 stat.ME q-bio.PE new submission

An information-geometric framework for mapping maximum potential biodiversity

一种用于映射最大潜在生物多样性的信息几何框架

Shinto Eguchi

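AI summary Proposes an information-geometric framework that defines a potential composition and a diversity gap through a constrained variational principle, treating Hill-type diversity and Rao's quadratic entropy in a unified way and providing a benchmark for comparison in conservation planning.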

Comments 22 pages, 1 figure

Details
AI Chinese abstract

生物多样性度量通常被描述性地使用:从观测或估计的群落组成计算多样性指数,并将结果值映射到空间上。然而,保护规划还需要一个特定地点的基准,以便将观测到的群落与之进行比较。本章为这种“潜在多样性”和相关的“多样性差距”开发了一个信息几何框架。核心对象是物种单纯形上的一对概率向量:观测或实现的组成\(p^{\mathrm{obs}}\),以及通过约束变分原理获得的潜在组成\(p^{\mathrm{pot}}\)。然后通过比较这两个组成处的多样性泛函来定义差距。该框架针对Hill型多样性(衡量丰度和均匀度)和Rao二次熵(包含物种间的性状、系统发育或生态差异)进行了开发。空间点过程解释阐明了如何在进入单纯形之前定义局部生态容量。然后,护航约束、容量约束和散度投影提供了一种统一的方法来定义超出均匀分布的非平凡基准。得到的公式区分了两个不同的问题:一个群落有多多样化,以及它离局部允许的潜在基准有多远。它还将暗多样性的生态概念与概率单纯形上的连续、丰度加权比较联系起来。我们还概述了一个动态扩展,其中容量、物种迁移和气候驱动的变化随时间变化。使用大规模公民科学生物多样性数据和性状数据库的实证实施留待未来工作。

English abstract

Biodiversity measures are often used descriptively: one computes a diversity index from an observed or estimated community composition and maps the resulting values across space. Conservation planning, however, also requires a site-specific benchmark against which the observed community can be compared. This chapter develops an information-geometric framework for such "potential diversity" and the associated "diversity gap". The central object is a pair of probability vectors on the species simplex: an observed or realized composition \(p^{\mathrm{obs}}\), and a potential composition \(p^{\mathrm{pot}}\) obtained by a constrained variational principle. The gap is then defined by comparing a diversity functional at these two compositions. The framework is developed for both Hill-type diversity, which measures abundance and evenness, and Rao's quadratic entropy, which incorporates trait, phylogenetic, or ecological dissimilarities among species. A spatial point-process interpretation clarifies how local ecological capacities can be defined before passing to the simplex. Escort constraints, capacity constraints, and divergence projections then provide a unified way to define nontrivial benchmarks beyond the uniform distribution. The resulting formulation separates two distinct questions: how diverse a community is, and how far it is from a locally admissible potential benchmark. It also connects the ecological idea of dark diversity with a continuous, abundance-weighted comparison on the probability simplex. We also outline a dynamic extension in which capacities, species migration, and climate-driven shifts vary over time. Empirical implementation with large-scale citizen-science biodiversity data and trait databases is left for future work.
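
The two diversity functionals named above have standard definitions that are easy to compute. The sketch below uses those textbook formulas; the gap is taken here as a simple difference of the functional at the two compositions, which is an assumption, and the uniform `p_pot` is only a placeholder (the abstract stresses that the framework's benchmarks go beyond the uniform distribution).

```python
import numpy as np

def hill_number(p, q):
    """Hill diversity of order q: (sum_i p_i^q)^(1/(1-q)); exp(Shannon) at q = 1."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # absent species do not contribute
    if np.isclose(q, 1.0):
        return float(np.exp(-np.sum(p * np.log(p))))
    return float(np.sum(p**q) ** (1.0 / (1.0 - q)))

def rao_quadratic_entropy(p, d):
    """Rao's quadratic entropy: sum_ij d_ij p_i p_j for a dissimilarity matrix d."""
    p = np.asarray(p, dtype=float)
    d = np.asarray(d, dtype=float)
    return float(p @ d @ p)

def diversity_gap(p_obs, p_pot, diversity):
    """Gap as diversity(p_pot) - diversity(p_obs); the difference form is assumed."""
    return diversity(p_pot) - diversity(p_obs)

p_obs = [0.7, 0.2, 0.1, 0.0]      # observed composition (toy values)
p_pot = [0.25, 0.25, 0.25, 0.25]  # placeholder benchmark, not a derived one
gap = diversity_gap(p_obs, p_pot, lambda p: hill_number(p, 2))
```

With four equally abundant species the Hill number equals 4 for every order, which makes the uniform composition a convenient sanity check.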

2606.10891 2026-06-10 q-bio.NC new submission

Bilinear gating of motor primitives: a principle linking dendritic computation to rapid goal-directed adaptation

运动基元的双线性门控:连接树突计算与快速目标导向适应的原理

Cristiano Capone, Luca Falorsi, Andrea Ciardiello, Luca Manneschi

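AI summary Finds that the burst fraction of neurons in macaque motor cortex encodes goal information, proposes a bilinear gating mechanism to explain its origin, and shows that the mechanism supports zero-shot generalisation and rapid adaptation.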

Details
AI Chinese abstract

运动需要运动皮层同时指定产生“什么”动作以及该动作服务于“哪个目标”,但单个神经元如何分离这些因素尚不清楚。这里我们展示,在猕猴运动皮层中,神经元的“爆发比例”(其尖峰中高频爆发的比例)编码到达方向的选择性远高于其总体放电率。这种分离高度一致:在跨越三只动物和两个实验室的12个记录会话中均成立(所有$p<10^{-12}$),并且通过控制去除放电率的任何贡献后仍然存在,表明目标信息特别集中在爆发中。然后我们展示,这种编码特征是第5层锥体神经元中树突符合检测的预测结果:当与目标相关的顶端输入与状态相关的基底部驱动同时发生时,神经元爆发,因此爆发概率计算目标与状态的乘积,即双线性门控$G(g)\,Y(s)$。一个最小两室尖峰模型重现了该效应,并且相同的乘法门控嵌入强化学习智能体后,支持对新目标的零样本泛化和快速在线适应,为将目标信息分离到爆发中提供了计算理由。这些结果确定了爆发比例作为运动皮层中的目标选择性编码,将其与具体的细胞机制联系起来,并表明该机制带来了学习优势。

English abstract

Movement requires the motor cortex to specify both what action to produce and which goal it serves, yet how individual neurons separate these factors is not understood. Here we show that in macaque motor cortex the burst fraction of a neuron, the proportion of its spikes emitted in high-frequency bursts, encodes reach direction far more selectively than its overall firing rate. This dissociation is highly consistent: it holds in every one of 12 recording sessions spanning three animals and two laboratories (all $p<10^{-12}$) and survives controls that remove any contribution of firing rate, showing that goal information is concentrated specifically in bursts. We then show that this coding signature is the predicted consequence of dendritic coincidence detection in layer-5 pyramidal neurons: when a goal-related apical input coincides with a state-related basal drive the neuron bursts, so burst probability computes the product of goal and state, a bilinear gate $G(g)\,Y(s)$. A minimal two-compartment spiking model reproduces the effect, and the same multiplicative gate, embedded in a reinforcement-learning agent, supports zero-shot generalisation to new goals and rapid online adaptation, providing a computational rationale for segregating goal information into bursts. These results identify burst fraction as a goal-selective code in motor cortex, tie it to a concrete cellular mechanism, and show that the mechanism confers a learning advantage.
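
Two quantities from the abstract lend themselves to a short sketch: the burst fraction of a spike train and the bilinear gate $G(g)\,Y(s)$. The burst criterion used below (an inter-spike interval of at most 10 ms) is an assumption of this example, not the paper's definition, and the gate values are toy numbers.

```python
import numpy as np

def burst_fraction(spike_times, max_isi=0.010):
    """Fraction of spikes that belong to a burst.

    A spike counts as part of a burst if the interval to its previous or next
    spike is at most `max_isi` seconds (hypothetical threshold).
    """
    t = np.sort(np.asarray(spike_times, dtype=float))
    if t.size == 0:
        return float("nan")
    short = np.diff(t) <= max_isi
    in_burst = np.zeros(t.size, dtype=bool)
    in_burst[:-1] |= short  # spike followed by a short interval
    in_burst[1:] |= short   # spike preceded by a short interval
    return float(in_burst.mean())

def bilinear_gate(goal_gain, state_drive):
    """Burst probability table G(g) * Y(s) for every goal/state combination."""
    return np.outer(goal_gain, state_drive)

spikes = [0.000, 0.005, 0.008, 0.100, 0.300, 0.304]
bf = burst_fraction(spikes)
gate = bilinear_gate([0.2, 0.8], [0.1, 0.5, 0.9])
```

The outer product makes the multiplicative structure explicit: the table of burst probabilities has rank one, so goal and state contributions factorise.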

2606.10879 2026-06-10 q-bio.OT new submission

From the microscope to High Performance Computing centers, a national effort toward automated data workflows for microscopy facility users in France

从显微镜到高性能计算中心:法国显微镜设施用户自动化数据工作流的国家努力

Guillaume Gay, Théo Barnouin, Marc Mongy, Guillaume Maucort, Perrine Paul-Gilloteaux, Emmanuel Faure

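AI summary To address fragmented data management in biological microscopy facilities, France BioImaging, the French national infrastructure for biological imaging, developed the BioImage Cloud platform on open-source technologies such as OMERO and iRODS. The platform is designed to support the complete data lifecycle from acquisition to archiving and to integrate with HPC centers and public repositories.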

Comments 25 pages, 3 figures

Details
AI Chinese abstract

现代生物显微镜常规生成大型且复杂的图像数据集,包括多维、多模态和时间分辨采集。虽然成像技术迅速发展,但显微镜设施内的数据管理基础设施通常仍然分散,依赖于异构的本地解决方案,这些方案难以维护、扩展,并与高性能计算中心和公共数据存储库集成。为了解决这些问题,法国国家生物成像基础设施(France BioImaging, FBI)开发了FBI.DATA及相关的BioImage Cloud平台。该倡议旨在通过可互操作和可扩展的工作流,提供一个协调的国家基础设施,连接显微镜设施、集中存储资源、HPC环境和公共生物成像档案。所提出的架构结合了开源技术,包括用于图像管理的OMERO、用于分布式数据编排的iRODS、用于联合认证的Authentik,以及新兴标准如OME-Zarr和REMBI元数据建议。该基础设施旨在支持完整的成像数据生命周期,从采集和传输到可视化、分析、共享和长期归档。除了技术实现,本文还介绍了在分布式成像设施中部署共享国家基础设施所需的组织和治理策略。我们讨论了与互操作性、元数据标准化、可持续性和用户采纳相关的挑战,以及成像数据与大规模计算资源更紧密集成为未来AI驱动的生物图像分析工作流所开辟的前景。

English abstract

Modern biological microscopy routinely generates large and complex image datasets, including multidimensional, multimodal, and time-resolved acquisitions. While imaging technologies have rapidly evolved, data management infrastructures within microscopy facilities often remain fragmented, relying on heterogeneous local solutions that are difficult to maintain, scale, and integrate with High-Performance Computing (HPC) centers and public data repositories. To address these issues, France BioImaging (FBI), the French national infrastructure for biological imaging, has developed FBI.DATA and the associated BioImage Cloud platform. This initiative aims to provide a coordinated national infrastructure connecting microscopy facilities, centralized storage resources, HPC environments, and public bioimaging archives through interoperable and scalable workflows. The proposed architecture combines open-source technologies including OMERO for image management, iRODS for distributed data orchestration, Authentik for federated authentication, and emerging standards such as OME-Zarr and REMBI metadata recommendations. The infrastructure is designed to support the complete imaging data lifecycle, from acquisition and transfer to visualization, analysis, sharing, and long-term archiving. Beyond the technical implementation, this work presents the organizational and governance strategies required to deploy a shared national infrastructure across distributed imaging facilities. We discuss the challenges associated with interoperability, metadata standardization, sustainability, and user adoption, as well as the perspectives opened by tighter integration between imaging data and large-scale computing resources for future AI-driven bioimage analysis workflows.

2606.10873 2026-06-10 q-bio.QM new submission

Spatial Model Selection and Uncertainty Quantification: Comparing Continuous and Discrete Wound Healing Models

空间模型选择与不确定性量化:连续与离散伤口愈合模型的比较

John T. Nardini, Jana L. Gevertz

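AI summary To address the choice between partial differential equation and agent-based models for spatial processes, proposes a model selection pipeline based on approximate Bayesian computation. On ABM-generated data, the mean-field PDE computes far faster and is often selected over the generating ABM; the pipeline is then applied to wound healing data.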

Details
AI Chinese abstract

所有数据驱动的建模任务(例如参数估计、不确定性量化和数据预测)都需要选择一个数学模型。模型选择中一个被忽视的方面是模态;例如,对于空间过程,何时使用偏微分方程(PDE)模型或基于智能体模型(ABM)尚无指导原则。为解决这一问题,我们创建了一个模型选择流程,该流程使用近似贝叶斯计算进行参数估计、不确定性量化和模型选择(同时使用信息准则和样本外预测)。将该流程应用于人工数据集(由ABM生成)表明,虽然两种模态的参数估计性能相当,但ABM估计的不确定性更高,而PDE模型的计算速度快1000倍以上。令人惊讶的是,使用信息准则和数据预测,平均场PDE通常被选为优于真实生成ABM模型。将该流程应用于公共伤口愈合数据表明,具有细胞牵引和时间延迟的PDE模型是该数据最合适的模型,然而该模型具有较高的参数不确定性。该方法为选择空间生物数据的适当建模模态建立了一个初步框架。

English abstract

All data-driven modeling tasks (e.g., parameter estimation, uncertainty quantification, and data forecasting) require the selection of a mathematical model. An overlooked aspect of model selection is modality; for example, there are no guidelines on when to use a partial differential equation (PDE) model or an agent-based model (ABM) for spatial processes. To address this, we created a model selection pipeline that uses approximate Bayesian computation to perform parameter estimation, uncertainty quantification, and model selection (using both information criteria and out-of-sample forecasting). Applying the pipeline to artificial datasets (generated from ABMs) reveals that while both modalities yield comparable parameter estimation performance, the ABM estimates exhibit higher uncertainty, and the PDE models compute more than 1,000$\times$ faster. Surprisingly, the mean-field PDE is often selected over the true generative ABM model using both information criteria and data forecasting. Applying the pipeline to public wound healing data indicates that a PDE model with cell pulling and a time delay is the most appropriate model for this data; however, this model has high levels of parametric uncertainty. This methodology establishes a preliminary framework for selecting the appropriate modeling modality for spatial biological data.
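
The parameter-estimation step of such a pipeline can be illustrated with the simplest variant of approximate Bayesian computation, rejection sampling. Everything specific below is an assumption made for the example: the exponential "wound closure" curve stands in for the paper's PDE and ABM models, and the uniform prior, RMSE distance, and tolerance are arbitrary choices.

```python
import numpy as np

def abc_rejection(observed, simulate, prior_sample, distance, n_draws, tolerance, rng):
    """Keep prior draws whose simulated data fall within `tolerance` of the data."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        simulated = simulate(theta, rng)
        if distance(simulated, observed) <= tolerance:
            accepted.append(theta)
    return np.array(accepted)

times = np.linspace(0.0, 10.0, 21)

def simulate_closure(rate, rng, noise_sd=0.02):
    """Toy model: remaining wound fraction decays exponentially, plus noise."""
    return np.exp(-rate * times) + noise_sd * rng.standard_normal(times.size)

def rmse(a, b):
    """Root mean squared difference between two trajectories."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

rng = np.random.default_rng(0)
observed = simulate_closure(0.3, rng)  # synthetic data with true rate 0.3
posterior = abc_rejection(
    observed,
    simulate_closure,
    prior_sample=lambda r: r.uniform(0.0, 1.0),
    distance=rmse,
    n_draws=5000,
    tolerance=0.05,
    rng=rng,
)
```

The spread of the accepted draws gives a rough picture of parameter uncertainty, which is the quantity the abstract compares across modeling modalities.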