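arXivDaily: arXiv daily academic digest, updated Monday to Friday

Science and Medicine

Medical AI

Medical intelligence, clinical AI, medical imaging, pathology, diagnosis, and large models for healthcare.

2026-06-19 to 2026-06-19. Indexed: 52. Sources: cs.CV, cs.LG, q-bio, eess.IV, eess.SP

1. Medical imaging (14 papers)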

2606.20449 2026-06-19 cs.CV New submission 85%

InfantFace: Detecting infant faces in neonatal clinical environments

Abdullah Bin-Obaid, Maria M. Cobo, Rebeccah Slater, Lionel Tarassenko, Mauricio Villarroel

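Affiliations: Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford; Department of Paediatrics, University of Oxford; Universidad San Francisco de Quito USFQ, Colegio de Ciencias Biologicas y Ambientales

Topic match (Medical imaging): applied in neonatal clinical environments to support medical assessment

AI summary: To address occlusion and lighting problems in neonatal clinical environments, the authors propose a one-stage face detection model based on YOLOv11m. It is trained on several public datasets and then fine-tuned on clinical data, raising AP50 from 0.87 to 0.96.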

Comments 32 pages, 7 figures, 4 tables; supplementary information included

Abstract

Reliable localisation of the neonatal face is the first step for several video-camera based non-contact assessments such as pain and distress related facial expression analysis, pain scoring, cardiorespiratory signal extraction and cessation of breathing alerts. However, major challenges persist in neonatal clinical environments. Cluttered backgrounds, illumination changes and poor lighting conditions can reduce the accuracy of face detection models. Clinical interventions, monitoring equipment and, in some cases, medical devices can obstruct the face, making visual assessment difficult. We propose a one-stage YOLOv11m-based model tailored for face detection of infants in neonatal clinical environments. We combined multiple publicly available datasets (VGGFace2, CelebA, FDDB, WIDER FACE) to train and evaluate our proposed model. We then fine-tuned our model on a neonatal research dataset involving 228 videos from 114 recording sessions of 113 independent infants. Before fine-tuning, our model achieved an AP50 of 0.87, surpassing the performance of three state-of-the-art general face detectors. Performance improved further to an AP50 of 0.96 after clinical-domain adaptation. Evaluating face detection performance across different datasets remains a challenge due to the lack of publicly available neonatal datasets. Prioritising the creation of such datasets, while upholding appropriate privacy safeguards and ethical standards in their creation and use, would greatly support further progress in this field.
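
The AP50 figures in this abstract count a detection as correct when its intersection over union (IoU) with a ground-truth face box is at least 0.5. A minimal Python sketch of that matching criterion follows. The function names and the (x1, y1, x2, y2) box format are illustrative assumptions and are not taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Overlap is zero when the boxes do not intersect.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """A prediction matches a ground-truth box at the AP50 operating point."""
    return iou(pred_box, gt_box) >= threshold
```

Average precision is then the area under the precision-recall curve obtained by sweeping the detector's confidence threshold with this matching rule held fixed.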

2606.20303 2026-06-19 cs.CV New submission 85%

GEN-Guard: Correcting Generalization Failures for Deployable Federated Surgical AI

Julia Alekseenko, Pietro Mascagni, AI4SafeChole Consortium, Nicolas Padoy

Affiliations: University of Strasbourg, CNRS, INSERM, ICube, UMR7357; Bioimage Analysis Center, Fondazione Policlinico Universitario Agostino Gemelli IRCCS; Fondazione IRCCS Ca' Granda Ospedale Maggiore Policlinico di Milano, University of Milan; Monaldi Hospital, AORN dei Colli

Topic match (Medical imaging): detecting and correcting generalization failures in federated surgical AI

AI summary: Proposes the GEN-Guard framework, which detects performance leakage through Client-Blocked Evaluation and applies feature-level correction through Disagreement-Aware Distillation, improving the cross-institutional generalization of federated surgical AI.

Journal ref Int J Comput Assist Radiol Surg. 2026 Jun 14

Abstract

Federated Learning (FL) in surgical video AI enables collaborative model training without sharing sensitive data. However, standard evaluation practices - selecting the "best" global model based only on validation data from participating hospitals - can lead to suboptimal deployment choices. We identify this critical failure mode as performance leakage, where the selected model overfits internal federation data and fails to generalize to unseen institutions. We propose GEN-Guard, a practical post-hoc framework to detect and correct generalization failures in federated surgical AI. It integrates Generalization Detection via Client-Blocked Evaluation (CBE), which validates performance on isolated client distributions to prevent performance leakage, and Generalization Correction through Disagreement-Aware Distillation (DAD), which learns adaptive feature-level corrections for cross-institutional robustness. Both components operate after standard FL convergence while providing robust support for zero-shot adaptation to unseen environments. We first quantify the severity of performance leakage, observing Model Selection Failures (MSFs) exceeding 80% under standard evaluation. GEN-Guard is evaluated on two multi-center clinical challenges: surgical phase recognition in laparoscopic cholecystectomy and polyp segmentation in colonoscopy. Across both datasets, GEN-Guard consistently corrects these failures, improving in-federation F1 scores by up to 2 points, unseen-institution performance by up to 3 points, and worst-case institutional performance by 3-9 points. Performance leakage represents a systematic and previously under-recognized risk in federated surgical AI. GEN-Guard provides a practical solution for detecting and correcting such failures. By improving cross-institutional robustness and zero-shot generalization, it strengthens the reliability of FL for real-world surgical deployment.

2606.20115 2026-06-19 cs.LG cs.CV New submission 85%

When Calibration Fails the Vulnerable Hospital: Federated Conformal Risk Control via Risk-Curve Shrinkage

Nafis Fuad Shahid

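Affiliations: Dhaka, Bangladesh

Topic match (Medical imaging): federated conformal risk control for brain tumor segmentation

AI summary: Standard conformal risk control (CRC) under-covers individual institutions in federated deployments. The paper proposes a federated CRC protocol based on risk-curve shrinkage that, on real brain tumor data, achieves 2.7/20 violations while enlarging prediction sets by only 2.0x.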

Comments 9 pages, 3 figures, 2 tables. Submitted to the DeCaF Workshop at MICCAI 2026

Abstract

Conformal risk control (CRC) provides distribution-free guarantees on segmentation quality by calibrating a prediction-set threshold on held-out data. In federated deployments, the standard approach pools calibration scores across sites into a single threshold. We provide the first quantification, on real multi-institutional brain tumor data (FeTS-2022, 1,251 subjects, 20 institutions), showing that this naive pooled CRC protects the average hospital but violates coverage at 40% of individual institutions, with the worst site exceeding the target false-negative rate by 7.8 percentage points. The naive alternative, per-site local CRC, largely restores coverage but inflates prediction sets by 83x, rendering them clinically useless. We propose a shrinkage-based federated CRC protocol: each site transmits only its empirical risk curve (G scalars) to a server, which computes a shrinkage-regularized threshold per site. A single hyperparameter n0 smoothly trades worst-case coverage for prediction-set efficiency; leave-one-site-out sensitivity analysis identifies n0=19, achieving 2.7/20 violations at 2.0x stretch. We further show that direct Lagrangian optimization of coverage budgets fails, concentrating risk on vulnerable hospitals, and that the finite-sample correction term is essential: removing it triples violations. The marginal CRC guarantee is preserved by construction under the stated site-mixture assumption; per-site coverage is validated across four targets with three seeds. No patient-level images, masks, or per-volume scores leave any site.
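
The threshold selection described in this abstract can be sketched in Python under stated assumptions. `crc_threshold` applies the standard CRC rule for a loss bounded by `bound`: pick the smallest threshold whose corrected empirical risk, n/(n+1) times the risk plus bound/(n+1), is at most alpha. `shrink_curve` is a hypothetical shrinkage rule that blends a site's risk curve with the pooled curve using weight n_site / (n_site + n0). The abstract does not state the paper's actual formula, so that weighting, the function names, and the use of the site sample size in the correction are all assumptions.

```python
def shrink_curve(site_curve, pooled_curve, n_site, n0):
    """Blend a site's empirical risk curve with the pooled curve.

    Hypothetical rule: small sites (n_site << n0) lean on the pooled curve,
    large sites keep mostly their own curve.
    """
    w = n_site / (n_site + n0)
    return [w * s + (1.0 - w) * p for s, p in zip(site_curve, pooled_curve)]


def crc_threshold(risk_curve, grid, n, alpha, bound=1.0):
    """Smallest grid value whose finite-sample-corrected risk is <= alpha.

    risk_curve[i] is the empirical risk at grid[i] and is assumed to be
    non-increasing along the grid (larger threshold, larger prediction set).
    """
    for lam, risk in zip(grid, risk_curve):
        corrected = (n / (n + 1)) * risk + bound / (n + 1)
        if corrected <= alpha:
            return lam
    return grid[-1]  # fall back to the most conservative threshold


grid = [0.1, 0.2, 0.3, 0.4]
site = [0.30, 0.20, 0.10, 0.02]
pooled = [0.20, 0.10, 0.05, 0.01]
blended = shrink_curve(site, pooled, n_site=19, n0=19)
print(crc_threshold(blended, grid, n=19, alpha=0.1))  # 0.4
```

In this toy example the uncorrected blended risk already falls below 0.1 at the threshold 0.3, but the finite-sample term pushes the choice to 0.4, which illustrates why dropping that term can increase violations at small sites.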

2606.20035 2026-06-19 cs.CV cs.LG New submission 85%

PU-UNet: Stable Multiplicative Interactions for Medical Image Segmentation

Ziyuan Li, Osamah Sufyan, Uwe Jaekel, Babette Dellen

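Affiliations: Department of Mathematics, Informatics and Technology, University of Applied Sciences Koblenz; Technical University of Munich

Topic match (Medical imaging): proposes PU-UNet for medical image segmentation

AI summary: Proposes PU-UNet, which uses stabilized product-unit residual blocks to model explicit multiplicative feature interactions at the low-resolution stages. On three medical image segmentation datasets it improves Dice and IoU and lowers the false-positive rate.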

Comments Accepted to the ICANN 2026

Abstract

Many dense prediction networks rely on additive feature transformations and model higher-order feature interactions only implicitly. Product units provide an explicit mechanism for multiplicative feature modeling, but their logarithmic-exponential formulation can cause numerical instability, which has limited their use in deep dense prediction networks. In this work, we propose Product-Unit U-Net (PU-UNet), a residual U-Net that integrates stable product-unit residual blocks into rich low-resolution stages for medical image segmentation. The proposed formulation combines smooth positivity mapping with log-domain clipping, enabling stable multiplicative feature learning with negligible computational overhead. On ISIC 2018, Kvasir-SEG, and BUSI, PU-UNet achieves Dice scores of 0.942, 0.959, and up to 0.925, respectively. Compared with a matched Residual U-Net baseline, PU-UNet consistently improves Dice and IoU while keeping parameters, FLOPs, and inference latency nearly unchanged, and reduces the image-level false-positive rate on normal BUSI cases from 0.077 to zero. Ablation studies suggest that the gains are associated with product-unit interactions, are strongest under low-resolution placement, and benefit from the proposed stabilization design. These results suggest that stable product-unit residual learning can be an effective way to enhance U-Net-style segmentation networks with explicit multiplicative interactions.
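
A product unit computes a weighted product of its inputs, which is usually evaluated as exp(sum of w_i * log x_i). The Python sketch below shows one way the stabilization named in this abstract could look for a single scalar unit: softplus as the smooth positivity mapping, followed by clipping in the log domain. The choice of softplus, the `eps` and `clip` values, and the function names are assumptions for illustration. The paper's residual block and exact formulation are not reproduced here.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x)), always non-negative."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def stable_product_unit(inputs, weights, eps=1e-6, clip=10.0):
    """Weighted product prod_i softplus(x_i)^w_i evaluated in the log domain."""
    log_sum = 0.0
    for x, w in zip(inputs, weights):
        log_x = math.log(softplus(x) + eps)   # positivity mapping, then log
        log_x = max(-clip, min(clip, log_x))  # clip each log-feature
        log_sum += w * log_x
    log_sum = max(-clip, min(clip, log_sum))  # clip before exponentiation
    return math.exp(log_sum)
```

With these guards the output stays within exp(-clip) and exp(clip) even for extreme inputs, at the cost of saturating very large or very small products.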

2606.20027 2026-06-19 cs.CV New submission 85%

QG-MIL: A Gated Transformer Aggregator for Domain-Agnostic Multiple Instance Learning in Medical Imaging

Luca Zedda, Davide Antonio Mura, Cecilia Di Ruberto, Maurizio Atzori, Muhammed Furkan Dasdelen, Carsten Marr, Andrea Loddo

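Affiliations: Department of Mathematics and Computer Science, University of Cagliari; Institute of AI for Health, Helmholtz Munich

Topic match (Medical imaging): proposes a multiple instance learning aggregator for medical image analysis

AI summary: Proposes QG-MIL, a gated transformer aggregator that addresses attention concentration through RMSNorm pre-normalization, per-head QK normalization, fine-grained attention output gating, and SwiGLU feed-forward modules, with an average gain of +6.1 macro F1 points across six benchmarks.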

Abstract

Attention-based Multiple Instance Learning aggregators in medical imaging are prone to attention concentration, producing overconfident and unstable predictions. We introduce QG-MIL, a gated transformer aggregator that addresses this through four synergistic architectural components: RMSNorm-based pre-normalization, per-head QK normalization, fine-grained attention output gating, and SwiGLU-style feed-forward modules. Together, these design choices stabilize training and distribute attention more uniformly across instances without auxiliary losses, masking, or multi-stage regularization. We evaluate QG-MIL across six benchmarks spanning whole-slide pathology and cell-level hematology, covering two fundamentally different MIL scales. The best-performing QG-MIL variants outperform leading baselines on all six benchmarks, with an average improvement of +6.1 mean macro F1 points. Attention overlays and attention mass analysis confirm more distributed instance weighting. Ablation studies show that while individual components can match the full model on specific datasets, the QG-MIL design provides the most consistent cross-domain performance and tightest variance when compared to selected baselines. We release a configurable implementation to support reproducibility at: https://github.com/unica-visual-intelligence-lab/QG-MIL
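
Two of the four components named in this abstract, RMSNorm and QK normalization, can be illustrated for a single attention head in plain Python. Normalizing the query and each key before the dot product bounds the attention logits, which is one plausible reason attention spreads more evenly over instances. This is a minimal sketch with a fixed gain and no learned parameters. The output gating and SwiGLU modules are omitted, and the function names are illustrative rather than taken from the released implementation.

```python
import math


def rms_norm(vec, gain=1.0, eps=1e-6):
    """Scale a vector to (approximately) unit root-mean-square."""
    rms = math.sqrt(sum(v * v for v in vec) / len(vec) + eps)
    return [gain * v / rms for v in vec]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def qk_norm_attention(query, keys):
    """Attention weights over instances with normalized query and keys."""
    q = rms_norm(query)
    d = len(q)
    scores = [
        sum(a * b for a, b in zip(q, rms_norm(k))) / math.sqrt(d)
        for k in keys
    ]
    return softmax(scores)
```

Because both vectors are rescaled, multiplying an instance embedding by a large constant leaves its attention weight essentially unchanged, so a single high-norm instance cannot dominate the bag through magnitude alone.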

2606.19908 2026-06-19 cs.CV New submission 85%

Gaussian Process Prior Variational Autoencoder for Endoscopic Videos

Ivan De Boi, Xinxing Shi, Xiaoyu Jiang, Tim J. M. Jaspers, Francisco Caetano, Mauricio A. Alvarez, Fons van der Sommen, Sam Van der Jeught

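Affiliations: Department of Electromechanics, InViLab, University of Antwerp; Department of Computer Science, University of Manchester; Department of Electrical Engineering, Eindhoven University of Technology

Topic match (Medical imaging): interpolation and restoration of missing frames in endoscopic video

AI summary: Proposes a Gaussian Process Prior Variational Autoencoder (GPVAE) that replaces the factorized prior with a temporal Gaussian process prior and combines two scalable GP approximations with specular-reflection masking to interpolate and restore missing endoscopic video frames, reducing RMSE on the C3VDv2 dataset by 21.9% on average.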

Abstract

Endoscopic video analysis is essential for gastrointestinal diagnosis and computer-assisted interventions, but video sequences are routinely degraded by specular reflections, motion artifacts, and missing frames. These transient corruptions can distract clinicians, reduce image interpretability, and disrupt downstream tasks such as 3D reconstruction and navigation. Effective restoration therefore requires methods that exploit temporal continuity rather than treating frames in isolation. We introduce a Gaussian Process Prior Variational Autoencoder (GPVAE) framework for endoscopic video restoration that replaces the standard factorized latent prior with a temporal Gaussian process prior, enabling interpolation of missing frames with uncertainty-aware reconstruction. The framework combines endoscopy-specific encoders, including a convolutional EndoVAE backbone and pretrained Vision Transformer encoders from GastroNet-5M, with two scalable GP approximations: Hierarchical Prior Approximation (HPA) and Sparse Precision Approximation (SPA). Specular reflections are handled using a DUCKNet-based masking pipeline that excludes corrupted pixels from the reconstruction objective. On the C3VDv2 colonoscopy dataset, the best GPVAE variants reduced image reconstruction RMSE by 21.9% on average, and by up to 26.1%, relative to matched VAE baselines. Downstream trajectory RMSE was reduced by 12.7% on average across classical visual odometry and a pretrained PoseNet, at an average increase of 27.3% in training time per epoch. Finally, the GP posterior provides per-frame uncertainty estimates that reflect temporal support and offer a confidence signal for restored frames.
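
The idea of interpolating a missing frame through a temporal Gaussian process can be sketched for a single scalar latent dimension using exact GP regression. The RBF kernel, the length scale, the noise level, and all function names below are assumptions chosen for illustration. The paper's HPA and SPA approximations, which make this scale to long videos, are not shown.

```python
import math


def rbf(t1, t2, length_scale=2.0):
    """Squared-exponential kernel over frame times."""
    return math.exp(-0.5 * ((t1 - t2) / length_scale) ** 2)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def gp_interpolate(times, values, query_time, noise=1e-3):
    """Posterior mean and variance of the latent at an unobserved time."""
    k = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(times)]
         for i, a in enumerate(times)]
    k_star = [rbf(query_time, t) for t in times]
    alpha = solve(k, values)
    mean = sum(ks * al for ks, al in zip(k_star, alpha))
    v = solve(k, k_star)
    var = rbf(query_time, query_time) - sum(ks * vi for ks, vi in zip(k_star, v))
    return mean, max(var, 0.0)
```

Querying a time between observed frames gives a low variance, while querying far from any observed frame returns the prior mean with variance close to one, which is the sense in which the posterior uncertainty reflects temporal support.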

2606.19889 2026-06-19 cs.CV New submission 85%

SurgVista: Long-Horizon Surgical World Modeling with Plausible Instrument-Tissue Dynamics

Wentao Pan, Wuyang Li, Shengyuan Liu, Xinyu Liu, Hengyu Liu, Yixuan Yuan

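Affiliations: The Chinese University of Hong Kong; EPFL; Imperial College London

Topic match (Medical imaging): surgical world model for robotic surgery policy learning

AI summary: Proposes SurgVista, a surgical world model that uses Deformation Consistency Regularization and Drift Adaptation Training to address spatial interaction incoherence and temporal fidelity collapse, and that consistently outperforms existing methods on long-horizon prediction.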

Abstract

Scaling robot policy learning for autonomous surgery is challenging, as expert demonstrations are expensive and in vivo exploration poses substantial safety risks. Surgical world models address this by generating realistic, action-conditioned future frames from an initial observation, but existing methods exhibit two persistent failure modes: spatial interaction incoherence, where visible instrument contact fails to induce spatially consistent tissue deformation, and temporal fidelity collapse, where prediction errors compound across autoregressive rollouts and progressively corrupt visual quality. We present SurgVista, a surgical world model that mitigates both failures through two training recipes. Deformation Consistency Regularization extracts scene-point trajectories from training videos and enforces cross-frame coherence through latent contrastive learning, strengthening physically consistent instrument-tissue dynamics. Drift Adaptation Training mitigates long-horizon drift by perturbing conditioning frames with online prediction residuals and photometric augmentations calibrated to long-horizon drift statistics, sustaining visual fidelity over extended rollouts. To enable rigorous evaluation, we further introduce SurgWorld-Bench, featuring diverse procedure types, long-range rollouts, and decoupled metrics for instrument-motion accuracy and tissue-response fidelity. Extensive experiments show that SurgVista consistently outperforms state-of-the-art methods across visual quality, temporal consistency, and interaction fidelity, with gains widening as the prediction horizon grows.

2606.19867 2026-06-19 cs.CV cs.AI New submission 85%

PSCT-Net: Geometry-Aware Pediatric Skull CT Reconstruction via Differentiable Back-Projection and Attention-Guided Refinement

Dong Yeong Kim, Jaewon Choi, Youmin Shin, Jungyu Lee, Myeongseop Kim, Jinwook Choi, Joo Whan Kim, Young-Gon Kim

Affiliations: Interdisciplinary Program in Bioengineering, Seoul National University; Department of Transdisciplinary Medicine, Seoul National University Hospital; Department of Artificial Intelligence, Yonsei University; Department of Medicine, Seoul National University College of Medicine; Healthcare AI Research Institute, Seoul National University Hospital

Topic match (Medical imaging): pediatric skull CT reconstruction as a low-dose alternative

AI summary: Proposes PSCT-Net, which uses differentiable back-projection to establish a spatial prior and combines an Attention-Guided Projection module with a Bidirectional Mamba module to reconstruct 3D CT from sparse bi-planar X-rays, alleviating depth ambiguity and improving bone boundaries.

Comments 11 pages, 5 figures

Abstract

Computed Tomography (CT) is essential for diagnosing pediatric craniofacial abnormalities, yet poses radiation risks to developing anatomies. Reconstructing 3D CT from sparse bi-planar X-rays offers a low-dose alternative but is severely ill-posed. Existing methods employ geometry-agnostic feature lifting, naively projecting 2D features into 3D without explicit spatial modeling, causing depth ambiguity and degraded osseous boundaries. We present PSCT-Net, a geometry-aware framework with differentiable back-projection. Differentiable back-projection establishes a spatially faithful volumetric prior, alleviating depth ambiguity. An Attention-Guided Projection (AGP-3D) module then learns non-linear voxel-wise correspondences between 2D regions and 3D locations. A Bidirectional Mamba (BiM-3D) module captures long-range volumetric dependencies with linear complexity. We further curate a private institutional pediatric skull CT cohort, PedSkull-CT, comprising normal and pathological cases for internal evaluation, addressing the gap in adult-centric, trunk-focused datasets.
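
Back-projection from two orthogonal views can be sketched in a few lines of Python under a strong simplifying assumption: idealized parallel-beam geometry, where the frontal view integrates along y and the lateral view integrates along x. Each voxel then receives the average of the two pixels whose rays pass through it. The geometry, the array layout, and the function name are assumptions for illustration. The paper's projection model and learned refinement modules are not reproduced.

```python
def backproject_biplanar(frontal, lateral):
    """Unfiltered back-projection of two orthogonal parallel-beam views.

    frontal[z][x]: view integrated along the y axis.
    lateral[z][y]: view integrated along the x axis.
    Returns volume[z][y][x], a coarse volumetric prior.
    """
    depth = len(frontal)
    width = len(frontal[0])
    height = len(lateral[0])
    return [
        [
            [0.5 * (frontal[z][x] + lateral[z][y]) for x in range(width)]
            for y in range(height)
        ]
        for z in range(depth)
    ]
```

The operation is linear in the input pixels, so gradients pass through it unchanged in an automatic differentiation framework, which is what makes a back-projection step usable inside an end-to-end trained network.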

2606.19767 2026-06-19 eess.IV cs.CV physics.med-ph New submission 85%

Contour-Constrained Deformable Registration with Parameter Characterization for Head and Neck Surgical Guidance

Qingyun Yang, Jon S. Heiselman, Ayberk Acar, Morgan J. Ringel, Michael I. Miga, Matthieu Chabanas, Michael C. Topf, Jie Ying Wu

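Affiliations: Vanderbilt University; Vanderbilt University Medical Center

Topic match (Medical imaging): deformable registration for head and neck surgical guidance

AI summary: Proposes a deformable registration framework based on regularized Kelvinlet basis functions that corrects post-resection tissue deformation using surface point clouds, fiducial landmarks, and contour constraints. Across nine head and neck specimens it reduces registration error from 11.11 mm with rigid registration to 5.62 mm, a 49.41% reduction.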

Abstract

With 890,000 annual new cases globally, head and neck squamous cell carcinoma has one of the highest recurrence rates among solid malignancies. Although frozen section analysis is the standard of care for intraoperative margin assessment, accurately relocating detected positive margins on the resection bed remains challenging due to imprecise alignment between resected specimens and their resection bed, compounded by post-resection mucosal tissue shrinkage. We present a biomechanics-driven deformable registration framework that corrects post-resection tissue deformation to provide intraoperative guidance. Our approach registers 3D specimen meshes to intraoperative resection bed point clouds using a deformable registration approach based on regularized Kelvinlet basis functions. The registration matches surface point clouds, fiducial landmarks, and boundary contour constraints that directly penalize perpendicular distance-to-agreement between specimen and resection bed boundaries. Across nine specimens from skin, buccal mucosa, and tongue sites, the overall mean target registration error was 11.11 ± 4.07 mm using rigid registration, which decreased to 8.20 ± 2.68 mm (26.19% reduction) using deformable registration without contour constraint. The proposed contour-constrained deformable registration further reduced the error to 5.62 ± 2.28 mm, a 49.41% reduction relative to rigid registration. We observed the largest reduction in the most clinically challenging tongue specimens. We also performed a systematic two-stage parameter search to characterize the relative importance of surface alignment, fiducial correspondences, contour constraint, and strain energy regularization. This search revealed that contour weighting dominates registration accuracy for tissue types with large lateral deformation, while the algorithm operates over a broad range of parameter combinations.
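
Both percentage reductions quoted in this abstract are relative to the rigid-registration error, which a short Python check makes explicit. The helper name is illustrative, and the inputs are the mean target registration errors reported in the abstract.

```python
def relative_reduction(baseline, value):
    """Percentage reduction of value relative to baseline."""
    return 100.0 * (baseline - value) / baseline


rigid_mm = 11.11        # rigid registration
deformable_mm = 8.20    # deformable, no contour constraint
contour_mm = 5.62       # deformable with contour constraint

print(round(relative_reduction(rigid_mm, deformable_mm), 2))  # 26.19
print(round(relative_reduction(rigid_mm, contour_mm), 2))     # 49.41
```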

2512.02748 2026-06-19 physics.med-ph 85%

BART Streams: Real-time Reconstruction Using a Modular Framework for Pipeline Processing


Philip Schaten, Moritz Blumenthal, Bernhard Rapp, Christina Unterberg-Buchwald, Martin Uecker

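Topic match (Medical imaging): real-time MRI reconstruction, a medical image processing task

AI summary: Presents a modular BART-based framework for interactive real-time MRI reconstruction. It streams multidimensional arrays through the reconstruction pipeline and is demonstrated on cardiac real-time MRI, combining iterative reconstruction with advanced features such as dynamic coil compression.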

Comments Submitted to Magnetic Resonance in Medicine

Abstract

Purpose: To create modular solutions for interactive real-time MRI using reconstruction algorithms implemented in BART. Methods: A new protocol for streaming of multidimensional arrays is presented and integrated into BART. The new functionality is demonstrated using examples for cardiac interactive real-time MRI based on radial FLASH, where iterative reconstruction is combined with advanced features such as dynamic coil compression and gradient-delay correction. We analyze the latency of the reconstruction and measure end-to-end latency of the full imaging process. Results: Reconstruction pipelines with iterative reconstruction and advanced functionality were built in a modular way using scripting. Latency measurements demonstrate latency sufficient for interactive real-time MRI, on the order of 30 ms for BART processing and network transfer time, or 200 ms for end-to-end latency including acquisition, vendor processing, and display. Conclusion: With the new streaming capabilities, real-time reconstruction pipelines can be assembled using BART in a flexible way, enabling rapid prototyping of advanced applications such as interactive real-time MRI.

2603.01250 2026-06-19 cs.CV cs.AI Updated version 85%

The MAMA-MIA Challenge: Advancing Generalizability and Fairness in Breast MRI Tumor Segmentation and Treatment Response Prediction

Lidia Garrucho, Smriti Joshi, Kaisar Kushibar, Richard Osuala, Maciej Bobowicz, Xavier Bargalló, Paulius Jaruševičius, Kai Geissler, Raphael Schäfer, Muhammad Alberb, Tony Xu, Anne Martel, Daniel Sleiman, Navchetan Awasthi, Hadeel Awwad, Joan C. Vilanova, Robert Martí, Daan Schouten, Jeong Hoon Lee, Mirabela Rusu, Eleonora Poeta, Luisa Vargas, Eliana Pastor, Maria A. Zuluaga, Jessica Kächele, Dimitrios Bounias, Alexandra Ertl, Katarzyna Gwoździewicz, Maria-Laura Cosaka, Pasant M. Abo-Elhoda, Sara W. Tantawy, Shorouq S. Sakrana, Norhan O. Shawky-Abdelfatah, Amr Muhammad Abdo-Salem, Androniki Kozana, Eugen Divjak, Gordana Ivanac, Katerina Nikiforaki, Michail E. Klontzas, Rosa García-Dosdá, Meltem Gulsun-Akpinar, Oğuz Lafcı, Carlos Martín-Isla, Oliver Díaz, Laura Igual, Karim Lekadir

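Affiliations: Barcelona Artificial Intelligence in Medicine Lab (BCN-AIM), Facultat de Matemàtiques i Informàtica, Universitat de Barcelona

Topic match (Medical imaging): breast MRI tumor segmentation and treatment response prediction

AI summary: Presents the MAMA-MIA Challenge, a standardized benchmark for breast MRI tumor segmentation and pathologic complete response prediction. It analyzes model generalizability and fairness on cross-continental multi-center data and finds a trade-off between overall performance and subgroup fairness.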

Abstract

Breast cancer is the most frequently diagnosed malignancy among women worldwide and a leading cause of cancer-related mortality. Dynamic contrast-enhanced magnetic resonance imaging plays a central role in tumor characterization and treatment monitoring, particularly in patients receiving neoadjuvant chemotherapy. However, existing artificial intelligence models for breast magnetic resonance imaging are typically developed and evaluated using heterogeneous datasets, study populations, and assessment protocols, making direct comparison difficult and limiting understanding of model robustness across institutions and clinically relevant patient subgroups. The MAMA-MIA Challenge was designed to address these challenges by providing a standardized benchmark for the joint evaluation of primary tumor segmentation and prediction of pathologic complete response using pre-treatment magnetic resonance imaging only. The training cohort comprised 1,506 patients from multiple institutions in the United States, while evaluation was conducted on an external test set of 574 patients from three independent European centers to assess cross-continental and cross-institutional generalization. A unified scoring framework combined predictive performance with subgroup consistency across age, menopausal status, and breast density. Twenty-six international teams participated in the final evaluation phase. Results demonstrate substantial performance variability under a common external evaluation framework and reveal trade-offs between overall accuracy and subgroup fairness. The challenge provides standardized datasets, evaluation protocols, and public resources to promote the development of robust and equitable artificial intelligence systems for breast cancer imaging.

2606.19365 2026-06-19 cs.LG New submission 80%

Performance Analysis and Optimization of 3D Generative Diffusion Models across GPU Architectures

Jeeho Ryoo, Yongchan Jung, Muhammad Ali Khaliq, Weidong Zhang, Jiatong Han, Byeong Kil Lee

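Affiliations: Fairleigh Dickinson University; The University of Colorado at Colorado Springs; Northeastern University

Topic match (Medical imaging): performance optimization of the 3D MRI diffusion model Med-DDPM

AI summary: Analyzes kernel-level performance bottlenecks of the 3D MRI diffusion model Med-DDPM across three generations of NVIDIA architectures and evaluates two optimizations, TF32 Tensor Core activation and a 3D channels-last layout. These reduce SM cycles and dynamic instructions by up to 100x, raise Tensor Core utilization from 1.45 to 9.98x, and increase IPC by 7%.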

Abstract

Diffusion models have become essential for high-fidelity 3D MRI synthesis, yet their deployment remains constrained by substantial GPU resource demands arising from hundreds of U-Net evaluations per sample and highly heterogeneous kernel behavior. This paper performs a comprehensive performance analysis of the state-of-the-art medical diffusion model, Med-DDPM, across three generations of NVIDIA architectures to study kernel-level runtime breakdowns, instruction-mix characteristics, memory system utilization, warp-level activities, and profiler priority-score estimates. We show that training is overwhelmingly dominated by cuDNN convolution and implicit-GEMM kernels, with inefficiencies arising from memory-access patterns, tensor-layout conversions, and limited Tensor Core utilization. Guided by these insights, we evaluate two architecture-aware optimizations, TF32 Tensor Core activation and a 3D channels-last layout, and demonstrate that they reduce SM cycles by up to 100x, cut dynamic instructions by 100x, raise Tensor Core utilization from 1.45 to 9.98x, and increase IPC by 7% on A100, all without degrading synthesis quality.
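
The difference between the default layout and a 3D channels-last layout comes down to memory strides for a five-dimensional (N, C, D, H, W) tensor. The plain-Python sketch below computes both, in units of elements. The function names are illustrative. In PyTorch the second layout corresponds to the `torch.channels_last_3d` memory format, but no framework call is made here and nothing below is taken from the paper's code.

```python
def contiguous_strides(n, c, d, h, w):
    """Strides for the default NCDHW layout: w varies fastest."""
    return (c * d * h * w, d * h * w, h * w, w, 1)


def channels_last_3d_strides(n, c, d, h, w):
    """Strides for an NDHWC physical layout: channels vary fastest."""
    return (d * h * w * c, 1, h * w * c, w * c, c)


shape = (2, 4, 8, 16, 16)
print(contiguous_strides(*shape))        # (8192, 2048, 256, 16, 1)
print(channels_last_3d_strides(*shape))  # (8192, 1, 1024, 64, 4)
```

With channels stored contiguously, all channel values of one voxel sit next to each other in memory, which is the access pattern Tensor Core convolution kernels generally prefer and which avoids layout conversions between layers.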

2606.18970 2026-06-19 cs.LG cs.AI cs.CV New submission 80%

A Controlled Benchmark of Quantum-Latent GAN Augmentation for Brain MRI

Syed Mujtaba Haider, Silvia Figini

发表机构 * Department of Mathematics(数学系) Department of Political and Social Sciences(政治与社会科学系)

专题命中 医学影像 :量子GAN增强脑MRI数据,属于医学影像

AI总结 通过受控基准测试,比较量子与经典生成器在脑MRI数据增强中的性能,发现两者均未显著优于仅用真实数据训练,且量子生成器无额外优势。

Details

Abstract

Medical image classification is often constrained by limited labeled data, motivating generative augmentation; recently, quantum generative models have been proposed for this purpose, frequently reporting accuracy gains. However, such claims are typically based on single training runs, do not match the parameter budgets of the quantum and classical generators, and do not characterize the data regime in which any benefit appears. We present a controlled benchmark that isolates the contribution of a quantum generator to brain-MRI augmentation. Images are encoded into a KL-regularized latent space in which a conditional Wasserstein GAN with gradient penalty is trained using either a variational quantum generator or a classical generator of near-identical parameter count (1648 vs. 1632). Synthetic samples are decoded and used to augment a pretrained classifier across labeled data fractions from 5% to 100%, evaluated over eight random seeds with paired significance testing (with multiple-comparison correction) and with intraset diversity and latent-distribution analyses. Across all fractions, no augmentation variant significantly outperforms real-data-only training, and the quantum and classical generators are statistically indistinguishable. Any low-data benefit behaves as regularization rather than faithful data expansion: synthetic samples are off distribution and severely mode collapsed precisely where data is scarce, and the quantum generator is no more diverse than its classical counterpart. We release the protocol as a testbed for rigorous evaluation of quantum generative augmentation in medical imaging.
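The abstract mentions paired significance testing with multiple-comparison correction but does not name the correction procedure. As a minimal sketch, assuming the Holm-Bonferroni step-down method (one common choice, not confirmed by the source) and made-up p-values, the correction could look like this:

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # Compare the rank-th smallest p-value with alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject


# Hypothetical p-values, e.g. one per labeled-data fraction.
p_values = [0.030, 0.004, 0.200, 0.012]
decisions = holm_bonferroni(p_values)
print(decisions)
```

Note that the first p-value (0.030) would pass an uncorrected 0.05 threshold but is not rejected after correction, which is the kind of effect a multi-fraction comparison has to guard against.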

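2602.22959 2026-06-19 cs.CV Updated version 80%

Can Agents Distinguish Visually Hard-to-Separate Diseases in a Zero-Shot Setting? A Pilot Study

Zihao Zhao, Frederik Hauke, Juliana De Castilhos, Sven Nebelung, Daniel Truhn

Affiliations * Department of Diagnostic and Interventional Radiology, University Hospital Aachen, 52074 Aachen, Germany

Topic match Medical imaging: zero-shot diagnosis that distinguishes visually confusable diseases.

AI summary Explores whether multimodal large language model agents can distinguish visually confusable diseases (such as melanoma vs. atypical nevus and pulmonary edema vs. pneumonia) in a zero-shot setting, and proposes a multi-agent framework based on contrastive adjudication that improves accuracy on dermoscopy data by 11 percentage points, although overall performance remains insufficient for clinical deployment.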

Comments Code available at https://github.com/TruhnLab/Contrastive-Agent-Reasoning. Accepted by MICCAI 2026

Details

Abstract

The rapid progress of multimodal large language models (MLLMs) has led to increasing interest in agent-based systems. While most prior work in medical imaging concentrates on automating routine clinical workflows, we study an underexplored yet clinically significant setting: distinguishing visually hard-to-separate diseases in a zero-shot setting. We benchmark representative agents on two imaging-only proxy diagnostic tasks, (1) melanoma vs. atypical nevus and (2) pulmonary edema vs. pneumonia, where visual features are highly confounded despite substantial differences in clinical management. We introduce a multi-agent framework based on contrastive adjudication. Experimental results show improved diagnostic performance (an 11-percentage-point gain in accuracy on dermoscopy data) and reduced unsupported claims on qualitative samples, although overall performance remains insufficient for clinical deployment. We acknowledge the inherent uncertainty in human annotations and the absence of clinical context, which further limit the translation to real-world settings. Within this controlled setting, this pilot study provides preliminary insights into zero-shot agent performance in visually confounded scenarios.

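2. Diagnostic Assistance (1 paper)

2606.20174 2026-06-19 cs.LG New submission 85%

Computational Methods and Challenges in Cell-Free DNA Analysis for Multi-Cancer Early Detection

Nicko Starkey, Marcin W. Wojewodzic, Krzysztof Rzecki

Affiliations * AGH University of Krakow; Norwegian Institute of Public Health

Topic match Diagnostic assistance: review of computational methods for cfDNA-based multi-cancer early detection.

AI summary Reviews computational methods for cfDNA-based multi-cancer early detection published between 2022 and 2025, focusing on techniques for extracting fragmentomic and epigenetic features, and concludes that multimodal ensemble approaches show the most promise for clinical integration but that standardized evaluation protocols are needed.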

Details

Abstract

Cell-free DNA (cfDNA) is a promising avenue for non-invasive multi-cancer early detection (MCED) because it can enable the simultaneous detection of multiple cancers from a single blood draw, with particular sensitivity to cancers that currently lack established screening programs. Here we review the computational methods developed between 2022 and 2025 for cfDNA-based MCED. We focus on how fragmentomic and epigenetic features are extracted and analyzed to detect cancer at early stages. We first briefly outline the biological basis of cfDNA signals, then review classical statistical and machine learning approaches alongside deep learning frameworks, including autoencoder-based models. For each method we discuss biological interpretability, validation strategy, and readiness for clinical integration. Furthermore, we categorize the current challenges as technical, computational, or methodological and outline open problems in the field. This review shows that multimodal ensemble approaches have the strongest promise for clinical integration and the highest readiness. However, for better assessment of future work and side-by-side comparison, standardization of evaluation protocols and of results reporting will be crucial.

3. Clinical Large Models (3 papers)

2606.19950 2026-06-19 cs.CV cs.AI New submission 85%

Confidence Calibration for Multimodal LLMs: An Empirical Study through Medical VQA

Yuetian Du, Yucheng Wang, Ming Kong, Tian Liang, Qiang Long, Bingdi Chen, Qiang Zhu

Affiliations * College of Computer Science and Technology, Zhejiang University; School of Computer Science and Technology, Xidian University; Zhihui Medical Technology (Shanghai) Co., Ltd.

Topic match Clinical large models: studies confidence calibration of MLLMs in medical VQA.

AI summary Addresses the mismatch between confidence and accuracy of multimodal large language models on medical tasks with a method that combines Multi-Strategy Fusion-Based Interrogation with expert LLM assessment, reducing Expected Calibration Error by an average of 40% across three medical VQA datasets and improving model reliability.

Comments Accepted by MICCAI 2025

Details

Abstract

Multimodal Large Language Models (MLLMs) show great potential in medical tasks, but their elicited confidence often misaligns with actual accuracy, potentially leading to misdiagnosis or overlooking correct advice. This study presents the first comprehensive analysis of the relationship between accuracy and confidence in medical MLLMs. It proposes a novel method that combines Multi-Strategy Fusion-Based Interrogation (MS-FBI) with auxiliary expert LLM assessment, aiming to improve confidence calibration in Medical Visual Question Answering (VQA). Experiments demonstrate that our method reduces the Expected Calibration Error (ECE) by an average of 40% across three Medical VQA datasets, significantly enhancing MLLMs' reliability. The findings highlight the importance of domain-specific calibration for MLLMs in healthcare, offering a more trustworthy solution for AI-assisted diagnosis.
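For readers unfamiliar with the metric, the sketch below shows one standard way to compute Expected Calibration Error: predictions are grouped into equal-width confidence bins, and the gap between mean confidence and accuracy in each bin is averaged with weights proportional to bin size. The bin count and the toy data are assumptions for illustration; the abstract does not specify the paper's binning scheme.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width bins over [0, 1]; `correct` holds 0/1 outcomes."""
    total = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += len(members) / total * abs(avg_conf - accuracy)
    return ece


# Hypothetical answers: the model is overconfident in the 0.95 bin.
confidences = [0.95, 0.95, 0.95, 0.95, 0.55, 0.55]
correct = [1, 1, 0, 0, 1, 0]
ece = expected_calibration_error(confidences, correct)
print(round(ece, 4))
```

A perfectly calibrated model would have an ECE of 0; here the high-confidence bin is only 50% accurate, which dominates the score.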

2606.19852 2026-06-19 cs.CL cs.LG New submission 85%

Prompt, Plan, Extract: Zero-Shot Agentic LLMs Workflows for Lung Pathology Extraction from Clinical Narratives

Aman Pathak, Cheng Peng, Mengxian Lyu, Ziyi Chen, Reema Solan, Sankalp Talankar, Yasir Khan, Hiren Mehta, Aokun Chen, Yi Guo, Yonghui Wu

Affiliations * Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida; Division of Pulmonary, Critical Care and Sleep Medicine, Department of Medicine, College of Medicine, University of Florida; College of Nursing, Florida State University

Topic match Clinical large models: zero-shot LLM workflow for extracting lung pathology information.

AI summary Proposes a zero-shot agentic workflow that uses open-source large language models to extract 13 CAP synoptic fields from lung resection pathology reports, reaching 0.893 Micro-F1 without task-specific training and approaching the supervised baseline.

Comments 7 pages, 2 figures, 3 tables. Affiliations: (1) Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; (2) Division of Pulmonary, Critical Care and Sleep Medicine, Department of Medicine, College of Medicine, University of Florida, Gainesville, FL, USA; (3) College of Nursing, Florida State University, Tallahassee, FL, USA

Details

Abstract

Information extraction from pathology reports is essential for cancer staging and tumor registry population. Yet key data remains embedded in narrative reports, making manual extraction labor-intensive and error-prone. Traditional supervised Natural Language Processing pipelines address this through fully supervised Named Entity Recognition and Relation Extraction, but require expensive manual annotation and suffer cascading failures when upstream entities are missed. In this study, we developed a zero-shot, agentic workflow and evaluated five open-source generative Large Language Models (LLMs) to populate 13 College of American Pathologists synoptic fields from lung resection pathology reports. We compared them against a state-of-the-art supervised GatorTron NER-RE baseline using a novel, registry-aligned evaluation framework. The baseline achieved a Micro-F1 of 0.960, while the best zero-shot model (GPT-OSS-20B) achieved a Micro-F1 of 0.893 (recall: 0.949), accurately extracting complex relations like Pathologic Stage without task-specific training. These results suggest that open-source, zero-shot agentic LLMs are a low-cost solution for extracting lung pathology information.
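Micro-F1, the headline metric above, pools true positives, false positives, and false negatives across all fields before computing precision and recall, so frequent fields weigh more than rare ones. The sketch below illustrates the calculation; the field names and counts are hypothetical and are not taken from the paper's registry-aligned framework.

```python
def micro_prf(counts):
    """Micro-averaged precision, recall, and F1 from per-field tp/fp/fn counts."""
    tp = sum(c["tp"] for c in counts.values())
    fp = sum(c["fp"] for c in counts.values())
    fn = sum(c["fn"] for c in counts.values())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical per-field counts for two synoptic fields.
counts = {
    "histologic_type": {"tp": 45, "fp": 5, "fn": 5},
    "pathologic_stage": {"tp": 35, "fp": 15, "fn": 5},
}
precision, recall, f1 = micro_prf(counts)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

A model with high recall but lower precision, as reported for the best zero-shot model, would show the same pattern as this toy example: few missed fields but more spurious extractions.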

2606.18613 2026-06-19 cs.CL cs.AI New submission 85%

Are LLMs Ready to Assist Physicians? PhysAssistBench for Interactive Doctor-Patient-EHR Assistance

Tianming Du, Peijie Yu, Sihan Shang, Danli Shi, My Linh Nguyen, Shengbo Gao, Guangyuan Li, Yinghong Yu, Yan Jiang, Qianlong Zhao, Behzad Bozorgtabar, Shaoxiong Ji, Jiazhen Pan, Daniel Rueckert, Jiancheng Yang

Affiliations * Aalto University; Tencent; Harbin Institute of Technology, Shenzhen; Hong Kong Polytechnic University; Aarhus University; Technical University of Munich

Topic match Clinical large models: benchmark for interactive LLM assistance to physicians.

AI summary Introduces the PhysAssistBench benchmark, which builds interactive patient agents to evaluate how well LLMs coordinate doctor-patient-EHR interaction, and finds that current models are unreliable, with the bottleneck lying in coordination across capabilities rather than in any single capability.

Comments 34 pages with 8 figures

Details

Abstract

The most plausible near-term role of medical LLMs is to assist rather than replace physicians, yet current evaluations often test isolated capabilities: clinical knowledge, EHR system interaction, or patient communication. Physician assistance instead requires coordinating these capabilities within the same interaction, where physicians issue underspecified requests, patients describe symptoms ambiguously, and EHR systems demand precise tool use. We introduce PhysAssistBench, a benchmark for interactive doctor-patient-EHR assistance. Built from real MIMIC-IV cases, PhysAssistBench uses a scalable pipeline to construct agentic patients: interactive, record-grounded agents that turn static EHR records into multi-turn clinical scenarios while preserving clinical factuality. PhysAssistBench provides a curated bilingual evaluation set of 1,296 manually reviewed and physician-validated turns. Experiments with leading LLMs show that current models remain unreliable in this setting, which exposes a key bottleneck for clinical LLMs: reliable assistance requires coordination across knowledge, communication, and systems, not isolated gains in any of them.

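4. Health Monitoring (3 papers)

2606.20074 2026-06-19 eess.SP cs.AI cs.LG New submission 80%

Evaluation of EEG Foundation Models for Event-Based Burst-Suppression Detection in ICU

Elisa Vasta, Thorir Mar Ingolfsson, Andrea Cossettini, Luca Benini, Tilman Beck, Emanuela Keller, Una Pale

Affiliations * DEI, University of Bologna, Bologna, Italy

Topic match Health monitoring: EEG monitoring in the ICU that supports clinical decision-making; falls under medical AI.

AI summary Presents the first evaluation of EEG foundation models for burst detection in the ICU without patient-specific calibration; REVE-base reaches an event-based F1-score of 0.868 and reduces burst-per-minute error by 52.1% and 36.2% relative to EEGNet and adaptive thresholding, respectively.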

Comments 4 pages, 1 figure. Code available upon publication

Details

Abstract

Burst suppression (BS) is a clinically relevant electroencephalographic (EEG) pattern used to monitor sedation depth and brain activity in critically ill patients, particularly during induced coma in Intensive Care Units (ICUs). Automatic burst detection remains challenging because BS patterns vary substantially between patients and annotated datasets are scarce. Recently, EEG Foundation Models (FMs) have shown promise across several downstream EEG applications, but their usefulness for BS detection remains unexplored. We present the first study to evaluate EEG FMs for burst detection in reduced-montage ICU EEG without patient-specific calibration. We compare REVE-base, LUNA-large and LuMamba-Tiny with an adaptive thresholding baseline and a task-specific EEGNet baseline. Additionally, we complement conventional EEG window-based classification with event-based burst detection evaluation. This helps assess, in clinically meaningful terms, whether burst episodes are correctly detected, reducing the impact of expected annotation variability. The best model, REVE-base, achieved the highest event-based F1-score ($0.868 \pm 0.167$) and reduced burst-per-minute error by 52.1% and 36.2% compared to EEGNet and adaptive thresholding, respectively, supporting FMs for scalable EEG monitoring in the ICU. Ablation experiments showed that full fine-tuning was the most effective adaptation strategy compared with frozen-backbone training, two-step fine-tuning, and LoRA-based adaptation, improving event-based F1-score over frozen-backbone training by up to $+0.102$ for LUNA-large. With reduced labeled datasets, pretrained REVE-base outperformed random initialization by $+0.723$ event-based F1 points at 25% of the cohort, demonstrating the benefit of pretrained FM representations when adapted to burst detection with limited labeled data.
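Event-based evaluation scores whole burst episodes rather than fixed windows. The abstract does not give the matching rule, so the sketch below assumes the simplest one: a predicted burst counts as a hit if it overlaps an unmatched reference burst at all, with one-to-one greedy matching. The interval values (in seconds) are invented for illustration.

```python
def overlaps(a, b):
    """True if half-open intervals a=(start, end) and b=(start, end) intersect."""
    return a[0] < b[1] and b[0] < a[1]


def event_f1(reference, predicted):
    """Event-level F1 with greedy one-to-one matching on any overlap (assumed rule)."""
    matched = set()
    tp = 0
    for ref in reference:
        for j, pred in enumerate(predicted):
            if j not in matched and overlaps(ref, pred):
                matched.add(j)
                tp += 1
                break
    fn = len(reference) - tp   # reference bursts that were missed
    fp = len(predicted) - tp   # predicted bursts with no reference match
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical annotated and detected bursts.
reference = [(1.0, 3.0), (6.0, 8.0), (12.0, 13.5)]
predicted = [(1.2, 2.8), (6.5, 9.0), (10.0, 10.5)]
score = event_f1(reference, predicted)
print(round(score, 3))
```

Because small disagreements about burst onset and offset do not change the match, this style of scoring is less sensitive to annotation variability than per-window classification, which is the motivation the abstract gives.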

2606.19888 2026-06-19 cs.LG cs.AI New submission 80%

SL-S4Wave: Self-Supervised Learning of Physiological Waveforms with Structured State Space Models

Feng Wu, Harsh Deep, Eric Lehman, Sanyam Kapoor, Guoshuai Zhao, Rahul Krishnan, Gari Clifford, Li-wei H Lehman

Affiliations * Massachusetts Institute of Technology; OpenEvidence, USA; New York University; Xi’an Jiaotong University; University of Toronto; Emory University

Topic match Health monitoring: self-supervised learning of physiological waveforms for arrhythmia detection.

AI summary Proposes SL-S4Wave, a framework that combines contrastive learning with an encoder built on structured state space models and uses global convolution with multiscale subkernels to capture local and long-range dependencies in multichannel physiological waveforms, outperforming existing methods on tasks such as arrhythmia detection.

Details

Abstract

Modeling long-sequence medical time series data, such as electrocardiograms (ECG), poses significant challenges due to high sampling rates, multichannel signal complexity, inherent noise, and limited labeled data. While recent self-supervised learning (SSL) methods, based on various encoder architectures such as convolutional neural networks, have been proposed to learn representations from unlabeled data, they often fall short in capturing long-range dependencies and noise-invariant features. Structured state space models (S4) excel at long-sequence modeling, but existing S4 architectures fail to capture the unique characteristics of multichannel physiological waveforms. In this work, we propose SL-S4Wave, a self-supervised learning framework that combines contrastive learning with a tailored encoder built on structured state space models. The encoder incorporates multi-layer global convolution using multiscale subkernels, enabling the capture of both fine-grained local patterns and long-range temporal dependencies in noisy, high-resolution multichannel waveforms. Extensive experiments on real-world datasets demonstrate that SL-S4Wave (1) consistently outperforms state-of-the-art supervised and self-supervised baselines in a challenging arrhythmia detection task, (2) achieves high performance with significantly fewer labeled examples, showcasing strong label efficiency, (3) maintains robust performance on long waveform segments, highlighting its capacity to model complex temporal dynamics in long sequences that most existing approaches fail to model efficiently, and (4) transfers effectively to unseen arrhythmia types, underscoring its robust cross-domain generalization. We additionally evaluate SL-S4Wave on multiple EEG tasks, achieving superior performance over strong baselines and demonstrating the generalizability of our approach beyond cardiac waveforms.

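2606.19405 2026-06-19 q-bio.QM math.DS q-bio.PE New submission 70%

Multi-type branching inference on contact trees with application to COVID-19

Augustine Okolie, Johannes Müller, Eno Akarawakc, Isaac Ajiboye

Topic match Health monitoring: applied to inferring epidemiological parameters for COVID-19.

AI summary Proposes a likelihood framework that operates directly on transmission trees over contact trees, accounts for contact-degree heterogeneity through a multi-type branching process, infers epidemiological parameters from partially resolved transmission trees, and is demonstrated on COVID-19 contact-tracing data.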

Comments 26 pages, 8 Figures

Details

Abstract

Inferring epidemiological parameters from transmission trees is essential for understanding infectious disease dynamics. Existing tree-based likelihood methods, including the multi-type birth-death models originally applied in phylodynamic settings, provide powerful tools, but most assume homogeneous mixing and rarely capture how transmission potential changes as an individual infects more of their contacts. In this work, we develop a likelihood framework that operates directly on transmission trees, in which nodes are individuals and edges are reported transmission events, with no sequence data involved. We derive a likelihood for a stochastic SIR process on a rooted contact tree in which each infected individual is characterised by the total number of effective contacts and the number of already infected downstream contacts. We obtain closed-form ordinary differential equations for the probability that a clade goes entirely unobserved and for the probability density that it produces an observed (sampled) tip in a given state. The resulting likelihood can be evaluated for a rooted contact tree with known tip states, and we extend it to partially resolved trees by treating internal branching times as latent variables. Validation on simulated outbreaks confirms accurate parameter recovery and well-calibrated uncertainty. Application to empirical COVID-19 contact-tracing data from Karnataka, India, demonstrates the framework's utility for real epidemiological settings. By incorporating contact-degree heterogeneity in a multi-type branching likelihood, our work provides a principled baseline for inferring both transmission dynamics and contact structure from fully or partially resolved transmission trees, complementing rather than relying on sequence-based phylodynamic inference.

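5. Other Medical AI (1 paper)

2606.19827 2026-06-19 cs.LG cs.AI New submission 80%

When, Where, and How: Adaptive Binning for Tabular Self-Supervised Learning

Daehwan Kim, Haejun Chung, Ikbeom Jang

Affiliations * Hanyang University; Hankuk University of Foreign Studies

Topic match Other medical AI: adaptive binning for self-supervised learning on medical tabular data.

AI summary Proposes Adaptive Binning, which dynamically refines discretization through a feature-wise coarse-to-fine curriculum and combines categorical reconstruction with ordinal supervision, improving self-supervised learning performance on medical tabular data.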

Comments Accepted to MICCAI 2026

Details

Abstract

Medical tabular data are ubiquitous in clinical research, but deep learning for tables remains underexplored because reliable labels often require costly expert adjudication, even though structured clinical variables are routinely available in tabular form. Self-supervised learning can leverage these unlabeled tables, and recent binning-based pretexts offer a promising inductive bias, but existing objectives fix a single global quantile discretization and apply feature-agnostic supervision. We propose Adaptive Binning, a training-adaptive discretization pretext for tabular SSL that couples discretization to learning through a feature-wise coarse-to-fine curriculum. Motivated by the spectral bias of neural networks and the principles of curriculum learning, our method progressively refines discretization per feature upon plateau detection and selects representation-aware splits to jointly improve value-space concentration and representation-space coherence. A heterogeneity-aware objective unifies categorical reconstruction with ordinal supervision for numerical features, and experiments on public medical tabular datasets under unified evaluation protocols show consistent gains for linear probing and fine-tuning without dataset-specific discretization tuning. We further introduce a medical tabular SSL benchmark with standardized protocols to support reproducible progress in this underexplored domain. Our code is available at https://github.com/labhai/Adaptive-Binning.
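As background for the binning pretext, the sketch below shows the fixed global quantile discretization that the abstract describes as the existing baseline: bin edges are computed once from a feature's empirical quantiles and every value is mapped to a bin index. It does not implement the paper's adaptive refinement; the helper names, the nearest-rank quantile rule, and the example feature are assumptions for illustration.

```python
from bisect import bisect_right


def quantile_edges(values, n_bins):
    """Interior bin edges at the k/n_bins quantiles (nearest rank, no interpolation)."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // n_bins] for k in range(1, n_bins)]


def assign_bins(values, edges):
    """Map each value to the index of the bin it falls into."""
    return [bisect_right(edges, v) for v in values]


# Hypothetical numerical feature, e.g. patient age in years.
ages = [23, 35, 41, 47, 52, 58, 63, 71]
coarse_edges = quantile_edges(ages, 2)   # a coarse 2-bin discretization
edges = quantile_edges(ages, 4)          # a finer 4-bin discretization
bins = assign_bins(ages, edges)
print(coarse_edges, edges, bins)
```

The bin indices can serve as prediction targets for a self-supervised pretext task. Moving from the 2-bin to the 4-bin edges gives a rough sense of what a coarse-to-fine schedule changes, although the paper's method selects representation-aware splits per feature rather than plain quantiles.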