Medical AI

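2606.20112 2026-06-19 cs.CV eess.IV New submission 95%

Pixel-Level Residual Diffusion Transformer: Scalable 3D CT Volume Generation

Zhenkai Zhang, Markus Hiller, Krista A. Ehinger, Tom Drummond

Affiliations: School of Computing and Information Systems, The University of Melbourne

Topic match (medical imaging): proposes a 3D CT volume generation method for medical imaging.

AI summary: Proposes the Pixel-Level Residual Diffusion Transformer (PRDiT), which achieves high-fidelity 3D CT volume generation through two-stage training (a local MLP blind estimator that separates low-frequency structure, plus a global residual diffusion Transformer that models high-frequency residuals) and outperforms existing methods on the LIDC-IDRI and RAD-ChestCT datasets.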

Comments Accepted at ICLR 2026. Code available at https://github.com/Fredy-Zhang/PRDiT

Abstract

Generating high-resolution 3D CT volumes with fine details remains challenging due to substantial computational demands and optimization difficulties inherent to existing generative models. In this paper, we propose the Pixel-Level Residual Diffusion Transformer (PRDiT), a scalable generative framework that synthesizes high-quality 3D medical volumes directly at voxel-level. PRDiT introduces a two-stage training architecture comprising 1) a local denoiser in the form of an MLP-based blind estimator operating on overlapping 3D patches to separate low-frequency structures efficiently, and 2) a global residual diffusion transformer employing memory-efficient attention to model and refine high-frequency residuals across entire volumes. This coarse-to-fine modeling strategy simplifies optimization, enhances training stability, and effectively preserves subtle structures without the limitations of an autoencoder bottleneck. Extensive experiments conducted on the LIDC-IDRI and RAD-ChestCT datasets demonstrate that PRDiT consistently outperforms state-of-the-art models, such as HA-GAN, 3D LDM and WDM-3D, achieving significantly lower 3D FID, MMD and Wasserstein distance scores.
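
The coarse-to-fine split described in the abstract can be illustrated with a toy one-dimensional sketch. This is not the authors' implementation: a plain patch mean stands in for the learned MLP blind estimator, the signal is 1D rather than a 3D volume, and the function names, patch size, and stride are all assumptions made for illustration. It only shows the decomposition into a smooth low-frequency estimate, built from overlapping patches, and the residual that a second-stage model would have to capture.

```python
def low_frequency_estimate(signal, patch=4, stride=2):
    """Average the means of all overlapping patches that cover each sample."""
    n = len(signal)
    total = [0.0] * n
    count = [0] * n
    for start in range(0, n - patch + 1, stride):
        mean = sum(signal[start:start + patch]) / patch
        for i in range(start, start + patch):
            total[i] += mean
            count[i] += 1
    # Samples not covered by any patch fall back to their own value.
    return [t / c if c else s for t, c, s in zip(total, count, signal)]


def split_low_and_residual(signal, patch=4, stride=2):
    """Return (low, residual) such that low[i] + residual[i] == signal[i]."""
    low = low_frequency_estimate(signal, patch, stride)
    residual = [s - l for s, l in zip(signal, low)]
    return low, residual
```

In this toy setting the low-frequency part is much flatter than the input, and the sharp detail ends up in the residual, which is the part the abstract assigns to the global residual diffusion transformer.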

2606.20108 2026-06-19 cs.CV cs.LG New submission 95%

EFIQA: Explainable Fundus Image Quality Assessment via Anatomical Priors

Pengwei Wang, José Morano, Qian Wan, Hrvoje Bogunović

Affiliations: Institute of Artificial Intelligence, Center for Medical Data Science, Medical University of Vienna, Austria; Christian Doppler Lab for Artificial Intelligence in Retina, Medical University of Vienna, Austria

Topic match (medical imaging): fundus image quality assessment, a medical imaging application.

AI summary: Proposes EFIQA, a framework that needs no quality labels. It uses anatomical priors, learning normal structure through masked anatomical inpainting, to generate spatial quality maps, and it outperforms supervised methods on multiple benchmarks while remaining explainable.

Comments Accepted in MIDL 2026. Code: https://github.com/penway/EFIQA

Journal ref Proceedings of Machine Learning Research 315:2248-2264, 2026

Abstract

Image quality control is vital for a wide range of downstream applications. Deep learning-based image quality assessment methods typically train classifiers on dataset-specific quality labels, inheriting two limitations: (1) generalization is tied to the labeling criteria of the training set and (2) these methods cannot provide spatial feedback on where the quality is degraded, lacking explainability. In this work, we propose EFIQA, a framework that requires no quality-related supervision and produces spatial quality maps by design. Rather than learning "what is degradation" from human-annotated labels, EFIQA learns "what should be there" by leveraging anatomical priors. For fundus photography, we instantiate this as a two-stage approach, by first training an unsupervised anomaly detector via masked anatomical inpainting to identify regions of missing vasculature, and then distilling this prior knowledge into a shallow adapter mapping features of a frozen foundation model to precise quality maps. External-dataset evaluation demonstrates that this label-free approach with minimal adaptation achieves better performance and explainability compared with supervised methods across benchmarks with different quality criteria, highlighting its potential for real-world applications.

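2606.19838 2026-06-19 cs.CV New submission 95%

OTCHA: Optimal Transport-driven Confidence-aware Latent Hub Alignment for Multi-View Medical Image Classification

Jiwoong Yang, Haejun Chung, Ikbeom Jang

Affiliations: Hanyang University; Hankuk University of Foreign Studies

Topic match (medical imaging): multi-view medical image classification, applied to mammography and chest radiography.

AI summary: Proposes the OTCHA module, which uses optimal transport to align multi-view patch tokens with shared latent hub tokens and combines confidence gating with partial matching to discard irrelevant features, improving the robustness of multi-view medical image classification.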

Comments Accepted at MICCAI 2026

Abstract

Multi-view imaging, such as mammography and chest radiography, is a standard component of clinical practice. However, medical images are often unregistered and contain view-specific artifacts or irrelevant background cues that can obscure diagnostically relevant findings. Many existing methods directly fuse per-view representations, allowing such irrelevant content to contaminate the fused embedding and reducing robustness under varying view configurations. We propose OTCHA, a confidence-aware latent hub token alignment module based on optimal transport (OT) that refines patch tokens before fusion for multi-view classification. OTCHA introduces a set of learnable latent hub tokens shared across views. For each view, we compute an OT plan between patch tokens and hub tokens that jointly considers feature similarity and geometry, and augment the OT formulation with token-conditional dustbins to enable partial matching and discard irrelevant tokens. The resulting transport plan provides token-wise matching confidence, which gates hub-mediated message passing and weights a novel optimal-transport-based representation alignment loss to stabilize refinement. Experiments on three multi-view medical image datasets demonstrate consistent improvements over competing baselines across diverse anatomies and view configurations. Our code is available at https://github.com/labhai/OTCHA.
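
To make the idea of partial matching with a dustbin concrete, here is a minimal sketch of entropic optimal transport (Sinkhorn iterations) between patch tokens and hub tokens, with one extra dustbin column that absorbs tokens matching no hub. It is an illustration under stated assumptions, not the OTCHA formulation: the abstract describes token-conditional dustbins and a cost that also accounts for geometry, whereas this sketch uses a single fixed `dustbin_cost`, a fixed `dustbin_mass`, uniform token masses, and a caller-supplied cost matrix. All names and defaults are hypothetical.

```python
import math


def sinkhorn_with_dustbin(cost, dustbin_cost=1.0, dustbin_mass=0.25,
                          eps=0.1, iters=200):
    """Entropic OT from n patch tokens to m hubs plus one dustbin column.

    cost[i][j] is the matching cost between patch token i and hub j.
    Returns the transport plan and a per-token matching confidence,
    defined here as the share of a token's mass sent to real hubs.
    """
    n, m = len(cost), len(cost[0])
    augmented = [row + [dustbin_cost] for row in cost]
    row_mass = [1.0 / n] * n
    col_mass = [(1.0 - dustbin_mass) / m] * m + [dustbin_mass]
    kernel = [[math.exp(-c / eps) for c in row] for row in augmented]

    u = [1.0] * n
    v = [1.0] * (m + 1)
    for _ in range(iters):
        # Alternate scaling so rows, then columns, match their target masses.
        u = [row_mass[i] / sum(kernel[i][j] * v[j] for j in range(m + 1))
             for i in range(n)]
        v = [col_mass[j] / sum(kernel[i][j] * u[i] for i in range(n))
             for j in range(m + 1)]

    plan = [[u[i] * kernel[i][j] * v[j] for j in range(m + 1)]
            for i in range(n)]
    confidence = [1.0 - plan[i][m] / sum(plan[i]) for i in range(n)]
    return plan, confidence
```

A token whose cost to every hub is high sends most of its mass to the dustbin and receives a low confidence, which is the kind of signal the abstract uses to gate message passing and to weight the alignment loss.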

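2606.19824 2026-06-19 cs.CV cs.AI New submission 95%

CSWinUNETR: Segmentation of Thin Anatomical Structures in Medical Images

Junho Moon, Haejun Chung, Ikbeom Jang

Affiliations: Hanyang University; Hankuk University of Foreign Studies

Topic match (medical imaging): segmentation of thin anatomical structures such as retinal vessels and cerebral vasculature.

AI summary: Proposes CSWinUNETR, a general-purpose backbone that combines cross-shaped stripe self-attention, cyclic shifts, detail-enhanced multi-scale self-attention, and sparse-control dynamic snake convolution to address low contrast, discontinuities, and class imbalance in thin-structure segmentation. It outperforms existing methods on ophthalmology, neurovascular, and dermatology benchmarks.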

Comments Accepted at MICCAI 2026

Abstract

Accurate segmentation of thin, tortuous anatomical structures, such as retinal vessels, cerebral vasculature, and facial wrinkles, remains challenging due to low contrast, frequent discontinuities, and severe class imbalance. Although recent convolutional and Transformer-based models have improved performance, they often yield fragmented predictions and fail to recover fine branches. We propose CSWinUNETR, a general-purpose backbone for 2D and 3D thin-structure segmentation. It employs cross-shaped stripe self-attention to model long-range principal-axis context and incorporates cyclic shifts to enhance information exchange across stripes. To better preserve fine-grained details, we further introduce a detail-enhanced multi-scale self-attention module that aggregates contextual features from multi-resolution representations. In addition, we propose sparse-control dynamic snake convolution, which reconstructs reliable dense curvilinear kernels from sparsely predicted control points to better follow tortuous geometry. Extensive experiments on four benchmarks across ophthalmology, neurovascular imaging, and dermatology demonstrate that CSWinUNETR consistently outperforms state-of-the-art methods without task-specific post-processing or topology-aware losses. The code is available at https://github.com/labhai/CSWinUNETR.

2606.19460 2026-06-19 cs.CV cs.AI cs.LG New submission 95%

Scaling Generative Foundation Models for Chest Radiography with Rectified Flow Transformers

Fabio De Sousa Ribeiro, Emma A. M. Stanley, Charles Jones, Tian Xia, Dominic C. Marshall, Laurent Renard Triché, Christopher V. Cosgriff, Panagiotis Dimitrakopoulos, Sotirios A. Tsaftaris, Ben Glocker

Affiliations: Imperial College London; Causality in Healthcare AI Hub; University of Edinburgh; Cleveland Clinic London; Department of Perioperative Medicine, CHU Clermont-Ferrand; Department of Medicine, Massachusetts General Hospital; Broad Institute of MIT and Harvard

Topic match (medical imaging): a billion-parameter generative foundation model for chest radiographs.

AI summary: Proposes the first billion-parameter generative foundation model for chest radiographs, using rectified flow Transformers for high-fidelity, controllable synthesis and making synthetic images markedly harder to distinguish from real ones.

Comments Project page: https://RadiT-project.github.io

Abstract

We introduce the first generative foundation model for chest radiograph synthesis trained from scratch at the billion-parameter scale. Existing radiographic AI models often suffer from poor generalisation across patient subpopulations, institutions, and acquisition settings, resulting in limited real-world clinical utility. Controlled, high-fidelity synthesis of chest radiographs is a promising path toward diversifying clinical datasets and evaluating the robustness of diagnostic models. Therefore, we present the largest specialist generative foundation model for chest radiographs to date, with over 1.3B parameters, trained for 1.6T tokens on a curated, heterogeneous dataset comprising 1.2M radiographs and clinical expert-guided metadata. Our model supports controllable radiograph generation and editing across multiple demographic subgroups, acquisition views, and a dozen pathologies. Moreover, we significantly advance the state of the art in radiograph synthesis fidelity, producing images that are indistinguishable from real radiographs to clinical experts.

2606.14957 2026-06-19 cs.CV New submission 95%

Learning Sparse Latent Predictive Foundation Model for Multimodal Neuroimaging

Haoxu Huang, Long Chen, Jingyun Chen, Jinu Hyun, James Ryan Loftus, Kara Melmed, Daniel Orringer, Jennifer Frontera, Seena Dehkharghani, Arjun Masurkar, Narges Razavian

Affiliations: New York University, Center for Data Science; NYU Grossman School of Medicine, Department of Radiology; State University of New York at Binghamton, School of Computing; NYU Grossman School of Medicine, Department of Neurology; NYU Grossman School of Medicine, Department of Neurosurgery; NYU Grossman School of Medicine, Department of Pathology; School of Medicine, Department of Radiology, Stanford; NYU Grossman School of Medicine, Department of Neuroscience; NYU Grossman School of Medicine, Neuroscience Institute

Topic match (medical imaging): multimodal neuroimaging foundation model.

AI summary: Proposes Neuro-JEPA, which combines a latent predictive objective with a Mixture-of-Experts architecture to learn a unified representation of three MRI sequences (T1w, T2w, and FLAIR). It outperforms existing foundation models and a CNN baseline on 25 clinical tasks and 22 public-dataset tasks.

Comments Under Review Preprint

Abstract

Brain MRIs are routinely acquired as multiple complementary sequences with unique contrast weighting, including T1-weighted imaging (T1w) anatomic and fluid-sensitive T2-weighted (T2w) contrasts. However, methods for learning unified representations across the multitude of MRI contrast mechanisms at health-system scale are lacking. In this study, we introduce Neuro-JEPA, a sparse multimodal neuroimaging foundation model that combines a latent predictive objective with a Mixture-of-Experts architecture to encode brain MRI across core T1w, T2w, and fluid-suppressed FLAIR imaging (FLAIR). We further provide a systematic methodological study of architectural, masking, objective, and sparsity design choices beneficial for robust neuroimaging multimodal representation learning. Neuro-JEPA was pretrained on 1,551,862 scans from 428,647 studies after modality-specific preprocessing with data curation across three core structural brain MRI sequences. We evaluated the learned representations across clinical and research settings, including 25 tasks from three health systems: NYU Langone, NYU Long Island, and Massachusetts General Hospital, and 22 tasks from 12 public datasets, covering unimodal, multimodal and cross-domain evaluation configurations. Across these benchmarks, existing neuroimaging foundation models showed inconsistent gains over a simple convolutional neural network (CNN) baseline, whereas Neuro-JEPA achieved stronger and more consistent performance across all evaluated settings. These results establish a scalable methodological framework for multimodal neuroimaging representation learning and highlight the need for foundation model evaluation protocols that include simple baselines, clinically heterogeneous cohorts and controlled multimodal comparisons.

2601.15119 2026-06-19 eess.IV cs.CV 95%

Vision Models for Medical Imaging: A Hybrid Approach for PCOS Detection from Ultrasound Scans

Md Mahmudul Hoque, Md Mehedi Hassain, Muntakimur Rahaman, Md. Towhidul Islam, Shaista Rani, Md Sharif Mollah

Affiliations: Department of CSE, CCN University of Science & Technology; Department of EEE, International Islamic University Chittagong; Faculty of Engineering, Multimedia University; Department of CSE, Stamford University of Bangladesh; Department of Biology, Lucknow University; Department of CSE, Bangladesh Army International University of Science & Technology

Topic match (medical imaging): hybrid vision models for PCOS detection in ultrasound images, a form of medical image analysis.

AI summary: Proposes two hybrid models that combine convolutional and Transformer approaches for accurate detection of polycystic ovary syndrome in ultrasound images; the final model reaches 98.23% accuracy.

DOI: 10.1088/1742-6596/3191/1/012120

Abstract

Polycystic Ovary Syndrome (PCOS) is the most familiar endocrine illness in women of reproductive age. Many Bangladeshi women suffer from PCOS disease in their older age. The aim of our research is to identify effective vision-based medical image analysis techniques and evaluate hybrid models for the accurate detection of PCOS. We introduced two novel hybrid models combining convolutional and transformer-based approaches. The training and testing data were organized into two categories: "infected" (PCOS-positive) and "noninfected" (healthy ovaries). In the initial stage, our first hybrid model, 'DenConST' (integrating DenseNet121, Swin Transformer, and ConvNeXt), achieved 85.69% accuracy. The final optimized model, 'DenConREST' (incorporating Swin Transformer, ConvNeXt, DenseNet121, ResNet18, and EfficientNetV2), demonstrated superior performance with 98.23% accuracy. Among all evaluated models, DenConREST showed the best performance. This research highlights an efficient solution for PCOS detection from ultrasound images, significantly improving diagnostic accuracy while reducing detection errors.

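2606.19804 2026-06-19 cs.CV New submission 92%

HypOProto: Hyperbolic Ordinal Prototypes for Left Ventricular Filling Pressure Classification

Victoria Wu, Nima Hashemi, Hooman Vaseli, Christina Luong, Purang Abolmaesumi, Teresa S. M. Tsang

Affiliations: The University of British Columbia; Vancouver General Hospital

Topic match (medical imaging): left ventricular filling pressure classification from echocardiography, a form of medical image analysis.

AI summary: Proposes the HypOProto framework, which classifies left ventricular filling pressure using ordinal prototypes in hyperbolic space on top of a frozen, explainable foundation model, combining high accuracy with clinical interpretability.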

Abstract

Echocardiography (echo) is a widely used imaging modality for assessing cardiac function, with Left Ventricular Filling Pressure (LVFP) serving as a critical physiological marker for conditions such as heart failure. Standard LVFP classification into normal vs. elevated categories relies on the Doppler-derived $E/e'$ ratio, which is operator-dependent and often unavailable in resource-limited settings, motivating methods that infer LVFP directly from B-mode echo. Existing deep learning approaches achieve high performance but remain largely black-box, limiting clinical interpretability. We propose HypOProto, a hyperbolic, ordinal prototype-based framework for interpretable LVFP classification using a frozen, explainable foundation model backbone. HypOProto arranges prototypes along the physiological $E/e'$ scale, placing borderline cases near the hyperboloid root where small angular differences separate similar cases, while normal and elevated cases occupy outward positions reflecting increasing diagnostic certainty. This hyperbolic geometry encodes clinically meaningful ordinal relationships and improves interpretability. We also introduce a novel Hyperbolic Prototype Angular Separation (HyperPAS) loss, enforcing inter-class prototype separation in hyperbolic space. HypOProto achieves SOTA performance while maintaining transparency, and highlights clinically relevant regions in visualizations. This work represents the first prototype-based framework for LVFP classification in echo. Our code can be found at https://github.com/DeepRCL/HypOProto.
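
The geometry behind the root and the outward positions can be checked numerically on the hyperboloid (Lorentz) model. The sketch below uses only the standard definitions (Lorentzian inner product, exponential map at the root, and geodesic distance). It does not reproduce HypOProto or the HyperPAS loss, and the helper names and the two-dimensional setup are assumptions for illustration.

```python
import math


def lorentz_inner(x, y):
    """Lorentzian inner product: -x0*y0 + sum of the spatial products."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def exp_map_origin(v):
    """Map a tangent vector at the root (1, 0, ..., 0) onto the hyperboloid."""
    r = math.sqrt(sum(c * c for c in v))
    if r == 0.0:
        return [1.0] + [0.0] * len(v)
    return [math.cosh(r)] + [math.sinh(r) * c / r for c in v]


def hyperbolic_distance(x, y):
    """Geodesic distance between two points on the hyperboloid."""
    return math.acosh(max(1.0, -lorentz_inner(x, y)))


def point_at(radius, angle):
    """Point at a given hyperbolic distance from the root and polar angle."""
    return exp_map_origin([radius * math.cos(angle), radius * math.sin(angle)])
```

For a fixed angular gap, two points close to the root are separated by a much smaller geodesic distance than two points far from it. For example, `hyperbolic_distance(point_at(0.2, 0.0), point_at(0.2, 0.5))` is smaller than the same call with radius `2.0`, so distances between prototypes grow quickly as they move outward.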

2606.20477 2026-06-19 cs.CV cs.CL cs.LG New submission 90%

Scalable Training of Spatially Grounded 2D Vision-Language Models for Radiology

Yusuf Salcan, Simon Ging, Robin Schirrmeister, Philipp Arnold, Elmar Kotter, Behzad Bozorgtabar, Thomas Brox

Affiliations: Computer Vision Group, University of Freiburg, Germany; Department of Radiology, Medical Center - University of Freiburg, Germany; CRIION-AI Lab, Freiburg, Germany

Topic match (medical imaging): radiology vision-language models with spatial grounding.

AI summary: Introduces RefRad2D, a large-scale bilingual dataset whose spatial grounding data are generated with an LLM and automated segmentation, and trains the RadGrounder model to jointly perform report generation, VQA, and spatial grounding, with competitive results on external benchmarks.

Comments Accepted for MICCAI 2026. First two authors: equal contribution. Last two authors: equal supervision

Abstract

We study how to train visually grounded vision-language models (VLMs) for radiology without manual spatial annotations. We introduce RefRad2D, a large-scale bilingual (German/English) dataset of 1.2M CT and MR image-text pairs derived from clinical practice, with task-specific VQA and spatial grounding subsets generated automatically via LLM-based curation and automated segmentation. Trained on this data, our model RadGrounder jointly performs report generation, visual question answering, and spatial grounding via bounding-box detection or segmentation. On external VQA benchmarks (Slake, VQA-RAD), RadGrounder achieves competitive results with specialized medical VLMs. Adding our clinical data to the training mixture improves open-ended VQA over fine-tuning on the downstream datasets alone, showing the transferability of our dataset. Crucially, adding grounding supervision does not degrade language quality, enabling spatially verifiable outputs at no cost to VQA performance.

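2606.20390 2026-06-19 cs.CV New submission 90%

Geometry-Aware Superpixel Graph Transformer with Metadata for Skin Lesion Classification

Muhammad Azeem, Tanveer Hussain, Amr Ahmed, Ardhendu Behera

Affiliations: Edge Hill University

Topic match (medical imaging): a graph-based skin lesion classification method using dermoscopic images.

AI summary: Proposes a region-based graph learning framework that models lesions as superpixel graphs, using geometric edge attributes and a metadata context node, with multimodal fusion through an edge-aware graph transformer. It outperforms existing methods on four public datasets.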

Comments Accepted at MICCAI 2026

Abstract

Automated skin cancer classification from dermoscopic images remains challenging due to heterogeneous lesion structure, strong intra-class variability, and subtle visual differences between benign and malignant cases. Existing CNN/ViT pipelines typically rely on global or patch-level features and often combine patient metadata via late fusion, which limits spatially grounded multimodal reasoning. We present a novel region-based graph learning framework that explicitly models lesions as graphs of spatially coherent superpixel regions represented as frozen CNN features. To capture fine-grained lesion arrangements, we encode inter-regional geometry as edge attributes and introduce a dedicated metadata context node connected to all regions, providing structured integration of demographic/clinical variables within the same relational space. Node representations are updated using our edge-aware graph transformer followed by attention-driven propagation, and a final graph-level embedding for benign-malignant classification. Experiments on four public benchmarks demonstrate that explicit region-level relational modeling and graph-native multimodal fusion yield consistent gains over the state-of-the-art. Consequently, we establish a new graph-centric perspective in which CNN features are modeled as relational nodes and improved through contextual integration, yielding more expressive and robust classifications.
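
The graph structure described in the abstract (superpixel regions as nodes, inter-regional geometry as edge attributes, and one metadata context node connected to every region) can be sketched as a small data structure. The details below are assumptions for illustration rather than the paper's construction: regions are reduced to centroids, two regions are linked when their centroids lie within a hypothetical `radius`, and the edge attributes are just distance and direction.

```python
import math


def build_lesion_graph(centroids, radius):
    """Build a region graph plus one metadata context node.

    centroids: list of (x, y) superpixel centroids.
    Returns (spatial_edges, context_edges, context_node), where
    spatial_edges maps (i, j) to (distance, angle) edge attributes.
    """
    n = len(centroids)
    spatial_edges = {}
    for i in range(n):
        for j in range(i + 1, n):
            dx = centroids[j][0] - centroids[i][0]
            dy = centroids[j][1] - centroids[i][1]
            distance = math.hypot(dx, dy)
            if distance <= radius:
                # Geometry is stored on the edge, not on the nodes.
                spatial_edges[(i, j)] = (distance, math.atan2(dy, dx))
    # The metadata node gets the next free index and links to all regions.
    context_node = n
    context_edges = [(i, context_node) for i in range(n)]
    return spatial_edges, context_edges, context_node
```

Because the context node touches every region, patient metadata can influence each region's representation during message passing, instead of being concatenated once at the end as in late fusion.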

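2606.20172 2026-06-19 cs.LG New submission 90%

Predicting gestational age at birth in the context of preterm birth from multi-modal fetal MRI

Diego Fajardo-Rojas, Megan Hall, Daniel Cromb, Mary A. Rutherford, Lisa Story, Emma C. Robinson, Jana Hutter

Affiliations: Leibniz University Hannover

Topic match (medical imaging): predicting gestational age at birth from multi-modal fetal MRI in the context of preterm birth.

AI summary: Proposes a machine learning pipeline on multi-modal fetal MRI, covering data imputation, feature selection, and regression models, to predict gestational age at birth. Evaluated on 333 control and 93 preterm cases, it reaches R² = 0.13, MAE = 2.74 weeks, and accuracy 0.77.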

Comments Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) https://melba-journal.org/2026:013

Journal ref Machine Learning for Biomedical Imaging 2026 (2026)

DOI: 10.59275/j.melba.2026-f34b

Abstract

Preterm birth is associated with significant mortality and a risk for lifelong morbidity. The complex multifactorial aetiology hampers accurate prediction and thus optimal care. A pipeline consisting of bespoke machine learning methods for data imputation, feature selection, and regression models to predict gestational age (GA) at birth was developed and evaluated from comprehensive multi-modal morphological and functional fetal MRI data from 333 control cases and 93 preterm birth cases. The GA at birth predictions were classified into term and preterm categories and their accuracy, sensitivity, and specificity were reported. An ablation study was performed to further validate the design of the pipeline. Performance was evaluated using stratified 10-fold cross-validation. The pipeline achieves an R² score of 0.13 and a mean absolute error of 2.74 weeks. It also achieves a 0.77 accuracy, 0.59 sensitivity, and 0.82 specificity across folds. The predominant features selected by the pipeline include cervical length and statistics derived from placental T2* values. The confluence of fast, motion-robust and multi-modal fetal MRI techniques and machine learning prediction allowed the prediction of the gestation at birth. This information is essential for any pregnancy. To the best of our knowledge, preterm birth had only been addressed as a classification problem in the literature. Therefore, this work provides a proof of concept. Future work will increase the cohort size to allow for finer stratification within the preterm birth cohort. Our code is available at https://github.com/dfajardorojas/ml-for-preterm-birth-.
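
The abstract reports both regression metrics (R², mean absolute error) and classification metrics obtained by binning the predicted gestational age into term and preterm. The sketch below shows how such numbers can be derived from the same predictions. It is not the authors' evaluation code: it assumes the conventional 37-week cut-off for preterm birth and treats preterm as the positive class, and the function names are hypothetical.

```python
def regression_metrics(y_true, y_pred):
    """Return (r2, mae) for predicted gestational ages in weeks."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return r2, mae


def preterm_metrics(y_true, y_pred, threshold=37.0):
    """Threshold ages into preterm (positive) vs. term and score the result.

    Assumes both classes occur in y_true; otherwise a ratio is undefined.
    """
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        actual, predicted = t < threshold, p < threshold
        if actual and predicted:
            tp += 1
        elif actual:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity
```

A low R² can coexist with reasonable classification accuracy, as in the reported results, because the classification depends only on which side of the threshold a prediction falls, not on how far it is from the true value.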

2606.20161 2026-06-19 cs.CV New submission 90%

ARTEMIS: Agent-guided Reliability-aware Temporal Mask Evolution for Imperfectly Supervised Video Polyp Segmentation

Tong Wang, Siwen Wang, Yaolei Qi, Jinxing Zhou, Yuting He, Guanyu Yang, Yutong Xie

Affiliations: Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications, Southeast University, Ministry of Education; Mohamed bin Zayed University of Artificial Intelligence (MBZUAI); School of Medicine, Case Western Reserve University

Topic match (medical imaging): video polyp segmentation, a clinical medical imaging application.

AI summary: Proposes the ARTEMIS framework, which uses a vision-language agent to select reliable temporal anchors and combines SAM2 propagation with reliability-aware robust learning to learn high-quality video polyp segmentation masks from imperfect supervision (points, scribbles, and a few dense labels), reaching state-of-the-art performance on multiple benchmarks.

Abstract

Imperfectly supervised video polyp segmentation (VPS) aims to learn dense, temporally consistent masks from inexpensive supervision, including weak annotations (points, scribbles) and semi-supervision with few densely labeled frames. This setting is clinically valuable but challenging due to weak contrast, ambiguous boundaries, motion blur, and specular highlights, compounded by sparse pixel-level guidance. While SAM2 can generate dense masks from sparse inputs, direct pseudo-labeling often yields geometry-degraded masks with boundary leakage, underutilizes temporal consistency, and ignores reliability. To address these issues, we propose ARTEMIS, a unified framework for imperfectly supervised VPS driven by agent-guided reliability-aware temporal mask evolution. ARTEMIS initializes coarse masks from available supervision: SAM2 converts points/scribbles, while dense labels serve as reliable anchors. A debate-and-judge vision-language agent selects reliable temporal anchors under weak supervision, which are propagated bidirectionally with SAM2 to refine unreliable or unlabeled frames. Finally, ARTEMIS trains the segmenter using temporal reliability-aware robust learning, incorporating reliability-guided reference selection, a Reference Prototype Transport Module, and reliability-aware robust loss. These components assess mask reliability, evolve anchors over time, transport target identity across frames, and down-weight noisy supervision instead of discarding difficult samples. Experiments on SUN-SEG and CVC-ClinicDB-612 under scribble, point, and limited-label settings demonstrate that ARTEMIS achieves state-of-the-art performance. Code will be released at https://github.com/wangtong627/ARTEMIS.
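
The idea of down-weighting noisy supervision instead of discarding difficult samples can be shown with a reliability-weighted loss. This is a generic sketch, not the reliability-aware robust loss from the paper: it assumes a per-pixel reliability score in [0, 1] is already available for each pseudo-label and applies it as a weight on binary cross-entropy. The function name and signature are hypothetical.

```python
import math


def reliability_weighted_bce(probs, labels, reliability, eps=1e-7):
    """Binary cross-entropy where each pixel's loss is scaled by its reliability.

    probs: predicted foreground probabilities.
    labels: pseudo-labels (0 or 1), possibly noisy.
    reliability: per-pixel weights; low values mute untrusted labels.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for p, y, w in zip(probs, labels, reliability):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        loss = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        weighted_sum += w * loss
        weight_total += w
    # Normalise by total weight so the scale is comparable across frames.
    return weighted_sum / weight_total if weight_total > 0 else 0.0
```

A pixel whose pseudo-label disagrees strongly with the prediction but has low reliability still contributes a gradient, only a smaller one, which is the difference between down-weighting and dropping a sample.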

2606.20143 2026-06-19 cs.CV New submission 90%

HEad and neCK TumOR (HECKTOR) 2025: Benchmark of Segmentation, Diagnosis, and Prognosis in Multimodal PET/CT

Numan Saeed, Salma Hassan, Shahad Hardan, Lishan Cai, Xinglong Liang, Moona Mazher, Abdul Qayyum, Yansong Bu, Mengye Lyu, Yue Lin, Mingyuan Meng, Chuanyi Huang, Lisheng Wang, Dalal Chamseddine, Shamimeh Ahrari, Beining Wu, Yifei Chen, Fuyou Mao, Hao Zhang, Baixiang Zhao, Surajit Ray, Muzi Guo, Lei Xiang, Jakob Dexl, Michael Ingrisch, Adrien Depeursinge, Arman Rahmim, Mathieu Hatt, Vincent Andrearczyk, Mohammad Yaqub

Affiliations: Mohamed bin Zayed University of Artificial Intelligence (MBZUAI); Amsterdam UMC; The Netherlands Cancer Institute; Radboud University Medical Centre; University College London; Imperial College London; Shenzhen Technology University; Shenzhen University; Newland Digital Technology; The University of Sydney; Shanghai Jiao Tong University; University Hospital, Nantes; Nantes Université, Centrale Nantes, CNRS, LS2N; Hangzhou Dianzi University; Tsinghua University; Central South University; University of Glasgow; China Mobile System Integration Co., Ltd.; Subtle Medical Inc.; University Hospital, LMU Munich; Munich Center for Machine Learning; BC Cancer Research Institute; HES-SO Valais-Wallis University of Applied Sciences and Arts; Lausanne University Hospital (CHUV); LaTIM, INSERM, UMR 1101, Univ Brest

Topic match (medical imaging): benchmark for segmentation, diagnosis, and prognosis of head and neck tumors in PET/CT.

AI summary: The HECKTOR 2025 challenge uses multimodal PET/CT and electronic health records to establish a benchmark for automated head and neck cancer analysis across three tasks (tumor segmentation, recurrence prediction, and HPV classification). The best algorithms reach a Dice of 0.75, a C-index of 0.66, and a balanced accuracy of 0.56, respectively.

Comments 17 pages, 4 figures, 4 tables. Overview paper for the HECKTOR 2025 challenge, held as a satellite event at MICCAI 2025. Challenge website: https://hecktor.grand-challenge.org/

Abstract

Head and neck cancers (HNC) represent a significant global health burden, with accurate tumor delineation being essential for effective radiotherapy planning. The complexity of the oropharyngeal anatomy, combined with the heterogeneous appearance of tumors on imaging, makes manual segmentation time-intensive and subject to inter-observer variability. Beyond segmentation, predicting long-term clinical outcomes, such as recurrence-free survival (RFS), and determining human papillomavirus (HPV) status from noninvasive imaging, remain challenging yet clinically valuable goals. The HECKTOR 2025 challenge addresses these needs by establishing a comprehensive benchmark for automated HNC analysis using multimodal PET/CT imaging and electronic health records. Building on previous editions (2020-2022), this challenge features an expanded multi-institutional dataset comprising over 1,100 patients from 10 centers worldwide. Participants were tasked with three complementary objectives: (1) segmenting primary gross tumor volumes (GTVp) and metastatic lymph nodes (GTVn), (2) predicting recurrence-free survival, and (3) classifying HPV status. The challenge attracted 35 registered teams, with 15 final submissions evaluated on a held-out test set. Top-performing algorithms achieved a mean Dice similarity coefficient of 0.75 for segmentation, a concordance index of 0.66 for survival prediction, and a balanced accuracy of 0.56 for HPV classification. This paper presents a comprehensive analysis of the submitted methodologies, evaluates their performance across different lesion characteristics, and discusses their implications for clinical translation in automated oncology workflows and decision support systems.
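The two classification-style metrics quoted above can be made concrete with a short sketch. The function names are illustrative and are not taken from the challenge's evaluation code, and the handling of two empty masks is an assumption. The challenge's exact aggregation (for example, how GTVp and GTVn scores are combined) is not described in the abstract and is not reproduced here.

```python
def dice_coefficient(pred, truth):
    """Dice similarity between two binary masks given as flat 0/1 sequences."""
    intersection = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Assumption: two empty masks are treated as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, which compensates for class imbalance."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)
```

For a two-class task, a classifier that always predicts the majority class gets a balanced accuracy of 0.5, which puts the reported 0.56 for HPV classification in context.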

2606.20037 2026-06-19 cs.LG New submission 90%

Alzheimer's Disease Diagnosis using a Multimodal Approach with 3D MRI and PET

Loukas Ilias, Anthi-Maria Vozinaki, Christos Ntanos, Dimitris Askounis

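Affiliations: DSS Lab, School of ECE, NTUA

Topic match (medical imaging): multimodal Alzheimer's disease diagnosis with MRI and PET.

AI summary: Proposes a multimodal model for Alzheimer's disease diagnosis that combines 3D convolutional feature extractors with three fusion strategies (concatenation, Gated Multimodal Unit, and gated self-attention) and a sparsely gated Mixture-of-Experts classifier; results on three binary classification tasks support the value of input-adaptive modeling.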

Comments 2025 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

DOI: 10.1109/BIBM66473.2025.11357133

Abstract

Alzheimer's disease (AD) is an irreversible neurodegenerative disorder and a leading cause of death worldwide. Early diagnosis is especially important at the Mild Cognitive Impairment (MCI) stage, where timely intervention can help slow progression to AD. Neuroimaging data, such as Magnetic Resonance Imaging (MRI) and Positron Emission Tomography (PET) scans, can support early detection by revealing structural and functional brain changes related to the disease. Yet many multimodal models still fuse MRI and PET with static concatenation and apply identical computation to all subjects, which limits robustness to patient/site heterogeneity and can waste computation. To address these limitations, we present the first study combining 3D convolutional feature extractors with three fusion strategies (concatenation, Gated Multimodal Unit (GMU), and gated self-attention) and a sparsely gated Mixture-of-Experts (MoE) classifier that performs input-adaptive routing, activating only the most informative experts per case. Finally, we use Grad-CAM to visualize disease-related regions, ensuring model interpretability. Experiments are performed across three binary classification tasks (NC vs. MCI, MCI vs. AD, and NC vs. AD). Results show that GMU achieves accuracies of 80.46% (NC vs. MCI) and 95.47% (NC vs. AD), while gated self-attention attains 82.08% on MCI vs. AD. Ablations show that removing the MoE consistently degrades accuracy across all tasks. These findings underscore the value of input-adaptive, multimodal modeling for AD diagnosis by leveraging the complementary nature of MRI and PET.
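As a rough illustration of how a Gated Multimodal Unit differs from static concatenation, the sketch below implements the standard bimodal GMU equations on plain Python lists. The weight matrices and the `gmu_fuse` name are hypothetical. The paper's feature dimensions, its 3D CNN extractors, and its MoE classifier are not reproduced.

```python
import math


def _matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def _sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def gmu_fuse(x_mri, x_pet, W_mri, W_pet, W_gate):
    """Bimodal GMU: h = z * h_mri + (1 - z) * h_pet, with a learned gate z."""
    h_mri = [math.tanh(a) for a in _matvec(W_mri, x_mri)]
    h_pet = [math.tanh(a) for a in _matvec(W_pet, x_pet)]
    # The gate sees both modalities and weights them per hidden unit.
    z = [_sigmoid(a) for a in _matvec(W_gate, x_mri + x_pet)]
    return [zi * hm + (1.0 - zi) * hp for zi, hm, hp in zip(z, h_mri, h_pet)]
```

Because the gate depends on the input, the mix between MRI and PET features can change from subject to subject, which is the input-adaptive behavior the abstract contrasts with concatenation.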

2606.19651 2026-06-19 cs.AI cs.CV cs.LG New submission 90%

BrainG3N: A Dual-Purpose Tokenizer for Controllable 3D Brain MRI Generation

Max Van Puyvelde, Ibrahim Gulluk, Wim Van Criekinge, Olivier Gevaert

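Affiliations: Department of Biomedical Data Science, Stanford University School of Medicine; Department of Mathematical Modelling, Statistics & Bioinformatics, Ghent University; Department of Electrical Engineering, Stanford University

Topic match (medical imaging): brain MRI generation based on a 3D masked autoencoder, with support for conditional generation.

AI summary: Proposes a tokenizer based on a 3D masked autoencoder that decouples encoder and decoder; the encoder outperforms or matches SOTA models on 21 of 23 linear-probing tasks, and the resulting embeddings support conditional generation and longitudinal forecasting.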

Abstract

Three-dimensional (3D) brain MRI is central to clinical neurology and neuro-oncology, where generative models could augment under-represented cohorts, simulate disease trajectories, and support privacy-preserving data sharing. Latent diffusion has been the go-to solution for modeling imaging data, but it places two competing demands on the tokenizer: encoder embeddings must retain the clinical information that downstream tasks act on, and the decoder must reconstruct anatomically faithful volumes. Existing reconstruction-driven tokenizers achieve the second at the expense of the first. To address this, we introduce a fully volumetric masked-autoencoder (MAE) based tokenizer for 3D brain MRI latent diffusion, decoupling encoder and decoder: a frozen 3D MAE encoder produces clinically informative embeddings, while a dedicated CNN decoder reconstructs voxels from a linear projection of those embeddings. We pretrain the encoder on 35,309 volumes from 18 public cohorts spanning four modalities, ten disease categories, and 200+ acquisition sites, and demonstrate its dual utility in two settings. First, on a 23-task linear-probing benchmark, the encoder outperforms or matches SOTA models (i.e., BrainIAC, BrainSegFounder, and MedicalNet) on 21 of 23 tasks. Second, a conditional diffusion transformer (DiT) trained on these clinically informative embeddings supports both conditional generation across six variables and patient-specific longitudinal forecasting. Together these results establish a single 3D brain-MRI embedding space capable of both downstream clinical tasks and controllable generation.

2606.19371 2026-06-19 cs.LG cs.AI cs.CV New submission 90%

ProMUSE: Progressive Multi-modal Uncertainty-guided Staged Evidential Alzheimer Disease Classification

Long Doan, Branden Chen, Ethan Litton, Huan Huang, Jiajing Huang, Yixin Xie, Weihua Zhou, Nandakumar Narayanan, Chen Zhao

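Affiliations: Kennesaw State University; Michigan Technological University; University of Iowa

Topic match (medical imaging): multimodal Alzheimer's disease classification using MRI and PET.

AI summary: Proposes ProMUSE, a progressive multi-modal uncertainty-guided staged evidential network that adaptively decides when additional modalities are needed, reducing data acquisition cost while maintaining accuracy.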

Abstract

Alzheimer's disease (AD) is a fatal disorder that destroys memory and cognitive skills in the elderly population. Most treatments for AD are effective in the early stage, leading to an increasing demand for early AD diagnosis. AD diagnosis increasingly relies on multimodal data such as clinical assessments, structural Magnetic Resonance Imaging (MRI), and Positron Emission Tomography (PET) imaging. However, MRI and PET acquisition remain costly and not universally accessible, making full-modality inference impractical in real-world clinical workflows. We propose ProMUSE, a Progressive Multi-modal Uncertainty Guided Staged Evidential Network that adaptively determines when additional modalities are necessary, helping reduce the overall cost of data acquisition while maintaining accuracy. ProMUSE first performs evidential classification using low-cost clinical data and quantifies uncertainty via a Dirichlet-based subjective logic model. When uncertainty exceeds a learned threshold, ProMUSE progressively incorporates MRI or PET features, fusing modality-wise belief and uncertainty through Dempster-Shafer theory to obtain a calibrated multimodal prediction. This staged acquisition strategy enables accurate diagnosis while minimizing reliance on expensive imaging. Experiments on ADNI, AIBL, and OASIS across CN-AD, CN-MCI, and MCI-AD tasks demonstrate that ProMUSE achieves competitive or superior accuracy compared to full-modality baselines while reducing MRI/PET usage by 50-90%, yielding substantial cost savings. These results highlight ProMUSE as a practical, uncertainty-aware, and resource-efficient solution for real-world AD screening.
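The staged pipeline described above can be sketched with the usual subjective-logic mapping from Dirichlet evidence to belief and uncertainty, followed by the reduced Dempster combination rule commonly used in evidential multi-view classification. This is an illustration under assumptions: the function names and the fixed `threshold` argument are hypothetical, and ProMUSE learns its threshold and evidence networks, which are not modelled here.

```python
def opinion_from_evidence(evidence):
    """Map non-negative class evidence to (beliefs, uncertainty) via a Dirichlet."""
    k = len(evidence)
    strength = sum(evidence) + k  # Dirichlet strength S = sum(e_k + 1)
    return [e / strength for e in evidence], k / strength


def combine(op1, op2):
    """Reduced Dempster's rule for two subjective-logic opinions."""
    (b1, u1), (b2, u2) = op1, op2
    # Conflict: belief mass the two sources assign to different classes.
    conflict = sum(b1[i] * b2[j]
                   for i in range(len(b1)) for j in range(len(b2)) if i != j)
    scale = 1.0 / (1.0 - conflict)
    beliefs = [scale * (x * y + x * u2 + y * u1) for x, y in zip(b1, b2)]
    return beliefs, scale * u1 * u2


def staged_predict(clinical_evidence, imaging_evidence_fn, threshold):
    """Request imaging only when the clinical-only opinion is too uncertain."""
    opinion = opinion_from_evidence(clinical_evidence)
    used_imaging = False
    if opinion[1] > threshold:
        opinion = combine(opinion, opinion_from_evidence(imaging_evidence_fn()))
        used_imaging = True
    return opinion, used_imaging
```

Passing the imaging evidence as a callable mirrors the acquisition setting: the costly scan is only "acquired" when the first stage is not confident enough.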

2507.23027 2026-06-19 cs.CV cs.AI 90%

Recovering Diagnostic Value: Super-Resolution-Aided Echocardiographic Classification in Resource-Constrained Imaging

Krishan Agyakari Raja Babu, Om Prabhu, Annu, Mohanasankar Sivaprakasam

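Affiliations: Indian Institute of Technology Madras; All India Institute of Medical Sciences; Indian Institute of Technology Hyderabad

Topic match (medical imaging): super-resolution-enhanced echocardiogram classification.

AI summary: Studies deep learning-based super-resolution for classifying low-quality 2D echocardiograms; experiments on the CAMUS dataset indicate that SRGAN and SRResNet improve classification performance, with SRResNet also being computationally efficient.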

Comments Accepted at the MICCAI Workshop on "Medical Image Computing in Resource Constrained Settings & Knowledge Interchange (MIRASOL)" 2025

DOI: 10.1007/978-3-032-13654-1_8

Abstract

Automated cardiac interpretation in resource-constrained settings (RCS) is often hindered by poor-quality echocardiographic imaging, limiting the effectiveness of downstream diagnostic models. While super-resolution (SR) techniques have shown promise in enhancing magnetic resonance imaging (MRI) and computed tomography (CT) scans, their application to echocardiography, a widely accessible but noise-prone modality, remains underexplored. In this work, we investigate the potential of deep learning-based SR to improve classification accuracy on low-quality 2D echocardiograms. Using the publicly available CAMUS dataset, we stratify samples by image quality and evaluate two clinically relevant tasks of varying complexity: a relatively simple Two-Chamber vs. Four-Chamber (2CH vs. 4CH) view classification and a more complex End-Diastole vs. End-Systole (ED vs. ES) phase classification. We apply two widely used SR models, Super-Resolution Generative Adversarial Network (SRGAN) and Super-Resolution Residual Network (SRResNet), to enhance poor-quality images and observe significant gains in performance metrics, particularly with SRResNet, which also offers computational efficiency. Our findings demonstrate that SR can effectively recover diagnostic value in degraded echo scans, making it a viable tool for AI-assisted care in RCS, achieving more with less.

2606.20449 2026-06-19 cs.CV New submission 85%

InfantFace: Detecting infant faces in neonatal clinical environments

Abdullah Bin-Obaid, Maria M. Cobo, Rebeccah Slater, Lionel Tarassenko, Mauricio Villarroel

Affiliations: Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford; Department of Paediatrics, University of Oxford; Universidad San Francisco de Quito USFQ, Colegio de Ciencias Biologicas y Ambientales

Topic match (medical imaging): applied in neonatal clinical environments to support clinical assessment.

AI summary: To handle occlusion and lighting problems in neonatal clinical environments, proposes a one-stage YOLOv11m-based face detection model trained on several public datasets; fine-tuning on clinical data raises AP50 from 0.87 to 0.96.

Comments 32 pages, 7 figures, 4 tables; supplementary information included

Abstract

Reliable localisation of the neonatal face is the first step for several video-camera based non-contact assessments such as pain and distress related facial expression analysis, pain scoring, cardiorespiratory signal extraction and cessation of breathing alerts. However, major challenges persist in neonatal clinical environments. Cluttered backgrounds, illumination changes and poor lighting conditions can reduce the accuracy of face detection models. Clinical interventions, monitoring equipment and, in some cases, medical devices can obstruct the face, making visual assessment difficult. We propose a one-stage YOLOv11m-based model tailored for face detection of infants in neonatal clinical environments. We combined multiple publicly available datasets (VGGFace2, CelebA, FDDB, WIDER FACE) to train and evaluate our proposed model. We then fine-tuned our model on a neonatal research dataset involving 228 videos from 114 recording sessions of 113 independent infants. Before fine-tuning, our model achieved an AP50 of 0.87, surpassing the performance of three state-of-the-art general face detectors. Performance improved further to an AP50 of 0.96 after clinical-domain adaptation. Evaluating face detection performance across different datasets remains a challenge due to the lack of publicly available neonatal datasets. Prioritising the creation of such datasets, while upholding appropriate privacy safeguards and ethical standards in their creation and use, would greatly support further progress in this field.
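AP50 is average precision computed with an intersection-over-union (IoU) threshold of 0.5 for matching a predicted box to a ground-truth box. A minimal sketch of that matching criterion follows. The helper names are illustrative, and the full precision-recall integration behind AP is left out.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, iou_threshold=0.5):
    """A detection counts as correct at AP50 when its IoU is at least 0.5."""
    return box_iou(pred_box, gt_box) >= iou_threshold
```

Under this criterion, a face box shifted by half its width overlaps the ground truth with an IoU of only 1/3 and would count as a miss.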

2606.20303 2026-06-19 cs.CV New submission 85%

GEN-Guard: Correcting Generalization Failures for Deployable Federated Surgical AI

Julia Alekseenko, Pietro Mascagni, AI4SafeChole Consortium, Nicolas Padoy

Affiliations: University of Strasbourg, CNRS, INSERM, ICube, UMR7357; Bioimage Analysis Center, Fondazione Policlinico Universitario Agostino Gemelli IRCCS; Fondazione IRCCS Ca' Granda Ospedale Maggiore Policlinico di Milano, University of Milan; Monaldi Hospital, AORN dei Colli

Topic match (medical imaging): detecting and correcting generalization failures in federated surgical AI.

AI summary: Proposes GEN-Guard, a framework that detects performance leakage via Client-Blocked Evaluation and applies feature-level correction via Disagreement-Aware Distillation, improving the cross-institutional generalization of federated surgical AI.

Journal ref Int J Comput Assist Radiol Surg. 2026 Jun 14

DOI: 10.1007/s11548-026-03713-0

Abstract

Federated Learning (FL) in surgical video AI enables collaborative model training without sharing sensitive data. However, standard evaluation practices - selecting the "best" global model based only on validation data from participating hospitals - can lead to suboptimal deployment choices. We identify this critical failure mode as performance leakage, where the selected model overfits internal federation data and fails to generalize to unseen institutions. We propose GEN-Guard, a practical post-hoc framework to detect and correct generalization failures in federated surgical AI. It integrates Generalization Detection via Client-Blocked Evaluation (CBE), which validates performance on isolated client distributions to prevent performance leakage, and Generalization Correction through Disagreement-Aware Distillation (DAD), which learns adaptive feature-level corrections for cross-institutional robustness. Both components operate after standard FL convergence while providing robust support for zero-shot adaptation to unseen environments. We first quantify the severity of performance leakage, observing Model Selection Failures (MSFs) exceeding 80% under standard evaluation. GEN-Guard is evaluated on two multi-center clinical challenges: surgical phase recognition in laparoscopic cholecystectomy and polyp segmentation in colonoscopy. Across both datasets, GEN-Guard consistently corrects these failures, improving in-federation F1 scores by up to 2 points, unseen-institution performance by up to 3 points, and worst-case institutional performance by 3-9 points. Performance leakage represents a systematic and previously under-recognized risk in federated surgical AI. GEN-Guard provides a practical solution for detecting and correcting such failures. By improving cross-institutional robustness and zero-shot generalization, it strengthens the reliability of FL for real-world surgical deployment.

2606.20115 2026-06-19 cs.LG cs.CV New submission 85%

When Calibration Fails the Vulnerable Hospital: Federated Conformal Risk Control via Risk-Curve Shrinkage

Nafis Fuad Shahid

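Affiliations: Dhaka, Bangladesh

Topic match (medical imaging): federated conformal risk control for brain tumor segmentation.

AI summary: Standard pooled conformal risk control (CRC) under-covers individual institutions in federated deployments; the paper proposes a federated CRC protocol based on risk-curve shrinkage that, on real brain tumor data, achieves 2.7/20 violations while enlarging prediction sets by only 2.0x.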

Comments 9 pages, 3 figures, 2 tables. Submitted to the DeCaF Workshop at MICCAI 2026

Abstract

Conformal risk control (CRC) provides distribution-free guarantees on segmentation quality by calibrating a prediction-set threshold on held-out data. In federated deployments, the standard approach pools calibration scores across sites into a single threshold. We provide the first quantification, on real multi-institutional brain tumor data (FeTS-2022, 1,251 subjects, 20 institutions), showing that this naive pooled CRC protects the average hospital but violates coverage at 40% of individual institutions, with the worst site exceeding the target false-negative rate by 7.8 percentage points. The naive alternative, per-site local CRC, largely restores coverage but inflates prediction sets by 83x, rendering them clinically useless. We propose a shrinkage-based federated CRC protocol: each site transmits only its empirical risk curve (G scalars) to a server, which computes a shrinkage-regularized threshold per site. A single hyperparameter n0 smoothly trades worst-case coverage for prediction-set efficiency; leave-one-site-out sensitivity analysis identifies n0=19, achieving 2.7/20 violations at 2.0x stretch. We further show that direct Lagrangian optimization of coverage budgets fails, concentrating risk on vulnerable hospitals, and that the finite-sample correction term is essential: removing it triples violations. The marginal CRC guarantee is preserved by construction under the stated site-mixture assumption; per-site coverage is validated across four targets with three seeds. No patient-level images, masks, or per-volume scores leave any site.
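To make the moving parts concrete, the sketch below pairs the standard finite-sample CRC threshold rule with one plausible form of risk-curve shrinkage. The shrinkage weight `n_site / (n_site + n0)` is an assumption chosen for illustration; the abstract does not state the paper's actual formula. The function names and the convention that larger `lambda` means larger prediction sets and lower risk are likewise hypothetical.

```python
def crc_threshold(lambdas, risk_curve, n, alpha, loss_bound=1.0):
    """Smallest lambda whose finite-sample-corrected risk is at most alpha.

    lambdas is an increasing grid; risk_curve[i] is the empirical risk at
    lambdas[i] and is assumed non-increasing in lambda.
    """
    for lam, risk in zip(lambdas, risk_curve):
        # Standard CRC correction: (n / (n + 1)) * risk + B / (n + 1).
        if (n / (n + 1)) * risk + loss_bound / (n + 1) <= alpha:
            return lam
    return lambdas[-1]  # fall back to the most conservative grid point


def shrink_curve(site_curve, pooled_curve, n_site, n0):
    """Hypothetical shrinkage: small sites lean more on the pooled curve."""
    w = n_site / (n_site + n0)
    return [w * s + (1 - w) * p for s, p in zip(site_curve, pooled_curve)]
```

In this sketch, each site would only need to send its risk curve on the shared grid, which matches the abstract's point that no patient-level data leaves a site. A server could then call `shrink_curve` followed by `crc_threshold` per site.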

2606.20035 2026-06-19 cs.CV cs.LG New submission 85%

PU-UNet: Stable Multiplicative Interactions for Medical Image Segmentation

Ziyuan Li, Osamah Sufyan, Uwe Jaekel, Babette Dellen

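Affiliations: Department of Mathematics, Informatics and Technology, University of Applied Sciences Koblenz; Technical University of Munich

Topic match (medical imaging): proposes PU-UNet for medical image segmentation.

AI summary: Proposes PU-UNet, which uses stable product-unit residual blocks to provide explicit multiplicative feature interactions at low-resolution stages; on three medical image segmentation datasets it improves Dice and IoU and lowers the false-positive rate.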

Comments Accepted to ICANN 2026

Abstract

Many dense prediction networks rely on additive feature transformations and model higher-order feature interactions only implicitly. Product units provide an explicit mechanism for multiplicative feature modeling, but their logarithmic-exponential formulation can cause numerical instability, which has limited their use in deep dense prediction networks. In this work, we propose Product-Unit U-Net (PU-UNet), a residual U-Net that integrates stable product-unit residual blocks into rich low-resolution stages for medical image segmentation. The proposed formulation combines smooth positivity mapping with log-domain clipping, enabling stable multiplicative feature learning with negligible computational overhead. On ISIC 2018, Kvasir-SEG, and BUSI, PU-UNet achieves Dice scores of 0.942, 0.959, and up to 0.925, respectively. Compared with a matched Residual U-Net baseline, PU-UNet consistently improves Dice and IoU while keeping parameters, FLOPs, and inference latency nearly unchanged, and reduces the image-level false-positive rate on normal BUSI cases from 0.077 to zero. Ablation studies suggest that the gains are associated with product-unit interactions, are strongest under low-resolution placement, and benefit from the proposed stabilization design. These results suggest that stable product-unit residual learning can be an effective way to enhance U-Net-style segmentation networks with explicit multiplicative interactions.
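A classic product unit computes a weighted product of its inputs as the exponential of a weighted sum of logarithms, which breaks down for non-positive inputs and can overflow. The sketch below shows one way to combine a smooth positivity mapping with log-domain clipping. The choice of softplus, the `eps` offset, and the clip value are assumptions made for illustration, since the abstract does not specify the paper's exact mapping or bounds.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)); strictly positive for moderate x.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def stable_product_unit(x, w, eps=1e-6, clip=10.0):
    """y = prod_i p(x_i) ** w_i, evaluated in the log domain with clipping."""
    # Positivity mapping p(x) = softplus(x) + eps keeps the logarithm defined.
    log_sum = sum(wi * math.log(softplus(xi) + eps) for xi, wi in zip(x, w))
    # Clipping the exponent bounds the output to [exp(-clip), exp(clip)].
    log_sum = max(-clip, min(clip, log_sum))
    return math.exp(log_sum)
```

In a network, `x` would be a feature vector at one spatial location and `w` a learned exponent vector; here both are plain lists so the arithmetic can be checked by hand.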

2606.20027 2026-06-19 cs.CV New submission 85%

QG-MIL: A Gated Transformer Aggregator for Domain-Agnostic Multiple Instance Learning in Medical Imaging

Luca Zedda, Davide Antonio Mura, Cecilia Di Ruberto, Maurizio Atzori, Muhammed Furkan Dasdelen, Carsten Marr, Andrea Loddo

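Affiliations: Department of Mathematics and Computer Science, University of Cagliari; Institute of AI for Health, Helmholtz Munich

Topic match (medical imaging): proposes a multiple instance learning aggregator for medical image analysis.

AI summary: Proposes QG-MIL, a gated transformer aggregator that counters attention concentration through RMSNorm pre-normalization, per-head QK normalization, fine-grained attention output gating, and SwiGLU feed-forward modules, with an average improvement of +6.1 macro F1 points across six benchmarks.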

Abstract

Attention-based Multiple Instance Learning aggregators in medical imaging are prone to attention concentration, producing overconfident and unstable predictions. We introduce QG-MIL, a gated transformer aggregator that addresses this through four synergistic architectural components: RMSNorm-based pre-normalization, per-head QK normalization, fine-grained attention output gating, and SwiGLU-style feed-forward modules. Together, these design choices stabilize training and distribute attention more uniformly across instances without auxiliary losses, masking, or multi-stage regularization. We evaluate QG-MIL across six benchmarks spanning whole-slide pathology and cell-level hematology, covering two fundamentally different MIL scales. The best-performing QG-MIL variants outperform leading baselines on all six benchmarks, with an average improvement of +6.1 mean macro F1 points. Attention overlays and attention mass analysis confirm more distributed instance weighting. Ablation studies show that while individual components can match the full model on specific datasets, the QG-MIL design provides the most consistent cross-domain performance and tightest variance when compared to selected baselines. We release a configurable implementation to support reproducibility at: https://github.com/unica-visual-intelligence-lab/QG-MIL
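One of the four components, QK normalization, can be illustrated in isolation: normalizing queries and keys before the dot product bounds the attention logits, so a few large-magnitude instances cannot dominate the softmax. The sketch below uses a gain-free RMS normalization for a single head and a single query. It is a simplified stand-in and does not reproduce QG-MIL's implementation, gating, or SwiGLU blocks.

```python
import math


def rms_norm(x, eps=1e-6):
    """Scale x to unit root-mean-square (learnable gain omitted)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys, qk_norm=True):
    """Softmax attention of one query over a bag of instance keys."""
    if qk_norm:
        query = rms_norm(query)
        keys = [rms_norm(k) for k in keys]
    scale = 1.0 / math.sqrt(len(query))
    return softmax([scale * sum(q * k for q, k in zip(query, key)) for key in keys])
```

Comparing `attention_weights(q, keys, qk_norm=False)` with the default on the same inputs shows how the normalized variant spreads weight across instances when raw feature magnitudes are large.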

2606.19372 2026-06-19 eess.IV cs.CV cs.LG New submission 90%

Full-Self Diagnostics (FSD): Physics-Grounded Visual Biomarker Inference from Smartphone Video via Inverse Problems and Operator Learning

Jonathan Thomas, Harsh Thaker

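Affiliations: Algomash® (Algorithmic Mashup Inc.)

Topic match (health monitoring): inferring physiological state from smartphone video; glucose monitoring.

AI summary: Proposes the Full-Self Diagnostics (FSD) framework, which combines a physics-based forward model, information-theoretic observability, a regularized inverse problem, operator learning, and stochastic variational inference to recover physiological state from 9-second facial videos; validated on 38,812 scans from 59 subjects, reporting a glucose MARD of 29.86% on the lead author's self-collected data.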

Comments 38,812 paired scans, preliminary longitudinal validation of multichannel visual glucose inference (MARD 17 to 46 percent across cohorts); physics plus information theory plus operator learning framework

Abstract

We present Full-Self Diagnostics (FSD), a unified mathematical framework for recovering latent physiological states from unconstrained 9-second facial videos captured by consumer smartphones. The approach integrates five mutually reinforcing components: (1) a physics-based forward model derived from the radiative transfer equation and chromophore absorption that maps camera observables to biomarker concentrations; (2) an information-theoretic observability theory proving that multi-channel visual signals (spectral, pulse, respiratory, micro-expression, and oculomotor) contain strictly increasing mutual information with physiological state; (3) a stable, Tikhonov-regularized inverse problem with domain-uniform identifiability guarantees; (4) an operator-learning formulation that enables generalization across devices, resolutions, and populations; and (5) a supervised learning procedure, interpretable as stochastic variational inference, that continuously refines the model from paired biosensor ground truth with performance improving proportionally to one over the square root of the number of paired observations. Empirical validation on 38812 real-world paired scans across 59 subjects demonstrates practical performance. Self-collected data from the lead author (glucose range 35-550 mg/dL) yields MARD of 29.86 percent with 97.57 percent of predictions in Clarke Error Grid Zones A+B and only 0.27 percent in the dangerous Zone E. A well-managed diabetic participant achieves MARD of 17 percent in the narrower 70-180 mg/dL band. These results confirm that consumer-grade facial video encodes sufficient structured information for clinically relevant, non-invasive biomarker inference under fully unconstrained conditions, with performance scaling predictably as more paired data becomes available.
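The headline accuracy figure, MARD (mean absolute relative difference), is simple to compute from paired predictions and reference readings. The sketch below assumes strictly positive reference values, as with blood glucose in mg/dL; the `mard` function name is illustrative.

```python
def mard(predicted, reference):
    """Mean absolute relative difference between paired readings, in percent."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need two equal-length, non-empty sequences")
    total = sum(abs(p - r) / r for p, r in zip(predicted, reference))
    return 100.0 * total / len(reference)
```

Because each error is divided by the reference value, the same absolute error weighs more at low glucose levels. This is one reason a MARD measured over 35-550 mg/dL is not directly comparable with one measured over the narrower 70-180 mg/dL band.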

2606.19481 2026-06-19 cs.LG New submission 90%

Insulin4RL: Real-Time Insulin Management in the Intensive Care Unit for Offline Reinforcement Learning

Thomas Frost, Steve Harris

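Affiliations: Institute of Health Informatics; University College London

Topic match (health monitoring): ICU insulin management dataset for offline reinforcement learning.

AI summary: To address the poor generalizability caused by discretizing electronic health records, presents Insulin4RL, an offline reinforcement learning dataset built from real clinical trajectories, with more than 375,000 decisions across 12,209 patients, for evaluating models under realistic sampling assumptions.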

Comments Under submission

Abstract

Offline reinforcement learning (ORL) offers the potential to improve the quality of clinical decision-making using historical electronic health record (EHR) data. Current training and evaluative practices in this field rely heavily on EHR datasets that have been temporally discretised into fixed, regular time intervals. Discretisation creates fictional representations of complex clinical scenarios and compromises the generalisability of retrospective model evaluations. In this paper, we introduce Insulin4RL, a healthcare ORL dataset featuring naturally irregular inputs and actions from real clinical trajectories. Derived from MIMIC-IV, Insulin4RL comprises over 375,000 labelled decisions across 12,209 patients requiring insulin infusion titration in the Intensive Care Unit. The dataset can thus be used for research into ORL model performance under realistic clinical sampling assumptions. We provide a description of the dataset's structure and characteristics, baseline performance metrics using model-free offline reinforcement learning, and a standardised evaluation protocol using fitted Q-evaluation. We conclude with suggested areas for future research that could be addressed using this resource.


2606.20250 2026-06-19 cs.CV New submission 90%

Single-Stage Hierarchical Rectification for Weakly Supervised Histopathology Segmentation

Duc T. Nguyen, Hoang-Long Nguyen, Thanh-Ha DO, Huy-Hieu Pham

Affiliations: VinUni-Illinois Smart Health Center, VinUniversity, Hanoi, Vietnam; The Computer Vision and Medical AI Lab, VinUniversity, Hanoi, Vietnam; Posts and Telecommunications Institute of Technology, Hanoi, Vietnam

Topic match (pathology imaging): weakly supervised histopathology segmentation.

AI summary: Proposes a single-stage hierarchical rectification framework whose Hierarchical Feature Rectification Module produces high-fidelity activation maps within a single training run, addressing the error propagation and computational overhead of multi-stage weakly supervised segmentation.

Comments: Accepted to MICCAI 2026. This is the pre-review submitted version, not the camera-ready version. The final authenticated version will be available in the MICCAI 2026 proceedings.

Abstract

Existing weakly supervised semantic segmentation (WSSS) methods in computational pathology rely on a multi-stage paradigm: class activation map (CAM) generation, offline pseudo-mask refinement, and fully supervised retraining. While established, this decoupled approach presents fundamental limitations. The multi-stage process not only incurs high computational training costs but also suffers from error propagation: local texture biases in shallow CNN layers generate false-positive artifacts that subsequent refinement steps often fail to correct. To address these persistent challenges through a simple yet highly effective approach, we propose the Single-Stage Hierarchical Rectification (SSHR) framework. Rather than passively refining CAMs post-hoc, our method proactively purifies intermediate feature representations during the forward pass. We introduce a Hierarchical Feature Rectification Module (HFRM) that utilizes deep global semantic context to filter out local anomalies in shallow layers. This mechanism generates high-fidelity activation maps directly within a single training loop. Experiments on the LUAD-HistoSeg and BCSS datasets demonstrate that SSHR outperforms state-of-the-art multi-stage methods. Furthermore, SSHR reduces training duration by 2 to 5 times. This efficiency minimizes computational overhead and accelerates clinical translation for large-scale histopathology workflows. The code is available at: https://github.com/trongduc-nguyen/SSHR


2606.19966 2026-06-19 cs.CV cs.LG New submission 90%

Semantic-Anchored Evidential Fusion for Domain-Robust Whole-Slide Survival Analysis

Yucheng Xing, Ling Huang, Pei Liu, Jingying Ma, Jiaqing Xu, Kai He, Mengling Feng

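Affiliations: National University of Singapore; Imperial College London; Hunan University

Topic match (pathology imaging): proposes the SAEFS framework for whole-slide survival analysis.

AI summary: Proposes the SAEFS framework, which extracts semantic anchors via visual question answering and combines dual-stream evidence extraction with Dirichlet-based subjective logic to model uncertainty, enabling cross-domain zero-shot survival analysis and improving the average C-index by 10.2%.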

Abstract

Whole-slide images (WSIs) are widely used for computational cancer prognosis. However, most existing methods primarily focus on in-domain performance and fail to generalize across clinical centers. This limitation stems from their reliance on pixel-derived representations that are highly susceptible to domain-specific artifacts caused by staining protocols and scanner hardware. We hypothesize that high-level pathology semantics, such as tumor grade and micro-environmental architecture, provide a domain-invariant semantic representation that mirrors the robust diagnostic logic of human pathologists. Therefore, we propose a Semantic-Anchored Evidential Fusion Survival (SAEFS) framework, where SAEFS derives semantic anchors from WSIs via Visual Question Answering (VQA), employs a dual-stream WSI evidence extraction architecture, uses Dirichlet-based Subjective Logic to model uncertainty, and fuses semantic and visual evidence through a cautious conjunction rule to avoid overconfident fusion from correlated sources. Trained exclusively on one source domain and evaluated zero-shot across four unseen domains, SAEFS consistently outperforms state-of-the-art models both in prediction accuracy and reliability, improving the average C-index by 10.2%. Quantitative analyses further show that VQA-derived semantic features exhibit significantly lower cross-center divergence than pixel-derived features, highlighting their robustness for cross-center clinical applications.
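The abstract mentions Dirichlet-based Subjective Logic for uncertainty modeling without spelling out the mapping. The sketch below shows the formulation commonly used in evidential deep learning, where non-negative per-class evidence e_k gives Dirichlet parameters alpha_k = e_k + 1, belief masses b_k = e_k / S, and uncertainty u = K / S with S the sum of all alpha_k. Whether SAEFS uses exactly this parameterization, and how it defines the classes for survival prediction, is not stated in the abstract; the function name is hypothetical, and the cautious conjunction rule used for fusion is not shown.

```python
def dirichlet_opinion(evidence):
    """Map non-negative per-class evidence to a subjective-logic opinion.

    Returns (belief masses, uncertainty mass, expected class probabilities).
    The belief masses and the uncertainty mass always sum to 1.
    """
    if any(e < 0 for e in evidence):
        raise ValueError("evidence must be non-negative")
    k = len(evidence)
    alpha = [e + 1.0 for e in evidence]   # Dirichlet parameters
    strength = sum(alpha)                  # Dirichlet strength S
    belief = [e / strength for e in evidence]
    uncertainty = k / strength             # mass not committed to any class
    expected_prob = [a / strength for a in alpha]
    return belief, uncertainty, expected_prob
```

With no evidence at all the uncertainty mass is 1, and it shrinks toward 0 as total evidence grows, which is what lets a model flag low-confidence predictions on unseen domains.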


2606.20164 2026-06-19 cs.CL cs.AI cs.LG q-bio.QM New submission 90%

MedRLM: Recursive Multimodal Health Intelligence for Long-Context Clinical Reasoning, Sensor-Guided Screening, Evidence-Grounded Decision Support, and Community-to-Tertiary Referral Optimization

Aueaphum Aueawatthanaphisut

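Affiliations: School of Information, Computer and Communication Technology, Sirindhorn International Institute of Technology, Thammasat University, Pathum Thani, Thailand

Topic match (clinical large language models): the MedRLM recursive multimodal framework for clinical reasoning and decision support.

AI summary: Proposes MedRLM, a recursive multimodal health intelligence framework that recursively inspects, decomposes, retrieves, verifies, and synthesizes patient information, coordinates multiple specialized agents, and introduces a Clinical Evidence Graph Memory to support long-context clinical reasoning and sensor-guided screening.

Comments: 9 pages, 3 figures, 3 tables, 1 algorithm, 29 equations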

Abstract

Real-world clinical decision support requires reasoning over heterogeneous and longitudinal patient information rather than answering isolated medical questions. However, current medical large language models and retrieval-augmented generation systems often rely on single-step prompting or retrieval, which can be fragile when clinical evidence is distributed across long electronic health records, medical images, sensor streams, guidelines, and referral constraints. This paper proposes MedRLM, a Recursive Multimodal Health Intelligence framework for long-context clinical reasoning, sensor-guided screening, and community-to-tertiary referral support. Instead of compressing all patient information into one prompt, MedRLM treats the patient case as an external clinical environment that can be recursively inspected, decomposed, retrieved, verified, and synthesized. The framework coordinates specialized agents for clinical text, longitudinal EHR, medical imaging, physiological sensor signals, guideline retrieval, uncertainty auditing, and referral planning. It further introduces a Clinical Evidence Graph Memory to connect patient-specific observations with retrieved evidence, standardized definitions, sensor-derived biomarkers, and referral criteria. A sensor-guided recursive triggering mechanism activates deeper reasoning when abnormal physiological or behavioral patterns are detected, while uncertainty-gated refinement supports clinician review for high-risk or low-confidence cases. We also outline a real-data evaluation design using public and credentialed clinical datasets spanning EHR, radiology, ECG, ICU time series, and referral-proxy outcomes. MedRLM aims to move medical AI from static question answering toward auditable, multimodal, and workflow-aware clinical decision support.


2606.19950 2026-06-19 cs.CV cs.AI New submission 85%

Confidence Calibration for Multimodal LLMs: An Empirical Study through Medical VQA

Yuetian Du, Yucheng Wang, Ming Kong, Tian Liang, Qiang Long, Bingdi Chen, Qiang Zhu

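Affiliations: College of Computer Science and Technology, Zhejiang University; School of Computer Science and Technology, Xidian University; Zhihui Medical Technology (Shanghai) Co., Ltd.

Topic match (clinical large language models): studies confidence calibration of multimodal LLMs in medical VQA.

AI summary: To address the mismatch between confidence and accuracy of multimodal large language models on medical tasks, the paper combines multi-strategy fused questioning with evaluation by an expert large language model, reducing expected calibration error by 40% on average across three medical VQA datasets and improving model reliability.

Comments: Accepted by MICCAI 2025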
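The summary reports results in terms of expected calibration error (ECE). For readers unfamiliar with the metric, the sketch below computes the standard equal-width binned ECE: predictions are grouped by stated confidence, and the gap between accuracy and mean confidence in each bin is averaged with weights proportional to bin size. The function name, the default of 10 bins, and the assumption that confidences lie in [0, 1] are choices made for this example; the paper's exact binning is not given in this listing.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: size-weighted mean of |accuracy - mean confidence| per bin.

    confidences: stated confidence per answer, assumed to lie in [0, 1].
    correct: truthy where the corresponding answer was right.
    """
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("inputs must be non-empty and of equal length")
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 joins last bin
        bins[idx].append((conf, hit))
    n = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, h in members if h) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece
```

For example, two answers at confidence 0.9 with one correct, plus two answers at confidence 0.6 with both correct, give a gap of 0.4 in each bin and therefore an ECE of 0.4.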

2606.19373 2026-06-19 cs.LG cs.AI New submission 90%

cAPM: Continual AI-Assisted Pace-Mapping with Active Learning

Dylan O'Hara, Pradeep Bajracharya, Casey Meisenzahl, Karli Gillette, Anton J. Prassl, Gernot Plank, Saman Nazarian, Roderick Tung, John L Sapp, Linwei Wang

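Affiliations: Rochester Institute of Technology; University of Utah; Scientific Computing and Imaging Institute, University of Utah; Medical University of Graz; University of Pennsylvania Perelman School of Medicine; The University of Arizona College of Medicine; Dalhousie University

Topic match (diagnostic assistance): AI-assisted pace-mapping for the treatment of ventricular tachycardia.

AI summary: Proposes the cAPM framework, which uses a task-agnostic surrogate neural network together with active learning and continual learning strategies to transfer knowledge across ventricular tachycardias while reducing the amount of pace-mapping data required, reaching an 81% probability of localizing the target within clinical tolerance.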

Abstract

Ventricular tachycardia (VT) is a life-threatening rhythm disorder and a major cause of sudden cardiac death. Pace-mapping is a clinical procedure for identifying the intervention target during catheter ablation of VT. It requires clinicians to pace different sites in the ventricles and rapidly interpret the resulting electrocardiograms to determine where to pace next or whether a target site has been identified. Active learning AI models have been proposed to guide clinicians to the next pacing site, showing promise in reducing the number of pacing sites and improving the efficiency of pace-mapping. Existing methods require retraining for each target and cannot transfer knowledge across multiple VTs within the same patient or across patients. We introduce cAPM for continual AI-assisted pace-mapping to capture and transfer knowledge accumulated from past pace-mapping data, reducing the amount of pace-mapping data needed for future target VTs. This is made possible by a task-agnostic surrogate neural network that learns the mapping from pacing sites to 12-lead ECG morphology, an active-learning strategy that refines this surrogate model by selecting the most informative pacing site for each target, and a continual learning strategy to do so sequentially while retaining knowledge from prior targets. Evaluated on an in-silico testbed consisting of sequentially-presented localization tasks across different physiological conditions and ventricular geometries, cAPM with and without replay of past data samples achieved an 81% probability of localizing within clinical tolerance (5 mm accuracy) using 4.5 pace-mapping sites, compared to the state-of-the-art active-learning method achieving 38% probability using 13.7 pacing sites. These results provide a strong basis for preparing cAPM towards in-vivo preclinical and clinical studies where it can be used to guide pace-mapping.


2606.20174 2026-06-19 cs.LG New submission 85%

Computational Methods and Challenges in Cell-Free DNA Analysis for Multi-Cancer Early Detection

Nicko Starkey, Marcin W. Wojewodzic, Krzysztof Rzecki

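Affiliations: AGH University of Krakow; Norwegian Institute of Public Health

Topic match (diagnostic assistance): a review of computational methods for cfDNA-based multi-cancer early detection.

AI summary: Reviews computational methods for cfDNA-based multi-cancer early detection developed between 2022 and 2025, focusing on techniques for extracting fragmentomic and epigenetic features. It finds that multimodal ensemble approaches have the greatest potential for clinical integration but that standardized evaluation protocols are needed.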

Abstract

Cell-free DNA (cfDNA) is a promising avenue for non-invasive multi-cancer early detection (MCED) because it can enable simultaneous detection of multiple cancers from a single blood draw, with particular sensitivity to cancers that currently lack established screening programs. Here we review the computational methods developed between 2022 and 2025 for cfDNA-based MCED. We focus on how fragmentomics and epigenetic features are extracted and analyzed to detect cancer at early stages. We first briefly outline the biological basis of cfDNA signals, then review classical statistical and machine learning approaches alongside deep learning frameworks including autoencoder-based models. For each method we discuss biological interpretability, validation strategy, and readiness for clinical integration. Furthermore, we categorize the current challenges as technical, computational, and methodological, and outline open problems in the field. This review shows that multimodal ensemble approaches hold the strongest promise and the highest readiness for clinical integration. However, for better assessment of future work and side-by-side comparison, standardization of evaluation protocols and result reporting will be crucial.


1. Medical imaging (22 papers)

Pixel-Level Residual Diffusion Transformer: Scalable 3D CT Volume Generation

EFIQA: Explainable Fundus Image Quality Assessment via Anatomical Priors

OTCHA: Optimal Transport-driven Confidence-aware Latent Hub Alignment for Multi-View Medical Image Classification

CSWinUNETR: Segmentation of Thin Anatomical Structures in Medical Images

Scaling Generative Foundation Models for Chest Radiography with Rectified Flow Transformers

Learning Sparse Latent Predictive Foundation Model for Multimodal Neuroimaging

Vision Models for Medical Imaging: A Hybrid Approach for PCOS Detection from Ultrasound Scans

HypOProto: Hyperbolic Ordinal Prototypes for Left Ventricular Filling Pressure Classification

Scalable Training of Spatially Grounded 2D Vision-Language Models for Radiology

Geometry-Aware Superpixel Graph Transformer with Metadata for Skin Lesion Classification

Predicting gestational age at birth in the context of preterm birth from multi-modal fetal MRI

ARTEMIS: Agent-guided Reliability-aware Temporal Mask Evolution for Imperfectly Supervised Video Polyp Segmentation

HEad and neCK TumOR (HECKTOR) 2025: Benchmark of Segmentation, Diagnosis, and Prognosis in Multimodal PET/CT

Alzheimer's Disease Diagnosis using a Multimodal Approach with 3D MRI and PET

BrainG3N: A Dual-Purpose Tokenizer for Controllable 3D Brain MRI Generation

ProMUSE: Progressive Multi-modal Uncertainty-guided Staged Evidential Alzheimer Disease Classification

Recovering Diagnostic Value: Super-Resolution-Aided Echocardiographic Classification in Resource-Constrained Imaging

InfantFace: Detecting infant faces in neonatal clinical environments

GEN-Guard: Correcting Generalization Failures for Deployable Federated Surgical AI

When Calibration Fails the Vulnerable Hospital: Federated Conformal Risk Control via Risk-Curve Shrinkage

PU-UNet: Stable Multiplicative Interactions for Medical Image Segmentation

QG-MIL: A Gated Transformer Aggregator for Domain-Agnostic Multiple Instance Learning in Medical Imaging

2. Health monitoring (2 papers)

Full-Self Diagnostics (FSD): Physics-Grounded Visual Biomarker Inference from Smartphone Video via Inverse Problems and Operator Learning

Insulin4RL: Real-Time Insulin Management in the Intensive Care Unit for Offline Reinforcement Learning

3. Pathology imaging (2 papers)

Single-Stage Hierarchical Rectification for Weakly Supervised Histopathology Segmentation

Semantic-Anchored Evidential Fusion for Domain-Robust Whole-Slide Survival Analysis

4. Clinical large language models (2 papers)

MedRLM: Recursive Multimodal Health Intelligence for Long-Context Clinical Reasoning, Sensor-Guided Screening, Evidence-Grounded Decision Support, and Community-to-Tertiary Referral Optimization

Confidence Calibration for Multimodal LLMs: An Empirical Study through Medical VQA

5. Diagnostic assistance (2 papers)

cAPM: Continual AI-Assisted Pace-Mapping with Active Learning

Computational Methods and Challenges in Cell-Free DNA Analysis for Multi-Cancer Early Detection