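arXivDaily: arXiv daily academic digest, updated Monday to Friday

Vision and Robotics

Multimodal Information Fusion

Information fusion for image, video, multi-sensor, and cross-modal perception, including image fusion, infrared-visible, remote sensing, medical imaging, LiDAR/radar/camera, and audio-visual fusion.

2026-06-19 to 2026-06-19, 10 papers indexed, sources: cs.CV, eess.IV, eess.SP, cs.RO, cs.MM
2606.20143 2026-06-19 cs.CV New submission 90%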

HEad and neCK TumOR (HECKTOR) 2025: Benchmark of Segmentation, Diagnosis, and Prognosis in Multimodal PET/CT

Numan Saeed, Salma Hassan, Shahad Hardan, Lishan Cai, Xinglong Liang, Moona Mazher, Abdul Qayyum, Yansong Bu, Mengye Lyu, Yue Lin, Mingyuan Meng, Chuanyi Huang, Lisheng Wang, Dalal Chamseddine, Shamimeh Ahrari, Beining Wu, Yifei Chen, Fuyou Mao, Hao Zhang, Baixiang Zhao, Surajit Ray, Muzi Guo, Lei Xiang, Jakob Dexl, Michael Ingrisch, Adrien Depeursinge, Arman Rahmim, Mathieu Hatt, Vincent Andrearczyk, Mohammad Yaqub

Affiliations * Mohamed bin Zayed University of Artificial Intelligence (MBZUAI); Amsterdam UMC; The Netherlands Cancer Institute; Radboud University Medical Centre; University College London; Imperial College London; Shenzhen Technology University; Shenzhen University; Newland Digital Technology; The University of Sydney; Shanghai Jiao Tong University; University Hospital, Nantes; Nantes Université, Centrale Nantes, CNRS, LS2N; Hangzhou Dianzi University; Tsinghua University; Central South University; University of Glasgow; China Mobile System Integration Co., Ltd.; Subtle Medical Inc.; University Hospital, LMU Munich; Munich Center for Machine Learning; BC Cancer Research Institute; HES-SO Valais-Wallis University of Applied Sciences and Arts; Lausanne University Hospital (CHUV); LaTIM, INSERM, UMR 1101, Univ Brest

Topic match Medical image fusion: multimodal PET/CT imaging for head and neck cancer segmentation, diagnosis, and prognosis.

AI summary The HECKTOR 2025 challenge uses multimodal PET/CT and electronic health records to establish a benchmark for automated head and neck cancer analysis across three tasks: tumor segmentation, recurrence prediction, and HPV classification. The best algorithms reached a Dice of 0.75, a C-index of 0.66, and a balanced accuracy of 0.56, respectively.

Comments 17 pages, 4 figures, 4 tables. Overview paper for the HECKTOR 2025 challenge, held as a satellite event at MICCAI 2025. Challenge website: https://hecktor.grand-challenge.org/

Abstract

Head and neck cancers (HNC) represent a significant global health burden, with accurate tumor delineation being essential for effective radiotherapy planning. The complexity of the oropharyngeal anatomy, combined with the heterogeneous appearance of tumors on imaging, makes manual segmentation time-intensive and subject to inter-observer variability. Beyond segmentation, predicting long-term clinical outcomes, such as recurrence-free survival (RFS), and determining human papillomavirus (HPV) status from noninvasive imaging, remain challenging yet clinically valuable goals. The HECKTOR 2025 challenge addresses these needs by establishing a comprehensive benchmark for automated HNC analysis using multimodal PET/CT imaging and electronic health records. Building on previous editions (2020-2022), this challenge features an expanded multi-institutional dataset comprising over 1,100 patients from 10 centers worldwide. Participants were tasked with three complementary objectives: (1) segmenting primary gross tumor volumes (GTVp) and metastatic lymph nodes (GTVn), (2) predicting recurrence-free survival, and (3) classifying HPV status. The challenge attracted 35 registered teams, with 15 final submissions evaluated on a held-out test set. Top-performing algorithms achieved a mean Dice similarity coefficient of 0.75 for segmentation, a concordance index of 0.66 for survival prediction, and a balanced accuracy of 0.56 for HPV classification. This paper presents a comprehensive analysis of the submitted methodologies, evaluates their performance across different lesion characteristics, and discusses their implications for clinical translation in automated oncology workflows and decision support systems.
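As a reading aid for the three headline numbers above, the following is a minimal sketch of the textbook definitions of the Dice similarity coefficient, Harrell's concordance index, and balanced accuracy. It is an assumption that these plain definitions are close to what the challenge used; the abstract does not specify the exact evaluation protocol (for example how Dice is aggregated across cases or how ties are handled), and all function names here are illustrative.

```python
from itertools import combinations


def dice(pred, truth):
    """Dice similarity coefficient between two flat binary masks (0/1 values)."""
    intersection = sum(p and t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Convention (an assumption): two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def concordance_index(times, events, risks):
    """Harrell's C-index: share of comparable pairs ordered correctly by risk.

    times  : observed follow-up times
    events : 1 if the event (e.g. recurrence) was observed, 0 if censored
    risks  : predicted risk scores (higher means earlier expected event)
    """
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times are skipped in this simplified version
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier subject was censored, so the pair is not comparable
        comparable += 1
        if risks[first] > risks[second]:
            concordant += 1.0
        elif risks[first] == risks[second]:
            concordant += 0.5  # tied risks get half credit
    return float("nan") if comparable == 0 else concordant / comparable


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, robust to class imbalance."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [k for k, y in enumerate(y_true) if y == c]
        recalls.append(sum(y_pred[k] == c for k in idx) / len(idx))
    return sum(recalls) / len(recalls)
```

For a binary task, a balanced accuracy of 0.5 is what a constant or random classifier achieves, which helps put the reported 0.56 for HPV classification in context.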

2606.20112 2026-06-19 cs.CV eess.IV New submission 85%

Pixel-Level Residual Diffusion Transformer: Scalable 3D CT Volume Generation

Zhenkai Zhang, Markus Hiller, Krista A. Ehinger, Tom Drummond

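Affiliations * School of Computing and Information Systems, The University of Melbourne

Topic match Medical image fusion: generates 3D CT volumes; relates to medical image generation.

AI summary Proposes the Pixel-Level Residual Diffusion Transformer (PRDiT), which achieves high-fidelity 3D CT volume generation through two-stage training (a local MLP-based blind estimator that separates low-frequency structure, plus a global residual diffusion transformer that models high-frequency residuals) and outperforms existing methods on the LIDC-IDRI and RAD-ChestCT datasets.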

Comments Accepted at ICLR 2026. Code available at https://github.com/Fredy-Zhang/PRDiT

Abstract

Generating high-resolution 3D CT volumes with fine details remains challenging due to substantial computational demands and optimization difficulties inherent to existing generative models. In this paper, we propose the Pixel-Level Residual Diffusion Transformer (PRDiT), a scalable generative framework that synthesizes high-quality 3D medical volumes directly at voxel-level. PRDiT introduces a two-stage training architecture comprising 1) a local denoiser in the form of an MLP-based blind estimator operating on overlapping 3D patches to separate low-frequency structures efficiently, and 2) a global residual diffusion transformer employing memory-efficient attention to model and refine high-frequency residuals across entire volumes. This coarse-to-fine modeling strategy simplifies optimization, enhances training stability, and effectively preserves subtle structures without the limitations of an autoencoder bottleneck. Extensive experiments conducted on the LIDC-IDRI and RAD-ChestCT datasets demonstrate that PRDiT consistently outperforms state-of-the-art models, such as HA-GAN, 3D LDM and WDM-3D, achieving significantly lower 3D FID, MMD and Wasserstein distance scores.

2606.19966 2026-06-19 cs.CV cs.LG New submission 85%

Semantic-Anchored Evidential Fusion for Domain-Robust Whole-Slide Survival Analysis

Yucheng Xing, Ling Huang, Pei Liu, Jingying Ma, Jiaqing Xu, Kai He, Mengling Feng

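Affiliations * National University of Singapore; Imperial College London; Hunan University

Topic match Medical image fusion: semantic-anchored evidential fusion for whole-slide survival analysis.

AI summary Proposes the SAEFS framework, which extracts semantic anchors via visual question answering, combines dual-stream evidence extraction with Dirichlet-based subjective logic to model uncertainty, and performs zero-shot cross-domain survival analysis, improving the average C-index by 10.2%.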

Abstract

Whole-slide images (WSIs) are widely used for computational cancer prognosis. However, most existing methods primarily focus on in-domain performance and fail to generalize across clinical centers. This limitation stems from their reliance on pixel-derived representations that are highly susceptible to domain-specific artifacts caused by staining protocols and scanner hardware. We hypothesize that high-level pathology semantics, such as tumor grade and micro-environmental architecture, provide a domain-invariant semantic representation that mirrors the robust diagnostic logic of human pathologists. Therefore, we propose a Semantic-Anchored Evidential Fusion Survival (SAEFS) framework, where SAEFS derives semantic anchors from WSIs via Visual Question Answering (VQA), employs a dual-stream WSI evidence extraction architecture, uses Dirichlet-based Subjective Logic to model uncertainty, and fuses semantic and visual evidence through a cautious conjunction rule to avoid overconfident fusion from correlated sources. Trained exclusively on one source domain and evaluated zero-shot across four unseen domains, SAEFS consistently outperforms state-of-the-art models both in prediction accuracy and reliability, improving the average C-index by 10.2%. Quantitative analyses further show that VQA-derived semantic features exhibit significantly lower cross-center divergence than pixel-derived features, highlighting their robustness for cross-center clinical applications.

2606.19838 2026-06-19 cs.CV New submission 85%

OTCHA: Optimal Transport-driven Confidence-aware Latent Hub Alignment for Multi-View Medical Image Classification

Jiwoong Yang, Haejun Chung, Ikbeom Jang

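Affiliations * Hanyang University; Hankuk University of Foreign Studies

Topic match Medical image fusion: multi-view medical image classification that fuses patch tokens.

AI summary Proposes the OTCHA module, which uses optimal transport to align multi-view patch tokens with shared latent hub tokens and combines confidence gating with partial matching to discard irrelevant features, improving the robustness of multi-view medical image classification.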

Comments Accepted at MICCAI 2026

Abstract

Multi-view imaging, such as mammography and chest radiography, is a standard component of clinical practice. However, medical images are often unregistered and contain view-specific artifacts or irrelevant background cues that can obscure diagnostically relevant findings. Many existing methods directly fuse per-view representations, allowing such irrelevant content to contaminate the fused embedding and reducing robustness under varying view configurations. We propose OTCHA, a confidence-aware latent hub token alignment module based on optimal transport (OT) that refines patch tokens before fusion for multi-view classification. OTCHA introduces a set of learnable latent hub tokens shared across views. For each view, we compute an OT plan between patch tokens and hub tokens that jointly considers feature similarity and geometry, and augment the OT formulation with token-conditional dustbins to enable partial matching and discard irrelevant tokens. The resulting transport plan provides token-wise matching confidence, which gates hub-mediated message passing and weights a novel optimal-transport-based representation alignment loss to stabilize refinement. Experiments on three multi-view medical image datasets demonstrate consistent improvements over competing baselines across diverse anatomies and view configurations. Our code is available at https://github.com/labhai/OTCHA.
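The idea of an OT plan with a dustbin, as described in the abstract above, can be sketched with entropic optimal transport (Sinkhorn iterations) over a cost matrix that has one extra "dustbin" column. This is a minimal illustration under stated assumptions, not the OTCHA implementation: it uses a single shared dustbin with a fixed cost and a fixed mass share rather than token-conditional dustbins, it omits the geometry term, and `sinkhorn`, `match_with_dustbin`, `dustbin_cost`, and `dustbin_share` are hypothetical names. For real use, the linked repository is the reference.

```python
import math


def sinkhorn(cost, row_mass, col_mass, eps=0.1, iters=200):
    """Entropic OT via Sinkhorn scaling.

    Returns a plan whose column sums equal col_mass and whose row sums
    approach row_mass as the iterations converge.
    """
    n, m = len(cost), len(cost[0])
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        u = [row_mass[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [col_mass[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def match_with_dustbin(cost, dustbin_cost=1.0, dustbin_share=0.5):
    """Match patch tokens (rows) to hub tokens (columns) with one dustbin column.

    cost[i][j] is the dissimilarity between patch token i and hub token j.
    Tokens that are far from every hub send their mass to the dustbin, which
    yields a per-token matching confidence in [0, 1].
    """
    n, m = len(cost), len(cost[0])
    augmented = [list(row) + [dustbin_cost] for row in cost]
    row_mass = [1.0 / n] * n
    col_mass = [(1.0 - dustbin_share) / m] * m + [dustbin_share]
    plan = sinkhorn(augmented, row_mass, col_mass)
    # Confidence: fraction of a token's mass that reaches a real hub.
    confidence = [1.0 - plan[i][m] / sum(plan[i]) for i in range(n)]
    return plan, confidence
```

In this sketch the confidence vector could then be used to down-weight low-confidence tokens before fusion, which is the role the abstract assigns to the transport plan.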

2606.19371 2026-06-19 cs.LG cs.AI cs.CV New submission 85%

ProMUSE: Progressive Multi-modal Uncertainty-guided Staged Evidential Alzheimer Disease Classification

Long Doan, Branden Chen, Ethan Litton, Huan Huang, Jiajing Huang, Yixin Xie, Weihua Zhou, Nandakumar Narayanan, Chen Zhao

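Affiliations * Kennesaw State University; Michigan Technological University; University of Iowa

Topic match Medical image fusion: AD classification from multimodal data (clinical, MRI, PET), with multimodal fusion at its core.

AI summary Proposes ProMUSE, a progressive multimodal uncertainty-guided staged evidential network that adaptively decides when additional modalities are needed, reducing data acquisition cost while maintaining accuracy.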

Abstract

Alzheimer's disease (AD) is a fatal disorder that destroys memory and cognitive skills in the elderly population. Most treatments for AD are effective in the early stage, leading to an increasing demand for early AD diagnosis. AD diagnosis increasingly relies on multimodal data such as clinical assessments, structural Magnetic Resonance Imaging (MRI), and Positron Emission Tomography (PET) imaging. However, MRI and PET acquisition remain costly and not universally accessible, making full-modality inference impractical in real-world clinical workflows. We propose ProMUSE, a Progressive Multi-modal Uncertainty Guided Staged Evidential Network that adaptively determines when additional modalities are necessary, helping reduce the overall cost of data acquisition while maintaining accuracy. ProMUSE first performs evidential classification using low-cost clinical data and quantifies uncertainty via a Dirichlet-based subjective logic model. When uncertainty exceeds a learned threshold, ProMUSE progressively incorporates MRI or PET features, fusing modality-wise belief and uncertainty through Dempster-Shafer theory to obtain a calibrated multimodal prediction. This staged acquisition strategy enables accurate diagnosis while minimizing reliance on expensive imaging. Experiments on ADNI, AIBL, and OASIS across CN-AD, CN-MCI, and MCI-AD tasks demonstrate that ProMUSE achieves competitive or superior accuracy compared to full-modality baselines while reducing MRI/PET usage by 50-90%, yielding substantial cost savings. These results highlight ProMUSE as a practical, uncertainty-aware, and resource-efficient solution for real-world AD screening.
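The staged procedure described above (classify from cheap data, measure uncertainty, add a modality only when needed) can be sketched as follows. This is a minimal illustration, assuming the standard mapping from Dirichlet evidence to a subjective-logic opinion and the reduced Dempster combination rule commonly used in evidential multi-view learning; the paper's exact formulation may differ. The threshold is a fixed illustrative value here, whereas the abstract says it is learned, and `staged_predict` and the modality names are hypothetical.

```python
def opinion(evidence):
    """Map non-negative per-class evidence to (belief masses, uncertainty).

    With K classes and Dirichlet parameters alpha_k = e_k + 1, the Dirichlet
    strength is S = sum(alpha), belief is b_k = e_k / S, and u = K / S.
    """
    k = len(evidence)
    strength = sum(evidence) + k
    return [e / strength for e in evidence], k / strength


def dempster_combine(op1, op2):
    """Reduced Dempster rule for two opinions over the same classes."""
    (b1, u1), (b2, u2) = op1, op2
    k = len(b1)
    # Conflict: mass the two sources assign to different classes.
    conflict = sum(b1[i] * b2[j] for i in range(k) for j in range(k) if i != j)
    scale = 1.0 - conflict
    belief = [(b1[c] * b2[c] + b1[c] * u2 + b2[c] * u1) / scale for c in range(k)]
    return belief, u1 * u2 / scale


def staged_predict(evidence_by_modality, order, threshold):
    """Add modalities in the given order until uncertainty is low enough."""
    fused, used = None, []
    for name in order:
        current = opinion(evidence_by_modality[name])
        fused = current if fused is None else dempster_combine(fused, current)
        used.append(name)
        if fused[1] <= threshold:
            break  # confident enough, skip the remaining (costlier) modalities
    belief, uncertainty = fused
    return belief.index(max(belief)), uncertainty, used
```

A call such as `staged_predict({"clinical": [9.0, 1.0], "mri": [18.0, 0.0]}, ["clinical", "mri"], threshold=0.2)` stops after the clinical stage, because the clinical-only uncertainty (2/12) is already below the threshold.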

2606.14957 2026-06-19 cs.CV New submission 85%

Learning Sparse Latent Predictive Foundation Model for Multimodal Neuroimaging

Haoxu Huang, Long Chen, Jingyun Chen, Jinu Hyun, James Ryan Loftus, Kara Melmed, Daniel Orringer, Jennifer Frontera, Seena Dehkharghani, Arjun Masurkar, Narges Razavian

Affiliations * New York University, Center for Data Science; NYU Grossman School of Medicine, Department of Radiology; State University of New York at Binghamton, School of Computing; NYU Grossman School of Medicine, Department of Neurology; NYU Grossman School of Medicine, Department of Neurosurgery; NYU Grossman School of Medicine, Department of Pathology; School of Medicine, Department of Radiology, Stanford; NYU Grossman School of Medicine, Department of Neuroscience; NYU Grossman School of Medicine, Neuroscience Institute

Topic match Medical image fusion: fuses three MRI sequences (T1w, T2w, and FLAIR) to learn a unified representation.

AI summary Proposes Neuro-JEPA, which combines a latent predictive objective with a Mixture-of-Experts architecture to learn a unified representation of T1w, T2w, and FLAIR MRI sequences, outperforming existing foundation models and CNN baselines on 25 clinical tasks and 22 public-dataset tasks.

Comments Under Review Preprint

Abstract

Brain MRIs are routinely acquired as multiple complementary sequences with unique contrast weighting, including T1-weighted imaging (T1w) anatomic and fluid-sensitive T2-weighted (T2w) contrasts. However, methods for learning unified representations across the multitude of MRI contrast mechanisms at health-system scale are lacking. In this study, we introduce Neuro-JEPA, a sparse multimodal neuroimaging foundation model that combines a latent predictive objective with a Mixture-of-Experts architecture to encode brain MRI across core T1w, T2w, and fluid-suppressed FLAIR imaging (FLAIR). We further provide a systematic methodological study of architectural, masking, objective, and sparsity design choices beneficial for robust neuroimaging multimodal representation learning. Neuro-JEPA was pretrained on 1,551,862 scans from 428,647 studies after modality-specific preprocessing with data curation across three core structural brain MRI sequences. We evaluated the learned representations across clinical and research settings, including 25 tasks from three health systems: NYU Langone, NYU Long Island, and Massachusetts General Hospital, and 22 tasks from 12 public datasets, covering unimodal, multimodal and cross-domain evaluation configurations. Across these benchmarks, existing neuroimaging foundation models showed inconsistent gains over a simple convolutional neural network (CNN) baseline, whereas Neuro-JEPA achieved stronger and more consistent performance across all evaluated settings. These results establish a scalable methodological framework for multimodal neuroimaging representation learning and highlight the need for foundation model evaluation protocols that include simple baselines, clinically heterogeneous cohorts and controlled multimodal comparisons.

2508.01819 2026-06-19 eess.IV Updated version 80%

Decoding the Alzheimer's Continuum: Interpretable Multi-Gate Routing for Diagnosis and Transition Prediction

Yufeng Jiang, Hexiao Ding, Hongzhao Chen, Jing Lan, Xinzhi Teng, Gerald W. Y. Cheng, Yunlin Mao, Zongxi Li, Haoran Xie, Jung Sun Yoo, Jing Cai

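Topic match Medical image fusion: a multi-gate mixture-of-experts architecture that fuses clinical priors with MRI.

AI summary Proposes M$^3$AD, a unified framework with an interpretable multi-gate mixture-of-experts architecture that performs three-class diagnosis and stage transition prediction from T1-weighted sMRI, reaching 95.13% accuracy.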

Comments Accepted by MICCAI2026

Abstract

Alzheimer's disease (AD) manifests as a continuous progression from normal cognition (NC) through mild cognitive impairment (MCI) to dementia. However, most deep learning approaches reduce this continuum to disjointed classification tasks, largely ignoring dynamic stage transitions. To decode this complex progression, we propose M$^3$AD, a unified framework that jointly addresses three-class diagnosis classification and diagnosis stage transition prediction using only T1-weighted sMRI. M$^3$AD leverages an interpretable multi-gate mixture of experts architecture, employing specialized routing mechanisms to dynamically capture both diagnosis-specific pathological patterns and shared structural features across the continuum. It further integrates clinical priors (age, sex, eTIV) via adaptive attention fusion to enhance generalization. M$^3$AD achieves 95.13% accuracy, compared to 90.44% reported by MCLNC under its original experimental setting, and 94.87% for transition prediction. Crucially, analyzing the multi-gate routing reveals distinct expert activation signatures distinguishing stable from progressive MCI, providing a mechanistic basis for individual-level progression risk stratification. Code is available at https://github.com/csyfjiang/M3AD.
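The multi-gate mixture-of-experts pattern named above can be sketched generically: a pool of experts is shared across tasks, and each task has its own gate that produces softmax weights over the experts. The sketch below is an assumption-laden illustration with random, untrained linear experts; it does not reproduce M$^3$AD's routing mechanism, clinical-prior fusion, or task heads, and the class name `MultiGateMoE` and the task names are hypothetical. The linked repository is the reference for the actual model.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Matrix-vector product for a list-of-rows weight matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class MultiGateMoE:
    """Shared experts with one softmax gate per task (untrained sketch)."""

    def __init__(self, in_dim, hidden_dim, n_experts, task_names, seed=0):
        rng = random.Random(seed)
        self.experts = [random_matrix(hidden_dim, in_dim, rng) for _ in range(n_experts)]
        self.gates = {t: random_matrix(n_experts, in_dim, rng) for t in task_names}

    def forward(self, x):
        # Every expert sees the same input; ReLU keeps features non-negative.
        expert_out = [[max(0.0, h) for h in linear(w, x)] for w in self.experts]
        mixed, routing = {}, {}
        for task, gate in self.gates.items():
            weights = softmax(linear(gate, x))
            routing[task] = weights  # per-task expert activation, inspectable
            mixed[task] = [
                sum(weights[e] * expert_out[e][d] for e in range(len(weights)))
                for d in range(len(expert_out[0]))
            ]
        return mixed, routing
```

Returning the `routing` weights alongside the mixed features is what makes this kind of architecture inspectable: per-sample expert weights can be compared across groups, which is the type of analysis the abstract describes for stable versus progressive MCI.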

2503.23179 2026-06-19 eess.IV cs.CV Updated version 80%

OncoReg: Medical Image Registration for Oncological Challenges

Wiebke Heyer, Yannic Elser, Lennart Berkel, Xinrui Song, Xuanang Xu, Pingkun Yan, Xi Jia, Jinming Duan, Zi Li, Tony C. W. Mok, BoWen LI, Tim Hable, Christian Staackmann, Christoph Großbröhmer, Lasse Hansen, Alessa Hering, Malte M. Sieren, Mattias P. Heinrich

Affiliations * Institute of Medical Informatics, University of Lübeck; Institute of Radiology and Nuclear Medicine, University Hospital Schleswig-Holstein; Department of Biomedical Engineering and Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute; School of Computer Science, University of Birmingham; Division of Informatics, Imaging and Data Sciences, University of Manchester; DAMO Academy, Alibaba Group; Hangzhou Shengshi Technology Co., Ltd; Department of Radiation Oncology, University Hospital Schleswig-Holstein; EchoScout GmbH; Radboud University Medical Center, Nijmegen; Institute of Interventional Radiology, University Hospital Schleswig-Holstein

Topic match Medical image fusion: CBCT-to-FBCT registration, categorized under medical image fusion.

AI summary Presents the OncoReg challenge, which uses a two-phase framework to develop generalizable image registration methods while protecting patient privacy, targeting the registration of cone-beam CT to fan-beam CT in radiotherapy. It finds that feature extraction is key and that combining deep learning with classical methods is most effective.

Comments 21 pages, 13 figures

Abstract

In modern cancer research, the vast volume of medical data generated is often underutilised due to challenges related to patient privacy. The OncoReg Challenge addresses this issue by enabling researchers to develop and validate image registration methods through a two-phase framework that ensures patient privacy while fostering the development of more generalisable AI models. Phase one involves working with a publicly available dataset, while phase two focuses on training models on a private dataset within secure hospital networks. OncoReg builds upon the foundation established by the Learn2Reg Challenge by incorporating the registration of interventional cone-beam computed tomography with standard planning fan-beam CT images in radiotherapy. Accurate image registration is crucial in oncology, particularly for dynamic treatment adjustments in image-guided radiotherapy, where precise alignment is necessary to minimise radiation exposure to healthy tissues while effectively targeting tumours. This work details the methodology and data behind the OncoReg Challenge and provides a comprehensive analysis of the competition entries and results. Findings reveal that feature extraction plays a pivotal role in this registration task. A new method emerging from this challenge demonstrated its versatility, while established approaches continue to perform comparably to newer techniques. Both deep learning and classical approaches still play significant roles in image registration, with the combination of methods, particularly in feature extraction, proving most effective.

2606.19767 2026-06-19 eess.IV cs.CV physics.med-ph New submission 70%

Contour-Constrained Deformable Registration with Parameter Characterization for Head and Neck Surgical Guidance

Qingyun Yang, Jon S. Heiselman, Ayberk Acar, Morgan J. Ringel, Michael I. Miga, Matthieu Chabanas, Michael C. Topf, Jie Ying Wu

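Affiliations * Vanderbilt University; Vanderbilt University Medical Center

Topic match Medical image fusion: deformable registration combining surface point clouds, fiducial landmarks, and contour constraints, categorized as multi-sensor fusion.

AI summary Proposes a deformable registration framework based on regularized Kelvinlet basis functions that corrects post-resection tissue deformation using surface point clouds, fiducial landmarks, and contour constraints. On nine head and neck specimens, it reduces registration error from 11.11 mm with rigid registration to 5.62 mm, a 49.41% reduction.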

Abstract

With 890,000 annual new cases globally, head and neck squamous cell carcinoma has one of the highest recurrence rates among solid malignancies. Although frozen section analysis is the standard of care for intraoperative margin assessment, accurately relocating detected positive margins on the resection bed remains challenging due to imprecise alignment between resected specimens and their resection bed, compounded by post-resection mucosal tissue shrinkage. We present a biomechanics-driven deformable registration framework that corrects post-resection tissue deformation to provide intraoperative guidance. Our approach registers 3D specimen meshes to intraoperative resection bed point clouds using a deformable registration approach based on regularized Kelvinlet basis functions. The registration matches surface point clouds, fiducial landmarks, and boundary contour constraints that directly penalize perpendicular distance-to-agreement between specimen and resection bed boundaries. Across nine specimens from skin, buccal mucosa, and tongue sites, the overall mean target registration error was $11.11 \pm 4.07$ mm using rigid registration, which decreased to $8.20 \pm 2.68$ mm (26.19\% reduction) using deformable registration without contour constraint. The proposed contour-constrained deformable registration further reduced the error to $5.62 \pm 2.28$ mm, a 49.41\% reduction relative to rigid registration. We observed the largest reduction in the most clinically challenging tongue specimens. We also performed a systematic two-stage parameter search to characterize the relative importance of surface alignment, fiducial correspondences, contour constraint, and strain energy regularization. This search revealed that contour weighting dominates registration accuracy for tissue types with large lateral deformation, while the algorithm operates over a broad range of parameter combinations.
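The reported percentage reductions follow directly from the mean errors quoted above, and can be checked with a short calculation. The sketch below also includes the usual definition of target registration error as the mean Euclidean distance between corresponding 3D points; that definition is an assumption here, since the abstract does not state how the error was computed per specimen, and the function names are illustrative.

```python
import math


def target_registration_error(moved_points, target_points):
    """Mean Euclidean distance (same units as input) between paired 3D points."""
    distances = [math.dist(p, q) for p, q in zip(moved_points, target_points)]
    return sum(distances) / len(distances)


def percent_reduction(baseline, improved):
    """Relative error reduction with respect to the baseline, in percent."""
    return 100.0 * (baseline - improved) / baseline


# Mean errors in mm, taken from the abstract.
rigid, deformable, contour_constrained = 11.11, 8.20, 5.62
print(round(percent_reduction(rigid, deformable), 2))           # 26.19
print(round(percent_reduction(rigid, contour_constrained), 2))  # 49.41
```

Both reductions are measured against the rigid baseline; relative to the unconstrained deformable result (8.20 mm), the contour constraint accounts for a further drop to 5.62 mm.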

2507.23027 2026-06-19 cs.CV cs.AI 70%

Recovering Diagnostic Value: Super-Resolution-Aided Echocardiographic Classification in Resource-Constrained Imaging

Krishan Agyakari Raja Babu, Om Prabhu, Annu, Mohanasankar Sivaprakasam

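Affiliations * Indian Institute of Technology Madras; All India Institute of Medical Sciences; Indian Institute of Technology Hyderabad

Topic match Medical image fusion: super-resolution-aided echocardiographic classification, categorized under medical image fusion.

AI summary Studies deep learning-based super-resolution for classifying low-quality 2D echocardiograms, and validates on the CAMUS dataset that SRGAN and SRResNet are effective at improving classification accuracy and computational efficiency.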

Comments Accepted at the MICCAI Workshop on "Medical Image Computing in Resource Constrained Settings & Knowledge Interchange (MIRASOL)" 2025

Abstract

Automated cardiac interpretation in resource-constrained settings (RCS) is often hindered by poor-quality echocardiographic imaging, limiting the effectiveness of downstream diagnostic models. While super-resolution (SR) techniques have shown promise in enhancing magnetic resonance imaging (MRI) and computed tomography (CT) scans, their application to echocardiography, a widely accessible but noise-prone modality, remains underexplored. In this work, we investigate the potential of deep learning-based SR to improve classification accuracy on low-quality 2D echocardiograms. Using the publicly available CAMUS dataset, we stratify samples by image quality and evaluate two clinically relevant tasks of varying complexity: a relatively simple Two-Chamber vs. Four-Chamber (2CH vs. 4CH) view classification and a more complex End-Diastole vs. End-Systole (ED vs. ES) phase classification. We apply two widely used SR models, Super-Resolution Generative Adversarial Network (SRGAN) and Super-Resolution Residual Network (SRResNet), to enhance poor-quality images and observe significant gains in performance metrics, particularly with SRResNet, which also offers computational efficiency. Our findings demonstrate that SR can effectively recover diagnostic value in degraded echo scans, making it a viable tool for AI-assisted care in RCS, achieving more with less.