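arXivDaily daily academic digest: synced with the full arXiv data, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.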

2605.00905 2026-05-05 cs.CL cs.AI cs.CV

DIAGRAMS: A Review Framework for Reasoning-Level Attribution in Diagram QA

Anirudh Iyengar Kaniyar Narayana Iyengar, Tampu Ravi Kumar, Manan Suri, Raviteja Bommireddy, Dinesh Manocha, Puneet Mathur, Vivek Gupta

Comments 10 Pages, 4 figures

Abstract

Diagram question answering (Diagram QA) requires reasoning-level attribution that links each question-answer pair to all visual regions needed to derive the answer, rather than only the region containing the final response. Creating such structured evidence across diagrams, charts, maps, circuits, and infographics is time-consuming, and existing annotation tools tightly couple their interfaces to dataset-specific formats. We present DIAGRAMS, a lightweight, schema-driven review framework that decouples interface logic from dataset-specific JSON structures through an internal meta-schema and dataset adapters. Given an image and QA pair with optional candidate regions, the system performs QA-conditioned evidence selection and proposes the regions required for reasoning. When QA pairs or candidate regions are missing, it generates them and supports human verification and refinement. Across six Diagram QA datasets, model-suggested evidence achieves 85.39% precision and 75.30% recall against reviewer-final selections (micro-averaged). These results indicate that the review-first framework reduces manual region creation while maintaining high agreement with final reasoning-level attributions. We release a public demo and installable package to support dataset auditing, grounded supervision creation, and grounded evaluation.
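
The micro-averaged precision and recall quoted in this abstract can be illustrated with a minimal sketch. It assumes each QA item carries a set of model-suggested region IDs and a set of reviewer-final region IDs, and that a suggestion counts as correct only on an exact ID match; the paper may match regions differently (for example by overlap), and the function name and toy data are hypothetical.

```python
from typing import Hashable, Iterable, Set, Tuple


def micro_precision_recall(
    pairs: Iterable[Tuple[Set[Hashable], Set[Hashable]]],
) -> Tuple[float, float]:
    """Pool counts over all QA items, then compute precision and recall once."""
    tp = fp = fn = 0
    for suggested, final in pairs:
        tp += len(suggested & final)  # suggested regions the reviewer kept
        fp += len(suggested - final)  # suggested regions the reviewer dropped
        fn += len(final - suggested)  # regions the reviewer had to add
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data: (model-suggested IDs, reviewer-final IDs) per QA item.
items = [
    ({"r1", "r2", "r3"}, {"r1", "r2", "r4"}),
    ({"r5"}, {"r5"}),
]
precision, recall = micro_precision_recall(items)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Micro-averaging pools true positives, false positives, and false negatives across all items before dividing, so items with many regions weigh more than they would under a per-item (macro) average.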

2605.00904 2026-05-05 cs.CV

Robustness of Transformer-Based Fluence Map Prediction Under Clinically Realistic Perturbations

Ujunwa Mgboh, Rafi Ibn Sultan, Joshua Kim, Kundan Thind, Dongxiao Zhu

Comments Accepted by The Artificial Intelligence in Medicine (AIME) 2026 Conference

2605.00903 2026-05-05 cs.CV

A Light Weight Multi-Features-View Convolution Neural Network For Plant Disease Identification

Muhammad Kaleem Ullah Khan

2605.00902 2026-05-05 cs.CV cs.IR

Validation of Whole-Slide Foundation Models for Image Retrieval in TCGA Data

Tianhao Lei, Parsa Esmaeilkhani, Saghir Alfasly, Wataru Uegami, Judy C. Boughey, Matthew P. Goetz, Krishna R. Kalari, H. R. Tizhoosh

Abstract

Foundation models are reshaping computational histopathology, yet their value for whole-slide image retrieval relative to strong patch-based and supervised aggregation baselines remains unclear. We benchmarked ten pipelines on 9,387 diagnostic slides spanning 17 organs and 60 diagnoses from The Cancer Genome Atlas (TCGA) using patient-level leave-one-patient-out evaluation. Methods included four pre-trained slide foundation models, a supervised attention-based multiple instance learning (ABMIL) aggregator on patch embeddings, and patch-level retrieval across five sampling densities. Performance varied more across organs and diagnoses than across architectures. Although the slide foundation model TITAN achieved the strongest overall results, its advantage was modest; ABMIL and patch-based methods reached comparable Top-1 and Top-3 accuracy, with no model consistently dominant. Morphologically distinctive entities approached ceiling performance, while rare, heterogeneous, and closely related subtypes remained challenging. Misclassifications aligned with organs exhibiting known inter-observer variability, suggesting an intrinsic ceiling for morphology-only retrieval. Performance was driven primarily by patch-level feature representations, with limited benefit from slide-level aggregation, indicating aggregation may be unnecessary in many settings. These findings argue against a universally optimal architecture and instead support organ-resolved benchmarking, diagnosis-aware or ensemble strategies, stronger feature representations, and multimodal retrieval frameworks. Notably, even the best model achieved only $\approx 68\% \pm 21\%$ retrieval accuracy on TCGA, and some subtypes showed $0\%$ accuracy across all methods, highlighting fundamental limitations of morphology-based representations and the need for substantial progress before reliable clinical deployment.
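
The patient-level leave-one-patient-out Top-k evaluation described in this abstract might look roughly like the sketch below. It rests on several assumptions that are not taken from the paper: slides are compared by cosine similarity of their embeddings, a query counts as a hit if any of its k nearest slides from other patients shares its diagnosis, and all names and toy values are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_accuracy(
    embeddings: List[Sequence[float]],
    patients: List[str],
    diagnoses: List[str],
    k: int,
) -> float:
    """Leave-one-patient-out retrieval: search only slides from other patients."""
    hits = 0
    evaluated = 0
    for q in range(len(embeddings)):
        # Exclude every slide that belongs to the query's own patient.
        candidates = [i for i in range(len(embeddings)) if patients[i] != patients[q]]
        if not candidates:
            continue
        ranked = sorted(
            candidates,
            key=lambda i: cosine(embeddings[q], embeddings[i]),
            reverse=True,
        )
        evaluated += 1
        # Assumed hit rule: any of the k nearest slides has the same diagnosis.
        if any(diagnoses[i] == diagnoses[q] for i in ranked[:k]):
            hits += 1
    return hits / evaluated if evaluated else 0.0


# Hypothetical toy slide embeddings with patient and diagnosis labels.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9], [0.6, 0.4]]
patients = ["A", "A", "B", "C", "D", "E"]
diagnoses = ["X", "X", "X", "Y", "Y", "Y"]

top1 = top_k_accuracy(embeddings, patients, diagnoses, k=1)
top3 = top_k_accuracy(embeddings, patients, diagnoses, k=3)
print(f"Top-1={top1:.3f} Top-3={top3:.3f}")
```

Excluding all slides from the query's patient keeps near-duplicate tissue from the same case out of the results, which would otherwise inflate accuracy.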

2605.00901 2026-05-05 cs.CV cs.AI

RA-CMF: Region-Adaptive Conditional MeanFlow for CT Image Reconstruction

Md Shifatul Ahsan Apurba, Md Selim, Jin Chen

Abstract

The use of CT imaging is important for screening, diagnosis, therapy planning, and prognosis of lung cancers. Unfortunately, due to differences in imaging protocols and scanner models, CT images acquired by different means may show large differences in noise statistics, contrast, and texture. In this study, we develop a novel conditional MeanFlow pipeline for CT image reconstruction. We introduce a conditional MeanFlow network that models the enhancement trajectory by predicting image-conditioned flow fields given intermediate image states. The image enhancement network is trained with a MeanFlow consistency loss along with the image reconstruction loss. In order to provide an adaptive refinement process in terms of spatial location of enhancements, we integrate a regional reinforcement learning-driven policy network into our approach. The policy network receives information about the MeanFlow rollouts and provides predictions in terms of tile-wise refinement budgets, stopping criteria, and total budget allocation of enhancement processes. Our policy network is trained through reinforcement learning in a policy gradient framework, where the goal of the training reward is to maximize improvement of enhancements while minimizing unnecessary computations and avoiding instabilities. In this way, our approach combines conditional flow-based enhancement with reinforcement learning-based spatial enhancement control. This allows our approach to focus more attention on enhancing difficult areas while stabilizing areas already showing sufficient quality. Our results show high accuracy in the tumor ROI, with the average radiomic feature CCC being 0.96, an average PSNR of 31.30 $\pm$ 4.16, and average SSIM of 0.94 $\pm$ 0.07. Moreover, there is an improvement in the overall quality of images, with an average PSNR of 34.23 $\pm$ 1.71 and average SSIM of 0.95 $\pm$ 0.01.

2605.00899 2026-05-05 cs.CV cs.LG

LatentDiff: Scaling Semantic Dataset Comparison to Millions of Images

James Flora, Kowshik Thopalli, Akshay R. Kulkarni, Weng-Keen Wong, Shusen Liu

Comments 17 pages, 6 figures

2605.00896 2026-05-05 cs.CV cs.AI cs.LG

When Less Is More: Simplicity Beats Complexity for Physics-Constrained InSAR Phase Unwrapping

Prabhjot Singh, Manmeet Singh

Comments 9 pages, 5 figures, 2 tables. Oral presentation, ML4RS Workshop @ ICLR 2026

2605.00894 2026-05-05 cs.CV

Dino-NestedUNet: Unlocking Foundation Vision Encoders for Pathology Tumor Bulk Segmentation via Dense Decoding

Tianyang Wang, Ziyu Su, Abdul Rehman Akbar, Usama Sajjad, Usman Afzaal, Lina Gokhale, Charles Rabolli, Wei Chen, Anil Parwani, Muhammad Khalid Khan Niazi

2605.00893 2026-05-05 cs.CV cs.AI cs.IR

Retrieval-Guided Generation for Safer Histopathology Image Captioning

Md. Enamul Hoq, Wataru Uegami, Saghir Alfasly, Ghazal Alabtah, Sahar Rahimi Malakshan, Armita Kazemi, Alex T. Schmitgen, Fred Prior, H. R. Tizhoosh

2605.00892 2026-05-05 cs.CV

When To Adapt? Adapting the Model or Data in Federated Medical Imaging

Chamani Shiranthika, Parvaneh Saeedi

Comments 10 pages, Accepted for oral presentation and proceedings of 24th International Conference on Artificial Intelligence in Medicine, Ottawa, Canada, July 7-10, 2026

2605.00891 2026-05-05 cs.CV cs.AI

X2SAM: Any Segmentation in Images and Videos

Hao Wang, Limeng Qiao, Chi Zhang, Lin Ma, Guanglu Wan, Xiangyuan Lan, Xiaodan Liang

Comments Technical Report

2605.00890 2026-05-05 cs.CV cs.AI cs.LG

Skeleton-Based Posture Classification to Promote Safer Walker-Assisted Gait in Older Adults

Sergio D. Sierra M., Monica Sinha, Marcela Múnera, Carlos A. Cifuentes

2605.00889 2026-05-05 cs.CV cs.LG

On the explainability of max-plus neural networks

Ikhlas Enaieh, Olivier Fercoq, García Ángel

Comments IEEE International Symposium on Computer-Based Medical Systems (CBMS 2026), Jun 2026, Limassol, Cyprus

2605.00888 2026-05-05 cs.CV cs.AI cs.LG eess.IV eess.SP

Selective Correlation Based Knowledge Distillation for Ground Reaction Force Estimation

Eun Som Jeon, Jisoo Lee, Huisu Lim, Omik M. Save, Hyunglae Lee, Pavan Turaga

DOI: 10.1016/j.measurement.2026.121510
Journal ref: Measurement, 2026

Abstract

Wearable sensor-based human gait analysis holds great promise in healthcare, rehabilitation, clinical diagnosis and monitoring, and sports activities. Specifically, ground reaction force (GRF) provides essential insights into the body's interaction with the ground during movement and is typically measured using instrumented treadmills equipped with force plates. However, such equipment is expensive and restricted to laboratory environments. To enable a more portable solution, wearable insole sensors have been used to measure GRF. These sensors, however, are prone to noise and external interference, which reduces measurement accuracy. Deep learning methodologies could be adopted to address these issues, but they often require significant computing resources to achieve high accuracy, limiting their applicability for real-time analysis on portable devices. To overcome these limitations, we propose Selective Correlation Based Knowledge Distillation (SCKD) for estimating GRF from data collected by insole sensors. Our proposed method utilizes selected features considering temporal characteristics in the process of extracting correlation maps for knowledge transfer, enhancing interpretability and mitigating issues in high dimensional data processing. We demonstrate the effectiveness of the compact models generated by our distillation framework through comparison with existing methods. Various configurations of teacher-student architectures and training approaches are examined based on multiple evaluation criteria, utilizing data collected at different walking speeds and with different window sizes. Experimental results confirm that our approach outperforms existing methods in estimating GRF from wearable insole sensor data. Therefore, our approach offers a reliable and resource-efficient solution for human gait analysis.

2605.00887 2026-05-05 cs.CV

SparseContrast: Dynamic Sparse Attention for Efficient and Accurate Contrastive Learning in Medical Imaging

Paarth Prasad, Ruchika Malhotra

2605.00886 2026-05-05 cs.CV

Selective Attention-Based Network for Robust Infrared Small Target Detection

Yingming Zhang, Wuqi Su, Qing Xiao, Yonggang Yang

2605.00885 2026-05-05 cs.CV

Multi-Branch Non-Homogeneous Image Dehazing via Concentration Partitioning and Image Fusion

Yingming Zhang, Wuqi Su, Qing Xiao, Yonggang Yang

2605.00883 2026-05-05 cs.CV cs.AI

Towards High Fidelity Face Swapping: A Comprehensive Survey and New Benchmark

Qi Li, Weining Wang, Shuangjun Du, Bo Peng, Jing Dong, Kun Wang, Zhenan Sun, Ming-Hsuan Yang

2605.00882 2026-05-05 cs.CV

Intervention-Based Self-Supervised Learning: A Causal Probe Paradigm for Remote Photoplethysmography

Zhiyi Niu, Xiaoguang Tu, Bo Zhao, Junzhe Cao, Dan Guo, Zitong Yu

Abstract

Remote Photoplethysmography (rPPG) enables convenient non-contact physiological measurement. Existing Self-Supervised Learning (SSL) methods commonly fall into a correlation trap: they tend to learn the most dominant periodic signals in the data, such as high-energy motion or illumination noise, rather than the faint, true rPPG signal, leading to poor model generalization. To address this, we propose a new SSL paradigm, Physiological Causal Probing (PCP), which treats the latent rPPG signal as the underlying physical source and the resulting pixel chrominance variations as its visual manifestation. Its core idea is to shift from passive correlation learning to active, precise intervention: it intervenes on the video based on a proposed rPPG hypothesis, and verifies whether the post-intervention changes match physical expectations. We propose the Interv-rPPG framework to implement PCP: an rPPG extractor named PhysMambaFormer hypothesizes the rPPG signal, while a Controllable Physiological Signal Editor conducts precise chrominance-domain interventions on videos based on this hypothesis. Interv-rPPG validates the physical realism of the hypothesis through 'Falsifiability via Nulling' and 'Axiomatic Equivariance'. Our editor achieves precise editing of the rPPG signal by intervening in the low-frequency chrominance components of the video. Our method improves both in-domain and cross-domain performance on challenging datasets such as VIPL-HR and MMPD. Furthermore, it surpasses the supervised baseline in complex cross-dataset settings, while remaining competitive on clean datasets where the intervention mechanism may introduce slight residual chrominance noise. Extensive experiments, including diagnostic analysis of nuisance sensitivity, demonstrate that the PCP paradigm effectively resists motion and illumination artifacts.

2605.00880 2026-05-05 cs.CV cs.AI

Adversarial Flow Matching for Imperceptible Attacks on End-to-End Autonomous Driving

Xinyu Zeng, Xiangkun He, Lei Tao, Chen Lv, Hong Cheng

Comments 16 pages, 11 figures

2605.00879 2026-05-05 cs.RO cs.SY eess.SY

LiDAR for Rehabilitation: A Comprehensive Survey of Applications, AI Techniques, and Future Directions

Soumia Siyoucef, Najmeddine Dhieb, Hakim Ghazzai, Eleonora Guanziroli, Franco Molteni, Gianluca Setti

Comments This paper is accepted for publication in IEEE Sensors Reviews, April, 2026

2605.00878 2026-05-05 cs.CV

Single Image Defogging Using a Fourth-Order Telegraph PDE Guided by Physical Haze Modeling

Manish Kumar, Rajendra K. Ray

2605.00876 2026-05-05 cs.LG cs.CV

GAZE: Grounded Agentic Zero-shot Evaluation with Viewer-Level Tools and Literature Retrieval on Rare Brain MRI

Duaa Alim, Mogtaba Alim, Liam Chalcroft

2605.00875 2026-05-05 cs.CV cs.AI

Visual Chart Representations for Cryptocurrency Regime Prediction: A Systematic Deep Learning Study

Dustin M. Haggett

Comments 9 pages, 8 figures, 9 tables. Stevens Institute of Technology course project, Fall 2025

2605.00874 2026-05-05 cs.CV cs.AI cs.LG cs.MM

Latent Space Probing for Adult Content Detection in Video Generative Models

Alizishaan Khatri, Chiquita Prabhu

Comments To be published in 2026 56th Annual IEEE International Conference on Dependable Systems and Networks Workshops (DSN-W)

2605.00842 2026-05-05 cs.AI cs.LG

Understanding Emergent Misalignment via Feature Superposition Geometry

Gouki Minegishi, Hiroki Furuta, Takeshi Kojima, Yusuke Iwasawa, Yutaka Matsuo

Comments Accepted to ACL2026

2605.00841 2026-05-05 cs.AI econ.GN q-fin.EC

AI Agents for Sustainable SMEs: A Green ESG Assessment Framework

Viet Trinh, Tan Nguyen, Minh-Huyen Phan, Quan Luu

2605.00839 2026-05-05 cs.AI cs.LG

2026 Roadmap on Artificial Intelligence and Machine Learning for Smart Manufacturing

Jay Lee, Hanqi Su, Marco Macchi, Adalberto Polenghi, Wei Wu, Zhiheng Zhao, George Q. Huang, Kiva Allgood, Devendra Jain, Benedikt Gieger, Vibhor Pandhare, Soumyabrata Bhattacharjee, Ram Mohril, Lingbao Kong, Qiyuan Wang, Xinlan Tang, Sungjong Kim, Chan Hee Park, Byeng D. Youn, Guo Dong Goh, Xi Huang, Wai Yee Yeong, Yung C Shin, He Zhang, Zitong Wang, Fei Tao, Jagjit Singh Srai, Satyandra K. Gupta, Byung Gun Joung, Albin John, John W. Sutherland, Sang Won Lee, Olga Fink, Vinay Sharma, Faez Ahmed, Wei Chen, Mark Fuge, Arild Waaler, Martin G. Skjæveland, Dimitris Kyritsis, Wei Chen, Vispi Nevile Karkaria, Yi-Ping Chen, Ying-Kuan Tsai, Joseph Cohen, Xun Huan, Jing Lin, Liangwei Zhang, Gregory W. Vogl, Aaron W. Cornelius, Xiaodong Jia, Dai-Yan Ji, Takanobu Minami, Ruoxin Wang

Comments This paper has been accepted for publication in the journal Machine Learning: Engineering

DOI: 10.1088/3049-4761/ae5967

Abstract

The evolution of artificial intelligence (AI) and machine learning (ML) is reshaping smart manufacturing by providing new capabilities for efficiency, adaptability, and autonomy across industrial value chains. However, the deployment of AI and ML in industrial settings still faces critical challenges, including the complexity of industrial big data, effective data management, integration with heterogeneous sensing and control systems, and the demand for trustworthy, explainable, and reliable operation in high-stakes industrial environments. In this roadmap, we present a comprehensive perspective on the foundations, applications, and emerging directions of AI and ML in smart manufacturing. It is structured in three parts. The first highlights the foundations and trends that frame the evolution of AI in smart manufacturing. The second focuses on key topics where AI is already enabling advances, including industrial big data analytics, advanced sensing and perception, autonomous systems, additive and laser-based manufacturing, digital twins, robotics, supply chain and logistics optimization, and sustainable manufacturing. The third section explores non-traditional ML approaches that are opening new frontiers, such as physics-informed AI, generative AI, semantic AI, advanced digital twins, explainable AI, RAMS, data-centric metrology, LLMs, and foundation models for highly connected and complex manufacturing systems. By identifying both opportunities and remaining barriers across these areas, this roadmap outlines the advances needed in methods, integration strategies, and industrial adoption. We hope this roadmap will serve as a guide for researchers, engineers, and practitioners to accelerate innovation, align academic and industrial priorities, and ensure that AI-driven smart manufacturing delivers reliable, sustainable, and scalable impact for the future of manufacturing ecosystems.

2605.00837 2026-05-05 cs.LG

Fast Log-Domain Sinkhorn Optimal Transport with Warp-Level GPU Reductions

Hao Xiao

Comments 14 pages, 7 figures, code at https://github.com/xiao98/Fast-Sinkhorn-CUDA

2605.00836 2026-05-05 cs.LG

From Euler to Dormand-Prince: ODE Solvers for Flow Matching Generative Models

Hao Xiao

Comments 14 pages, 10 figures, code at github.com/xiao98/ODE-Flow-Experiments