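arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.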

2604.18831 2026-04-22 cs.CV cs.RO

Feasibility of Indoor Frame-Wise Lidar Semantic Segmentation via Distillation from Visual Foundation Model

Haiyang Wu, Juan J. Gonzales Torres, George Vosselman, Ville Lehtola

Abstract

Frame-wise semantic segmentation of indoor lidar scans is a fundamental step toward higher-level 3D scene understanding and mapping applications. However, acquiring frame-wise ground truth for training deep learning models is costly and time-consuming. This challenge is largely addressed, for imagery, by Visual Foundation Models (VFMs) which segment image frames. The same VFMs may be used to train a lidar scan frame segmentation model via a 2D-to-3D distillation pipeline. The success of such distillation has been shown for autonomous driving scenes, but not yet for indoor scenes. Here, we study the feasibility of repeating this success for indoor scenes, in a frame-wise distillation manner by coupling each lidar scan with a VFM-processed camera image. The evaluation is done using indoor SLAM datasets, where pseudo-labels are used for downstream evaluation. Also, a small manually annotated lidar dataset is provided for validation, as there are no other lidar frame-wise indoor datasets with semantics. Results show that the distilled model achieves up to 56% mIoU under pseudo-label evaluation and around 36% mIoU under real-label evaluation, demonstrating the feasibility of cross-modal distillation for indoor lidar semantic segmentation without manual annotations.

2604.18829 2026-04-22 cs.CV

DUALVISION: RGB-Infrared Multimodal Large Language Models for Robust Visual Reasoning

Abrar Majeedi, Zhiyuan Ruan, Ziyi Zhao, Hongcheng Wang, Jianglin Lu, Yin Li

Comments Accepted at CVPR Findings 2026

2604.18828 2026-04-22 cs.LG physics.comp-ph

The High Explosives and Affected Targets (HEAT) Dataset

Bryan Kaiser, Kyle Hickmann, Sharmistha Chakrabarti, Soumi De, Sourabh Pandit, David Schodt, Jesus Pulido, Divya Banesh, Christine Sweeney

2604.18816 2026-04-22 cs.LG cs.AI

Curvature-Aware PCA with Geodesic Tangent Space Aggregation for Semi-Supervised Learning

Alexandre L. M. Levada

Comments 30 pages, 8 figures and 7 tables

2604.18811 2026-04-22 cs.LG cs.CV

Rethinking Dataset Distillation: Hard Truths about Soft Labels

Priyam Dey, Aditya Sahdev, Sunny Bhati, Konda Reddy Mopuri, R. Venkatesh Babu

Comments CVPR 2026 (Oral). First two authors contributed equally

Abstract

Despite the perceived success of large-scale dataset distillation (DD) methods, recent evidence finds that simple random image baselines perform on par with state-of-the-art DD methods like SRe2L due to the use of soft labels during downstream model training. This is in contrast with the findings in coreset literature, where high-quality coresets consistently outperform random subsets in the hard-label (HL) setting. To understand this discrepancy, we perform a detailed scalability analysis to examine the role of data quality under different label regimes, ranging from abundant soft labels (termed as SL+KD regime) to fixed soft labels (SL) and hard labels (HL). Our analysis reveals that high-quality coresets fail to convincingly outperform the random baseline in both SL and SL+KD regimes. In the SL+KD setting, performance further approaches near-optimal levels relative to the full dataset, regardless of subset size or quality, for a given compute budget. This performance saturation calls into question the widespread practice of using soft labels for model evaluation, where unlike the HL setting, subset quality has negligible influence. A subsequent systematic evaluation of five large-scale and four small-scale DD methods in the HL setting reveals that only RDED reliably outperforms random baselines on ImageNet-1K, but can still lag behind strong coreset methods due to its over-reliance on easy sample patches. Based on this, we introduce CAD-Prune, a compute-aware pruning metric that efficiently identifies samples of optimal difficulty for a given compute budget, and use it to develop CA2D, a compute-aligned DD method, outperforming current DD methods on ImageNet-1K at various IPC settings. Together, our findings uncover many insights into current DD research and establish useful tools to advance data-efficient learning for both coresets and DD.

2604.18806 2026-04-22 cs.LG cs.AR

A PPA-Driven 3D-IC Partitioning Selection Framework with Surrogate Models

Shang Wang, Shuai Liu, Owen Randall, Matthew E. Taylor

2604.18805 2026-04-22 cs.AI cond-mat.mtrl-sci cs.LG

AI scientists produce results without reasoning scientifically

Martiño Ríos-García, Nawaf Alampara, Chandan Gupta, Indrajeet Mandal, Sajid Mannan, Ali Asghar Aghajani, N. M. Anoop Krishnan, Kevin Maik Jablonka

2604.18804 2026-04-22 cs.CV cs.AI

Geometric Decoupling: Diagnosing the Structural Instability of Latent

Yuanbang Liang, Zhengwen Chen, Yu-Kun Lai

2604.18801 2026-04-22 cs.LG cs.DC

Preserving Clusters in Error-Bounded Lossy Compression of Particle Data

Congrong Ren, Sheng Di, Katrin Heitmann, Franck Cappello, Hanqi Guo

2604.18797 2026-04-22 cs.CV

CrossPan: A Comprehensive Benchmark for Cross-Sequence Pancreas MRI Segmentation and Generalization

Linkai Peng, Cuiling Sun, Zheyuan Zhang, Wanying Dou, Halil Ertugrul Aktas, Andrea M Bejar, Elif Keles, Tamas Gonda, Michael B Wallace, Zongwei Zhou, Gorkem Durak, Rajesh N Keswani, Ulas Bagci

Comments Accepted to MIDL 2026

2604.18791 2026-04-22 cs.LG cs.AI

HELM: Harness-Enhanced Long-horizon Memory for Vision-Language-Action Manipulation

Zijian Zeng, Fei Ding, Huiming Yang, Xianwei Li

Comments 9 pages, 2 figures

2604.18790 2026-04-22 cs.CV

EfficientPENet: Real-Time Depth Completion from Sparse LiDAR via Lightweight Multi-Modal Fusion

Johny J. Lopez, Md Meftahul Ferdaus, Mahdi Abdelguerfi, Anton Netchaev, Steven Sloan, Ken Pathak, Kendall N. Niles

Comments This work has been submitted to the IEEE for possible publication

2604.18789 2026-04-22 cs.AI cs.CR cs.LG

ARES: Adaptive Red-Teaming and End-to-End Repair of Policy-Reward System

Jiacheng Liang, Yao Ma, Tharindu Kumarage, Satyapriya Krishna, Rahul Gupta, Kai-Wei Chang, Aram Galstyan, Charith Peris

Comments 9 pages, ACL 2026 Main

2604.18788 2026-04-22 cs.LG

Efficient Mixture-of-Experts LLM Inference with Apple Silicon NPUs

Afsara Benazir, Felix Xiaozhu Lin

2604.18786 2026-04-22 cs.CL cs.AI

Experiments or Outcomes? Probing Scientific Feasibility in Large Language Models

Seyedali Mohammadi, Manas Gaur, Francis Ferraro

Comments Accepted at ACL 2026

2604.18781 2026-04-22 cs.CV

CAHAL: Clinically Applicable resolution enHAncement for Low-resolution MRI scans

Sergio Morell-Ortega, Ángela González-Cebrián, Boris Mansencal, Marien Gadea, Roberto Vivo-Hernando, Gregorio Rubio, Fernando Aparici, Maria de la Iglesia-Vaya, Gwenaelle Catheline, Pierrick Coupé, José V. Manjón

2604.18780 2026-04-22 cs.LG

Streaming Structured Inference with Flash-SemiCRF

Benjamin K. Johnson, Thomas Goralski, Ayush Semwal, Hui Shen, H. Josh Jang

2604.18775 2026-04-22 cs.CL cs.LG

An Empirical Study of Multi-Generation Sampling for Jailbreak Detection in Large Language Models

Hanrui Luo, Shreyank N Gowda

2604.18765 2026-04-22 cs.LG cs.AI

Multi-Level Temporal Graph Networks with Local-Global Fusion for Industrial Fault Diagnosis

Bibek Aryal, Gift Modekwe, Qiugang Lu

2604.18759 2026-04-22 cs.CL

Model-Agnostic Meta Learning for Class Imbalance Adaptation

Hanshu Rao, Guangzeng Han, Xiaolei Huang

Comments Accepted to Findings of ACL 2026

2604.18757 2026-04-22 cs.CV cs.AI

REVEAL: Multimodal Vision-Language Alignment of Retinal Morphometry and Clinical Risks for Incident AD and Dementia Prediction

Seowung Leem, Lin Gu, Chenyu You, Kuang Gong, Ruogu Fang

Comments Accepted for publication at MIDL 2026

Abstract

The retina provides a unique, noninvasive window into Alzheimer's disease (AD) and dementia, capturing early structural changes through morphometric features, while systemic and lifestyle risk factors reflect well-established contributors to disease susceptibility long before clinical symptom onset. However, current retinal analysis frameworks typically model imaging and risk factors separately, limiting their ability to capture joint multimodal patterns critical for early risk prediction. Moreover, existing methods rarely incorporate mechanisms to organize or align patients with similar retinal and clinical characteristics, constraining the learning of coherent cross-modal associations. To address these limitations, we introduce REVEAL (REtinal-risk Vision-Language Early Alzheimer's Learning), a framework that aligns color fundus photographs with individualized disease-specific risk profiles for predicting incident AD and dementia, on average 8 years before diagnosis (range: 1-11 years). Because real-world risk factors are structured questionnaire data, we translate them into clinically interpretable narratives compatible with pretrained vision-language models (VLMs). We further propose a group-aware contrastive learning (GACL) strategy that clusters patients with similar retinal morphometry and risk factors as positive pairs, strengthening multimodal alignment. This unified representation learning framework substantially outperforms state-of-the-art retinal imaging models paired with clinical text encoders, as well as general-purpose VLMs, demonstrating the value of jointly modeling retinal biomarkers and clinical risk factors. By providing a generalizable and noninvasive approach for early AD and dementia risk stratification, REVEAL has the potential to enable earlier intervention and improve preventive care at the population level.

2604.18756 2026-04-22 cs.LG cs.AI cs.CL cs.CR

Towards Understanding the Robustness of Sparse Autoencoders

Ahson Saiyed, Sabrina Sadiekh, Chirag Agarwal

2604.18747 2026-04-22 cs.CV

URoPE: Universal Relative Position Embedding across Geometric Spaces

Yichen Xie, Depu Meng, Chensheng Peng, Yihan Hu, Quentin Herau, Masayoshi Tomizuka, Wei Zhan

2604.18745 2026-04-22 cs.CV

DeltaSeg: Tiered Attention and Deep Delta Learning for Multi-Class Structural Defect Segmentation

Enrique Hernandez Noguera, Md Meftahul Ferdaus, Elias Ioup, Mahdi Abdelguerfi

2604.18744 2026-04-22 cs.CV

Match-Any-Events: Zero-Shot Motion-Robust Feature Matching Across Wide Baselines for Event Cameras

Ruijun Zhang, Hang Su, Kostas Daniilidis, Ziyun Wang

2604.18740 2026-04-22 cs.CV

Autonomous Skeletal Landmark Localization towards Agentic C-Arm Control

Jay Jung, Ahmad Arrabi, Jax Luo, Scott Raymond, Safwan Wshah

Comments Accepted at IJCARS: IPCAI 2026. Int J CARS (2026)

DOI: 10.1007/s11548-026-03632-0

Abstract

Purpose: Automated C-arm positioning ensures timely treatment in patients requiring emergent interventions. When a conventional Deep Learning (DL) approach for C-arm control fails, clinicians must revert to manual operation, resulting in additional delays. Consequently, an agentic C-arm control framework based on multimodal large language models (MLLMs) is highly desirable, as it can incorporate clinician feedback and use reasoning to make adjustments toward more accurate positioning. Skeletal landmark localization is essential for C-arm control, and we investigate adapting MLLMs for autonomous landmark localization. Methods: We used an annotated synthetic X-ray dataset and a real X-ray dataset. Each X-ray in both datasets is paired with several skeletal landmarks. We fine-tuned two MLLMs and tasked them with retrieving the closest landmarks from each X-ray. Quantitative evaluations of landmark localization were performed and compared against a leading DL approach. We further conducted qualitative experiments demonstrating: (1) how an MLLM can correct an initially incorrect prediction through reasoning, and (2) how the MLLM can sequentially navigate the C-arm toward a target location. Results: On both datasets, fine-tuned MLLMs demonstrate competitive performance across all localization tasks when compared with the DL approach. In the qualitative experiments, the MLLMs provide evidence of reasoning and spatial awareness. Conclusion: This study shows that fine-tuned MLLMs achieve accurate skeletal landmark localization and hold promise for agentic autonomous C-arm control. Our code is available at https://github.com/marszzibros/C-arm-localization-LLMs.git

2604.18729 2026-04-22 cs.CL

Investigating Counterfactual Unfairness in LLMs towards Identities through Humor

Shubin Kim, Yejin Son, Junyeong Park, Keummin Ka, Seungbeen Lee, Jaeyoung Lee, Hyeju Jang, Alice Oh, Youngjae Yu

Comments Accepted to ACL 2026 Main Conference. The first two authors contributed equally. The last three authors are co-corresponding authors

2604.18728 2026-04-22 cs.LG cs.AI

The Cost of Relaxation: Evaluating the Error in Convex Neural Network Verification

Merkouris Papamichail, Konstantinos Varsos, Giorgos Flouris, João Marques-Silva

2604.18725 2026-04-22 cs.CV

Colour Extraction Pipeline for Odonates using Computer Vision

Megan Mirnalini Sundaram Rajaraman, Fons J. Verbeek, Vincent J. Kalkman, Rita Pucci

Comments 18 pages (excluding references), 12 figures, to be submitted to NCCV 2026

2604.18722 2026-04-22 cs.CL

Scripts Through Time: A Survey of the Evolving Role of Transliteration in NLP

Thanmay Jayakumar, Deepon Halder, Raj Dabre

Comments 9 pages, ACL 2026 (Findings)