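arXivDaily: a daily academic digest synchronized with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.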

2604.05687 2026-04-23 cs.CV

3D Smoke Scene Reconstruction Guided by Vision Priors from Multimodal Large Language Models

Xinye Zheng, Fei Wang, Yiqi Nie, Kun Li, Junjie Chen, Jiaqi Zhao, Yanyan Wei, Zhiliang Wu

Abstract

Reconstructing 3D scenes from smoke-degraded multi-view images is particularly difficult because smoke introduces strong scattering effects, view-dependent appearance changes, and severe degradation of cross-view consistency. To address these issues, we propose a framework that integrates visual priors with efficient 3D scene modeling. We employ Nano-Banana-Pro to enhance smoke-degraded images and provide clearer visual observations for reconstruction, and we develop Smoke-GS, a medium-aware 3D Gaussian Splatting framework for smoke scene reconstruction and restoration-oriented novel view synthesis. Smoke-GS models the scene using explicit 3D Gaussians and introduces a lightweight view-dependent medium branch to capture direction-dependent appearance variations caused by smoke. Our method preserves the rendering efficiency of 3D Gaussian Splatting while improving robustness to smoke-induced degradation. Results demonstrate the effectiveness of our method for generating consistent and visually clear novel views in challenging smoke environments.

2604.01577 2026-04-23 cs.LG cs.AI

Thinking While Listening: Fast-Slow Recurrence for Long-Horizon Sequential Modeling

Shota Takashiro, Masanori Koyama, Takeru Miyato, Yusuke Iwasawa, Yutaka Matsuo, Kohei Hayashi

2603.28032 2026-04-23 cs.RO cs.AI cs.CV cs.HC

CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence

Tianle Zeng, Yanci Wen, Hong Zhang

Comments Prebuilt binaries, project page, full source code, and community discussion group are all available at: https://github.com/louiszengCN/CarlaAir

2603.26842 2026-04-23 cs.LG cs.AI cs.CV

VAN-AD: Visual Masked Autoencoder with Normalizing Flow For Time Series Anomaly Detection

PengYu Chen, Shang Wan, Xiaohou Shi, Yuan Chang, Yan Sun, Sajal K. Das

Comments 13 pages, 20 figures

2603.26747 2026-04-23 cs.CV cs.LG

From Diffusion to Flow: Efficient Motion Generation in MotionGPT3

Jaymin Ban, JiHong Jeon, SangYeop Jeong

Comments ReALM-GEN Workshop ICLR 2026

2603.25383 2026-04-23 cs.CV

CLIP-RD: Relative Distillation for Efficient CLIP Knowledge Distillation

Jeannie Chung, Hanna Jang, Ingyeong Yang, Uiwon Hwang, Jaehyeong Sim

2603.25132 2026-04-23 cs.CV cs.LG

Robust Principal Component Completion

Yinjian Wang, Wei Li, Yuanyuan Gui, James E. Fowler, Gemine Vivone

2603.23694 2026-04-23 cs.CV

CoRe: Joint Optimization with Contrastive Learning for Medical Image Registration

Eytan Kats, Christoph Grossbroehmer, Ziad Al-Haj Hemidi, Fenja Falta, Wiebke Heyer, Mattias P. Heinrich

Comments Preprint

2603.23089 2026-04-23 cs.CV

A Synchronized Audio-Visual Multi-View Capture System

Xiangwei Shi, Gara Dorta, Ruud de Jong, Ojas Shirekar, Chirag Raman

2603.23043 2026-04-23 cs.LG cs.AI

Assessing the Robustness of Climate Foundation Models under No-Analog Distribution Shifts

Maria Conchita Agana Navarro, Geng Li, Theo Wolf, Maria Perez-Ortiz

2603.22885 2026-04-23 cs.LG

A Heterogeneous Long-Micro Scale Cascading Architecture for General Aviation Health Management

Xinhang Chen, Zhihuan Wei, Yang Hu, Zhiguo Zeng, Kang Zeng, Wei Wang

Comments Significant methodological flaws have been identified in the experimental validation and metric computation procedures that undermine the reliability of the reported results. A comprehensive revision is underway

2603.21373 2026-04-23 cs.LG cs.CL

PLR: Plackett-Luce for Reordering In-Context Learning Examples

Pawel Batorski, Paul Swoboda

2603.17478 2026-04-23 cs.LG cs.AI

Auto-Unrolled Proximal Gradient Descent: An AutoML Approach to Interpretable Waveform Optimization

Ahmet Kaplan

Comments 7 pages

2603.16059 2026-04-23 cs.RO

Ultrafast Sampling-based Kinodynamic Planning via Differential Flatness

Thai Duong, Clayton W. Ramsey, Zachary Kingston, Wil Thomason, Lydia E. Kavraki

Comments 20 pages, 10 figures, under review

2603.09283 2026-04-23 cs.CV

From Ideal to Real: Stable Video Object Removal under Imperfect Conditions

Jiagao Hu, Yuxuan Chen, Fuhao Li, Zepeng Wang, Fei Wang, Daiguo Zhou, Jian Luan

Comments Project Page: https://xiaomi-research.github.io/svor/

2603.07474 2026-04-23 cs.CL cs.AI

Cross-Modal Taxonomic Generalization in (Vision-) Language Models

Tianyang Xu, Marcelo Sandoval-Castaneda, Karen Livescu, Greg Shakhnarovich, Kanishka Misra

Comments ACL 2026 (main conference)

2603.07076 2026-04-23 cs.CV

Retinex Meets Language: A Physics-Semantics-Guided Underwater Image Enhancement Network

Shixuan Xu, Yabo Liu, Chao Huang, Junyu Dong, Xinghui Dong

2603.02364 2026-04-23 cs.SD eess.AS

When Spoof Detectors Travel: Evaluation Across 66 Languages in the Low-Resource Language Spoofing Corpus

Kirill Borodin, Vasiliy Kudryavtsev, Maxim Maslov, Mikhail Gorodnichev, Grach Mkrtchian

Comments This paper has been submitted to Interspeech 2026 for review

2603.01168 2026-04-23 cs.LG cs.AI

SphUnc: Hyperspherical Uncertainty Decomposition and Causal Identification via Information Geometry

Rong Fu, Chunlei Meng, Jinshuo Liu, Dianyu Zhao, Yongtai Liu, Yibo Meng, Xiaowen Ma, Wangyu Wu, Yangchen Zeng, Shuaishuai Cao, Simon Fong

Comments 22 pages, 15 figures

2603.00696 2026-04-23 cs.CL

DRIV-EX: Counterfactual Explanations for Driving LLMs

Amaia Cardiel, Eloi Zablocki, Elias Ramzi, Eric Gaussier

Comments Accepted at ACL Findings 2026

2602.19470 2026-04-23 cs.CV physics.optics

Physics-informed Active Polarimetric 3D Imaging for Specular Surfaces

Jiazhang Wang, Hyelim Yang, Tianyi Wang, Florian Willomitzer

2602.17711 2026-04-23 cs.SD eess.AS

Interpreting Multi-Branch Anti-Spoofing Architectures: Correlating Internal Strategy with Empirical Performance

Ivan Viakhirev, Kirill Borodin, Mikhail Gorodnichev, Grach Mkrtchian

Comments Published in MDPI Mathematics (see https://www.mdpi.com/2227-7390/14/2/381)

DOI: 10.3390/math14020381
Journal ref: Mathematics 14 (2026)

Abstract

Multi-branch deep neural networks like AASIST3 achieve performance comparable to the state of the art in audio anti-spoofing, yet their internal decision dynamics remain opaque compared to traditional input-level saliency methods. While existing interpretability efforts largely focus on visualizing input artifacts, the way individual architectural branches cooperate or compete under different spoofing attacks is not well characterized. This paper develops a framework for interpreting AASIST3 at the component level. Intermediate activations from fourteen branches and global attention modules are modeled with covariance operators whose leading eigenvalues form low-dimensional spectral signatures. These signatures train a CatBoost meta-classifier to generate TreeSHAP-based branch attributions, which we convert into normalized contribution shares and confidence scores (Cb) to quantify the model's operational strategy. By analyzing 13 spoofing attacks from the ASVspoof 2019 benchmark, we identify four operational archetypes, ranging from Effective Specialization (e.g., A09, Equal Error Rate (EER) 0.04%, C=1.56) to Ineffective Consensus (e.g., A08, EER 3.14%, C=0.33). Crucially, our analysis exposes a Flawed Specialization mode where the model places high confidence in an incorrect branch, leading to severe performance degradation for attacks A17 and A18 (EER 14.26% and 28.63%, respectively). These quantitative findings link internal architectural strategy directly to empirical reliability, highlighting specific structural dependencies that standard performance metrics overlook.
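
As a rough illustration of two steps named in the abstract above (spectral signatures from covariance eigenvalues, and normalized contribution shares), the following sketch may help. It is not the authors' implementation: the function names, the choice of k = 3, the synthetic activations, and the placeholder attribution values are all assumptions made for illustration, and the CatBoost and TreeSHAP stages are omitted entirely.

```python
import numpy as np


def spectral_signature(activations: np.ndarray, k: int = 3) -> np.ndarray:
    """Return the k largest eigenvalues of the feature covariance matrix.

    activations: array of shape (n_samples, n_features) for one branch.
    The value of k is an illustrative assumption, not taken from the paper.
    """
    centered = activations - activations.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (activations.shape[0] - 1)
    eigenvalues = np.linalg.eigvalsh(cov)  # symmetric input, ascending order
    return eigenvalues[::-1][:k]           # leading eigenvalues first


def contribution_shares(attributions: np.ndarray) -> np.ndarray:
    """Normalize absolute per-branch attributions so that they sum to 1."""
    magnitudes = np.abs(attributions)
    return magnitudes / magnitudes.sum()


# Synthetic stand-in for one branch's intermediate activations.
rng = np.random.default_rng(0)
branch_activations = rng.normal(size=(200, 16))
signature = spectral_signature(branch_activations, k=3)

# Placeholder attribution values for three hypothetical branches.
shares = contribution_shares(np.array([0.6, -0.3, 0.1]))
```

Here `signature` is a length-3 vector that could serve as a low-dimensional feature for a downstream classifier, and `shares` expresses each hypothetical branch's attribution as a fraction of the total absolute attribution.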

2602.15861 2026-04-23 cs.CL cs.AI

CAST: Achieving Stable LLM-based Text Analysis for Data Analytics

Jinxiang Xie, Zihao Li, Wei He, Rui Ding, Shi Han, Dongmei Zhang

Comments ACL 2026 Findings

2602.15353 2026-04-23 cs.CL cs.AI

NeuroSymActive: Differentiable Neural-Symbolic Reasoning with Active Exploration for Knowledge Graph Question Answering

Rong Fu, Yang Li, Zeyu Zhang, Jiekai Wu, Yaohua Liu, Shuaishuai Cao, Yangchen Zeng, Yuhang Zhang, Xiaojing Du, Simon Fong

Comments 26 pages, 7 figures

2602.13669 2026-04-23 cs.CV

EchoTorrent: Towards Swift, Sustained, and Streaming Multi-Modal Video Generation

Rang Meng, Weipeng Wu, Yuming Li, Chenguang Ma

2602.13232 2026-04-23 cs.AI cs.SE

PlotChain: Deterministic Checkpointed Evaluation of Multimodal LLMs on Engineering Plot Reading

Mayank Ravishankara

2602.10312 2026-04-23 cs.LG

Training-free retrieval-augmented generation with reinforced reasoning for flood damage nowcasting

Lipai Huang, Kai Yin, Chia-Fu Liu, Ali Mostafavi

Comments 18 pages, 3 figures, 8 tables, submitted to CACAIE journal

2602.10100 2026-04-23 cs.LG cs.CR

Towards Explainable Federated Learning: Understanding the Impact of Differential Privacy

Júlio Oliveira, Rodrigo Ferreira, André Riker, Glaucio H. S. Carvalho, Eirini Eleni Tsilopoulou

2602.09781 2026-04-23 cs.LG cs.AI

Explainability in Generative Medical Diffusion Models: A Faithfulness-Based Analysis on MRI Synthesis

Surjo Dey, Pallabi Saikia

Comments Accepted at 3rd World Congress on Smart Computing (WCSC2026) conference

2602.07473 2026-04-23 cs.AI cs.FL

Computing the Reachability Value of Posterior-Deterministic POMDPs

Nathanaël Fijalkow, Arka Ghosh, Roman Kniazev, Guillermo A. Pérez, Pierre Vandenhove