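arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.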

2604.09425 2026-04-13 cs.CV

Do Vision Language Models Need to Process Image Tokens?

Sambit Ghosh, R. Venkatesh Babu, Chirag Agarwal

Comments Accepted (Oral) at TRUE-V Workshop CVPR 2026

Abstract

Vision Language Models (VLMs) have achieved remarkable success by integrating visual encoders with large language models (LLMs). While VLMs process dense image tokens across deep transformer stacks (incurring substantial computational overhead), it remains fundamentally unclear whether sustained image-token processing is necessary for their performance or whether visual representations meaningfully evolve from early to later layers. In this work, we systematically investigate the functional role of image tokens in VLMs and show that visual representations rapidly converge to a bounded-complexity regime, i.e., their entropy stabilizes, intrinsic dimensionality compresses, and trajectory curvature approaches a near-constant profile. In contrast, textual representations continue to undergo substantial restructuring across depth. Once stabilized, visual representations become largely interchangeable between layers, indicating limited additional transformation in deeper stages. Further, depth-wise visual truncation reveals that the necessity of visual processing is task-dependent: single-token predictions remain comparatively robust to truncated visual depth, but multi-token generation requires sustained access to visual representations. Under deterministic decoding, reducing visual depth perturbs intermediate reasoning trajectories more strongly than final outputs, suggesting that image tokens influence the structure of reasoning more than the ultimate conclusions. Collectively, these findings question the assumption that deeper visual processing is uniformly essential in VLMs, challenging the current paradigm of multimodal LLM architectures.

2604.09423 2026-04-13 cs.LG

Offline Local Search for Online Stochastic Bandits

Gerdus Benadè, Rathish Das, Thomas Lavastida

Comments Part of this work has been accepted at ACM SIGMETRICS 2026

2604.09419 2026-04-13 cs.LG cs.DC

NOMAD: Generating Embeddings for Massive Distributed Graphs

Aishwarya Sarkar, Sayan Ghosh, Nathan R. Tallent, Ali Jannesari

2604.09418 2026-04-13 cs.CL cs.LG

Automated Instruction Revision (AIR): A Structured Comparison of Task Adaptation Strategies for LLM

Solomiia Bilyk, Volodymyr Getmanskyi, Taras Firman

2604.09417 2026-04-13 cs.AI

Do We Really Need to Approach the Entire Pareto Front in Many-Objective Bayesian Optimisation?

Chao Jiang, Jingyu Huang, Miqing Li

Abstract

Many-objective optimisation, a subset of multi-objective optimisation, involves optimisation problems with more than three objectives. As the number of objectives increases, the number of solutions needed to adequately represent the entire Pareto front typically grows substantially. This makes it challenging, if not infeasible, to design a search algorithm capable of effectively exploring the entire Pareto front. This difficulty is particularly acute in the Bayesian optimisation paradigm, where sample efficiency is critical and only a limited number of solutions (often a few hundred) are evaluated. Moreover, after the optimisation process, the decision-maker eventually selects just one solution for deployment, regardless of how many high-quality, diverse solutions are available. In light of this, we argue that under a very limited evaluation budget, it may be more useful to focus on finding a single solution of the highest possible quality for the decision-maker, rather than aiming to approximate the entire Pareto front as existing many-/multi-objective Bayesian optimisation methods typically do. Bearing this idea in mind, this paper proposes a single point-based multi-objective search framework (SPMO) that aims to improve the quality of solutions along a direction that leads to a good tradeoff between objectives. Within SPMO, we present a simple acquisition function, called expected single-point improvement (ESPI), that works under both noiseless and noisy scenarios. We show that ESPI can be optimised effectively with gradient-based methods via the sample average approximation (SAA) approach and theoretically prove its convergence guarantees under the SAA. We also empirically demonstrate that the proposed SPMO is computationally tractable and outperforms state-of-the-art methods on a wide range of benchmark and real-world problems.

2604.09415 2026-04-13 cs.CV cs.AI cs.LG cs.RO

PhysInOne: Visual Physics Learning and Reasoning in One Suite

Siyuan Zhou, Hejun Wang, Hu Cheng, Jinxi Li, Dongsheng Wang, Junwei Jiang, Yixiao Jin, Jiayue Huang, Shiwei Mao, Shangjia Liu, Yafei Yang, Hongkang Song, Shenxing Wei, Zihui Zhang, Peng Huang, Shijie Liu, Zhengli Hao, Hao Li, Yitian Li, Wenqi Zhou, Zhihan Zhao, Zongqi He, Hongtao Wen, Shouwang Huang, Peng Yun, Bowen Cheng, Pok Kazaf Fu, Wai Kit Lai, Jiahao Chen, Kaiyuan Wang, Zhixuan Sun, Ziqi Li, Haochen Hu, Di Zhang, Chun Ho Yuen, Bing Wang, Zhihua Wang, Chuhang Zou, Bo Yang

Comments CVPR 2026. Siyuan, Hejun, Hu, Jinxi, Dongsheng, Junwei, Yixiao, Jiayue, and Shiwei are co-first authors. Project page: https://vlar-group.github.io/PhysInOne.html

2604.09411 2026-04-13 cs.CV

SynFlow: Scaling Up LiDAR Scene Flow Estimation with Synthetic Data

Qingwen Zhang, Xiaomeng Zhu, Chenhan Jiang, Patric Jensfelt

2604.09406 2026-04-13 cs.LG

OASIS: Online Activation Subspace Learning for Memory-Efficient Training

Sakshi Choudhary, Utkarsh Saxena, Kaushik Roy

2604.09405 2026-04-13 cs.CV

EGLOCE: Training-Free Energy-Guided Latent Optimization for Concept Erasure

Junyeong Ahn, Seojin Yoon, Sungyong Baik

2604.09391 2026-04-13 cs.LG cs.CV

Efficient Unlearning through Maximizing Relearning Convergence Delay

Khoa Tran, Simon S. Woo

2604.09389 2026-04-13 cs.LG cs.CL

Is More Data Worth the Cost? Dataset Scaling Laws in a Tiny Attention-Only Decoder

Götz-Henrik Wiegand, Lorena Raichle, Rico Städeli, Tomas Hrycej, Bernhard Bermeitinger, Siegfried Handschuh

Comments Presented as a paper at 3rd DATA-FM workshop @ ICLR 2026, Brazil. Published at 13th IEEE Swiss Conference on Data Science and AI (SDS 2026)

2604.09386 2026-04-13 cs.CV

Region-Constrained Group Relative Policy Optimization for Flow-Based Image Editing

Zhuohan Ouyang, Zhe Qian, Wenhuo Cui, Chaoqun Wang

2604.09377 2026-04-13 cs.CL

Task-Aware LLM Routing with Multi-Level Task-Profile-Guided Data Synthesis for Cold-Start Scenarios

Hui Liu, Bin Zou, Kecheng Chen, Jie Liu, Wenya Wang, Haoliang Li

Comments 30 pages, Accepted by ACL 2026 Main

2604.09366 2026-04-13 cs.CV

Robust 4D Visual Geometry Transformer with Uncertainty-Aware Priors

Ying Zang, Yidong Han, Chaotao Ding, Yuanqi Hu, Deyi Ji, Qi Zhu, Xuanfu Li, Jin Ma, Lingyun Sun, Tianrun Chen, Lanyun Zhu

2604.09359 2026-04-13 cs.LG

Bringing Clustering to MLL: Weakly-Supervised Clustering for Partial Multi-Label Learning

Yu Chen, Weijun Lv, Yue Huang, Xuhuan Zhu, Fang Li

2604.09358 2026-04-13 cs.LG cs.NE

Drift-Aware Online Dynamic Learning for Nonstationary Multivariate Time Series: Application to Sintering Quality Prediction

Yumeng Zhao, Shengxiang Yang, Xianpeng Wang

Abstract

Accurate prediction of nonstationary multivariate time series remains a critical challenge in complex industrial systems such as iron ore sintering. In practice, pronounced concept drift compounded by significant label verification latency rapidly degrades the performance of offline-trained models. Existing methods based on static architectures or passive update strategies struggle to simultaneously extract multi-scale spatiotemporal features and overcome the stability-plasticity dilemma without immediate supervision. To address these limitations, a Drift-Aware Multi-Scale Dynamic Learning (DA-MSDL) framework is proposed to maintain robust multi-output predictive performance via online adaptive mechanisms on nonstationary data streams. The framework employs a multi-scale bi-branch convolutional network as its backbone to disentangle local fluctuations from long-term trends, thereby enhancing representational capacity for complex dynamic patterns. To circumvent the label latency bottleneck, DA-MSDL leverages Maximum Mean Discrepancy (MMD) for unsupervised drift detection. By quantifying online statistical deviations in feature distributions, DA-MSDL proactively triggers model adaptation prior to inference. Furthermore, a drift-severity-guided hierarchical fine-tuning strategy is developed. Supported by prioritized experience replay from a dynamic memory queue, this approach achieves rapid distribution alignment while effectively mitigating catastrophic forgetting. Long-horizon experiments on real-world industrial sintering data and a public benchmark dataset demonstrate that DA-MSDL consistently outperforms representative baselines under severe concept drift. Exhibiting strong cross-domain generalization and predictive stability, the proposed framework provides an effective online dynamic learning paradigm for quality monitoring in nonstationary environments.
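The abstract above names Maximum Mean Discrepancy (MMD) as the unsupervised drift signal. As a rough illustration of that general idea only, and not the authors' DA-MSDL implementation, the sketch below computes a biased squared-MMD estimate between a reference batch of features and a recent window, then flags drift when it exceeds a threshold. The RBF kernel, the `bandwidth` value, the `threshold` value, and the function names are all assumptions made for this example.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth):
    """Gaussian (RBF) kernel matrix between rows of a and rows of b."""
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative values
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def mmd2_biased(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples x and y."""
    k_xx = rbf_kernel(x, x, bandwidth).mean()
    k_yy = rbf_kernel(y, y, bandwidth).mean()
    k_xy = rbf_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy


def drift_detected(reference, window, threshold, bandwidth=1.0):
    """Hypothetical trigger: report drift when squared MMD exceeds threshold."""
    return bool(mmd2_biased(reference, window, bandwidth) > threshold)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.normal(size=(200, 3))           # features seen at training time
    stable = rng.normal(size=(200, 3))              # same distribution
    shifted = rng.normal(loc=2.0, size=(200, 3))    # mean-shifted distribution
    print("stable :", mmd2_biased(reference, stable))
    print("shifted:", mmd2_biased(reference, shifted))
    print("drift? :", drift_detected(reference, shifted, threshold=0.1))
```

No labels are needed for this check, which is why a statistic of this kind can run before delayed ground truth arrives. In practice the threshold would need calibration, for example from a permutation test on reference data.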

2604.09352 2026-04-13 cs.CV

LuMon: A Comprehensive Benchmark and Development Suite with Novel Datasets for Lunar Monocular Depth Estimation

Aytaç Sekmen, Fatih Emre Gunes, Furkan Horoz, Hüseyin Umut Işık, Mehmet Alp Ozaydin, Onur Altay Topaloglu, Şahin Umutcan Üstündaş, Yurdasen Alp Yeni, Halil Ersin Soken, Erol Sahin, Ramazan Gokberk Cinbis, Sinan Kalkan

Comments This paper will be published in CVPRW 2026

2604.09338 2026-04-13 cs.AI cs.CL

Mind the Gap Between Spatial Reasoning and Acting! Step-by-Step Evaluation of Agents With Spatial-Gym

Lars Benedikt Kaesberg, Tianyu Yang, Niklas Bauer, Terry Ruas, Jan Philip Wahle, Bela Gipp

2604.09336 2026-04-13 cs.LG

Hierarchical Flow Decomposition for Turning Movement Prediction at Signalized Intersections

Md Atiqur Rahman Mallick, Kamrul Hasan, Pulock Das, Liang Hong, S M Shazzad Rassel

Comments Accepted to IEEE SoutheastCon 2026. 6 pages, 5 figures

2604.09331 2026-04-13 cs.LG cs.SY eess.SY

Stability Enhanced Gaussian Process Variational Autoencoders

Carl R. Richardson, Jichen Zhang, Ethan King, Ján Drgoňa

2604.09330 2026-04-13 cs.RO cs.CV

VAG: Dual-Stream Video-Action Generation for Embodied Data Synthesis

Xiaolei Lang, Yang Wang, Yukun Zhou, Chaojun Ni, Kerui Li, Jiagang Zhu, Tianze Liu, Jiajun Lv, Xingxing Zuo, Yun Ye, Guan Huang, Xiaofeng Wang, Zheng Zhu

2604.09327 2026-04-13 cs.CV

From Frames to Events: Rethinking Evaluation in Human-Centric Video Anomaly Detection

Narges Rashvand, Shanle Yao, Armin Danesh Pazho, Babak Rahimi Ardabili, Hamed Tabkhi

2604.09326 2026-04-13 cs.RO cs.CV

Multimodal Anomaly Detection for Human-Robot Interaction

Guilherme Ribeiro, Iordanis Antypas, Leonardo Bizzaro, João Bimbo, Nuno Cruz Garcia

2604.09324 2026-04-13 cs.CV

Structure-Aware Fine-Grained Gaussian Splatting for Expressive Avatar Reconstruction

Yuze Su, Hongsong Wang, Jie Gui, Liang Wang

Comments The code is on GitHub: https://github.com/Su245811YZ/SFGS

2604.09308 2026-04-13 cs.AI

Constraint-Aware Corrective Memory for Language-Based Drug Discovery Agents

Maochen Sun, Youzhi Zhang, Gaofeng Meng

2604.09303 2026-04-13 cs.RO cs.LG cs.SY eess.SY

Online Intention Prediction via Control-Informed Learning

Tianyu Zhou, Zihao Liang, Zehui Lu, Shaoshuai Mou

2604.09294 2026-04-13 cs.RO

A Benchmark of Dexterity for Anthropomorphic Robotic Hands

Davide Liconti, Yuning Zhou, Yasunori Toshimitsu, Ronan Hinchet, Robert K. Katzschmann

2604.09289 2026-04-13 cs.LG

Meta-Learned Basis Adaptation for Parametric Linear PDEs

Vikas Dwivedi, Monica Sigovan, Bruno Sixou

2604.09288 2026-04-13 cs.LG

Are Independently Estimated View Uncertainties Comparable? Unified Routing for Trusted Multi-View Classification

Yilin Zhang, Cai Xu, Haishun Chen, Ziyu Guan, Wei Zhao

Comments 14 pages, under review

2604.09285 2026-04-13 cs.AI

SAGE: A Service Agent Graph-guided Evaluation Benchmark

Ling Shi, Yuqin Dai, Ziyin Wang, Ning Gao, Wei Zhang, Chaozheng Wang, Yujie Wang, Wei He, Jinpeng Wang, Deiyi Xiong