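arXivDaily daily academic digest: synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.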

2604.11129 2026-04-14 cs.CL

DeCoVec: Building Decoding Space based Task Vector for Large Language Models via In-Context Learning

Feiyang Li, Yile Wang

Comments Accepted to ACL 2026 Findings

English abstract

Task vectors, which represent directions in model or activation spaces that encode task-specific behaviors, have emerged as a promising tool for steering large language models (LLMs). However, existing approaches typically require fine-tuning or invasive manipulation of internal states, limiting their flexibility and scalability. We propose DeCoVec (Decoding Space based Task Vector), a training-free and non-invasive framework that constructs task vectors directly in the decoding space by leveraging in-context learning (ICL). Specifically, DeCoVec captures the task essence as the difference between the output logit distributions of few-shot and zero-shot prompts, then steers generation by injecting this vector into the decoding process. Experiments across seven LLMs (0.5B to 9B parameters) on TruthfulQA, Math-500, and AQUA-RAT show that DeCoVec consistently outperforms standard few-shot baselines, with gains of up to +5.50 in average accuracy. Further analysis demonstrates that DeCoVec effectively suppresses generation degeneration and logical flaws while exhibiting strong robustness to demonstration ordering, all without incurring additional input token costs. Our method offers a training-free and non-invasive solution for LLM steering without requiring weight updates or auxiliary models.
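The core idea in the abstract above, a task vector formed as the difference between few-shot and zero-shot output logits and then injected at decoding time, can be illustrated with a minimal sketch. This is not the authors' implementation. The toy logit values, the function names, the scaling factor `alpha`, and the choice to add the vector to the zero-shot logits are all assumptions made for illustration, since the abstract does not specify these details.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def task_vector(few_shot_logits, zero_shot_logits):
    """Difference of next-token logits: few-shot minus zero-shot."""
    return [f - z for f, z in zip(few_shot_logits, zero_shot_logits)]


def steer(base_logits, vector, alpha=1.0):
    """Add the scaled task vector to the base logits (alpha is a hypothetical knob)."""
    return [b + alpha * v for b, v in zip(base_logits, vector)]


# Made-up next-token logits over a 4-token vocabulary.
zero_shot = [2.0, 1.0, 0.5, 0.1]  # prompt without demonstrations
few_shot = [1.0, 3.0, 0.5, 0.1]   # same query with in-context demonstrations

vec = task_vector(few_shot, zero_shot)
steered = steer(zero_shot, vec, alpha=1.5)
probs = softmax(steered)
best = max(range(len(probs)), key=probs.__getitem__)

print("task vector:", vec)
print("steered logits:", steered)
print("argmax token index:", best)
```

With `alpha=1.0` this sketch simply reproduces the few-shot logits; values above 1 push further along the few-shot-minus-zero-shot direction, which is one plausible reading of how such a vector could improve on a plain few-shot baseline.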

2604.11125 2026-04-14 cs.AI

A Proposed Biomedical Data Policy Framework to Reduce Fragmentation, Improve Quality, and Incentivize Sharing in Indian Healthcare in the era of Artificial Intelligence and Digital Health

Nikhil Mehta, Sachin Gupta, Gouri RP Anand

2604.11122 2026-04-14 cs.CV cs.AI

Semantic-Geometric Dual Compression: Training-Free Visual Token Reduction for Ultra-High-Resolution Remote Sensing Understanding

Yueying Li, Fengxiang Wang, Yan Li, Mingshuo Chen, Mengying Zhao, Long Lan

2604.11118 2026-04-14 cs.LG stat.ML

Distributionally Robust K-Means Clustering

Vikrant Malik, Taylan Kargin, Babak Hassibi

2604.11112 2026-04-14 cs.LG cs.CV

Quantum-Gated Task-interaction Knowledge Distillation for Pre-trained Model-based Class-Incremental Learning

Linjie Li, Huiyu Xiao, Jiarui Cao, Zhenyu Wu, Yang Ji

Comments Accepted to CVPR2026

2604.11104 2026-04-14 cs.AI cs.IR cs.LG cs.NE

Frugal Knowledge Graph Construction with Local LLMs: A Zero-Shot Pipeline, Self-Consistency and Wisdom of Artificial Crowds

Pierre Jourlin

Comments Source code and raw results available: https://github.com/jourlin/synsynth (Hippocratic licence)

2604.11102 2026-04-14 cs.CV cs.MM

OmniScript: Towards Audio-Visual Script Generation for Long-Form Cinematic Video

Junfu Pu, Yuxin Chen, Teng Wang, Ying Shan

Comments Project Page: https://arcomniscript.github.io

2604.11097 2026-04-14 cs.CV

CDPR: Cross-modal Diffusion with Polarization for Reliable Monocular Depth Estimation

Rongjia Yu, Tong Jia, Hao Wang, Xiaofang Li, Xiao Yang, Zinuo Zhang, Cuiwei Liu

Comments preprint version of IEEE TMM 2026 Regular Paper

2604.11096 2026-04-14 cs.CL cs.AI cs.SD

Efficient Training for Cross-lingual Speech Language Models

Yan Zhou, Qingkai Fang, Yun Hong, Yang Feng

Comments Accepted to Findings of ACL 2026

2604.11095 2026-04-14 cs.LG cs.AI

Bottleneck Tokens for Unified Multimodal Retrieval

Siyu Sun, Jing Ren, Zhaohe Liao, Dongxiao Mao, Xiangyuan Ren, Yiyi Zhang, Haohua Zhao, Weixiong Lin, Jiang Shaohua, Liqing Zhang, Yuchao Zheng

2604.11091 2026-04-14 cs.CV

LDEPrompt: Layer-importance guided Dual Expandable Prompt Pool for Pre-trained Model-based Class-Incremental Learning

Linjie Li, Zhenyu Wu, Huiyu Xiao, Yang Ji

Comments Accepted to ICASSP2026

2604.11090 2026-04-14 cs.RO

Simulator Adaptation for Sim-to-Real Learning of Legged Locomotion via Proprioceptive Distribution Matching

Jeremy Dao, Alan Fern

2604.11082 2026-04-14 cs.CV

RESP: Reference-guided Sequential Prompting for Visual Glitch Detection in Video Games

Yakun Yu, Ashley Wiens, Adrián Barahona-Ríos, Benedict Wilkins, Saman Zadtootaghaj, Nabajeet Barman, Cor-Paul Bezemer

2604.11081 2026-04-14 cs.CV

MapATM: Enhancing HD Map Construction through Actor Trajectory Modeling

Mingyang Li, Brian Lee, Rui Zuo, Brent Bacchus, Priyantha Mudalige, Qinru Qiu

Comments 6 pages, 4 figures, 5 tables

2604.10970 2026-04-14 cs.CV

Using Deep Learning Models Pretrained by Self-Supervised Learning for Protein Localization

Ben Isselmann, Dilara Göksu, Heinz Neumann, Andreas Weinmann

Comments 29 pages, 8 figures, submitted to BMC Bioinformatics. arXiv admin note: text overlap with arXiv:2602.05527

2604.10799 2026-04-14 cs.CL cs.AI

Advancing Polish Language Modeling through Tokenizer Optimization in the Bielik v3 7B and 11B Series

Krzysztof Ociepa, Łukasz Flis, Remigiusz Kinas, Krzysztof Wróbel, Adrian Gwoździej

Comments arXiv admin note: text overlap with arXiv:2601.11579

2604.09459 2026-04-14 cs.CL

From Reasoning to Agentic: Credit Assignment in Reinforcement Learning for Large Language Models

Chenchen Zhang

2604.09364 2026-04-14 cs.CV cs.CL

Arbitration Failure, Not Perceptual Blindness: How Vision-Language Models Resolve Visual-Linguistic Conflicts

Farhad Nooralahzadeh, Omid Rohanian, Yi Zhang, Jonathan Fürst, Kurt Stockinger

English abstract

When a Vision-Language Model (VLM) sees a blue banana and answers "yellow", is the problem one of perception or of arbitration? We explore this question across ten VLMs of various sizes and reveal an Encoding-Grounding Dissociation: models that fail to report what they see (and thus give a wrong answer) still encode the visual evidence as strongly as models that give the correct answer. Using Multimodal Arbitration Crossover (MAC) analysis with layer-by-layer Logit Lens probing, we track the competition between visual and prior signals across every layer of each model. We show that visual attributes are linearly decodable from early layers (AUC > 0.86), and that decoding accuracy remains nearly identical for successful and failed samples. However, the gap in the final-layer logit, not the strength of encoding, better predicts grounding outcomes, with a correlation of ρ = 0.847. Having studied when VLMs base their answers on image cues rather than prior knowledge, we then turn to the causal relationships, which we establish through full-sequence activation patching. Standard last-token interventions from LLM interpretability have no effect on VLMs. In contrast, replacing the full token sequence at layers identified by MAC alters 60 to 84% of outputs. Partial-token decomposition shows that image tokens carry almost all of the causal impact, while text tokens carry none. Scaling addresses the remaining architectural differences to achieve perfect retention. Moving from diagnosis to intervention, we show that training-free activation steering, both linear and sparse autoencoder-guided, in early layers can improve visual grounding by up to +3.8%, although performance degrades in some setups. Overall, these findings lead to a clear conclusion: VLMs already see well, but the challenge is acting on what they see. Targeted interventions can help to bridge this gap.

2604.09249 2026-04-14 cs.CV cs.IR

FashionStylist: An Expert Knowledge-enhanced Multimodal Dataset for Fashion Understanding

Kaidong Feng, Zhuoxuan Huang, Huizhong Guo, Yuting Jin, Xinyu Chen, Yue Liang, Yifei Gai, Li Zhou, Yunshan Ma, Zhu Sun

2604.09168 2026-04-14 cs.CV

ELT: Elastic Looped Transformers for Visual Generation

Sahil Goyal, Swayam Agrawal, Gautham Govind Anil, Prateek Jain, Sujoy Paul, Aditya Kusupati

2604.09066 2026-04-14 cs.CL

Anchored Sliding Window: Toward Robust and Imperceptible Linguistic Steganography

Ruiyi Yan, Shiao Meng, Yugo Murawaki

Comments ACL2026 Main

2604.08718 2026-04-14 cs.CV cs.AI cs.RO

Accelerating Transformer-Based Monocular SLAM via Geometric Utility Scoring

Xinmiao Xiong, Bangya Liu, Hao Wang, Dayou Li, Nuo Chen, Andrew Feng, Mingyu Ding, Suman Banerjee, Yang Zhou, Zhiwen Fan

2604.08701 2026-04-14 cs.CV cs.LG

Unified Multimodal Uncertain Inference

Dengjia Zhang, Alexander Martin, William Jurayj, Kenton Murray, Benjamin Van Durme, Reno Kriz

Comments Update citations

2604.08538 2026-04-14 cs.CV

ParseBench: A Document Parsing Benchmark for AI Agents

Boyang Zhang, Sebastián G. Acosta, Preston Carlson, Sacha Bron, Pierre-Loïc Doulcet, Daniel B. Ospina, Simon Suo

2604.08052 2026-04-14 cs.CL cs.CR

Efficient Provably Secure Linguistic Steganography via Range Coding

Ruiyi Yan, Yugo Murawaki

Comments ACL2026 Main

2604.07886 2026-04-14 cs.CL

Linear Representations of Hierarchical Concepts in Language Models

Masaki Sakata, Benjamin Heinzerling, Takumi Ito, Sho Yokoi, Kentaro Inui

Comments 27 pages, 18 figures, 11 tables

2604.07466 2026-04-14 cs.CL

Cross-Tokenizer LLM Distillation through a Byte-Level Interface

Avyav Kumar Singh, Yen-Chen Wu, Alexandru Cioba, Alberto Bernacchia, Davide Buffelli

2604.07209 2026-04-14 cs.CV

INSPATIO-WORLD: A Real-Time 4D World Simulator via Spatiotemporal Autoregressive Modeling

InSpatio Team, Donghui Shen, Guofeng Zhang, Haomin Liu, Haoyu Ji, Hujun Bao, Hongjia Zhai, Jialin Liu, Jing Guo, Nan Wang, Siji Pan, Weihong Pan, Weijian Xie, Xianbin Liu, Xiaojun Xiang, Xiaoyu Zhang, Xinyu Chen, Yifu Wang, Yipeng Chen, Zhenzhou Fan, Zhewen Le, Zhichao Ye, Ziqiang Zhao

2604.06939 2026-04-14 cs.CV

Grounded Forcing: Bridging Time-Independent Semantics and Proximal Dynamics in Autoregressive Video Synthesis

Jintao Chen, Chengyu Bai, Junjun Hu, Xinda Xue, Mu Xu

2604.05697 2026-04-14 cs.RO cs.SY eess.SY

GraspSense: Physically Grounded Grasp and Grip Planning for a Dexterous Robotic Hand via Language-Guided Perception and Force Maps

Elizaveta Semenyakina, Ivan Snegirev, Mariya Lezina, Miguel Altamirano Cabrera, Safina Gulyamova, Dzmitry Tsetserukou

Comments 6 pages, 4 figures, 4 tables. Minor non-semantic changes in the main scheme