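arXivDaily: a daily academic digest that syncs the full arXiv dataset and provides AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.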

2604.11225 2026-04-14 cs.CV cs.CL

Sign Language Recognition in the Age of LLMs

Vaclav Javorek, Jakub Honzik, Ivan Gruber, Tomas Zelezny, Marek Hruz

Comments Accepted at the CVPR 2026 Workshop on Multimodal Sign Language Research (MSLR), 8 pages, 3 figures

2604.11218 2026-04-14 cs.CV

H-SPAM: Hierarchical Superpixel Anything Model

Julien Walther, Rémi Giraud, Michaël Clément

2604.11216 2026-04-14 cs.AI

Measuring the Authority Stack of AI Systems: Empirical Analysis of 366,120 Forced-Choice Responses Across 8 AI Models

Seulki Lee

Comments 18 pages, 15 tables, no figures. AIO Working Paper. Companion to: S. Lee (2026a)

2604.11214 2026-04-14 cs.CL

HiEdit: Lifelong Model Editing with Hierarchical Reinforcement Learning

Yangfan Wang, Tianyang Sun, Chen Tang, Jie Liu, Wei Cai, Jingchi Jiang

Comments Accepted by ACL 2026

2604.11211 2026-04-14 cs.CV cs.LG cs.MM

3DTV: A Feedforward Interpolation Network for Real-Time View Synthesis

Stefan Schulz, Fernando Edelstein, Hannah Dröge, Matthias B. Hullin, Markus Plack

2604.11209 2026-04-14 cs.CL cs.AI

Exploring Knowledge Conflicts for Faithful LLM Reasoning: Benchmark and Method

Tianzhe Zhao, Jiaoyan Chen, Shuxiu Zhang, Haiping Zhu, Qika Lin, Jun Liu

Comments Accepted at SIGIR 2026

2604.11207 2026-04-14 cs.CV

LoViF 2026 Challenge on Human-oriented Semantic Image Quality Assessment: Methods and Results

Xin Li, Daoli Xu, Wei Luo, Guoqiang Xiang, Haoran Li, Chengyu Zhuang, Zhibo Chen, Jian Guan, Weping Li, Weixia Zhang, Wei Sun, Zhihua Wang, Dandan Zhu, Chengguang Zhu, Ayush Gupta, Rachit Agarwal, Shouvik Das, Biplab Ch Das, Amartya Ghosh, Kanglong Fan, Wen Wen, Shuyan Zhai, Tianwu Zhi, Aoxiang Zhang, Jianzhao Liu, Yabin Zhang, Jiajun Wang, Yipeng Sun, Kaiwei Lian, Banghao Yin

Comments Accepted by CVPR 2026 Workshop; LoViF Challenge

2604.11200 2026-04-14 cs.LG cs.AI stat.ML

ShapShift: Explaining Model Prediction Shifts with Subgroup Conditional Shapley Values

Tom Bewley, Salim I. Amoukou, Emanuele Albini, Saumitra Mishra, Manuela Veloso

2604.11197 2026-04-14 cs.CV

MedP-CLIP: Medical CLIP with Region-Aware Prompt Integration

Jiahui Peng, He Yao, Jingwen Li, Yanzhou Su, Sibo Ju, Yujie Lu, Jin Ye, Hongchun Lu, Xue Li, Lincheng Jiang, Min Zhu, Junlong Cheng

2604.11195 2026-04-14 cs.CV cs.AI

Towards Adaptive Open-Set Object Detection via Category-Level Collaboration Knowledge Mining

Yuqi Ji, Junjie Ke, Lihuo He, Lizhi Wang, Xinbo Gao

Comments 15 pages, 9 figures, accepted by IEEE Transactions on Image Processing

2604.11193 2026-04-14 cs.CL

TRACE: An Experiential Framework for Coherent Multi-hop Knowledge Graph Question Answering

Yingxu Wang, Jiaxin Huang, Mengzhu Wang, Nan Yin

2604.11188 2026-04-14 cs.CL cs.AI

MathAgent: Adversarial Evolution of Constraint Graphs for Mathematical Reasoning Data Synthesis

Zixiong Yu, Jun Rao, Guhan Chen, Songtao Tian, Bohan Li, Jiansheng Wei, Min Zhang, Xiaojun Meng

Comments Accepted by ACL 2026 Findings

2604.11177 2026-04-14 cs.CV

Do Thought Streams Matter? Evaluating Reasoning in Gemini Vision-Language Models for Video Scene Understanding

Shivam Sharma, Sankalp Nagaonkar, Ashish Choithani, Ashutosh Trivedi

2604.11174 2026-04-14 cs.RO cs.AI

EmbodiedGovBench: A Benchmark for Governance, Recovery, and Upgrade Safety in Embodied Agent Systems

Xue Qin, Simin Luan, John See, Cong Yang, Zhijun Li

Comments 34 pages, 7 tables. Code: https://github.com/s20sc/embodied-gov-bench

2604.11171 2026-04-14 cs.CV

Development and evaluation of CADe systems in low-prevalence setting: The RARE25 challenge for early detection of Barrett's neoplasia

Tim J. M. Jaspers, Francisco Caetano, Cris H. B. Claessens, Carolus H. J. Kusters, Rixta A. H. van Eijck van Heslinga, Floor Slooter, Jacques J. Bergman, Peter H. N. De With, Martijn R. Jong, Albert J. de Groof, Fons van der Sommen

Comments The final author list is currently being finalized and will be updated in subsequent versions

2604.11170 2026-04-14 cs.CV

Do Instance Priors Help Weakly Supervised Semantic Segmentation?

Anurag Das, Anna Kukleva, Xinting Hu, Yuki M. Asano, Bernt Schiele

Comments 23 pages, 15 figures

2604.11164 2026-04-14 cs.CV

RADA: Region-Aware Dual-encoder Auxiliary learning for Barely-supervised Medical Image Segmentation

Shuang Zeng, Boxu Xie, Lei Zhu, Xinliang Zhang, Jiakui Hu, Zhengjian Yao, Yuanwei Li, Yuxing Lu, Yanye Lu

2604.11162 2026-04-14 cs.CV

Boxes2Pixels: Learning Defect Segmentation from Noisy SAM Masks

Camile Lendering, Erkut Akdag, Egor Bondarev

Comments Accepted for presentation at the AI4RWC Workshop at CVPR 2026

2604.11156 2026-04-14 cs.CV

rPPG-VQA: A Video Quality Assessment Framework for Unsupervised rPPG Training

Tianyang Dai, Ming Chang, Yan Chen, Yang Hu

Comments Accepted by CVPR 2026

2604.11154 2026-04-14 cs.AI

Environmental Footprint of GenAI Research: Insights from the Moshi Foundation Model

Marta López-Rauhut, Loic Landrieu, Mathieu Aubry, Anne-Laure Ligozat

Comments 28 pages, 12 figures, 8 tables

2604.11152 2026-04-14 cs.CL

SHARE: Social-Humanities AI for Research and Education

João Gonçalves, Sonia de Jager, Petr Knoth, David Pride, Nick Jelicic

Comments 23 pages, 9 figures, 4 tables

2604.11151 2026-04-14 cs.LG stat.ML

Gradient-Variation Regret Bounds for Unconstrained Online Learning

Yuheng Zhao, Andrew Jacobsen, Nicolò Cesa-Bianchi, Peng Zhao

2604.11144 2026-04-14 cs.CV cs.CL cs.MM

Hierarchical Textual Knowledge for Enhanced Image Clustering

Yijie Zhong, Yunfan Gao, Weipeng Jiang, Haofen Wang

Comments Accepted by CVPR 2026

2604.11142 2026-04-14 cs.CV

Naka-GS: A Bionics-inspired Dual-Branch Naka Correction and Progressive Point Pruning for Low-Light 3DGS

Runyu Zhu, SiXun Dong, Zhiqiang Zhang, Qingxia Ye, Zhihua Xu

2604.11141 2026-04-14 cs.LG cs.CR

Reducing Hallucination in Enterprise AI Workflows via Hybrid Utility Minimum Bayes Risk (HUMBR)

Chenhao Fang, Jordi Mola, Mark Harman, Jason Nawrocki, Vaibhav Shrivastava, Yue Cheng, Jay Minesh Shah, Katayoun Zand, Mansi Tripathi, Arya Pudota, Matthew Becker, Hervé Robert, Abhishek Gulati

2604.11140 2026-04-14 cs.CV

Sparse Hypergraph-Enhanced Frame-Event Object Detection with Fine-Grained MoE

Wei Bao, Yuehan Wang, Tianhang Zhou, Siqi Li, Yue Gao

2604.11138 2026-04-14 cs.RO cs.CV

ViserDex: Visual Sim-to-Real for Robust Dexterous In-hand Reorientation

Arjun Bhardwaj, Maximum Wilder-Smith, Mayank Mittal, Vaishakh Patil, Marco Hutter

2604.11136 2026-04-14 cs.CV cs.AI

BoxTuning: Directly Injecting the Object Box for Multimodal Model Fine-Tuning

Zekun Qian, Ruize Han, Wei Feng

2604.11135 2026-04-14 cs.RO cs.LG

AIM: Intent-Aware Unified world action Modeling with Spatial Value Maps

Liaoyuan Fan, Zetian Xu, Chen Cao, Wenyao Zhang, Mingqi Yuan, Jiayu Chen

English abstract

Pretrained video generation models provide strong priors for robot control, but existing unified world action models still struggle to decode reliable actions without substantial robot-specific training. We attribute this limitation to a structural mismatch: while video models capture how scenes evolve, action generation requires explicit reasoning about where to interact and the underlying manipulation intent. We introduce AIM, an intent-aware unified world action model that bridges this gap via an explicit spatial interface. Instead of decoding actions directly from future visual representations, AIM predicts an aligned spatial value map that encodes task-relevant interaction structure, enabling a control-oriented abstraction of future dynamics. Built on a pretrained video generation model, AIM jointly models future observations and value maps within a shared mixture-of-transformers architecture. It employs intent-causal attention to route future information to the action branch exclusively through the value representation. We further propose a self-distillation reinforcement learning stage that freezes the video and value branches and optimizes only the action head using dense rewards derived from projected value-map responses together with sparse task-level signals. To support training and evaluation, we construct a simulation dataset of 30K manipulation trajectories with synchronized multi-view observations, actions, and value-map annotations. Experiments on the RoboTwin 2.0 benchmark show that AIM achieves a 94.0% average success rate, significantly outperforming prior unified world action baselines. Notably, the improvement is more pronounced in long-horizon and contact-sensitive manipulation tasks, demonstrating the effectiveness of explicit spatial-intent modeling as a bridge between visual world modeling and robot control.
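
The abstract does not spell out how intent-causal attention is implemented. One plausible reading is a block attention mask in which action tokens can see value-map tokens but never future-video tokens directly. The sketch below illustrates that reading only; the token group names, the `ALLOWED` routing table, and the `build_mask` helper are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set

# Hypothetical token groups; the names are illustrative, not from the paper.
CONTEXT, VIDEO, VALUE, ACTION = "context", "video", "value", "action"

# Assumed routing rule: which key groups each query group may attend to.
# Action queries never attend to future-video keys, only to value keys,
# so future information reaches the action branch through the value map.
ALLOWED: Dict[str, Set[str]] = {
    CONTEXT: {CONTEXT},
    VIDEO: {CONTEXT, VIDEO, VALUE},
    VALUE: {CONTEXT, VIDEO, VALUE},
    ACTION: {CONTEXT, VALUE, ACTION},
}


def build_mask(groups: List[str]) -> List[List[bool]]:
    """Return mask[q][k] = True when query token q may attend to key token k."""
    return [[k in ALLOWED[q] for k in groups] for q in groups]


def demo_groups() -> List[str]:
    """A toy sequence: two context tokens, two video, one value, one action."""
    return [CONTEXT, CONTEXT, VIDEO, VIDEO, VALUE, ACTION]


if __name__ == "__main__":
    # Print one row per query token: 1 = may attend, 0 = blocked.
    for group, row in zip(demo_groups(), build_mask(demo_groups())):
        print(f"{group:>8}", "".join("1" if allowed else "0" for allowed in row))
```

In a real transformer, a boolean mask of this shape would typically be converted to additive negative-infinity biases before the softmax; that step is omitted here to keep the sketch dependency-free.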

2604.11131 2026-04-14 cs.AI cs.LG cs.MA

MADQRL: Distributed Quantum Reinforcement Learning Framework for Multi-Agent Environments

Abhishek Sawaika, Samuel Yen-Chi Chen, Udaya Parampalli, Rajkumar Buyya

Comments Accepted in QC4C3 Workshop at IEEE QCNC, 2026