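arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.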

2604.20705 2026-04-23 cs.CV

SSL-R1: Self-Supervised Visual Reinforcement Post-Training for Multimodal Large Language Models

Jiahao Xie, Alessio Tonioni, Nathalie Rauschmayr, Federico Tombari, Bernt Schiele

English abstract

Reinforcement learning (RL) with verifiable rewards (RLVR) has demonstrated the great potential of enhancing the reasoning abilities in multimodal large language models (MLLMs). However, the reliance on language-centric priors and expensive manual annotations prevents MLLMs' intrinsic visual understanding and scalable reward designs. In this work, we introduce SSL-R1, a generic self-supervised RL framework that derives verifiable rewards directly from images. To this end, we revisit self-supervised learning (SSL) in visual domains and reformulate widely-used SSL tasks into a set of verifiable visual puzzles for RL post-training, requiring neither human nor external model supervision. Training MLLMs on these tasks substantially improves their performance on multimodal understanding and reasoning benchmarks, highlighting the potential of leveraging vision-centric self-supervised tasks for MLLM post-training. We think this work will provide useful experience in devising effective self-supervised verifiable rewards to enable RL at scale. Project page: https://github.com/Jiahao000/SSL-R1.

2604.20696 2026-04-23 cs.CV

R-CoV: Region-Aware Chain-of-Verification for Alleviating Object Hallucinations in LVLMs

Jiahao Xie, Alessio Tonioni, Nathalie Rauschmayr, Federico Tombari, Bernt Schiele

2604.20692 2026-04-23 cs.RO

A Kinematic Framework for Evaluating Pinch Configurations in Robotic Hand Design without Object or Contact Models

HyoJae Kang, Joonho Lee, Hyunmok Jung, Dong Il Park

Comments This manuscript has been submitted for possible publication

2604.20688 2026-04-23 cs.LG cs.AI

Storm Surge Modeling, Bias Correction, Graph Neural Networks, Graph Convolution Networks

Noujoud Nader, Stefanos Giaremis, Clint Dawson, Carola Kaiser, Karame Mohammadiporshokooh, Hartmut Kaiser

Comments 51 pages, 9 figures, 5 tables

2604.20686 2026-04-23 cs.RO

Kinematic Optimization of Phalanx Length Ratios in Robotic Hands Using Potential Dexterity

HyoJae Kang, Joonho Lee, Jeongdo Ahn, Dong Il Park

Comments This manuscript has been submitted for possible publication

2604.20685 2026-04-23 cs.LG

MGDA-Decoupled: Geometry-Aware Multi-Objective Optimisation for DPO-based LLM Alignment

Andor Vári-Kakas, Ji Won Park, Natasa Tagasovska

Comments Accepted to the Algorithmic Fairness Across Alignment Procedures and Agentic Systems Workshop at ICLR 2026

2604.20682 2026-04-23 cs.LG

Variance Is Not Importance: Structural Analysis of Transformer Compressibility Across Model Scales

Samuel Salfati

Comments 18 pages, 10 figures

2604.19683 2026-04-23 cs.RO

Mask World Model: Predicting What Matters for Robust Robot Policy Learning

Yunfan Lou, Xiaowei Chi, Xiaojie Zhang, Zezhong Qian, Chengxuan Li, Rongyu Zhang, Yaoxu Lyu, Guoyu Song, Chuyao Fu, Haoxuan Xu, Pengwei Wang, Shanghang Zhang

Comments 16 pages, 5 figures

2604.19593 2026-04-23 cs.CL cs.AI cs.LG

RoLegalGEC: Legal Domain Grammatical Error Detection and Correction Dataset for Romanian

Mircea Timpuriu, Mihaela-Claudia Cercel, Dumitru-Clementin Cercel

2604.19278 2026-04-23 cs.AI cs.MA

Explicit Trait Inference for Multi-Agent Coordination

Suhaib Abdurahman, Etsuko Ishii, Katerina Margatina, Divya Bhargavi, Monica Sunkara, Yi Zhang

Comments Accepted at ACL 2026 Main Conference

2604.15259 2026-04-23 cs.LG cs.AI

Stability and Generalization in Looped Transformers

Asher Labovich

Comments 11 main pages, 27 total

2604.11098 2026-04-23 cs.CV cs.LG eess.SP

Efficient Transceiver Design for Aerial Image Transmission and Large-scale Scene Reconstruction

Zeyi Ren, Jialin Dong, Wei Zuo, Yikun Wang, Bingyang Cheng, Sheng Zhou, Zhisheng Niu

Comments 6 pages, 6 figures, Accepted in ISIT 2026 IEEE International Symposium on Information Theory-w

2604.09734 2026-04-23 cs.CV cs.AI

Unsupervised Local Plasticity in a Multi-Frequency VisNet Hierarchy

Mehdi Fatan Serj, C. Alejandro Parraga, Xavier Otazu

2604.08570 2026-04-23 cs.LG cs.AI cs.PL cs.SE quant-ph

QuanBench+: A Unified Multi-Framework Benchmark for LLM-Based Quantum Code Generation

Ali Slim, Haydar Hamieh, Jawad Kotaich, Yehya Ghosn, Mahdi Chehimi, Ammar Mohanna, Hasan Abed Al Kader Hammoud, Bernard Ghanem

Comments 24 pages total, 25 figures, 5 tables, including supplementary material. Accepted to the ICLR 2026 Workshop on I Can't Believe It's Not Better

2603.23286 2026-04-23 cs.CV

Physical Knot Classification Beyond Accuracy: A Benchmark and Diagnostic Study

Shiheng Nie, Yunguang Yue

Comments 20 pages, 2 figures, supplementary material included

2603.20714 2026-04-23 cs.CV

The Role and Relationship of Initialization and Densification in 3D Gaussian Splatting

Ivan Desiatov, Torsten Sattler

Comments Sources are available at https://github.com/deivse/ivd_splat . Changes in this version: fixed wrong graphs being used in Fig. 6 (b), Fig. 10 (a,c,d) due to compilation issue; results with EDGS* are now using splat scale increase when reducing init. size (previously reported results without scale increase, but conclusions remain unchanged)

2603.12451 2026-04-23 cs.LG

Overcoming the Modality Gap in Context-Aided Forecasting

Vincent Zhihao Zheng, Étienne Marcotte, Arjun Ashok, Andrew Robert Williams, Lijun Sun, Alexandre Drouin, Valentina Zantedeschi

2603.06870 2026-04-23 cs.AI

LEAD: Breaking the No-Recovery Bottleneck in Long-Horizon Reasoning

Denys Pushkin, Emmanuel Abbe

Comments 28 pages, 5 figures, 2 tables. Updated version to reflect the manuscript under review at COLM 2026

2602.10386 2026-04-23 cs.LG

Colorful Talks with Graphs: Human-Interpretable Graph Encodings for Large Language Models

Angelo Zangari, Peyman Baghershahi, Sourav Medya

Comments Accepted to ACL Findings 2026. 22 pages, 18 tables, 5 figures

2601.09871 2026-04-23 cs.AI cs.HC cs.LG

Epistemology gives a Future to Complementarity in Human-AI Interactions

Andrea Ferrario, Alessandro Facchini, Juan M. Durán

Comments Submitted

English abstract

Human-AI complementarity is the claim that a human supported by an AI system can outperform either alone in a decision-making process. Since its introduction in the human-AI interaction literature, it has gained traction by generalizing the reliance paradigm and by offering a more practical alternative to the contested construct of trust in AI. Yet complementarity faces key theoretical challenges: it lacks precise theoretical anchoring, it is formalized only as a post hoc indicator of relative predictive accuracy, it remains silent about other desiderata of human-AI interactions, and it abstracts away from the magnitude-cost profile of its performance gain. As a result, complementarity is difficult to obtain in empirical settings. In this work, we leverage epistemology to address these challenges by reframing complementarity within the discourse on justificatory AI. Drawing on computational reliabilism, we argue that historical instances of complementarity function as evidence that a given human-AI interaction is a reliable epistemic process for a given predictive task. Together with other reliability indicators assessing the alignment of the human-AI team with the epistemic standards and socio-technical practices, complementarity contributes to the degree of reliability of human-AI teams when generating predictions. This repositioning supports the practical reasoning of those affected by these outputs -- patients, managers, regulators, and others. Our approach suggests that the role and value of complementarity lie not in providing a stand-alone measure of relative predictive accuracy, but in helping calibrate decision-making to the reliability of AI-supported processes. We conclude by translating this repositioning into design- and governance-oriented recommendations, including a minimal reporting checklist for justificatory human-AI interactions and measures of efficient complementarity.

2601.03396 2026-04-23 cs.CL

Breaking the Assistant Mold: Modeling Behavioral Variation in LLM Based Procedural Character Generation

Maan Qraitem, Kate Saenko, Bryan A. Plummer

2512.08730 2026-04-23 cs.CV

SegEarth-OV3: Exploring SAM 3 for Open-Vocabulary Semantic Segmentation in Remote Sensing Images

Kaiyu Li, Shengqi Zhang, Yujie Wang, Yupeng Deng, Zhi Wang, Deyu Meng, Xiangyong Cao

2511.19367 2026-04-23 cs.CV cs.AI

AnatomicalNets: A Multi-Structure Segmentation and Contour-Based Distance Estimation Pipeline for Clinically Grounded Lung Cancer T-Staging

Saniah Kayenat Chowdhury, Rusab Sarmun, Muhammad E. H. Chowdhury, Sohaib Bassam Zoghoul, Israa Al-Hashimi, Adam Mushtak, Amith Khandakar

2510.15751 2026-04-23 cs.LG

SAMix: Calibrated and Accurate Continual Learning via Sphere-Adaptive Mixup and Neural Collapse

Trung-Anh Dang, Vincent Nguyen, Ngoc-Son Vu, Christel Vrain

2509.21267 2026-04-23 cs.CL cs.CY

Task-Dependent Evaluation of LLM Output Homogenization: A Taxonomy-Guided Framework

Shomik Jain, Jack Lanchantin, Maximilian Nickel, Candace Ross, Karen Ullrich, Ashia Wilson, Jamelle Watson-Daniels

2509.20138 2026-04-23 cs.AI

Formal Verification of Minimax Algorithms

Wieger Wesselink, Kees Huizing, Huub van de Wetering

Comments 18 pages. Revised and extended version submitted to CAV 2026

2509.03740 2026-04-23 cs.CV cs.CL

CLIP-SVD: Efficient and Interpretable Vision-Language Adaptation via Singular Values

Taha Koleilat, Hassan Rivaz, Yiming Xiao

Comments TMLR 2026

2509.03335 2026-04-23 cs.LG

EvolveSignal: A Large Language Model Powered Coding Agent for Discovering Traffic Signal Control Strategies

Leizhen Wang, Peibo Duan, Hao Wang, Yue Wang, Jian Xu, Nan Zheng, Zhenliang Ma

2508.16676 2026-04-23 cs.LG cs.CL

WISCA: A Lightweight Model Transition Method to Improve LLM Training via Weight Scaling

Jiacheng Li, Jianchao Tan, Zhidong Yang, Pingwei Sun, Feiye Huo, Jiayu Qin, Xiangyu Zhang, Maoxin He, Yerui Sun, Yuchen Xie, Guangming Tan, Weile Jia, Xunliang Cai, Tong Zhao

Comments Findings of the Association for Computational Linguistics: ACL 2026

2412.14590 2026-04-23 cs.LG

MixLLM: LLM Quantization with Global Mixed-precision between Output-features and Highly-efficient System Design

Zhen Zheng, Xiaonan Song, Chuanjie Liu

Comments Accepted at MLSys 2026