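arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems science, and more.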

2601.05505 2026-04-14 cs.CL

FlashMem: Distilling Intrinsic Latent Memory via Computation Reuse

Yubo Hou, Zhisheng Chen, Tao Wan, Zengchang Qin

Abstract

The stateless architecture of Large Language Models inherently lacks the mechanism to preserve dynamic context, compelling agents to redundantly reprocess history to maintain long-horizon autonomy. While latent memory offers a solution, current approaches are hindered by architectural segregation, relying on auxiliary encoders that decouple memory from the reasoning backbone. We propose FlashMem, a framework that distills intrinsic memory directly from transient reasoning states via computation reuse. Leveraging the property that internal representations uniquely encode input trajectories, FlashMem identifies the last hidden state as a sufficient statistic for the interaction history. This enables a Shared-KV Consolidator to synthesize memory by attending directly to the backbone's frozen cache, eliminating redundant re-parameterization. Furthermore, a parameter-free Cognitive Monitor leverages attention entropy to adaptively trigger consolidation only when high epistemic uncertainty is detected. Experiments demonstrate that FlashMem matches the performance of heavy baselines while reducing inference latency by 5 times, effectively bridging the gap between efficiency and persistent cognition.
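The abstract does not say how the Cognitive Monitor computes or thresholds attention entropy, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a single attention distribution over past tokens, uses Shannon entropy normalized by its maximum, and fires when that ratio exceeds a hypothetical `threshold_ratio`. The function names and the 0.8 default are illustrative assumptions.

```python
import math


def attention_entropy(weights):
    """Shannon entropy (in nats) of one attention distribution."""
    total = sum(weights)
    probs = [w / total for w in weights]  # renormalize defensively
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def should_consolidate(weights, threshold_ratio=0.8):
    """Hypothetical trigger: fire when entropy is close to its maximum.

    threshold_ratio is an assumed hyperparameter, not taken from the paper.
    """
    max_entropy = math.log(len(weights))  # entropy of a uniform distribution
    if max_entropy == 0.0:
        return False  # a single token carries no uncertainty
    return attention_entropy(weights) / max_entropy > threshold_ratio


peaked = [0.97, 0.01, 0.01, 0.01]   # attention focused on one token
diffuse = [0.25, 0.25, 0.25, 0.25]  # attention spread evenly

print(should_consolidate(peaked))   # low entropy, no trigger
print(should_consolidate(diffuse))  # high entropy, trigger
```

Normalizing by the maximum entropy keeps the threshold comparable across context lengths. Whether FlashMem does this is not stated in the abstract.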


2601.05499 2026-04-14 cs.RO

TOSC: Task-Oriented Shape Completion for Open-World Dexterous Grasp Generation from Partial Point Clouds

Weishang Wu, Yifei Shi, Zhiping Cai

Comments Accepted to AAAI 2026

2601.04392 2026-04-14 cs.LG cs.AI cs.RO cs.SY eess.SY math.OC

Enhanced-FQL($λ$), an Efficient and Interpretable RL with novel Fuzzy Eligibility Traces and Segmented Experience Replay

Mohsen Jalaeian-Farimani, Xiong Xiong, Luca Bascetta

Comments Accepted in ECC26 conference

2601.03926 2026-04-14 cs.CL

Doc-PP: Document Policy Preservation Benchmark for Large Vision-Language Models

Haeun Jang, Hwan Chang, Hwanhee Lee

Comments ACL 2026 Findings

2601.02956 2026-04-14 cs.CL

Enhancing Multilingual RAG Systems with Debiased Language Preference-Guided Query Fusion

Jeonghyun Park, Byeongjeong Kim, Seojin Hwang, Hwanhee Lee

Comments ACL 2026 Findings

2512.20563 2026-04-14 cs.CV cs.AI cs.LG cs.RO

LEAD: Minimizing Learner-Expert Asymmetry in End-to-End Driving

Long Nguyen, Micha Fauth, Bernhard Jaeger, Daniel Dauner, Maximilian Igl, Andreas Geiger, Kashyap Chitta

Comments Accepted at CVPR 2026

2512.19691 2026-04-14 cs.AI stat.AP

Scalable Stewardship of an LLM-Assisted Clinical Benchmark with Physician Oversight

Junze Ye, Daniel Tawfik, Alex J. Goodell, Nikhil V. Kotha, Mark K. Buyyounouski, Mohsen Bayati

Comments GitHub codebase: https://github.com/junzeye/validate-medcalc-labels

2512.18994 2026-04-14 cs.CV

Dual-Margin Embedding for Fine-Grained Long-Tailed Plant Taxonomy

Cheng Yaw Low, Heejoon Koo, Jaewoo Park, Meeyoung Cha

Comments 4 figures, 5 tables, and 17 pages

2512.18073 2026-04-14 cs.CV

FPBench: A Comprehensive Benchmark of Multimodal Large Language Models for Fingerprint Analysis

Ekta Gavas, Sudipta Banerjee, Chinmay Hegde, Nasir Memon

Comments Revised version with additional experiments and code release

2512.07661 2026-04-14 cs.CV

Optimization-Guided Diffusion for Interactive Scene Generation

Shihao Li, Naisheng Ye, Tianyu Li, Kashyap Chitta, Tuo An, Peng Su, Boyang Wang, Haiou Liu, Chen Lv, Hongyang Li

2512.01512 2026-04-14 cs.CL

MCAT: Scaling Many-to-Many Speech-to-Text Translation with MLLMs to 70 Languages

Yexing Du, Kaiyuan Liu, Youcheng Pan, Bo Yang, Keqi Deng, Xie Chen, Yang Xiang, Ming Liu, Bing Qin, YaoWei Wang

Comments Accepted in IEEE TASLP

2512.01390 2026-04-14 cs.CV

FRAMER: Frequency-Aligned Self-Distillation with Adaptive Modulation Leveraging Diffusion Priors for Real-World Image Super-Resolution

Seungho Choi, Jeahun Sung, Jihyong Oh

Comments CVPR 2026 (camera ready ver.). Please visit our project page at https://cmlab-korea.github.io/FRAMER/

2511.19172 2026-04-14 cs.CV

MetroGS: Efficient and Stable Reconstruction of Geometrically Accurate High-Fidelity Large-Scale Scenes

Kehua Chen, Tianlu Mao, Xinzhu Ma, Hao Jiang, Zehao Li, Zihan Liu, Shuqin Gao, Honglong Zhao, Feng Dai, Yucheng Zhang, Zhaoqi Wang

Comments Accepted by CVPR26; Project page: https://m3phist0.github.io/MetroGS

2511.18082 2026-04-14 cs.CV cs.RO

ActDistill: General Action-Guided Self-Derived Distillation for Efficient Vision-Language-Action Models

Wencheng Ye, Tianshi Wang, Lei Zhu, Fengling Li, Guoli Yang, Hengtao Shen

2511.17441 2026-04-14 cs.RO

RoboCOIN: An Open-Sourced Bimanual Robotic Data Collection for Integrated Manipulation

Shihan Wu, Xuecheng Liu, Shaoxuan Xie, Pengwei Wang, Xinghang Li, Bowen Yang, Zhe Li, Kai Zhu, Hongyu Wu, Yiheng Liu, Zhaoye Long, Runtian Xu, Yue Wang, Chong Liu, Dihan Wang, Ziqiang Ni, Xiang Yang, You Liu, Ruoxuan Feng, Lei Zhang, Denghang Huang, Chenghao Jin, Anlan Yin, Xinlong Wang, Zhenguo Sun, Junkai Zhao, Mengfei Du, Mingyu Cao, Xiansheng Chen, Hongyang Cheng, Xiaojie Zhang, Yankai Fu, Ning Chen, Cheng Chi, Sixiang Chen, Huaihai Lyu, Xiaoshuai Hao, Yequan Wang, Bo Lei, Dong Liu, Xi Yang, Yance Jiao, Tengfei Pan, Yunyan Zhang, Songjing Wang, Ziqian Zhang, Xu Liu, Ji Zhang, Caowei Meng, Zhizheng Zhang, Jiyang Gao, Song Wang, Xiaokun Leng, Zhiqiang Xie, Zhenzhen Zhou, Peng Huang, Wu Yang, Yandong Guo, Yichao Zhu, Suibing Zheng, Hao Cheng, Xinmin Ding, Yang Yue, Huanqian Wang, Chi Chen, Jingrui Pang, YuXi Qian, Haoran Geng, Lianli Gao, Haiyuan Li, Bin Fang, Gao Huang, Yaodong Yang, Hao Dong, He Wang, Hang Zhao, Yadong Mu, Di Hu, Hao Zhao, Tiejun Huang, Shanghang Zhang, Yonghua Lin, Zhongyuan Wang, Guocai Yao

Comments Add experiments

2511.15875 2026-04-14 cs.CV

Automatic Uncertainty-Aware Synthetic Data Bootstrapping for Historical Map Segmentation

Lukas Arzoumanidis, Julius Knechtel, Jan-Henrik Haunert, Youness Dehbi

2511.14393 2026-04-14 cs.RO

Perception-aware Exploration for Consumer-grade UAVs

Svetlana Seliunina, Daniel Schleich, Sven Behnke

2511.11232 2026-04-14 cs.CV

DoReMi: Bridging 3D Domains via Topology-Aware Domain-Representation Mixture of Experts

Mingwei Xing, Xinliang Wang, Yifeng Shi

Comments The first two authors contributed equally to this paper

2510.27484 2026-04-14 cs.LG cs.AI cs.CL

Thought Branches: Interpreting LLM Reasoning Requires Resampling

Uzay Macar, Paul C. Bogdan, Senthooran Rajamanoharan, Neel Nanda

Comments Uzay Macar and Paul C. Bogdan contributed equally to this work, and their listed order was determined by coin flip

2510.22329 2026-04-14 cs.AI math.OC

Graph-Coarsening Approach for the Capacitated Vehicle Routing Problem with Time Windows

Mustafa Mert Özyılmaz

Comments 17 pages, 30 figures. A revised version with quantum solver experiment results

2510.17934 2026-04-14 cs.CL cs.AI

AtlasKV: Augmenting LLMs with Billion-Scale Knowledge Graphs in 20GB VRAM

Haoyu Huang, Hong Ting Tsang, Jiaxin Bai, Xi Peng, Gong Zhang, Yangqiu Song

Comments ICLR 2026

2510.17516 2026-04-14 cs.CL cs.AI cs.CY cs.LG

SimBench: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors

Tiancheng Hu, Joachim Baumann, Lorenzo Lupo, Nigel Collier, Dirk Hovy, Paul Röttger

Comments Accepted at ICLR 2026. Project Website: http://simbench.tiancheng.hu/ Data: https://huggingface.co/datasets/pitehu/SimBench

2510.16333 2026-04-14 cs.CV cs.LG

RL makes MLLMs see better than SFT

Junha Song, Sangdoo Yun, Dongyoon Han, Jaegul Choo, Byeongho Heo

Abstract

A dominant assumption in Multimodal Language Model (MLLM) research is that its performance is largely inherited from the LLM backbone, given its immense parameter scale and remarkable capabilities. This has created a void in the understanding of the vision encoder, which determines how MLLMs perceive images. The recent shift in MLLM training paradigms, from Supervised Finetuning (SFT) to Reinforcement Learning (RL), magnifies this oversight, namely the significant lack of analysis on how such training reshapes the vision encoder as well as the MLLM. To address this, we first investigate the impact of training strategies on MLLMs, where RL shows a clear advantage over SFT in strongly vision-related VQA benchmarks. Motivated by this, we conduct a critical yet under-explored analysis of the vision encoder of MLLMs through diverse and in-depth experiments, ranging from ImageNet classification and segmentation to gradient visualization. Our results demonstrate that MLLM's post-training strategy (i.e., SFT or RL) not only leads to distinct outcomes on MLLM downstream tasks, but also fundamentally reshapes MLLM's underlying visual representations. Specifically, the key finding of our study is that RL produces stronger and precisely localized visual representations compared to SFT, boosting the ability of the vision encoder for MLLM. We then reframe our findings into a simple recipe for building strong vision encoders for MLLMs, Preference-Instructed Vision OpTimization (PIVOT). When integrated into MLLMs, a PIVOT-trained vision encoder outperforms even larger and more heavily-trained counterparts, despite requiring less than 1% of the computational cost of standard vision pretraining. This result opens an effective and efficient path for advancing the vision backbones of MLLMs. Project page available at https://june-page.github.io/pivot/


2510.11217 2026-04-14 cs.CL cs.AI

Domain-Specific Data Generation Framework for RAG Adaptation

Chris Xing Tian, Weihao Xie, Zhen Chen, Zhengyuan Yi, Hui Liu, Haoliang Li, Shiqi Wang, Siwei Ma

Comments To appear in ACL 2026

2510.10182 2026-04-14 cs.CL cs.AI

A Survey of Inductive Reasoning for Large Language Models

Kedi Chen, Dezhao Ruan, Yuhao Dan, Yaoting Wang, Siyu Yan, Xuecheng Wu, Yinqi Zhang, Qin Chen, Jie Zhou, Liang He, Biqing Qi, Linyang Li, Qipeng Guo, Xiaoming Shi, Wei Zhang

2510.09389 2026-04-14 cs.LG cs.AI

Design Principles for Sequence Models via Coefficient Dynamics

Jerome Sieber, Antonio Orvieto, Melanie N. Zeilinger, Carmen Amo Alonso

2510.07972 2026-04-14 cs.AI

SHE: Stepwise Hybrid Examination Reinforcement Learning Framework for E-commerce Search Relevance

Pengkun Jiao, Yiming Jin, Jianhui Yang, Chenhe Dong, Zerui Huang, Shaowei Yao, Xiaojiang Zhou, Dan Ou, Haihong Tang

2510.05837 2026-04-14 cs.CL

EEPO: Exploration-Enhanced Policy Optimization via Sample-Then-Forget

Liang Chen, Xueting Han, Qizhou Wang, Bo Han, Jing Bai, Hinrich Schutze, Kam-Fai Wong

Comments ICLR 2026

2510.01152 2026-04-14 cs.CL

MASH: Modeling Abstention via Selective Help-Seeking

Mustafa Omer Gul, Claire Cardie, Tanya Goyal

Comments 25 pages, with 15 dedicated to citations and appendix. 17 tables and 11 figures. Preprint, under review. Paper updated to reflect new title and results

2509.26306 2026-04-14 cs.AI

Interactive Learning for LLM Reasoning

Hehai Lin, Shilei Cao, Sudong Wang, Haotian Wu, Minzhi Li, Linyi Yang, Juepeng Zheng, Chengwei Qin

Comments The code is available at https://github.com/linhh29/Interactive-Learning-for-LLM-Reasoning

Abstract

Existing multi-agent learning approaches have developed interactive training environments to explicitly promote collaboration among multiple Large Language Models (LLMs), thereby constructing stronger multi-agent systems (MAS). However, during inference, they require re-executing the MAS to obtain final solutions, which diverges from human cognition that individuals can enhance their reasoning capabilities through interactions with others and resolve questions independently in the future. To investigate whether multi-agent interaction can enhance LLMs' independent problem-solving ability, we introduce ILR, a novel co-learning framework for MAS that integrates two key components: Dynamic Interaction and Perception Calibration. Specifically, Dynamic Interaction first adaptively selects either cooperative or competitive strategies depending on question difficulty and model ability. LLMs then exchange information through Idea3, an innovative interaction paradigm designed to mimic human discussion, before deriving their respective final answers. In Perception Calibration, ILR employs Group Relative Policy Optimization (GRPO) to train LLMs while integrating one LLM's reward distribution characteristics into another's reward function, thereby enhancing the cohesion of multi-agent interactions. We evaluate the effectiveness of ILR across three LLMs from two model families of varying scales on five mathematical, one coding, one general question answering, and one scientific reasoning benchmarks. Experimental results show that ILR consistently outperforms single-agent learning, yielding an improvement of up to 5% over the strongest baseline. We further discover that Idea3 can enhance the robustness of stronger LLMs during multi-agent inference, and dynamic interaction types can boost multi-agent learning compared to pure cooperative or competitive strategies.
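For readers unfamiliar with GRPO, the sketch below shows only its generic group-relative advantage step: each sampled response's reward is standardized against the other responses drawn for the same question. It does not reproduce ILR's Perception Calibration, since the abstract does not say how one LLM's reward distribution enters another's reward function. The function name, the `eps` constant, and the choice of population standard deviation are assumptions made for illustration.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of sampled responses.

    eps guards against division by zero when every reward is identical;
    using the population standard deviation is an assumption of this sketch.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled answers to one question, scored 1.0 if correct, else 0.0.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)  # correct answers get positive values, incorrect ones negative
```

Because the advantages are centered within the group, a question on which every sample earns the same reward contributes no learning signal.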
