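arXivDaily: a daily academic digest synced with the full arXiv data, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.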

2605.06070 2026-05-08 cs.CV

Arena as Offline Reward: Efficient Fine-Grained Preference Optimization for Diffusion Models

Zhikai Li, Yue Zhao, Edward Zhongwei Zhang, Xuewen Liu, Jing Zhang, Qingyi Gu, Zhen Dong

Abstract

Reinforcement learning from human feedback (RLHF) effectively promotes preference alignment of text-to-image (T2I) diffusion models. To improve computational efficiency, direct preference optimization (DPO), which avoids explicit reward modeling, has been widely studied. However, its reliance on binary feedback limits it to coarse-grained modeling on chosen-rejected pairs, resulting in suboptimal optimization. In this paper, we propose ArenaPO, which leverages Arena scores as offline rewards to provide refined feedback, thus achieving efficient and fine-grained optimization without a reward model. This enables ArenaPO to benefit from both the rich rewards of traditional RLHF and the efficiency of DPO. Specifically, we first construct a model Arena in which each model's capability is represented as a Gaussian distribution, and infer these capabilities by traversing the annotated pairwise preferences. Each output image is treated as a sample from the corresponding capability distribution. Then, for an image pair, conditioned on the two capability distributions and the observed pairwise preference, the absolute quality gap is estimated using latent-variable inference based on a truncated normal distribution, which serves as fine-grained feedback during training. It does not require a reward model and can be computed offline, thus introducing no additional training overhead. We conduct ArenaPO training on Pick-a-Pic v2 and HPD v3 datasets, showing that ArenaPO consistently outperforms existing baselines.
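
To make the truncated-normal step in the abstract more concrete, here is a minimal sketch under stated assumptions; it is not the paper's implementation. It assumes the two image qualities are independent Gaussian draws from their models' capability distributions, that the observed preference means the quality gap is strictly positive, and that the feedback signal is the posterior mean of that gap. The function name `expected_gap` and its parameters are hypothetical.

```python
import math


def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def expected_gap(mu_win, sigma_win, mu_lose, sigma_lose):
    """Posterior mean of the quality gap d = q_win - q_lose given d > 0.

    Assumption (not from the paper): q_win and q_lose are independent
    Gaussians, so d ~ N(m, s^2) with m = mu_win - mu_lose and
    s^2 = sigma_win^2 + sigma_lose^2. Conditioning on d > 0 gives a
    truncated normal whose mean is m + s * pdf(m/s) / cdf(m/s).
    """
    m = mu_win - mu_lose
    s = math.sqrt(sigma_win ** 2 + sigma_lose ** 2)
    z = m / s
    # Note: for very negative z, normal_cdf(z) underflows; a production
    # version would need a numerically stable tail approximation.
    return m + s * normal_pdf(z) / normal_cdf(z)


# An expected win (stronger model preferred) versus an upset
# (weaker model preferred), with equal capability spread.
gap_expected_win = expected_gap(1.0, 1.0, 0.0, 1.0)
gap_upset = expected_gap(0.0, 1.0, 1.0, 1.0)
print(gap_expected_win, gap_upset)
```

Under these assumptions the estimated gap is always positive, and it is larger when the preferred image comes from the model with the higher capability mean, which is the kind of graded signal a binary chosen-rejected label cannot carry.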

2605.06068 2026-05-08 cs.AI cs.DC

VibeServe: Can AI Agents Build Bespoke LLM Serving Systems?

Keisuke Kamahori, Shihang Li, Simon Peter, Baris Kasikci

2605.06067 2026-05-08 cs.LG cs.AI

Normalized Architectures are Natively 4-Bit

Maxim Fishman, Brian Chmiel, Ron Banner, Daniel Soudry, Boris Ginsburg

2605.06066 2026-05-08 cs.LG cs.AI

Causal Reinforcement Learning for Complex Card Games: A Magic The Gathering Benchmark

Cristiano da Costa Cunha, Ajmal Mian, Tim French, Wei Liu

Comments 21 pages, 8 figures, 9 tables, 1 algorithm

Abstract

Causal reinforcement learning (RL) lacks benchmarks for complex systems that combine sequential decision making, hidden information, large masked action spaces, and explicit causal structure. We introduce MTG-Causal-RL, a Gymnasium benchmark built on Magic: The Gathering with a 3,077-dimensional partial observation, a 478-action masked discrete action space, five competitive Standard archetypes, three reward schemes, and a hand-specified Structural Causal Model (SCM) over strategic variables. Every episode exposes causal variables, SCM-predicted intervention effects, and per-factor credit traces, making causal credit assignment, leave-one-out cross-archetype transfer, and policy auditability first-class metrics. We adapt a panel of reference baselines: random, heuristic, masked PPO, a causal-world-model PPO variant, and an architecture-matched scalar control. We propose Causal Graph-Factored Advantage PPO (CGFA-PPO) as a reference causal agent that uses SCM parents of win probability as factor-aligned critic targets with an intervention-calibration loss. All comparisons use paired seeds, paired-bootstrap confidence intervals, and Holm-Bonferroni correction within pre-registered families. Masked PPO and CGFA-PPO reach competitive in-distribution win rates and exceed the random baseline; per-factor calibration trajectories and leave-one-out transfer gaps expose diagnostic structure that scalar win rate alone cannot. We release the benchmark, reference-baseline results, and full evaluation protocol openly. By coupling a strategically rich, partially observed domain with an explicit causal interface and statistical protocol, MTG-Causal-RL gives causal-RL, world-model, and LLM-agent research a shared testbed for questions current benchmarks cannot pose together: causal credit assignment under masked action spaces, structural transfer across archetypes, and SCM-grounded policy auditability.
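
The abstract names Holm-Bonferroni correction as part of its statistical protocol. For readers unfamiliar with it, the standard step-down procedure can be sketched as follows. This is the textbook algorithm only; the function name, the example p-values, and the significance level are illustrative assumptions, and the paper's actual comparison families are not reproduced here.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected.

    Standard Holm step-down procedure: sort p-values in ascending order,
    compare the k-th smallest (k = 0, 1, ...) against alpha / (m - k),
    and stop at the first comparison that fails.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # all larger p-values are also not rejected
    return reject


# Hypothetical p-values for four comparisons within one family.
decisions = holm_bonferroni([0.01, 0.04, 0.03, 0.005], alpha=0.05)
print(decisions)  # [True, False, False, True]
```

Compared with plain Bonferroni, which tests every p-value against alpha / m, the step-down thresholds loosen as hypotheses are rejected, so the procedure is never less powerful while still controlling the family-wise error rate.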

2605.06064 2026-05-08 cs.CV

PersonaGesture: Single-Reference Co-Speech Gesture Personalization for Unseen Speakers

Xiangyue Zhang, Yiyi Cai, Kunhang Li, Kaixing Yang, You Zhou, Zhengqing Li, Xuangeng Chu, Jiaxu Zhang, Haiyang Liu

2605.06062 2026-05-08 cs.RO cs.SY eess.SY

Monitoring autonomous persistent surveillance missions using invariance

Vladislav Nenchev, Prodromos Sotiriadis

Comments Accepted at IEEE ICRA 2026

2605.06061 2026-05-08 cs.LG cs.CG math.AT

Geometry-Aware Simplicial Message Passing

Elena Xinyi Wang, Bastian Rieck

2605.06058 2026-05-08 cs.LG cs.CV

Towards Self-Explainable Document Visual Question Answering with Chain-of-Explanation Predictions

Kjetil Indrehus, Adrian Duric, Changkyu Choi, Ali Ramezani-Kebrya

2605.06054 2026-05-08 cs.AI cs.HC

Visual Fingerprints for LLM Generation Comparison

Amal Alnouri, Andreas Hinterreiter, Christina Humer, Furui Cheng, Marc Streit

Comments Submitted to the Short Paper track at IEEE VIS 2026

2605.06053 2026-05-08 cs.LG

Towards Generation-Efficient Uncertainty Estimation in Large Language Models

Mingcheng Zhu, Yu Liu, Tingting Zhu

Comments 21 pages, 6 figures, and 8 tables. The abstract provided in the metadata differs slightly from the manuscript version due to character limits

Abstract

Uncertainty estimation is important for deploying LLMs in high-stakes applications such as healthcare and finance, where hallucinations can appear fluent and plausible while being factually incorrect, making it difficult for users to judge whether an output should be trusted. Existing methods require one or more full autoregressive generations to estimate uncertainty, which introduces substantial inference cost and often delays uncertainty assessment. In this paper, we investigate whether effective uncertainty estimation can be achieved with partial generation or even input-only information. Specifically, we first develop a unified framework that formulates uncertainty estimation as an early estimation problem over the autoregressive generation process of LLMs. This framework organises existing and proposed estimators by the information they observe, ranging from multi-generation to input-only prediction, and clarifies the performance-cost trade-off underlying different uncertainty estimation methods. Building on this view, we study two largely underexplored low-cost settings: estimating uncertainty with part of the generation, and predicting uncertainty from the input prompt. We propose Logit Magnitude, which uses top-M logit evidence to estimate uncertainty from an early-stopped generation prefix, and MetaUE, which distils generation-based uncertainty into a lightweight input-only estimator trained with uncertainty scores. Extensive experiments on general and domain-specific benchmarks show that Logit Magnitude achieves strong performance, and partial generations of LLMs are often sufficient for effective uncertainty estimation. MetaUE further provides a competitive input-only approximation in several settings. These findings suggest that effective uncertainty estimation requires less generation than commonly assumed, enabling unreliable responses to be identified earlier.

2605.06051 2026-05-08 cs.CV

RealCam: Real-Time Novel-View Video Generation with Interactive Camera Control

Youcan Xu, Jiaxin Shi, Zhen Wang, Wensong Song, Feifei Shao, Chen Liang, Jun Xiao, Long Chen

2605.06050 2026-05-08 cs.LG

When Brain Networks Travel: Learning Beyond Site

Yingxu Wang, Kunyu Zhang, Yanwu Yang, Thomas Wolfers, Yujie Wu, Siyang Gao, Nan Yin

2605.06049 2026-05-08 cs.CV

Fusion in Your Way: Aligning Image Fusion with Heterogeneous Demands via Direct Preference Optimization

Weijian Su, Songqian Zhang, Yuqi Han, Jian Zhuang, Yongdong Huang, Qiang Zhang

Comments Accepted by CVPR 2026

2605.06046 2026-05-08 cs.LG

Requests of a Feather Must Flock Together: Batch Size vs. Prefix Homogeneity in LLM Inference

Saksham Rathi, Preeti, Mythili Vutukuru

Comments 22 pages, 36 figures

Abstract

Auto-regressive token generation in large language models is memory-bound because it requires "attending to" key and value tensors (KV cache) of all previous tokens. Prior work aims to improve the efficiency of this decode process by batching multiple requests together, and maximizing batch size subject to GPU memory constraints. The key observation of our work is that with prefix-sharing workloads, smaller, prefix-homogeneous batches -- where all requests share a common prefix -- can achieve higher decode throughput than larger, heterogeneous batches, due to better spatial and temporal locality during KV cache accesses. However, prefix-aware schedulers in state-of-the-art inference engines maximize prefix reuse within a batch only to reduce KV cache memory footprint, but do not stop batch formation at smaller homogeneous batches that could have performed better. Further, we show that shared prefix detection in existing schedulers relies on radix-tree traversals, incurring substantial CPU overhead that is often comparable to GPU execution time. This paper presents Feather, a prefix-aware scheduler that uses reinforcement learning (RL) to learn the optimal tradeoff between batch size and prefix homogeneity. We also introduce Chunked Hash Tree (CHT), a lightweight data structure that enables fast prefix detection and efficient request selection for the RL scheduler, avoiding expensive tree traversals. We integrate Feather into vLLM and SGLang, and our evaluation shows that Feather achieves 2--10$\times$ higher end-to-end throughput as compared to existing schedulers, while doing no worse than the status quo when the workload does not have enough prefix sharing. Feather achieves these gains by reducing the total number of KV cache accesses, surpassing the performance of prefix-aware attention kernels that have the same goal.
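
As a rough illustration of what "prefix-homogeneous batches" means, the sketch below groups requests whose token sequences share the same leading chunks, using chained chunk hashes instead of a tree traversal. It is a toy under stated assumptions, not the paper's Chunked Hash Tree or its RL scheduler: the names `chunk_keys` and `group_by_prefix`, the chunk size, and the `depth` parameter are all hypothetical.

```python
from collections import defaultdict


def chunk_keys(tokens, chunk_size):
    """Chained hashes of successive full chunks of a token sequence.

    The k-th key depends on all tokens in chunks 0..k, so two requests
    have equal k-th keys when their first (k + 1) chunks match
    (ignoring hash collisions, which this toy does not handle).
    """
    keys = []
    prev = 0
    for start in range(0, len(tokens) - chunk_size + 1, chunk_size):
        prev = hash((prev, tuple(tokens[start:start + chunk_size])))
        keys.append(prev)
    return keys


def group_by_prefix(requests, chunk_size=4, depth=1):
    """Group request ids that share their first `depth` chunks.

    `requests` maps a request id to its list of token ids. Requests
    shorter than `depth` chunks fall into a catch-all group (key None).
    """
    groups = defaultdict(list)
    for req_id, tokens in requests.items():
        keys = chunk_keys(tokens, chunk_size)
        key = keys[depth - 1] if len(keys) >= depth else None
        groups[key].append(req_id)
    return list(groups.values())


# Hypothetical requests: "a" and "b" share the prefix [1, 2, 3, 4].
requests = {
    "a": [1, 2, 3, 4, 9],
    "b": [1, 2, 3, 4, 7, 7],
    "c": [5, 6, 7, 8, 1],
}
batches = group_by_prefix(requests, chunk_size=4, depth=1)
print(batches)
```

Each resulting group is a candidate homogeneous batch. Deciding whether to run such a smaller batch or to merge groups into a larger heterogeneous one is the trade-off the abstract says Feather learns.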

2605.06043 2026-05-08 cs.CV

Domain Generalization through Spatial Relation Induction over Visual Primitives

Dat Nguyen, Duc-Duy Nguyen

2605.06040 2026-05-08 cs.AI cs.CL

Novelty-based Tree-of-Thought Search for LLM Reasoning and Planning

Leon Hamm, Zlatan Ajanovic

2605.06036 2026-05-08 cs.LG cs.AI

Optimal Transport for LLM Reward Modeling from Noisy Preference

Licheng Pan, Haochen Yang, Haoxuan Li, Yunsheng Lu, Yongqi Tong, Yinuo Wang, Shijian Wang, Zhixuan Chu, Lei Shen, Yuan Lu, Hao Wang

2605.06035 2026-05-08 cs.SD cs.AI

Quantum Kernels for Audio Deepfake Detection Using Spectrogram Patch Features

Lisan Al Amin, Rakib Hossain, Mahbubul Islam, Faisal Quader, Thanh Thi Nguyen

2605.06032 2026-05-08 cs.LG cs.AI

Does Synthetic Data Help? Empirical Evidence from Deep Learning Time Series Forecasters

Hugo Cazaux, Eyjólfur Ingi Ásgeirsson, Hlynur Stefánsson

2605.06030 2026-05-08 cs.CL

More Aligned, Less Diverse? Analyzing the Grammar and Lexicon of Two Generations of LLMs

Adrián Gude, Roi Santos-Ríos, Francis Bond, Dan Flickinger, Carlos Gómez-Rodríguez, Olga Zamaraeva

2605.06029 2026-05-08 cs.AI

Pathways to AGI

Gordon Fletcher, Saomai Vu Khan

Comments Additional data at 10.17866/rd.salford.32201874

2605.06028 2026-05-08 cs.LG

Multi-agent decision making: A Blackwell's informativeness approach

Zheng Zhang, Cuong C. Nguyen, Kevin Wells, Gustavo Carneiro

2605.06024 2026-05-08 cs.AI

Strat-LLM: Stratified Strategy Alignment for LLM-based Stock Trading with Real-time Multi-Source Signals

Wenliang Huang, Zengyi Yu

Comments Accepted by the 2026 International Joint Conference on Neural Networks (IJCNN)

2605.06021 2026-05-08 cs.CV cs.DL

PlotPick: AI-powered batch extraction of numerical data from scientific figures

Tommy Carstensen

Comments 7 pages, 2 figures, 2 tables. Software available at https://plotpick.streamlit.app and https://github.com/tommycarstensen/plotpick

2605.06014 2026-05-08 cs.LG cs.AI cs.DS cs.NI

Quantizing With Randomized Hadamard Transforms: Efficient Heuristic Now Proven

Ran Ben-Basat, William Kuszmaul, Michael Mitzenmacher, Amit Portnoy, Shay Vargaftik

Abstract

Uniform random rotations (URRs) are a common preprocessing step in modern quantization approaches used for gradient compression, inference acceleration, KV-cache compression, model weight quantization, and approximate nearest-neighbor search in vector databases. In practice, URRs are often replaced by randomized Hadamard transforms (RHTs), which preserve orthogonality while admitting fast implementations. The remaining issue is the performance for worst-case inputs. With a URR, each coordinate is individually distributed as a shifted beta distribution, which converges to a Gaussian distribution in high dimensions. Generally, one RHT is not suitable in the worst case, as individual coordinates can be far from these distributions. We show that after composing two RHTs on any $d$-sized input vector, the marginal distribution of every fixed coordinate of the normalized rotated vector is within $O(d^{-1/2})$ of a standard Gaussian both in Kolmogorov distance and in $1$-Wasserstein distance. We then plug these bounds into the analyses of modern compression schemes, namely DRIVE and QUIC-FL, and show that two RHTs achieve performance that asymptotically matches URRs. However, we show that two RHTs may not be sufficient for Vector Quantization (VQ), which often requires weak correlation across fixed-size blocks of coordinates (as opposed to only marginal distribution convergence for single coordinates). We prove that a composition of three RHTs leads to decaying coordinate covariance. This ensures that any fixed, bounded, multi-dimensional VQ codebook optimized for URRs has the same expected error when using three RHTs, up to an additive term that vanishes with the dimension. Finally, because practical inputs are rarely adversarial, we propose a linear-time ${O}(d)$ check on the input's moments to dynamically adapt the number of RHTs used at runtime to improve performance.
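
For readers who have not met the transform, a randomized Hadamard transform is commonly defined as multiplying the input by random signs and then applying a normalized Walsh-Hadamard transform. The sketch below composes several such transforms in pure Python. It assumes the dimension is a power of two and uses that common definition; the function names and the seed handling are illustrative, and the paper's moment-based runtime check is not shown.

```python
import math
import random


def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform, O(d log d)."""
    v = list(values)
    n = len(v)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    return v


def rht(x, rng):
    """One randomized Hadamard transform: (1 / sqrt(d)) * H * D * x,
    where D is a diagonal matrix of independent random signs."""
    d = len(x)
    signs = [rng.choice((-1.0, 1.0)) for _ in range(d)]
    y = fwht([s * xi for s, xi in zip(signs, x)])
    scale = 1.0 / math.sqrt(d)
    return [scale * yi for yi in y]


def compose_rhts(x, count, seed=0):
    """Apply `count` independent RHTs in sequence."""
    rng = random.Random(seed)
    for _ in range(count):
        x = rht(x, rng)
    return x


# A one-hot vector: an input whose energy sits in a single coordinate.
x = [1.0] + [0.0] * 7
once = compose_rhts(x, 1)
twice = compose_rhts(x, 2)
print(once)
print(twice)
```

Each RHT is orthogonal, so the Euclidean norm is preserved however many are composed. For this one-hot input a single RHT yields coordinates that all have magnitude 1/sqrt(d), which illustrates how far one transform can leave individual coordinates from a Gaussian shape.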

2605.06012 2026-05-08 cs.CV cs.AI

T2I-VeRW: Part-level Fine-grained Perception for Text-to-Image Vehicle Retrieval

Xiao Wang, Ziwen Wang, Weizhe Kong, Wentao Wu, Yuehang Li, Aihua Zheng, Chenglong Li, Jin Tang

2605.06010 2026-05-08 cs.CV cs.AI

Adding Thermal Awareness to Visual Systems in Real-Time via Distilled Diffusion Models

Yuchen Guo, Junli Gong, Wenjun Dong, Yiuming Cheung, Weifeng Su

2605.06007 2026-05-08 cs.CL cs.AI cs.HC

PersonaKit (PK): A Plug-and-Play Platform for User Testing Diverse Roles in Full-Duplex Dialogue

Hyunbae Jeon, Jinho D. Choi

2605.06006 2026-05-08 cs.CL

From Articles to Premises: Building PrimeFacts, an Extraction Methodology and Resource for Fact-Checking Evidence

Premtim Sahitaj, Jawan Kolanowski, Ariana Sahitaj, Veronika Solopova, Max Upravitelev, Daniel Röder, Iffat Maab, Junichi Yamagishi, Sebastian Möller, Vera Schmitt

Comments Accepted at LREC 2026. To appear in the conference proceedings

2605.06005 2026-05-08 cs.CV

Neuromorphic visual attention for Sign-language recognition on SpiNNaker

Sarka Liskova, Olha Vedmedenko, Mazdak Fatahi, Matej Hoffmann, P. Michael Furlong, Giulia D'Angelo