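arXivDaily daily academic digest: synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.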

2603.16309 2026-05-08 cs.CL

Omnilingual MT: Machine Translation for 1,600 Languages

Omnilingual MT Team, Belen Alastruey, Niyati Bafna, Andrea Caciolai, Kevin Heffernan, Artyom Kozhevnikov, Christophe Ropers, Eduardo Sánchez, Charles-Eric Saint-James, Ioannis Tsiamas, Xiang "Tony" Cao, Chierh Cheng, Joe Chuang, Paul-Ambroise Duquenne, Mark Duppenthaler, Nate Ekberg, Cynthia Gao, Pere Lluís Huguet Cabot, João Maria Janeiro, Jean Maillard, Gabriel Mejia Gonzalez, Holger Schwenk, Edan Toledo, Arina Turkatenko, Albert Ventayol-Boada, Rashel Moritz, Alexandre Mourachko, Surya Parimi, Mary Williamson, Shireen Yates, David Dale, Marta R. Costa-jussà

2603.16281 2026-05-08 cs.LG q-bio.NC

Laya: A LeJEPA Approach to EEG via Latent Prediction over Reconstruction

Saarang Panchavati, Uddhav Panchavati, Hiroki Nariai, Corey Arnold, William Speier

2603.15646 2026-05-08 cs.LG cs.AI cs.CL

Alternating Reinforcement Learning with Contextual Rubric Rewards: Beyond the Scalarization Strategy

Guangchen Lan, Lian Xiong, Xin Zhou, Hejie Cui, Yuwei Zhang, Mao Li, Zhenyu Shi, Besnik Fetahu, Lihong Li, Xian Li

2603.14337 2026-05-08 cs.CV

On the Nature of Attention Sink that Shapes Decoding Strategy in Omni-LLMs

Suho Yoo, Youngjoon Jang, Joon Son Chung

Comments Preprint

2603.14209 2026-05-08 cs.CV cs.AI

ChArtist: Generating Pictorial Charts with Unified Spatial and Subject Control

Shishi Xiao, Tongyu Zhou, David Laidlaw, Gromit Yeuk-Yin Chan

Comments Project page: https://chartist-ai.github.io/

Details

English abstract

A pictorial chart is an effective medium for visual storytelling, seamlessly integrating visual elements with data charts. However, creating such images is challenging because the flexibility of visual elements often conflicts with the rigidity of chart structures. This process thus requires a creative deformation that maintains both data faithfulness and visual aesthetics. Current methods that extract dense structural cues from natural images (e.g., edge or depth maps) are ill-suited as conditioning signals for pictorial chart generation. We present ChArtist, a domain-specific diffusion model for generating pictorial charts automatically, offering two distinct types of control: 1) spatial control that aligns well with the chart structure, and 2) subject-driven control that respects the visual characteristics of a reference image. To achieve this, we introduce a skeleton-based spatial control representation. This representation encodes only the data-encoding information of the chart, allowing for the easy incorporation of reference visuals without a rigid outline constraint. We implement our method based on the Diffusion Transformer (DiT) and leverage an adaptive position encoding mechanism to manage these two controls. We further introduce Spatially Gated Attention to modulate the interaction between spatial control and subject control. To support the fine-tuning of pre-trained models for this task, we created a large-scale dataset of 30,000 triplets (skeleton, reference image, pictorial chart). We also propose a unified data accuracy metric to evaluate the data faithfulness of the generated charts. We believe this work demonstrates that current generative models can achieve data-driven visual storytelling by moving beyond general-purpose conditions to task-specific representations. Project page: https://chartist-ai.github.io/.

2603.09986 2026-05-08 cs.CL cs.AI

Quantifying Hallucinations in Language Models on Medical Textbooks

Brandon C. Colelough, Davis Bartels, Dina Demner-Fushman

Comments 8 pages, 4 figures

2603.03511 2026-05-08 cs.LG cond-mat.mtrl-sci physics.chem-ph

Orbital Transformers for Predicting Wavefunctions in Time-Dependent Density Functional Theory

Xuan Zhang, Haiyang Yu, Chengdong Wang, Jacob Helwig, Shuiwang Ji, Xiaofeng Qian

Details

Journal ref: The Fourteenth International Conference on Learning Representations (ICLR 2026)

English abstract

We aim to learn wavefunctions simulated by time-dependent density functional theory (TDDFT), which can be efficiently represented as linear combination coefficients of atomic orbitals. In real-time TDDFT, the electronic wavefunctions of a molecule evolve over time in response to an external excitation, enabling first-principles predictions of physical properties such as optical absorption, electron dynamics, and high-order response. However, conventional real-time TDDFT relies on time-consuming propagation of all occupied states with fine time steps. In this work, we propose OrbEvo, which is based on an equivariant graph transformer architecture and learns to evolve the full electronic wavefunction coefficients across time steps. First, to account for external field, we design an equivariant conditioning to encode both strength and direction of external electric field and break the symmetry from SO(3) to SO(2). Furthermore, we design two OrbEvo models, OrbEvo-WF and OrbEvo-DM, using wavefunction pooling and density matrix as interaction method, respectively. Motivated by the central role of the density functional in TDDFT, OrbEvo-DM encodes the density matrix aggregated from all occupied electronic states into feature vectors via tensor contraction, providing a more intuitive approach to learn the time evolution operator. We adopt a training strategy specifically tailored to limit the error accumulation of time-dependent wavefunctions over autoregressive rollout. To evaluate our approach, we generate TDDFT datasets consisting of 5,000 different molecules in the QM9 dataset and 1,500 molecular configurations of the malonaldehyde molecule in the MD17 dataset. Results show that our OrbEvo model accurately captures quantum dynamics of excited states under external field, including time-dependent wavefunctions, time-dependent dipole moment, and optical absorption spectra.

2603.02087 2026-05-08 cs.CV cs.AI cs.LG

A Detection-Gated Pipeline for Robust Glottal Area Waveform Extraction and Clinical Pathology Assessment

Harikrishnan Unnikrishnan, Rita Patel

Comments for associated code see: https://github.com/hari-krishnan/openglottal

2603.00117 2026-05-08 cs.RO cs.AI

PEPA: a Persistently Autonomous Embodied Agent with Personalities

Kaige Liu, Yang Li, Lijun Zhu, Weinan Zhang

2602.22710 2026-05-08 cs.SD cs.AI cs.HC

Same Words, Different Judgments: How Preferences Vary Across Modalities

Aaron Broukhim, Nadir Weibel, Eshin Jolly

Comments Submitted to NeurIPS 2026 for review

2602.19202 2026-05-08 cs.CV

UniE2F: A Unified Diffusion Framework for Event-to-Frame Reconstruction with Video Foundation Models

Gang Xu, Zhiyu Zhu, Junhui Hou

2602.15827 2026-05-08 cs.RO cs.AI cs.LG cs.SY eess.SY

Perceptive Humanoid Parkour: Chaining Dynamic Human Skills via Motion Matching

Zhen Wu, Xiaoyu Huang, Lujie Yang, Yuanhang Zhang, Xi Chen, Pieter Abbeel, Rocky Duan, Angjoo Kanazawa, Carmelo Sferrazza, Guanya Shi, C. Karen Liu

2602.13670 2026-05-08 cs.LG

Advancing Analytic Class-Incremental Learning through Vision-Language Calibration

Binyu Zhao, Wei Zhang, Xingrui Yu, Zhaonian Zou, Ivor Tsang

Comments 20 pages, 11 figures, 9 tables. Accepted by ICML2026

2602.13310 2026-05-08 cs.CV cs.AI

Visual Para-Thinker: Divide-and-Conquer Reasoning for Visual Comprehension

Haoran Xu, Hongyu Wang, Jiaze Li, Shunpeng Chen, Zizhao Tong, Jianzhong Ju, Zhenbo Luo, Jian Luan

2602.11229 2026-05-08 cs.AI cs.LG

Latent Generative Solvers for Generalizable Long-Term Physics Simulation

Zituo Chen, Sili Deng

2602.11183 2026-05-08 cs.RO cs.CV cs.SY eess.SY

Mitigating Error Accumulation in Continuous Navigation via Memory-Augmented Kalman Filtering

Yin Tang, Jiawei Ma, Jinrui Zhang, Alex Jinpeng Wang, Deyu Zhang

Comments ICML 2026 Camera Ready

2602.07974 2026-05-08 cs.LG

Structural Learning Theory: A Metric-Topology Factorization Approach

Xin Li

2602.07906 2026-05-08 cs.LG cs.AI

AceGRPO: Adaptive Curriculum Enhanced Group Relative Policy Optimization for Autonomous Machine Learning Engineering

Yuzhu Cai, Zexi Liu, Xinyu Zhu, Cheng Wang, Yanfeng Wang, Siheng Chen

Comments 18 pages, 5 figures

2602.07322 2026-05-08 cs.RO cs.AI

Action-to-Action Flow Matching

Jindou Jia, Gen Li, Xiangyu Chen, Tuo An, Yuxuan Hu, Jingliang Li, Xinying Guo, Jianfei Yang

Comments 20 pages, 19 figures

2602.05896 2026-05-08 cs.LG cs.AI

Parity, Sensitivity, and Transformers

Alexander Kozachinskiy, Tomasz Steifer, Przemysław Wałęga

Comments 15 pages. Version 2 -- lower bound extended from 1-layer 1-head to 1-layer O(1)-head transformers

2602.02493 2026-05-08 cs.CV cs.AI

PixelGen: Improving Pixel Diffusion with Perceptual Supervision

Zehong Ma, Ruihan Xu, Shiliang Zhang

Comments Project Pages: https://zehong-ma.github.io/PixelGen/

2602.02288 2026-05-08 cs.LG cs.AI

AROpt: An Optimization Method for Autoregressive Time Series Forecasting

Zheng Li, Jerry Cheng, Huanying Gu

Comments 16 pages, 5 figures, 3 tables

2602.01505 2026-05-08 cs.LG stat.ML

Optimal Sample Complexity for Single Time-Scale Actor-Critic with Momentum

Navdeep Kumar, Tehila Dahan, Lior Cohen, Ananyabrata Barua, Giorgia Ramponi, Kfir Yehuda Levy, Shie Mannor

Comments Following further internal verification, we identified foundational issues in the analytical framework, including unresolved problems in the treatment of nonstationary sampling and parts of the coupled convergence analysis under the stated assumptions. Addressing these issues requires a substantial overhaul of the theoretical framework beyond a standard revision

2602.01150 2026-05-08 cs.LG cs.AI cs.CR cs.CV math.OC

SMI: Statistical Membership Inference for Reliable Unlearned Model Auditing

Jialong Sun, Zeming Wei, Jiaxuan Zou, Jiacheng Gong, Jie Fu, Chengyang Dong, Heng Xu, Jialong Li, Bo Liu

2602.01124 2026-05-08 cs.LG

ChronoSpike: An Adaptive Spiking Graph Neural Network for Dynamic Graphs

Md Abrar Jahin, Taufikur Rahman Fuad, Jay Pujara, Craig Knoblock

2602.00407 2026-05-08 cs.LG

Fed-Listing: Federated Label Distribution Inference in Graph Neural Networks

Suprim Nakarmi, Junggab Son, Yue Zhao, Zuobin Xiong

Comments 9 pages, 3 figures, and 4 tables

2601.21187 2026-05-08 cs.CV cs.LG

FRISM: Fine-Grained Reasoning Injection via Subspace-Level Model Merging for Vision-Language Models

Chenyu Huang, Peng Ye, Xudong Tan, Jinhan Mu, Shenghe Zheng, Li Shen, Tao Chen

Comments Accepted by ICML 2026

2601.20375 2026-05-08 cs.LG cs.AI cs.CL

LLM-AutoDP: Automatic Data Processing via LLM Agents for Model Fine-tuning

Wei Huang, Anda Cheng, Yinggui Wang, Lei Wang, Tao Wei

Comments Accepted by VLDB2026

Details

English abstract

Large Language Models (LLMs) can be fine-tuned on domain-specific data to enhance their performance in specialized fields. However, such data often contains numerous low-quality samples, necessitating effective data processing (DP). In practice, DP strategies are typically developed through iterative manual analysis and trial-and-error adjustment. These processes inevitably incur high labor costs and may lead to privacy issues in high-privacy domains like healthcare due to direct human access to sensitive data. Thus, achieving automated data processing without exposing the raw data has become a critical challenge. To address this challenge, we propose LLM-AutoDP, a novel framework that leverages LLMs as agents to automatically generate and optimize data processing strategies. Our method generates multiple candidate strategies and iteratively refines them using feedback signals and comparative evaluations. This iterative in-context learning mechanism enables the agent to converge toward high-quality processing pipelines without requiring direct human intervention or access to the underlying data. To further accelerate strategy search, we introduce three key techniques: Distribution Preserving Sampling, which reduces data volume while maintaining distributional integrity; Processing Target Selection, which uses a binary classifier to identify low-quality samples for focused processing; and Cache-and-Reuse Mechanism, which minimizes redundant computations by reusing prior processing results. Results show that models trained on data processed by our framework achieve over 80% win rates against models trained on unprocessed data. Compared to AutoML baselines based on LLM agents, LLM-AutoDP achieves approximately a 65% win rate. Moreover, our acceleration techniques reduce the total searching time by up to 10 times, demonstrating both effectiveness and efficiency.

2601.15395 2026-05-08 cs.CL cs.AI cs.HC

Beyond Fixed Psychological Personas: State Beats Trait, but Language Models are State-Blind

Tamunotonye Harry, Ivoline Ngong, Chima Nweke, Yuanyuan Feng, Joseph Near

Comments Accepted to Findings of ACL 2026

2601.14594 2026-05-08 cs.CV

LFS: Learnable Frame Selector for Event-Aware and Temporally Diverse Video Captioning

Lianying Chao, Linfeng Yin, Peiyu Ren, Yifan Jiang, Qiaoyu Ren, Dingcheng Shan, Jing-cheng Pang, Sijie Wu, Xubin Li, Kai Zhang, Xin Chen