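arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and related fields.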

2602.01051 2026-04-23 cs.LG

SwiftRepertoire: Few-Shot Immune-Signature Synthesis via Dynamic Kernel Codes

Rong Fu, Muge Qi, Yang Li, Yabin Jin, Jiekai Wu, Jiaxuan Lu, Chunlei Meng, Youjin Wang, Zeli Su, Juntao Gao, Li Bao, Qi Zhao, Wei Luo, Simon Fong

Comments 19 pages, 8 figures, 8 tables

English abstract

Repertoire-level analysis of T cell receptors offers a biologically grounded signal for disease detection and immune monitoring, yet practical deployment is impeded by label sparsity, cohort heterogeneity, and the computational burden of adapting large encoders to new tasks. We introduce a framework that synthesizes compact task-specific parameterizations from a learned dictionary of prototypes conditioned on lightweight task descriptors derived from repertoire probes and pooled embedding statistics. This synthesis produces small adapter modules applied to a frozen pretrained backbone, enabling immediate adaptation to novel tasks with only a handful of support examples and without full model fine-tuning. The architecture preserves interpretability through motif-aware probes and a calibrated motif discovery pipeline that links predictive decisions to sequence-level signals. Together, these components yield a practical, sample-efficient, and interpretable pathway for translating repertoire-informed models into diverse clinical and research settings where labeled data are scarce and computational resources are constrained.
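
The abstract above describes synthesizing a small adapter from a learned dictionary of prototypes, conditioned on a task descriptor, and applying it to a frozen backbone. A minimal sketch of that general idea follows, using only the standard library. The attention-style mixing, the elementwise scaling adapter, and every name here (`synthesize_adapter`, `apply_adapter`, `keys`, `prototypes`) are assumptions made for illustration; the abstract does not specify the paper's actual parameterization.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def synthesize_adapter(descriptor, keys, prototypes):
    """Mix prototype adapter vectors using descriptor-key similarity.

    descriptor: task descriptor vector (hypothetical; e.g. pooled embedding stats)
    keys:       one key vector per prototype, same length as descriptor
    prototypes: one adapter-parameter vector per prototype
    """
    # Dot-product similarity between the task descriptor and each key.
    scores = [sum(d * k for d, k in zip(descriptor, key)) for key in keys]
    weights = softmax(scores)
    dim = len(prototypes[0])
    # The adapter is the weighted average of the prototype vectors.
    adapter = [sum(w * p[i] for w, p in zip(weights, prototypes)) for i in range(dim)]
    return weights, adapter


def apply_adapter(features, adapter):
    """Rescale frozen-backbone features elementwise (an assumed adapter form)."""
    return [f * (1.0 + a) for f, a in zip(features, adapter)]


# Toy example: two prototypes, a descriptor that matches the first key.
keys = [[1.0, 0.0], [0.0, 1.0]]
prototypes = [[0.1, 0.2, 0.3], [-0.1, 0.0, 0.1]]
weights, adapter = synthesize_adapter([2.0, 0.0], keys, prototypes)
adapted = apply_adapter([1.0, 1.0, 1.0], adapter)
```

Because only the mixing weights depend on the new task, no backbone parameters are updated, which is the property the abstract emphasizes for few-shot adaptation.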

2601.23258 2026-04-23 cs.LG cs.AI cs.CL

Agnostic Language Identification and Generation

Mikael Møller Høgsgaard, Chirag Pabbaraju

Comments typos and minor bug fixes

2601.21503 2026-04-23 cs.AI cs.CL cs.LG cs.NE

MAR: Efficient Large Language Models via Module-aware Architecture Refinement

Junhong Cai, Guiqin Wang, Kejie Zhao, Jianxiong Tang, Xiang Wang, Luziwei Leng, Ran Cheng, Yuxin Ma, Qinghai Guo

Comments Accepted by ICASSP 2026. 5 pages, 5 figures

2601.21367 2026-04-23 cs.AI cs.LG

Hebbian Learning with Global Direction

Wenjia Hua, Kejie Zhao, Luziwei Leng, Ran Cheng, Yuxin Ma, Qinghai Guo

Comments Accepted to ICASSP 2026

2601.20144 2026-04-23 cs.CL

Trajectory2Task: Training Robust Tool-Calling Agents with Synthesized Yet Verifiable Data for Complex User Intents

Ziyi Wang, Yuxuan Lu, Yimeng Zhang, Pei Chen, Ziwei Dong, Jing Huang, Jiri Gesi, Xianfeng Tang, Chen Luo, Qun Liu, Yisi Sang, Hanqing Lu, Manling Li, Jin Lai, Dakuo Wang

2601.19932 2026-04-23 cs.CL cs.HC

"Newspaper Eat" Means "Not Tasty": A Taxonomy and Benchmark for Coded Language in Real-World Chinese Online Reviews

Ruyuan Wan, Changye Li, Ting-Hao 'Kenneth' Huang

2601.17609 2026-04-23 cs.CL

What Language Models Know But Don't Say: Non-Generative Prior Extraction for Generalization

Sara Rezaeimanesh, Mohammad M. Ghassemi

English abstract

In domains like medicine and finance, large-scale labeled data is costly and often unavailable, leading to models trained on small datasets that struggle to generalize to real-world populations. Large language models contain extensive knowledge from years of research across these domains. We propose LoID (Logit-Informed Distributions), a deterministic method for extracting informative prior distributions for Bayesian logistic regression by directly accessing the LLM's token-level predictions. Rather than relying on generated text, we probe the model's confidence in opposing semantic directions (positive vs. negative impact) through carefully constructed sentences. By measuring how consistently the LLM favors one direction across diverse phrasings, we extract the strength and reliability of the model's belief about each feature's influence. We evaluate LoID on ten real-world tabular datasets under synthetic out-of-distribution (OOD) settings characterized by covariate shift, where the training data represents only a subset of the population. We compare our approach against (1) standard uninformative priors, (2) AutoElicit, a recent method that prompts LLMs to generate priors via text completions, (3) LLMProcesses, a method that uses LLMs to generate numerical predictions through in-context learning, and (4) an oracle-style upper bound derived from fitting logistic regression on the full dataset. We assess performance using Area Under the Curve (AUC). Across datasets, LoID significantly improves performance over logistic regression trained on OOD data, recovering up to 59% of the performance gap relative to the oracle model. LoID outperforms AutoElicit and LLMProcesses on 8 out of 10 datasets, while providing a reproducible and computationally efficient mechanism for integrating LLM knowledge into Bayesian inference.
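
The abstract above describes turning an LLM's token-level preference between a "positive" and a "negative" completion, measured across several phrasings, into a prior for a logistic-regression coefficient. The sketch below illustrates one way such a mapping could look. It is not the paper's formula: the function name `prior_from_logits`, the `tanh` squashing, the agreement-based consistency score, and the `min_std`/`max_std` bounds are all assumptions, and the log-probabilities are made-up inputs rather than real model outputs.

```python
import math


def prior_from_logits(pairs, scale=1.0, min_std=0.25, max_std=2.0):
    """Map per-phrasing (logp_positive, logp_negative) pairs to (mean, std).

    The returned values are meant as parameters of a Normal prior on one
    feature's coefficient. All constants are illustrative assumptions.
    """
    if not pairs:
        raise ValueError("need at least one phrasing")
    # Signed preference per phrasing: > 0 means the LLM favors "positive".
    diffs = [lp_pos - lp_neg for lp_pos, lp_neg in pairs]
    mean_diff = sum(diffs) / len(diffs)
    # Fraction of phrasings that agree with the average direction.
    agree = sum(1 for d in diffs if d * mean_diff > 0) / len(diffs)
    # Rescale agreement so a 50/50 split (or worse) gives zero consistency.
    consistency = max(0.0, 2.0 * agree - 1.0)
    # Consistent beliefs give a tighter prior; inconsistent ones stay wide.
    std = max_std - consistency * (max_std - min_std)
    mean = scale * consistency * math.tanh(mean_diff)
    return mean, std


# Hypothetical log-probabilities: every phrasing favors a positive effect.
confident = prior_from_logits([(-0.2, -1.8), (-0.3, -1.5), (-0.1, -2.0)])
# Hypothetical log-probabilities: the phrasings disagree with each other.
uncertain = prior_from_logits([(-0.5, -1.0), (-1.0, -0.5)])
```

Under these assumptions, consistent preferences yield a narrow prior centered away from zero, while contradictory phrasings fall back to a wide, zero-centered prior that behaves like an uninformative one.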

2601.16399 2026-04-23 cs.LG math.OC

A Hessian-Free Actor-Critic Algorithm for Bi-Level Reinforcement Learning with Applications to LLM Fine-Tuning

Sihan Zeng, Sujay Bhatt, Sumitra Ganesh, Alec Koppel

2601.14044 2026-04-23 cs.CV

Weather-R1: Logically Consistent Reinforcement Fine-Tuning for Multimodal Reasoning in Meteorology

Kaiyu Wu, Pucheng Han, Hualong Zhang, Naigeng Wu, Keze Wang

2601.12910 2026-04-23 cs.CL cs.AI

SciCoQA: Quality Assurance for Scientific Paper--Code Alignment

Tim Baumgärtner, Iryna Gurevych

Comments Accepted at ACL 2026

2601.12078 2026-04-23 cs.CL cs.IR

Optimizing User Profiles via Contextual Bandits for Retrieval-Augmented LLM Personalization

Linfeng Du, Ye Yuan, Zichen Zhao, Fuyuan Lyu, Emiliano Penaloza, Xiuying Chen, Zipeng Sun, Jikun Kang, Laurent Charlin, Xue Liu, Haolun Wu

Comments Accepted to ACL 2026

2601.11505 2026-04-23 cs.LG cs.AI cs.SY eess.SY q-bio.QM

MetaboNet: The Largest Publicly Available Consolidated Dataset for Type 1 Diabetes Management

Miriam K. Wolff, Peter Calhoun, Eleonora Maria Aiello, Yao Qin, Sam F. Royston

Comments 30 pages, 5 figures, 1 Table, 10 supplementary figures, 3 supplementary tables, submitted to JDST

2601.09373 2026-04-23 cs.CL

The Imperfective Paradox in Large Language Models

Bolei Ma, Yusuke Miyao

Comments ACL 2026

2601.08558 2026-04-23 cs.CV

REVNET: Rotation-Equivariant Point Cloud Completion via Vector Neuron Anchor Transformer

Zhifan Ni, Eckehard Steinbach

Comments ICPR 2026

2601.06606 2026-04-23 cs.LG cs.AI

CEDAR: Context Engineering for Agentic Data Science

Rishiraj Saha Roy, Chris Hinze, Luzian Hahn, Fabian Kuech

Comments Accepted at ECIR 2026

2601.02989 2026-04-23 cs.CL

Mechanistic Interpretability of Large-Scale Counting in LLMs through a System-2 Strategy

Hosein Hasani, Mohammadali Banayeeanzade, Ali Nafisi, Sadegh Mohammadian, Fatemeh Askari, Mobin Bagherian, Amirmohammad Izadi, Mahdieh Soleymani Baghshah

Comments ACL 2026

2512.15146 2026-04-23 cs.CL

Beyond Majority Voting: Towards Fine-grained and More Reliable Reward Signal for Test-Time Reinforcement Learning

Weiqin Wang, Yile Wang, Kehao Chen, Hui Huang

Comments Accepted to ACL 2026 Main Conference. 15 pages, 9 figures, 5 tables

2512.12325 2026-04-23 cs.LG math.ST stat.ML stat.TH

Eventually LIL Regret: Almost Sure $\ln\ln T$ Regret for a sub-Gaussian Mixture on Unbounded Data

Shubhada Agrawal, Aaditya Ramdas

Comments Published at ALT 2026

2512.09756 2026-04-23 cs.CL

MOA: Multi-Objective Alignment for Role-Playing Agents

Chonghua Liao, Ke Wang, Yuchuan Wu, Ruoran Li, Fei Huang, Yongbin Li

2512.08923 2026-04-23 cs.AI

Same Content, Different Answers: Cross-Modal Inconsistency in MLLMs

Angela van Sprang, Laurens Samson, Ana Lucic, Erman Acar, Sennay Ghebreab, Yuki M. Asano

Comments Accepted at CVPR 2026. Angela van Sprang and Laurens Samson contributed equally as first authors

2511.21356 2026-04-23 cs.LG cs.AI

Hybrid-AIRL: Enhancing Inverse Reinforcement Learning with Supervised Expert Guidance

Bram Silue, Santiago Amaya-Corredor, Patrick Mannion, Lander Willem, Pieter Libin

Comments 13 pages, 5 figures, 1 table. Code: https://github.com/silue-dev/hairl. Published at ESANN 2026

2511.19328 2026-04-23 cs.LG

Understanding the Staged Dynamics of Transformers in Learning Latent Structure

Rohan Saha, Farzane Aminmansour, Alona Fyshe

Comments Preprint

2511.19176 2026-04-23 cs.LG cs.IR

From Raw Features to Effective Embeddings: A Three-Stage Approach for Multimodal Recipe Recommendation

Jeeho Shin, Kyungho Kim, Kijung Shin

2511.17069 2026-04-23 cs.CL

Interpretability from the Ground Up: Stakeholder-Centric Design of Automated Scoring in Educational Assessments

Yunsung Kim, Mike Hardy, Joseph Tey, Candace Thille, Chris Piech

Comments In Findings of the Association for Computational Linguistics (ACL 2026)

2511.11931 2026-04-23 cs.RO

MATT-Diff: Multimodal Active Target Tracking by Diffusion Policy

Saida Liu, Nikolay Atanasov, Shumon Koga

Comments Camera-ready version for L4DC 2026

2511.01233 2026-04-23 cs.CV cs.GR cs.HC

Towards Reliable Human Evaluations in Gesture Generation: Insights from a Community-Driven State-of-the-Art Benchmark

Rajmund Nagy, Hendric Voss, Thanh Hoang-Minh, Mihail Tsakov, Teodor Nikolov, Zeyi Zhang, Tenglong Ao, Sicheng Yang, Shaoli Huang, Yongkang Cheng, M. Hamza Mughal, Rishabh Dabral, Kiran Chhatre, Christian Theobalt, Libin Liu, Stefan Kopp, Rachel McDonnell, Michael Neff, Taras Kucherenko, Youngwoo Yoon, Gustav Eje Henter

Comments Accepted to CVPR 2026, Findings Track. 23 pages, 10 figures. The last two authors made equal contributions

2510.26285 2026-04-23 cs.CL cs.AI cs.LG cs.NE

Language Models Learn Universal Representations of Numbers and Here's Why You Should Care

Michal Štefánik, Timothee Mickus, Marek Kadlčík, Bertram Højer, Michal Spiegel, Raúl Vázquez, Aman Sinha, Josef Kuchař, Philipp Mondorf, Pontus Stenetorp

2510.25223 2026-04-23 cs.AI

FELA: A Multi-Agent Evolutionary System for Feature Engineering of Industrial Event Log Data

Kun Ouyang, Haoyu Wang, Dong Fang

Comments 14 pages, 11 figures

2510.22955 2026-04-23 cs.LG

SARNet: A Spike-Aware consecutive validation Framework for Accurate Remaining Useful Life Prediction

Junhao Fan, Wenrui Liang, Wei-Qiang Zhang

Comments 5 pages, 2 figures, 3 tables. Equal contribution by Junhao Fan and Wenrui Liang. Corresponding author: Wei-Qiang Zhang. Accepted to ICASSP 2026

2510.21652 2026-04-23 cs.AI cs.CL

AstaBench: Rigorous Benchmarking of AI Agents with a Scientific Research Suite

Jonathan Bragg, Mike D'Arcy, Nishant Balepur, Dan Bareket, Bhavana Dalvi, Sergey Feldman, Dany Haddad, Jena D. Hwang, Peter Jansen, Varsha Kishore, Bodhisattwa Prasad Majumder, Aakanksha Naik, Sigal Rahamimov, Kyle Richardson, Amanpreet Singh, Harshit Surana, Aryeh Tiktinsky, Rosni Vasu, Guy Wiener, Chloe Anastasiades, Stefan Candra, Jason Dunkelberger, Dan Emery, Rob Evans, Malachi Hamada, Regan Huff, Rodney Kinney, Matt Latzke, Jaron Lochner, Ruben Lozano-Aguilera, Cecile Nguyen, Smita Rao, Amber Tanaka, Brooke Vlahos, Peter Clark, Doug Downey, Yoav Goldberg, Ashish Sabharwal, Daniel S. Weld

Comments Published as a conference paper at ICLR 2026

English abstract

AI agents hold the potential to revolutionize scientific productivity by automating literature reviews, replicating experiments, analyzing data, and even proposing new directions of inquiry; indeed, there are now many such agents, ranging from general-purpose "deep research" systems to specialized science-specific agents, such as AI Scientist and AIGS. Rigorous evaluation of these agents is critical for progress. Yet existing benchmarks fall short on several fronts: they often (1) lack reproducible agent tools necessary for a controlled comparison of core agentic capabilities; (2) do not account for confounding variables such as model cost and tool access; (3) do not provide standardized interfaces for quick agent prototyping and evaluation; (4) fail to provide holistic, product-informed measures of real-world use cases such as science research; and (5) lack comprehensive baseline agents necessary to identify true advances. In response, we define principles and tooling for more rigorously benchmarking agents. Using these, we present AstaBench, a suite that provides a holistic measure of agentic ability to perform scientific research, comprising 2400+ problems spanning the entire scientific discovery process and multiple scientific domains, and including many problems inspired by actual user requests to deployed Asta agents. Our suite comes with the first scientific research environment with production-grade search tools that enable controlled, reproducible evaluation, better accounting for confounders. Alongside, we provide a comprehensive suite of nine science-optimized classes of Asta agents and numerous baselines. Our extensive evaluation of 57 agents across 22 agent classes reveals several interesting findings, most importantly that despite meaningful progress on certain individual aspects, AI remains far from solving the challenge of science research assistance.
