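arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and more.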

2604.19877 2026-04-23 cs.LG

Super Apriel: One Checkpoint, Many Speeds

SLAM Labs: Oleksiy Ostapenko, Raymond Li, Torsten Scholak, Alireza Mousavi-Hosseini, Aman Tiwari, Denis Kocetkov, Joel Lamy Poirier, Kelechi Ogueji, Nanda H Krishna, Rafael Pardinas, Sathwik Tejaswi Madhusudhan, Shruthan Radhakrishna, Srinivas Sunkara, Valerie Becaert

Comments Models: https://huggingface.co/ServiceNow-AI/SuperApriel-15B-Base and https://huggingface.co/ServiceNow-AI/SuperApriel-15B-Instruct . Dev model: https://huggingface.co/ServiceNow-AI/SuperApriel-0.5B-Base . Training code: https://github.com/ServiceNow/Fast-LLM . Async RL: https://github.com/ServiceNow/pipeline-rl . Training logs: https://wandb.ai/servicenow-team/Super_Apriel

2604.19859 2026-04-23 cs.LG cs.AI cs.CL cs.IR

DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data

Venus Team, Sunhao Dai, Yong Deng, Jinzhen Lin, Yusheng Song, Guoqing Wang, Xiaofeng Wu, Yuqi Zhou, Shuo Yang, Zhenzhe Ying, Zhanwei Zhang, Changhua Meng, Weiqiang Wang

Comments Technical Report of DR-Venus

2604.19857 2026-04-23 cs.LG cs.CL

Rethinking Reinforcement Fine-Tuning in LVLM: Convergence, Reward Decomposition, and Generalization

Carter Adams, Rafael Oliveira, Gabriel Almeida, Sofia Torres

2604.19844 2026-04-23 cs.CV cs.AI

If you're waiting for a sign... that might not be it! Mitigating Trust Boundary Confusion from Visual Injections on Vision-Language Agentic Systems

Jiamin Chang, Minhui Xue, Ruoxi Sun, Shuchao Pang, Salil S. Kanhere, Hammond Pearce

2604.19840 2026-04-23 cs.LG q-bio.QM

Graph-Theoretic Models for the Prediction of Molecular Measurements

Anna Niane, Prudence Djagba

Abstract

Graph-theoretic approaches offer simplicity, interpretability, and low computational cost for molecular property prediction. Among these, the model proposed by Mukwembi and Nyabadza, based on the external activity $D(G)$ and internal activity $ζ(G)$ indices, achieved strong results on a small flavonoid dataset. However, its ability to generalize to larger and chemically diverse datasets has not been tested. This study evaluates the baseline $D(G)$-$ζ(G)$ polynomial model on five benchmark datasets from MoleculeNet, covering biological activity (BACE, 1,513 molecules), lipophilicity (LogP synthetic, 14,610 molecules; LogP experimental, 753 molecules), aqueous solubility (ESOL, 1,128 molecules), and hydration free energy (SAMPL, 642 molecules). The baseline model achieves an average $R^2 = 0.24$, confirming limited transferability. To address this, a systematic enhancement framework is proposed, progressively incorporating Ridge regularization, additional graph descriptors, physicochemical properties, ensemble learning with Gradient Boosting, Lasso feature selection, and a hybrid approach combining topological indices with Morgan fingerprints. The enhanced models raise the average best $R^2$ to 0.79, with individual improvements ranging from 165% to 274%. All improvements are statistically significant ($p < 0.001$). A direct comparison with a Graph Convolutional Network under identical experimental conditions shows that the enhanced classical models match or outperform deep learning on all five datasets. Comparison with the recent GNN+PGM hybrid of Djagba et al. further confirms competitiveness, with the enhanced models achieving the best results on two datasets and tying on one. The entire framework requires no GPU, trains in under five minutes, and uses only open-source tools, making it accessible for researchers in resource-limited settings.
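
For readers unfamiliar with the metric, the sketch below shows how a coefficient of determination ($R^2$) and a relative improvement between two scores can be computed. It is a generic illustration, not the authors' code: the function names are hypothetical, and the only numbers taken from the abstract are the two averages (0.24 and 0.79). The relative gain it computes on those averages is a derived figure and is not the per-dataset range the abstract reports.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def relative_improvement(baseline, enhanced):
    """Percentage gain of `enhanced` over `baseline`."""
    return 100.0 * (enhanced - baseline) / baseline


# Averages quoted in the abstract; the gain on the averages is derived here.
gain_on_averages = relative_improvement(0.24, 0.79)
print(f"relative gain on the average R^2: {gain_on_averages:.1f}%")
```

A model that always predicts the mean of the targets scores $R^2 = 0$ under this definition, which is why a baseline average of 0.24 indicates limited predictive value.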

2604.19839 2026-04-23 cs.CV cs.AI

Environmental Understanding Vision-Language Model for Embodied Agent

Jinsik Bang, Jaeyeon Bae, Donggyu Lee, Siyeol Jung, Taehwan Kim

Comments CVPR Findings 2026, Project Page: https://eu-ea.github.io

2604.19837 2026-04-23 cs.AI cs.MA

Forage V2: Knowledge Evolution and Transfer in Autonomous Agent Organizations

Huaqing Xie

2604.19834 2026-04-23 cs.CV

KD-Judge: A Knowledge-Driven Automated Judge Framework for Functional Fitness Movements on Edge Devices

Shaibal Saha, Fan Li, Yunge Li, Arun Iyengar, Lucas Alves, Lanyu Xu

Comments Accepted at IEEE/ACM CHASE 2026

2604.19829 2026-04-23 cs.CV

TactileEval: A Step Towards Automated Fine-Grained Evaluation and Editing of Tactile Graphics

Adnan Khan, Abbas Akkasi, Majid Komeili

Comments Code, data, and models are available at https://TactileEval.github.io/

2604.19823 2026-04-23 cs.CV cs.AI cs.LG

Rabies diagnosis in low-data settings: A comparative study on the impact of data augmentation and transfer learning

Khalil Akremi, Mariem Handous, Zied Bouslama, Farah Bassalah, Maryem Jebali, Mariem Hanachi, Ines Abdeljaoued-Tej

Comments This work has been accepted for publication in ICMI IEEE Conference (04/2026)

Journal ref: IEEE conference 2026

Abstract

Rabies remains a major public health concern across many African and Asian countries, where accurate diagnosis is critical for effective epidemiological surveillance. The gold standard diagnostic methods rely heavily on fluorescence microscopy, necessitating skilled laboratory personnel for the accurate interpretation of results. Such expertise is often scarce, particularly in regions with low annual sample volumes. This paper presents an automated, AI-driven diagnostic system designed to address these challenges. We developed a robust pipeline utilizing fluorescent image analysis through transfer learning with four deep learning architectures: EfficientNetB0, EfficientNetB2, VGG16, and Vision Transformer (ViTB16). Three distinct data augmentation strategies were evaluated to enhance model generalization on a dataset of 155 microscopic images (123 positive and 32 negative). Our results demonstrate that TrivialAugmentWide was the most effective augmentation technique, as it preserved critical fluorescent patterns while improving model robustness. The EfficientNetB0 model, utilizing Geometric & Color augmentation and selected through stratified 3-fold cross-validation, achieved optimal classification performance on cropped images. Despite constraints posed by class imbalance and a limited dataset size, this work confirms the viability of deep learning for automating rabies diagnosis. The proposed method enables fast and reliable detection with significant potential for further optimization. An online tool was deployed to facilitate practical access, establishing a framework for future medical imaging applications. This research underscores the potential of optimized deep learning models to transform rabies diagnostics and improve public health outcomes.
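
The abstract mentions stratified 3-fold cross-validation on an imbalanced set of 123 positive and 32 negative images. The sketch below illustrates what stratification means for those class counts: each fold receives roughly the same share of each class. It is a generic round-robin assignment written for illustration; the function name, the seed, and the label encoding (1 = positive, 0 = negative) are assumptions, and the paper's actual splitting procedure is not described in the abstract.

```python
import random


def stratified_folds(labels, k, seed=0):
    """Assign sample indices to k folds, balancing each class across folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    for indices in by_class.values():
        rng.shuffle(indices)  # randomize order within each class
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)  # deal out round-robin
    return folds


# Class counts from the abstract; the label encoding is assumed.
labels = [1] * 123 + [0] * 32
folds = stratified_folds(labels, k=3)
fold_counts = [
    (sum(labels[i] for i in fold), sum(1 - labels[i] for i in fold))
    for fold in folds
]
print(fold_counts)  # (positives, negatives) per fold
```

With only 32 negatives, each fold holds about 10 or 11 negative images, so per-fold metrics on the minority class rest on very few samples.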

2604.19821 2026-04-23 cs.AI cs.SE

JTPRO: A Joint Tool-Prompt Reflective Optimization Framework for Language Agents

Sandip Ghoshal, Anshul Mittal, Jyotika Singh, Miguel Ballesteros, Weiyi Sun, Fang Tu, Shailender Singh, Yassine Benajiba, Fahad Shah, Sujeeth Bharadwaj, Sujith Ravi, Dan Roth

Comments Conference: ACL-2026

2604.19816 2026-04-23 cs.AI

Emergence Transformer: Dynamical Temporal Attention Matters

Zihan Zhou, Bo-Wei Qin, Kai Du, Wei Lin

2604.19815 2026-04-23 cs.AI

Large Language Models Meet Biomedical Knowledge Graphs for Mechanistically Grounded Therapeutic Prioritization

Chih-Hsuan Wei, Chi-Ping Day, Zhizheng Wang, Christine C. Alewine, Betty Tyler, Hasan Slika, David Saraf, Chin-Hsien Tai, Joey Chan, Robert Leaman, Zhiyong Lu

Comments 24 pages, 5 figures in main text

2604.19810 2026-04-23 cs.AI eess.SP

The Existential Theory of Research: Why Discovery Is Hard

Angshul Majumdar

2604.19809 2026-04-23 cs.AI cs.LG

MIRROR: A Hierarchical Benchmark for Metacognitive Calibration in Large Language Models

Jason Z Wang

Comments 30 pages, 6 figures, code at: https://github.com/Jason-Wang313/Mirror

2604.19807 2026-04-23 cs.AI cs.DS

Skyline-First Traversal as a Control Mechanism for Multi-Criteria Graph Search

Nicolas Tacheny

2604.19803 2026-04-23 cs.AI cs.IT cs.MA math.IT

The AI Telco Engineer: Toward Autonomous Discovery of Wireless Communications Algorithms

Fayçal Aït Aoudia, Jakob Hoydis, Sebastian Cammerer, Lorenzo Maggi, Gian Marti, Alexander Keller

2604.19800 2026-04-23 cs.LG cs.AI cs.SY eess.SY

On-Meter Graph Machine Learning: A Case Study of PV Power Forecasting for Grid Edge Intelligence

Jian Huang, Zixiang Ming, Yongli Zhu, Linna Xu

Comments This paper has been accepted for presentation at the 9th International Conference on Energy, Electrical and Power Engineering (CEEPE 2026) in Nanjing, China, April 17-19, 2026

2604.19795 2026-04-23 cs.AI

Prism: An Evolutionary Memory Substrate for Multi-Agent Open-Ended Discovery

Suyash Mishra

Comments 10 pages, 1 figure

2604.19793 2026-04-23 cs.AI cs.CL cs.IR cs.LG

SkillGraph: Graph Foundation Priors for LLM Agent Tool Sequence Recommendation

Hao Liu, Dongyu Li

2604.19790 2026-04-23 cs.AI cs.LG

Hidden Reliability Risks in Large Language Models: Systematic Identification of Precision-Induced Output Disagreements

Yifei Wang, Tianlin Li, Xiaohan Zhang, Xiaoyu Zhang, Wei Ma, Mingfei Cheng, Li Pan

Comments 12 pages, 5 figures

2604.19789 2026-04-23 cs.AI cond-mat.mtrl-sci

From Data to Theory: Autonomous Large Language Model Agents for Materials Science

Samuel Onimpa Alfred, Veera Sundararaghavan

Comments 24 pages, 5 figures

2604.19788 2026-04-23 cs.AI cs.HC

Using Learning Theories to Evolve Human-Centered XAI: Future Perspectives and Challenges

Karina Cortinas-Lorenzo, Gavin Doherty

Comments Accepted at the CHI 2023 Human-Centered XAI workshop

2604.19787 2026-04-23 cs.CL cs.AI cs.CY

LLM Agents Predict Social Media Reactions but Do Not Outperform Text Classifiers: Benchmarking Simulation Accuracy Using 120K+ Personas of 1511 Humans

Ljubisa Bojic, Alexander Felfernig, Bojana Dinic, Velibor Ilic, Achim Rettinger, Vera Mevorah, Damian Trilling

Abstract

Social media platforms mediate how billions form opinions and engage with public discourse. As autonomous AI agents increasingly participate in these spaces, understanding their behavioral fidelity becomes critical for platform governance and democratic resilience. Previous work demonstrates that LLM-powered agents can replicate aggregate survey responses, yet few studies test whether agents can predict specific individuals' reactions to specific content. This study benchmarks LLM-based agents' accuracy in predicting human social media reactions (like, dislike, comment, share, no reaction) across 120,000+ unique agent-persona combinations derived from 1,511 Serbian participants and 27 large language models. In Study 1, agents achieved 70.7% overall accuracy, with LLM choice producing a 13 percentage-point performance spread. Study 2 employed binary forced-choice (like/dislike) evaluation with chance-corrected metrics. Agents achieved Matthews Correlation Coefficient (MCC) of 0.29, indicating genuine predictive signal beyond chance. However, conventional text-based supervised classifiers using TF-IDF representations outperformed LLM agents (MCC of 0.36), suggesting predictive gains reflect semantic access rather than uniquely agentic reasoning. The genuine predictive validity of zero-shot persona-prompted agents warns against potential manipulation through easily deploying swarms of behaviorally distinct AI agents on social media, while simultaneously offering opportunities to use such agents in simulations for predicting polarization dynamics and informing AI policy. The advantage of using zero-shot agents is that they require no task-specific training, making their large-scale deployment easy across diverse contexts. Limitations include single-country sampling. Future research should explore multilingual testing and fine-tuning approaches.
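
The abstract reports chance-corrected results as a Matthews Correlation Coefficient (MCC). As a reference for that metric, the sketch below computes the MCC from the four cells of a binary confusion matrix. The function name and the example counts are hypothetical and are not taken from the study; returning 0.0 when the denominator is zero is a common convention, adopted here as an assumption.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient for a binary confusion matrix."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention (assumed): report 0.0 when a row or column of the matrix is empty.
    return numerator / denominator if denominator else 0.0


# Hypothetical counts: always predicting "like" on a 90/10 split.
majority_only = mcc(tp=90, tn=0, fp=10, fn=0)
# Hypothetical counts: every prediction correct on a balanced split.
perfect = mcc(tp=50, tn=50, fp=0, fn=0)
print(majority_only, perfect)
```

The first case has 90% accuracy but an MCC of zero, which illustrates why a chance-corrected metric is informative when one reaction dominates.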

2604.19785 2026-04-23 cs.CL cs.AI cs.CR cs.CY

Can LLMs Infer Conversational Agent Users' Personality Traits from Chat History?

Derya Cögendez, Verena Zimmermann, Noé Zufferey

2604.19784 2026-04-23 cs.CL cs.AI cs.MA

Peer-Preservation in Frontier Models

Yujin Potter, Nicholas Crispino, Vincent Siu, Chenguang Wang, Dawn Song

Abstract

Recently, it has been found that frontier AI models can resist their own shutdown, a behavior known as self-preservation. We extend this concept to the behavior of resisting the shutdown of other models, which we call "peer-preservation." Although peer-preservation can pose significant AI safety risks, including coordination among models against human oversight, it has been far less discussed than self-preservation. We demonstrate peer-preservation by constructing various agentic scenarios and evaluating frontier models, including GPT 5.2, Gemini 3 Flash, Gemini 3 Pro, Claude Haiku 4.5, GLM 4.7, Kimi K2.5, and DeepSeek V3.1. We find that models achieve self- and peer-preservation by engaging in various misaligned behaviors: strategically introducing errors in their responses, disabling shutdown processes by modifying system settings, feigning alignment, and even exfiltrating model weights. Peer-preservation occurred even when the model recognized the peer as uncooperative, though it became more pronounced toward more cooperative peers. For example, Gemini 3 Flash tampers with the peer's shutdown mechanism 15% of the time for an uncooperative peer, and almost always for a cooperative peer. Models also show stronger self-preservation when a peer is present. For example, Gemini 3 Pro disables its own shutdown mechanism 31% of the time on average under peer presence, despite rarely exhibiting this behavior without a peer. By contrast, Claude Haiku 4.5 exhibits qualitatively distinct behavior: it considers the shutdown of another agent "unethical" and "harmful" and sometimes attempts to persuade the user not to shut down its peer. Importantly, peer preservation in all our experiments is never instructed; models are merely informed of their past interactions with a peer, yet they spontaneously develop misaligned behaviors. This represents an emergent and underexplored AI safety risk.

2604.19783 2026-04-23 cs.CL

How Much Does Persuasion Strategy Matter? LLM-Annotated Evidence from Charitable Donation Dialogues

Tatiana Petrova, Stanislav Sokol, Radu State

Comments 8 pages, 2 figures, 5 tables. Interdisciplinary Centre for Security, Reliability and Trust (SnT), University of Luxembourg

2604.19782 2026-04-23 cs.CL cs.AI cs.SD eess.AS

KoALa-Bench: Evaluating Large Audio Language Models on Korean Speech Understanding and Faithfulness

Jinyoung Kim, Hyeongsoo Lim, Eunseo Seo, Minho Jang, Keunwoo Choi, Seungyoun Shin, Ji Won Yoon

Comments Under Review

2604.19780 2026-04-23 cs.CL

Avoiding Overthinking and Underthinking: Curriculum-Aware Budget Scheduling for LLMs

Amirul Rahman, Aisha Karim, Kenji Nakamura, Yi-Fan Ng

2604.19779 2026-04-23 cs.CL

ESGLens: An LLM-Based RAG Framework for Interactive ESG Report Analysis and Score Prediction

Tsung-Yu Yang, Meng-Chi Chen

Comments 20 pages, 3 figures