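arXivDaily daily academic digest: synchronized with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.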

2604.17673 2026-04-21 cs.LG

Grokking of Diffusion Models: Case Study on Modular Addition

Joon Hyeok Kim, Yong-Hyun Park, Mattis Dalsætra Østby, Jiatao Gu

Abstract

Despite their empirical success, how diffusion models generalize remains poorly understood from a mechanistic perspective. We demonstrate that diffusion models trained with flow-matching objectives exhibit grokking--delayed generalization after overfitting--on modular addition, enabling controlled analysis of their internal computations. We study this phenomenon across two levels of data regime. In a single-image regime, mechanistic dissection reveals that the model implements modular addition by composing periodic representations of individual operands. In a diverse-image regime with high intraclass variability, we find that the model leverages its iterative sampling process to partition the task into an arithmetic computation phase followed by a visual denoising phase, separated by a critical timestep threshold. Our work provides the mechanistic decomposition of algorithmic learning in diffusion models, revealing how these models bridge continuous pixel-space generation and discrete symbolic reasoning.
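The abstract above says the model implements modular addition by composing periodic representations of the individual operands. A minimal sketch of that general idea follows; it is not the circuit reported in the paper. It assumes each operand is encoded as cosine and sine features at every frequency k/p, combines them with the angle-addition identities, and reads out the residue whose features align best. The function name and the readout rule are hypothetical.

```python
import math


def mod_add_via_periodic_features(a: int, b: int, p: int) -> int:
    """Recover (a + b) mod p using only cos/sin features of a and b.

    Illustrative assumption: operands are represented by periodic features
    cos(w*a), sin(w*a) with w = 2*pi*k/p for k = 1..p-1.
    """
    best_c, best_score = 0, -math.inf
    for c in range(p):
        score = 0.0
        for k in range(1, p):
            w = 2 * math.pi * k / p
            # Compose the per-operand features into features of (a + b)
            # via the angle-addition identities.
            cos_ab = math.cos(w * a) * math.cos(w * b) - math.sin(w * a) * math.sin(w * b)
            sin_ab = math.sin(w * a) * math.cos(w * b) + math.cos(w * a) * math.sin(w * b)
            # cos(w * (a + b - c)): equals 1 at every frequency only when
            # c is congruent to a + b modulo p.
            score += cos_ab * math.cos(w * c) + sin_ab * math.sin(w * c)
        if score > best_score:
            best_c, best_score = c, score
    return best_c


if __name__ == "__main__":
    print(mod_add_via_periodic_features(5, 9, 11))
```

Summed over all frequencies, the score is p - 1 for the correct residue and -1 for every other candidate, so the argmax picks out (a + b) mod p.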


2604.17670 2026-04-21 cs.LG stat.ML

Prior-Fitted Functional Flow: In-Context Generative Models for Pharmacokinetics

César Ojeda, Niklas Hartung, Wilhelm Huisinga, Tim Jahn, Purity Kamene Kavwele, Marian Klose, Piyush Kumar, Ramsés J. Sánchez, Darius A. Faroughy

Comments 9 pages, 2 tables and 4 figures

2604.17667 2026-04-21 cs.CL cs.IR

Peerispect: Claim Verification in Scientific Peer Reviews

Ali Ghorbanpour, Soroush Sadeghian, Alireza Daghighfarsoodeh, Sajad Ebrahimi, Negar Arabzadeh, Seyed Mohammad Hosseini, Ebrahim Bagheri

2604.17663 2026-04-21 cs.LG cs.AI cs.CL

ATLAS: Constitution-Conditioned Latent Geometry and Redistribution Across Language Models and Neural Perturbation Data

Gareth Seneque, Lap-Hang Ho, Nafise Erfanian Saeedi, Jeffrey Molendijk, Tim Elson

Comments 49 pages, 7 figures

2604.17659 2026-04-21 cs.CL cs.AI

Semantic Density Effect (SDE): Maximizing Information Per Token Improves LLM Accuracy

Amr Ahmed

2604.17654 2026-04-21 cs.AI

Poly-EPO: Training Exploratory Reasoning Models

Ifdita Hasan Orney, Jubayer Ibn Hamid, Shreya S Ramanujam, Shirley Wu, Hengyuan Hu, Noah Goodman, Dorsa Sadigh, Chelsea Finn

2604.17653 2026-04-21 cs.AI cs.DB

PV-SQL: Synergizing Database Probing and Rule-based Verification for Text-to-SQL Agents

Yuan Tian, Tianyi Zhang

Comments Accepted to Findings of ACL 2026

2604.17652 2026-04-21 cs.CV

Self-Supervised Super-Resolution for Sentinel-5P Hyperspectral Images

Hyam Omar Ali, Antoine Crosnier, Romain Abraham, Baptiste Combelles, Fabrice Jégou, Bruno Galerne

2604.17651 2026-04-21 cs.CV cs.RO

Infrastructure-Centric World Models: Bridging Temporal Depth and Spatial Breadth for Roadside Perception

Siyuan Meng, Chengbo Ai

Comments 18 pages, 7 tables, 1 figure, vision paper

2604.17650 2026-04-21 cs.CL

Measuring Distribution Shift in User Prompts and Its Effects on LLM Performance

Parker Seegmiller, Sarah Masud Preum

Comments Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2604.17648 2026-04-21 cs.CL

ThreadSumm: Summarization of Nested Discourse Threads Using Tree of Thoughts

Olubusayo Olabisi, Ekata Mitra, Ameeta Agrawal

Comments Accepted to ACL 2026

2604.17633 2026-04-21 cs.CL

Copy First, Translate Later: Interpreting Translation Dynamics in Multilingual Pretraining

Felicia Körner, Maria Matveev, Florian Eichin, Gitta Kutyniok, Barbara Plank, Michael A. Hedderich

Comments 10 pages

2604.17629 2026-04-21 cs.CV

BioVLM: Routing Prompts, Not Parameters, for Cross-Modality Generalization in Biomedical VLMs

Mainak Singha, Tanisha Gupta, Ankit Jha, Muhammad Haris Khan, Sayantani Ghosh, Biplab Banerjee

Comments Accepted in ACL Findings 2026

2604.17627 2026-04-21 cs.LG cs.DC cs.PF

SLO-Guard: Crash-Aware, Budget-Consistent Autotuning for SLO-Constrained LLM Serving

Christian Lysenstøen

Comments 20 pages, 6 figures, 5 tables. Code and raw per-trial JSONL data: https://github.com/Chrislysen/SLO-Guard

2604.17626 2026-04-21 cs.AI cs.CL cs.SE

Toward Reusability of AI Models Using Dynamic Updates of AI Documentation

Peter Bajcsy, Walid Keyrouz

Comments 28 pages, 16 figures, 9 tables

2604.17622 2026-04-21 cs.LG

STRIKE: Additive Feature-Group-Aware Stacking Framework for Credit Default Prediction

Swattik Maiti, Ritik Pratap Singh, Fardina Fathmiul Alam

Comments 17 pages, 5 figures

Abstract

Credit risk default prediction remains a cornerstone of risk management in the financial industry. The task involves estimating the likelihood that a borrower will fail to meet debt obligations, an objective critical for lending decisions, portfolio optimization, and regulatory compliance. Traditional machine learning models such as logistic regression and tree-based ensembles are widely adopted for their interpretability and strong empirical performance. However, modern credit datasets are high-dimensional, heterogeneous, and noisy, increasing overfitting risk in monolithic models and reducing robustness under distributional shift. We introduce STRIKE (Stacking via Targeted Representations of Isolated Knowledge Extractors), a feature-group-aware stacking framework for structured tabular credit risk data. Rather than training a single monolithic model on the complete dataset, STRIKE partitions the feature space into semantically coherent groups and trains independent learners within each group. This decomposition is motivated by an additive perspective on risk modeling, where distinct feature sources contribute complementary evidence that can be combined through a structured aggregation. The resulting group-specific predictions are integrated through a meta-learner that aggregates signals while maintaining robustness and modularity. We evaluate STRIKE on three real-world datasets spanning corporate bankruptcy and consumer lending scenarios. Across all settings, STRIKE consistently outperforms strong tree-based baselines and conventional stacking approaches in terms of AUC-ROC. Ablation studies confirm that performance gains stem from meaningful feature decomposition rather than increased model complexity. Our findings demonstrate that STRIKE is a stable, scalable, and interpretable framework for credit risk default prediction tasks.


2604.17614 2026-04-21 cs.AI cs.CL cs.LG

Characterizing Model-Native Skills

Feiyang Kang, Mahavir Dabas, Myeongseob Ko, Ruoxi Jia

Comments We argue that when the goal is to intervene on model behavior, skill characterization should be *model-native*: grounded in the model's own representations rather than imposed through external ontologies

Abstract

Skills are a natural unit for describing what a language model can do and how its behavior can be changed. However, existing characterizations rely on human-written taxonomies, textual descriptions, or manual profiling pipelines--all external hypotheses about what matters that need not align with the model's internal representations. We argue that when the goal is to intervene on model behavior, skill characterization should be *model-native*: grounded in the model's own representations rather than imposed through external ontologies. We instantiate this view by recovering a compact orthogonal basis from sequence-level activations. The resulting basis is semantically interpretable but need not correspond to any predefined human ontology; instead, it captures axes of behavioral variation that the model itself organizes around. We validate this characterization on reasoning post-training, using the recovered basis for both SFT data selection and inference-time steering. We develop lightweight proxy interventions to identify which directions are most useful for a given model. Across Llama3-8B and Qwen2.5-3B, selecting data along those directions improves Pass@1 by up to 20% on MATH and 41% on AMC, outperforming data selection based on human-characterized skills. Because the basis lives in activation space, the same directions also serve as steering vectors at inference time, improving Pass@8 by up to 4.8% on MATH--an intervention that human-characterized skills cannot support. We further validate the characterization on safety alignment, where selecting adversarial training data for model-native skill coverage rather than textual diversity yields more sample-efficient learning. These results suggest that recovering skills from the model's own representations, rather than imposing them externally, provides a more effective foundation for intervening on model behavior. Codes are open-sourced.


2604.17611 2026-04-21 cs.LG cs.AI

STEP-PD: Stage-Aware and Explainable Parkinson's Disease Severity Classification Using Multimodal Clinical Assessments

Md Mezbahul Islam, John Michael Templeton, Christian Poellabauer, Ananda Mohan Mondal

Comments 10 pages, 6 figures, 4 tables, accepted at IEEE International Conference on Healthcare Informatics (ICHI 2026)

2604.17609 2026-04-21 cs.CL cs.LG

Agents Explore but Agents Ignore: LLMs Lack Environmental Curiosity

Leon Engländer, Sophia Althammer, Ahmet Üstün, Matthias Gallé, Tom Sherborne

2604.17585 2026-04-21 cs.CV cs.AI cs.LG

DGSSM: Diffusion guided state-space models for multimodal salient object detection

Suklav Ghosh, Arijit Sur, Pinaki Mitra

Comments Accepted at ICPR 2026. Diffusion-guided Mamba framework for multimodal salient object detection. Evaluated on 13 benchmarks (RGB, RGB-D, RGB-T)

2604.17584 2026-04-21 cs.AI

DIRCR: Dual-Inference Rule-Contrastive Reasoning for Solving RAVENs

Jiachen Zhang, Chengtai Li, Jianfeng Ren, Linlin Shen, Zheng Lu, Ruibin Bai

Comments Accepted by ICASSP 2026

2604.17581 2026-04-21 cs.LG cs.AI q-bio.NC

How Much Data is Enough? The Zeta Law of Discoverability in Biomedical Data, featuring the enigmatic Riemann zeta function

Paul M. Thompson

Comments 25 pages, 5 figures

Abstract

How much data is enough to make a scientific discovery? As biomedical datasets scale to millions of samples and AI models grow in capacity, progress increasingly depends on predicting when additional data will substantially improve performance. In practice, model development often relies on empirical scaling curves measured across architectures, modalities, and dataset sizes, with limited theoretical guidance on when performance should improve, saturate, or exhibit cross-over behavior. We propose a scaling-law framework for cross-modal discoverability based on spectral structure of data covariance operators, task-aligned signal projections, and learned representations. Many performance metrics, including AUC, can be expressed in terms of cumulative signal-to-noise energy accumulated across identifiable spectral modes of an encoder and cross-modal operator. Under mild assumptions, this accumulation follows a zeta-like scaling law governed by power-law decay of covariance spectra and aligned signal energy, leading naturally to the appearance of the Riemann zeta function. Representation learning methods such as sparse models, low-rank embeddings, and multimodal contrastive objectives improve sample efficiency by concentrating useful signal into earlier stable modes, effectively steepening spectral decay and shifting scaling curves. The framework predicts cross-over regimes in which simpler models perform best at small sample sizes, while higher-capacity or multimodal encoders outperform them once sufficient data stabilizes additional degrees of freedom. Applications include multimodal disease classification, imaging genetics, functional MRI, and topological data analysis. The resulting zeta law provides a principled way to anticipate when scaling data, improving representations, or adding modalities is most likely to accelerate discovery.
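The abstract above attributes the zeta-like law to signal energy accumulating across spectral modes whose contributions decay as a power law. The sketch below illustrates only that arithmetic, under the assumption that mode k contributes k^(-s); the abstract does not give the paper's actual formula, and the function name, the exponent `s`, and the mode counts are hypothetical. With s = 2 the cumulative sum saturates at ζ(2) = π²/6.

```python
import math


def cumulative_power_law(num_modes: int, s: float) -> float:
    """Sum of k**(-s) over modes k = 1..num_modes.

    Illustrative assumption: each spectral mode k contributes energy k**(-s).
    As num_modes grows, the sum approaches the Riemann zeta value zeta(s)
    for any s > 1.
    """
    return sum(k ** -s for k in range(1, num_modes + 1))


if __name__ == "__main__":
    limit = math.pi ** 2 / 6  # zeta(2)
    for n in (10, 100, 10_000):
        partial = cumulative_power_law(n, 2.0)
        print(f"modes={n:>6}  sum={partial:.6f}  gap to zeta(2)={limit - partial:.6f}")
```

The gap to the limit shrinks roughly like 1/num_modes for s = 2, which is the saturation behavior a curve of this shape would show; a larger exponent would saturate sooner.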


2604.17574 2026-04-21 cs.CL

Beyond Fine-Tuning: In-Context Learning and Chain-of-Thought for Reasoned Distractor Generation

Elaf Alhazmi, Quan Z. Sheng, Wei Emma Zhang

2604.17570 2026-04-21 cs.CV cs.AI

PBSBench: A Multi-Level Vision-Language Framework and Benchmark for Hematopathology Whole Slide Image Interpretation

Yuanlong Wang, Weichi Chen, Adrian Rajab, Wenfang Liu, Yulan Jin, Andrew Srisuwananukorn, Ping Zhang

Comments 19 pages, 12 figures, Accepted by CVPR Findings 2026

2604.17569 2026-04-21 cs.CL

MAPLE: A Meta-learning Framework for Cross-Prompt Essay Scoring

Salam Albatarni, May Bashendy, Sohaila Eltanbouly, Tamer Elsayed

Comments Accepted at ACL Findings 2026

2604.17568 2026-04-21 cs.LG math.ST stat.ML stat.TH

Diverse Dictionary Learning

Yujia Zheng, Zijian Li, Shunxing Fan, Andrew Gordon Wilson, Kun Zhang

Comments ICLR 2026

2604.17567 2026-04-21 cs.CV eess.IV

Multi-Camera Self-Calibration in Sports Motion Capture: Leveraging Human and Stick Poses

Fan Yang, Changsoo Jung, Ryosuke Kawamura, Hon Yung Wong

2604.17562 2026-04-21 cs.AI cs.MA

SafeAgent: A Runtime Protection Architecture for Agentic Systems

Hailin Liu, Eugene Ilyushin, Jie Ni, Min Zhu

2604.17543 2026-04-21 cs.CL

PoliLegalLM: A Technical Report on a Large Language Model for Political and Legal Affairs

Yuting Huang, Yinghao Hu, Qian Xiao, Wenlin Zhong, Yiquan Wu, Taishi Zhou, Moke Chen, Changlong Sun, Kun Kuang, Fei Wu

2604.17542 2026-04-21 cs.CV

Dual Strategies for Test-Time Adaptation

Nam Nguyen Phuong, Duc Nguyen The Minh, Phi Le Nguyen, Ehsan Abbasnejad, Minh Hoai

Comments Findings of Computer Vision and Pattern Recognition 2026