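arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.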

2604.27644 2026-05-08 cs.LG cs.AI cs.PL

ANCORA: Learning to Question via Manifold-Anchored Self-Play for Verifiable Reasoning

Chengcao Yang

Comments v2: Updated abstract; strengthened the proof of Proposition 4.1; corrected minor typos; corrected author list

Details

Abstract

We propose a paradigm shift toward open-ended curriculum self-play: rather than learning to answer on a fixed prompt set, a unified policy learns to question: generating verifiable problems, solving them, and turning verifier feedback into self-improvement without human-annotated solutions. We introduce ANCORA, in which the policy alternates between a Proposer that synthesizes novel specifications and a Solver that produces verified solutions, anchored by three load-bearing mechanisms: a two-level group-relative update coupling Proposer advantages across specifications with Solver advantages across solution attempts; iterative self-distilled SFT projecting the base model onto its valid-output manifold before RL; and a UCB-guided Curriculum DAG whose policy-induced problem set can provably expand under self-composition. Without these stabilizers, sparse verifier feedback drives Proposer collapse even under MLRL-aligned rewards; with them, ANCORA bootstraps a verifiable curriculum from zero human solutions. Instantiated in Verus, ANCORA lifts Dafny2Verus pass@1 from a 26.6% SFT baseline to 81.5% in test-time training (TTT, 0-shot), outperforming PSV self-play by 15.8 points despite PSV's 1-shot inference; in a transfer setting, training from Dafny2Verus seeds yields 36.2% and 17.2% pass@1 on held-out MBPP and HumanEval.
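The "two-level group-relative update" in the abstract above can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a GRPO-style normalization (subtract the group mean, divide by the group standard deviation) applied once across specifications for the Proposer and once within each specification's solution attempts for the Solver. The function names, the `eps` constant, and the example reward values are all hypothetical.

```python
from statistics import mean, pstdev


def group_relative(rewards, eps=1e-8):
    """Center and scale rewards within one group (GRPO-style; an assumption)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division defined when every reward in the group is equal.
    return [(r - mu) / (sigma + eps) for r in rewards]


def two_level_advantages(solver_rewards, proposer_rewards):
    """Return (proposer_adv, solver_adv).

    solver_rewards: one list of verifier rewards per specification,
        with one entry per solution attempt (hypothetical layout).
    proposer_rewards: one scalar reward per specification.
    """
    # Level 1: Solver advantages, normalized within each specification.
    solver_adv = [group_relative(attempts) for attempts in solver_rewards]
    # Level 2: Proposer advantages, normalized across specifications.
    proposer_adv = group_relative(proposer_rewards)
    return proposer_adv, solver_adv


# Two specifications, two solution attempts each (made-up rewards).
proposer_adv, solver_adv = two_level_advantages(
    solver_rewards=[[1.0, 0.0], [0.0, 0.0]],
    proposer_rewards=[0.5, 0.0],
)
```

In this sketch a specification whose attempts all fail yields zero Solver advantages, which gives one intuition for why sparse verifier feedback provides little learning signal without additional stabilizers.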

2604.26799 2026-05-08 cs.CV cs.GR cs.MM

MesonGS++: Post-training Compression of 3D Gaussian Splatting with Hyperparameter Searching

Shuzhao Xie, Junchen Ge, Weixiang Zhang, Jiahang Liu, Chen Tang, Yunpeng Bai, Shijia Ge, Jingyan Jiang, Yuzhi Huang, Fengnian Yang, Cong Zhang, Xiaoyi Fan, Zhi Wang

Comments https://github.com/mmlab-sigs/mesongs_plus

2604.26227 2026-05-08 cs.CV

HOI-aware Adaptive Network for Weakly-supervised Action Segmentation

Runzhong Zhang, Suchen Wang, Yueqi Duan, Yansong Tang, Yue Zhang, Yap-Peng Tan

Comments Accepted to IJCAI 2023

2604.24916 2026-05-08 cs.RO cs.AI

asRoBallet: Closing the Sim2Real Gap via Friction-Aware Reinforcement Learning for Underactuated Spherical Dynamics

Fang Wan, Guangyi Huang, Tianyu Wu, Zishang Zhang, Bangchao Huang, Haoran Sun, Mingdong Chen, Chaoyang Song

Comments 10 pages, 9 figures, accepted for RSS 2026. For supplementary videos, see https://bionicdl.ancorasir.com/?p=2238

2604.23045 2026-05-08 cs.LG

A Differentiable Framework for Global Circulation Model Precipitation Bias Correction

Kamlesh Sawadekar, Seth McGinnis, Peijun Li, Kathryn Lawson, Chaopeng Shen

Comments 27 pages, 8 figures, 3 tables

2604.22056 2026-05-08 cs.LG cs.NI eess.SP

Learning Coverage- and Power-Optimal Transmitter Placement from Building Maps: A Comparative Study of Direct and Indirect Neural Approaches

Çağkan Yapar

2604.21137 2026-05-08 cs.CL cs.AI

Enhancing Science Classroom Discourse Analysis through Joint Multi-Task Learning for Reasoning-Component Classification

Jiho Noh, Mukhesh Raghava Katragadda, Raymond Carl, Soon Lee

2604.21106 2026-05-08 cs.LG cs.CL

How Much Is One Recurrence Worth? Iso-Depth Scaling Laws for Looped Language Models

Kristian Schwethelm, Daniel Rueckert, Georgios Kaissis

Comments v3: substantially refined framing + minor corrections; v2: added case studies on truncated-BPTT and hyperconnections

2604.20658 2026-05-08 cs.CL cs.CY cs.MA

Cooperative Profiles Predict Multi-Agent LLM Team Performance in AI for Science Workflows

Shivani Kumar, Adarsh Bharathwaj, David Jurgens

2604.20568 2026-05-08 cs.LG cs.IT math.IT stat.ME

Amortized Vine Copulas for High-Dimensional Density and Information Estimation

Houman Safaai

2604.20229 2026-05-08 cs.SD cs.AI

Enhancing Speaker Verification with Whispered Speech via Post-Processing

Magdalena Gołębiowska, Piotr Syga

Comments 15 pages, 3 figures, conference paper at ACIIDS 2026

2604.19675 2026-05-08 cs.CV

MedFlowSeg: Flow Matching for Medical Image Segmentation with Frequency-Aware Attention

Zhi Chen, Runze Hu, Le Zhang

2604.18916 2026-05-08 cs.AI

Benchmarking PNW Model for MedMNIST to 100% Accuracy

Bo Deng

Comments 12 pages, 4 figures, 1 table

2604.18753 2026-05-08 cs.LG cs.AI

Handling and Interpreting Missing Modalities in Patient Clinical Trajectories via Autoregressive Sequence Modeling

Andrew Wang, Ellie Pavlick, Ritambhara Singh

2604.17573 2026-05-08 cs.AI

Beyond Static Snapshots: A Grounded Evaluation Framework for Language Models at the Agentic Frontier

Jazmia Henry

Comments Submitted for consideration to NeurIPS 2026

Details

Abstract

We argue that current evaluation frameworks for large language models (LLMs) suffer from four systematic failures that make them structurally inadequate for deployed, agentic systems: distributional, temporal, scope, and process invalidity. These failures compound in RLHF, making reward hacking a predictable consequence of evaluation design rather than an unpredictable training pathology, and RLHF's dual-model architecture imposes a hardware barrier limiting evaluation reproducibility. We propose the Grounded Continuous Evaluation (GCE) framework and present ISOPro as a reference implementation. ISOPro replaces the learned reward model with a deterministic verifier, eliminating reward hacking by construction in verifiable-reward domains, and updates LoRA adapters on CPU, reducing the hardware barrier by an order of magnitude. We validate ISOPro across three architectures (Qwen 2.5 3B, Llama 3.2 3B, Gemma 2 2B) and two domains (scheduling, MBPP), with a head-to-head matched-compute comparison against GRPO-LoRA. Across twelve cells, ISOPro produces the largest absolute capability gains (+25.6, +22.2, +16.0pp) at mean delta +9.0pp and worst-case regression -5.6pp; GRPO-LoRA at consumer-budget hyperparameters reaches a smaller peak gain (+8.5pp), deeper worst-case regression (-10pp), and mean delta -1.5pp. Held-out compositional generalization on MBPP reaches 40% for ISOPro on two of three architectures (including a 0% to 40% bootstrap on Qwen 2.5 3B), against 20% for GRPO-LoRA on one of three. We characterize a buffer-skew failure mode in which the implicit curriculum can erode pre-existing tier capability under three preconditions, with three corresponding mitigations. The work is situated alongside DeepSeek-R1's GRPO, which arrived at the same architectural conclusion at scale: for verifiable-reward domains, the verifier is the reward signal.
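The idea of replacing a learned reward model with a deterministic verifier can be sketched for an MBPP-style task as follows. This is an illustrative assumption, not ISOPro's code: the function name `verifier_reward`, the binary 0/1 reward, and the use of assert strings as tests are hypothetical choices. The sketch uses `exec` without sandboxing, so it is only suitable for trusted inputs.

```python
def verifier_reward(candidate_source, test_asserts):
    """Return 1.0 if the candidate passes every assert, else 0.0.

    Deterministic: the same inputs always give the same reward,
    so there is no learned reward model to exploit.
    """
    namespace = {}
    try:
        # Define the candidate's functions in a fresh namespace.
        exec(candidate_source, namespace)
        # Run each test; a failed assert raises AssertionError.
        for test in test_asserts:
            exec(test, namespace)
    except Exception:
        # Syntax errors, runtime errors, and failed asserts all score 0.
        return 0.0
    return 1.0


tests = ["assert add(1, 2) == 3", "assert add(-1, 1) == 0"]
good = "def add(a, b):\n    return a + b"
bad = "def add(a, b):\n    return a - b"
reward_good = verifier_reward(good, tests)
reward_bad = verifier_reward(bad, tests)
```

Here `reward_good` is 1.0 and `reward_bad` is 0.0. A production verifier would additionally need process isolation and timeouts, which this sketch omits.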

2604.17137 2026-05-08 cs.LG cs.RO

BOIL: Learning Environment Personalized Information

Rohan Patil, Henrik I. Christensen

2604.15711 2026-05-08 cs.CV cs.AI

SSMamba: A Self-Supervised Hybrid State Space Model for Pathological Image Classification

Enhui Chai, Sicheng Chen, Tianyi Zhang, Xingyu Li, Tianxiang Cui

Details

DOI: 10.1016/j.media.2026.104080
Journal ref: Medical Image Analysis, Volume 111, June 2026, 104080

Abstract

Pathological diagnosis is highly reliant on image analysis, where Regions of Interest (ROIs) serve as the primary basis for diagnostic evidence, while whole-slide image (WSI)-level tasks primarily capture aggregated patterns. To extract these critical morphological features, ROI-level Foundation Models (FMs) based on Vision Transformers (ViTs) and large-scale self-supervised learning (SSL) have been widely adopted. However, three core limitations remain in their application to ROI analysis: (1) cross-magnification domain shift, as fixed-scale pretraining hinders adaptation to diverse clinical settings; (2) inadequate local-global relationship modeling, wherein the ViT backbone of FMs suffers from high computational overhead and imprecise local characterization; (3) insufficient fine-grained sensitivity, as traditional self-attention mechanisms tend to overlook subtle diagnostic cues. To address these challenges, we propose SSMamba, a hybrid SSL framework that enables effective fine-grained feature learning without relying on large external datasets. This framework incorporates three domain-adaptive components: Mamba Masked Image Modeling (MAMIM) for mitigating domain shift, a Directional Multi-scale (DMS) module for balanced local-global modeling, and a Local Perception Residual (LPR) module for enhanced fine-grained sensitivity. Employing a two-stage pipeline, SSL pretraining on target ROI datasets followed by supervised fine-tuning (SFT), SSMamba outperforms 11 state-of-the-art (SOTA) pathological FMs on 10 public ROI datasets and surpasses 8 SOTA methods on 6 public WSI datasets. These results validate the superiority of task-specific architectural designs for pathological image analysis.

2604.05834 2026-05-08 cs.LG

Hidden in the Multiplicative Interaction: Uncovering Fragility in Multimodal Contrastive Learning

Tillmann Rheude, Stefan Hegselmann, Roland Eils, Benjamin Wild

2604.05438 2026-05-08 cs.LG cs.CL

Residual-Mass Accounting for Partial-KV Decoding

Yasuto Hoshi, Daisuke Miyashita, Jun Deguchi

2604.04552 2026-05-08 cs.CV cs.AI

StableTTA: Improving Vision Model Performance by Training-free Test-Time Adaptation Methods

Zheng Li, Jerry Cheng, Huanying Helen Gu

Comments 27 pages, 10 figures, 9 tables

2603.29552 2026-05-08 cs.CL cs.AI cs.LG

Bringing Up a Bilingual BabyLM: Investigating Multilingual Language Acquisition Using Small-Scale Models

Linda Zeng, Steven Y. Feng, Michael C. Frank

Comments Code and data at https://github.com/lindazeng979/bilingual-babyLM

2603.28964 2026-05-08 cs.LG cs.AI

Spectral Edge Dynamics: An Analytical-Empirical Study of Phase Transitions in Neural Network Training

Yongzhong Xu

Comments 63 pages, 5 figures

2603.28129 2026-05-08 cs.RO

A Position Statement on Endovascular Models and Effectiveness Metrics for Mechanical Thrombectomy Navigation, on behalf of the Stakeholder Taskforce for AI-assisted Robotic Thrombectomy (START)

Harry Robertshaw, Anna Barnes, Phil Blakelock, Raphael Blanc, Robert Crossley, Rebecca Fahrig, Ameer E. Hassan, Benjamin Jackson, Lennart Karstensen, Neelam Kaur, Markus Kowarschik, Jeremy Lynch, Franziska Mathis-Ullrich, Dwight Meglan, Vitor Mendes Pereira, Mouloud Ourak, Matteo Pantano, S. M. Hadi Sadati, Alice Taylor-Gee, Tom Vercauteren, Phil White, Alejandro Granados, Thomas C. Booth

Comments Published in Journal of the American Heart Association

Details

DOI: 10.1161/JAHA.125.044931
Journal ref: J Am Heart Assoc. 2026;15:e044931

Abstract

While we are making progress in overcoming infectious diseases and cancer, one of the major medical challenges of the mid-21st century will be the rising prevalence of stroke. Large vessel occlusions are especially debilitating, yet effective treatment (needed within hours to achieve best outcomes) remains limited due to geography. One solution for improving timely access to mechanical thrombectomy in geographically diverse populations is the deployment of robotic surgical systems. Artificial intelligence (AI) assistance may enable the upskilling of operators in this emerging therapeutic delivery approach. Our aim was to establish consensus frameworks for developing and validating AI-assisted robots for thrombectomy. Objectives included standardizing effectiveness metrics and defining reference testbeds across in silico, in vitro, ex vivo, and in vivo environments. To achieve this, we convened experts in neurointervention, robotics, data science, health economics, policy, statistics, and patient advocacy. Consensus was built through an incubator day, a Delphi process, and a final Position Statement. We identified that the four essential testbed environments each had distinct validation roles. Realism requirements vary: simpler testbeds should include realistic vessel anatomy compatible with guidewire and catheter use, while standard testbeds should incorporate deformable vessels. More advanced testbeds should include blood flow, pulsatility, and disease features. There are two macro-classes of effectiveness metrics: one for in silico, in vitro, and ex vivo stages focusing on technical navigation, and another for in vivo stages, focused on clinical outcomes. Patient safety is central to this technology's development. One requisite patient safety task needed now is to correlate in vitro measurements to in vivo complications.

2603.27389 2026-05-08 cs.LG cs.AI stat.ML

Prediction-Based Markov Violation Scores for Detecting Non-Markovian Observations in Reinforcement Learning

Naveen Mysore

Comments Accepted at RLC 2026, to appear in Reinforcement Learning Journal

2603.20991 2026-05-08 cs.LG cs.AI cs.CL cs.LO

Structural Sensitivity in Compressed Transformers: Relative Error Propagation and Layer Removal

Abhinaba Basu, Kumkum Basu, Koushik Deb

2603.20180 2026-05-08 cs.CV cs.AI cs.CL

Adaptive Greedy Frame Selection for Long Video Understanding

Yuning Huang, Xiaoyu Ji, Joseph Huang, Yichi Zhang, Fengqing Zhu

2603.20079 2026-05-08 cs.CL

Predicting States of Understanding in Explanatory Interactions Using Cognitive Load-Related Linguistic Cues

Yu Wang, Olcay Türk, Angela Grimminger, Hendrik Buschmeier

2603.19715 2026-05-08 cs.AI

Neuro-Symbolic Proof Generation for Scaling Systems Software Verification

Baoding He, Zenan Li, Wei Sun, Yuan Yao, Taolue Chen, Xiaoxing Ma, Zhendong Su

Comments Published as a conference paper at OSDI 2026; code is available at https://github.com/SoaringE/seL4-proof-search

2603.18257 2026-05-08 cs.LG cs.AI

Discovering What You Can Control: Interventional Boundary Discovery for Reinforcement Learning

Jiaxin Liu, Anzhe Cheng, Paul Bogdan

2603.17980 2026-05-08 cs.CV

Feeling the Space: Egomotion-Aware Video Representation for Efficient and Accurate 3D Scene Understanding

Shuyao Shi, Kang G. Shin

Comments 22 pages, 10 figures