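arXivDaily: a daily arXiv digest that syncs the full arXiv feed and provides AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.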

2604.24620 2026-04-28 cs.CL

Looking for the Bottleneck in Fine-grained Temporal Relation Classification

Hugo Sousa, Ricardo Campos, Alípio Jorge

DOI: 10.1145/3805712.3809581

Abstract

Temporal relation classification is the task of determining the temporal relation between pairs of temporal entities in a text. Despite recent advancements in natural language processing, temporal relation classification remains a considerable challenge. Early attempts framed this task using a comprehensive set of temporal relations between events and temporal expressions. However, due to the task complexity, datasets have been progressively simplified, leading recent approaches to focus on the relations between event pairs and to use only a subset of relations. In this work, we revisit the broader goal of classifying interval relations between temporal entities by considering the full set of relations that can hold between two time intervals. The proposed approach, Interval from Point, involves first classifying the point relations between the endpoints of the temporal entities and then decoding these point relations into an interval relation. Evaluation on the TempEval-3 dataset shows that this approach can yield effective results, achieving a temporal awareness score of 70.1%, a new state of the art on this benchmark.
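
The decoding step in Interval from Point can be illustrated with Allen's interval algebra. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a classifier has already predicted one point relation (`<`, `=` or `>`) for each of the four endpoint pairs, and the names `ALLEN`, `decode` and `interval_relation` are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple

# Four endpoint relations, each "<", "=" or ">", in this (assumed) order:
# (start_A vs start_B, start_A vs end_B, end_A vs start_B, end_A vs end_B).
PointRels = Tuple[str, str, str, str]

# Hypothetical decoding table. For proper intervals (start < end) these four
# comparisons determine exactly one of Allen's 13 interval relations.
ALLEN: Dict[PointRels, str] = {
    ("<", "<", "<", "<"): "before",
    ("<", "<", "=", "<"): "meets",
    ("<", "<", ">", "<"): "overlaps",
    ("=", "<", ">", "<"): "starts",
    (">", "<", ">", "<"): "during",
    (">", "<", ">", "="): "finishes",
    ("=", "<", ">", "="): "equals",
    ("<", "<", ">", "="): "finished-by",
    ("<", "<", ">", ">"): "contains",
    ("=", "<", ">", ">"): "started-by",
    (">", "<", ">", ">"): "overlapped-by",
    (">", "=", ">", ">"): "met-by",
    (">", ">", ">", ">"): "after",
}


def point_relation(x: float, y: float) -> str:
    """Compare two time points."""
    if x < y:
        return "<"
    return "=" if x == y else ">"


def decode(point_rels: Sequence[str]) -> Optional[str]:
    """Map four (predicted) endpoint relations to an interval relation.

    Returns None for combinations that no pair of proper intervals can
    produce, e.g. start_A > end_B together with end_A < start_B.
    """
    return ALLEN.get(tuple(point_rels))  # tuple() also accepts a list


def interval_relation(a: Tuple[float, float], b: Tuple[float, float]) -> Optional[str]:
    """Gold interval relation computed from known endpoints (for checking)."""
    (a_start, a_end), (b_start, b_end) = a, b
    return decode((
        point_relation(a_start, b_start),
        point_relation(a_start, b_end),
        point_relation(a_end, b_start),
        point_relation(a_end, b_end),
    ))


print(interval_relation((1, 3), (3, 5)))  # meets
print(decode((">", ">", "<", "<")))        # None (inconsistent predictions)
```

For two proper intervals, the four endpoint comparisons map one-to-one onto the 13 Allen relations, so decoding reduces to a table lookup; combinations outside the table signal contradictory point predictions, and how the paper handles those is not stated in the abstract.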

2604.24618 2026-04-28 cs.AI

Evaluating whether AI models would sabotage AI safety research

Robert Kirk, Alexandra Souly, Kai Fronsdal, Abby D'Cruz, Xander Davies

2604.24616 2026-04-28 cs.CV

Infrastructure-Guided Connectivity-Enhanced Road Crack Detection and Estimation

Haosong Xiao, Yamini Ramesh, Rishabh Shukla, Swarat Sarkar, Chaozhe R. He

Comments Accepted for presentation at the Fourth IEEE International Conference on Mobility: Operations, Services, and Technologies (MOST), May 4–6, 2026, Detroit, Michigan

2604.24612 2026-04-28 cs.AI cs.LO math.CT math.LO

NeSyCat: A Monad-Based Categorical Semantics of the Neurosymbolic ULLER Framework

Daniel Romero Schellhorn, Till Mossakowski

Comments 42 pages. Submitted to Neurosymbolic Artificial Intelligence (IOS Press); extended from a NeSy25 conference paper

2604.24611 2026-04-28 cs.LG

Uncovering Latent Patterns in Social Media Usage and Mental Health: A Clustering-Based Approach Using Unsupervised Machine Learning

Md All Shahria, Sanjeda Dewan Mithila, Touhid Alam, Mohammad Sakib Mahmood, Mahfuza Khatun

Comments 13 pages, 5 figures, International Conference on Advancement in Healthcare Technology and Biomedical Engineering, Vancouver, BC, Canada

2604.24609 2026-04-28 cs.CL

Evaluation of Pose Estimation Systems for Sign Language Translation

Catherine O'Brien, Gerard Sant, Mathias Müller, Sarah Ebling

Comments Accepted at LREC 2026 Workshop on the Representation and Processing of Sign Languages. O'Brien and Sant contributed equally to this paper. 16 pages, 6 figures

2604.24606 2026-04-28 cs.RO cs.SY eess.SY

Hybrid A*-Based Reverse Path-Planning of a Vehicle with Trailer System

Xincheng Cao, Haochong Chen, Bilin Aksun-Guvenc, Levent Guvenc, Brian Link, Peter J Richmond, Dokyung Yim, Shihong Fan, John Harber

2604.24602 2026-04-28 cs.CV

Majorization-Guided Test-Time Adaptation for Vision-Language Models under Modality-Specific Shift

Lixian Chen, Mingxuan Huang, Yanhui Chen, Junyi Lin, Yang Shi

2604.24590 2026-04-28 cs.LG cs.CE

Fraud Detection in Cryptocurrency Markets with Spatio-Temporal Graph Neural Networks

Lidia Losavio, Luca Persia, Madan Sathe, Dimosthenis Pasadakis

Comments 9 pages, 3 figures, Accepted at the SDS2026: IEEE Swiss Conference on Data Science and AI

2604.24589 2026-04-28 cs.AI astro-ph.GA astro-ph.IM

A systematic evaluation of vision-language models for observational astronomical reasoning tasks

Wenke Ren, Hengxiao Guo, Wenwen Zuo, Xiaoman Zhang

Comments 24 pages, 5 figures

Abstract

Vision-language models (VLMs) are increasingly proposed as general-purpose tools for scientific data interpretation, yet their reliability on real astronomical observations across diverse modalities remains untested. We present AstroVLBench, a comprehensive benchmark comprising over 4,100 expert-verified instances across five tasks spanning optical imaging, radio interferometry, multi-wavelength photometry, time-domain light curves, and optical spectroscopy. Evaluating six frontier models, we find that performance is strongly modality-dependent: while one model (Gemini 3 Pro) emerges as the most consistently capable across tasks, task-specific strengths vary, and all models substantially underperform domain-specialized methods. Mechanistic ablations reveal that performance depends not only on directing attention to salient visual features but also on grounding those features in physical knowledge. Phenomenological prompts describing what to look for improve accuracy by sharpening model focus, but physical prompts explaining why those features matter perform better overall and yield more balanced classifications with reduced class-specific bias. Consistent with this picture, presenting the underlying one-dimensional measurements directly as numerical tables instead of rendered plots yields an improvement of up to 13 percentage points. Reasoning quality analysis further demonstrates that, without explicit physical grounding, models may reach correct predictions from phenomenologically plausible cues while providing physically imprecise justifications, establishing that accuracy alone is insufficient for trustworthy scientific deployment. These findings provide the first systematic, multi-modal baselines for VLMs in observational astronomy and identify the specific representation, grounding, and reasoning bottlenecks where current models fail.

2604.24586 2026-04-28 cs.CV

Point-MF: One-step Point Cloud Generation from a Single Image via Mean Flows

Yuta Baba, Keiji Yanai

Comments 28 pages, 14 figures

2604.24575 2026-04-28 cs.CV

Diffusion Model as a Generalist Segmentation Learner

Haoxiao Wang, Antao Xiang, Haiyang Sun, Peilin Sun, Changhao Pan, Yifu Chen, Minjie Hong, Weijie Wang, Shuang Chen, Yue Chen, Zhou Zhao

2604.24572 2026-04-28 cs.AI cs.MA

FastOMOP: A Foundational Architecture for Reliable Agentic Real-World Evidence Generation on OMOP CDM data

Niko Moeller-Grell, Shihao Shenzhang, Zhangshu Joshua Jiang, Richard JB Dobson, Vishnu V Chandrabalan

Abstract

The Observational Medical Outcomes Partnership Common Data Model (OMOP CDM), maintained by the Observational Health Data Sciences and Informatics (OHDSI) collaboration, has enabled the harmonisation of electronic health record data for nearly one billion patients in 83 countries. Yet generating real-world evidence (RWE) from these repositories remains a manual process requiring clinical, epidemiological and technical expertise. LLMs and multi-agent systems have shown promise for clinical tasks, but RWE automation exposes a fundamental challenge: agentic systems introduce emergent behaviours, coordination failures and safety risks that existing approaches fail to govern. No infrastructure exists to ensure that agentic RWE generation is flexible, safe and auditable across the lifecycle. We introduce FastOMOP, an open-source multi-agent architecture that addresses this gap by separating three infrastructure layers (governance, observability and orchestration) from pluggable agent teams. Governance is enforced at the process boundary through deterministic validation independent of agent reasoning, ensuring that no compromised or hallucinating agent can bypass safety controls. Agent teams for phenotyping, study design and statistical analysis inherit these guarantees through controlled tool exposure. We validated FastOMOP using a natural-language-to-SQL agent team across three OMOP CDM datasets: synthetic data from Synthea, MIMIC-IV, and a real-world NHS dataset from Lancashire Teaching Hospitals (IDRIL). FastOMOP achieved reliability scores of 0.84–0.94 with perfect adversarial and out-of-scope block rates, demonstrating that process-boundary governance delivers safety guarantees independent of model choice. These results indicate that the reliability gap in RWE deployment is architectural rather than a matter of model capability, and establish FastOMOP as a governed architecture for progressive RWE automation.
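
As a rough illustration of deterministic validation enforced at a process boundary, the sketch below uses the authorizer hook of Python's standard `sqlite3` module as a stand-in: SQLite consults the callback for every table read and every other action while compiling a statement, so the verdict does not depend on how the agent reasoned its way to the query. The allowlist, the function names (`read_only_authorizer`, `install_guard`, `run_agent_sql`) and the choice of SQLite are assumptions for illustration only; the abstract does not describe FastOMOP's actual validation rules or database backend.

```python
import sqlite3
from typing import List, Optional

# Hypothetical allowlist: OMOP CDM table names the agent team may read.
ALLOWED_TABLES = frozenset({
    "person", "observation_period", "visit_occurrence",
    "condition_occurrence", "drug_exposure", "measurement", "concept",
})
# Non-read actions a plain SELECT needs; every other action code
# (INSERT, DELETE, DROP_TABLE, ATTACH, PRAGMA, TRANSACTION, ...) is denied.
SELECT_ACTIONS = frozenset({sqlite3.SQLITE_SELECT, sqlite3.SQLITE_FUNCTION})


def read_only_authorizer(action: int, arg1: Optional[str], arg2: Optional[str],
                         db_name: Optional[str], source: Optional[str]) -> int:
    """Invoked by SQLite for each action while a statement is being compiled."""
    if action == sqlite3.SQLITE_READ:
        # For SQLITE_READ, arg1 is the table name and arg2 the column name.
        return sqlite3.SQLITE_OK if arg1 in ALLOWED_TABLES else sqlite3.SQLITE_DENY
    return sqlite3.SQLITE_OK if action in SELECT_ACTIONS else sqlite3.SQLITE_DENY


def install_guard(conn: sqlite3.Connection) -> sqlite3.Connection:
    """Attach the guard to the connection handed to the agent team."""
    conn.set_authorizer(read_only_authorizer)
    return conn


def run_agent_sql(conn: sqlite3.Connection, sql: str) -> List[tuple]:
    """Execute agent-generated SQL; denied (or invalid) queries fail at compile time."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.DatabaseError as exc:
        raise PermissionError(f"query rejected: {exc}") from exc
```

Denying by default (anything other than SELECT, function calls and reads of allow-listed tables) keeps the guard small and auditable; a fuller version would also restrict which functions may be called and log every verdict.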

2604.24562 2026-04-28 cs.AI cs.CL cs.CY

Towards Lawful Autonomous Driving: Deriving Scenario-Aware Driving Requirements from Traffic Laws and Regulations

Bowen Jian, Rongjie Yu, Hong Wang, Liqiang Wang, Zihang Zou

2604.24559 2026-04-28 cs.CL cs.AI

Aligned Multi-View Scripts for Universal Chart-to-Code Generation

Zhihan Zhang, Lizi Liao

Comments Accepted to ACL 2026 Main Conference

2604.24558 2026-04-28 cs.AI cs.LG

Hierarchical Behaviour Spaces

Michael Tryfan Matthews, Anssi Kanervisto, Jakob Foerster, Pierluca D'Oro, Scott Fujimoto, Mikael Henaff

2604.24555 2026-04-28 cs.LG stat.ML

Efficient learning by implicit exploration in bandit problems with side observations

Tomas Kocak, Gergely Neu, Michal Valko, Remi Munos

Comments Published at Neural Information Processing Systems (NeurIPS) 2014

2604.24549 2026-04-28 cs.LG cs.AI

GradMAP: Gradient-Based Multi-Agent Proximal Learning for Grid-Edge Flexibility

Yihong Zhou, Hongtai Zeng, Thomas Morstyn

2604.24547 2026-04-28 cs.LG

Dialysis Risk Prediction and Treatment Effect Estimation for AKI patients using Longitudinal Electronic Health Records

Kalyani P. Pande, Evan Yang, Bryan Zhu, Sandeep K. Mallipattu, Alisa Yurovsky, Tengfei Ma

2604.24544 2026-04-28 cs.AI cs.CL

STELLAR-E: a Synthetic, Tailored, End-to-end LLM Application Rigorous Evaluator

Alessio Sordo, Lingxiao Du, Meeka-Hanna Lenisa, Evgeny Bogdanov, Maxim Romanovsky

2604.24543 2026-04-28 cs.CV

RACANet: Reliability-Aware Crowd Anchor Network for RGB-T Crowd Counting

Jinghao Shi, Mengqi Lei, Kunliang He, Yun Li, Wei Bao, Siqi Li

2604.24537 2026-04-28 cs.LG stat.ML

Stochastic simultaneous optimistic optimization

Michal Valko, Alexandra Carpentier, Rémi Munos

Comments Published in International Conference on Machine Learning (ICML 2013)

2604.24536 2026-04-28 cs.CL

Generating Place-Based Compromises Between Two Points of View

Sumanta Bhattacharyya, Francine Chen, Scott Carter, Yan-Ying Chen, Tatiana Lau, Nayeli Suseth Bravo, Monica P. Van, Kate Sieck, Charlene C. Wu

2604.24532 2026-04-28 cs.LG

A Reward-Free Viewpoint on Multi-Objective Reinforcement Learning

Ying-Tu Chen, Wei Hung, Bing-Shu Wu, Zhang-Wei Hong, Ping-Chun Hsieh

Comments ICLR 2026

2604.24527 2026-04-28 cs.AI

Interoceptive machine framework: Toward interoception-inspired regulatory architectures in artificial intelligence

Diego Candia-Rivera

2604.24524 2026-04-28 cs.CV

Point Cloud Registration for Fusion between SPECT MPI and CTA Images

Ni Yao, Xiangyu Liu, Shaojie Tang, Danyang Sun, Chuang Han, Yanting Li, Jiaofen Nan, Chengyang Li, Fubao Zhu, Chen Zhao, Zhihui Xu, Weihua Zhou

Abstract

Clinical fusion of Single Photon Emission Computed Tomography Myocardial Perfusion Imaging (SPECT MPI) and Computed Tomography Angiography (CTA) remains limited by cross-modality misregistration and reliance on manual landmarks, which can hinder accurate ischemia localization and lesion-level functional assessment. To address this issue, we propose a registration and fusion framework for SPECT MPI and CTA that integrates functional and structural information for comprehensive cardiac evaluation. The proposed pipeline performs U-Net-based segmentation on both modalities. On SPECT MPI, only the left ventricle (LV) is extracted, and anatomical landmarks are automatically derived from characteristic LV structures. On CTA, both ventricles are segmented, and their spatial relationship is used to automatically define landmarks at the interventricular septal junction. Scale-space consistency preprocessing and landmark-driven coarse registration are applied to mitigate initial misalignment. Based on this initialization, multiple fine registration methods are evaluated on LV epicardial surface point clouds, including ICP, SICP, CPD, CluReg, FFD, and BCPD++. The resulting transformations are then propagated to voxel-level resampling for high-precision SPECT-CTA fusion. In a retrospective cohort of 60 patients, the proposed framework preserved sub-millimeter coronary detail from CTA while accurately overlaying quantitative SPECT perfusion. Among the evaluated methods, BCPD++ achieved the highest accuracy with a mean point cloud distance of 1.7 mm. By combining robust initialization, comparative fine registration, and voxel-level fusion, the proposed approach provides a practical solution for myocardial ischemia localization and functional evaluation of coronary lesions, while remaining independent of any specific fine registration algorithm.
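
The fine-registration step can be illustrated with plain point-to-point ICP, the simplest of the methods listed. This is a simplified sketch under stated assumptions: 2D points, a rigid transform, brute-force nearest neighbours and pure Python. The paper registers 3D LV epicardial surface point clouds and reports BCPD++ as most accurate, which this sketch does not implement; all function names are hypothetical, and `mean_distance` only mirrors the spirit of the mean point cloud distance metric.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def transform(points: Sequence[Point], theta: float, tx: float, ty: float) -> List[Point]:
    """Rotate by theta (radians) about the origin, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def nearest(p: Point, cloud: Sequence[Point]) -> Point:
    """Brute-force nearest neighbour (a k-d tree would be used at scale)."""
    return min(cloud, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)


def best_rigid_2d(src: Sequence[Point], dst: Sequence[Point]) -> Tuple[float, float, float]:
    """Closed-form least-squares rotation + translation mapping src[i] onto dst[i]."""
    n = len(src)
    sx, sy = sum(x for x, _ in src) / n, sum(y for _, y in src) / n
    dx, dy = sum(x for x, _ in dst) / n, sum(y for _, y in dst) / n
    dot = cross = 0.0
    for (x1, y1), (x2, y2) in zip(src, dst):
        x1, y1, x2, y2 = x1 - sx, y1 - sy, x2 - dx, y2 - dy
        dot += x1 * x2 + y1 * y2
        cross += x1 * y2 - y1 * x2
    theta = math.atan2(cross, dot)  # angle maximising the centred correlation
    c, s = math.cos(theta), math.sin(theta)
    # Translation maps the rotated source centroid onto the target centroid.
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)


def mean_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Mean distance from each point in a to its nearest point in b."""
    return sum(math.dist(p, nearest(p, b)) for p in a) / len(a)


def icp(source: Sequence[Point], target: Sequence[Point],
        max_iter: int = 50, tol: float = 1e-12) -> Tuple[List[Point], float]:
    """Point-to-point ICP: alternate nearest-neighbour matching and rigid re-fitting."""
    current = list(source)
    prev_err = math.inf
    for _ in range(max_iter):
        matches = [nearest(p, target) for p in current]
        current = transform(current, *best_rigid_2d(current, matches))
        err = mean_distance(current, target)
        if prev_err - err < tol:  # stop once the error no longer improves
            break
        prev_err = err
    return current, mean_distance(current, target)


target = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0), (0.0, 5.0), (10.0, 10.0), (5.0, 8.0)]
source = transform(target, 0.02, 0.3, -0.2)  # slightly misaligned copy
aligned, err = icp(source, target)
print(f"mean distance after ICP: {err:.2e}")
```

The same closed-form solve in `best_rigid_2d`, applied to known landmark pairs instead of nearest-neighbour matches, is one way to obtain the kind of coarse initialization the abstract describes; ICP needs such a starting pose because nearest-neighbour matching only converges reliably when the clouds already roughly overlap.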

2604.24515 2026-04-28 cs.CL

SEARCH-R: Structured Entity-Aware Retrieval with Chain-of-Reasoning Navigator for Multi-hop Question Answering

Yuqing Fu, Yimin Deng, Wanyu Wang, Yuhao Wang, Yejing Wang, Hongshi Liu, Yiqi Wang, Xiao Han, Maolin Wang, Guoshuai Zhao, Yi Chang, Xiangyu Zhao

Comments Findings of ACL 2026

2604.24512 2026-04-28 cs.AI

Beyond the Attention Stability Boundary: Agentic Self-Synthesizing Reasoning Protocols

Dahlia Shehata, Ming Li

Abstract

As LLM agents transition into autonomous digital coworkers, maintaining deterministic goal-directedness in non-linear multi-turn conversations has emerged as an architectural bottleneck. We identify and formalize a systemic failure mode, termed the Attention Latch, in decoder-only autoregressive Transformers. This phenomenon, a behavioral manifestation of Information Over-squashing, occurs when the cumulative probabilistic weight of historical context overrides mid-task updates, causing agents to remain anchored to obsolete constraints despite explicit contradictory instructions. We propose Self-Synthesizing Reasoning Protocols (SSRP), a metacognitive framework that implements a discrete separation between high-level architectural planning (Architect) and turn-by-turn procedural execution (Executive). We evaluate SSRP across 9K trajectories using the MultiWOZ 2.2 dataset and Aggregate Pivot Accuracy (APA), a novel metric that we validate by mapping its scores to the U-shaped 'Lost in the Middle' curve. We present three experimental tiers: a shallow recency-based retrieval pilot, a high-entropy SOP, and a semantically hijacked 3-hop Multi-Fact Synthesis task. Our results empirically locate the Attention Stability Boundary, where stateless Vanilla ReAct baselines for GPT 5.4 collapse to 0.1% success while SSRP achieves a 715× Resilience Lift. We demonstrate statistically significant gains across Gemini 3.1 Pro, Claude Sonnet 4.6 and DeepSeek V3.2. Audits confirm that SSRP is necessary by proving attentional lapse via a recursive reflexion baseline (100% success), decoupling the latch from positional bias through equidistant stress testing (90% accuracy), and formalizing SSRP via the Information Bottleneck principle and granularity ablations. A Procedural Integrity audit (98.8% adherence) reveals a Grounding Paradox, where high-stability models fail by refusing to hallucinate under retrieval-reasoning contamination.

2604.24506 2026-04-28 cs.AI cs.LG

MIMIC: A Generative Multimodal Foundation Model for Biomolecules

Siavash Golkar, Jake Kovalic, Irina Espejo Morales, Samuel Sledzieski, Minhuan Li, Ksenia Sokolova, Geraud Krawezik, Alberto Bietti, Claudia Skok Gibbs, Roman Klypa, Shengwei Xiong, Francois Lanusse, Liam Parker, Kyunghyun Cho, Miles Cranmer, Tom Hehir, Michael McCabe, Lucas Meyer, Rudy Morel, Payel Mukhopadhyay, Mariel Pettee, Helen Qu, Jeff Shen, David Fouhey, Hadi Sotoudeh, Vikram Mulligan, Pilar Cossio, Sonya M. Hanson, Alisha N. Jones, Olga G. Troyanskaya, Shirley Ho

Abstract

Biological function emerges from coupled constraints across sequence, structure, regulation, evolution, and cellular context, yet most foundation models in biology are trained within one modality or for a fixed forward task. We present MIMIC, a generative multimodal foundation model trained on our newly curated and aligned dataset, LORE, linking nucleic acid, protein, evolutionary, structural, regulatory, and semantic/contextual modalities within partially observed biomolecular states. MIMIC uses a split-track encoder-decoder architecture to condition on arbitrary subsets of observed modalities and reconstruct or generate missing components of molecular state across the genome, transcriptome, and proteome. Multimodal conditioning consistently improves MIMIC's sequence reconstruction relative to sequence-only inputs, while its learned representations enable state-of-the-art performance on RNA and protein downstream tasks. MIMIC achieves state-of-the-art splicing prediction, and its joint generative formulation enables isoform-aware inference that further improves performance. Beyond prediction, the same generative framework supports constrained design. For RNA, MIMIC uses evolutionary and structural signals to identify corrective edits for a clinically relevant splice-disrupting HBB mutation without reverting the mutation itself. For proteins, jointly conditioning on shape and surface chemistry of PD-L1 and hACE2 binding sites produces diverse, high-confidence sequences with strong in silico support for target binding. Finally, MIMIC uses experimental context as semantic conditioning to model assay-dependent RNA chemical probing, rather than treating context as a fixed output. Together, these results position MIMIC's aligned multimodal generative modeling as a strong foundation for unifying representation learning, conditional prediction, and constrained biomolecular design within a single model.

2604.24498 2026-04-28 cs.CV

Self-Supervised Representation Learning via Hyperspherical Density Shaping

Esteban Rodríguez-Betancourt, Edgar Casasola-Murillo

Comments 8 pages, 8 figures, 4 tables