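arXivDaily: a daily academic digest synchronized with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and related fields.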

2604.22335 2026-04-27 cs.CL

Context-Fidelity Boosting: Enhancing Faithful Generation through Watermark-Inspired Decoding

Weixu Zhang, Fanghua Ye, Qiang Gao, Jian Li, Haolun Wu, Yuxing Tian, Sijing Duan, Nan Du, Xiaolong Li, Xue Liu

Comments: Accepted at ACL 2026

Abstract

Large language models (LLMs) often produce content that contradicts or overlooks information provided in the input context, a phenomenon known as faithfulness hallucination. In this paper, we propose Context-Fidelity Boosting (CFB), a lightweight and general decoding-time framework that reduces such hallucinations by increasing the generation probability of source-supported tokens. Motivated by logit-shaping principles from watermarking techniques, CFB applies additive token-level logit adjustments based on a token's degree of support from the input context. Specifically, we develop three boosting strategies: static boosting, which applies a fixed bias to source-supported tokens; context-aware boosting, which scales this bias using the divergence between next-token distributions with and without context; and token-aware boosting, which further redistributes the adaptive bias according to local relevance estimated from source-position attention and source-scoped semantic similarity. CFB requires no retraining or architectural changes, making it compatible with a wide range of LLMs. Experiments on summarization and question answering tasks across multiple open-source LLMs show that CFB consistently improves faithfulness metrics with minimal generation overhead. Our implementation is fully open-sourced.
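
As a rough illustration of the static and context-aware strategies named in the abstract, the following Python sketch adds a fixed bias to the logits of tokens that occur in the input context, then optionally scales that bias by a divergence between the with-context and without-context next-token distributions. The function names `static_boost` and `js_divergence`, the parameter `delta`, the choice of Jensen-Shannon divergence, and the rule that a token counts as "source-supported" when its ID appears in the context are all assumptions made for this sketch; the abstract does not specify them.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def static_boost(logits, source_token_ids, delta=2.0):
    """Add a fixed bias `delta` to every token ID that occurs in the context.

    Assumption: "source-supported" means "appears in the input context".
    """
    supported = set(source_token_ids)
    return [x + delta if i in supported else x for i, x in enumerate(logits)]

def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two distributions."""
    def kl(a, b):
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    mix = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, mix) + 0.5 * kl(q, mix)

# Toy vocabulary of four token IDs; token 2 occurs in the context.
logits_with_context = [1.0, 1.0, 0.5, 0.0]
logits_without_context = [1.0, 1.0, 0.0, 0.5]
context_ids = [2]

before = softmax(logits_with_context)
after_static = softmax(static_boost(logits_with_context, context_ids, delta=2.0))

# Context-aware variant (assumed form): scale the bias by how much the
# context changes the next-token distribution.
divergence = js_divergence(before, softmax(logits_without_context))
after_adaptive = softmax(
    static_boost(logits_with_context, context_ids, delta=2.0 * divergence)
)
```

In this toy setup the probability of token 2 rises after boosting, and the adaptive bias shrinks toward zero when the context barely changes the model's prediction.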

2604.22333 2026-04-27 cs.CV cs.AI

ChangeQuery: Advancing Remote Sensing Change Analysis for Natural and Human-Induced Disasters from Visual Detection to Semantic Understanding

Dongwei Sun, Jing Yao, Kan Wei, Xiangyong Cao, Chen Wu, Zhenghui Zhao, Pedram Ghamisi, Jun Zhou, Jón Atli Benediktsson

Abstract

Rapid situational awareness is critical in post-disaster response. While remote sensing damage assessment is evolving from pixel-level change detection to high-level semantic analysis, existing vision-language methodologies still struggle to provide actionable intelligence for complex strategic queries. They remain severely constrained by unimodal optical dependence, a prevailing bias towards natural disasters, and a fundamental lack of grounded interactivity. To address these limitations, we present ChangeQuery, a unified multimodal framework designed for comprehensive, all-weather disaster situation awareness. To overcome modality constraints and scenario biases, we construct the Disaster-Induced Change Query (DICQ) dataset, a large-scale benchmark coupling pre-event optical semantics with post-event SAR structural features across a balanced distribution of natural catastrophes and armed conflicts. Furthermore, to provide the high-quality supervision required for interactive reasoning, we propose a novel Automated Semantic Annotation Pipeline. Adhering to a "statistics-first, generation-later" paradigm, this engine automatically transforms raw segmentation masks into grounded, hierarchical instruction sets, effectively equipping the model with fine-grained spatial and quantitative awareness. Trained on this structured data, the ChangeQuery architecture operates as an interactive disaster analyst. It supports multi-task reasoning driven by diverse user queries, delivering precise damage quantification, region-specific descriptions, and holistic post-disaster summaries. Extensive experiments demonstrate that ChangeQuery establishes a new state of the art, providing a robust and interpretable solution for complex disaster monitoring. The code is available at https://sundongwei.github.io/changequery/.

2604.22331 2026-04-27 cs.CV

Depth-Aware Rover: A Study of Edge AI and Monocular Vision for Real-World Implementation

Lomash Relia, Jai G Singla, Amitabh, Nitant Dube

Comments: Accepted by IEEE

2604.22328 2026-04-27 cs.LG cs.AI cs.CE

FETS Benchmark: Foundation Models Outperform Dataset-specific Machine Learning in Energy Time Series Forecasting

Marco Obermeier, Marco Pruckner, Florian Haselbeck, Andreas Zeiselmair

Abstract

Driven by the transition towards a climate-neutral energy system, accurate energy time series forecasting is critical for planning and operation. Yet, it remains largely a dataset-specific task, requiring comprehensive training data, limiting scalability, and resulting in high model development and maintenance effort. Recently, foundation models that aim to learn generalizable patterns via extensive pretraining have shown superior performance in multiple prediction tasks. Despite their success and strong potential to address challenges in energy forecasting, their application in this domain remains largely unexplored. We address this gap by presenting the Foundation Models in Energy Time Series Forecasting (FETS) benchmark. We (1) provide a structured overview of energy forecasting use cases along three main dimensions: stakeholders, attributes, and data categories; (2) collect and analyze 54 datasets across 9 data categories, guided by typical stakeholder interests; (3) benchmark foundation models against classical machine learning approaches across different forecasting settings. Foundation models consistently outperform dataset-specific optimized machine learning approaches across all settings and data categories, despite the latter having seen the full historic target data during training. In particular, covariate-informed foundation models achieve the strongest performance. Further analysis reveals a strong correlation between predictive performance and spectral entropy, performance saturation beyond a certain context length, and improved performance at higher aggregation levels such as national load, district heating, and power grid data. Overall, our findings highlight the strong potential of foundation models as scalable and generalizable forecasting solutions for the energy domain, particularly in data-constrained and privacy-sensitive settings.
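
The abstract reports a correlation between predictive performance and spectral entropy. For readers unfamiliar with the quantity, here is a minimal Python sketch of one common definition: the Shannon entropy of the normalized power spectrum, scaled to lie between 0 and 1. The function name `spectral_entropy`, the exclusion of the DC component, and the normalization are assumptions of this sketch; the benchmark may compute the measure differently.

```python
import cmath
import math
import random

def spectral_entropy(series):
    """Normalized Shannon entropy of the power spectrum, in [0, 1]."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    # Naive O(n^2) DFT over the positive frequencies 1..n//2 (DC excluded).
    power = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(centered)
        )
        power.append(abs(coeff) ** 2)
    total = sum(power)
    if total == 0 or len(power) < 2:
        return 0.0
    probs = [p / total for p in power]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(len(probs))

n = 64
# A pure sine concentrates its power in one frequency bin.
sine = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
# White noise spreads its power across all bins.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]

sine_entropy = spectral_entropy(sine)
noise_entropy = spectral_entropy(noise)
```

Under this definition a strongly periodic series scores near 0 and a noise-like series scores much closer to 1, which is why the measure is often read as a proxy for how forecastable a series is.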

2604.22325 2026-04-27 cs.CL

Dynamically Acquiring Text Content to Enable the Classification of Lesser-known Entities for Real-world Tasks

Fahmida Alam, Ellen Riloff

2604.22324 2026-04-27 cs.LG

A Brain-Inspired Deep Separation Network for Single Channel Raman Spectra Unmixing

Gaoruishu Long, Jinchao Liu, Bo Liu, Jie Liu, Xiaolin Hu

Comments: Accepted by the 2026 International Joint Conference on Neural Networks (IJCNN 2026). 8 pages, 5 figures

2604.22313 2026-04-27 cs.CL

CLARITY: A Framework and Benchmark for Conversational Language Ambiguity and Unanswerability in Interactive NL2SQL Systems

Tabinda Sarwar, Farhad Moghimifar, Cong Duy Vu Hoang, Xiaoxiao Ma, Shawn Chang Xu, Fahimeh Saleh, Poorya Zaremoodi, Avirup Sil, Katrin Kirchhoff

Comments: Accepted at ACL 2026 (Industry Track)

2604.22310 2026-04-27 cs.CV

Revisiting Geometric Obfuscation with Dual Convergent Lines for Privacy-Preserving Image Queries in Visual Localization

Jeonggon Kim, Heejoon Moon, Je Hyeong Hong

Comments: Accepted at CVPR 2026 (oral). Supplementary material included after references. 18 pages, 11 figures, 8 tables

2604.22302 2026-04-27 cs.CV

Knowledge Visualization: A Benchmark and Method for Knowledge-Intensive Text-to-Image Generation

Ran Zhao, Sheng Jin, Size Wu, Kang Liao, Zerui Gong, Zujin Guo, Yang Xiao, Wei Li

2604.22296 2026-04-27 cs.CV

Evaluation of image simulation open source solutions for simulation of synthetic images in lunar environment

Jai G Singla, Hinal B Patel, Nitant Dube

2604.22294 2026-04-27 cs.CL cs.AI

Contexts are Never Long Enough: Structured Reasoning for Scalable Question Answering over Long Document Sets

Harshit Joshi, Priyank Shethia, Jadelynn Dao, Monica S. Lam

Comments: 49 pages (14 main), preprint

2604.22292 2026-04-27 cs.CL cs.AI

ReLeVAnT: Relevance Lexical Vectors for Accurate Legal Text Classification

Ishaan Gakhar, Harsh Nandwani

Comments: 9 pages, 2 figures

2604.22290 2026-04-27 cs.SD cs.MM eess.AS

Transformer-Based Rhythm Quantization of Performance MIDI Using Beat Annotations

Maximilian Wachter, Sebastian Murgul, Michael Heizmann

Comments: Accepted to the 5th International Conference on SMART MULTIMEDIA (ICSM), 2025

Abstract

Rhythm transcription is a key subtask of notation-level Automatic Music Transcription (AMT). While deep learning models have been extensively used for detecting the metrical grid in audio and MIDI performances, beat-based rhythm quantization remains largely unexplored. In this work, we introduce a novel deep learning approach for quantizing MIDI performances using a priori beat information. Our method leverages the transformer architecture to effectively process synchronized score and performance data for training a quantization model. Key components of our approach include dataset preparation, a beat-based pre-quantization method to align performance and score times within a unified framework, and a MIDI tokenizer tailored for this task. We adapt a transformer model based on the T5 architecture to meet the specific requirements of rhythm quantization. The model is evaluated using a set of score-level metrics designed for objective assessment of quantization performance. Through systematic evaluation, we optimize both data representation and model architecture. Additionally, we apply performance and score augmentations, such as transposition, note deletion, and performance-side time jitter, to enhance the model's robustness. Finally, a qualitative analysis compares our model's quantization performance against state-of-the-art probabilistic and deep-learning models on various example pieces. Our model achieves an onset F1-score of 97.3% and a note value accuracy of 83.3% on the ASAP dataset. It generalizes well across time signatures, including those not seen during training, and produces readable score output. Fine-tuning on instrument-specific datasets further improves performance by capturing characteristic rhythmic and melodic patterns. This work contributes a robust and flexible framework for beat-based MIDI quantization using transformer models.
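
To make the idea of beat-based pre-quantization more concrete, the sketch below maps performance times in seconds onto a beat axis by interpolating linearly between annotated beat times, so that performance onsets and score positions share one unit. The function name `seconds_to_beats`, the linear interpolation, and the extrapolation beyond the first and last beat are assumptions of this sketch; the abstract does not describe the method at this level of detail.

```python
from bisect import bisect_right

def seconds_to_beats(t, beat_times):
    """Convert a time in seconds to a fractional beat index.

    `beat_times` is a strictly increasing list in which beat_times[i] is
    the annotated time of beat i. Times outside the annotated range are
    extrapolated from the nearest beat interval.
    """
    if len(beat_times) < 2:
        raise ValueError("need at least two beat annotations")
    # Index of the beat interval containing t, clamped to a valid interval.
    i = bisect_right(beat_times, t) - 1
    i = max(0, min(i, len(beat_times) - 2))
    t0, t1 = beat_times[i], beat_times[i + 1]
    return i + (t - t0) / (t1 - t0)

# Beat annotations for a performance with a slightly unsteady tempo.
beat_times = [0.0, 0.5, 1.1, 1.6]
# Note onsets of the performance, in seconds.
onsets = [0.25, 0.8, 1.6]
onsets_in_beats = [seconds_to_beats(t, beat_times) for t in onsets]
```

Once onsets are expressed in beats, tempo fluctuations are factored out, and what remains for a model to learn is the mapping from slightly imprecise beat positions to notated rhythmic values.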

2604.22283 2026-04-27 cs.RO

A Kinematic Analysis of Palm Degrees of Freedom for Enhancing Thumb Opposability in Robotic Hands

HyoJae Kang, Yeong Jae Park, Hyunmok Jung, Joonho Lee, Dong Il Park

Comments: This manuscript has been submitted for possible publication

2604.22281 2026-04-27 cs.CV

DocPrune: Efficient Document Question Answering via Background, Question, and Comprehension-aware Token Pruning

Joonmyung Choi, Sanghyeok Lee, Jongha Kim, Sehyung Kim, Dohwan Ko, Jihyung Kil, Hyunwoo J. Kim

Comments: CVPR 2026

2604.22273 2026-04-27 cs.AI

When Does LLM Self-Correction Help? A Control-Theoretic Markov Diagnostic and Verify-First Intervention

Aofan Liu, Jingxiang Meng

2604.22266 2026-04-27 cs.CL

Large Language Models Decide Early and Explain Later

Ayan Datta, Zhixue Zhao, Bhuvanesh Verma, Radhika Mamidi, Mounika Marreddy, Alexander Mehler

2604.22261 2026-04-27 cs.CL

Bridging the Long-Tail Gap: Robust Retrieval-Augmented Relation Completion via Multi-Stage Paraphrase Infusion

Fahmida Alam, Mihai Surdeanu, Ellen Riloff

2604.22260 2026-04-27 cs.CV cs.AI

Towards Safe Mobility: A Unified Transportation Foundation Model enabled by Open-Ended Vision-Language Dataset

Wenhui Huang, Songyan Zhang, Collister Chua, Yang Liang, Zhiqi Mao, Heng Yang, Chen Lv

Abstract

Urban transportation systems face growing safety challenges that require scalable intelligence for emerging smart mobility infrastructures. While recent advances in foundation models and large-scale multimodal datasets have strengthened perception and reasoning in intelligent transportation systems (ITS), existing research remains largely centered on microscopic autonomous driving (AD), with limited attention to city-scale traffic analysis. In particular, open-ended safety-oriented visual question answering (VQA) and corresponding foundation models for reasoning over heterogeneous roadside camera observations remain underexplored. To address this gap, we introduce the Land Transportation Dataset (LTD), a large-scale open-source vision-language dataset for open-ended reasoning in urban traffic environments. LTD contains 11.6K high-quality VQA pairs collected from heterogeneous roadside cameras, spanning diverse road geometries, traffic participants, illumination conditions, and adverse weather. The dataset integrates three complementary tasks: fine-grained multi-object grounding, multi-image camera selection, and multi-image risk analysis, requiring joint reasoning over minimally correlated views to infer hazardous objects, contributing factors, and risky road directions. To ensure annotation fidelity, we combine multi-model vision-language generation with cross-validation and human-in-the-loop refinement. Building upon LTD, we further propose UniVLT, a transportation foundation model trained via curriculum-based knowledge transfer to unify microscopic AD reasoning and macroscopic traffic analysis within a single architecture. Extensive experiments on LTD and multiple AD benchmarks demonstrate that UniVLT achieves SOTA performance on open-ended reasoning tasks across diverse domains, while exposing limitations of existing foundation models in complex multi-view traffic scenarios.

2604.22258 2026-04-27 cs.LG cs.AI

Protect the Brain When Treating the Heart: A Convolutional Neural Network for Detecting Emboli

Andrea Angino, Ken Trotti, Diego Ulisse Pizzagalli, Rolf Krause, Tiziano Torre, Stefanos Demertzis

Comments: Corresponding authors: Andrea Angino and Diego Ulisse Pizzagalli

2604.22254 2026-04-27 cs.LG cs.MA

Fast Neural-Network Approximation of Active Target Search Under Uncertainty

Bilal Yousuf, Zsofia Lendek, Lucian Busoniu

2604.22244 2026-04-27 cs.RO

Learning Control Policies to Provably Satisfy Hard Affine Constraints for Black-Box Hybrid Dynamical Systems

Aayushi Shrivastava, Kartik Nagpal, Sairam Jinkala, Jean-Baptiste Bouvier, Negar Mehr

2604.22240 2026-04-27 cs.CV

OccDirector: Language-Guided Behavior and Interaction Generation in 4D Occupancy Space

Zhuding Liang, Tianyi Yan, Dubing Chen, Jiasen Zheng, Huan Zheng, Cheng-zhong Xu, Yida Wang, Kun Zhan, Jianbing Shen

2604.22239 2026-04-27 cs.CL cs.AI

Navigating Large-Scale Document Collections: MuDABench for Multi-Document Analytical QA

Zhanli Li, Yixuan Cao, Lvzhou Luo, Ping Luo

Comments: Findings of ACL 2026. The camera-ready version corrects some labeling errors. The accompanying repository is continuously updated based on community feedback; for the most up-to-date implementation and results, please refer to the repository

2604.22237 2026-04-27 cs.CL cs.AI

Tell Me Why: Designing an Explainable LLM-based Dialogue System for Student Problem Behavior Diagnosis

Zhilin Fan, Deliang Wang, Penghe Chen, Yu Lu

Comments: This paper has been accepted at AIED 2026

2604.22235 2026-04-27 cs.RO cs.AI cs.LG

Learning-augmented robotic automation for real-world manufacturing

Yunho Kim, Quan Nguyen, Taewhan Kim, Youngjin Heo, Joonho Lee

2604.22229 2026-04-27 cs.LG cs.AI

Preserve Support, Not Correspondence: Dynamic Routing for Offline Reinforcement Learning

Zhancun Mu, Guangyu Zhao, Yiwu Zhong, Chi Zhang

Comments: 17 pages, 4 figures

2604.22226 2026-04-27 cs.CV

Towards Temporal Compositional Reasoning in Long-Form Sports Videos

Siyu Cao, Lu Zhang, Ruizhe Zeng, Zhi-yong Liu

2604.22225 2026-04-27 cs.CL eess.AS

TTS-PRISM: A Perceptual Reasoning and Interpretable Speech Model for Fine-Grained Diagnosis

Xi Wang, Jie Wang, Xingchen Song, Baijun Song, Jingran Xie, Jiahe Shao, Zijian Lin, Di Wu, Meng Meng, Jian Luan, Zhiyong Wu

Comments: Submitted to Interspeech 2026

2604.22220 2026-04-27 cs.CV

Breaking Watermarks in the Frequency Domain: A Modulated Diffusion Attack Framework

Chunpeng Wang, Binyan Qu, Xiaoyu Wang, Zhiqiu Xia, Shanshan Zhang, Yunan Liu, Qi Li