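arXivDaily: a daily academic digest synced with the full arXiv data, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.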

2506.14387 2026-04-22 cs.AI

SEAT: Sparse Entity-Aware Tuning for Knowledge Adaptation while Preserving Epistemic Abstention

William F. Shen, Xinchi Qiu, Nicola Cancedda, Nicholas D. Lane

Abstract

Adapting LLMs with new knowledge is increasingly important, but standard fine-tuning often erodes aligned epistemic abstention: the ability to acknowledge when the model does not know. This failure mode is especially concerning in high-stakes settings, where abstention is a critical safeguard against hallucination. We present SEAT, a preventive fine-tuning method that preserves epistemic abstention while maintaining strong knowledge acquisition. SEAT combines sparse tuning, which constrains global activation drift, with entity-perturbed KL regularization, which sharpens local epistemic boundaries and prevents spillover to neighboring knowledge. Crucially, SEAT requires no alignment data, explicit boundary probing, or post-hoc re-alignment, making it attractive for lightweight and privacy-sensitive adaptation. Across models and datasets, SEAT improves human-evaluated abstention on unknown queries by 18%-101% over the strongest baseline while retaining near-perfect target knowledge acquisition, and produces coherent, context-aware abstentions after tuning. Further analyses show that both components are essential, that SEAT more cleanly separates known from unknown queries in representation space, and that it preserves downstream utility. These results identify preservation of epistemic abstention as a core objective for safe knowledge adaptation.

2506.06211 2026-04-22 cs.CL cs.AI cs.CV

PuzzleWorld: A Benchmark for Multimodal, Open-Ended Reasoning in Puzzlehunts

Hengzhi Li, Justin Zhang, Brendon Jiang, Alexander Naehu, Regan Song, Megan Tjandrasuwita, Chanakya Ekbote, Steven-Shine Chen, Adithya Balachandran, Wei Dai, Rebecca Chang, Paul Pu Liang

2506.01687 2026-04-22 cs.CL

StochasTok: Improving Fine-Grained Subword Understanding in LLMs

Anya Sims, Thom Foster, Klara Kaleb, Tuan-Duy H. Nguyen, Joseph Lee, Jakob N. Foerster, Yee Whye Teh, Cong Lu

2505.22176 2026-04-22 cs.CL

TabXEval: Why this is a Bad Table? An eXhaustive Rubric for Table Evaluation

Vihang Pancholi, Jainit Bafna, Tejas Anvekar, Manish Shrivastava, Vivek Gupta

Comments Accepted for Findings at ACL 2025

2505.21410 2026-04-22 cs.AI cs.LG cs.RO

MRS: Multi-Resolution Skills for HRL Agents

Shashank Sharma, Janina Hoffmann, Vinay Namboodiri

2505.21242 2026-04-22 cs.CL

Evaluation of LLMs in Medical Text Summarization: The Role of Vocabulary Adaptation in High OOV Settings

Gunjan Balde, Soumyadeep Roy, Mainack Mondal, Niloy Ganguly

Comments 16 pages. Accepted for publication in the Findings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025)

2505.20816 2026-04-22 cs.CL

Rethinking Information Synthesis in Multimodal Question Answering: A Multi-Agent Perspective

Krishna Singh Rajput, Tejas Anvekar, Chitta Baral, Vivek Gupta

2505.20279 2026-04-22 cs.CV cs.CL

VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction

Zhiwen Fan, Jian Zhang, Renjie Li, Junge Zhang, Runjin Chen, Hezhen Hu, Kevin Wang, Huaizhi Qu, Shijie Zhou, Dilin Wang, Zhicheng Yan, Hongyu Xu, Justin Theiss, Tianlong Chen, Jiachen Li, Zhengzhong Tu, Zhangyang Wang, Rakesh Ranjan

Comments Project Page: https://vlm-3r.github.io/

2505.14137 2026-04-22 cs.AI

Memory Assignment for Finite-Memory Strategies in Adversarial Patrolling Games

Vojtěch Kůr, Vít Musil, Vojtěch Řehák

Comments Extended version of a paper accepted at the International Conference on Automated Planning and Scheduling (ICAPS 2026)

2505.06335 2026-04-22 cs.LG cs.AI cs.CR

Remote Rowhammer Attack using Adversarial Observations on Federated Learning Clients

Jinsheng Yuan, Yuhang Hao, Weisi Guo, Yun Wu, Chongyan Gu

Comments Under review for IEEE Transactions on Dependable and Secure Computing

2503.16683 2026-04-22 cs.CV cs.AI

GAIR: Location-Aware Self-Supervised Contrastive Pre-Training with Geo-Aligned Implicit Representations

Zeping Liu, Ni Lao, Zhangyu Wang, Junfeng Jiao, Gengchen Mai

Comments Accepted by ISPRS Journal of Photogrammetry and Remote Sensing

2503.16251 2026-04-22 cs.LG cs.CV cs.DC cs.ET

RESFL: An Uncertainty-Aware Framework for Responsible Federated Learning by Balancing Privacy, Fairness and Utility

Dawood Wasif, Terrence J. Moore, Jin-Hee Cho

Comments Accepted at ICLR 2026; camera-ready version

2503.07259 2026-04-22 cs.CV cs.AI cs.LG cs.MM

COMODO: Cross-Modal Video-to-IMU Distillation for Efficient Egocentric Human Activity Recognition

Baiyu Chen, Wilson Wongso, Zechen Li, Yonchanok Khaokaew, Hao Xue, Flora Salim

Comments IMWUT/UbiComp 2026

2503.06717 2026-04-22 cs.CV

You Point, I Learn: Online Adaptation of Interactive Segmentation Models for Handling Distribution Shifts in Medical Imaging

Wentian Xu, Ziyun Liang, Harry Anthony, Yasin Ibrahim, Felix Cohen, Guang Yang, Konstantinos Kamnitsas

Comments Accepted at ICLR 2026

2502.16161 2026-04-22 cs.CV cs.CL

OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models

Wenwen Yu, Zhibo Yang, Jianqiang Wan, Sibo Song, Jun Tang, Wenqing Cheng, Yuliang Liu, Xiang Bai

Comments Accepted by IEEE TPAMI

2502.09741 2026-04-22 cs.CL cs.LG

FoNE: Precise Single-Token Number Embeddings via Fourier Features

Tianyi Zhou, Deqing Fu, Mahdi Soltanolkotabi, Robin Jia, Vatsal Sharan

2412.01782 2026-04-22 cs.CV cs.AI

Uncertainty Quantification in Detection Transformers: Object-Level Calibration and Image-Level Reliability

Young-Jin Park, Carson Sobolewski, Navid Azizan

Abstract

DETR and its variants have emerged as promising architectures for object detection, offering an end-to-end prediction pipeline. In practice, however, DETRs generate hundreds of predictions that far outnumber the actual objects present in an image. This raises a critical question: which of these predictions can be trusted? This is particularly important for safety-critical applications, such as autonomous vehicles. Addressing this concern, we provide empirical and theoretical evidence that predictions within the same image play distinct roles, resulting in varying reliability levels. Our analysis reveals that DETRs employ an optimal specialist strategy: one prediction per object is trained to be well-calibrated, while the remaining predictions are trained to suppress their foreground confidence to near zero, even when maintaining accurate localization. We show that this strategy emerges as the loss-minimizing solution to the Hungarian matching, fundamentally shaping DETRs' outputs. While selecting the well-calibrated predictions is ideal, they are unidentifiable at inference time. This means that any post-processing algorithm poses a risk of outputting a set of predictions with mixed calibration levels. Therefore, practical deployment necessitates a joint evaluation of both the model's calibration quality and the effectiveness of the post-processing algorithm. However, we demonstrate that existing metrics like average precision and expected calibration error are inadequate for this task. To address this issue, we further introduce Object-level Calibration Error (OCE). Its object-centric design penalizes both retained suppressed predictions and missed ground-truth foreground objects, making OCE suitable for both evaluating models and identifying reliable prediction subsets. Finally, we present a post hoc uncertainty quantification framework that predicts per-image model accuracy.
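
For readers unfamiliar with the expected calibration error (ECE) named in the abstract, the sketch below shows the commonly used binned form: predictions are grouped by confidence, and the gap between mean confidence and accuracy in each bin is averaged with bin-size weights. This is a generic illustration in plain Python, not the paper's OCE metric and not the authors' code; the function name and the 10-bin default are assumptions made here.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: size-weighted mean of |accuracy - mean confidence| per bin.

    confidences: floats in [0, 1]; correct: booleans of the same length.
    Hypothetical helper for illustration only.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        return 0.0

    # Assign each prediction to an equal-width confidence bin.
    bins = [[] for _ in range(n_bins)]
    for conf, is_correct in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[index].append((conf, bool(is_correct)))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(accuracy - mean_conf)
    return ece
```

Four predictions at confidence 0.75, three of them correct, give an ECE of zero under this definition, while two predictions at 0.9 with only one correct give a gap of about 0.4.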

2411.18275 2026-04-22 cs.CV

Visual Adversarial Attack on Vision-Language Models for Autonomous Driving

Tianyuan Zhang, Lu Wang, Xinwei Zhang, Yitong Zhang, Boyi Jia, Siyuan Liang, Shengshan Hu, Qiang Fu, Aishan Liu, Xianglong Liu

Comments Accepted by Machine Intelligence Research

2411.09887 2026-04-22 cs.RO

Planning by Simulation: Motion Planning with Learning-based Parallel Scenario Prediction for Autonomous Driving

Tian Niu, Kaizhao Zhang, Zhongxue Gan, Wenchao Ding

2410.03294 2026-04-22 cs.LG

Resource-aware Mixed-precision Quantization for Enhancing Deployability of Transformers for Time-series Forecasting on Embedded FPGAs

Tianheng Ling, Chao Qian, Gregor Schiele

Comments 20 pages, 8 figures, 6 tables, accepted by the 21st EAI International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services (MobiQuitous2024)

2409.09451 2026-04-22 cs.CV cs.LG

On the Generalizability of Foundation Models for Crop Type Mapping

Yi-Chia Chang, Adam J. Stewart, Favyen Bastani, Piper Wolters, Shreya Kannan, George R. Huber, Jingtong Wang, Arindam Banerjee

Comments Accepted to IEEE IGARSS 2025. The final version is available in the Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS) 2025

2408.09030 2026-04-22 cs.CL cs.HC

Effects of Collaboration on the Performance of Interactive Theme Discovery Systems

Alvin Po-Chun Chen, Rohan Das, Dananjay Srinivas, Alexandra Barry, Maksim Seniw, Maria Leonor Pacheco

2407.11041 2026-04-22 cs.LG cs.AI

Integer-only Quantized Transformers for Embedded FPGA-based Time-series Forecasting in AIoT

Tianheng Ling, Chao Qian, Gregor Schiele

Comments 7 pages, 3 figures, 4 tables, accepted by the 2024 IEEE Annual Congress on Artificial Intelligence of Things (IEEE AIoT), where it received the best paper award

2406.18344 2026-04-22 cs.CV

AlignedCut: Visual Concepts Discovery on Brain-Guided Universal Feature Space

Huzheng Yang, James Gee, Jianbo Shi

2405.13071 2026-04-22 cs.CL cs.AI cs.SI

A Novel Method for News Article Event-Based Embedding

Koren Ishlach, Itzhak Ben-David, Michael Fire, Lior Rokach

2405.09806 2026-04-22 cs.CV cs.AI cs.CL cs.LG

A Generalist Model for Diverse Text-Guided Medical Image Synthesis

Joseph Cho, Mrudang Mathur, Cyril Zakka, Dhamanpreet Kaur, Matthew Leipzig, Alex Dalal, Aravind Krishnan, Eubee Koo, Karen Wai, Cindy S. Zhao, Akshay Chaudhari, Matthew Duda, Ashley Choi, Ehsan Rahimy, Lyna Azzouz, Robyn Fong, Rohan Shad, William Hiesinger

2403.10559 2026-04-22 cs.LG cs.AI cs.RO

Generative Models and Connected and Automated Vehicles: A Survey in Exploring the Intersection of Transportation and AI

Bo Shu, Yiting Zhang, Saisai Hu, Dong Shu

2403.09905 2026-04-22 cs.RO cs.CV

Personalized Embodied Navigation for Portable Object Finding

Vishnu Sashank Dorbala, Bhrij Patel, Amrit Singh Bedi, Dinesh Manocha

Comments 10 pages

2310.10865 2026-04-22 cs.CL

Will the Prince Get True Love's Kiss? On the Model Sensitivity to Gender Perturbation over Fairytale Texts

Christina Chance, Da Yin, Dakuo Wang, Kai-Wei Chang

2604.19663 2026-04-22 cs.IR cs.LG

From Top-1 to Top-K: A Reproducibility Study and Benchmarking of Counterfactual Explanations for Recommender Systems

Quang-Huy Nguyen, Thanh-Hai Nguyen, Khac-Manh Thai, Duc-Hoang Pham, Huy-Son Nguyen, Cam-Van Thi Nguyen, Masoud Mansoury, Duc-Trong Le, Hoang-Quynh Le

DOI: 10.1145/3805712.3808574

Abstract

Counterfactual explanations (CEs) provide an intuitive way to understand recommender systems by identifying minimal modifications to user-item interactions that alter recommendation outcomes. Existing CE methods for recommender systems, however, have been evaluated under heterogeneous protocols, using different datasets, recommenders, metrics, and even explanation formats, which hampers reproducibility and fair comparison. Our paper systematically reproduces, re-implements, and re-evaluates eleven state-of-the-art CE methods for recommender systems, covering both native explainers (e.g., LIME-RS, SHAP, PRINCE, ACCENT, LXR, GREASE) and specific graph-based explainers originally proposed for GNNs. Here, a unified benchmarking framework is proposed to assess explainers along three dimensions: explanation format (implicit vs. explicit), evaluation level (item-level vs. list-level), and perturbation scope (user interaction vectors vs. user-item interaction graphs). Our evaluation protocol includes effectiveness, sparsity, and computational complexity metrics, and extends existing item-level assessments to top-K list-level explanations. Through extensive experiments on three real-world datasets and six representative recommender models, we analyze how well previously reported strengths of CE methods generalize across diverse setups. We observe that the trade-off between effectiveness and sparsity depends strongly on the specific method and evaluation setting, particularly under the explicit format; in addition, explainer performance remains largely consistent across item-level and list-level evaluations, and several graph-based explainers exhibit notable scalability limitations on large recommender graphs. Our results refine and challenge earlier conclusions about the robustness and practicality of CE generation methods in recommender systems: https://github.com/L2R-UET/CFExpRec.
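
To make the notion of a counterfactual explanation in the abstract concrete, the sketch below searches for a smallest set of past interactions whose removal changes a toy recommender's top-1 item. It is a brute-force illustration under stated assumptions, not any of the eleven benchmarked methods and not code from the linked repository; the similarity-sum recommender and both function names are hypothetical.

```python
from itertools import combinations


def recommend_top1(history, similarity, candidates):
    """Toy recommender: pick the candidate with the largest summed similarity
    to the items in the user's history (ties go to the earlier candidate)."""
    def score(item):
        return sum(similarity.get((seen, item), 0.0) for seen in history)
    return max(candidates, key=score)


def minimal_counterfactual(history, similarity, candidates):
    """Return (removed_items, original_top1), where removed_items is a smallest
    subset of the history whose deletion changes the top-1 recommendation,
    or None if no such subset exists. Exhaustive, so only for tiny histories."""
    original = recommend_top1(history, similarity, candidates)
    for size in range(1, len(history) + 1):  # try smaller explanations first
        for removed in combinations(history, size):
            remaining = [item for item in history if item not in removed]
            if recommend_top1(remaining, similarity, candidates) != original:
                return set(removed), original
    return None, original
```

The size of the returned set corresponds to the sparsity notion mentioned in the abstract: fewer removed interactions means a sparser explanation. The exhaustive search is exponential in the history length, which is why practical methods rely on approximations.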
