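arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.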

2604.17771 2026-04-21 cs.CL cs.AI cs.DB

SPENCE: A Syntactic Probe for Detecting Contamination in NL2SQL Benchmarks

Mohammadtaher Safarzadeh, Hitesh Laxmichand Patel, Afshin Orojlooyjadid, Graham Horwood, Dan Roth

Comments ACL 2026 Main Conference

Abstract (English)

Large language models (LLMs) have achieved strong performance on natural language to SQL (NL2SQL) benchmarks, yet their reported accuracy may be inflated by contamination from benchmark queries or structurally similar patterns seen during training. We introduce SPENCE (Syntactic Probing and Evaluation of NL2SQL Contamination Effects), a controlled syntactic probing framework for detecting and quantifying such contamination. SPENCE systematically generates syntactic variants of test queries for four widely used NL2SQL datasets: Spider, SParC, CoSQL, and the newer BIRD benchmark. We use SPENCE to evaluate multiple high-capacity LLMs under execution-based scoring. For each model, we measure changes in execution accuracy across increasing levels of syntactic divergence and quantify rank sensitivity using Kendall's tau with bootstrap confidence intervals. By aligning these robustness trends with benchmark release dates, we observe a clear temporal gradient: older benchmarks such as Spider exhibit the strongest negative values and thus the highest likelihood of training leakage, whereas the more recent BIRD dataset shows minimal sensitivity and appears largely uncontaminated. Together, these findings highlight the importance of temporally contextualized, syntactic-probing evaluation for trustworthy NL2SQL benchmarking.
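
The abstract mentions Kendall's tau with bootstrap confidence intervals over accuracy at increasing divergence levels. The sketch below is one plausible way to compute such a statistic, not the authors' implementation. The function names, the tau-a variant, the percentile bootstrap, the per-level resampling scheme, and the toy 0/1 outcomes are all assumptions made for illustration.

```python
import random
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(list(zip(xs, ys)), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


def accuracy(outcomes):
    """Fraction of queries whose execution result was correct (1) vs. not (0)."""
    return sum(outcomes) / len(outcomes)


def bootstrap_tau(outcomes_by_level, n_boot=1000, alpha=0.05, seed=0):
    """Tau between divergence level and accuracy, with a percentile bootstrap CI.

    outcomes_by_level[i] is a list of 0/1 execution outcomes at divergence
    level i (hypothetical input layout, assumed for this sketch).
    """
    rng = random.Random(seed)
    levels = list(range(len(outcomes_by_level)))
    point = kendall_tau(levels, [accuracy(o) for o in outcomes_by_level])

    taus = []
    for _ in range(n_boot):
        # Resample queries with replacement independently within each level.
        accs = [accuracy(rng.choices(o, k=len(o))) for o in outcomes_by_level]
        taus.append(kendall_tau(levels, accs))
    taus.sort()

    lo_idx = max(0, int((alpha / 2) * n_boot))
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot))
    return point, taus[lo_idx], taus[hi_idx]


# Made-up toy outcomes (not data from the paper): accuracy falls as divergence grows.
toy = [
    [1] * 9 + [0] * 1,
    [1] * 8 + [0] * 2,
    [1] * 6 + [0] * 4,
    [1] * 5 + [0] * 5,
]
point, ci_low, ci_high = bootstrap_tau(toy)
print(point, ci_low, ci_high)
```

Under this reading, a tau near -1 would mean accuracy drops consistently as queries diverge syntactically from the originals, which the abstract treats as a contamination signal; a tau near 0 would indicate little sensitivity.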

2604.17770 2026-04-21 cs.LG

LLM-AUG: Robust Wireless Data Augmentation with In-Context Learning in Large Language Models

Pranshav Gajjar, Manan Tiwari, Sayanta Seth, Vijay K. Shah

2604.17769 2026-04-21 cs.CL cs.AI

Reverse Constitutional AI: A Framework for Controllable Toxic Data Generation via Probability-Clamped RLAIF

Yuan Fang, Yiming Luo, Aimin Zhou, Fei Tan

Comments Accepted to Findings of ACL 2026. 10 pages, 6 figures. Code and data available at https://github.com/ZeroLoss-Lab/R-CAI

2604.17768 2026-04-21 cs.AI

When Vision-Language Models Judge Without Seeing: Exposing Informativeness Bias

Xiaohan Zou, Roshan Sridhar, Mohammadtaher Safarzadeh, Dan Roth

Comments Accepted at ACL 2026 Main Conference

2604.17761 2026-04-21 cs.AI cs.CL

Contrastive Attribution in the Wild: An Interpretability Analysis of LLM Failures on Realistic Benchmarks

Rongyuan Tan, Jue Zhang, Zhuozhao Li, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang

Comments 45 pages, 16 figures, 16 tables

2604.17753 2026-04-21 cs.AI cs.CL cs.CV

Evolutionary Negative Module Pruning for Better LoRA Merging

Anda Cao, Zhuo Gou, Yi Wang, Kaixuan Chen, Yu Wang, Can Wang, Mingli Song, Jie Song

Comments Accepted to ACL 2026 (main conference)

2604.17751 2026-04-21 cs.LG cs.CL

HiP-LoRA: Budgeted Spectral Plasticity for Robust Low-Rank Adaptation

Lixian Chen, Jianhong Tan

2604.17749 2026-04-21 cs.CV

Ego-InBetween: Generating Object State Transitions in Ego-Centric Videos

Mengmeng Ge, Takashi Isobe, Xu Jia, Yanan Sun, Zetong Yang, Weinong Wang, Dong Zhou, Dong Li, Huchuan Lu, Emad Barsoum

Comments CVPR 2026

2604.17748 2026-04-21 cs.CV

Source-Free Domain Adaptation with Vision-Language Prior

Song Tang, Yunxiang Bai, Wenxin Su, Mao Ye, Jianwei Zhang, Xiatian Zhu

2604.17747 2026-04-21 cs.LG

Efficient Federated RLHF via Zeroth-Order Policy Optimization

Deyi Wang, Qining Zhang, Lei Ying

2604.17738 2026-04-21 cs.CL

Mira-Embeddings-V1: Domain-Adapted Semantic Reranking for Recruitment via LLM-Synthesized Data

Zhaohua Liang, Zhilin Wang, Renjie Cao, Yining Zhang

2604.17734 2026-04-21 cs.CV

Score-Based Matching with Target Guidance for Cryo-EM Denoising

Xiaoqi Wu, Xueying Zhan, Wen Li, Junhao Wu, Xin Huang, Min Xu

2604.17730 2026-04-21 cs.CL cs.AI cs.HC

MHSafeEval: Role-Aware Interaction-Level Evaluation of Mental Health Safety in Large Language Models

Suhyun Lee, Palakorn Achananuparp, Neemesh Yadav, Ee-Peng Lim, Yang Deng

Comments Accepted to ACL 2026 Findings

2604.17727 2026-04-21 cs.CV cs.AI

Voronoi-guided Bilateral 2D Gaussian Splatting for Arbitrary-Scale Hyperspectral Image Super-Resolution

Jie Zhang, Jinkun You, Shi Chen, Yicong Zhou

2604.17725 2026-04-21 cs.CL cs.AI

RePrompT: Recurrent Prompt Tuning for Integrating Structured EHR Encoders with Large Language Models

Arya Hadizadeh Moghaddam, Drew Ross, Mohsen Nayebi Kerdabadi, Dongjie Wang, Zijun Yao

Comments Accepted to Findings of ACL 2026

2604.17721 2026-04-21 cs.CV cs.AI

GeGS-PCR: Effective and Robust 3D Point Cloud Registration with Two-Stage Color-Enhanced Geometric-3DGS Fusion

Jiayi Tian, Haiduo Huang, Tian Xia, Wenzhe Zhao, Pengju Ren

2604.17720 2026-04-21 cs.LG cs.CV

FlashFPS: Efficient Farthest Point Sampling for Large-Scale Point Clouds via Pruning and Caching

Yuzhe Fu, Hancheng Ye, Cong Guo, Junyao Zhang, Qinsi Wang, Yueqian Lin, Changchun Zhou, Hai Li, Yiran Chen

Comments Accepted to DAC'26

2604.17718 2026-04-21 cs.CL cs.SI

Do LLMs Use Cultural Knowledge Without Being Told? A Multilingual Evaluation of Implicit Pragmatic Adaptation

Mehwish Nasim, Sanjeevan Selvaganapathy, Neel Ganapathi Sabhahit, Marie Griesbach, Pranav Bhandari, Janina Lütke Stockdiek, Lennart Schäpermeier, Usman Naseem, Christian Grimme

2604.17716 2026-04-21 cs.CL cs.AI cs.LG

Concurrent Criterion Validation of a Validity Screen for LLM Confidence Signals via Selective Prediction

Jon-Paul Cacioli

Comments 11 pages, 4 figures, 2 tables. Companion to arXiv:2604.15702

2604.17714 2026-04-21 cs.CL cs.AI

Screen Before You Interpret: A Portable Validity Protocol for Benchmark-Based LLM Confidence Signals

Jon-Paul Cacioli

Comments 25 pages, 6 figures, 8 tables, 2 appendices. Companion to arXiv:2604.15702

2604.17713 2026-04-21 cs.LG

Modeling Higher-Order Brain Interactions via a Multi-View Information Bottleneck Framework for fMRI-based Psychiatric Diagnosis

Kunyu Zhang, Qiang Li, Vince D. Calhoun, Shujian Yu

2604.17710 2026-04-21 cs.CV

Dynamic Visual-semantic Alignment for Zero-shot Learning with Ambiguous Labels

Jiangnan Li, Linqing Huang, Xiaowen Yan, Min Gan, Wenpeng Lu, Jinfu Fan

Comments Accepted by ICME 2026 (IEEE International Conference on Multimedia and Expo)

2604.17707 2026-04-21 cs.CL cs.AI

Before You Interpret the Profile: Validity Scaling for LLM Metacognitive Self-Report

Jon-Paul Cacioli

Comments 14 pages, 6 figures. Companion to arXiv:2604.15702

2604.17696 2026-04-21 cs.AI

Stratagem: Learning Transferable Reasoning via Trajectory-Modulated Game Self-Play

Xiachong Feng, Deyi Yin, Xiaocheng Feng, Yi Jiang, Libo Qin, Yangfan Ye, Lei Huang, Weitao Ma, Qiming Li, Yuxuan Gu, Bing Qin, Lingpeng Kong

Comments ACL 2026 Main

2604.17695 2026-04-21 cs.LG cs.CL

MoE-nD: Per-Layer Mixture-of-Experts Routing for Multi-Axis KV Cache Compression

Libo Sun, Peixiong He, Po-Wei Harn, Xiao Qin

Comments 9 pages, 3 figures, 6 tables

2604.17691 2026-04-21 cs.LG cs.AI

SafeAnchor: Preventing Cumulative Safety Erosion in Continual Domain Adaptation of Large Language Models

Dongxin Guo, Jikun Wu, Siu Ming Yiu

Comments 16 pages (12 main + 4 appendix), 2 figures, 12 tables

2604.17688 2026-04-21 cs.CV

Dual-stream Spatio-Temporal GCN-Transformer Network for 3D Human Pose Estimation

Jiawen Duan, Jian Xiang, Zhiqiang Li, Linlin Xue, Wan Xiang

Comments Published in Displays, Vol. 93, 2026, Article 103429. DOI: https://doi.org/10.1016/j.displa.2026.103429 Free access: https://authors.elsevier.com/a/1mnPTWHUHYdGQ

2604.17679 2026-04-21 cs.RO

A Hamilton-Jacobi Reachability-Guided Search Framework for Efficient and Safe Indoor Planar Robot Navigation

Hanyang Hu, Cameron Siu, Mo Chen

2604.17677 2026-04-21 cs.AI

Semantic Entanglement in Vector-Based Retrieval: A Formal Framework and Context-Conditioned Disentanglement Pipeline for Agentic RAG Systems

Nick Loghmani

Comments 34 pages, 5 figures, 1 table

2604.17674 2026-04-21 cs.CL cs.AI

Towards Intelligent Legal Document Analysis: CNN-Driven Classification of Case Law Texts

Moinul Hossain, Sourav Rabi Das, Zikrul Shariar Ayon, Sadia Afrin Promi, Ahnaf Atef Choudhury, Shakila Rahman, Jia Uddin