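arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.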

2605.01264 2026-05-05 cs.SE cs.LG

FeedbackLLM: Metadata driven Multi-Agentic Language Agnostic Test Case Generator with Evolving prompt and Coverage Feedback

Kushal Jasti, Tejamani Prashanth Sahu, Rishitha Pentyala, Muvvala Mohit, Vivek Yelleti

English abstract

Traditional approaches to test case generation often involve manual effort and incur significant computational overhead. These approaches also do not scale, which makes them unsuitable for complex software systems. Recently, Large Language Models (LLMs) have been applied to software testing. However, approaches based on single-shot prompt engineering tend to hallucinate and generate redundant test cases, so fewer branches are covered. To address these limitations, we propose FeedbackLLM, a novel automated, language-agnostic test case generation framework based on a tightly coupled two-stage approach. In the first stage, FeedbackLLM extracts the input constraints by parsing the source code and generates candidate test cases. In the second stage, the quality of the test cases is evaluated by two specialized LLM feedback agents: (i) the Line Feedback Agent, which extracts metadata about missed line executions, and (ii) the Branch Feedback Agent, which extracts metadata about unexecuted branch conditions. The two agents communicate in tandem, and the two-stage procedure is repeated for k steps. We also introduce a redundancy prevention cache to avoid duplicate API requests and unnecessary execution cycles. The proposed architecture is evaluated on standard benchmark programs in C and Python. FeedbackLLM achieves higher line and branch coverage than baseline tools while scaling linearly in execution time.
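The generate, evaluate, and repeat cycle described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy program `classify`, the hand-recorded branch labels, and `propose_inputs` (a deterministic stand-in for the LLM generator and feedback agents) are all assumptions made for illustration, and only branch feedback is modeled.

```python
from typing import List, Set, Tuple

# Hypothetical branch labels for the toy program below.
ALL_BRANCHES = {"neg", "zero", "pos"}


def classify(x: int, hits: Set[str]) -> str:
    """Toy program under test; records the branch it takes in `hits`."""
    if x < 0:
        hits.add("neg")
        return "negative"
    if x == 0:
        hits.add("zero")
        return "zero"
    hits.add("pos")
    return "positive"


def propose_inputs(missed: Set[str]) -> List[int]:
    """Stand-in for the LLM: turns missed-branch metadata into new inputs."""
    hints = {"neg": -1, "zero": 0, "pos": 1}
    return [hints[b] for b in sorted(missed)]


def feedback_loop(k: int) -> Tuple[List[int], Set[str]]:
    """Run at most k feedback rounds; return kept tests and covered branches."""
    cache: Set[int] = set()   # redundancy cache: inputs already executed
    tests: List[int] = []     # inputs that added new coverage
    covered: Set[str] = set()
    candidates = [1]          # assumed initial single-shot guess
    for _ in range(k):
        for x in candidates:
            if x in cache:
                continue      # skip duplicates instead of re-running them
            cache.add(x)
            before = len(covered)
            classify(x, covered)
            if len(covered) > before:
                tests.append(x)
        missed = ALL_BRANCHES - covered
        if not missed:
            break             # full branch coverage reached
        candidates = propose_inputs(missed)
    return tests, covered


if __name__ == "__main__":
    print(feedback_loop(3))
```

In a real setting the coverage data would come from a coverage tool rather than hand-placed labels, and the cache key would more likely be a hash of the prompt or test case than a raw integer.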

2605.01263 2026-05-05 cs.DS cs.LG

New Bounds for Kernel Sums via Fast Spherical Embeddings

Tal Wagner

Comments ICML 2026

2605.01251 2026-05-05 cs.HC cs.RO

What Does a Meow Mean? In Search of Intuitively Understandable Communication by a Nonverbal Companion Robot

Vivienne Bihe Chi, Claudia B. Rébola, Bertram F. Malle

Comments To appear in the Proceedings of the 18th International Conference on Social Robotics (ICSR 2026)

2605.01245 2026-05-05 cs.HC cs.AI

The Garden of Forking Paths: Narrative Arc-Conditioned Gameplay Planning

Yunge Wen, Chenliang Huang, Hangyu Zhou, Zhuo Zeng, Chun Ming Louis Po, Julian Togelius, Timothy Merino, Sam Earle

2605.01238 2026-05-05 cs.HC cs.CV

EduGage: Methods and Dataset for Sensor-Based Momentary Assessment of Engagement in Self-Guided Video Learning

Zikang Leng, Edan Eyal, Yingtian Shi, Jiaman He, Yaqi Liu, Thomas Plötz

2605.01219 2026-05-05 cs.MM cs.CV cs.SD eess.IV

Multimodal Confidence Modeling in Audio-Visual Quality Assessment

Mayesha Maliha R. Mithila, Mylene C. Q. Farias

Comments Accepted at ICIP 2026, 6 pages, 4 figures, no supplementary material

2605.01170 2026-05-05 physics.app-ph cs.RO

A skin-like conformal sensor for real-time shape mapping

Kaiping Yin, Sooik Im, Chaorui Qiu, Yun Bai, Xiangyu Lu, Chenhang Li, Junjie Yao, Xiaoyue Ni

Comments 13 pages, 5 figures

2605.01163 2026-05-05 cs.IR cs.LG

Multimodal Data Curation Through Ranked Retrieval

Pratyush Muthukumar, Harshil Kotamreddy, Sarah Amiraslani, Tomo Kanazawa, Ramani Akkati, Shaan Jain, Andrew Mathau

Comments ICLR DATA-FM 2026

2605.01160 2026-05-05 cs.SE cs.AI

The Productivity-Reliability Paradox: Specification-Driven Governance for AI-Augmented Software Development

Sabry E. Farrag

Comments 30 pages, 4 tables, 1 figure, 71 references

2605.01133 2026-05-05 cs.CR cs.LG cs.MA

When Embedding-Based Defenses Fail: Rethinking Safety in LLM-Based Multi-Agent Systems

Lingxi Zhang, Guangtao Zheng, Hanjie Chen

2605.01104 2026-05-05 cs.SE cs.CL cs.HC

RECAP: An End-to-End Platform for Capturing, Replaying, and Analyzing AI-Assisted Programming Interactions

Keyu He, Qianou Ma, Valerie Chen, Wayne Chi, Tongshuang Wu

2605.01091 2026-05-05 cs.CY cs.AI cs.MA

Governing What the EU AI Act Excludes: Accountability for Autonomous AI Agents in Smart City Critical Infrastructure

Talal Ashraf Butt, Muhammad Iqbal, Razi Iqbal

Comments 24 pages, 3 figures, 8 tables. Submitted to Computer Law & Security Review

2605.01078 2026-05-05 cs.CR cs.AI

A Sentence Relation-Based Approach to Sanitizing Malicious Instructions

Soumil Datta, Melissa Umble, Daniel S. Brown, Guanhong Tao

2605.01074 2026-05-05 cs.NE cs.LG

Benchmarking local Hebbian learning rules for memory storage and prototype extraction

Anders Lansner, Andreas Knoblauch, Naresh B Ravichandran, Pawel Herman

Comments 31 pages, 9 + 2 suppl figures, 5 tables

2605.01072 2026-05-05 hep-th cs.LG

Reconstructing conformal field theoretical compositions with Transformers

Haotian Cao, Garrett Merz, Kyle Cranmer, Gary Shiu

2605.01060 2026-05-05 cs.DC cs.LG

SURGE: SuperBatch Unified Resource-efficient GPU Encoding for Heterogeneous Partitioned Data

Shashank Kapadia, Deep Narayan Mishra, Sujal Reddy Alugubelli, Ajay Kumar, Swapnil Yadav, Rishi Bhatia

Comments 15 pages, 10 figures, 11 tables

2605.01055 2026-05-05 cs.DC cs.AI

SCION: Size-aware Policy Orchestration for Nonstationary Object Caches (Long Paper Version)

Qizhi Wang

Comments 17 pages, 4 figures, 26 tables. Code repository: https://github.com/Icemap/SCION

2605.01047 2026-05-05 cs.CR cs.AI cs.CL cs.LG

LLM Ghostbusters: Surgical Hallucination Suppression via Adaptive Unlearning

Joseph Spracklen, Pedram Aghazadeh, Farinaz Koushanfar, Murtuza Jadliwala

English abstract

Hallucinations, outputs that sound plausible but are factually incorrect, remain an open challenge for deployed LLMs. In code generation, models frequently hallucinate non-existent software packages, recommending imports and installation commands for fictional libraries. This creates a critical supply-chain vulnerability: an attacker can proactively register such packages on public registries with malicious payloads that are subsequently installed and executed by developers or autonomous agents, a class of package confusion attack known as slopsquatting. Once a model is deployed, mitigating this failure mode is difficult: full retraining is costly, and existing approaches either cause severe degradation of model utility or rely on a pre-specified forget-set, an assumption that does not apply to the unbounded space of hallucinations. To address this problem, we present Adaptive Unlearning (AU), a post-deployment framework that surgically suppresses hallucinations while preserving general model utility. AU introduces a hybrid token-level objective that simultaneously reinforces valid outputs and suppresses hallucinated ones. Combined with an adaptive discovery loop that continuously surfaces new hallucination-inducing contexts without human supervision, AU enables generalization to unseen prompts and hallucinations. We demonstrate that AU reduces package hallucination rates by 81%, corresponding to a substantial reduction in slopsquatting attack surface, while maintaining performance on standard coding benchmarks. Our analysis shows that distributional changes are concentrated on package-related generations, leaving general coding behavior largely unaffected and confirming that AU's effect is isolated to the targeted distribution. AU operates entirely on model-generated data, requires no human annotation, and generalizes across domains.
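To make the package-hallucination failure mode concrete, here is a small sketch of how one might flag imports in generated code that do not appear in a known set of packages. This is a detection-side illustration only, not the Adaptive Unlearning method from the abstract; the allowlist `KNOWN` and the package name `fastjsonx` are hypothetical.

```python
import ast
from typing import Set


def imported_top_level(source: str) -> Set[str]:
    """Collect top-level module names from absolute imports in `source`."""
    names: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports (level > 0); they refer to local code.
            if node.module and node.level == 0:
                names.add(node.module.split(".")[0])
    return names


def unknown_packages(source: str, known: Set[str]) -> Set[str]:
    """Return imported names that are absent from the `known` allowlist."""
    return imported_top_level(source) - known


# Hypothetical allowlist; a real check would consult a vetted package index.
KNOWN = {"json", "os", "numpy"}

# Hypothetical model output; `fastjsonx` is a made-up package name.
snippet = "import os\nimport numpy as np\nfrom fastjsonx.core import loads\n"

if __name__ == "__main__":
    print(sorted(unknown_packages(snippet, KNOWN)))  # prints ['fastjsonx']
```

One caveat: import names and registry distribution names can differ (for example, `sklearn` is installed as `scikit-learn`), so a practical check would need a mapping between the two.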

2605.01040 2026-05-05 cs.CE cs.LG

Differentiable Multiphysics Co-Optimization via Implicit Neural Representations: A Transient Hamburger-Cooking Benchmark

Navid Zobeiry

Comments Preprint. 24 pages, 5 figures

2605.01003 2026-05-05 stat.ME cs.LG eess.SP

Pi-Change: A Prior-Informed Multiple Change Point Detection Algorithm

Jonathon Jacobs, Shanshan Chen

2605.00974 2026-05-05 cs.CR cs.CL

SRTJ: Self-Evolving Rule-Driven Training-Free LLM Jailbreaking

Jindong Li, Ying Liu, Yali Fu, Jinjing Zhu, Leyao Wang, Menglin Yang, Rex Ying

English abstract

LLMs are increasingly equipped with safety alignment mechanisms, yet recent studies demonstrate that they remain vulnerable to jailbreaking attacks that elicit harmful behaviors without explicit policy violations. While a growing body of work has explored automated jailbreak strategies, existing methods face several fundamental challenges, including the lack of systematic utilization of both successful and failed attack experiences, as well as the absence of principled mechanisms for composing and selecting reusable attack rules under diverse constraints. As a result, existing methods struggle to accumulate transferable knowledge over time and to reliably adapt attack strategies across different targets and evolving safety mechanisms. To address these issues, we propose a Self-Evolving Rule-Driven Training-Free Jailbreak (SRTJ) framework that systematically discovers, composes, and refines attack strategies through interaction and feedback, without updating model parameters. Specifically, SRTJ couples experience-driven attack generation with answer set programming (ASP)-based rule selection and constraint-aware composition, where iterative verifier feedback is leveraged to jointly refine successful strategies and analyze failure patterns. The resulting rule memory evolves in a hierarchical multi-level manner, explicitly organizing distilled attack knowledge into long-term, middle-term, and short-term rules, thereby capturing both stable transferable strategies and transient adaptive behaviors to effectively balance exploration and exploitation across attack attempts. Extensive experiments on a mainstream jailbreak benchmark (HarmBench) demonstrate that SRTJ achieves strong and stable attack performance across different target LLMs, while exhibiting improved robustness and generalization compared to existing jailbreak methods. The code is available at https://github.com/TheSolkatt/SRTJ.

2605.00972 2026-05-05 physics.data-an cs.AI cs.CV cs.IR

Toward a Scientific Discovery Engine for Weather and Climate Data: A Visual Analytics Workbench for Embedding-Based Exploration

Nihanth W. Cherukuru, Matt Rehme, Kirsten J. Mayer, David John Gagne, John Schreck, John Clyne, Charlie Becker

Comments 5 pages, 3 figures, Preprint

2605.00971 2026-05-05 eess.IV cs.CV

Reconstruction Interval Z-Phase Dependence of AI Detection Sensitivity in CT Lung Nodule Screening

Dan Soliman

2605.00968 2026-05-05 eess.SP cs.AI

Adaptive 3D-RoPE: Physics-Aligned Rotary Positional Encoding for Wireless Foundation Models

Chenyu Zhang, Xinchen Lyu, Chenshan Ren, Shuhan Liu, Qimei Cui

Comments 13 pages, 7 figures

2605.00964 2026-05-05 cs.IR cs.AI cs.HC

Seeking Information with RAG-Assistants: Does Model Size Matter in Human-AI Collaborations?

Lennard C. Froma, Tom Kouwenhoven, Maaike H. T. de Boer, Catholijn M. Jonker, Max J. van Duijn

2605.00957 2026-05-05 cs.IR cs.AI

"I Don't Know" -- Towards Appropriate Trust with Certainty-Aware Retrieval Augmented Generation

Daan Di Scala, Maaike de Boer, Pınar Yolum

Comments To be published in VALE 2025 Proceedings

2605.00955 2026-05-05 cs.CR cs.AI

E-MIA: Exam-Style Black-Box Membership Inference Attacks against RAG Systems

Zelin Guan, Shengda Zhuo, Zeyan Li, Jinchun He, Wangjie Qiu, Zhiming Zheng, Shuqiang Huang

2605.00948 2026-05-05 q-bio.QM cs.AI

Co-Generative De Novo Functional Protein Design

Xinrui Chen, Yizhen Luo, Siqi Fan, Zaiqing Nie

2605.00944 2026-05-05 cs.IR cs.AI cs.CL

SCARV: Structure-Constrained Aggregation for Stable Sample Ranking in Redundant NLP Datasets

Xu Zheng, Feiyu Wu, Linhong Wu, Zhuocheng Wang, Hui Li

2605.00942 2026-05-05 cs.SE cs.LG

PPO guided Agentic Pipeline for Adaptive Prompt Selection and Test Case Generation

Gourisetty Venkata Sai Koushik, Dama Aditya, Mahankali Harish Sai, Peddi Siddarhta, Shadab Ahmad, Vivek Yelleti