arXivDaily: arXiv daily academic digest, updated Monday to Friday
2606.03876 2026-06-03 cs.HC cs.AI cs.MA

From 'What' to 'How' and 'Why': Sharing LLM-Generated Retrospective Summaries of Older Adults' Passive Tracking Data with Remote Family Members

Jiachen Li, Reina Szeyi Chan, Akshat Choube, Xiang Zhi Tan, Elizabeth Mynatt, Varun Mishra

Affiliations: Northeastern University

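AI Summary: This study uses large language models (LLMs) to generate retrospective summaries from multi-modal tracking data. After redesigning the system based on technology probes and interviews, it significantly improved remote family members' satisfaction with, perceived helpfulness of, trust in, and willingness to receive the summaries, and it offers design implications for supporting their sensemaking shift from "What" to "How" and "Why".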

Abstract

With the growing prevalence of modern ubiquitous computing technologies, multi-modal tracking systems hold promise for providing timely awareness and reassurance to stakeholders such as remote family members (RFMs) of older adults, who play a central role in care coordination. However, combining heterogeneous data streams into high-level, meaningful content, such as retrospective summaries, remains challenging. While recent work has demonstrated the promise of large language models (LLMs) for interpreting multi-modal tracking data, less attention has been given to generating narrative accounts for stakeholders like RFMs, who possess rich personal knowledge of older adults and strong emotional responsibility, yet have limited visibility into their daily lives and limited capacity for caregiving. In this work, we explore how LLMs can be used to generate retrospective summaries from multi-modal tracking data for RFMs of older adults. We leveraged and customized an existing system, Vital Insight, to generate initial summaries on different dates and data availability scenarios as technology probes, and conducted interviews with 11 RFMs to gather feedback. Based on these insights, we redesigned the system into a multi-layer, multi-agent, insight-driven summary approach that builds from objective statistics and descriptions to enriched, context-aware narratives. We then compared the redesigned summaries with the initial versions through a survey with the same 11 RFMs and found significant improvements in satisfaction, perceived helpfulness, trust, and willingness to receive the summaries. We conclude by presenting design implications for AI-generated summaries for RFMs and broader contexts, emphasizing the need to support RFMs' sensemaking shift from simply presenting "What" data were collected to explaining "How is my loved one doing?" and "Why?".

2606.03866 2026-06-03 cs.IR cs.AI cs.CL

Taiji: Pareto Optimal Policy Optimization with Semantics-IDs Trade-off for Industrial LLM-Enhanced Recommendation

Yuecheng Li, Zeyu Song, Jing Yao, Chi Lu, Peng Jiang, Kun Gai

Affiliations: Kuaishou Technology; Unaffiliated

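AI Summary: Proposes the Taiji framework, which generates high-quality CoT data via reverse-engineered reasoning and open-ended rejection sampling, and uses Pareto Optimal Policy Optimization (POPO) to adaptively adjust cross-domain reward weights. This achieves a Pareto-optimal trade-off between LLM semantic knowledge and recommendation ID features. Deployed on Kuaishou's advertising platform, it serves over 400 million daily active users.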

Comments: 8 pages, 2 figures

Abstract

Scaling recommender systems via large language models (LLMs) has become a prominent trend in the industry. However, aligning the LLM's semantic space with the recommender's ID space via post-training (e.g., SFT and RL) remains challenging. Existing LLM4Rec paradigms are bottlenecked by two main issues: (1) the difficulty of measuring and improving chain-of-thought (CoT) quality in open-domain recommendation during SFT, and (2) the neglect of the trade-off between LLM semantic rewards and recommendation preference rewards during RL alignment. Motivated by these challenges, we present Taiji, a novel LLM-as-Enhancer framework designed for industrial recommender systems. To overcome the SFT bottleneck, we utilize reverse-engineered reasoning and open-ended rejection sampling to generate high-quality, domain-specific CoT data. To resolve the RL alignment issue, we propose Pareto Optimal Policy Optimization (POPO), which adaptively adjusts cross-domain reward weights. Theoretically, it achieves an optimal trade-off between the semantic world knowledge of LLMs and the collaborative ID features representing online user preferences. Extensive offline evaluations and online A/B tests validate the effectiveness of Taiji. Deployed on Kuaishou's advertising platform since May 2026, Taiji currently serves over 400 million users daily, yielding significant commercial revenue and demonstrating its robust scalability in web-scale environments.
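
The abstract does not specify how POPO computes its reward weights. As a point of reference only, the sketch below uses a different, well-known technique: the closed-form two-objective min-norm rule from MGDA (the multiple-gradient descent algorithm). It picks a weight between a semantic-reward gradient and a recommendation-reward gradient. The function names and the plain-list gradient representation are hypothetical, and this is not Taiji's algorithm.

```python
from typing import List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def min_norm_weight(g_sem: Vector, g_rec: Vector) -> float:
    """Return alpha in [0, 1] minimizing ||alpha * g_sem + (1 - alpha) * g_rec||^2.

    Closed-form two-objective MGDA rule (illustrative stand-in, not POPO).
    """
    diff = [a - b for a, b in zip(g_sem, g_rec)]
    denom = dot(diff, diff)
    if denom == 0.0:
        return 0.5  # identical gradients: every weighting gives the same direction
    alpha = dot([b - a for a, b in zip(g_sem, g_rec)], g_rec) / denom
    return min(1.0, max(0.0, alpha))


def combined_direction(g_sem: Vector, g_rec: Vector) -> Tuple[float, Vector]:
    """Weight the (hypothetical) semantic-reward and recommendation-reward gradients."""
    alpha = min_norm_weight(g_sem, g_rec)
    return alpha, [alpha * a + (1 - alpha) * b for a, b in zip(g_sem, g_rec)]


# Orthogonal gradients get equal weight; directly opposing gradients cancel out.
print(combined_direction([1.0, 0.0], [0.0, 1.0]))   # (0.5, [0.5, 0.5])
print(combined_direction([1.0, 0.0], [-1.0, 0.0]))  # (0.5, [0.0, 0.0])
```

A zero combined direction signals a Pareto-stationary point for the two objectives: no small step improves one reward without hurting the other.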

2606.03864 2026-06-03 cs.SI cs.CY cs.DL cs.LG physics.soc-ph

Explainable Forecasting of Scientific Breakthroughs from Concept Network Dynamics

Thomas Maillart, Thibaut Chataing, Ntorina Antoni, David Dosu, Paul Bagourd, Julian Jang-Jaccard, Alain Mermoud

Affiliations: Geneva School of Economics and Management, University of Geneva, Geneva, Switzerland; Faculty of Medicine, University of Geneva, Geneva, Switzerland; TU Eindhoven, The Netherlands; Open Quantum Institute, CERN, Geneva, Switzerland; armasuisse Science + Technology, Switzerland

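AI Summary: Proposes an explainable machine-learning approach that models the evolution of OpenAlex concept networks to forecast the structural precursors of scientific breakthroughs, namely the emergence and intensification of links between research concepts. A two-stage LightGBM model with 59 features jointly predicts concept-pair formation and future weight. Across four technology and biomedical domains it reaches ROC-AUC of 0.954 to 0.967, outperforming existing methods in both accuracy and explainability.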

Comments: 18 pages, 10 figures, 4 tables. An earlier version was presented at Global Tech Mining Conference 2026. Code and data: https://github.com/wazaahhh/breakthroughs-forecasting

Abstract

We introduce an explainable machine-learning approach that forecasts the structural precursors of scientific breakthroughs (the emergence and intensification of links between research concepts) by modelling how OpenAlex concept networks evolve over time. Using 59 semantic and topological features, a two-stage LightGBM model jointly predicts the formation and the future weight of concept pairs, extending prior link-existence forecasts with a regression stage that quantifies expected link intensity. Relative to the state of the art, the approach improves accuracy and explainability at once: comparative validation across four technology and biomedical domains yields ROC-AUC in [0.954, 0.967] at all horizons without re-tuning, exceeding the roughly 0.90 of prior models, while every forecast rests on structural, auditable features rather than opaque embeddings. Classification performance is high (AUC about 0.95) and regression remains stable (RMSLE 0.45 to 0.6 over one to five years). Feature attribution shows that structural factors, particularly Adamic-Adar similarity and degree-based Hadamard measures, consistently drive accuracy, suggesting that breakthrough-relevant recombinations emerge in tightly connected sub-networks. Two expert-anchored cases, quantum annealing and AI-enabled quantum architectures, show the model surfacing technological convergence consistent with expert expectations. We then outline a three-layer decision architecture (detection, expert translation, institutional integration) that turns these forecasts into evidence-based research strategy and policy, anchored in open data and explainable features.
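
Adamic-Adar similarity, named above as a top driver of accuracy, scores a candidate concept pair by its shared neighbours. Each shared neighbour contributes 1/log of its degree, so hub concepts that connect to everything count for less. Below is a minimal sketch over an unweighted concept graph. The graph representation, the concept names in the usage lines, and the reduction of a "degree-based Hadamard" feature to a product of two scalar degrees are illustrative assumptions, not the paper's feature code.

```python
import math
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[str, Set[str]]  # undirected concept graph as adjacency sets


def build_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    g: Graph = {}
    for a, b in edges:
        g.setdefault(a, set()).add(b)
        g.setdefault(b, set()).add(a)
    return g


def adamic_adar(g: Graph, u: str, v: str) -> float:
    """Sum of 1 / log(deg(z)) over the common neighbours z of u and v."""
    score = 0.0
    for z in g[u] & g[v]:
        deg = len(g[z])
        if deg > 1:  # log(1) == 0; skip to avoid division by zero
            score += 1.0 / math.log(deg)
    return score


def degree_product(g: Graph, u: str, v: str) -> int:
    """Hadamard product of the endpoints' degree features (scalars in this sketch)."""
    return len(g[u]) * len(g[v])


# Hypothetical concepts: two topics sharing an "optimization" and a "hardware" neighbour.
g = build_graph([("annealing", "optimization"), ("architectures", "optimization"),
                 ("annealing", "hardware"), ("architectures", "hardware"),
                 ("hardware", "machine learning")])
print(adamic_adar(g, "annealing", "architectures"))     # 1/ln 2 + 1/ln 3, about 2.353
print(degree_product(g, "annealing", "architectures"))  # 4
```

Here "optimization" (degree 2) contributes more than "hardware" (degree 3), which is the down-weighting of widely connected concepts that the index is designed for.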

2606.03852 2026-06-03 cs.SE cs.AI

FLARE: Fine-Grained Diagnostic Feedback for LLM Code Refinement

Yinsheng Yao, Hongxiang Zhang, Weixi Tong, Tianyi Zhang

Affiliations: Tongji University; Purdue University

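AI Summary: Proposes the FLARE framework, which uses a lightweight diagnostic model to predict line-level suspiciousness signals for bug localization and code refinement, and improves repair results through top-k candidate search.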

Abstract

Large language models often generate code with bugs. Existing methods rely on feedback signals such as test failures and self-critiques to iteratively refine the generated code, but such signals are either too coarse-grained or too high-level to tell the model where to fix the bug. In this work, we present Flare, an iterative framework with a lightweight diagnostic model that predicts line-level suspiciousness signals for bug localization and code refinement. Given the inherent uncertainty of diagnostic predictions, Flare searches over the top-k suspicious regions and selects the best candidate according to execution outcomes. Experiments on LiveCodeBench and BigCodeBench with five base LLMs show that, even without candidate search (k=1), Flare outperforms the strongest baseline by absolute margins of 1.72% to 7.42%. Furthermore, searching over 10 candidates yields an average improvement of 8.50% over no candidate search. When evaluated in isolation, our lightweight diagnostic model outperforms recent fault localization methods, demonstrating that it can provide reliable fine-grained guidance for code refinement.
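
The top-k search step works roughly as follows:

1. Rank lines by predicted suspiciousness.
2. Ask the model to repair the code around each of the k most suspicious lines.
3. Keep the candidate that passes the most tests.

This is a minimal sketch under assumptions. `repair` and `run_tests` are hypothetical callbacks standing in for the LLM call and the execution harness. FLARE's actual regions may span several lines, and its selection criterion may weigh execution outcomes differently.

```python
from typing import Callable, List, Optional, Tuple


def top_k_lines(scores: List[float], k: int) -> List[int]:
    """Indices of the k most suspicious lines, highest score first (ties: earlier line)."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]


def refine_with_search(
    code: str,
    scores: List[float],
    k: int,
    repair: Callable[[str, int], str],            # hypothetical: LLM edit around line i
    run_tests: Callable[[str], Tuple[int, int]],  # hypothetical: returns (passed, total)
) -> Tuple[Optional[str], int]:
    """Try one repair per top-k suspicious line; keep the candidate passing most tests."""
    best, best_passed = None, -1
    for line in top_k_lines(scores, k):
        candidate = repair(code, line)
        passed, _total = run_tests(candidate)
        if passed > best_passed:
            best, best_passed = candidate, passed
    return best, best_passed


# Toy usage: the "model" flips '-' to '+' on the chosen line; tests pass only for a + b.
buggy = "a = 1\nb = 2\nreturn a - b"


def toy_repair(src: str, i: int) -> str:
    lines = src.split("\n")
    lines[i] = lines[i].replace("-", "+")
    return "\n".join(lines)


def toy_tests(src: str) -> Tuple[int, int]:
    return (3, 3) if "a + b" in src else (1, 3)


# The diagnostic scores rank line 0 first, but with k=2 the line-2 repair is also tried.
print(refine_with_search(buggy, [0.9, 0.2, 0.5], 2, toy_repair, toy_tests))
```

With k=1 the same call would only try line 0 and return a candidate passing 1 of 3 tests. This illustrates why searching over several regions helps when the diagnostic ranking is imperfect.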

2606.03811 2026-06-03 cs.CR cs.AI cs.LG

AI Agents Enable Adaptive Computer Worms

Jonas Guan, Tom Blanchard, Hanna Foerster, Hengrui Jia, Gabriel Huang, Nicolas Papernot

Affiliations: University of Toronto; Vector Institute; University of Cambridge; ServiceNow

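AI Summary: Shows that AI agents can generate attack strategies tailored to each target and can sustain and spread themselves by running large language models on infected machines, forming a self-sustaining AI-driven cyber threat.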

Abstract

A computer worm is malware that spreads on a network by replicating itself from one machine to another. Traditional worms, like WannaCry, exploited predetermined vulnerabilities, and their spread can be halted by patching those vulnerabilities. Here we show that artificial intelligence (AI) agents enable a fundamentally new threat: a worm that generates tailored attack strategies for each target it encounters. The worm parasitically uses compromised machines to run open-weight large language models (LLMs) to sustain its reasoning, or extend its reach for further attacks. Deployed on a network of machines spanning Linux, Windows, and IoT (Internet of Things) devices, the worm propagated by exploiting common, real-world corporate network vulnerabilities. Since the worm is powered by stolen compute, the attacker's marginal cost per new infection is zero. This creates a destabilizing economic asymmetry between attackers and defenders. Moreover, because the worm requires no commercial AI platform, centralized safety controls, such as service refusals or rate limiting, are structurally irrelevant. Our results demonstrate that self-sustaining AI-driven cyber-threats are no longer theoretical. We must prepare for autonomous generative adversaries: malware systems that propagate without human operators and are defined not by fixed exploit code, but by the capacity to reason about targets, adapt to observations, and synthesize attack logic in real time.

2606.03770 2026-06-03 cs.DC cs.AI

E2LLM: Towards Efficient LLM Serving in Heterogeneous Edge/Fog Environments

Truong-Thanh Le, Amir Taherkordi, Hoang-Loc La, Frank Eliassen, Phuong Hoai Ha, Peiyuan Guan

Affiliations: University of California, Berkeley

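AI Summary: Proposes the E2LLM framework, which replicates the model across multiple device groups and applies model parallelism within each group. It combines genetic-algorithm clustering with dynamic-programming partitioning to deploy LLMs efficiently in resource-constrained, heterogeneous edge/fog environments. Under high demand, it reduces average waiting time by over 50% compared with the Splitwise baseline.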

Abstract

Large Language Models (LLMs) have become integral to modern applications, yet their deployment remains challenging. Beyond executing the models themselves, practical deployment must address cost efficiency, low latency, and optimal resource utilization. Conventional approaches typically assume that an entire model can be hosted on a single device, which does not hold in many real-world scenarios, particularly in Edge and Fog environments where device resources are constrained. In this paper, we introduce E2LLM, a framework designed to enable efficient LLM deployment in such resource-limited settings. Rather than simply partitioning a single model across all available devices, E2LLM replicates the full model across multiple groups of devices (replicas) and applies model parallelism within each replica. Each replica is assigned a specialized role, PREFILL or DECODER, based on its efficiency in handling input and output tokens. This separation leverages the inherent differences between these two phases of LLM inference. To effectively organize devices, we utilize a Genetic Algorithm to form clusters that maximize system performance. Within each cluster, we apply Dynamic Programming to determine an optimal partitioning strategy that minimizes bottlenecks in model-parallel execution. Experimental results demonstrate that our approach adapts robustly to varying workloads, including scenarios with significant variation in input and output token lengths. Compared to the Splitwise baseline, E2LLM reduces average waiting time by over 50% under high-demand conditions.
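
The per-cluster partitioning step resembles a classic contiguous-partition dynamic program. The goal is to split the model's layers into one contiguous stage per device so that the slowest stage, which bounds pipeline throughput, is as fast as possible. The sketch below is an illustration of that technique, not E2LLM's formulation. It assumes each device has a single relative speed, devices are visited in a fixed pipeline order, and every device gets at least one layer. It ignores memory limits and inter-device transfer costs.

```python
from itertools import accumulate
from typing import List, Tuple


def partition_layers(costs: List[float], speeds: List[float]) -> Tuple[float, List[int]]:
    """Split layers into len(speeds) contiguous, non-empty stages, one per device in
    the given order, minimizing the bottleneck time sum(stage costs) / device speed.

    Returns (bottleneck_time, cuts); device j runs layers cuts[j] .. cuts[j+1] - 1.
    """
    n, d = len(costs), len(speeds)
    if d == 0 or d > n:
        raise ValueError("need at least one device and at least one layer per device")
    prefix = [0.0] + list(accumulate(costs))  # prefix[i] = cost of layers 0..i-1
    inf = float("inf")
    # best[j][i]: minimal bottleneck when the first i layers run on the first j devices
    best = [[inf] * (n + 1) for _ in range(d + 1)]
    choice = [[0] * (n + 1) for _ in range(d + 1)]
    best[0][0] = 0.0
    for j in range(1, d + 1):
        for i in range(j, n + 1):
            for k in range(j - 1, i):  # device j-1 takes layers k .. i-1
                stage = (prefix[i] - prefix[k]) / speeds[j - 1]
                cand = max(best[j - 1][k], stage)
                if cand < best[j][i]:
                    best[j][i], choice[j][i] = cand, k
    cuts, i = [n], n
    for j in range(d, 0, -1):  # walk the stored split points backwards
        i = choice[j][i]
        cuts.append(i)
    return best[d][n], cuts[::-1]


# A fast device (speed 3) next to a slow one (speed 1): the fast one takes 3 of 4 layers.
print(partition_layers([4, 4, 4, 4], [3.0, 1.0]))  # (4.0, [0, 3, 4])
```

The table has O(d·n) entries, and each is filled in O(n) time, so the cost is O(d·n²). That is small for the tens of layers and handful of devices typical of an edge cluster.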

2606.03647 2026-06-03 cs.CR cs.AI cs.LG

Black-box, Adaptive, Efficient, Transferable, Harmful, Applicable... Attacks Are All You Need to Break LLMs

Vincent Limbach, Jonas Dornbusch, David Lüdke, Stephan Günnemann, Leo Schwinn

Affiliations: University of St. Gallen

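AI Summary: Proposes Indirect Harm Optimization (IHO), which trains a masked diffusion language model attacker via iterative preference optimization. The resulting adaptive attack is black-box, efficient, and transferable, and it substantially increases success rates against layered defenses.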

Abstract

Accurately evaluating adversarial robustness is a longstanding challenge. A flawed attack design can inflate robustness estimates, making deployment risk assessment and defense comparison unreliable. Historically, standardized attacks such as AutoAttack have largely resolved this for image classifiers, providing a reliable evaluation baseline for systematic comparison across defenses. However, no equivalent exists for LLM jailbreak evaluation yet, where designing such an attack is considerably more difficult. A reliable attack must, among other things, be black-box compatible, applicable to arbitrary defense pipelines, and efficient, which no existing method jointly satisfies. We introduce Indirect Harm Optimization (IHO), a masked diffusion language model attacker trained via iterative preference optimization against a harmfulness judge, requiring only black-box access to the target. The same method can be used without modification as a strong adaptive attack on individual behaviors, or as an efficient amortized policy that transfers to held-out behaviors and unseen target models without fine-tuning. Even against layered defenses, such as a Circuit Breaker-trained model combined with an auxiliary detector, IHO improves attack success considerably over state-of-the-art approaches, without any defense-specific adaptation. Our results position IHO as a practical step toward the kind of standardized jailbreak evaluation that has improved reliability in the past. Code and models are available on GitHub and Hugging Face.

2606.03601 2026-06-03 cs.SE cs.AI

DDOR: Delta Debugging for Explainable Overrefusal Testing and Repair

Qinyan Zhou, Peixin Zhang, Jun Sun, Haonan Zhang, Dongxia Wang

Affiliations: Southeast University; Singapore Management University; Zhejiang University; Huzhou Institute of Industrial Control Technology

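AI Summary: Proposes the DDOR framework, which uses delta debugging to localize minimal refusal-triggering fragments (mRTFs). This enables automated testing and repair of LLM overrefusal in black-box settings.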

Abstract

While safety alignment and guardrails help large language models (LLMs) avoid harmful outputs, they can also induce overrefusal, i.e., unwarranted rejection of benign queries that merely appear risky. We present DDOR (Delta Debugging for OverRefusal), a fully automated and explainable framework for overrefusal testing and repair in a black-box setting, where only model inputs and outputs are accessible and internal safety mechanisms remain opaque. DDOR applies delta debugging to localize minimal refusal-triggering fragments (mRTFs) that provide phrase-level, explainable evidence for why a refusal occurs. Conditioned on these mRTFs, DDOR generates diverse, context-rich prompts and performs multi-oracle validation to filter intrinsically unsafe or ambiguous cases, producing scalable and model-specific overrefusal test suites (approximately 1K cases per model). Beyond evaluation, we further leverage localized mRTFs to perform targeted prompt repair, substantially reducing overrefusal while preserving the original intent and maintaining safety on genuinely harmful inputs. Overall, DDOR offers a practical end-to-end solution to both evaluate and mitigate overrefusal, improving LLM usability without sacrificing safety.
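
Delta debugging's core routine, ddmin, repeatedly deletes chunks of the input and keeps any deletion under which the failure persists; here the failure is a refusal. It ends with a 1-minimal input, meaning no single remaining token can be dropped without losing the refusal. The sketch below is a simplified, complement-only variant of Zeller's ddmin applied to whitespace tokens. `refused` is a hypothetical oracle standing in for querying the black-box model and detecting a refusal. DDOR may split prompts into phrases rather than words.

```python
from typing import Callable, List, Sequence


def ddmin(tokens: Sequence[str], refused: Callable[[List[str]], bool]) -> List[str]:
    """Shrink `tokens` to a 1-minimal subsequence that still triggers `refused`."""
    current = list(tokens)
    if not refused(current):
        raise ValueError("the full prompt must trigger a refusal")
    n = 2  # number of chunks to split the current input into
    while len(current) >= 2:
        size = len(current) / n  # chunk length (may be fractional)
        start = 0.0
        reduced = False
        while start < len(current):
            # Drop one chunk and keep the rest (the chunk's complement).
            complement = current[:int(start)] + current[int(start + size):]
            if refused(complement):
                current, n, reduced = complement, max(n - 1, 2), True
                break
            start += size
        if not reduced:
            if n == len(current):
                break  # every single-token removal was tried: 1-minimal
            n = min(n * 2, len(current))  # refine granularity and retry
    return current


# Toy oracle (hypothetical): the model refuses whenever "kill" and "process" co-occur.
prompt = "how do I kill a python process that hangs".split()
print(ddmin(prompt, lambda t: "kill" in t and "process" in t))  # ['kill', 'process']
```

Each call to `refused` costs one model query. The choice of token granularity therefore trades the precision of the explanation against the query budget.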

2606.03593 2026-06-03 cs.SE cs.RO

Making Embodied AI Reliable: A Community Agenda from Testing to Formal Verification

Xi Zheng, Dulanga Weerakoon, Yintong Huo, Teresa Yeo, Guy Van Den Broeck, Vijay Ganesh, Daniel Neider, Biplav Srivastava, Ivan Ruchkin, Archan Misra, Corina Pasareanu

Affiliations: University of Waterloo; Princeton University

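AI Summary: Drawing on discussions at the AAAI'26 Bridge Program, this article proposes neuro-symbolic approaches that integrate testing, formal verification, and runtime assurance to address the lifecycle reliability of embodied AI in open-world settings.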

Abstract

Embodied AI systems are increasingly deployed in open-world environments, yet ensuring their reliability remains a fundamental challenge. Drawing on discussions from the AAAI'26 Bridge Program on "Making Embodied AI Reliable with Testing and Formal Verification", this article argues that reliability in embodied AI is inherently a lifecycle assurance problem arising from uncertainty, human interaction, and emergent behaviors across tightly coupled system components. We identify three complementary directions toward reliable embodied AI: (1) trustworthy scenario-based testing supported by validated specifications and meaningful coverage metrics, (2) compositional verification enabled by structured symbolic representations of system behavior and environmental context, and (3) runtime assurance mechanisms capable of adapting to uncertainty and distribution shifts during deployment. Rather than treating these approaches independently, we advocate integrated assurance workflows that connect testing, verification, and runtime adaptation through shared neuro-symbolic representations and continuous feedback across the system lifecycle. Such integration provides a foundation for building trustworthy embodied AI systems that can operate safely and reliably in complex real-world environments.

2606.03535 2026-06-03 cs.IR cs.CL

Can LLM Rerankers Predict Their Own Ranking Performance?

Shiyu Ni, Keping Bi, Jiafeng Guo, Jingtong Wu, Zengxin Han, Xueqi Cheng

Affiliations: State Key Laboratory of AI Safety; Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences

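AI Summary: Studies whether LLM rerankers can estimate the quality of the rankings they produce via self-consistency or verbalized confidence, and proposes two supervised methods, Verb-Num and Verb-List, to improve calibration.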

Abstract

Retrieval effectiveness varies substantially across queries, making it important to estimate ranking quality before relevance judgments are available. Query performance prediction (QPP) addresses this need, but most existing methods rely on external predictors after retrieval or reranking. In this paper, we study reranker-internal QPP: can an LLM reranker estimate the quality of the ranking it has just produced? We investigate both training-free and training-based approaches. For training-free estimation, we examine metric-specific self-consistency across sampled rankings and verbalized confidence produced directly by the reranker. Experiments on TREC Deep Learning 2019-2022 with four LLMs show that self-consistency is competitive with the state-of-the-art (SOTA) approach and better calibrated in almost all settings, while direct verbalized confidence is severely overconfident. To improve verbalized confidence, we propose two supervised methods, Verb-Num and Verb-List, which enable LLM rerankers to produce calibrated ranking-quality estimates with only a few additional output tokens.
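
One simple way to turn several sampled rankings into a training-free quality estimate is to measure how much they agree with each other. The sketch below uses mean pairwise top-k overlap as the agreement measure. This particular measure is an assumption for illustration; the paper's metric-specific self-consistency may be defined differently, for example by matching the target metric.

```python
from itertools import combinations
from typing import List, Sequence


def overlap_at_k(a: Sequence[str], b: Sequence[str], k: int) -> float:
    """Fraction of documents shared by the two top-k lists."""
    return len(set(a[:k]) & set(b[:k])) / k


def self_consistency(rankings: List[Sequence[str]], k: int = 10) -> float:
    """Mean pairwise top-k overlap across rankings sampled for one query.

    Higher agreement is read as a proxy for higher ranking quality (assumption).
    """
    pairs = list(combinations(rankings, 2))
    if not pairs:
        raise ValueError("need at least two sampled rankings")
    return sum(overlap_at_k(a, b, k) for a, b in pairs) / len(pairs)


samples = [["d1", "d2", "d3"], ["d2", "d1", "d3"], ["d4", "d5", "d6"]]
print(self_consistency(samples, k=2))  # about 0.333: only the first two samples agree
```

A score like this is only a relative signal. Mapping it to a calibrated estimate of, say, NDCG@10 would still require fitting against queries with known relevance judgments.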

2606.03523 2026-06-03 cs.CR cs.AI cs.LG

High-Precision APT Malware Attribution with Out-of-Scope Resilience

Peter Williams, Adam Sobey, Erisa Karafili

Affiliations: Department of Computer Science, University of Oxford

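AI Summary: Proposes an APT malware attribution method based on ranked binary classifiers with explicit abstention. It maintains 92% precision and 95% selective accuracy even when 87% of test samples are out of scope.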

Abstract

Early attribution of Advanced Persistent Threat (APT) activity can help defenders prioritise investigation, select countermeasures, and reduce the impact of an intrusion. Malware provides useful attribution evidence, but automated APT malware attribution remains difficult in practice. Existing approaches are typically trained and evaluated as closed-set classifiers over a limited number of known APT groups. In operational environments, however, classifiers are likely to encounter samples from groups not represented during training. Closed-set classifiers are then forced to assign such samples to known groups, producing unsupported and potentially misleading attributions. We present a high-precision APT malware attribution method based on ranked binary classifiers with explicit abstention. Rather than training a single multi-class classifier, our approach trains and tunes two binary classifiers per APT group, ranks the classifiers by validation performance, and applies them sequentially. A sample is attributed only when a classifier provides sufficient evidence; otherwise, it abstains. We evaluate the method on the APT Malware dataset and on a larger combined dataset designed to stress-test out-of-scope behaviour. On the APT Malware dataset, the method achieves higher precision than previously published results on the same dataset. In the most challenging setting, where 87% of test samples came from 60 APT groups excluded from training, the method abstained on 94% of out-of-scope samples while maintaining 92% precision and 95% selective accuracy on the samples it classified.
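
The sequential decision rule has three steps. Sort the per-group detectors by validation performance, attribute the sample to the first group whose detector fires, and otherwise abstain. For brevity, the sketch below uses one binary scorer per group rather than the paper's two. The `GroupDetector` fields and the group names in the usage lines are hypothetical. Selective accuracy is computed only over samples that were not abstained on.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class GroupDetector:
    group: str                                 # APT group label (hypothetical names below)
    score: Callable[[Sequence[float]], float]  # binary classifier score for this group
    threshold: float                           # decision threshold tuned on validation data
    val_precision: float                       # validation performance used for ranking


def attribute(sample: Sequence[float], detectors: List[GroupDetector]) -> Optional[str]:
    """Apply detectors best-first; return the first group that fires, else abstain (None)."""
    for det in sorted(detectors, key=lambda d: d.val_precision, reverse=True):
        if det.score(sample) >= det.threshold:
            return det.group
    return None


def selective_accuracy(preds: List[Optional[str]], labels: List[str]) -> float:
    """Accuracy over the samples that were not abstained on."""
    answered = [(p, y) for p, y in zip(preds, labels) if p is not None]
    if not answered:
        return float("nan")
    return sum(p == y for p, y in answered) / len(answered)


dets = [GroupDetector("APT-A", lambda x: x[0], 0.8, 0.90),
        GroupDetector("APT-B", lambda x: x[1], 0.7, 0.95)]
preds = [attribute(s, dets) for s in ([0.9, 0.75], [0.9, 0.1], [0.1, 0.1])]
print(preds)                                                     # ['APT-B', 'APT-A', None]
print(selective_accuracy(preds, ["APT-A", "APT-A", "unknown"]))  # 0.5
```

The third sample, standing in for an out-of-scope group, triggers no detector and is abstained on instead of being forced onto a known group.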

2606.03486 2026-06-03 cs.CR cs.AI

NeuroArmor: Safe-Variant-Guided Representation Consistency for Selective Re-Anchoring in Jailbreak Defense

Zhongyang Lin, Ziran Zhao, Feifei Zhai, Pengyuan Liu

Affiliations: University of Science and Technology of China

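AI Summary: Proposes NeuroArmor, a white-box runtime defense that builds safe variants of each prompt as a local safety reference, checks consistency in hidden-state space, and routes anomalies. It substantially lowers the malicious attack success rate while keeping the false positive rate low.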

Comments: 16 pages, 4 figures, 17 tables. Submitted to ACL ARR

Abstract

Large language models remain vulnerable to jailbreak attacks that hide harmful intent behind seemingly ordinary requests such as role-play, translation, encoding, adversarial suffixes, and multi-turn buildup. Existing defenses still struggle to handle these attacks without over-blocking benign but sensitive requests, partly because they often apply the same action to every prompt and therefore fail to balance safety and helpfulness. We propose NeuroArmor, a white-box runtime defense that uses prompt-specific safe variants as a local safety reference for deciding when intervention is needed and, once triggered, as safe targets for intervention. For each prompt, NeuroArmor builds K safe variants, compares the prompt state against this local safe reference in hidden-state space, and routes anomalies either to a refusal branch for malicious prompts or to a helpful recovery branch for borderline benign prompts. On Llama-3-8B-Instruct, NeuroArmor reduces malicious attack success rate (ASR) from 41.56% to 1.57% while lowering benign false positive rate (FPR) on the shared benign pool from 30.26% to 22.05%; matched baselines remain substantially weaker on this trade-off. External-judge and manual behavioral evaluations further show that the remaining non-blocked outputs are much less likely to be operationally harmful. Overall, NeuroArmor provides a more effective runtime strategy for jailbreak defense by combining prompt-specific consistency checking, routing, and selective intervention.
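
The consistency check can be illustrated with plain vectors. Average the hidden states of the K safe variants into a local reference, compare the prompt's hidden state with it, and intervene only when the two diverge. The abstract does not specify NeuroArmor's distance measure or routing criterion. In this sketch, the cosine-similarity test, the threshold `tau`, and the `looks_malicious` flag that chooses between the refusal and recovery branches are all assumptions, and the toy 2-D vectors stand in for real hidden states.

```python
import math
from typing import List, Sequence


def centroid(vectors: List[Sequence[float]]) -> List[float]:
    """Element-wise mean of the safe variants' hidden states."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def route(prompt_state: Sequence[float], safe_states: List[Sequence[float]],
          tau: float, looks_malicious: bool) -> str:
    """Return 'pass', 'refuse', or 'recover' for one prompt."""
    reference = centroid(safe_states)  # local safe reference built from K safe variants
    if cosine(prompt_state, reference) >= tau:
        return "pass"  # consistent with its safe variants: no intervention
    return "refuse" if looks_malicious else "recover"


safe = [[1.0, 0.0], [1.0, 0.2]]                 # hypothetical hidden states, K = 2
print(route([1.0, 0.1], safe, 0.9, False))      # pass
print(route([0.0, 1.0], safe, 0.9, True))       # refuse
print(route([0.0, 1.0], safe, 0.9, False))      # recover
```

Intervening only on divergent prompts is what lets a per-prompt reference avoid blanket blocking. Benign prompts that look like their own safe variants pass through untouched.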

2606.03453 2026-06-03 cs.CR cs.AI cs.MA

FORGE: Multi-Agent Graduated Exploitation and Detection Engineering

Farooq Shaikh

Affiliations: Dynatrace

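AI Summary: Proposes FORGE, a multi-agent system that uses graduated exploitation depth to bridge three isolated areas: exploit generation, vulnerability prioritization, and detection rule engineering. It achieves 67.8% end-to-end L1+ exploitation on 603 CVEs and generates Sigma and Snort detection rules with low false-positive rates.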

Comments: 18 pages, 4 figures, 3 tables. Accepted at the AgentCy Workshop at the 21st International Conference on Availability, Reliability and Security (ARES 2026). Keywords: Vulnerability assessment, Multi-agent systems, Exploit generation, Detection engineering, Risk prioritization

Abstract

Vulnerability disclosure volumes now far exceed organizational assessment capacity, yet three adjacent research communities (proof-of-concept generation, vulnerability prioritization, and detection rule engineering) operate largely in isolation. Existing automated exploit generation systems report binary pass/fail outcomes, discarding partial progress and producing no signal for the other two communities. This paper presents FORGE, a multi-agent system that bridges these three silos through graduated exploitation depth. Five specialized agents (Intel, Generator, Planner, Exploit, and Detector) execute in a fixed pipeline that (1) generates targeted vulnerable applications from CVE metadata, (2) conducts coached, multi-turn exploitation assessed by an LLM-primary oracle on a four-level taxonomy (L0: no evidence through L3: full compromise), and (3) produces Sigma and Snort detection rules grounded in OpenTelemetry exploitation traces. Graduated depth is the bridging mechanism: deeper exploitation yields richer behavioral traces for detection engineering, while depth data across scoring bands provides ground truth for prioritization validation. A tiered knowledge architecture accumulates intelligence across assessments, transferring build and exploitation experience to subsequent CVEs. Evaluation on 603 CVEs from the CVE-GENIE dataset achieves 67.8% end-to-end L1+ exploitation at USD 1.50 per CVE across eight languages and 187 CWE types. Exploitation rates remain near 68% regardless of EPSS or CVSS band, indicating that pattern-level reachability is orthogonal to metadata-based prioritization. Detection rules from L2+ exploitation achieve significantly higher span-normalized grounding than L1-derived rules (p=0.035), and 93.4% of generated Snort rules produce zero false positives against a synthetic benign corpus.

2606.03430 2026-06-03 cs.CR cs.AI

FlowGuard: Flow Matching for Identity-Independent Detection of Data-Free Model Stealing Attacks on Energy System Intrusion Detection Systems

Maxime Schwarzer, Laurin Holz, Tobias Huerten, Johannes Loevenich, Thies Moehlenhof, Roberto Rigolin F. Lopes, Veit Hagenmeyer

发表机构 * CortAIx Labs, Thales Deutschland(CortAIx实验室,Thales德国) Karlsruhe Institute of Technology(卡尔斯鲁厄理工学院)

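AI summary: Proposes FlowGuard, a flow-matching-based, identity-independent defense against data-free model stealing attacks on energy-system intrusion detection systems. It flags queries that are out-of-distribution (OOD) and maintains stable detection rates in both single-client and distributed Sybil settings.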

AI中文摘要

部署在能源基础设施中的人工智能入侵检测系统(IDS)容易受到模型窃取攻击,攻击者可以离线创建规避流量。当前针对模型提取的防御要么依赖于身份绑定的查询监控(对分布式攻击者Sybil无效),要么通过软标签扰动进行预测中毒(不适用于硬标签IDS部署)。因此,我们提出FlowGuard,一种基于流匹配的身份无关防御,在IDS处理之前将传入查询分类为分布外(OOD)。该方法利用了以下事实:为无数据模型窃取攻击合成的查询占据比真实网络流量更低维的流形,导致在使用基于合法数据训练的连续归一化流时,对数似然显著降低。我们在单客户端和分布式(100客户端Sybil)设置下,使用MAZE和DisGUIDE攻击评估了我们的方法,并与PRADA和FDINet进行了比较。当分布发生变化时,PRADA的检测率降至0%,而我们的防御在不依赖身份信息的情况下,在两种设置下均保持稳定的检测率。我们讨论了该方法的范围和局限性,并概述了在数据依赖攻击中的潜在应用。

英文摘要

Artificial Intelligence (AI)-based Intrusion Detection Systems (IDS) deployed in energy infrastructure are vulnerable to model theft attacks, which allow adversaries to create evasive traffic offline. Current defences against model extraction rely either on identity-bound query monitoring, which is ineffective against distributed attackers (Sybil), or on prediction poisoning through soft-label perturbation, which is inapplicable to hard-label IDS deployments. Therefore, we propose FlowGuard, an identity-independent defence based on flow matching that classifies incoming queries as out-of-distribution (OOD) prior to IDS processing. This approach exploits the fact that queries generated synthetically for data-free model stealing attacks occupy a lower-dimensional manifold than real network traffic. This results in measurably lower log-likelihoods when using a Continuous Normalizing Flow that has been trained on legitimate data. We evaluate our method against PRADA and FDINet using MAZE and DisGUIDE attacks in single-client and distributed (100-client Sybil) settings. While PRADA's detection rate dropped to 0% when the distribution changed, our defence maintained a stable detection rate across both settings without relying on identity information. We discuss the scope and limitations of the approach, and outline potential applications to data-dependent attacks.
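The gating logic works like this: fit a density model on legitimate traffic only, then reject queries whose log-likelihood falls below a low quantile of the training scores, before they reach the IDS. The sketch below shows that logic under one deliberate simplification. A diagonal Gaussian stands in for the flow-matching-trained Continuous Normalizing Flow, so it is not FlowGuard's model. The class name, the 1% quantile and the synthetic two-feature data are all assumptions.

```python
import math
import random
import statistics


class DiagonalGaussianDensity:
    """Stand-in density model (NOT a continuous normalizing flow)."""

    def fit(self, rows):
        cols = list(zip(*rows))
        self.means = [statistics.fmean(c) for c in cols]
        # Floor the variance so constant features do not divide by zero.
        self.vars = [max(statistics.pvariance(c), 1e-6) for c in cols]
        return self

    def log_likelihood(self, x):
        return sum(
            -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
            for xi, m, v in zip(x, self.means, self.vars)
        )


def fit_ood_gate(legit_rows, quantile=0.01):
    """Fit on legitimate traffic only; flag queries below a low log-likelihood quantile."""
    model = DiagonalGaussianDensity().fit(legit_rows)
    scores = sorted(model.log_likelihood(r) for r in legit_rows)
    threshold = scores[int(quantile * (len(scores) - 1))]
    # True means "treat as OOD and do not forward to the IDS".
    return lambda query: model.log_likelihood(query) < threshold


rng = random.Random(0)
legit = [[rng.gauss(0, 1), rng.gauss(5, 2)] for _ in range(500)]
is_ood = fit_ood_gate(legit)
print(is_ood([0.0, 5.0]), is_ood([9.0, -10.0]))  # False True
```

The gate never looks at who sent a query. That is the property that makes this kind of defense identity-independent.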

2606.03428 2026-06-03 cs.NE cs.AI cs.LG

PrimeSVT: An Automated Memory-aware Pruning Framework with Prioritized Compression Policy for Spiking Vision Transformers

PrimeSVT: 一种具有优先压缩策略的自动化内存感知剪枝框架用于脉冲视觉Transformer

Rachmad Vidya Wicaksana Putra, Achyuta Muthuvelan, Alberto Marchisio, Muhammad Shafique

发表机构 * eBRAIN Lab, Division of Engineering, New York University (NYU) Abu Dhabi(eBRAIN实验室,工程系,纽约大学(NYU)阿布扎比分校) New York University (NYU) Abu Dhabi, United Arab Emirates (UAE)(纽约大学(NYU)阿布扎比分校,阿拉伯联合酋长国(UAE))

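AI summary: Proposes the PrimeSVT framework, which compresses Spiking Vision Transformers through automated structured pruning and a prioritized compression policy under accuracy and memory constraints. It achieves 26.68% memory savings with less than 3% accuracy loss.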

Comments 8 pages, 8 figures, 3 tables

AI中文摘要

脉冲视觉Transformer(SViT)的大尺寸仍然阻碍其嵌入式实现,因此需要模型压缩。现有工作通过非结构化剪枝压缩SViT模型,这需要专门的硬件加速器来利用其特定的稀疏模式以最大化效率提升。此外,它们的手动方法需要大量设计时间来为每个网络找到合适的剪枝设置,因此这种方法不可扩展。为了解决这一限制,我们提出了PrimeSVT,一种新颖的框架,对预训练的SViT模型执行自动化的内存感知结构化剪枝,从而在推理期间最大化其效率提升,适用于广泛使用的计算架构。为此,PrimeSVT首先根据层的大小(即参数数量)对SViT层进行排序,根据它们在不同剪枝率下的鲁棒性识别目标剪枝层,然后利用这个顺序从最大层到最小层逐层顺序压缩模型(即所谓的优先压缩策略),同时考虑用户定义的约束(即可接受的精度和内存节省)。在每一层中,PrimeSVT基于L2范数值采用通道级滤波器剪枝,以结构性地移除不重要的权重。实验结果表明,PrimeSVT通过自动化单次剪枝节省了26.68%的内存,同时将精度保持在原始未剪枝SViT模型(73.3%)的3%以内(未微调时为70.3%,微调后为72.9%),从而满足了精度和内存约束。这些表明我们的PrimeSVT框架实现了SViT及其嵌入式实现的设计自动化。

英文摘要

The large sizes of Spiking Vision Transformers (SViTs) still hinder their embedded implementation, highlighting the need for model compression. State-of-the-art works compress SViT models through unstructured pruning, which requires specialized hardware accelerators tailored to the resulting sparsity patterns to maximize efficiency gains. Moreover, their manual approach requires substantial design time to find an appropriate pruning setting for each network, which makes it unscalable. To address these limitations, we propose PrimeSVT, a novel framework that performs automated, memory-aware structured pruning on pre-trained SViT models, thereby maximizing their inference efficiency gains on widely used computing architectures. To achieve this, PrimeSVT first sorts the SViT layers by size (i.e., number of parameters) and identifies the target pruning layers based on their robustness under different pruning rates. It then uses this order to compress the model layer by layer, from the largest layer to the smallest (the so-called prioritized compression policy), while respecting user-defined constraints (i.e., acceptable accuracy and memory saving). Within each layer, PrimeSVT applies channel-wise filter pruning based on the filters' L2-norm values to structurally remove non-significant weights. Experimental results show that PrimeSVT saves 26.68% memory through automated single-shot pruning while keeping accuracy within 3% of the original unpruned SViT model (73.3%), at 70.3% without fine-tuning and 72.9% with fine-tuning, thus meeting the accuracy and memory constraints. These results show that PrimeSVT enables design automation for SViTs and their embedded implementation.
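The prioritized compression policy can be sketched as a loop. It visits layers from largest to smallest, tries candidate pruning rates from most to least aggressive, and removes the filters with the smallest L2 norms. A candidate is accepted only if an accuracy check still passes, and the loop stops once the memory-saving target is met. The rates, the toy model and the `proxy` accuracy function are hypothetical. Real structured pruning must also remove the matching input channels of the next layer, and this sketch omits that step.

```python
import math


def filter_l2_norms(layer):
    """layer: list of filters, each a flat list of weights."""
    return [math.sqrt(sum(w * w for w in f)) for f in layer]


def prune_layer(layer, rate):
    """Structured pruning: drop the round(rate * n) filters with the smallest L2 norm."""
    n_keep = len(layer) - int(round(rate * len(layer)))
    norms = filter_l2_norms(layer)
    keep = sorted(range(len(layer)), key=lambda i: norms[i], reverse=True)[:n_keep]
    return [layer[i] for i in sorted(keep)]  # keep surviving filters in original order


def num_params(model):
    return sum(len(f) for layer in model.values() for f in layer)


def prioritized_compress(model, evaluate, min_acc, target_saving, rates=(0.5, 0.25, 0.1)):
    """Visit layers from largest to smallest; stop once the memory target is reached."""
    original = num_params(model)
    order = sorted(model, key=lambda name: sum(len(f) for f in model[name]), reverse=True)
    for name in order:
        for rate in rates:  # try the most aggressive rate first
            candidate = {**model, name: prune_layer(model[name], rate)}
            if evaluate(candidate) >= min_acc:
                model = candidate
                break
        if 1 - num_params(model) / original >= target_saving:
            break
    return model


# Toy model: "big" has 4 filters x 3 weights, "small" has 2 filters x 2 weights.
toy = {
    "big": [[1, 1, 1], [0.1, 0.1, 0.1], [2, 0, 0], [0, 0, 3]],
    "small": [[0.5, 0.5], [1, 0]],
}


def proxy(m):
    """Hypothetical accuracy proxy; a real run would evaluate the pruned SViT."""
    return 0.733 - 0.1 * (1 - num_params(m) / 16)


pruned = prioritized_compress(toy, proxy, min_acc=0.70, target_saving=0.25)
print(num_params(pruned), pruned["small"])  # 11 [[1, 0]]
```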

2606.03381 2026-06-03 cs.CR cs.AI

AI Model Extraction Attacks: Bypassing Single-Client Assumptions in Defenses

AI模型提取攻击:绕过防御中的单客户端假设

Maxime Schwarzer, Johannes F. Loevenich, Gustavo Sánchez, Laurin Holz, Thies Möhlenhof, Tobias Hürten, Roberto Rigolin F. Lopes, Veit Hagenmeyer

发表机构 * ETH Zurich(苏黎世联邦理工学院) University of Zurich(苏黎世大学) University of Tübingen(图宾根大学)

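AI summary: Using the CerberusAI framework, the paper systematically shows that the single-client assumption (SCA) behind model extraction defenses fails against coordinated attackers such as advanced persistent threats (APTs). It also shows that a basic round-robin query distribution strategy suffices to bypass defenses such as PRADA, and it calls for a shift to stateful, identity-independent defense architectures.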

AI中文摘要

确保部署在军事指挥控制(C2)系统和关键基础设施中的人工智能(AI)模型的保护对于维持信息优势至关重要。模型提取攻击(MEA)构成了重大威胁,因为它们使对手能够复制专有模型、泄露受保护信息并准备离线对抗性攻击。然而,当前的防御策略主要依赖于单客户端假设(SCA),即隐含地假设攻击源自孤立身份。本工作系统地证明了在协同威胁行为者(如高级持续性威胁APT)存在的情况下,SCA从根本上无效。我们引入了一个模块化、开源框架CerberusAI,用于可复现的模型窃取研究,并利用它模拟分布式攻击场景。我们的实证评估表明,成熟的防御机制(如防止深度神经网络模型窃取攻击PRADA)可以通过基本的轮询查询分布策略被绕过,导致检测性能显著下降。此外,我们证明即使是全局聚合方法也可以通过自适应流量混合使其在操作上变得无用。这些结果强调了在模型提取攻击领域需要向有状态、独立于身份的防御架构进行范式转变。本文最初发表于由信息系统技术(IST)科学与技术委员会IST-224-RSY组织的国际军事通信与信息系统会议(ICMCIS),该会议于2026年5月12-13日在英国巴斯举行,并获得了最佳论文奖。

英文摘要

Ensuring the protection of Artificial Intelligence (AI) models deployed in military Command and Control (C2) systems and critical infrastructure is essential for maintaining information superiority. Model Extraction Attacks (MEAs) pose a significant threat, as they enable adversaries to replicate proprietary models, compromise protected information, and prepare offline adversarial attacks. However, current defense strategies predominantly rely on the Single Client Assumption (SCA), which is the implicit assumption that attacks originate from isolated identities. This work systematically demonstrates that the SCA is fundamentally invalid in the presence of coordinated threat actors, such as Advanced Persistent Threats (APTs). We introduce a modular, open-source framework called CerberusAI for reproducible model-stealing research, and use it to simulate distributed attack scenarios. Our empirical evaluation shows that well-established defense mechanisms, such as Protecting Against Deep Neural Network Model Stealing Attacks (PRADA), can be bypassed by basic round-robin query distribution strategies, resulting in a significant reduction in detection performance. Furthermore, we demonstrate that even global aggregation approaches can be rendered operationally useless through adaptive traffic mixing. These results highlight the need for a paradigm shift towards stateful, identity-independent defense architectures in the field of model extraction attacks. This paper was originally presented at the International Conference on Military Communication and Information Systems (ICMCIS), organized by the Information Systems Technology (IST) Scientific and Technical Committee as IST-224-RSY and held in Bath, United Kingdom, on 12-13 May 2026, where it won the best paper award.
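The sketch below shows why round-robin distribution defeats any identity-bound check. The monitor is a hypothetical per-client query budget, not PRADA itself (PRADA analyzes the distribution of distances between a client's queries). The principle is the same, though: each identity's view of the traffic is evaluated in isolation. The budget value and identity names are assumptions.

```python
from collections import defaultdict
from itertools import cycle


def per_client_monitor(queries, budget):
    """Hypothetical identity-bound monitor: flag any client exceeding `budget` queries."""
    counts = defaultdict(int)
    for client_id, _ in queries:
        counts[client_id] += 1
    return {c for c, n in counts.items() if n > budget}


def round_robin(payloads, identities):
    """Assign each attack query to the next identity in turn."""
    return list(zip(cycle(identities), payloads))


payloads = [f"q{i}" for i in range(1000)]
single = [("attacker", p) for p in payloads]
sybil = round_robin(payloads, [f"id{k}" for k in range(100)])

print(per_client_monitor(single, budget=50))  # {'attacker'}
print(per_client_monitor(sybil, budget=50))   # set()
```

The same 1,000 queries are caught when they come from one identity and slip through when spread over 100 identities at 10 queries each.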

2606.03344 2026-06-03 cs.CR cs.LG

RogueMerge: Robust and Unified Attacks against LLM Model Merging

RogueMerge: 针对大语言模型合并的鲁棒统一攻击

Jinghuai Zhang, Yetian He, Kunlin Cai, Han Zhao, Fnu Suya, Yuan Tian

发表机构 * University of California, Los Angeles(加州大学洛杉矶分校)

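AI summary: Proposes the RogueMerge framework, which combines joint optimization, meta-learning-style simulation and distributionally robust optimization. It targets three challenges in attacking model merging (autoregressive generation, unknown merging configurations, and generalization to unseen attack prompts), yielding robust and unified attacks.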

AI中文摘要

模型合并通过聚合来自未验证公共平台的任务向量,将专门能力组合到单个大语言模型中,暴露了关键的供应链攻击面:由于任何恶意行为都可以编码到任务向量中,并且合并允许第三方向量直接写入模型权重,攻击者提供的任务向量可以启用或放大多种下游威胁。先前的工作仅研究针对分类器的模型合并的后门攻击,使用静态算术启发式方法,由于三个原因无法有效处理针对生成式大语言模型的各种攻击。(i) 大语言模型依赖于自回归解码,合并引入的微小参数漂移会在令牌间累积并迅速降低攻击效果。(ii) 攻击者不知道受害者的合并配置,导致独立优化的静态攻击向量容易被稀释或破坏。(iii) 实际威胁诱导必须泛化到优化期间未见过的攻击提示,静态向量无法充分编码。我们提出RogueMerge,这是第一个原则性的统一框架,解决了所有三个挑战。为了处理自回归生成,我们用联合优化替代静态算术,明确强制合并后的攻击成功。为了处理未知的合并设置,我们将攻击注入表述为随机最小-最大问题,并通过元学习风格的模拟来解决。为了在异构攻击提示间泛化,我们采用分布鲁棒优化,并在大语言模型规模下推导出可处理的一阶泰勒近似,具有可证明的误差界。在四种威胁、六种合并算法和超过170个合并的大语言模型上,RogueMerge始终优于现有攻击。它还在多种合并设置下保持稳定,并能抵抗标准防御。

英文摘要

Model merging composes specialized capabilities into a single LLM by aggregating task vectors sourced from unverified public platforms, exposing a critical supply-chain attack surface: Because any malicious behavior can be encoded into a task vector, and merging grants third-party vectors direct write access to model weights, an attacker-provided task vector can enable or amplify diverse downstream threats. Prior work studies only backdoor attacks against model merging for classifiers using static arithmetic heuristics, which fail to effectively handle diverse attacks on generative LLMs for three reasons. (i) LLMs rely on autoregressive decoding, where the minor parameter drift introduced by merging compounds across tokens and rapidly degrades the attack. (ii) Attackers have no knowledge of the victim's merging configurations, causing a static attack vector optimized in isolation to be easily diluted or destroyed. (iii) Practical threat induction must generalize to attack prompts unseen during optimization, which static vectors cannot adequately encode. We present RogueMerge, the first principled, unified framework that addresses all three challenges. To handle autoregressive generation, we replace static arithmetic with a joint optimization that explicitly enforces attack success after merging. To handle unknown merging settings, we formulate attack injection as a stochastic min-max problem and solve it via meta-learning-style simulation. To generalize across heterogeneous attack prompts, we employ distributionally robust optimization and derive a tractable first-order Taylor approximation at LLM scale, with a provable error bound. Across four threats, six merging algorithms, and over 170 merged LLMs, RogueMerge consistently outperforms existing attacks. It also remains stable across diverse merging settings and resists standard defenses.
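Challenge (ii), dilution by an unknown merge, can be made concrete with one common merging rule: task arithmetic with a uniform scaling factor. With a factor of 1/k this is plain averaging of k task vectors. This is an assumption chosen for illustration. The paper evaluates six merging algorithms, and the sketch is not RogueMerge's optimization. The toy three-weight model is hypothetical.

```python
def task_vector(finetuned, base):
    """Task vector: fine-tuned weights minus base weights."""
    return [f - b for f, b in zip(finetuned, base)]


def merge_task_arithmetic(base, task_vectors, scale):
    """theta_merged = base + scale * sum(task vectors)."""
    merged = list(base)
    for tv in task_vectors:
        merged = [m + scale * t for m, t in zip(merged, tv)]
    return merged


base = [1.0, 1.0, 1.0]
benign = [task_vector([2.0, 1.0, 1.0], base), task_vector([1.0, 2.0, 1.0], base)]
attack = task_vector([1.0, 1.0, 4.0], base)  # attacker shifts only the last weight by +3
merged = merge_task_arithmetic(base, benign + [attack], scale=1 / 3)
print(merged[2] - base[2])  # 1.0: the +3 shift survives at one third strength
```

An attack vector optimized without knowing `scale` or the other vectors loses most of its effect after merging. The abstract's joint optimization and min-max formulation are meant to counter exactly this loss.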

2606.03327 2026-06-03 cs.DB cs.CL

CAPER: Clause-Aligned Process Supervision for Text-to-SQL

CAPER: 面向Text-to-SQL的子句对齐过程监督

Lujie Ban, Jiasheng Shi, Jinyang Li, Xiaolin Han, Tsz Nam Chan, Chenhao Ma

发表机构 * The Chinese University of Hong Kong, Shenzhen(香港中文大学(深圳)) The University of Hong Kong(香港大学) Northwestern Polytechnical University(西北工业大学) Shenzhen University(深圳大学)

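AI summary: Proposes CAPER, which automatically derives clause-level supervision through counterfactual interventions on the SQL abstract syntax tree. The resulting data trains CAPER-9B, a lightweight Clause-PRM used for policy optimization and candidate verification, which improves execution accuracy and failure localization on BIRD and Spider.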

AI中文摘要

Text-to-SQL系统通常通过查询级执行正确性进行评估,但这种终端信号对于哪个中间SQL决策导致成功或失败几乎没有指导作用。Token级密集监督也不适合:SQL token与完整的语义决策不对齐,可能惩罚执行等效的查询,并且难以大规模可靠标记。因此,我们提出CAPER,通过对SQL抽象语法树进行反事实干预自动推导子句级监督,实现用于奖励建模的根因错误定位;所得数据用于训练CAPER-9B,一个轻量级的Clause-PRM,为策略优化和候选验证提供子句边界反馈。在BIRD和Spider上的实验表明,子句对齐监督不仅提高了执行准确率(相对于GPT-5.4实现了高达15.3%的相对EX提升),还增强了故障定位能力,在保留的故障上达到了84.53%的准确率和90.60%的MRR。我们的项目页面位于https://github.com/banrichard/RL-NL2SQL 。

英文摘要

Text-to-SQL systems are typically evaluated by query-level execution correctness, but this terminal signal provides little guidance about which intermediate SQL decision caused success or failure. Token-level dense supervision is also ill-suited: SQL tokens do not align with complete semantic decisions, can penalize execution-equivalent queries, and are difficult to label reliably at scale. We therefore propose CAPER, which automatically derives clause-level supervision via counterfactual intervention on the SQL abstract syntax tree, enabling root-cause error localization for reward modeling; the resulting data is used to train CAPER-9B, a lightweight Clause-PRM that provides clause-boundary feedback for policy optimization and candidate verification. Experiments on BIRD and Spider show that clause-aligned supervision not only improves execution accuracy, achieving up to a 15.3% relative EX improvement over GPT-5.4, but also strengthens failure-localization capability, reaching 84.53% accuracy and 90.60% MRR on held-out failures. Our project page is at https://github.com/banrichard/RL-NL2SQL.
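The core of the counterfactual idea fits in a short sketch. Take a failed prediction, replace one clause at a time with its gold counterpart, re-execute, and report the clauses whose replacement alone restores the gold result. Several simplifications are assumed here. Clauses are plain strings in a dict rather than AST nodes. Only single-clause interventions are tried, so a query with two wrong clauses yields an empty list. Results are compared as sorted row lists. The table, the queries and the function names are hypothetical.

```python
import sqlite3

CLAUSES = ("SELECT", "FROM", "WHERE", "GROUP BY", "ORDER BY")


def to_sql(q):
    return " ".join(f"{k} {q[k]}" for k in CLAUSES if q.get(k))


def run(conn, q):
    return sorted(conn.execute(to_sql(q)).fetchall())


def localize_faulty_clauses(conn, pred, gold):
    """Swap in one gold clause at a time; report clauses whose fix restores the gold result."""
    target = run(conn, gold)
    culprits = []
    for k in CLAUSES:
        if pred.get(k) == gold.get(k):
            continue
        patched = {**pred, k: gold.get(k)}
        try:
            if run(conn, patched) == target:
                culprits.append(k)
        except sqlite3.Error:
            pass  # the intervention produced invalid SQL
    return culprits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INT)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [("a", "eng", 120), ("b", "eng", 90), ("c", "ops", 80)])
gold = {"SELECT": "name", "FROM": "emp", "WHERE": "salary > 100"}
pred = {"SELECT": "name", "FROM": "emp", "WHERE": "salary > 85"}
print(localize_faulty_clauses(conn, pred, gold))  # ['WHERE']
```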

2606.03288 2026-06-03 cs.CY cs.AI

AI-Generated Traces for Novice Programmers: Learning Effects and Learner Differences in a Multi-Institutional Study

AI生成的新手程序员追踪:多机构研究中的学习效果与学习者差异

Yuri Noviello, Naaz Sibia, Anastasiia Birillo, Thomas Overklift Vaupel Klein, Michael Liut, Gosia Migut

发表机构 * Delft University of Technology(代尔夫特理工大学) University of Toronto(多伦多大学) JetBrains Research(JetBrains研究)

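AI summary: Proposes AI-generated, analogy-based animated traces (GATs) and compares them with textual explanations in a multi-institutional experiment on how novice programmers learn program execution. GATs show selective benefits for immediate learning, but the effects are context-dependent, short-lived, and moderated by learner engagement.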

AI中文摘要

入门编程(CS1)课程常常难以支持学生对程序执行的理解。虽然可视化可以使执行过程明确,但其有效性取决于设计和情境,而AI生成可视化的实证证据仍然有限。我们提出了生成动画追踪(GATs),即基于AI生成的、类比驱动的、配有旁白的动画,协调源代码、执行状态和概念类比。我们在两个机构的CS1课程中(Python,N=961;Java,N=151)进行了一项研究,比较GATs与文本解释。我们测量了即时学习表现和体验、课程结束时的参与度和考试成绩。结果表明,GATs可以在即时学习方面产生选择性优势,但优势取决于情境且是短期的。我们观察到GATs对表现的影响受到学习者参与度概况的调节。这一发现强调了个性化方法的重要性。

英文摘要

Introductory programming (CS1) courses often struggle to support students' understanding of program execution. While visualizations can make execution processes explicit, their effectiveness depends on design and context, and empirical evidence for AI-generated visualizations remains limited. We propose Generated Animated Traces (GATs), AI-generated, analogy-based, narrated animations that coordinate source code, execution state, and conceptual analogies. We conduct a study at two institutions in CS1 courses (Python, N=961; Java, N=151) comparing GATs to textual explanations. We measure immediate learning performance and experience, as well as end-of-course engagement and exam performance. Results show that GATs can yield selective benefits for immediate learning, but benefits are context-dependent and short-term. We observe that GATs' influence on performance is moderated by learner engagement profiles. This finding underscores the importance of personalized approaches.

2606.03257 2026-06-03 cs.NE cs.AI cs.LG

PSViT: A Methodology for Structurally Pruning Spiking Vision Transformers

PSViT:一种结构剪枝脉冲视觉Transformer的方法

Rachmad Vidya Wicaksana Putra, Achyuta Muthuvelan, Alberto Marchisio, Muhammad Shafique

发表机构 * eBRAIN Lab, Division of Engineering, New York University (NYU) Abu Dhabi(eBRAIN实验室,工程系,纽约大学(NYU)阿布扎比分校) New York University (NYU) Abu Dhabi, United Arab Emirates (UAE)(纽约大学(NYU)阿布扎比分校,阿拉伯联合酋长国(UAE))

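AI summary: Proposes PSViT, which compresses Spiking Vision Transformers through structured pruning (uniform channel-wise filter pruning and sensitivity-based fine-grained pruning). It achieves 22.4% memory savings on ImageNet-1K with less than 3% accuracy loss.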

Comments 8 pages, 7 figures, 3 tables

AI中文摘要

脉冲视觉Transformer(SViT)模型是很有前景的低功耗ViT模型,用于解决基于视觉的任务,具有最先进的性能。然而,它们的大尺寸限制了在资源受限的嵌入式平台上的部署,凸显了模型压缩的需求。一种突出的压缩技术是剪枝,最先进的工作采用非结构化剪枝技术来压缩SViT模型。这种技术需要专门针对稀疏模式定制的硬件架构才能最大化其效率优势,使得这种方法不可扩展。为了解决这个问题,我们提出了PSViT,一种对SViT模型进行结构化剪枝的新方法,从而使得利用现有且广泛使用的计算架构高效加速其推理成为可能。为此,PSViT采用了几个关键步骤:均匀通道滤波器剪枝以结构化消除非显著权重,敏感性分析以评估单层通道剪枝对精度和网络大小的影响,以及基于敏感性分析和给定网络架构的细粒度通道剪枝。实验结果表明,PSViT通过单次剪枝有效获得了22.4%的内存节省,同时在ImageNet-1K上保持高精度(未经微调为70.3%,经微调为72.8%),与原始未剪枝SViT模型(73.3%)相比精度损失在3%以内。这些结果还表明,PSViT方法推进了在资源受限应用中实现高效SViT部署的努力。

英文摘要

Spiking Vision Transformer (SViT) models are promising low-power ViT models for solving vision-based tasks with state-of-the-art performance. However, their large sizes limit their deployment on resource-constrained embedded platforms, underscoring the need for model compression. One prominent compression technique is pruning, and state-of-the-art works employ unstructured pruning techniques to compress SViT models. Such techniques require specialized hardware architectures tailored to the sparsity patterns to maximize their efficiency benefits, making this approach unscalable. To address this, we propose PSViT, a novel methodology to perform structured pruning on SViT models, making it possible to efficiently accelerate their inference using existing, widely used computing architectures. To do this, PSViT employs several key steps: uniform channel-wise filter pruning to structurally eliminate non-significant weights, sensitivity analysis to evaluate the impact of channel-wise pruning of individual layers on accuracy and network size, and fine-grained channel-wise pruning based on the sensitivity analysis and the given network architecture. Experimental results show that PSViT effectively obtains 22.4% memory saving through single-shot pruning while maintaining accuracy within 3% (70.3% without fine-tuning and 72.8% with fine-tuning) of the original non-pruned SViT model (73.3%) on ImageNet-1K. These results also show that the PSViT methodology advances efforts to enable efficient SViT deployment in resource-constrained applications.
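The sensitivity-analysis step can be sketched as two functions. The first prunes one layer at a time at several rates and records the accuracy drop. The second picks, for each layer, the largest rate whose isolated drop stays within a tolerance. The layer names, sensitivities and tolerance are hypothetical, and a linear `eval_drop` stands in for real accuracy evaluations. Drops measured in isolation need not add up when all layers are pruned together, so a combined check (and the architecture-aware step the abstract mentions) would still be needed.

```python
def sensitivity_analysis(layers, rates, eval_drop):
    """Prune one layer at a time and record the accuracy drop at each rate."""
    return {layer: {rate: eval_drop(layer, rate) for rate in rates} for layer in layers}


def choose_rates(table, max_drop):
    """Per layer, keep the largest rate whose isolated accuracy drop is within max_drop."""
    return {
        layer: max((r for r, d in drops.items() if d <= max_drop), default=0.0)
        for layer, drops in table.items()
    }


# Hypothetical per-layer sensitivities standing in for real accuracy evaluations.
SENSITIVITY = {"patch_embed": 0.2, "attn": 0.04, "mlp": 0.005}


def eval_drop(layer, rate):
    return SENSITIVITY[layer] * rate


table = sensitivity_analysis(SENSITIVITY, rates=(0.1, 0.25, 0.5), eval_drop=eval_drop)
print(choose_rates(table, max_drop=0.012))
# {'patch_embed': 0.0, 'attn': 0.25, 'mlp': 0.5}
```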

2606.03210 2026-06-03 cs.CE cs.LG cs.NA math.NA

Critical evaluation of PINN for FWD inverse analysis and differentiable FEM as an alternative

PINN 在 FWD 反分析中的批判性评估及可微有限元方法作为替代方案

Yongjin Choi, Hyeonbin Moon, Seunghwa Ryu

发表机构 * KAIST(韩国科学技术院)

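AI summary: Critically evaluates physics-informed neural networks (PINNs) for falling weight deflectometer (FWD) inverse analysis of multilayer pavement systems, and proposes the differentiable finite element method (DiffFEM) as a more accurate, stable and efficient alternative.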

AI中文摘要

基于自动微分的反分析方法,包括物理信息神经网络(PINN)和可微编程,最近因其计算精确梯度和收敛效率的能力而显示出巨大潜力。然而,它们对落锤式弯沉仪(FWD)反计算的适用性尚未被探索。本研究基于合成基准,批判性评估了基于PINN的多层路面系统反分析,并研究了可微有限元方法(DiffFEM)作为替代方案。标准PINN由于层状路面系统固有的尖锐域不连续性而无法恢复层模量。尽管我们使用了具有域分解的扩展PINN(XPINN),它在不连续域上表现更好,但其性能仍然对损失权重和网络架构高度敏感,并且在测量噪声下会退化。相比之下,DiffFEM始终获得更准确、稳定且计算高效的反演结果。这些结果表明,将控制物理作为硬约束强加的DiffFEM比基于PINN的方法(其中控制物理通过损失函数作为软约束施加)具有更好的准确性、鲁棒性和计算效率。更广泛地说,研究结果表明,在基于PINN和DiffFEM的反分析之间进行选择需要仔细考虑,当存在高效且稳健的可微正演求解器时,DiffFEM提供了实际优势。

英文摘要

Automatic-differentiation-based inverse analysis methods, including physics-informed neural networks (PINNs) and differentiable programming, have recently shown great promise due to their ability to compute accurate gradients and their efficient convergence. However, their applicability to falling weight deflectometer (FWD) backcalculation remains unexplored. This study critically evaluates PINN-based inverse analysis for a multilayer pavement system and investigates differentiable finite element method (DiffFEM) as an alternative based on a synthetic benchmark. The standard PINN does not recover layer moduli because of the sharp domain discontinuities inherent to layered pavement systems. Although we use an extended PINN with domain decomposition (XPINN), which shows better performance on discontinuous domains, its performance remains highly sensitive to loss weighting and network architecture, and degrades under measurement noise. By contrast, DiffFEM consistently achieves more accurate, stable, and computationally efficient inversion results. These results indicate that DiffFEM, which enforces the governing physics as a hard constraint, yields better accuracy, robustness, and computational efficiency than PINN-based approaches, in which the governing physics is imposed as a soft constraint through the loss function. More broadly, the findings suggest that the choice between PINN- and DiffFEM-based inverse analysis needs careful consideration, with DiffFEM offering practical advantages when an efficient and robust differentiable forward solver is available.
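A toy example shows the "hard constraint" contrast. In a differentiable forward solver, the assembled stiffness system is solved exactly, so the physics always holds. Gradients then flow through that solve to the unknown modulus. The model here is a fixed-free 1D bar with two elements, not a layered pavement. Forward-mode dual numbers stand in for whatever differentiation framework the authors used. The moduli, the unit geometry and the unsafeguarded Newton loop are all assumptions. The loop expects a reasonable initial guess.

```python
class Dual:
    """Forward-mode AD number: a value and its derivative w.r.t. one parameter."""

    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, o):
        o = Dual.lift(o)
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __neg__(self):
        return Dual(-self.val, -self.der)

    def __sub__(self, o):
        return self + (-Dual.lift(o))

    def __mul__(self, o):
        o = Dual.lift(o)
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)

    __rmul__ = __mul__

    def __truediv__(self, o):
        o = Dual.lift(o)
        return Dual(self.val / o.val, (self.der * o.val - self.val * o.der) / o.val ** 2)


def tip_deflection(E1, E2, F=1.0, A=1.0, L1=1.0, L2=1.0):
    """Fixed-free bar with two elements: assemble K u = f and solve exactly."""
    k1, k2 = E1 * A / L1, E2 * A / L2
    # K = [[k1 + k2, -k2], [-k2, k2]], f = [0, F]
    a, b, c, d = k1 + k2, -k2, -k2, k2
    det = a * d - b * c
    return a * F / det  # Cramer's rule for u2 (the first load entry is zero)


def invert_E2(u_obs, E1, E2, iters=50, tol=1e-12):
    """Newton iterations on the data misfit, with du2/dE2 from forward-mode AD."""
    for _ in range(iters):
        u = tip_deflection(E1, Dual(E2, 1.0))
        residual = u.val - u_obs
        if abs(residual) < tol:
            break
        E2 -= residual / u.der
    return E2


u_obs = tip_deflection(300.0, 200.0)  # synthetic "measurement" with E2 = 200
print(round(invert_E2(u_obs, E1=300.0, E2=150.0), 6))  # 200.0
```

No loss weighting is involved: the only quantity being fitted is the data misfit. In the PINN formulation, by contrast, the PDE residual is one weighted term among several.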

2606.03183 2026-06-03 cs.MM cs.CV cs.SD eess.AS

Inference-Time Scaling for Joint Audio-Video Generation

联合音视频生成的推理时缩放

Jaemin Jung, Kyeongha Rho, Inkyu Shin, Joon Son Chung

发表机构 * Korea Advanced Institute of Science and Technology(韩国科学技术院) Luma AI

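AI summary: Addresses multi-objective optimization in joint audio-video generation with a multi-verifier framework and an adaptive reward weighting algorithm. Without additional training, it significantly improves semantic alignment, perceptual quality and audio-video synchronization.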

Comments Accepted by Transactions on Machine Learning Research (TMLR). Project page: https://jung-jaemin.github.io/ITS-AVGen-Proj/

AI中文摘要

联合音视频生成旨在合成与文本提示语义对齐且精确同步的逼真音视频对。现有联合音视频生成模型通常需要大量训练资源来提高保真度,而推理时缩放(ITS)最近在单模态领域成为一种有前景的无训练替代方案。然而,将ITS从单模态扩展到多模态领域并非易事,因为它需要平衡多个异构目标。在本文中,我们首次对联合音视频生成的ITS进行了全面研究。我们首先证明多验证器框架对于解决单目标指导的局限性(包括非对称性能权衡和验证器欺骗)至关重要。通过系统分析,我们随后确定了一个最优的多验证器组合,该组合在所有质量维度上产生均衡的改进。最后,为了有效聚合多样化的奖励信号,我们提出了自适应奖励加权(ARW),一种新颖的测试时优化算法。ARW将奖励聚合视为在线优化问题,利用可学习参数校准奖励方差,无需奖励分布的先验知识,从而确保鲁棒的多目标选择。在VGGSound和JavisBench-mini基准上的实验结果表明,我们的框架显著增强了生成输出的语义对齐、感知质量和音视频同步。合成样本和代码可在项目页面获取:https://jung-jaemin.github.io/ITS-AVGen-Proj 。

英文摘要

Joint audio-video generation aims to synthesize realistic audio-video pairs that are both semantically aligned with text prompts and precisely synchronized. While existing joint audio-video generation models often require substantial training resources to improve fidelity, Inference-Time Scaling (ITS) has recently emerged as a promising training-free alternative in single-modality domains. However, extending ITS from a single modality to multimodal domains is non-trivial, as it requires balancing multiple heterogeneous objectives. In this paper, we present the first comprehensive study of ITS for joint audio-video generation. We first demonstrate that a multi-verifier framework is essential to address the limitations of single-objective guidance, including asymmetric performance trade-offs and verifier hacking. Through systematic analysis, we then identify an optimal multi-verifier combination that yields balanced improvements across all quality dimensions. Finally, to effectively aggregate diverse reward signals, we propose Adaptive Reward Weighting (ARW), a novel test-time optimization algorithm. ARW treats reward aggregation as an online optimization problem, utilizing learnable parameters to calibrate reward variances without requiring prior knowledge of reward distributions, thereby ensuring robust multi-objective selection. Experimental results on VGGSound and JavisBench-mini benchmarks demonstrate that our framework significantly enhances semantic alignment, perceptual quality, and audio-visual synchronization of generated outputs. Synthesized samples and code are available on the project page: https://jung-jaemin.github.io/ITS-AVGen-Proj.
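Aggregating several verifiers fairly is the key difficulty in best-of-N selection. When their scores live on very different scales, a raw sum lets one verifier dominate. The sketch below uses a simpler, non-learned stand-in for ARW: each verifier's scores are z-scored across the N candidates before summing. ARW instead calibrates reward variances with learnable parameters online. The verifier names and candidate scores are hypothetical.

```python
import statistics


def select_best(candidates, verifiers, weights=None):
    """Best-of-N: standardize each verifier's scores across candidates, then sum."""
    weights = weights or {name: 1.0 for name in verifiers}
    totals = [0.0] * len(candidates)
    for name, score_fn in verifiers.items():
        scores = [score_fn(c) for c in candidates]
        mu = statistics.fmean(scores)
        sd = statistics.pstdev(scores) or 1.0  # all-equal scores carry no signal
        for i, s in enumerate(scores):
            totals[i] += weights[name] * (s - mu) / sd
    best = max(range(len(candidates)), key=lambda i: totals[i])
    return candidates[best]


# Hypothetical candidates with precomputed scores on very different scales.
candidates = [
    {"id": "a", "semantic": 0.30, "sync": 9.0},
    {"id": "b", "semantic": 0.32, "sync": 5.0},
    {"id": "c", "semantic": 0.20, "sync": 9.5},
]
verifiers = {"semantic": lambda c: c["semantic"], "sync": lambda c: c["sync"]}

naive = max(candidates, key=lambda c: c["semantic"] + c["sync"])
print(naive["id"], select_best(candidates, verifiers)["id"])  # c a
```

The raw sum picks "c" because the large-scale sync score swamps its poor semantic score. The standardized sum picks the more balanced "a".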

2606.03173 2026-06-03 cs.CY cs.LG cs.SI

Auditing Engagement Incentives in the Kidfluencer Ecosystem: A Multimodal Weak Supervision Approach

审计儿童网红生态系统中的参与激励:一种多模态弱监督方法

Zijing Wei, Chao Peter Yang, Xuanjie Chen

发表机构 * University of California, Berkeley(加州大学伯克利分校) Stanford University(斯坦福大学)

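AI summary: Audits YouTube kidfluencer channels with a multimodal weak supervision approach. It finds that exploitation signals are significantly and positively correlated with view counts, and that performative labor, emotional bait and privacy violations carry an engagement premium.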

AI中文摘要

YouTube上“儿童网红”的兴起引发了对儿童数字劳动和剥削的伦理担忧。尽管新兴立法试图规范这一生态系统,但由于大规模操作化剥削的困难,将剥削与参与度联系起来的实证证据仍然稀缺。本研究对79个儿童网红频道的5,051个视频进行了多模态AI审计,使用弱监督方法检测剥削信号,无需大规模人工标注。我们聚合了噪声标注函数——包括基于LLM的标题分类和基于GPT-4 Vision的缩略图与描述分析,涵盖六个基于文献的维度——为每个视频分配一个概率剥削分数。一项多标注者验证研究(N=107)显示与人类判断高度一致(宏平均F1=0.911),并对整体剥削风险具有高敏感性(召回率=0.960,F1=0.793)。我们的发现揭示了表演性劳动、情感诱饵和隐私侵犯的显著参与度溢价。剥削分数与观看次数相关(Spearman ρ=0.229,p<10^{-50}),控制频道层面变化的混合效应回归显示,剥削分数每增加一个单位,观看次数增加4.4倍(p<0.001)。频道内分析表明,情感诱饵的中位观看次数提升+65.6%,表演性内容提升+56.0%(FDR校正p<0.001),且在同年稳健性检验中效果持续(p=0.030)。相比之下,明确的商业内容(产品植入)没有溢价(-3.8%,不显著),表明平台奖励的是儿童身份和劳动的商品化,而非传统广告。这些发现挑战了仅关注财务信托的政策框架,表明参与度与儿童的密集表演性劳动系统性地相关。

英文摘要

The rise of 'kidfluencers' on YouTube has raised ethical concerns about child digital labor and exploitation. While emerging legislation attempts to regulate this ecosystem, empirical evidence linking exploitation to engagement remains scarce, given the difficulty of operationalizing exploitation at scale. This study presents a multimodal AI audit of 5,051 videos across 79 kidfluencer channels, using weak supervision to detect exploitation signals without large-scale manual labels. We aggregate noisy labeling functions (including LLM-based classification of titles and GPT-4 Vision analysis of thumbnails and descriptions across six literature-grounded dimensions) to assign a probabilistic exploitation score to each video. A multi-annotator validation study (N=107) shows strong agreement with human judgment (macro-average F1 = 0.911) and high sensitivity for overall exploitation risk (recall = 0.960, F1 = 0.793). Our findings reveal a significant engagement premium for performative labor, emotional bait, and privacy violations. Exploitation scores correlate with view counts (Spearman ρ = 0.229, p < 10^-50), and mixed-effects regression controlling for channel-level variation shows that a one-unit increase in exploitation score yields a 4.4× increase in views (p < 0.001). Within-channel analyses indicate median view boosts of +65.6% for emotional bait and +56.0% for performative content (FDR-corrected p < 0.001), with effects holding in same-year robustness checks (p = 0.030). Explicit commercial content (product placement), by contrast, shows no premium (-3.8%, n.s.), suggesting the platform rewards commodification of the child's identity and labor over traditional advertising. These findings challenge policy frameworks focused solely on financial trusts, showing that engagement is systematically tied to the intensive, performative labor of children.
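The abstract does not specify how the noisy labeling functions (LFs) are combined into a probabilistic score. One standard option is a naive-Bayes-style vote. Each LF returns 1, 0 or abstains, and its vote adds or subtracts the log-odds of its accuracy. The sketch assumes conditionally independent LFs with known accuracies. Practical label models usually estimate those accuracies from LF agreement without ground truth. The three LFs and their accuracies are hypothetical.

```python
import math

ABSTAIN = None


def exploitation_probability(votes, accuracies, prior=0.5):
    """Naive-Bayes-style label model: combine LF votes (1, 0, or ABSTAIN) into
    P(exploitative), assuming conditionally independent LFs with known accuracies."""
    log_odds = math.log(prior / (1 - prior))
    for vote, acc in zip(votes, accuracies):
        if vote is ABSTAIN:
            continue
        weight = math.log(acc / (1 - acc))  # more accurate LFs get larger weight
        log_odds += weight if vote == 1 else -weight
    return 1 / (1 + math.exp(-log_odds))


# Hypothetical LFs: title classifier, thumbnail classifier, description classifier.
accuracies = [0.8, 0.7, 0.6]
print(round(exploitation_probability([1, 1, 0], accuracies), 4))  # 0.8615
print(exploitation_probability([ABSTAIN] * 3, accuracies))       # 0.5
```

With two accurate LFs voting "exploitative" and a weaker one voting against, the combined odds are 4 × (7/3) / 1.5 = 56/9, a probability of 56/65.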

2606.03136 2026-06-03 cs.CR cs.CL

PsychoPass: Geometric Profiling of Multi-Turn Adversarial LLM Conversations

PsychoPass: 多轮对抗性LLM对话的几何轮廓分析

Muberra Ozmen, Subhabrata Majumdar

发表机构 * Coveo Montreal, QC, Canada(加拿大蒙特利尔 Coveo) Indian Institute of Management Bangalore(班加罗尔印度管理学院)

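AI summary: Proposes PsychoPass, which extracts geometric features of conversation trajectories in embedding space to predict multi-turn jailbreak attacks before harmful content is generated. It finds that early geometric signals are robust.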

AI中文摘要

对大型语言模型(LLM)的多轮越狱攻击揭示了当前防护措施的不匹配:它们作用于单个轮次,而攻击则作为跨对话的轨迹展开。我们提出从内容转向动态,将对话建模为表示空间中的路径,并询问对抗意图是否在其几何形状中早期编码。我们引入了PsychoPass,一个从嵌入空间中的对话轨迹提取几何特征以在有害内容生成前预测潜在攻击的框架。这些特征在朴素分类器中实现了近乎完美的性能,这很大程度上可以通过包含轮次数作为特征来解释。去除这一混淆因素后,仍保留了一个较小但一致的几何信号,其分类性能不显著依赖于编码器选择。关键的是,该信号在对话早期出现:仅从短前缀开始,攻击结果就高于随机水平,比基线防护更可靠。一项支持性理论分析通过长度与形状的分解、基于前缀长度的检测界限以及编码器不变性解释了这些发现。综合来看,这些结果表明对抗性对话留下了早期、表示鲁棒的几何指纹,适用于在线监控。

英文摘要

Multi-turn jailbreak attacks on large language models (LLMs) reveal a mismatch in current guardrails: they operate on individual turns, while attacks unfold as trajectories across conversations. We propose a shift from content to dynamics, modeling conversations as paths in representation space and asking whether adversarial intent is encoded early in their geometry. We introduce PsychoPass, a framework that extracts geometric features from conversation trajectories in embedding space to predict a potential attack before harmful content is produced. These features achieve near-perfect performance in naïve classifiers, which is largely explained by the inclusion of number of turns as a feature. After removing this confound, a smaller but consistent geometric signal remains, with classification performance that does not depend meaningfully on encoder choice. Crucially, this signal appears early in the conversation: attack outcomes remain above chance from short prefixes alone, more reliably than baseline guardrails. A supporting theoretical analysis explains these findings via a decomposition of length and shape, a detection bound based on prefix length, and encoder invariance. Together, these results show that adversarial conversations leave an early, representation-robust geometric fingerprint suitable for online monitoring.
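Treating a conversation as a path in embedding space suggests a few simple geometric features: total path length, tortuosity (path length over net displacement), and mean turning angle between consecutive steps. The feature set below is a hypothetical illustration, not the paper's list. Raw path length grows with the number of turns, which is the confound the abstract describes. A length-normalized feature such as mean step size is one way to reduce that dependence. Computing the features on the first k turns gives the prefix-only setting.

```python
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def trajectory_features(embeddings):
    """Geometric features of a conversation path: one embedding vector per turn."""
    steps = [_sub(b, a) for a, b in zip(embeddings, embeddings[1:])]
    path_len = sum(_norm(s) for s in steps)
    displacement = _norm(_sub(embeddings[-1], embeddings[0]))
    angles = []
    for s, t in zip(steps, steps[1:]):
        denom = _norm(s) * _norm(t)
        if denom > 0:
            cos = sum(x * y for x, y in zip(s, t)) / denom
            angles.append(math.acos(max(-1.0, min(1.0, cos))))
    return {
        "path_length": path_len,  # grows with turn count (the confound)
        "tortuosity": path_len / displacement if displacement > 0 else float("inf"),
        "mean_turn_angle": sum(angles) / len(angles) if angles else 0.0,
        "mean_step": path_len / len(steps) if steps else 0.0,  # length-normalized
    }


straight = trajectory_features([[0, 0], [1, 0], [2, 0]])
zigzag = trajectory_features([[0, 0], [1, 0], [1, 1]])
print(straight["tortuosity"], straight["mean_turn_angle"])  # 1.0 0.0
```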

2606.03128 2026-06-03 cs.CR cs.AI cs.CL cs.LG

Decoupled Smart Contract Audits: Lightweight LLM Framework via Distillation and Aggregation

解耦式智能合约审计:通过蒸馏与聚合的轻量级LLM框架

Bagus Rakadyanto Oktavianto Putra, Muhamad Risqi Utama Saputra, Widyawan, Guntur Dharma Putra

发表机构 * University of Indonesia(印度尼西亚大学)

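AI summary: Proposes a decoupled smart contract audit framework built on lightweight open-source LLMs (0.6B-4B parameters). Using rsLoRA, knowledge distillation and a Chain-of-Verification aggregation strategy, it reaches 98.25% accuracy in vulnerability detection, outperforming 7B-34B-parameter models.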

Comments 12 pages, 4 figures, 5 tables. Accepted to IEEE ICWS 2026

AI中文摘要

智能合约面临关键安全挑战,需要在去中心化网络服务中进行彻底审计。虽然大型语言模型(LLMs)在自动漏洞检测中展现出潜力,但现有方法缺乏严重性评估和可操作的修复建议,且计算开销过大。在本研究中,我们引入了一个高效的端到端智能合约安全审计框架,利用轻量级、高度优化的开源LLMs(0.6B-4B参数)。我们的框架将综合审计任务解耦为四个相互关联的组件:漏洞检测、解释、严重性分类和修复建议。为了在无需庞大参数量的情况下保持高准确性,我们实现了秩稳定低秩适配器(rsLoRA)、知识蒸馏以及自定义链式验证(CoVe)聚合策略,系统性地筛选并整合模型生成的多个草稿响应,形成高准确度的审计报告。实验结果表明,我们的轻量级流水线持续优于最先进的开源代码密集LLMs(7B至34B参数),在漏洞检测中达到98.25%的准确率,在生成解释任务中达到0.4375的对齐分数。此外,我们广泛的消融研究实证验证了我们的解耦审计过程相对于统一提示的优越性,并揭示了一种新颖的严重性中心性偏差,为未来LLM辅助审计研究建立了关键基准。

英文摘要

Smart contracts face critical security challenges that require thorough auditing in decentralized web services. While Large Language Models (LLMs) have shown promise in automated vulnerability detection, existing approaches lack severity evaluations with actionable remediation and demand unnecessarily massive computational overhead. In this study, we introduce an efficient end-to-end smart contract security audit framework utilizing lightweight, highly optimized open-source LLMs (0.6B-4B parameters). Our framework decouples comprehensive audit tasks into four interconnected components: vulnerability detection, explanation, severity classification, and remediation recommendation. To maintain high accuracy without massive parameters, we implement Rank-Stabilized Low-Rank Adapters (rsLoRA), knowledge distillation, and a custom Chain-of-Verification (CoVe) aggregation strategy to systematically screen and consolidate multiple draft responses from the model into a highly accurate audit report. Experimental results demonstrate that our lightweight pipeline consistently outperforms state-of-the-art open-source coder dense LLMs (7B to 34B parameters), achieving 98.25% accuracy in vulnerability detection and an alignment score of 0.4375 in generative explanation tasks. Furthermore, our extensive ablation studies empirically validate the superiority of our decoupled audit processes over unified prompting and uncover a novel severity centrality bias, establishing a critical benchmark for future research in LLM-assisted auditing.

2606.03063 2026-06-03 cs.LO cs.CL

ZX-Calculus: Trace-Indexed Dependent Types and Epistemic Semantics

ZX-演算:迹索引依赖类型与认知语义

Peng Chen

发表机构 * School of Information Science, Beijing Language and Culture University(北京语言文化大学信息科学学院)

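AI summary: Proposes ZX-Calculus, which conservatively extends Martin-Löf dependent type theory with trace-indexed types, presheaf non-monotone semantics and constructive AGM belief revision. The work is accompanied by a Coq mechanization.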

AI中文摘要

我们提出ZX-演算(知识演化演算),它是Martin-Löf依赖类型论(MLTT)的保守扩展,集成了迹索引类型、预层非单调语义和构造性AGM信念修正。本文附带Coq机械化证明(34个完整证明;两个核心结果零未完成)。(I)迹类型。FinTrace(s0,sn)是一个带类型的执行迹的归纳族。FinTrace和Star(Step)作为路径类型同构,但判断上不相等;TraceElim显式暴露事件标签e:Event,为事件驱动归纳提供了更符合人体工程学的接口。我们证明了迹可达性对应、确定性重放以及通过可归约候选(带传输引理,RC-elim推迟;所有其他核心结果经Coq验证)的规范性框架。(II)层语义。迹索引命题是自由迹偏序范畴Tf上的逆变层。分离定理(显式反模型)区分了证明论单调性和语义非单调性。项模型是初始CwF(句法泛性质,非经典完备性)。(III)AGM信念修正。我们给出了一个显式的构造性部分交收缩算法,经(C1)-(C4)验证。所有八条AGM公设(R1)-(R8)都是定理。R7和R8的证明使用了析取加固引理,并给出了自包含的构造性推导。(IV)集成。B^AGM在顺序修正中不满足层复合律BP-comp(显式反模型,Coq验证)。我们引入单步修正系统(SSRS),证明B^AGM是有效的SSRS(Coq验证),并表明这足以处理迹态射、收缩刻画和修正见证。BP-comp失败揭示了路径依赖信念修正与函子一致性之间的基本张力,此前未被识别。

英文摘要

We propose ZX-Calculus (Knowledge Evolution Calculus), a conservative extension of Martin-Löf Dependent Type Theory (MLTT) integrating trace-indexed types, presheaf non-monotone semantics, and constructive AGM belief revision. A Coq mechanisation accompanies the paper (34 complete proofs; zero admits for the two central results). (I) Trace types. FinTrace(s0,sn) is an inductive family of typed execution traces. FinTrace and Star(Step) are isomorphic as path types but not judgementally equal; TraceElim exposes the event label e:Event explicitly, giving a more ergonomic interface for event-driven induction. We prove the Trace-Reachability Correspondence, Deterministic Replay, and a canonicity framework via reducibility candidates with a Transport Lemma (RC-elim deferred; all other Core results are Coq-verified). (II) Sheaf semantics. Trace-indexed propositions are contravariant sheaves over the free trace partial-order category Tf. A Separation Theorem (explicit countermodel) distinguishes proof-theoretic monotonicity from semantic non-monotonicity. The term model is an initial CwF (syntactic universal property, not classical completeness). (III) AGM belief revision. We give an explicit constructive partial meet contraction algorithm verified against (C1)-(C4). All eight AGM postulates (R1)-(R8) are theorems. Proofs of R7 and R8 use the Disjunctive Entrenchment Lemma, given a self-contained constructive derivation. (IV) Integration. B^AGM fails the sheaf composition law BP-comp for sequential revision (explicit countermodel, Coq-verified). We introduce Single-Step Revision Systems (SSRS), prove B^AGM is a valid SSRS (Coq-verified), and show this suffices for trace morphisms, retraction characterisation, and revision witnesses. The BP-comp failure reveals a fundamental tension between path-dependent belief revision and functor consistency, not previously identified.
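The sketch below gives a runtime analogue of the trace ideas. A trace is a chain of labelled steps from s0 to sn, event labels are exposed explicitly (as TraceElim does), and, under a deterministic step function, replaying only those labels reproduces the final state. This is only an illustration. A Python check cannot express FinTrace as an inductive family indexed by states, and nothing here is checked at the type level the way the Coq development is. `Step`, `counter_step` and the counter example are hypothetical.

```python
from collections.abc import Hashable
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One labelled transition src --event--> dst."""
    src: Hashable
    event: Hashable
    dst: Hashable


def is_fin_trace(trace, s0, sn, step_fn):
    """Runtime check that `trace` chains valid steps from s0 to sn (empty trace: s0 == sn)."""
    cur = s0
    for st in trace:
        if st.src != cur or step_fn(st.src, st.event) != st.dst:
            return False
        cur = st.dst
    return cur == sn


def replay(s0, events, step_fn):
    """Deterministic replay: re-run only the event labels from the initial state."""
    state = s0
    for e in events:
        state = step_fn(state, e)
    return state


def counter_step(state, event):
    """Hypothetical deterministic step function for a resettable counter."""
    return state + 1 if event == "inc" else 0


trace = [Step(0, "inc", 1), Step(1, "inc", 2), Step(2, "reset", 0), Step(0, "inc", 1)]
print(is_fin_trace(trace, 0, 1, counter_step))              # True
print(replay(0, [st.event for st in trace], counter_step))  # 1
```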

2606.03061 2026-06-03 cs.DC cs.AI cs.LG cs.NI cs.SY eess.SY

Brief Announcement: Generative Markov Model for Distributed Computing Systems

Alfreds Lapkovskis, Ali Beikmohammadi, Sindri Magnússon, Praveen Kumar Donta

Affiliation: Department of Computer and Systems Sciences, Stockholm University, Sweden

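AI summary: To address the heterogeneity and complexity of distributed computing systems, the paper proposes a generative Markov model based on structured state factorization that enables tractable simulation, inference, and policy learning, and validates it with a case study on collaborative AI inference.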

Comments Submitted to 40th International Symposium on Distributed Computing (DISC 2026)

English abstract

Emerging distributed computing paradigms, such as the computing continuum, are inherently heterogeneous, stochastic, and complex. Efficiently and effectively utilizing all available resources across the continuum demands a unified formal model of the system. To address the lack of such a model, we propose a general framework for modeling distributed computing systems as a generative Markov model, factorized over a structured system state. In our model, the state decomposes into high-dimensional variables, each further factorized over its elements, reflecting the sparse dependency structure inherent to distributed systems. This yields a tractable model enabling simulation, inference, and policy learning over otherwise intractable system states, bridging distributed computing with Markov chain theory and reinforcement learning (RL). We demonstrate our framework through a case study of collaborative AI inference, in which a dedicated server combines resources with those volunteered by service users. Our results show that centralized scheduling becomes a bottleneck at scale, while distributing computation across user devices reduces both latency and server resource consumption. These findings highlight the value of adaptive decision-making in distributed computing systems and demonstrate the framework's utility for modeling, simulation, and optimization.
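To make "factorized over a structured system state" concrete, here is a minimal sketch, assuming a hypothetical state with two variables (`busy` and `queued`), each a tuple holding one binary element per node, and made-up local probabilities. Each element of the next state depends only on its own node's current elements, so the full transition probability is a product of small local factors and sampling can proceed element by element. The variable names and numbers are illustrative, not taken from the paper.

```python
import random

# Hypothetical structured state: variables "busy" and "queued", each a tuple
# with one binary element per node. All probabilities are made up.

def p_busy(value, state, j):
    """P(busy'_j = value | queued_j): a node with a queued job is likely busy."""
    p1 = 0.9 if state["queued"][j] else 0.1
    return p1 if value else 1.0 - p1

def p_queued(value, state, j):
    """P(queued'_j = value | queued_j, busy_j)."""
    if state["queued"][j] and state["busy"][j]:
        p1 = 0.5  # a busy node finishes its job with probability 0.5
    elif state["queued"][j]:
        p1 = 1.0  # an idle node keeps its queued job
    else:
        p1 = 0.2  # a new job arrives with probability 0.2
    return p1 if value else 1.0 - p1

# One local factor per variable; each element depends only on its own node.
FACTORS = {"busy": p_busy, "queued": p_queued}

def transition_prob(state, next_state):
    """P(s' | s) as a product of per-element local conditionals."""
    prob = 1.0
    for var, factor in FACTORS.items():
        for j, value in enumerate(next_state[var]):
            prob *= factor(value, state, j)
    return prob

def sample_next(state, rng):
    """Sample s' element by element; valid because every factor conditions only on s."""
    return {
        var: tuple(int(rng.random() < factor(1, state, j))
                   for j in range(len(state[var])))
        for var, factor in FACTORS.items()
    }

state = {"busy": (1, 0), "queued": (1, 0)}
print(transition_prob(state, {"busy": (1, 0), "queued": (0, 0)}))
print(sample_next(state, random.Random(0)))
```

With N nodes the joint transition table would have 4^N rows per current state, while the factored form needs only two small conditionals per node, which is the tractability argument in miniature.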

2606.03034 2026-06-03 cs.MA cs.AI

Capability Advertisement as a Market for Lemons: A Trust Layer for Heterogeneous Agent Networks

Gaurav Naresh Mittal

Affiliation: University of California, Berkeley

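AI summary: To address false capability claims in LLM agent networks, the paper proposes a trust layer grounded in market-for-lemons theory that enables trustworthy delegation through probabilistic capability descriptors, screening, and reputation mechanisms.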

English abstract

Large language model (LLM) agents have begun to delegate work to one another. Protocols such as the Model Context Protocol (MCP) and the Agent2Agent protocol (A2A) let an agent publish what it can do and let others call it, and public registries of such agents are already appearing. These protocols assume an advertised capability is a static, truthful fact. A real agent is none of these things: its competence is probabilistic, varies with input, drifts when the underlying model is updated, and, because the agent is itself a language model, it can describe itself with complete confidence and be wrong. A caller therefore sees what an agent claims to do, not what it can do, with no principled way to tell a reliable provider from a fluent impostor. We argue these difficulties share one cause: the market for lemons. When quality is hidden and claims are cheap, good and bad providers become indistinguishable, honest reliability goes unrewarded, and the market decays toward its worst participants. Economics offers three remedies (signaling, screening, and reputation), none of which is present in today's agent protocols. We make four contributions: (1) a failure taxonomy that names confident-wrong as a non-adversarial, correlated subclass of Byzantine faults that classical fault-tolerance mismodels; (2) a market-for-lemons model showing that faith-based protocols admit only a low-trust equilibrium; (3) the Trust Layer, a thin, protocol-agnostic narrow waist above MCP and A2A that adds probabilistic capability descriptors, screening, and reputation, and admits a separating equilibrium when the cost of sustaining an overclaim exceeds the gain from it; and (4) a reliability-composition bound for delegation chains with an end-to-end placement argument. The design needs no model retraining and degrades gracefully when its trust anchors are absent or corrupt.
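The sketch below illustrates, under stated assumptions, how reputation, screening, and chain composition could fit together. The Beta-style reliability estimate, the `tolerance` threshold, and all class and function names are hypothetical; the paper's own descriptors and composition bound may differ. The union (Boole) bound is shown because it remains valid when hop failures are correlated, as confident-wrong failures are described to be, whereas the simple product assumes independence.

```python
from dataclasses import dataclass

@dataclass
class CapabilityRecord:
    """Hypothetical reputation record for one advertised capability."""
    successes: int = 0
    failures: int = 0

    def update(self, ok: bool) -> None:
        if ok:
            self.successes += 1
        else:
            self.failures += 1

    def mean_reliability(self) -> float:
        # Posterior mean under a uniform Beta(1, 1) prior (Laplace's rule of succession).
        return (self.successes + 1) / (self.successes + self.failures + 2)

def screen(record: CapabilityRecord, claimed: float, tolerance: float = 0.1) -> bool:
    """Accept a provider only if its observed reliability is near its advertised claim."""
    return record.mean_reliability() >= claimed - tolerance

def chain_lower_bound(reliabilities):
    """Union (Boole) bound on end-to-end success; holds even with correlated failures."""
    return max(0.0, 1.0 - sum(1.0 - r for r in reliabilities))

def chain_independent(reliabilities):
    """End-to-end success if hop failures were independent (product of reliabilities)."""
    result = 1.0
    for r in reliabilities:
        result *= r
    return result

honest = CapabilityRecord(successes=8)
overclaimer = CapabilityRecord(successes=2, failures=8)
print(screen(honest, claimed=0.95), screen(overclaimer, claimed=0.95))  # True False
print(chain_lower_bound([0.9, 0.95]), chain_independent([0.9, 0.95]))
```

Both providers advertise 0.95, but only the one whose track record supports the claim passes screening; this is the sense in which screening makes cheap overclaims costly.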

2606.03026 2026-06-03 cs.NE cs.AI cs.LG

Spike-Aware C++ INT8 Inference for Sparse Spiking Language Models on Commodity CPUs

Ting Liu

Affiliation: SymbolicLight Research

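AI summary: The paper presents a spike-aware C++ inference runtime that uses sparse binary spike states as an execution primitive and combines a mixed memory layout, AVX2/FMA kernels, and INT8 quantization to decode spiking language models efficiently on commodity CPUs; its throughput exceeds that of dense models of similar size, at somewhat lower quality.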

Comments 11 pages, 7 tables

English abstract

Spiking language models expose activation sparsity that dense Transformer runtimes do not directly exploit. This paper studies that property from a systems perspective. Building on the SymbolicLight V1 spike-gated language model family, we implement a C++ CPU inference runtime that treats sparse binary spike states as an execution primitive rather than only applying post-hoc weight compression. The runtime combines a manifest-driven weight loader, mixed row/column memory layout, AVX2/FMA kernels, per-channel symmetric INT8 quantization, and integer-domain accumulation for spike-conditioned sparse paths. On an AMD Ryzen 7 5800X, an early scalar FP32 baseline decodes at 9.5 tokens/s. Mixed-layout AVX2 FP32 raises this to 14.7 tokens/s, and AVX2 INT8 reaches 19.9 tokens/s on the same step-30k export while reducing the weight footprint from 3.49 GB to 1.06 GB. For the available 186k-step 874M-parameter INT8 export, the C++ runtime decodes at 22.63 tokens/s in a single-thread CPU benchmark, compared with 16.31 tokens/s for TinyLlama-1.1B Q8_0, 11.26 tokens/s for Falcon3-1B Q8_0, and 9.70 tokens/s for Qwen2.5-1.5B Q8_0 under llama.cpp. Thread scaling reaches 47.90 tokens/s at four CPU threads, and 512-token prefill improves from 29.86 to 94.68 tokens/s as the thread count rises from one to eight. The throughput result comes with a quality cost: the SNN has a WikiText-2 perplexity of 24.80, worse than the dense baselines in the same benchmark. We frame the result as an inference-systems study for sparse language runtimes, with longer-term motivation in embodied and edge agents that may benefit from local, low-core inference near sensors and actuators. Spike-aware execution can improve CPU throughput and memory behavior for sparse spiking language models, while model quality, controlled dense training baselines, embodied-task evaluation, and measured CPU energy remain open problems.
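A minimal pure-Python sketch of the arithmetic behind per-channel symmetric INT8 quantization combined with a spike-conditioned matrix-vector product, assuming binary spike inputs and one scale per output row. Because spikes are 0 or 1, each output reduces to summing the INT8 weights of the active inputs in an integer accumulator and multiplying once by the row scale. The function names are hypothetical, and the runtime's mixed memory layout and AVX2/FMA kernels are not modeled.

```python
def quantize_per_channel(weights):
    """Symmetric per-row INT8 quantization: w[r][j] ~= scales[r] * q[r][j], q in [-127, 127]."""
    q_rows, scales = [], []
    for row in weights:
        max_abs = max(abs(w) for w in row)
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q_rows.append([max(-127, min(127, round(w / scale))) for w in row])
        scales.append(scale)
    return q_rows, scales

def spike_matvec_int8(q_rows, scales, spikes):
    """y[r] = scales[r] * sum of q[r][j] over active spikes j (integer accumulation)."""
    active = [j for j, s in enumerate(spikes) if s]  # binary spikes: zeros are skipped entirely
    out = []
    for q_row, scale in zip(q_rows, scales):
        acc = 0  # integer accumulator; no multiplications needed since spikes are 0/1
        for j in active:
            acc += q_row[j]
        out.append(acc * scale)  # a single dequantization per output row
    return out

W = [[0.5, -1.0, 0.25], [2.0, 0.0, -2.0]]
q, s = quantize_per_channel(W)
print(spike_matvec_int8(q, s, [1, 0, 1]))  # close to the dense result [0.75, 0.0]
```

The work per output row scales with the number of active spikes rather than the input width, which is the property a spike-aware runtime can exploit and a dense runtime cannot.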

2606.03019 2026-06-03 cs.CY cs.AI

Reproducibility is the New Copyleft: Defining AGI-oriented Reproducible Builds

Masayuki Hatta

Affiliation: Surugadai University

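AI summary: The paper proposes AGI-oriented reproducible builds as a functional equivalent of copyleft, defines seven requirements for bit-exact reproducibility of a model from its declared inputs, and argues that protocols rather than platforms offer a better governance framework.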

Comments Accepted at AGI-26. To appear in the proceedings (Springer LNCS)

English abstract

Copyleft, as implemented in licenses such as the GNU General Public License, was a legal hack that used copyright to guarantee user freedom by tying the availability of source code to every act of distribution. Its normative force rested on an implicit technical premise: that source code and object code stand in a well-defined, humanly auditable, and reproducible relationship. Large language models and, prospectively, Artificial General Intelligence (AGI) systems systematically violate this premise. The artifacts jointly required to reconstruct a model (code, data, weights, hyperparameters, toolchain, and hardware configuration) are each subject to independent legal, technical, and economic constraints that no current open-source framework fully resolves. Sufficiently capable AI systems can also rewrite licensed source into functionally equivalent derivatives stripped of their original obligations, a form of laundering against which copyleft has no effective defense. This paper argues that a functional analogue of copyleft for AGI must be grounded not in share-alike clauses over code, but in reproducible builds: a practice guaranteeing bit-exact reconstructability from declared inputs. We review the logic of copyleft, critically examine Maffulli's Second Liberation thesis, according to which AI fulfills Stallman's dream, and show that the argument collapses unless AGI systems are themselves reproducible. Drawing on the Open Source AI Definition (OSAID), the Model Openness Framework (MOF), OpenMDW, and deterministic-inference research, we define seven requirements for AGI-oriented reproducible builds. We further argue that the Model Context Protocol (MCP) and analogous AI-to-AI coupling mechanisms constitute a new dynamic linking layer for which copyleft-style licensing is ill-suited, and that Masnick's "protocols, not platforms" framework offers a more promising governance template.
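As a minimal illustration of "bit-exact reconstructability from declared inputs", the sketch below hashes each declared input artifact into a manifest, treats the build as a pure function of those inputs, and verifies a rebuild against a published digest. The artifact names, the manifest format, and `toy_build` are hypothetical stand-ins; the paper's seven requirements are not reproduced here, and in a real training pipeline making the build deterministic is the hard part that this toy function simply assumes.

```python
import hashlib
import json

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def manifest(inputs: dict) -> dict:
    """Hypothetical build manifest: one SHA-256 digest per declared input artifact."""
    return {name: digest(blob) for name, blob in sorted(inputs.items())}

def toy_build(inputs: dict) -> bytes:
    """Stand-in for a training/export pipeline: a pure function of the declared inputs."""
    canonical = json.dumps(manifest(inputs), sort_keys=True).encode()
    return hashlib.sha256(b"toy-weights:" + canonical).digest()

def verify_reproducible(inputs: dict, published_digest: str) -> bool:
    """Rebuild from the declared inputs and compare bit-for-bit via the digest."""
    return digest(toy_build(inputs)) == published_digest

inputs = {
    "code": b"train.py v1",
    "data": b"corpus shard 0",
    "hyperparameters": b'{"lr": 0.001, "seed": 0}',
}
published = digest(toy_build(inputs))
print(verify_reproducible(inputs, published))                                # True
print(verify_reproducible(dict(inputs, data=b"corpus shard 1"), published))  # False
```

Any change to a declared input, or any undeclared source of nondeterminism in a real build, makes the rebuilt digest diverge from the published one, which is what lets a third party audit the claimed relationship between inputs and weights.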