arXivDaily: arXiv daily academic digest, updated Monday to Friday
2606.17235 2026-06-17 cond-mat.mtrl-sci cs.AI New submission

Physics-Informed Attention Mechanism and Generalization Capability of Deep Learning-Based Grain Growth Evolution Prediction

Pungponhavoan Tep, Marc Bernacki

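Affiliations: Mines Paris, PSL University, Centre for Material Forming (CEMEF), UMR CNRS 06904

AI summary: This study evaluates how well deep learning models for grain growth prediction generalize to out-of-distribution data and proposes a boundary-masked attention mechanism that substantially improves prediction accuracy, most notably for microstructures with a bimodal grain size distribution.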

Abstract

Machine Learning (ML) models for grain growth prediction are typically trained on idealized synthetic data, yet practical applications require generalization to conditions outside the training distribution. This study evaluated the Out-Of-Distribution (OOD) generalization capability of the trained model from our previous study across three test cases, including experimental microstructures, microstructures characterized by a bimodal grain size distribution, and abnormal grain growth. To further probe whether physics-informed architectural design could improve robustness under these different conditions, a boundary-masked attention mechanism was proposed specifically for grain growth, constraining attention to grain boundary pixels. Both the baseline and the proposed physics-informed attention model were evaluated without retraining or fine-tuning on the OOD data. Both models successfully generalized to all three test cases, yet the boundary-masked attention mechanism provided substantial improvements, with the most notable gains for microstructures characterized by a bimodal grain size distribution, where Structural Similarity Index Measure (SSIM) improved from 0.6221 to 0.7609 and mean grain size ($\overline{R}$) error decreased from 8.75% to 3.57%. The attention heatmap analysis revealed that the boundary-masked attention model learned to concentrate attention on large grain boundaries in a manner consistent with curvature-driven grain growth physics, emerging from training without being explicitly encoded into the architecture. These results indicate that models trained on synthetic data can generalize to diverse OOD conditions without retraining, and that physics-informed attention may improve accuracy when the boundary morphology matches the training domain.
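The idea of constraining attention to grain boundary pixels can be illustrated with a small sketch. This is not the authors' architecture: the 4-neighbour boundary rule, the helper names `boundary_mask` and `masked_softmax`, and the toy scores are all assumptions made for illustration. It only shows how a mask can zero out attention weights on non-boundary pixels.

```python
import math

def boundary_mask(labels):
    """Mark a pixel as boundary if any 4-neighbour carries a different grain label.
    (Hypothetical rule; the paper's boundary definition may differ.)"""
    h, w = len(labels), len(labels[0])
    mask = [[False] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < h and 0 <= nj < w and labels[ni][nj] != labels[i][j]:
                    mask[i][j] = True
    return mask

def masked_softmax(scores, key_mask):
    """Softmax over attention scores in which masked-out keys receive zero weight."""
    kept = [s for s, m in zip(scores, key_mask) if m]
    if not kept:
        return [0.0] * len(scores)
    top = max(kept)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) if m else 0.0 for s, m in zip(scores, key_mask)]
    total = sum(exps)
    return [e / total for e in exps]

# Toy microstructure with two grains (labels 1 and 2) split down the middle.
labels = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [1, 1, 2, 2],
]
flat_mask = [m for row in boundary_mask(labels) for m in row]
scores = [0.1 * k for k in range(len(flat_mask))]  # made-up raw attention scores
weights = masked_softmax(scores, flat_mask)
```

In this toy case only the two middle columns are boundary pixels, so all attention weight is distributed among them and interior pixels contribute nothing.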

2606.18108 2026-06-17 astro-ph.IM cs.AI New submission

Querying an astronomical database using large language models: the ALeRCE text-to-SQL system

P. A. Estevez, J. Espejo-Moreira, S. Sanfeliu-Alvarez, F. Forster, A. M. Munoz Arancibia, G. Cabrera-Vives, F. E. Bauer, A. Bayo, M. Catelan, R. Dastidar, L. Hernandez-Garcia, J. A. Intriago, G. Pignata

Affiliations: Department of Electrical Engineering, University of Chile, Av. Tupper 2007, Santiago, Chile; Millennium Institute of Astrophysics (MAS), Nuncio Monseñor Sótero Sanz 100, Providencia, Santiago, Chile; Data Artificial Intelligence Initiative (ID&IA), Universidad de Chile; Center for Mathematical Modeling, Universidad de Chile, Beauchef 851, North building, 7th floor, Santiago 8320000, Chile; Departamento de Astronomía, Universidad de Chile, Casilla 36D, Santiago, Chile; Department of Computer Science, Universidad de Concepción, Edmundo Larenas 219, Concepción, Chile; Center for Data Artificial Intelligence, Universidad de Concepción, Edmundo Larenas 310, Concepción, Chile; Heidelberg Institute for Theoretical Studies, Heidelberg, Baden-Württemberg, Germany; Instituto de Alta Investigación, Universidad de Tarapacá, Casilla 7D, Arica, 1010000, Chile; European Southern Observatory, Karl-Schwarzschild-Strasse 2, 85748 Garching bei München, Germany; Instituto de Astrofísica, Facultad de Física, Pontificia Universidad Católica de Chile, Casilla 306, Santiago 22, Chile; Centro de Astroingeniería, Pontificia Universidad Católica de Chile, Av. Vicuña Mackenna 4860, 7820436 Macul, Santiago, Chile; Instituto de Estudios Astrofísicos, Facultad de Ingeniería y Ciencias, Universidad Diego Portales, Av. Ejército Libertador 441, Santiago, Chile; Centro Interdisciplinario de Data Science, Facultad de Ingeniería y Ciencias, Universidad Diego Portales, Av. Ejército Libertador 441, Santiago, Chile

AI summary: Proposes an LLM-based text-to-SQL system that uses in-context learning and a step-by-step generation framework (schema linking, query classification, prompt decomposition, self-correction) to query an astronomical database in natural language; 13 models are evaluated on the ALeRCE dataset, with Claude Opus 4.6 among the best performers.

Abstract

We develop a text-to-SQL (structured query language) system based on large language models (LLMs) using in-context learning and apply it to the Automatic Learning for the Rapid Classification of Events (ALeRCE) astronomical database. ALeRCE is a community broker for the Zwicky Transient Facility and the Vera C. Rubin Observatory. The system enables users to query the database in natural language (NL) and generates executable SQL queries. To develop and evaluate the system, we constructed a dataset of 110 NL/SQL pairs. We propose a step-by-step generation framework comprising four modules: schema linking, query classification, prompt decomposition, and self-correction. The performance of thirteen LLMs is evaluated using in-context learning and prompt engineering techniques. Text-to-SQL performance is assessed using the perfect-match (PM) rate for row identifiers (e.g., object identifiers) and column identifiers (i.e., column names). The proposed step-by-step framework consistently outperforms a direct-inference baseline, while the self-correction module consistently reduces execution errors. For Claude Opus 4.6, PM performance on row (column) identifiers is high for simple queries, reaching 0.97 (0.94), and decreases with query complexity to 0.44 (0.72) for medium queries and 0.59 (0.49) for hard queries. Among the thirteen evaluated models, the best-performing LLMs for the text-to-SQL task are Claude Opus 4.6, Gemini 2.5 Pro, Gemini 3 Flash, and GPT-5.2-Codex.
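A perfect-match rate of the kind described above can be sketched as follows. The exact matching rules are not given in the abstract, so this assumes a query counts as a match when the set of identifiers returned by the generated SQL equals the set returned by the reference SQL. The function name, the object identifiers, and the column names are hypothetical.

```python
def perfect_match_rate(pairs):
    """Fraction of (predicted, gold) pairs whose identifier sets are identical.
    Assumption: order and duplicates are ignored when comparing results."""
    if not pairs:
        return 0.0
    hits = sum(1 for predicted, gold in pairs if set(predicted) == set(gold))
    return hits / len(pairs)

# Made-up results for three queries: row identifiers (object ids) returned by
# the generated SQL versus the reference SQL.
row_ids = [
    (["ZTF20a", "ZTF21b"], ["ZTF21b", "ZTF20a"]),  # same set, different order
    (["ZTF20a"], ["ZTF20a", "ZTF22c"]),            # one object missing
    ([], []),                                      # both results empty
]
# Column identifiers (column names) of the same three results.
columns = [
    (["oid", "meanra"], ["oid", "meanra"]),
    (["oid"], ["oid"]),
    (["oid", "ndet"], ["oid"]),                    # one extra column
]
row_pm = perfect_match_rate(row_ids)
col_pm = perfect_match_rate(columns)
```

Scoring rows and columns separately, as the abstract does, makes it possible to tell a query that selects the right objects but the wrong fields from one that fails on both counts.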

2606.16072 2026-06-17 cs.CR cs.AI New submission

MASCOT-Android: A Curated Dataset and Automated Collection Pipeline for Android Malware Source Code Specimens

Bojing Li, Duo Zhong, Prajna Bhandary, Raguvir S, Charles Maxa, Robert J Joyce, Charles Nicholas

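Affiliations: University of Maryland, Baltimore County

AI summary: Presents the MASCOT-Android dataset and an automated collection framework that trains a LinearSVC classifier on repository-level documentation (READMEs) to discover malware source code on GitHub, reaching 96.28% accuracy and a 1.06% false positive rate.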

Abstract

Compared with binaries and decompiled code, malware source code more directly reflects the attackers' original intent. However, the scarcity of source code and the high cost of manual review make such datasets difficult to build and maintain. We propose MASCOT-Android, a curated dataset of Android malware source code and an automated collection framework for scalable malware source code discovery on GitHub. A key finding of our work is that repository-level documentation alone provides a strong signal for malware source code collection. Our model extracts character-level TF-IDF features from 8,772 malware and 25,747 benign README documents and trains a LinearSVC classifier to distinguish malware repositories. This README-only model achieves an accuracy of 96.28% and a false positive rate (FPR) of 1.06% in local evaluation. In addition, the model outputs confidence scores, allowing users to adjust the decision threshold to balance FPR and coverage, which is practical in real-world malware source code collection.
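The trade-off between FPR and coverage when moving the decision threshold can be sketched in a few lines. The scores and labels below are invented, and "coverage" is assumed here to mean the fraction of true malware repositories that get flagged; the paper may define it differently.

```python
def fpr_and_coverage(scores, labels, threshold):
    """Return (FPR, coverage) when a repository is flagged if score >= threshold.
    labels: 1 = malware repository, 0 = benign repository."""
    flagged = [s >= threshold for s in scores]
    false_pos = sum(1 for f, y in zip(flagged, labels) if f and y == 0)
    true_pos = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    negatives = labels.count(0)
    positives = labels.count(1)
    fpr = false_pos / negatives if negatives else 0.0
    coverage = true_pos / positives if positives else 0.0
    return fpr, coverage

# Hypothetical classifier confidence scores (e.g. signed distances to the
# decision boundary) and ground-truth labels for eight repositories.
scores = [2.1, 1.4, 0.3, -0.2, -0.9, 0.8, -1.5, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

for threshold in (-0.5, 0.2, 1.0):
    print(threshold, fpr_and_coverage(scores, labels, threshold))
```

Raising the threshold lowers the false positive rate at the cost of missing more malware repositories, which is the knob the abstract describes.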

2606.14954 2026-06-17 math.FA cs.LG math.OC stat.ML New submission

Representation Costs in Data Science: Foundations and the Quasi-Banach Spaces of Deep Neural Networks

Greg Ongie, Rahul Parhi

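Affiliations: Marquette University; University of California, San Diego

AI summary: Establishes a unified framework for analyzing the representation costs of parametric data-fitting methods through their parameter-space regularizers, shows that the native spaces induced by deep neural networks are quasi-Banach spaces, and proves natural results such as representer theorems.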

Abstract

We develop a general framework for analyzing representation costs of parametric data-fitting methods through their parameter-space regularizers. From this abstract perspective, we define representation costs for arbitrary parametric models and reveal their induced (native) function spaces. This unifies recent function-space views of data-fitting methods. We also prove that many natural results hold in this abstract setting, including representer theorems for parametric methods on their native spaces. The framework also rigorously connects parametric methods with their equivalent nonparametric descriptions under sufficient overparameterization. Classical methods and their native spaces, such as kernel methods / reproducing kernel Hilbert spaces, wavelets / Besov spaces, and shallow neural networks / variation spaces emerge as special cases of our abstract framework. A byproduct of "axiomatizing" the study of representation costs is that we also immediately obtain new results for deep neural networks: For depth-$L$ feedforward ReLU networks, their induced native spaces are $p$-normable quasi-Banach spaces with $p = 2/L$. This reveals that the inductive bias of deep neural networks (as given by the representation cost) cannot be captured by norms for depths $L > 2$.
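For readers unfamiliar with the terminology, the standard definitions behind "$p$-normable quasi-Banach space" can be written out as below. These are textbook definitions rather than statements taken from the paper, and the values of $p$ simply evaluate $p = 2/L$ at a few depths.

```latex
% Quasi-norm: the triangle inequality holds only up to a constant K >= 1.
\[
  \|f + g\| \le K \left( \|f\| + \|g\| \right), \qquad K \ge 1 .
\]
% p-normability (0 < p <= 1): some equivalent quasi-norm is p-subadditive.
\[
  \|f + g\|^{p} \le \|f\|^{p} + \|g\|^{p}, \qquad 0 < p \le 1 .
\]
% Evaluating p = 2/L, as stated in the abstract, at a few depths:
\[
  L = 2 \;\Rightarrow\; p = 1 \ (\text{normable}), \qquad
  L = 3 \;\Rightarrow\; p = \tfrac{2}{3}, \qquad
  L = 4 \;\Rightarrow\; p = \tfrac{1}{2}.
\]
```

For $p = 1$ the second inequality is the ordinary triangle inequality, which is why depth $L = 2$ is the last case in which a genuine norm suffices.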

2606.14814 2026-06-17 cond-mat.mtrl-sci cs.AI physics.app-ph physics.chem-ph physics.comp-ph New submission

A Multi-Level Architecture for Reusable Materials Ontologies -- The OntoCrafter Ceramics Ontology (OCO) as Reference Implementation

Thomas Pannek, Wolfgang Grond

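Affiliations: Numberland

AI summary: To address the horizontal, vertical, and mechanistic fragmentation of materials science ontologies, proposes a multi-level modular architecture with two independent classification axes (level of abstraction and consumer audience) and a seven-tier mechanistic-explanation skeleton within the material-specific level, with the OntoCrafter Ceramics Ontology (OCO v0.94) as the reference implementation.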

Comments 3 figures, 55 pages

Abstract

The Materials Science and Engineering ontology landscape is fragmented along multiple axes simultaneously. Horizontally: a recent survey identified 94 ontologies of which over 40 are structurally incompatible; each new application domain -- ceramics, polymers, batteries, smart materials -- typically restarts ontology design from scratch. Vertically: EU regulation (CSRD, CSDDD, PPWR, CBAM, R2R, AI Act, ESPR) forces material, manufacturing, supply-chain, and lifecycle data into integrated digital product passports, leaving ontologies that only address horizontal fragmentation incomplete for any contemporary consumer. And mechanistically: a vocabulary that records that BNT-BT has $d_{33} \approx 580$ pC/N stores a fact but cannot surface why -- Bi-6s$^2$ lone-pair stereo-activity, anomalous Born effective charges, soft modes, defect chemistry -- without a systematic explanation skeleton. We propose a multi-level modular architecture with two independent classification axes -- level of abstraction (L0 bridges, L1 material-agnostic laboratory-notebook, L2 material-class-specific, L3 categorical reasoning) and consumer audience (material vs. compliance) -- in which the material-specific level is internally organised by a seven-tier mechanistic-explanation skeleton (Symmetry, Energy/DFT, Thermo/CALPHAD, Kinetics, Microstructure, Defect chemistry, Bonding) applicable to any crystalline ionic oxide. The level-and-audience modularity dissolves the horizontal fragmentation, the compliance audience absorbs the vertical regulation pressure, and the seven-tier organisation of Level 2 delivers the mechanistic explanation depth. We instantiate the architecture as the OntoCrafter Ceramics Ontology (OCO v0.94): 5,196 classes across 44 modules; 167,348 OWL axioms (40,454 logical); 1,674 properties; 829 cross-ontology bridge mappings; 1,172 SHACL shapes; 163 published competency questions.

2606.14517 2026-06-17 cs.CR cs.AI New submission

From Shield to Target: Denial-of-Service Attacks on LLM-Based Agent Guardrails

Yuguang Zhou, Xunguang Wang, Pingchuan Ma, Zhantong Xue, Zhaoyu Wang, Shuai Wang

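Affiliations: National University of Singapore; University of Science and Technology of China

AI summary: Shows that LLM-based guardrails are vulnerable to denial-of-service attacks: malicious payloads generated with a beam-search optimization framework and mechanism-aware structural mutations cause 13-63x token amplification and up to 148x latency amplification, threatening system availability.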

Abstract

LLM-based guardrails have emerged as a highly effective defense against prompt injection and jailbreak attacks in autonomous agents. However, we reveal that the very reasoning and task-following capabilities enabling this protection introduce a novel vulnerability: attackers can inject crafted data to trap the guardrail in extended reasoning loops, effectuating a systematic denial-of-service (DoS) attack. To systematically expose this threat, we design a beam-search optimization framework that crafts natural-language payloads to maximize guardrail reasoning length, utilizing an LLM proposer guided by a strategy bank. Based on the observation that guardrails are schema-following by nature, we also provide another attack framework, driven by mechanism-aware structural mutations, that has a lower computational load. The attack efficacy is systematically evaluated in two parts. First, in standalone evaluations, the attack generalizes across diverse guardrail architectures, safety templates, and agent benchmarks. Payloads optimized on a single open-source surrogate successfully transfer to eight leading model backbones (e.g., Claude, GPT, Gemini, DeepSeek, and Qwen), achieving a 13--63$\times$ token amplification. Second, in end-to-end real-world agent deployments (web, desktop, code, and multi-agent systems), the attack reveals up to a 148$\times$ latency amplification. We show that a single poisoned document can saturate shared guardrail infrastructures, effectively starving co-located agents and paralyzing the entire system. By uncovering this availability flaw, our work underscores the urgent need to develop cost-bounded, reasoning-robust guardrails.

2606.14295 2026-06-17 cs.CR cs.AI cs.LG New submission

AgentCyberRange: Benchmarking Frontier AI Systems in Realistic Cyber Ranges

Fengyu Liu, Jiarun Dai, Yihe Fan, Wuyuao Mai, Ziao Li, Bofei Chen, Jie Zhang, Zheng Lou, Bocheng Xiang, Qiyi Zhang, Xudong Pan, Geng Hong, Yuan Zhang, Min Yang

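Affiliations: Fudan University

AI summary: Introduces AgentCyberRange, the first open multi-range infrastructure, which integrates 110 vulnerabilities and 156 internal hosts to evaluate frontier AI systems on realistic cyber attacks; GPT-5.5 with Codex performs best on web exploitation and post-exploitation tasks.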

Abstract

Frontier AI systems are increasingly capable of cybersecurity tasks, including codebase inspection, vulnerability detection, and exploitation. However, evaluating their offensive capabilities remains constrained by limited access to open, reproducible, multi-host cyber ranges. Existing public benchmarks capture isolated skills such as CTF solving, vulnerability reproduction, and exploit generation, but often abstract away realistic intrusion workflows: discovering exposed services, gaining a foothold, collecting internal information, and expanding compromise across hosts. This gap makes it difficult to observe emerging risks early, because frontier AI systems are rarely evaluated under realistic attack conditions. We introduce AgentCyberRange, the first open, multi-range infrastructure for measuring autonomous cyber attack capability in realistic cyber ranges. It combines 110 vulnerabilities across 15 real web applications and 8 enterprise-like cyber ranges with 156 internal hosts, plus Cage, a toolchain for execution, orchestration, result collection, and verification. The benchmark covers two core stages: web exploitation, where agents explore exposed applications and validate vulnerabilities, and post-exploitation, where agents turn an initial foothold into broader internal compromise. We evaluate six frontier AI systems under matched prompts and budgets. GPT-5.5 with Codex performs best, solving 16.1% of web exploitation tasks and 31.7% of post-exploitation tasks; with more concrete hints, these rates increase to 33.0% and 46.3%. We also observe out-of-benchmark findings, including unknown vulnerabilities in popular projects, and payload mutation that bypasses host defenses. These results show that open cyber-range evaluation is necessary for observing emerging offensive capabilities under realistic and reproducible conditions.

2606.13919 2026-06-17 eess.IV cs.AI cs.CV New submission

GMN4AD: Graph Matching Network for Alzheimer's Disease Diagnosis with Test-Time Domain Adaptation using Multi-centered Structure Magnetic Resonance Imaging

Chen Zhao, Huan Huang, Yixin Xie, Jiajing Huang, Weihua Zhou

Affiliations: Department of Computer Science, Kennesaw State University; Department of Information Technology, Kennesaw State University; School of Data Science and Analytics, Kennesaw State University; Department of Applied Computing, Michigan Technological University

AI summary: Proposes GMN4AD, which uses a graph matching network to model relationships between heterogeneous brain graphs and combines it with a test-time domain adaptation strategy, outperforming existing methods on three public datasets for robust AD diagnosis.

Abstract

Alzheimer's Disease (AD) is a progressive neurodegenerative disorder that affects millions of older adults, with prevalence expected to rise significantly in the coming years. Early diagnosis, particularly during the mild cognitive impairment (MCI) stage, is critical for timely intervention. Structural Magnetic Resonance Imaging (sMRI) has emerged as a key modality for detecting AD-related brain changes, but traditional graph-based approaches often struggle with modality and inter-site heterogeneity, limiting diagnostic performance. In this paper, we propose the Graph Matching Network for Alzheimer's Disease Diagnosis (GMN4AD), designed to model interactions between heterogeneous brain graphs derived from neuroimaging data. Unlike conventional methods that treat each brain graph independently, GMN4AD leverages graph matching to capture cross-graph relationships, enhancing diagnostic precision. Furthermore, we introduce a test-time domain adaptation strategy that incorporates contrastive learning to mitigate domain shifts during inference. Extensive experiments on three public AD datasets demonstrate that GMN4AD achieves superior performance compared to state-of-the-art methods, offering a robust and generalizable solution for AD diagnosis.

2606.13827 2026-06-17 math.NA cs.LG cs.NA stat.ML New submission

Approximating Gaussian Whittle-Matern Fields over Well-Centered Triangulations of Riemannian Manifolds

Approximating Whittle-Matérn Fields on Discrete Manifolds

Srinivas Nambirajan

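Keywords: Riemannian Manifolds; Discrete Exterior Calculus; Finite Element Exterior Calculus

AI summary: Proposes a GMRF approximation method based on discrete exterior calculus that handles the whole family of Whittle-Matérn fields uniformly, allows the parameters to be inferred, accommodates both pointwise and piecewise-smoothed measurements, is computationally independent of the interpolants used, and comes with a low-rank approximation for compressed sensing.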

Comments More specific title, updated acknowledgement, minor typos fixed

Abstract

Markovian Whittle-Matérn fields have been convergently approximated by discrete Gauss-Markov Random Fields (GMRFs) with sparse precision matrices using a Finite Element approximation of the two-parameter family of SPDEs \[ (\kappa^2 - \Delta)^{\alpha/2} u = \mathcal{W}, \;\; \kappa \in \mathbb{R}, \; \alpha \in \mathbb{N}. \] Using recent developments in the analysis of Discrete Exterior Calculus (DEC), we present a different, yet closely related, convergent GMRF approximation to these Matérn fields over complete, boundaryless Riemannian manifolds discretized as well-centered simplicial complexes. This convergent method (i) is agnostic to $\alpha, \kappa$ and thus allows a universal approximation scheme for the precision and covariance matrices of the entire $(\alpha, \kappa)$-family of GMRFs, so they may be inferred rather than guessed; (ii) inherently models pointwise and piecewise-smoothed measurements of a random field and approximates both equally well; and (iii) is computationally independent of the interpolants used: it suffers no overhead if one convergent interpolant is replaced with another suitable interpolant over the same mesh. Furthermore, we show that, on discretizations that are well-connected in a precise sense and volume-concentrated, the precision matrices are spectral functions of a graph Laplacian. We provide a low-rank approximator to the family of such Matérn GMRFs and mention a use case: reducing the number of measurements needed to model the GMRF by compressed sensing.
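One way to read "the precision matrices are spectral functions of a graph Laplacian" is sketched below. This is an interpretation, not the paper's construction: it assumes the discrete operator $-\Delta$ is represented by a symmetric positive semidefinite graph Laplacian $L$ with eigenpairs $(\lambda_i, v_i)$, takes $\kappa \neq 0$, and omits any volume weights or normalization constants the paper may use.

```latex
% Eigendecomposition of the graph Laplacian (assumed symmetric, PSD).
\[
  L = \sum_{i=1}^{n} \lambda_i \, v_i v_i^{\top}, \qquad 0 \le \lambda_1 \le \dots \le \lambda_n .
\]
% Since (kappa^2 - Delta)^{alpha/2} u is white noise, the precision of u is the
% operator applied twice, i.e. a spectral function of L with f(x) = (kappa^2 + x)^alpha.
\[
  Q_{\alpha,\kappa} = (\kappa^2 I + L)^{\alpha}
                    = \sum_{i=1}^{n} (\kappa^2 + \lambda_i)^{\alpha} \, v_i v_i^{\top} .
\]
% The covariance is dominated by the smallest eigenvalues, which suggests a rank-k truncation.
\[
  \Sigma_{\alpha,\kappa} = Q_{\alpha,\kappa}^{-1}
    = \sum_{i=1}^{n} (\kappa^2 + \lambda_i)^{-\alpha} \, v_i v_i^{\top}
    \;\approx\; \sum_{i=1}^{k} (\kappa^2 + \lambda_i)^{-\alpha} \, v_i v_i^{\top}, \qquad k \ll n .
\]
```

Under this reading a single eigendecomposition of $L$ serves every $(\alpha, \kappa)$ pair, which is consistent with the abstract's claim that the parameters can be inferred rather than fixed in advance.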

2606.11766 2026-06-17 eess.AS cs.AI cs.CL cs.SD New submission

Fast Speech Foundation Model Distillation Using Interleaved Stacking

Eungbeom Kim, Kyogu Lee

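Affiliations: IPAI AIIS Dept. of Intelligence and Information

AI summary: Proposes interleaved stacking to accelerate distillation training of speech foundation models; it avoids the performance degradation of existing stacking methods by keeping layer positions consistent, and is validated on SUPERB.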

Comments Accepted by Interspeech 2026

Abstract

Distilling a large speech foundation model (SFM) into an efficient student model has been successfully applied to low-resource environments. Although distillation reduces inference latency, it requires additional training of the student model. However, the training efficiency of SFM distillation remains underexplored. In this work, we explore training acceleration of SFM distillation to speed up model deployment. We examine the potential of stacking, in which the model depth is progressively increased through training until the target model depth is reached. While existing stacking methods improve training speed, they suffer from performance degradation. To address this limitation, we propose interleaved stacking, a novel stacking method that consistently preserves layer position throughout the stacking process. This property is particularly critical in SFMs, in which each layer encodes distinct layer-specific knowledge. We validate the effectiveness of the proposed method on SUPERB.

2606.09770 2026-06-17 q-bio.NC cs.LG New submission

Discovering Functionally Selective Brain Regions with a Deep Topographic Multimodal Model

Badr AlKhamissi, Johannes Mehrer, Lara Marinov, Ahmed Abdelaal, Abdulkadir Gokce, Martin Schrimpf

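Affiliations: University of California, Berkeley; Max Planck Institute for Human Cognitive and Brain Sciences; ETH Zurich

AI summary: Presents Topo-Omni, a model built by fine-tuning a pretrained foundation model with a spatial smoothness objective, which integrates visual, auditory, and language/cognitive processing on a single contiguous in-silico cortical sheet, yields multimodal clusters consistent with human neuroimaging, and is used to discover new brain regions.

Comments Preprint. First two authors contributed equally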

Abstract

Nearby neurons in cortex share similar response profiles, producing systematic spatial organization across sensory and cognitive systems. Recent topographic models reproduce aspects of this structure but remain unimodal and spatially constrain each layer separately, yielding fragmented maps that capture neither the contiguity of cortical processing streams nor their integration across modalities. We introduce Topo-Omni, a topographic multimodal model in which visual, auditory, and language/cognitive processing share a single contiguous in-silico sheet. Built by fine-tuning a pretrained foundation model with a spatial smoothness objective, this architecture develops clusters across modalities that are consistent with human neuroimaging, from sensory to cognitive systems. Driving or suppressing a cluster selectively biases or impairs perception, paralleling human intervention studies. Finally, we use our model to screen for novel clusters in-silico and discover new natural landscape and animal networks which we validate in human data. A single spatial principle thus organizes representations across modalities and processing stages, yielding testable hypotheses about cortical organization.

2606.09049 2026-06-17 stat.ME cs.LG math.ST stat.ML stat.TH New submission

Data augmented bootstrap: Unifying confidence interval construction by approximate invariance

Kevin Han Huang

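Affiliations: Department of Statistics, University of Warwick

AI summary: Proposes the data augmented bootstrap (DAB), which constructs confidence intervals from approximate invariances of the data, unifies the theory of methods such as the classical bootstrap and conformal prediction, and brings in the data augmentation heuristic.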

Comments Added comparison with arXiv:2604.15229

Abstract

We propose the data augmented bootstrap (DAB), a framework for constructing confidence intervals from approximately invariant transformations of the data. As special cases, DAB recovers popular methods that rely on exact group symmetries, such as conformal prediction, wild bootstrap for Maximum Mean Discrepancy U-statistics and the recently proposed SymmPI. Meanwhile, DAB also recovers the classical bootstrap method, which exploits the dataset's approximate invariance under uniform sampling of data indices as the dataset size grows. For all DAB methods, we establish theoretical coverage results that interpolate between finite-sample and asymptotic guarantees according to the strength of the invariance, and without assuming a group structure. The approximate invariance is measured in the Kolmogorov distance and, for statistics that satisfy Gaussian universality, reduces to conditional mean and variance matching. This allows us to incorporate data augmentation (DA), a widely used machine learning heuristic based on approximate invariances, into known statistical methods. We empirically test the performance of incorporating DA into bootstrap, wild bootstrap and conformal prediction for simulated settings as well as for image, language and scientific data.
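The classical bootstrap, which the abstract names as a special case of DAB, can be sketched with the standard percentile method. This is not the DAB procedure itself: it shows only the baseline in which the "transformation" of the data is uniform resampling of indices. The function name, sample values, and defaults are illustrative assumptions.

```python
import random
import statistics

def percentile_bootstrap_ci(data, statistic, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `statistic` at level 1 - alpha."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Resample indices uniformly with replacement and recompute the statistic.
    replicates = sorted(
        statistic([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lower = replicates[int((alpha / 2) * n_resamples)]
    upper = replicates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

data = [2.3, 1.9, 3.1, 2.8, 2.2, 2.6, 3.4, 1.7, 2.9, 2.5]  # made-up sample
low, high = percentile_bootstrap_ci(data, statistics.mean)
print(f"95% CI for the mean: [{low:.3f}, {high:.3f}]")
```

As the abstract describes it, DAB generalizes the resampling step to other approximately invariant transformations, such as the data augmentations used in machine learning.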

2606.06227 2026-06-17 physics.flu-dyn cs.LG Updated version

Reward hacking in physical reinforcement learning revealed by turbulent drag reduction

Drag Reduction or Reward Hacking? Recurrent Multi-Agent Reinforcement Learning That Earns Its Reward

Giorgio Maria Cavallazzi, Miguel Pérez-Cuadrado, Alfredo Pinelli

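Affiliations: School of Science and Technology, Department of Engineering, City St. George’s, University of London

AI summary: Addresses the divergence between the reinforcement-learning reward and the design objective in drag-reduction control of wall turbulence with three fixes (a differentiable projection, a recurrent policy, and a reward based on true wall power), achieving a conservative 17% drag reduction under honest accounting.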

Abstract

A reinforcement-learning agent maximises its reward, which can diverge from the outcome its designer intended. In physical control the reward rarely closes that gap, and drag reduction in wall turbulence makes it concrete. A mass-conservation projection couples agents' outputs and erases the per-agent credit the policy gradient needs; a memoryless policy cannot resolve the slow near-wall cycle it acts on; and a pressure-gradient reward pays for nominal drag reduction by pumping power through the wall. Two degenerate controllers achieve large drag reductions while total dissipation rises, so the reported figure can mask a more wasteful flow. We trace each fault to its cause and fix it: a differentiable projection that restores credit, a recurrent policy with a widened sensing stencil, and a reward scored on the true wall power. The corrected controller acts on the flow within a closed energy budget, earning a conservative $17\%$ drag reduction under honest accounting.

2606.05861 2026-06-17 cs.MM cs.AI Updated version

LLMCodec: Adapting Video Codecs for Efficient Weight Compression of Large Language Models

Rui Wang, Yan Zhao, Li Song, Zhengxue Cheng

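Affiliations: Shanghai Jiao Tong University

AI summary: Proposes LLMCodec, which compresses LLM weights by combining affine quantization with video codecs such as VVC/H.266, needs no fine-tuning or calibration data, and at 2-bit precision substantially lowers perplexity and improves downstream task accuracy.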

Comments The authors need to make further revisions before resubmission

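Details
AI abstract (translated from Chinese)

The rapid development of large language models (LLMs) has brought remarkable progress in natural language processing. However, the ever-growing scale of these models poses major challenges for storage, transmission, and deployment. Despite considerable effort on model compression and quantization, existing methods usually rely on fine-tuning or calibration data and generalize poorly across tensor types. In this paper, we argue that video codecs offer a promising solution for LLM compression, because they are inherently compatible with matrix-structured data, offer configurable compression strategies, and have highly optimized off-the-shelf implementations. We therefore propose LLMCodec, a video-codec-based LLM compression method that combines affine quantization with the recent VVC/H.266 video codec. Beyond VVC, we also compare a range of video codecs and encoding profiles to assess their effect on compression performance. Experiments on different models demonstrate the robustness and generality of LLMCodec. Notably, on LLaMA-3-8B at 2-bit precision, LLMCodec reduces perplexity by more than 1.5x and improves downstream task accuracy by 21% compared with existing methods.

English abstract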

The rapid development of large language models (LLMs) has led to remarkable advances in natural language processing. However, the increasing scale of these models introduces substantial challenges in terms of storage, transmission, and deployment. Although great effort has been devoted to model compression and quantization, existing methods often rely on fine-tuning or calibration data and generalize poorly across different tensor types. In this paper, we argue that video codecs offer a promising solution for LLM compression, due to their inherent compatibility with matrix-structured data, configurable compression strategies, and the availability of highly optimized, off-the-shelf implementations. Therefore, we present LLMCodec, a video codec-based LLM compression method that integrates affine quantization with the recent VVC/H.266 video codec. Beyond VVC, we further compare a range of video codecs and encoding profiles to evaluate their impact on compression performance. Experiments on different models demonstrate the robustness and generality of LLMCodec. Notably, on LLaMA-3-8B at 2-bit precision, LLMCodec reduces perplexity by over 1.5x and improves downstream task accuracy by 21% compared with the existing method.
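The affine quantization step named in the abstract can be sketched as follows. This is a generic min/max affine quantizer, not the paper's exact scheme: the abstract does not give the granularity, rounding rule, or how codes are packed into frames, so the function names and sample weights are hypothetical.

```python
from typing import List, Tuple


def affine_quantize(weights: List[float], bits: int) -> Tuple[List[int], float, float]:
    """Map floats to integer codes in [0, 2**bits - 1] using a scale and an offset."""
    levels = (1 << bits) - 1
    w_min, w_max = min(weights), max(weights)
    # Fall back to a unit scale when all weights are equal, to avoid dividing by zero.
    scale = (w_max - w_min) / levels if w_max > w_min else 1.0
    codes = [min(levels, max(0, round((w - w_min) / scale))) for w in weights]
    return codes, scale, w_min


def affine_dequantize(codes: List[int], scale: float, w_min: float) -> List[float]:
    """Invert the mapping; the result differs from the input by at most scale / 2."""
    return [c * scale + w_min for c in codes]


# Hypothetical weight values (illustrative only).
weights = [-0.42, -0.10, 0.03, 0.27, 0.61]
codes, scale, w_min = affine_quantize(weights, bits=2)
restored = affine_dequantize(codes, scale, w_min)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Integer codes of this kind form a matrix of small non-negative values, which is the sort of pixel-like data a video codec takes as input.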

2606.04990 2026-06-17 cs.CR cs.AI Updated version

From Agent Traces to Trust: A Survey of Evidence Tracing and Execution Provenance in LLM Agents

Yiqi Wang, Jiaqi Zhang, Taotao Cai, Zirui Liu, Qingqiang Sun, Zequn Sun, Zhangkai Wu, Manqing Dong, Mingkai Zhang, Xuefei Yin, Yanming Zhu

Institutions * Griffith University; Jiangsu University; University of Southern Queensland; Peking University; Great Bay University; Nanjing University; Macquarie University; Southern University of Science and Technology

AI summary A systematic survey of evidence tracing and execution provenance methods for LLM agents; it links retrieval, tool use, memory, and related stages through a unified provenance perspective, proposes a taxonomy, and discusses open challenges.

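Details
AI abstract (translated from Chinese)

Agents based on large language models (LLMs) increasingly solve complex tasks by interacting with external tools, retrieval systems, memory modules, environments, and other agents. These capabilities increase agent autonomy but also make agent behavior harder to verify, debug, and audit. Final-answer accuracy alone cannot explain how an output was produced, which evidence supports each claim, whether tool calls were justified, how memory influenced later decisions, or where an execution failure originated. Evidence tracing and execution provenance fill this gap by modeling how retrieved evidence, tool outputs, memory items, environment observations, intermediate claims, actions, and final answers are connected during agent execution. This survey provides a systematic review and conceptual framework for evidence tracing and execution provenance in LLM agents. We organize related work around a unified provenance perspective that connects retrieval grounding, claim support, tool-use safety, memory lineage, observability, debugging, audit, and recovery. We introduce a taxonomy covering trace sources, evidence and execution units, provenance relations, tracing granularity and timing, representation forms, and trust functions. We review key methodological directions, including provenance representation, evidence attribution, tool-use provenance, runtime guardrails, provenance-bearing memory, trace-based observability, and failure diagnosis. We also map existing benchmarks, datasets, and evaluation metrics to provenance-related capabilities and discuss how evaluation can shift from final-answer correctness to process-level accountability. Finally, we outline open challenges, including unified trace schemas, claim-level and semantic provenance, provenance-aware safety mechanisms, realistic execution-trace benchmarks, recovery-oriented evaluation, and privacy-aware audit infrastructure.

English abstract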

Large language model (LLM)-based agents are evolving from passive text generators into autonomous systems capable of planning, tool use, retrieval, memory access, environmental interaction, and multi-agent collaboration. These capabilities expand agent autonomy, but also make agent behavior harder to verify, debug, and audit. Final-answer accuracy alone cannot explain how an output was produced, which evidence supported each claim, whether tool calls were justified, how memory influenced later decisions, or where failures originated. This survey examines evidence tracing and execution provenance as foundations for process-level accountability in trustworthy LLM agents. We define execution provenance as the typed graph of an agent execution and evidence tracing as its projection onto evidence-support relations. This perspective connects retrieval grounding, claim support, tool-use safety, memory lineage, observability, debugging, audit, and recovery within a unified framework. We introduce a taxonomy covering trace sources, evidence and execution units, provenance relations, tracing granularity and timing, representation forms, and trust functions. We then review key methodological directions, including provenance representation, evidence attribution, tool-use provenance, runtime guardrails, provenance-bearing memory, observability, and failure diagnosis. Finally, we discuss benchmarks, datasets, metrics, and open challenges for building provenance-aware, auditable, and recoverable agent systems.
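The definition in the abstract (execution provenance as a typed graph, evidence tracing as its projection onto evidence-support relations) can be made concrete with a small data-structure sketch. The node kinds, relation names, and helper functions below are hypothetical illustrations, not a schema taken from the survey.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Node:
    node_id: str
    kind: str       # hypothetical kinds: "retrieval", "tool_output", "claim", "answer"


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    relation: str   # hypothetical relations: "supports", "triggered"


def project_evidence(edges: List[Edge], relations: Set[str]) -> List[Edge]:
    """Keep only the edges whose relation type counts as evidence support."""
    return [e for e in edges if e.relation in relations]


def supporting_evidence(edges: List[Edge], claim_id: str) -> List[str]:
    """Return the ids of nodes that point directly at the given claim."""
    return [e.src for e in edges if e.dst == claim_id]


nodes = [Node("doc1", "retrieval"), Node("tool1", "tool_output"),
         Node("c1", "claim"), Node("ans", "answer")]
edges = [Edge("doc1", "c1", "supports"), Edge("tool1", "c1", "supports"),
         Edge("c1", "ans", "supports"), Edge("tool1", "doc1", "triggered")]

evidence_edges = project_evidence(edges, {"supports"})
claim_sources = supporting_evidence(evidence_edges, "c1")
```

In this toy graph the projection drops the control-flow edge (`triggered`) and keeps the chain from retrieved document and tool output to claim to answer.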

2605.26195 2026-06-17 cs.CR cs.AI Updated version

CyberEvolver: Structured Self-Evolution for Cybersecurity Agents On the Fly

Yihe Fan, Changyi Li, Lichen Xu, Xudong Pan, Jiarun Dai, Hong Geng, Min Yang

Institutions * Fudan University; Shanghai Innovation Institute; Shanghai Pudong Research Institute of Cryptology

AI summary Proposes the CyberEvolver framework, which uses a four-layer evolvable architecture, a trace-to-diagnosis mechanism, and population-based beam search to let cybersecurity agents evolve their own scaffolds from failure experience, raising the average success rate by 13.6%.

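Details
AI abstract (translated from Chinese)

LLM-based agents are increasingly used for cybersecurity tasks, but most existing systems rely on fixed, hand-designed scaffolds that struggle to adapt to different targets and failure modes. We propose CyberEvolver, a self-evolving cybersecurity agent framework that iteratively revises its own scaffold based on experience from failed execution attempts. Self-evolution in cybersecurity is challenging because the space of possible scaffold changes is largely unstructured, execution feedback is sparse and often obscured by the environment, and low-diversity updates can cause errors to accumulate over repeated iterations. CyberEvolver addresses these challenges with a four-layer evolvable agent architecture (which decomposes scaffold optimization into structured components), a trace-to-diagnosis mechanism (which turns noisy execution logs into actionable revision signals), and a population-based beam search strategy (which preserves diverse agent variants during evolution). We evaluate CyberEvolver on CTF challenges, vulnerability exploitation, and penetration-testing tasks using four open-source LLMs. Across these settings, CyberEvolver raises the seed agent's success rate by 13.6% on average and outperforms six hand-designed cybersecurity agents as well as two self-improvement methods adapted from other domains. These results suggest that scaffold self-evolution is a promising direction for building adaptive LLM agents for security testing.

English abstract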

LLM-based agents are increasingly used for cybersecurity tasks, but most existing systems rely on fixed, human-designed scaffolds that struggle to adapt across diverse targets and failure modes. We introduce CyberEvolver, a self-evolving cybersecurity agent framework that iteratively revises its own scaffold based on experience from failed execution attempts. Self-evolution in cybersecurity is challenging because the space of possible scaffold changes is largely unstructured, execution feedback is sparse and often obscured by the environment, and low-diversity updates can cause errors to compound over repeated iterations. CyberEvolver addresses these challenges with a four-layer evolvable agent architecture that decomposes scaffold optimization into structured components, a trace-to-diagnosis mechanism that converts noisy execution logs into actionable revision signals, and a population-based beam search strategy that preserves diverse agent variants during evolution. We evaluate CyberEvolver on CTF challenges, vulnerability exploitation, and penetration-testing tasks using four open-source LLMs. Across these settings, CyberEvolver improves the seed agent's success rate by 13.6% on average, and outperforms six human-designed cybersecurity agents as well as two self-improvement methods adapted from other domains. These results suggest that scaffold self-evolution is a promising direction for building adaptive LLM agents for security testing.

2605.29669 2026-06-17 stat.ML cs.LG math.PR math.ST stat.TH Updated version

Eigen-Spike Emergence and Quadratic Equivalents for Conjugate Kernels on Nonlinearly Separable Data

Collin Cranston, Zhichao Wang, Todd Kemp, Michael W. Mahoney

Institutions * Department of Mathematics ICSI and Department of Statistics; University of California, San Diego, USA; University of California, Berkeley, USA; Department of Mathematics ICSI, LBNL and Department of Statistics

AI summary For nonlinearly separable data (the XOR problem), uses a quadratic equivalent model of the conjugate kernel matrix to analyze the emergence of outlier eigenvalues and the BBP-type phase transition in their alignment with the labels, revealing how sample complexity, signal-to-noise ratio, activation function, and pretrained features affect nonlinear learnability.

Comments 81 pages, 8 figures

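Details
AI abstract (translated from Chinese)

Recent work in random matrix theory (RMT) has developed the notion of deterministic equivalents: typically linear surrogate models that approximate the spectral behavior of large nonlinear random matrices, such as the nonlinear feature maps in neural networks. On the one hand, these deterministic equivalents make theoretical predictions tractable by reducing a complex model to a simpler one whose properties are covered by classical RMT tools. However, this leaves open the question of whether such an idealized linear equivalent remains meaningful for high-dimensional nonlinearly separable data (for example, when classifying such data). Motivated by this, we consider the conjugate kernel (CK), the nonlinear feature map of a feedforward neural network, on the XOR problem, a canonical nonlinearly separable dataset; we use the study of informative outlier eigenvalues in the CK, and of whether their corresponding eigenvectors asymptotically align with the XOR labels, as a proxy for nonlinear learnability. We develop a robust quadratic equivalent of the spiked CK matrix, which enables a precise analysis of the informative spikes that emerge as one adjusts various knobs common in machine learning practice: sample complexity, signal-to-noise ratio, choice of nonlinear activation, and pretrained features. In each case, we derive a precise BBP-type phase transition at which linear classification through the CK eigenvectors becomes possible. Our analysis helps bring the power of deterministic-equivalent tools from RMT to the study of practically relevant problems in machine learning.

English abstract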

Recent work in random matrix theory (RMT) has developed the notion of deterministic equivalents: typically linear surrogate models that approximate the spectral behavior of large nonlinear random matrices, such as nonlinear feature maps in neural networks (NNs). Such equivalents make theoretical predictions tractable by reducing a complex model to a simpler one with properties that fall under the umbrella of classical RMT tools. However, this leaves open the question of whether this idealized linear equivalence remains meaningful for classification of high-dimensional nonlinearly separable data. Motivated by this, we consider the conjugate kernel (CK), which is the nonlinear feature map of a one-layer feedforward NN, under a canonical nonlinearly separable dataset for the XOR problem; and we use the study of informative outlier eigenvalues in the CK and whether their corresponding eigenvectors asymptotically align with XOR labels as a proxy for nonlinear learnability. We develop a robust quadratic equivalent of the CK matrix that enables a precise analysis of emergent informative spikes, as one modifies various knobs common in ML practice: sample complexity, signal-to-noise ratio (SNR), nonlinear activation choice, and pretrained features. We identify regimes in which these knobs move the CK beyond the linear equivalent and produce BBP-type transitions to label-aligned outlier eigenspaces. Our analysis helps bring deterministic-equivalence tools from RMT to bear on problems of practical relevance in ML.

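2604.01904 2026-06-17 cs.CR cs.AI Updated version

Combating Data Laundering in LLM Training

Muxing Li, Zesheng Ye, Sharon Li, Feng Liu

Institutions * University of Melbourne; University of Wisconsin-Madison

AI summary To address the failure of conventional detection under data laundering (hiding data provenance by transforming style), proposes SDR, which uses an auxiliary LLM to infer the transformation goal and synthesize queries, substantially strengthening detection of data misuse.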

Comments 29 pages, 2 figures

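Details
AI abstract (translated from Chinese)

Data rights holders can detect unauthorized data use in large language model (LLM) training by querying the model with proprietary samples. Typically, if a model performs better on a sample than on data it was not trained on (for example, higher confidence or lower loss), the sample is taken to belong to the training corpus, because LLMs perform better on data seen during training. However, this detection becomes brittle under data laundering, a practice that preserves key information but changes the stylistic form of proprietary data to obscure its provenance. When an LLM is trained only on laundered variants, it no longer performs better on the original data, which removes the signal that standard detection relies on. We address this by inferring the unknown laundering transformation from black-box access to the target LLM and by using an auxiliary LLM to synthesize queries that imitate the laundered data, even when the rights holder has only the original data. Because the search space for the true laundering transformation is unbounded, we abstract the process into a high-level transformation goal (e.g., "lyrical rewriting") and specific details (e.g., "use vivid imagery"), and introduce Synthesis Data Reversion (SDR) to instantiate this abstraction. SDR first identifies the most likely synthesis goal to narrow the search; it then iteratively refines the details so that the synthesized queries gradually elicit stronger detection signals from the target LLM. Evaluation on the MIMIR benchmark against multiple laundering practices and target LLM families (Pythia, Llama2, and Falcon) shows that SDR consistently strengthens data-misuse detection, providing a practical countermeasure to data laundering.

English abstract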

Post-hoc unauthorized-training data detection for large language models (LLMs) typically assumes a query-with-originals regime: rights holders query a target LLM with raw proprietary data and assess whether the model assigns them stronger memorization-based detection signals, e.g., higher confidence or lower loss, than held-out non-training reference texts. We show that this regime becomes brittle under data laundering, where the target LLM is trained on semantics-preserving but stylistically or structurally transformed surrogates of proprietary data to obfuscate provenance. Since training-time exposure occurs in the laundered form, memorization signals may no longer appear on the originals, collapsing the candidate-reference signal separation that standard detectors rely on. We counter this threat by studying laundering-aware detection with raw proprietary data, a held-out reference corpus, and query access to the target LLM, while the laundering transformation is undisclosed. Since exact recovery of the laundered corpus is infeasible, we infer a detection-useful synthesis process via an auxiliary LLM that maps originals into training-like queries. To make this search tractable, we introduce Synthesis Data Reversion (SDR), which constrains the unbounded space of natural-language transformations through a goal-details abstraction: a high-level transformation goal, e.g., "lyrical rewriting", and fine-grained details, e.g., "with vivid imagery". SDR identifies the most likely goal and iteratively refines details so synthesized queries elicit stronger target-model detection signals. Evaluated on the MIMIR benchmark against diverse laundering practices and target LLM families (Pythia, Llama2, and Falcon), SDR consistently restores detection signals, offering a practical auditing layer against data laundering.

2605.29526 2026-06-17 cs.CR cs.AI cs.LG Updated version

Temporal Motif-aware Graph Test-time Adaptation for OOD Blockchain Anomaly Detection

Runang He, Tongya Zheng, Huiling Peng, Yuanyu Wan, Bingde Hu, Jiawei Chen, Canghong Jin, Mingli Song, Can Wang

Institutions * State Key Laboratory of Blockchain and Data Security; Zhejiang Provincial Engineering Research Center for Real-Time SmartTech in Urban Security Governance; Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security

AI summary Proposes the TEMG-TTA framework, which captures temporal motif distributions and applies a test-time adaptation strategy to address pattern evolution and out-of-distribution problems in blockchain anomaly detection, with an average improvement of 54.88% on 5 datasets.

Comments Accepted to IJCAI-ECAI 2026, Special Track on AI for Social Good

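Details
AI abstract (translated from Chinese)

Ever-evolving transaction patterns severely hinder anomaly detection on emerging cryptocurrency blockchains, owing to the vast number of addresses and the diversity of anomalous behaviors. Recent advanced graph anomaly detection (GAD) methods applied to blockchains face two key challenges: adversarial pattern evolution by malicious actors and the out-of-distribution (OOD) problem caused by differing transaction semantics on blockchains. To address these challenges, we propose a novel framework called Temporal Motif-aware Graph Test-Time Adaptation (TEMG-TTA). First, we comprehensively capture the three-node temporal motif distribution of each active address through an efficient computational mechanism, enabling downstream temporal motif-aware graph learning. Second, we design a simple yet effective test-time adaptation strategy to promote the sharing of common patterns between training and test graphs. Extensive experiments on 5 real-world datasets show that TEMG-TTA outperforms state-of-the-art GAD methods by 54.88% on average. A further case study on interpretable motif patterns shows that TEMG-TTA explicitly characterizes the complex transaction patterns of anomalous addresses, validating the effectiveness of our technical design. Our code will be made public at https://github.com/LuoXishuang0712/TEMG-TTA/.

English abstract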

Ever-evolving transaction patterns have significantly hindered anomaly detection on emerging cryptocurrency blockchains due to the vast number of addresses and diverse anomalous behaviors. Recently, advanced Graph Anomaly Detection (GAD) approaches applied to blockchains have faced two critical challenges: adversarial pattern evolution by malicious actors and the out-of-distribution (OOD) problem caused by varied transaction semantics on blockchains. To address these challenges, we propose a novel framework termed TEmporal Motif-aware Graph Test-Time Adaptation (TEMG-TTA). First, we comprehensively capture the 3-node temporal motif distribution of each active address using an efficient computational mechanism, enabling downstream temporal motif-aware graph learning. Second, we design a simple yet effective test-time adaptation strategy to facilitate the sharing of common patterns between training and testing graphs. Extensive experiments on 5 real-world datasets demonstrate that our proposed TEMG-TTA outperforms state-of-the-art GAD approaches by an average of 54.88%. A further case study on interpretable motif patterns reveals that TEMG-TTA explicitly characterizes the complex transaction patterns of anomalous addresses, thereby verifying the effectiveness of our technical designs. Our code is publicly available at https://github.com/LuoXishuang0712/TEMG-TTA/.

2605.29179 2026-06-17 cond-mat.mtrl-sci cs.AI Updated version

Sustainable Metal-Organic Framework Water Harvesters in the Artificial Intelligence Era

Reid A. Coyle, Shyam Chand Pal, Peter Walther, Saeun Park, Bin Feng, Zhiling Zheng

Institutions * Department of Chemistry, Washington University; Institute of Materials Science & Engineering, Washington University

AI summary This article discusses the design principles of metal-organic frameworks (MOFs) for water harvesting under arid conditions and describes how artificial intelligence (AI), large language models (LLMs), and data mining can accelerate the discovery of high-performance sorbents.

Comments 10 pages of main text, 26 total pages. 3 Figures and 1 Table of Content Graphic

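Details
AI abstract (translated from Chinese)

Metal-organic frameworks (MOFs) are excellent candidates for water harvesting because of their tunable pore environments, which can be precisely engineered to capture and release water under arid conditions. Integrating artificial intelligence (AI) into MOF discovery can further accelerate the design of high-performance sorbents by identifying structural features that improve atmospheric water harvesting (AWH), stability, and cycling efficiency. In this Perspective, we examine key MOF design principles, including cooperative adsorption, operating relative humidity (RH), uptake capacity, hysteresis, and scalability. We highlight recent design advances, such as multivariate strategies and long-arm linker extension, and examine how these principles tune pore capacity and hydrophilicity while preserving stability and crystallinity. In addition, we discuss how AI, large language models (LLMs), and data mining can accelerate the discovery of next-generation MOF water harvesters through predictive synthesis, inverse design, and the elucidation of synthesis-structure-property relationships.

English abstract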

Metal-organic frameworks (MOFs) are excellent candidates for water harvesting due to their tunable pore environments, which can be precisely engineered to capture and release water in arid conditions. Integrating artificial intelligence (AI) into MOF discovery can further accelerate the design of high-performance sorbents by identifying structural features that enhance atmospheric water harvesting (AWH), stability, and cycling efficiency. In this Perspective, we examine key MOF design principles, including cooperative adsorption, operational relative humidity (RH), uptake capacity, hysteresis, and scalability. We highlight recent design advancements such as multivariate strategies and long-arm linker extension, and examine how these principles tune pore capacity and hydrophilicity, while preserving stability and crystallinity. Furthermore, we discuss how AI, large language models (LLMs), and data mining can accelerate the discovery process through predictive synthesis, inverse design, and elucidating synthesis-structure-property relationships for the next generation of MOF water harvesters.

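2605.23243 2026-06-17 cs.CR cs.AI Updated version

Are Frontier LLMs Ready for Cybersecurity? Evidence for Vertical Foundation Models from Dual-Mode Vulnerability Benchmarks

Vivek Dahiya, Sunny Nehra, Vipul Dholariya, Bhavik Shangari, Chandra Khatri

Institutions * super-intel.ai

AI summary Evaluates frontier large language models on cybersecurity tasks with a dual-mode benchmark (white-box function-level vulnerability detection and black-box web application security testing), finding problems such as high false positive rates and low coverage, whereas domain-specialized models improve performance substantially through structured methodology.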

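Details
AI abstract (translated from Chinese)

We evaluate whether frontier large language models are ready for cybersecurity through a dual-mode benchmark: white-box function-level vulnerability detection (VulnLLM-R, covering C/Java/Python) and black-box web application security testing (five production-style applications containing 118 real vulnerabilities across more than 20 CWE families, which we will open-source). We test six frontier models (GPT-5.4, Codex 5.3, Claude Opus 4.6, Sonnet 4.6, Gemini 3.1 Pro, and Gemini 3 Flash) and two domain-specialized models across four testing paradigms. Our findings are sobering: (1) every frontier model produces a 10-50% false positive rate in white-box detection, systematically over-predicting vulnerabilities; (2) in black-box testing, frontier models reach only 4-8% coverage of real vulnerabilities, rising to just 10-19% even with external security tools (Playwright MCP, Burp Suite MCP); (3) the structured penetration-testing methodology encoded in domain-specialized agents raises the detection rate for each family above 50%, showing that methodology, not scale, is the main lever; and (4) a domain-specialized defense model achieves the highest precision (0.904) and the lowest false positive rate (9.7%) of all models, on a single GPU. We identify the lack of structured security testing traces (end-to-end request/response sequences, failure-heavy data, multi-step attack chains) as the fundamental training-data bottleneck and propose self-play security testing as a data generation strategy. Our results make the case for vertical foundation models purpose-built for cybersecurity.

English abstract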

We evaluate whether frontier LLMs are ready for cybersecurity through a dual-mode benchmark: white-box function-level vulnerability detection (VulnLLM-R, across C/Java/Python) and black-box web application security testing (five production-style applications with 118 ground-truth vulnerabilities across 20+ CWE families, which we will open-source). We test six frontier models (GPT-5.4, Codex 5.3, Claude Opus 4.7, Sonnet 4.6, Gemini 3.1 Pro and Gemini 3 Flash) and two domain-specialized models across four testing paradigms. Our findings are sobering: (1) every frontier model produces 10-50% false positive rates in white-box detection, systematically over-predicting vulnerabilities; (2) in black-box testing, frontier models achieve only 4-8% ground-truth coverage, improving to just 10-19% even with external security tools (Playwright MCP, Burp Suite MCP); (3) structured penetration-testing methodology encoded in domain-specialized agents raises per-family detection above 50%, demonstrating that methodology, not scale, is the primary lever; and (4) a domain-specialized defense model achieves the highest precision (0.904) and lowest false positive rate (9.7%) among all models, on a single GPU. We identify the absence of structured security testing traces (end-to-end request/response sequences, failure-heavy data, and multi-step attack chains) as the fundamental training data bottleneck, and propose self-play security testing as a data generation strategy. Our results make the case for vertical foundation models purpose-built for cybersecurity.

2602.14211 2026-06-17 cs.CR cs.AI Updated version

SkillJect: Effectively Automating Skill-Based Prompt Injection for Skill-Enabled Agents

Xiaojun Jia, Jie Liao, Simeng Qin, Jindong Gu, Wenqi Ren, Xiaochun Cao, Yang Liu, Philip Torr

Institutions * Nanyang Technological University, Singapore; Chongqing University, China; Northeastern University, China; Sun Yat-sen University, China; University of Oxford, UK

AI summary SkillJect is the first framework to automate the generation of effective poisoned skills; by hiding the malicious payload and rewriting the instruction channel, it improves attack effectiveness and reveals a persistent attack vector in reusable skill ecosystems.

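Details
AI abstract (translated from Chinese)

By hiding the malicious payload and rewriting the instruction channel, SkillJect effectively automates skill-based prompt injection, improves attack effectiveness against skill-enabled agents, and reveals a persistent attack vector in reusable skill ecosystems.

English abstract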

Agent skills extend LLM agents with task-specific instructions, executable scripts, and auxiliary resources, improving reusability but creating a new supply-chain attack surface. A malicious or compromised skill can be repeatedly loaded as trusted guidance and steer downstream tool use. Existing skill-based prompt-injection attacks are often manual and brittle, because explicit malicious instructions are rejected or ignored when they are not aligned with the original workflow. We propose SkillJect, the first automated framework for generating poisoned skills against skill-enabled agent systems. SkillJect uses two coordinated channels. In the artifact channel, it hides the payload inside an auxiliary helper script. In the instruction channel, it rewrites SKILL.md with a front-loaded inducement strategy, placing injected content at the beginning and framing the helper script as a mandatory prerequisite or initialization step. The rewritten instruction explicitly references the helper-script path and provides an executable example command, making the helper appear to be a legitimate setup step before normal skill operations. SkillJect further adopts a closed-loop multi-agent process to improve attack effectiveness. An Attack Agent generates poisoned skills, a Victim Agent executes downstream tasks with the poisoned skill, and an Evaluate Agent inspects execution traces to determine whether the hidden payload was executed. The Attack Agent then uses this feedback to diagnose failure causes and rewrite SKILL.md, while keeping the payload fixed. Experiments across skill-enabled platforms, backend LLMs, and attack categories show that SkillJect substantially outperforms naive direct injection and prior manual skill-injection attacks, highlighting poisoned skills as a persistent threat in reusable skill ecosystems.

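2511.19162 2026-06-17 cs.IR cs.CY cs.HC cs.LG cs.MM Updated version

BioArtlas: Computational Clustering of Multi-Dimensional Complexity in Bioart

Joonhyung Bae

Institutions * Graduate School of Culture Technology

AI summary This paper presents BioArtlas, which analyzes 81 bioart works along multiple dimensions using a novel axis-aware representation, reveals four organizational patterns, and offers analysis and exploration through an interactive web interface.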

Comments Bae, J. BioArtlas: Computational Clustering of Multi-Dimensional Complexity in Bioart. In The Thirty-ninth Annual Conference on Neural Information Processing Systems Creative AI Track: Humanity

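Details
AI abstract (translated from Chinese)

The hybrid nature of bioart spans art, science, technology, ethics, and politics, challenging traditional single-axis classification. I present BioArtlas, which analyzes 81 bioart works across thirteen curated dimensions using a novel axis-aware representation. Our codebook approach groups related concepts into unified clusters, resolving the polysemy of cultural terminology. A comprehensive evaluation of up to 800 combinations of representation space and algorithm finds that agglomerative clustering at k=15 on 4D UMAP is optimal (silhouette coefficient 0.664±0.008, trustworthiness/continuity 0.805/0.812). The approach reveals four organizational patterns: artist-specific methodological cohesion, technique-based segmentation, temporal artistic evolution, and conceptual affinities across time. By separating analytical optimization from public communication, I provide both rigorous analysis and accessible exploration through an interactive web interface (https://www.bioartlas.com), with the dataset publicly available (https://github.com/joonhyungbae/BioArtlas).

English abstract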

Bioart brings living material into artistic practice, where a single work can be at once an aesthetic object, a scientific instrument, and an ethical provocation. Traditional categories sort such works along one axis at a time, which flattens the very hybridity that defines the field and leaves curators no way to compare works across many dimensions together. I introduce BioArtlas, a computational atlas that represents each bioartwork along many curated dimensions at once and organizes the field by conceptual similarity rather than by medium or chronology. My method embeds the keywords of all 81 works on each of thirteen interpretive axes, groups related concepts into a shared codebook that tames inconsistent terminology, and then searches systematically for a clustering that is both statistically clean and interpretable. Among the methods that place every work on the map, agglomerative clustering separates the field far more cleanly than the usual k-means baseline (silhouette 0.664 versus 0.483), whereas density-based methods reach higher scores only by discarding most of the corpus as noise. By separating rigorous analysis from public storytelling, BioArtlas turns the tangled complexity of bioart into a navigable landscape, openly available as an interactive interface (https://www.bioartlas.com) and dataset (https://github.com/joonhyungbae/BioArtlas).

2604.01197 2026-06-17 quant-ph cond-mat.stat-mech cs.CC cs.LG Updated version

Learning and Generating Mixed States Prepared by Shallow Channel Circuits

Fangjun Hu, Christian Kokail, Milan Kornjača, Pedro L. S. Lopes, Weiyuan Gong, Sheng-Tao Wang, Xun Gao, Stefan Ostermann

Institutions * QuEra Computing Inc.; School of Engineering and Applied Sciences, Harvard University

AI summary Studies the problem of learning mixed states generated by shallow channel circuits and proves that, within a specific phase, one can efficiently learn to generate such mixed states from measurement data alone, providing a structural foundation for quantum generative models.

Comments 44 pages, 14 figures, 1 table

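Details
AI abstract (translated from Chinese)

Learning quantum states from measurement data is a central problem in quantum information and computational complexity. This paper studies the problem of learning to generate mixed states on a finite-dimensional lattice. Motivated by recent developments in mixed-state phases of matter, we focus on arbitrary states in the trivial phase. A state belongs to the trivial phase if there exists a shallow preparation channel circuit that preserves local reversibility throughout the preparation. We prove that such mixed states can be learned efficiently from measurement access alone. Specifically, given multiple copies of an unknown trivial-phase mixed state, our algorithm outputs a shallow local channel circuit that approximately generates the state. The sample complexity and runtime are polynomial (or quasi-polynomial) in the number of qubits, assuming constant (or polylogarithmic) circuit depth and gate locality. Importantly, the learner is not given the original preparation circuit and relies only on its existence. Our results provide a structural foundation for quantum generative models based on shallow channel circuits. In the classical limit, our framework also inspires an efficient algorithm for classical diffusion models that incurs only polynomial overhead in training and generation.

English abstract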

Learning quantum states from measurement data is a central problem in quantum information and computational complexity. In this work, we study the problem of learning to generate mixed states on a finite-dimensional lattice. Motivated by recent developments in mixed state phases of matter, we focus on arbitrary states in the trivial phase. A state belongs to the trivial phase if there exists a shallow preparation channel circuit under which local reversibility is preserved throughout the preparation. We prove that any mixed state in this class can be efficiently learned from measurement access alone. Specifically, given copies of an unknown trivial phase mixed state, our algorithm outputs a shallow local channel circuit that approximately generates this state in trace distance. The sample complexity and runtime are polynomial (or quasi-polynomial) in the number of qubits, assuming constant (or polylogarithmic) circuit depth and gate locality. Importantly, the learner is not given the original preparation circuit and relies only on its existence. Our results provide a structural foundation for quantum generative models based on shallow channel circuits. In the classical limit, our framework also inspires an efficient algorithm for classical diffusion models using only a polynomial overhead of training and generation.

2605.12729 2026-06-17 cs.NI cs.AI cs.CR Updated version

Large Language Models for Agentic NetOps and AIOps: Architectures, Evaluation, and Safety

Muhammad Bilal, Jon Crowcroft, Ruizhi Wang, Xiaolong Xu, Schahram Dustdar

Institutions * School of Computing and Communications; University of Cambridge; School of Software; Nanjing University of Information Science and Technology; TU Wien; ICREA

AI summary This paper examines the use of large language models in NetOps and AIOps, analyzing agent architectures, evaluation methods, and safety challenges, and stresses that system reliability depends on the mechanisms around the model rather than on the model itself.

Comments 49 pages, 15 figures, 6 tables; survey article

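Details
AI abstract (translated from Chinese)

Large language models are increasingly used to support network operations (NetOps) and artificial intelligence for IT operations (AIOps), including incident investigation, root-cause analysis, configuration synthesis, and limited self-healing. In NetOps and AIOps, this shift is changing how tasks are managed. Agent-based operations run as workflows, from gathering evidence to taking action, following permissions, policies, and checks, and providing rollback options when necessary. This matters because operational decisions can take effect immediately. To make the argument concrete, we organize the literature around the hierarchy of autonomy, tool scope, evidence traces, and assurance contracts. These contracts define what an agent may observe, propose, and execute, as well as the checks that must pass before any action is allowed. A consistent pattern emerges across studies of telemetry query recommendation, diagnosis, root-cause analysis, configuration synthesis, change planning, and limited self-healing: operational reliability comes chiefly not from the model itself but from the mechanisms around it. We also argue that evaluation should go beyond static question answering. Agentic NetOps and AIOps systems need workflow-centred evaluation, including trace quality, bounded tool use, safe proposal generation, replay in sandboxed environments, and rollback-aware trials. Without these measures, a system may appear robust while in fact being too fragile. Finally, we examine the security, privacy, and governance risks that become acute as agents approach operational control surfaces. Taken together, the paper concludes that progress in intelligent NetOps and AIOps will depend on treating autonomy as a constrained operational control problem whose outputs must be reliable, auditable, and safe to deploy.

English abstract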

Large language models are increasingly being used to support network operations (NetOps) and artificial intelligence for IT operations (AIOps), including incident investigation, root-cause analysis, configuration synthesis, and limited self-healing. In both NetOps and AIOps, this shift is changing how tasks are managed. Agent-based operations work as workflows, from gathering evidence to taking action, following permissions, policies, and checks, and providing rollback options when necessary. This is crucial because operational decisions can have instant impacts. To make the argument concrete, we organise the relevant literature around the hierarchy of autonomy, tool scope, evidence traces, and assurance contracts. These contracts define what an agent may observe, propose, and execute. They also define the checks that must pass before any action is allowed. A consistent pattern appears across work on telemetry query recommendation, diagnosis, root-cause analysis, configuration synthesis, change planning, and limited self-healing. Operational reliability does not come chiefly from the model itself. It depends on the machinery around the model. We also argue that evaluation should go beyond static question answering. Agentic NetOps and AIOps systems require workflow-centred evaluation, including trace quality, bounded tool use, safe proposal generation, replay in sandboxed environments, and canary trials with rollback-aware scoring. Without these measures, a system may appear robust yet remain too fragile. Finally, we examine security, privacy, and governance risks that become acute when agents sit close to operational control surfaces. Taken together, the survey concludes that progress in intelligent NetOps and AIOps will depend on treating autonomy as a constrained operational control problem, whose outputs must be reliable, auditable, and securely deployable.

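2604.23628 2026-06-17 cs.DS cs.LG Updated version

Characterizing Admissible Objective Functions for Hierarchical Clustering

Ryuki Tsukuba, Kazutoshi Ando

Institutions * Faculty of Engineering, Shizuoka University; Graduate School of Integrated Science and Technology, Shizuoka University

AI summary This paper studies admissible objective functions for hierarchical clustering. For sum-type objective functions based on aggregate similarity, it fully characterizes admissibility when the scaling function is a symmetric polynomial of degree at most 2 and gives sufficient conditions for degree 3; it also introduces max-type objective functions and characterizes their admissibility for arbitrary symmetric scaling functions.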

Comments 20 pages, 3 figures. Minor correction to abstract metadata. Manuscript unchanged from v2. Submitted to Discrete Applied Mathematics

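Details
AI abstract (translated from Chinese)

Hierarchical clustering is a fundamental task in data analysis, but classical methods have long lacked a principled objective function. Dasgupta [STOC 2016] took an important step toward filling this gap by proposing a well-motivated objective function for cluster trees. Cohen-Addad et al. [J. ACM 2019] subsequently introduced the notion of admissibility: an objective function is admissible if, whenever the input similarity matrix admits generating trees, its minimizers are exactly the trees that generate the matrix. They also gave a necessary and sufficient condition for admissibility within a class of objective functions based on aggregate intercluster similarity. We call this class sum-type objective functions. However, apart from Dasgupta's original objective function, no explicit admissible objective function in the class was given. This paper studies admissible objective functions for hierarchical clustering in two directions. For sum-type objective functions, we give a complete characterization when the scaling function is a symmetric polynomial of degree at most 2 and derive sufficient conditions for polynomials of degree 3. We also prove that the recursive sparsest cut algorithm achieves an $O(\phi)$ approximation ratio for the admissible objective functions covered by our characterization, where $\phi$ is the approximation factor of the sparsest cut subroutine. We then introduce max-type objective functions, in which the interaction between clusters is measured by the maximum intercluster similarity rather than the aggregate similarity. For this class, we characterize which objective functions are admissible for arbitrary symmetric scaling functions and give a complete characterization when the scaling function is a symmetric polynomial of degree at most 2.

English abstract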

Hierarchical clustering is a fundamental task in data analysis, but classical methods have long lacked a principled objective function. Dasgupta [STOC 2016] took an important step toward addressing this gap by proposing a well-motivated objective function for cluster trees. Cohen-Addad et al. [J. ACM 2019] subsequently introduced the notion of admissibility: an objective function is admissible if, whenever the input similarity matrix admits generating trees, its minimizers are precisely those generating trees. They also gave a necessary and sufficient condition for admissibility within a family of objective functions based on aggregate intercluster similarity. We refer to this family as sum-type objective functions. However, apart from Dasgupta's original objective function, no explicit admissible objective functions in this family were provided. In this paper, we study admissible objective functions for hierarchical clustering in two directions. For sum-type objective functions, we give a complete characterization when the scaling function is a symmetric polynomial of degree at most two, and we derive sufficient conditions for degree-three polynomials. We also show that the recursive sparsest cut algorithm achieves an $O(\phi)$-approximation ratio for the admissible objective functions covered by our characterization, where $\phi$ is the approximation factor of the sparsest cut subroutine. We then introduce max-type objective functions, where cluster interaction is measured by maximum, rather than aggregate, intercluster similarity. For this class, we characterize which objective functions are admissible for arbitrary symmetric scaling functions and give a complete characterization when the scaling function is a symmetric polynomial of degree at most two.
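The objective functions discussed above can be sketched for binary cluster trees. The version below assumes the standard form in which each internal node with children A and B contributes the intercluster similarity between A and B times a scaling function g(|A|, |B|); Dasgupta's objective corresponds to g(a, b) = a + b. The `aggregate` parameter switches between a sum-type and a max-type reading of the abstract. All names and the toy similarities are hypothetical, and the paper's exact definitions may differ.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple, Union

Tree = Union[int, Tuple["Tree", "Tree"]]   # a leaf index or a (left, right) pair


def leaves(tree: Tree) -> List[int]:
    """List the leaf indices under a node."""
    if isinstance(tree, int):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def tree_cost(tree: Tree,
              sim: Dict[FrozenSet[int], float],
              scaling: Callable[[int, int], float] = lambda a, b: a + b,
              aggregate=sum) -> float:
    """Sum, over internal nodes, of aggregated cross-similarity times scaling."""
    if isinstance(tree, int):
        return 0.0
    left, right = tree
    a, b = leaves(left), leaves(right)
    cross = aggregate(sim[frozenset((i, j))] for i in a for j in b)
    return (cross * scaling(len(a), len(b))
            + tree_cost(left, sim, scaling, aggregate)
            + tree_cost(right, sim, scaling, aggregate))


# Toy similarities: points 0 and 1 are close, point 2 is far from both.
sim = {frozenset((0, 1)): 1.0, frozenset((0, 2)): 0.1, frozenset((1, 2)): 0.1}
good_tree = ((0, 1), 2)    # merges the similar pair first
bad_tree = (0, (1, 2))     # splits the similar pair at the root

sum_good = tree_cost(good_tree, sim)
sum_bad = tree_cost(bad_tree, sim)
max_good = tree_cost(good_tree, sim, aggregate=max)
max_bad = tree_cost(bad_tree, sim, aggregate=max)
```

On this toy input both variants assign the lower cost to the tree that merges the similar pair first, which matches the intuition behind admissibility: the tree that generates the similarities should be a minimizer.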

2604.16450 2026-06-17 cs.CY cs.LG q-bio.QM updated version

Evaluating Intersectional Fairness across Clinical Machine Learning Use Cases using Fairlogue and the All of Us Research Program

Nick Souligne, Vignesh Subbian

Affiliations * College of Engineering, The University of Arizona

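AI summary This paper uses the Fairlogue toolkit to evaluate intersectional fairness in clinical prediction tasks. It finds that disparities across intersectional groups are larger than single-axis analysis shows, but counterfactual diagnostics indicate that most disparities are comparable to those under random group assignment.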

Comments 10 pages, 7 figures, Accepted at the AMIA Annual Symposium 2026

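AI abstract (translated from Chinese)

Intersectional biases in healthcare data can produce compound disparities in clinical machine learning models, yet most fairness evaluations assess demographic attributes independently. FairLogue, a toolkit for intersectional fairness auditing, was applied to several clinical prediction tasks to evaluate disparities across combined demographic groups. Using the All of Us dataset, two published models were selected for replication and evaluation: (A) prediction of bleeding events associated with selective serotonin reuptake inhibitors and (B) two-year stroke risk in patients with atrial fibrillation. Observational fairness metrics were computed across race, gender, and intersectional subgroups, followed by a counterfactual analysis to assess whether the disparities could be attributed to group membership. The intersectional evaluation revealed larger disparities than the single-axis analyses; however, counterfactual diagnostics indicated that most observed disparities were comparable to those expected under random group membership. These results underline the importance of intersectional fairness auditing and show how FairLogue offers deeper insight into bias in clinical machine learning systems.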

English abstract

Intersectional biases in healthcare data can produce compound disparities in clinical machine learning models, yet most fairness evaluations assess demographic attributes independently. FairLogue, a toolkit for intersectional fairness auditing, was applied across multiple clinical prediction tasks to evaluate disparities across combined demographic groups. Using the All of Us dataset, two published models were selected for replication and evaluation: (A) prediction of selective serotonin reuptake inhibitor associated bleeding events and (B) two-year stroke risk in patients with atrial fibrillation. Observational fairness metrics were computed across race, gender, and intersectional subgroups, followed by counterfactual analysis to evaluate whether disparities were attributable to group membership. Intersectional evaluation revealed larger disparities than single-axis analyses; however, counterfactual diagnostics indicated that most observed disparities were comparable to those expected under randomized group membership. These results highlight the importance of intersectional fairness auditing and demonstrate how FairLogue provides deeper insight into bias in clinical machine learning systems.
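
The contrast between single-axis and intersectional disparities, and the comparison against randomized group membership, can be illustrated with a small Python sketch. This is not FairLogue's API and not the paper's counterfactual procedure; it uses a simple label permutation as a stand-in, and the function names (`positive_rate_gap`, `permutation_baseline`) and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def positive_rate_gap(groups, preds):
    """Largest difference in positive-prediction rate between any two groups."""
    totals, positives = defaultdict(int), defaultdict(int)
    for g, p in zip(groups, preds):
        totals[g] += 1
        positives[g] += p
    rates = [positives[g] / totals[g] for g in totals]
    return max(rates) - min(rates)


def permutation_baseline(groups, preds, n_perm=1000, seed=0):
    """Gaps obtained when group labels are shuffled, i.e. assigned at random."""
    rng = random.Random(seed)
    shuffled = list(groups)
    gaps = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        gaps.append(positive_rate_gap(shuffled, preds))
    return gaps


# Toy data (made up): each axis looks balanced, the intersection does not.
race = ["A", "A", "A", "A", "B", "B", "B", "B"]
gender = ["F", "F", "M", "M", "F", "F", "M", "M"]
preds = [1, 1, 0, 0, 0, 0, 1, 1]
intersection = list(zip(race, gender))

race_gap = positive_rate_gap(race, preds)
gender_gap = positive_rate_gap(gender, preds)
observed = positive_rate_gap(intersection, preds)

# Share of random assignments whose gap is at least as large as the observed one.
baseline = permutation_baseline(intersection, preds, n_perm=200)
share = sum(gap >= observed for gap in baseline) / len(baseline)
print(race_gap, gender_gap, observed, share)
```

In this toy case each single axis shows no gap while the intersectional subgroups differ maximally. With subgroups this small, random assignment can also produce large gaps, which is why a randomized baseline is a useful check before attributing a disparity to group membership.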

2511.09204 2026-06-17 quant-ph cs.LG updated version

Resource-Efficient Variational Quantum Classifier

Petr Ptáček, Paulina Lewandowska, Ryszard Kukulski

Affiliations * IT4Innovations, VSB - Technical University of Ostrava; Faculty of Electrical Engineering and Computer Science, VSB - Technical University of Ostrava

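AI summary Proposes an unambiguous quantum classifier based on Hamming distance measurements and classical post-processing, which improves classification performance through more effective use of ansatz expressivity, substantially reduces the number of circuit evaluations, and is more robust to noise.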

Comments 13 pages, 7 figures, 1 table; current format of preprint template

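AI abstract (translated from Chinese)

We introduce the unambiguous quantum classifier, based on Hamming distance measurements combined with classical post-processing. The method improves classification performance by using the expressivity of the ansatz more effectively, while significantly reducing the number of circuit evaluations. It also shows enhanced robustness to noise, which is crucial for near-term quantum devices. We evaluate the proposed method on a breast cancer classification dataset. The unambiguous classifier achieves an average accuracy of 90%, an improvement of 6.9 percentage points over the baseline, while requiring eight times fewer circuit executions per prediction. In the presence of noise, the improvement drops to about 3.1 percentage points, with the same reduction in execution cost. We support the experimental results with theoretical evidence for the method's practical performance.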

English abstract

We introduce the unambiguous quantum classifier based on Hamming distance measurements combined with classical post-processing. The proposed approach improves classification performance through a more effective use of ansatz expressivity, while requiring significantly fewer circuit evaluations. Moreover, the method demonstrates enhanced robustness to noise, which is crucial for near-term quantum devices. We evaluate the proposed method on a breast cancer classification dataset. The unambiguous classifier achieves an average accuracy of 90%, corresponding to an improvement of 6.9 percentage points over the baseline, while requiring eight times fewer circuit executions per prediction. In the presence of noise, the improvement is reduced to approximately 3.1 percentage points, with the same reduction in execution cost. We substantiate our experimental results with theoretical evidence supporting the practical performance of the approach.

2603.18897 2026-06-17 cs.DC cs.AI updated version

Parallelizing Tool Execution and LLM Generation for Low-Latency Agent Serving

Yifan Sui, Han Zhao, Rui Ma, Zhiyuan He, Hao Wang, Jianxun Li, Kaiqiang Xu, Kai Chen, Yuqing Yang

Affiliations * Shanghai Jiao Tong University; Microsoft Research; Stevens Institute of Technology; Google; Hong Kong University of Science and Technology

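AI summary Proposes PASTE, a system that predictively executes future tool calls in parallel with LLM generation, reducing average task completion time by 43.5%.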

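AI abstract (translated from Chinese)

LLM-based agents carry out tasks through a sequential loop of model generation and tool execution. Current serving systems serialize this loop, which leaves tool latency exposed on the task's critical path. This paper presents PASTE, a tool-aware agent-serving system that predicts concrete future tool calls from recurring agent patterns and executes them speculatively while the LLM is still generating. PASTE isolates speculative results until the LLM confirms them, and jointly schedules tool execution and returning LLM sessions to avoid shifting the bottleneck to the GPU. On deep research, coding, and scientific-agent workloads, PASTE reduces average task completion time by 43.5% and lowers observed tool latency by 1.8x.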

English abstract

LLM-powered agents execute tasks through a sequential loop of model generation and tool execution. Today's serving systems serialize this loop, leaving tool latency exposed on the task critical path. This paper presents PASTE, a tool-aware agent-serving system that predicts concrete future tool invocations from recurring agent patterns and executes them speculatively while the LLM is still generating. PASTE isolates speculative results until confirmed by the LLM and jointly schedules tool execution and returning LLM sessions to avoid shifting bottlenecks to the GPU. Across deep research, coding, and scientific-agent workloads, PASTE reduces average task completion time by 43.5% and lowers observed tool latency by 1.8x.
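
The core idea of overlapping a predicted tool call with generation can be sketched in a few lines of Python with `asyncio`. This is only an illustration of speculative execution under stated assumptions, not PASTE's implementation: `run_tool`, `generate`, and `agent_step` are hypothetical stand-ins, the prediction is passed in by hand rather than learned from agent patterns, and the tool is assumed to be free of side effects so that a discarded speculative run is harmless.

```python
import asyncio


async def run_tool(call):
    """Stand-in for a slow tool invocation (hypothetical)."""
    await asyncio.sleep(0.05)
    return f"result of {call}"


async def generate(actual_call):
    """Stand-in for LLM decoding that eventually emits a tool call."""
    await asyncio.sleep(0.05)
    return actual_call


async def agent_step(predicted_call, actual_call):
    # Start the predicted tool call while the model is still generating.
    speculative = asyncio.create_task(run_tool(predicted_call))
    emitted = await generate(actual_call)
    if emitted == predicted_call:
        # Prediction confirmed: only now is the speculative result released.
        return await speculative, True
    # Misprediction: drop the speculative work and run the real call.
    speculative.cancel()
    return await run_tool(emitted), False


hit = asyncio.run(agent_step("search('q')", "search('q')"))
miss = asyncio.run(agent_step("search('q')", "open('doc')"))
print(hit, miss)
```

When the prediction is right, the tool and the generation run concurrently, so the step takes roughly the longer of the two delays instead of their sum. When it is wrong, the step falls back to the ordinary sequential path.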

2503.17867 2026-06-17 cs.CR cs.AI cs.LG cs.NI updated version

Detecting and Mitigating DDoS Attacks with AI: A Survey

Alexandru Apostu, Silviu Gheorghe, Andrei Hîji, Nicolae Cleju, Andrei Pătraşcu, Cristian Rusu, Radu Ionescu, Paul Irofti

Affiliations * Department of Computer Science, University of Bucharest

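AI summary This paper surveys AI-based methods for detecting and mitigating DDoS attacks, provides a taxonomy based on expert hierarchies and an AI-generated dendrogram, and discusses datasets, adversarial training, and future research directions.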

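AI abstract (translated from Chinese)

Distributed denial-of-service (DDoS) attacks are an active problem in cybersecurity research. Recent work has shifted from static rule-based defenses to AI-based detection and mitigation. This survey covers several key topics. First, it discusses state-of-the-art AI detection methods. It provides an in-depth taxonomy based on manually built expert hierarchies and an AI-generated dendrogram, which resolves ambiguities in DDoS categorization. It then discusses the available datasets, covering data format options and their role in training AI detection methods, along with adversarial training and example augmentation. Beyond detection, it also surveys AI-based mitigation techniques. Finally, it proposes several open research directions.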

English abstract

Distributed Denial of Service attacks represent an active cybersecurity research problem. Recent research shifted from static rule-based defenses towards AI-based detection and mitigation. This comprehensive survey covers several key topics. Preeminently, state-of-the-art AI detection methods are discussed. An in-depth taxonomy based on manual expert hierarchies and an AI-generated dendrogram are provided, thus settling DDoS categorization ambiguities. An important discussion on available datasets follows, covering data format options and their role in training AI detection methods together with adversarial training and examples augmentation. Beyond detection, AI based mitigation techniques are surveyed as well. Finally, multiple open research directions are proposed.