arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.17109 2026-06-17 cs.CR cs.AI cs.LG New submission

Timestamp-Aware Spatio-Temporal Graph Contrastive Learning for Network Intrusion Detection

Jianli Dai, Guangwei Wu, Jiacheng Li, Weiping Wang, An He, Xinjun Xiao

Affiliations: Central South University of Forestry and Technology, School of Computer Science and Mathematics; Central South University, School of Computer Science and Engineering

AI summary: Proposes a self-supervised graph neural network framework that builds temporal graphs from timestamps, encodes spatio-temporal dependencies with E-GraphSAGE and an LSTM, and applies multi-view graph contrastive learning (temporal, spatial, and feature contrasts) with an adaptive weighting strategy, reaching performance comparable to supervised methods on four datasets.

Abstract

Given their effectiveness in modeling the relational structure among network traffic flows, graph neural networks (GNNs) have been widely adopted in network intrusion detection systems (NIDSs). However, most existing GNN-based NIDS approaches focus on the relational structure of traffic flows, and treat them as temporally independent, which limits their ability to cope with evolving attack behaviors. Moreover, their reliance on supervised or semi-supervised learning often restricts generalization to unseen attacks. To address these limitations, we propose a novel self-supervised GNN-based framework. To the best of our knowledge, the proposed model is among the first self-supervised GNN-based NIDS models to explicitly leverage real timestamps, which provides faithful temporal dependencies for representation learning. We first construct a series of temporal graphs from network traffic flows according to their timestamps, and then employ an E-GraphSAGE and LSTM based encoder to fully extract temporal information and spatial dependencies of network traffic, without introducing time-costly attention mechanisms. A multi-view graph contrastive learning (GCL) scheme is introduced, where temporal, spatial, and feature contrasts are jointly performed to capture temporal continuity, preserve structural consistency, and improve the generalization and robustness of the learned representations, respectively. In addition, a gradient-norm-based adaptive weighting strategy is designed to optimize the contrastive loss weights. Experimental results on four representative NIDS datasets with real timestamps demonstrate that our method significantly outperforms existing self-supervised approaches and achieves performance comparable to the supervised state-of-the-art GNN method, while maintaining high computational efficiency.
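The first step described above, grouping flows into a series of temporal graphs by timestamp, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the fixed window length, the `Flow` record, and the `build_snapshots` helper are all hypothetical, and flows are stored as edges carrying features in the spirit of E-GraphSAGE.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Flow:
    # Hypothetical flow record; field names are assumptions for illustration.
    src: str
    dst: str
    timestamp: float  # seconds
    features: tuple


def build_snapshots(flows, window):
    """Group flows into consecutive time windows of length `window`.

    Each snapshot is a list of (src, dst, features) edges ordered by time.
    Windows that contain no flows are skipped.
    """
    if not flows:
        return []
    t0 = min(f.timestamp for f in flows)
    buckets = defaultdict(list)
    for f in flows:
        buckets[int((f.timestamp - t0) // window)].append(f)
    return [
        [(f.src, f.dst, f.features)
         for f in sorted(buckets[k], key=lambda f: f.timestamp)]
        for k in sorted(buckets)
    ]
```

A sequence encoder such as an LSTM would then consume one embedding per snapshot, in window order.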

2606.17104 2026-06-17 cs.AR cs.AI cs.DC New submission

Prefill/Decode-Aware Evaluation of LLM Inference on Emerging AI Accelerators

Shun Usami, Venkatram Vishwanath, E. Wes Bethel

Affiliations: Department of Computer Science, San Francisco State University; Argonne National Laboratory; Lawrence Berkeley National Laboratory

AI summary: By measuring the Prefill and Decode phases separately, this paper evaluates the inference performance of GPUs and emerging AI accelerators on Llama2-7B. GPUs lead in the compute-intensive Prefill phase, GroqRack achieves lower Decode latency, and GPUs regain the lead in throughput as batch size grows.

Comments 8 pages, 5 figures. Accepted to the Workshop on HPC for AI Foundation Models & LLMs for Science (HPAI4S'26), co-located with IEEE IPDPS 2026

Abstract

As large language models (LLMs) are increasingly deployed in latency- and cost-sensitive settings, inference efficiency has become a central systems challenge. While GPUs dominate current deployments, a growing number of AI accelerators claim advantages for LLM inference, yet it remains unclear under which conditions such accelerators outperform GPUs in practice. Recent inference systems decompose execution into Prefill and Decode phases, which exhibit distinct computational characteristics and latency metrics, commonly captured by time to first token (TTFT) and time per output token (TPOT). This paper presents a phase-aware evaluation of LLM inference performance across GPUs and emerging AI accelerators using a common model, Llama2-7B. By separately measuring Prefill and Decode performance, we reveal that accelerator advantages differ by phase and metric. Our results show that GPUs consistently excel in the compute-intensive Prefill phase, while GroqRack achieves significantly lower TPOT during Decode (batching not currently supported). However, GPUs regain an advantage in Decode throughput as batch size increases. These findings demonstrate that each platform exhibits distinct phase-dependent strengths. We further analyze heterogeneous Prefill/Decode disaggregation across different accelerator platforms, identifying performance gains and the workload and network conditions under which such gains are realized.
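The two latency metrics named above can be computed from per-token arrival times. The sketch below uses one common convention, in which TPOT averages the gaps after the first token. The function name and the input layout are assumptions, and the paper's exact measurement procedure is not given in the abstract.

```python
def ttft_and_tpot(request_time, token_times):
    """Return (TTFT, TPOT) in the same time unit as the inputs.

    TTFT: delay from the request to the first output token (Prefill-dominated).
    TPOT: mean gap between consecutive output tokens (Decode-dominated).
    """
    if not token_times:
        raise ValueError("need at least one output token")
    ttft = token_times[0] - request_time
    if len(token_times) == 1:
        return ttft, 0.0
    tpot = (token_times[-1] - token_times[0]) / (len(token_times) - 1)
    return ttft, tpot
```

Separating the two numbers this way is what allows a platform to win on one phase and lose on the other.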

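2606.17099 2026-06-17 cs.SE cs.AI New submission

Software Delegation Contracts: Measuring Reviewability in AI Coding-Agent Work

Vincent Schmalbach

Affiliations: Independent Researcher

AI summary: Studies whether explicit delegation contracts improve the reviewability of AI coding-agent work. In the experiments, contracts did not improve task correctness but significantly improved evidence sufficiency and reduced reviewer ambiguity.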

Comments 11 pages; empirical pilot study with 64 coding-agent runs and 192 blinded reviews

Abstract

AI coding agents increasingly accept assigned software tasks, modify repositories under bounded authority, and return work packages for review. Prior work proposed the software delegation contract, covering the task, authority, returned work package, and acceptance context, as the unit of analysis for delegated coding work, but did not measure its effects. This paper reports a controlled pilot study of explicit delegation contracts for coding agents. We built a dependency-free TypeScript API task environment with seeded defects and documentation gaps, authored ten tasks across five families, and ran 64 agent executions across two model tiers under three conditions: a realistic issue-style prompt, an explicit delegation contract, and a contract with a required evidence bundle. Each run was scored with hidden acceptance tests, mutation checks, and scope analysis, then reviewed by three independent condition-blinded model-based reviewers using a fixed rubric, for 192 reviews. Explicit contracts did not improve objective task outcomes: all 64 runs passed hidden acceptance checks, with zero scope violations. They did improve reviewability. Evidence sufficiency improved in 22 of 30 paired comparisons and worsened in none (+0.83 on a 5-point scale, p < 0.0001, Cliff's delta = 0.66); reviewer ambiguity decreased (p = 0.035); changed-file lists, known-limitations sections, residual-risk sections, and reviewer checklists appeared mostly or only when demanded by the contract. Contracts cost +13% agent tokens and +38% wall-clock time, with larger effects for the weaker model tier. On these small tasks, delegation contracts bought reviewability rather than correctness.
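The effect size reported above, Cliff's delta, has a simple definition: the probability that a score from one group exceeds a score from the other, minus the reverse probability. The sketch below implements the standard unpaired form. Whether the study used this form or a paired variant is not stated in the abstract, and the function name is illustrative.

```python
def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all cross pairs (x, y).

    Ranges from -1 to 1; 0 means neither group tends to score higher.
    """
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))
```

Because it depends only on orderings, the statistic suits ordinal rubric scores such as a 5-point scale.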

2606.17092 2026-06-17 cs.CR cs.CL New submission

Securing Multi-Agent GIS Systems: Risk Evaluation and Prompt Hardening Optimization

Kyle Gao, Pranavi Kotta, Linlin Xu, Jonathan Li, David A. Clausi

Affiliations: Department of Systems Design Engineering, University of Waterloo; Department of Mechatronics Engineering, University of Waterloo; Department of Geomatics Engineering, University of Calgary; Department of Geography and Environmental Management, University of Waterloo

AI summary: Addresses security risks in multi-agent GIS systems with a security framework built on modular state-machine orchestration and prompt optimization, improving robustness through red-teaming and adversarial demonstrations.

Comments Kyle Gao and Pranavi Kotta contributed equally to this work

Abstract

Agentic systems are increasingly integrated with geographic information systems (GIS), where multi-agent coordination enables complex conversational and spatial analysis but introduces security risks. This work presents a security-oriented framework for risk identification, evaluation, and mitigation in a multi-agent GIS system while maintaining adaptability to broader agentic architectures. We test the agentic system of a commercial geospatial partner while developing a modular state-machine-based orchestration framework that abstracts agent behavior into reusable components. We evaluate robustness using a red-teaming framework with an adaptive attacker LLM and a deterministic judge that produces binary outcomes with supporting rationales across multi-turn attacks. We further improve resilience with a prompt optimization framework that treats prompts as structured signatures and injects adversarial demonstrations, enabling systematic security improvements without degrading task performance.

2606.17090 2026-06-17 cs.PL cs.AI cs.MS New submission

ANEForge: Python for direct computation on the Apple Neural Engine

Spencer H. Bryngelson

Affiliations: School of Computational Science & Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA; Daniel Guggenheim School of Aerospace Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA; George W. Woodruff School of Mechanical Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA

AI summary: ANEForge is a Python package that programs the Apple Neural Engine directly, without CoreML, by compiling lazy tensor graphs. It supports inference and training, with performance close to the engine's hardware limit.

Comments 8 pages

Abstract

ANEForge is a Python package that programs the Apple Neural Engine (ANE), the fixed-function neural accelerator on every recent Apple device, directly and without CoreML. In production the engine is reachable only through CoreML, which treats it as a scheduling option: no configuration requires the ANE, and a model can silently run on the CPU or GPU instead. ANEForge compiles a lazy tensor graph, built from 58 fused operators and 19 native bridge operators, into a single ANE program. The program is dispatched through the same ANE daemon and kernel-driver stack as Apple's internal framework. Beyond inference, the package reaches the engine's native fused attention, streams int8, int4, and sparse weights, keeps decoder and optimizer state resident across steps, and runs the forward pass, backward pass, and optimizer update of training on the engine. A small fused program completes a call in about 90us, near the engine's 70us per-program dispatch floor, and a pretrained ResNet-18 forward runs end-to-end in 0.33ms. ResNet-18, a sentence encoder, and a Vision Transformer run end-to-end against framework references, and a Stable Diffusion U-Net validates its forward pass. ANEForge targets Apple Silicon under macOS 14 and later. Each release is verified against a recorded macOS and ANE-compiler version.

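2606.17087 2026-06-17 cs.NE cs.AI New submission

ZIVARI-TLBO: A Zero-Cost Inter-Group Evaluated-Elite Relay Mechanism for Teaching-Learning-Based Optimization

Pezhman Zivari

Affiliations: Independent Researcher

AI summary: Proposes ZIVARI-TLBO, which shares information at zero evaluation cost by passing already evaluated elite solutions between groups along a fixed ring. Performance is validated on standard functions and engineering problems; the method ranks second overall rather than first.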

Comments 21 pages, 7 figures, 11 tables

Abstract

ZIVARI-TLBO is a grouped Teaching-Learning-Based Optimization (TLBO) method that augments an existing population-state controller with a fixed inter-group evaluated-elite relay. At each scheduled event, every group offers its already evaluated elite to the next group in a fixed ring; the elite replaces the receiver's worst eligible learner only when its stored objective value is better. Because the exact relay copies an already evaluated solution and its stored fitness, it requires no additional objective-function calls. The frozen gts-v4-cm-fixed implementation is evaluated under equal 10,000-evaluation budgets on eight classical functions at dimensions 10, 30, 50, and 100, with 30 matched seeds, and on five constrained engineering problems. A direct ablation against the same grouped landscape-aware controller without relay records 728/11/221 wins/ties/losses and a rank-biserial effect size of 0.624 across dimensions. In an eight-method multidimensional comparison, WOA obtains the best average rank (2.914) and ZIVARI-TLBO ranks second (3.382); ZIVARI-TLBO significantly outperforms TLBO, MCTLBO, DE, PSO, and GWO, loses significantly to WOA, and is not significantly different from HHO after Holm adjustment. Feasibility-aware engineering results are mixed and sensitive to the current static-penalty formulation. The evidence supports a scoped relay contribution and budget-consistent information-sharing mechanism, but not universal state-of-the-art, global-convergence, engineering-dominance, or CEC superiority claims.
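The relay rule described above can be sketched in a few lines. This is an illustration under stated assumptions, not the frozen gts-v4-cm-fixed implementation: minimization is assumed, every learner is treated as eligible for replacement, elites are snapshotted before any replacement, and the name `relay_elites` is hypothetical.

```python
def relay_elites(groups):
    """One relay event on a fixed ring of groups.

    groups: list of lists of (fitness, solution) pairs; lower fitness is better.
    Each group offers its stored elite to the next group, where it replaces
    the worst learner only if the elite's stored fitness is strictly better.
    Modifies `groups` in place and returns the number of replacements.
    """
    # Snapshot the elites first so one relay cannot cascade around the ring.
    elites = [min(g, key=lambda p: p[0]) for g in groups]
    replaced = 0
    for i, elite in enumerate(elites):
        receiver = groups[(i + 1) % len(groups)]
        worst = max(range(len(receiver)), key=lambda j: receiver[j][0])
        if elite[0] < receiver[worst][0]:
            # Copy the solution together with its stored fitness:
            # the objective function is never called here.
            receiver[worst] = elite
            replaced += 1
    return replaced
```

Since only stored (fitness, solution) pairs move between groups, the evaluation budget is unchanged by the relay, which is the sense in which it is zero-cost.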

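2606.17081 2026-06-17 cs.AR cs.AI cs.DC cs.GT cs.PF New submission

The Price of Anarchy in Disaggregated Inference

Athos Georgiou

Affiliations: NCA

AI summary: Uses game theory to analyze resource allocation in disaggregated inference architectures and proposes an adaptive controller that lowers the Price of Anarchy, achieving up to a 3.1x PoA reduction on an NVIDIA B200 cluster.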

Comments 38 pages, 7 figures, 8 tables. Measurements on a 3-node NVIDIA B200 cluster running NVIDIA Dynamo v0.9.0

Abstract

Disaggregated inference architectures physically separate prefill and decode phases onto distinct GPU pools, creating competing "agents" that share a fixed hardware budget. We provide, to our knowledge, the first formal game-theoretic analysis of this architecture, using NVIDIA Dynamo as a concrete case study. We model disaggregated serving as three coupled games: a two-player resource game between prefill and decode pools, a selfish caching game over the hierarchical KV cache, and a congestion game with positive externalities for request routing. We empirically validate the latter two; the P/D resource game is treated analytically (Section 9.2). We characterize how GPU saturation induces regime transitions that shift the game's payoff structure: below saturation, selfish behavior has bounded Price of Anarchy (PoA); at saturation, superlinear latency and cache externalities drive our empirical estimator PoA-hat (defined in Section 6.4) upward. Based on this analysis, we design an adaptive controller that detects saturation transitions in real time and adjusts routing parameters accordingly, shifting from cache-affinity exploitation to load-balanced congestion avoidance. We instantiate our framework on a 3-node NVIDIA B200 cluster running Dynamo with two models, Nemotron-4-340B (TP=8, full-node workers with cross-InfiniBand KV transfers) and Llama-3.1-70B (TP=4), and find the same three-regime PoA-hat structure with the same first post-knee grid point (C=128) on both models. Adaptive routing shifts each model to a better operating point. Our strongest result is on the 70B 1P/5D topology, where PoA-hat drops 3.1x (66.4 to 21.5) in the saturated phase at a 13% throughput cost. On the 70B 1P/2D, PoA-hat drops 2.2x and TTFT P99 drops 7.6x (see Section 8.5).
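For orientation, the classical Price of Anarchy compares the worst equilibrium with the social optimum. The block below states that textbook definition, which is an assumption here because the paper's estimator PoA-hat is defined in its Section 6.4 and is not reproduced in the abstract. It then checks the reported 3.1x reduction against the two quoted values. Here $C$ is a social cost, $S$ the set of strategy profiles, and $\mathrm{NE} \subseteq S$ the set of Nash equilibria.

```latex
\mathrm{PoA} \;=\; \frac{\max_{s \in \mathrm{NE}} C(s)}{\min_{s \in S} C(s)} \;\ge\; 1,
\qquad
\frac{66.4}{21.5} \approx 3.09 \approx 3.1\times .
```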

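2606.17074 2026-06-17 cs.AR cs.AI New submission

Surveying GenAI-based Automation in Printed Circuit Board Design and Test

Sahana Srinivasan, Benjamin Turnbull, Hammond Pearce

Affiliations: University of New South Wales

AI summary: Surveys generative AI applications across the full PCB life cycle, from supply chain to test, classifies existing work, identifies data scarcity and tool integration as challenges, and outlines future research directions.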

Comments 33 pages, 5 figures, 11 tables. Under review

Abstract

Generative artificial intelligence (GenAI) is increasingly used for applications in the hardware and software domains. It purports to reduce the manual effort involved in the development and testing of complex systems before release. Within the hardware space, most tasks have focused on design automation of integrated circuits, particularly with hardware description languages. However, other types of hardware also exist! In this survey, we instead examine how GenAI has been and is being applied across the printed circuit board (PCB) design life cycle. This includes everything from supply chains, system specification, circuit design, layout and optimisation, validation and test, and PCB assembly and distribution. Through this lens we present a taxonomy of discovered works, categorising them according to their intent and contributions. This survey also identifies key technical challenges that GenAI faces in this space, such as domain-specific data scarcity and limited support for integration with existing PCB tools. Finally, future research directions are discussed: our survey shows that there are many opportunities remaining when considering how GenAI may be integrated into various tasks in PCB design and test.

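2606.17059 2026-06-17 cs.DC cs.AI New submission

Towards Distributed Inference of LLMs on a P2P Network

Shabari S Nair, Krishanu Saini

Affiliations: The University of Texas at Austin; Department of Computer Science, The University of Texas at Austin

AI summary: Proposes a decentralized, prefix-cache-aware routing scheme for LLM inference on P2P networks. Cache information is kept in local radix trees and updated by asynchronous anti-entropy, avoiding centralized coordination and KV-cache transfer; performance improves under low latency and skewed prefix distributions.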

Abstract

Prefix caching can reduce LLM inference latency by reusing KV caches across requests with shared prompts, but cluster-scale reuse is challenging because caches are partitioned across nodes. We propose a decentralized, prefix-cache-aware routing scheme for peer-to-peer LLM serving. Each node maintains a local radix tree of its own cached prefixes and asynchronously refreshed estimates of peer caches using periodic anti-entropy. Requests are routed to the node with the longest estimated prefix match, without centralized coordination or KV-cache transfer. Stale metadata only causes cache misses, not incorrect outputs, making weak consistency sufficient for correctness. Evaluation on simulated MMLU workloads shows that decentralized routing improves latency under low communication delay and skewed prefix distributions, while high network latency and affinity-induced hotspots limit its benefits.
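The routing rule, sending each request to the node with the longest estimated prefix match, can be sketched as below. For brevity the sketch scans flat lists of cached token sequences rather than the radix tree the paper uses. The names `route` and `peer_prefixes` and the alphabetical tie-break are assumptions.

```python
def shared_prefix_len(a, b):
    """Number of leading tokens that sequences a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def route(request_tokens, peer_prefixes):
    """Pick the node whose (possibly stale) cache estimate matches best.

    peer_prefixes: dict mapping node id -> list of token sequences believed
    to be cached on that node. Returns (node_id, match_length).
    """
    best_node, best_len = None, -1
    for node in sorted(peer_prefixes):  # deterministic tie-break by node id
        match = max(
            (shared_prefix_len(request_tokens, p) for p in peer_prefixes[node]),
            default=0,
        )
        if match > best_len:
            best_node, best_len = node, match
    return best_node, best_len
```

If the estimate is stale, the chosen node recomputes the prefix, which costs latency but leaves the output unchanged. This is why weak consistency suffices for correctness.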

2606.18183 2026-06-17 stat.ML cs.LG math.PR New submission

A Diffusion Approximation for Temporal-Difference Learning with Linear Features under Markovian Noise

M. Forzo, E. Monzio Compagnoni, A. Russo, A. Pacchiano

Affiliations: Technical University of Munich (TUM), Munich, Germany; University of Basel, Basel, Switzerland; Boston University, Boston, USA

AI summary: Proposes a stochastic differential equation approximation for the stochastic fluctuations of linear TD(0) under Markovian noise, separating the contraction dynamics of the projected Bellman operator from the effect of Markovian sampling and explaining the constant-stepsize error floor.

Abstract

Temporal difference (TD) learning with linear function approximation is a core method for policy evaluation. Its classical continuous-time description is an ordinary differential equation (ODE), which captures the asymptotic mean dynamics but neglects stochastic fluctuations determining the error floor. We introduce a stochastic differential equation (SDE) approximation for linear TD(0) under Markovian noise. The resulting model distinguishes the contraction dynamics governed by the projected Bellman operator from the influence of Markovian sampling. As a consequence, the model explains the constant-stepsize error floor through the interaction between Markovian long-run covariance and the contraction geometry of the projected Bellman operator.
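The discrete-time recursion that the SDE approximates is the standard linear TD(0) update, sketched here with a constant step size. The function and argument names are illustrative; the update rule itself is the textbook one.

```python
def td0_step(theta, phi_s, phi_next, reward, gamma, alpha):
    """One linear TD(0) update with constant step size alpha.

    theta:    current weight vector
    phi_s:    feature vector of the current state
    phi_next: feature vector of the next state
    Returns the updated weight vector.
    """
    v_s = sum(t * f for t, f in zip(theta, phi_s))
    v_next = sum(t * f for t, f in zip(theta, phi_next))
    td_error = reward + gamma * v_next - v_s
    return [t + alpha * td_error * f for t, f in zip(theta, phi_s)]
```

With alpha held constant, the iterates keep fluctuating around the fixed point instead of converging to it, which is the error floor the abstract refers to.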

2606.17426 2026-06-17 stat.ML cs.LG math.PR New submission

Bounded Difference Concentration for Infinitely Exchangeable Sequences with Applications to AI Benchmark Uncertainty

Fangyuan Lin, Spencer Frei, Victor H. de la Pena

Affiliations: Department of Statistics, Columbia University; Google DeepMind

AI summary: Decomposes the deviation of bounded-difference functions via the de Finetti measure, derives a concentration inequality with an effective variance proxy, and proves that the latent mixture term cancels exactly for zero-sum linear contrasts, with application to uncertainty quantification for AI benchmarks such as MMLU.

Abstract

We consider the concentration properties of functions of infinitely exchangeable random variables. By conditioning on the de Finetti directing measure, we show that the deviation of any function with bounded-difference constants $c_1, \dots, c_n$ decomposes into a conditional sampling fluctuation and a latent mixture fluctuation. When this latent mixture is $\sigma_{\mathrm{mix}}^2$-subgaussian, we establish a concentration inequality with an effective variance proxy of $\frac{1}{4}\sum_i c_i^2 + \sigma_{\mathrm{mix}}^2$. Crucially, we demonstrate that for zero-sum linear contrasts, such as the difference between a subsample mean and a full population mean, the latent mixture term cancels exactly. This cancellation yields a tight, mixture-free Hoeffding-type bound that provides a direct de Finetti mechanism for the infinite-extendibility limit of recent finite-exchangeable concentration results. We apply this framework to quantify uncertainty in composite AI benchmarks, such as MMLU, where question items naturally exhibit exchangeable dependence across domains. Our results provide both a domain-stratified hierarchical model for bounding the uncertainty of accuracy scores, and a distribution-free, cost-saving statistical guarantee for accurately estimating full benchmark scores from random subsets.
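Read as a tail bound, the variance proxy in the abstract suggests the following form. This is a sketch assuming the standard subgaussian convention, in which a $v$-subgaussian variable has a moment generating function bounded by $e^{\lambda^2 v / 2}$; the paper's exact statement and constants may differ.

```latex
v = \tfrac{1}{4}\sum_{i=1}^{n} c_i^2 + \sigma_{\mathrm{mix}}^2,
\qquad
\Pr\bigl(\lvert f - \mathbb{E} f \rvert \ge t\bigr) \le 2\exp\!\left(-\frac{t^2}{2v}\right).
% With \sigma_{\mathrm{mix}}^2 = 0 this reduces to McDiarmid's inequality:
% 2\exp\!\left(-\frac{2t^2}{\sum_i c_i^2}\right).
```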

2606.17319 2026-06-17 stat.ML cs.LG math.CO math.ST stat.TH New submission

Tight $L_\infty$ Sample Complexity for Low-Degree and Sparse Boolean Polynomials

Jasper van Doornmalen, Mathieu Molina, Victor Verdugo, José Verschae

Affiliations: Institute for Mathematical and Computational Engineering; Pontificia Universidad Católica de Chile; Blavatnik School of Computer Science and AI; Tel Aviv University; Department of Industrial and Systems Engineering

AI summary: Motivated by the optimization of bounded binary black-box functions, studies learning polynomial surrogates over the Boolean hypercube under uniform $L_\infty$ error guarantees, and characterizes the minimax sample complexity for two classes of bounded polynomials under subgaussian noise.

Abstract

Motivated by the optimization of bounded binary black-box functions, we study the problem of learning polynomial surrogates over the Boolean hypercube. To ensure that optimizing the surrogate yields good solutions for the underlying objective, we require uniform $L_\infty$-error guarantees rather than the usual $L_2$-type guarantees. We characterize the minimax sample complexity of uniform estimation under subgaussian noise for two classes of bounded polynomials. First, for polynomials of degree at most $d$ on $n$ variables, the sample complexity scales as $n^{d+1}$. Second, for $s$-sparse Fourier-Walsh polynomials with $s \leq n$, it scales as $ns^2$. These rates differ structurally from the noiseless setting, where uniform exact recovery scales as $n^d$ and $ns$, respectively. Our lower bounds hold even for arbitrary adaptive learners, showing that the additional factors are intrinsic to the noisy cases. Standard Fourier-analysis tools for the $L_2$-norm do not naturally extend to the $L_\infty$-setting in a way that yields uniform guarantees. Our proofs overcome this difficulty by relying on suitably chosen auxiliary norms that serve as proxies for controlling the $L_\infty$-error. Together, our results provide a tight characterization of the sample complexity of learning optimization-safe polynomial surrogates.

2606.18218 2026-06-17 math.PR cs.LG cs.SY eess.SY math.OC stat.ML New submission

Finite-Time Queue Peak Laws in Stochastic Networks: Logarithmic Scaling After Geometric Thresholds

Hao Liang, Cheng Tang, Yunzong Xu

Affiliations: University of Illinois Urbana–Champaign

AI summary: Studies finite-time queue peaks in generalized switches and proves that, under uniform interior slack, the peak envelope of drift-minimizing scheduling policies changes from a square-root law to a logarithmic law, with matching lower bounds and a geometric threshold.

Abstract

We study finite-horizon queue peaks in generalized switches, a standard stochastic-network model in which many queues share constrained service resources. Arrivals may be dependent, time-varying, and adapted to the past; the standing load condition is uniform interior slack, meaning the conditional mean arrival vector stays in a fixed contraction of the capacity region. We show that this slack reshapes the finite-time peak law for drift-minimizing scheduling policies such as MaxWeight. The square-root envelope that is sharp without slack persists only up to a geometry-dependent threshold; beyond that threshold, the running maximum grows only logarithmically with the horizon, both with high probability and in expectation. The mechanism is self-normalization: in the current queue direction, the projected fluctuation scale is normalized by the stabilizing drift scale. This removes capacity geometry from the logarithmic coefficient, while geometry remains in the threshold. Matching lower bounds show that both the logarithmic term and a geometric threshold are unavoidable. When finite-time state-space collapse is available, the threshold can be sharpened using local bottleneck geometry. For generalized input-queued switches, we obtain finite-time peak bounds with tight logarithmic coefficients. Simulations illustrate the two-phase envelope, local geometric refinements, and variance-sensitive improvements predicted by the theory.
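MaxWeight, the drift-minimizing policy named above, selects in each slot the feasible service vector with the largest queue-weighted service. A minimal sketch follows; it assumes the feasible schedules are given as an explicit finite list, and the function name is illustrative.

```python
def maxweight(queues, schedules):
    """Return the feasible schedule maximizing sum_i queues[i] * schedule[i].

    queues:    current queue lengths
    schedules: iterable of feasible service vectors, same length as queues
    Ties go to the first maximizer in iteration order.
    """
    return max(schedules, key=lambda s: sum(q * x for q, x in zip(queues, s)))
```

Long queues pull service toward themselves, which produces the stabilizing drift that the peak analysis builds on.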

2606.17762 2026-06-17 math.OC cs.AI New submission

Symplectic Transversality and Endpoint Green Estimates for Finite-Horizon Pontryagin Systems

Pyuyi Chufeng Huang, Zikang Song, Xingshu Chen

Affiliations: School of Cyber Science and Engineering, Sichuan University, Chengdu, Sichuan, China; School of Mathematics, Sichuan University, Chengdu, Sichuan, China

AI summary: For finite-horizon discrete-time Pontryagin boundary value systems, verifies the linearized endpoint inverse from scaled stable-unstable boundary transversality, proves an endpoint-corrected Green estimate, and combines it with weighted contractions to obtain horizon-independent existence, uniqueness, Lipschitz dependence, and first-order expansions.

Comments 20 pages

Abstract

We study horizon-uniform local branches of finite-horizon discrete-time Pontryagin boundary value systems after smooth control elimination. The central input is a two-point endpoint inverse for the linearization. We verify this inverse from scaled stable-unstable boundary transversality, prove the associated endpoint-corrected Green estimate, and combine it with weighted contractions to obtain existence, uniqueness, Lipschitz dependence, and first-order expansions with constants independent of the horizon. The framework covers smooth nonlinear endpoint maps, including the original Pontryagin rows that fix the initial state and couple the terminal costate to the terminal state. Symplectic and Riccati criteria verify the inverse hypothesis at the level of the matrix data; in particular, every stabilizable linear-quadratic system with invertible dynamics and definite weights is covered, including noncommuting coupled data. A numerical section illustrates the certificates and the horizon-uniform first-order expansion.

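2606.17523 2026-06-17 math.OC cs.LG New submission

Beyond IGO-Flow: Toward Convergence Analysis of IGO in Continuous Spaces

Ryosuke Kimura, Youhei Akimoto

Affiliations: University of Tsukuba, Tsukuba, Japan; RIKEN Center for Advanced Intelligence Project, Tokyo, Japan

AI summary: Studies the convergence of discrete-time IGO in continuous spaces. For the multivariate Gaussian family on strongly convex quadratic objectives, proves that the covariance matrix converges to the zero matrix and that the mean vector converges to the global optimum when the condition number is bounded.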

Comments Accepted at PPSN 2026

详情
AI中文摘要

信息几何优化(IGO)通过将搜索分布的适应解释为自然梯度更新,为黑箱优化提供了统一框架。尽管其概念重要,IGO的收敛理论仍然有限:大多数现有结果涉及连续时间理想化,如IGO流,而非具有非无穷小学习率的离散时间更新。在本文中,我们研究连续空间中的离散时间IGO,将其表述为指数族期望参数坐标下的自然梯度更新。特别地,我们分析了在强凸二次目标函数上对多元高斯族的IGO。我们的分析涵盖了一个同时结合全协方差适应、固定正学习率和基于分位数权重的设置。在此设置中,我们证明了协方差矩阵收敛到零矩阵。我们进一步表明,如果适当缩放的协方差矩阵的条件数在足够频繁的迭代中有界,则均值向量收敛到全局最优。这些结果推进了IGO的收敛理论,并有助于弥合IGO数学理论与实际协方差自适应搜索方法(如CMA-ES)之间的差距。

英文摘要

Information-Geometric Optimization (IGO) provides a unified framework for black-box optimization by interpreting the adaptation of a search distribution as a natural gradient update. Despite its conceptual importance, the convergence theory of IGO remains limited: most existing results concern continuous-time idealizations such as the IGO flow, rather than discrete-time updates with non-infinitesimal learning rates. In this paper, we study discrete-time IGO in continuous spaces, formulated as natural gradient updates in the expectation-parameter coordinates of an exponential family. In particular, we analyze IGO over the multivariate Gaussian family on strongly convex quadratic objective functions. Our analysis covers a setting that simultaneously incorporates full covariance adaptation, a fixed positive learning rate, and quantile-based weights. In this setting, we prove that the covariance matrix converges to the zero matrix. We further show that the mean vector converges to the global optimum, provided that the condition number of the appropriately scaled covariance matrix is bounded at sufficiently frequent iterations. These results advance the convergence theory of IGO and help bridge the gap between the mathematical theory of IGO and practical covariance-adaptive search methods such as CMA-ES.
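
As a sketch of the kind of update the abstract describes, and assuming the standard IGO formulation for an exponential family with sufficient statistic $T(x)$, a discrete-time step in expectation-parameter coordinates can be written as follows. The symbols $\eta$ (fixed learning rate), $\hat{w}_i$ (quantile-based weights), $N$ (sample size), and $\mu_t$ are notation chosen here for illustration and may differ from the paper's.

```latex
% Generic exponential family: \mu_t = \mathbb{E}_{P_{\mu_t}}[T(x)], samples x_1, \dots, x_N \sim P_{\mu_t}
\mu_{t+1} = \mu_t + \eta \sum_{i=1}^{N} \hat{w}_i \bigl( T(x_i) - \mu_t \bigr)

% Multivariate Gaussian N(m_t, C_t): T(x) = (x, x x^\top), so \mu_t = (m_t,\; C_t + m_t m_t^\top)
m_{t+1} = m_t + \eta \sum_{i=1}^{N} \hat{w}_i \,( x_i - m_t )

C_{t+1} + m_{t+1} m_{t+1}^\top
  = C_t + m_t m_t^\top
  + \eta \sum_{i=1}^{N} \hat{w}_i \bigl( x_i x_i^\top - C_t - m_t m_t^\top \bigr)
```

Because $\eta$ is a fixed positive constant rather than an infinitesimal step, this iteration is not the IGO flow, which is the gap the abstract says the analysis addresses.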

2606.17260 2026-06-17 math.OC cs.LG stat.ML 新提交

Accelerated Convex Optimization via Hamiltonian Dynamics with Deterministic Integration Time

基于确定性积分时间的哈密顿动力学的加速凸优化

Xiuyuan Wang, Vishwak Srinivasan, Qiang Fu, Siddharth Mitra, Ashia Wilson, Andre Wibisono

发表机构 * Department of Computer Science, Yale University(耶鲁大学计算机科学系) Department of EECS, Massachusetts Institute of Technology(麻省理工学院电子工程与计算机科学系)

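AI summary Proposes Hamiltonian dynamics-based algorithms for smooth convex optimization that exploit contraction of averaged Hamiltonian flow trajectories rather than contraction at trajectory endpoints, yielding deterministic accelerated convergence guarantees and discrete-time implementations with optimal first-order complexity.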

Comments 51 pages, 7 figures. Accepted to the 39th Annual Conference on Learning Theory (COLT 2026)

详情
AI中文摘要

我们开发了基于哈密顿动力学的平滑凸优化算法,实现了加速收敛速率。通过利用平均哈密顿流轨迹的收缩而非要求轨迹端点处的收缩,我们证明了基于哈密顿动力学的优化方法具有确定性的加速收敛保证,扩展了先前仅限于二次目标或仅在期望中成立的工作。我们分析了一个理想的连续时间算法,并推导了具有最优一阶复杂度的实用离散时间实现,从而将哈密顿动力学确立为确定性加速凸优化的有用算法原语。

英文摘要

We develop Hamiltonian dynamics-based algorithms for smooth convex optimization that achieve accelerated rates of convergence. By exploiting contraction of averaged Hamiltonian flow trajectories rather than requiring contraction at trajectory endpoints, we show that Hamiltonian dynamics-based optimization methods admit deterministic and accelerated convergence guarantees, extending prior work that is limited to quadratic objectives or holds only in expectation. We analyze an idealized continuous-time algorithm and derive practical discrete-time implementations with optimal first-order complexity, thereby establishing Hamiltonian dynamics as a useful algorithmic primitive for deterministic accelerated convex optimization.

2606.18175 2026-06-17 math.NA cs.LG cs.NA physics.comp-ph 新提交

A Convex Quasilinearization Method for Solving Nonlinear PDEs with Physics-Informed Neural Networks

一种基于凸拟线性化的物理信息神经网络求解非线性偏微分方程的方法

Gbenga T. Awojinrin, Abdul-Akeem Olawoyin, Rami M. Younis

发表机构 * Texas A&M University, College Station, Texas, U.S.A.(德克萨斯农工大学)

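AI summary Proposes LiL-Q, which uses Bellman-Kalaba quasilinearization to turn a nonlinear PDE into a sequence of linear subproblems, each posed on a Linear-in-Learnables (LiL) trial space and solved by convex least squares, avoiding nonconvex gradient-based training. The outer iteration has a local Newton-Kantorovich convergence guarantee and reaches high accuracy in few outer iterations on several benchmarks.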

Comments Preprint. 56 pages, 18 figures. Code: https://github.com/awojinrin/lilq-pinn

详情
AI中文摘要

我们提出了一种数值方法,用于求解非线性偏微分方程(PDE)的正向问题。该方法中,Bellman-Kalaba拟线性化将非线性问题简化为一系列线性子问题,每个子问题通过配置法离散到参数线性输入的试验空间上,并通过单次直接线性最小二乘QR分解求解。该试验空间称为线性可学习(LiL),包含其可训练参数线性进入的表示,包括随机特征极限学习机、谱多项式基和三角展开,每个都作为物理信息神经网络实现。因此,该方法用凸的每步求解替代了限制标准PINN的非凸梯度训练。我们建立了外迭代在显式小条件下局部牛顿-康托罗维奇收敛到残差受限邻域,极限精度由试验空间的最佳逼近残差决定,而非优化容差。该方法记为LiL-Q,在七个基准上进行了评估,涵盖标量非线性PDE(Bratu、粘性Burgers、Buckley-Leverett)、耦合系统(平面应变弹性和二维及三维不可压缩Navier-Stokes方程)以及具有非均匀渗透率的稳态达西流。在这些问题中,LiL-Q在大多数情况下以个位数外迭代收敛,即使在最粗的基尺寸下且与参数数量无关。当精确解位于试验空间的张成空间中时,该方法在单次求解中恢复至机器精度。在Navier-Stokes基准上,它匹配或超过已发表的PINN求解器,可训练参数少两个数量级,且无需梯度优化。

英文摘要

We present a numerical method for the forward solution of nonlinear partial differential equations (PDEs) in which Bellman-Kalaba quasilinearization reduces the nonlinear problem to a sequence of linear subproblems, each discretized by collocation onto a trial space that is linear in its parameters and solved by a single direct linear least-squares QR factorization. The trial space, which we term Linear-in-Learnables (LiL), comprises representations whose trainable parameters enter linearly, including random-feature extreme learning machines, spectral polynomial bases, and trigonometric expansions, each implemented as a physics-informed neural network. The method thus replaces the nonconvex gradient-based training that limits standard PINNs with a convex per-step solve. We establish local Newton-Kantorovich convergence of the outer iteration to a residual-limited neighborhood under an explicit smallness condition, with the limiting accuracy governed by the best-approximation residual of the trial space rather than by an optimization tolerance. The method, denoted LiL-Q, is assessed on seven benchmarks spanning scalar nonlinear PDEs (Bratu, viscous Burgers, Buckley-Leverett), coupled systems (plane-strain elasticity and the incompressible Navier-Stokes equations in two and three spatial dimensions), and steady-state Darcy flow with heterogeneous permeability. Across these problems, LiL-Q converges in single-digit outer iterations in most cases, even at the coarsest basis sizes and independent of the parameter count. When the exact solution lies in the span of the trial space, the method recovers it to machine precision in a single solve. On the Navier-Stokes benchmarks, it matches or exceeds published PINN solvers with up to two orders of magnitude fewer trainable parameters, without gradient-based optimization.
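
To make the quasilinearization step concrete, here is a worked sketch for the Bratu benchmark named in the abstract. It assumes the common one-dimensional form with homogeneous Dirichlet conditions and a basis $\varphi_j$ that already satisfies them; the paper's exact setup, boundary treatment, and notation may differ.

```latex
% Bratu problem (assumed 1D form): u'' + \lambda e^{u} = 0 on (0,1), u(0) = u(1) = 0.
% Linearizing e^{u} \approx e^{u_k} + e^{u_k}(u - u_k) about the iterate u_k gives the linear subproblem
u_{k+1}'' + \lambda e^{u_k}\, u_{k+1} = \lambda e^{u_k}\,(u_k - 1),
\qquad u_{k+1}(0) = u_{k+1}(1) = 0.

% With a trial function linear in its coefficients, u_{k+1}(x) = \sum_{j=1}^{n} c_j \varphi_j(x),
% collocation at points x_1, \dots, x_m yields a convex least-squares problem in c:
\min_{c \in \mathbb{R}^n} \sum_{i=1}^{m}
\Bigl[ \sum_{j=1}^{n} c_j \bigl( \varphi_j''(x_i) + \lambda e^{u_k(x_i)} \varphi_j(x_i) \bigr)
       - \lambda e^{u_k(x_i)} \bigl( u_k(x_i) - 1 \bigr) \Bigr]^2
```

Each outer iteration therefore solves one linear least-squares system, for example by QR factorization, and only the coefficients $\lambda e^{u_k(x_i)}$ change between iterations.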

2606.18032 2026-06-17 math.NA cs.LG cs.NA physics.comp-ph 新提交

INI-VPINN: A Variational Physics-Informed Neural Network with Implicit Neumann and Interface Handling for Multi-Material Domains with Geometric Singularities

INI-VPINN:一种隐式处理纽曼边界和界面的变分物理信息神经网络,适用于具有几何奇异性的多材料域

Shayan Dodge, Alessandro Formisano, Sami Barmada

发表机构 * DESTeC, University of Pisa(比萨大学 DESTeC)

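AI summary Proposes INI-VPINN, a new weak-form physics-informed neural network that handles Neumann boundary and interface conditions implicitly, without additional loss terms or multiple subdomain networks, and achieves higher accuracy and faster convergence on multi-material problems.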

Comments Preprint version. Under peer review. Code available at: https://github.com/ShayanDodge/INI-VPINN

详情
AI中文摘要

我们提出了一种新的弱形式物理信息神经网络方法(命名为INI-VPINN)。INI-VPINN将纽曼边界和界面条件自然地纳入变分公式中,消除了对额外损失项或多个子域网络的需求。该框架采用紧支撑加权函数和分部积分来隐式地施加通量和连续性约束,从而在材料边界上隐式地确保物理一致性。所提出的方法在具有尖锐界面和复杂几何的泊松和拉普拉斯问题上进行了测试。结果表明,与其他几种基于物理信息神经网络的公式相比,INI-VPINN始终实现更高的精度、更平滑和更快的收敛。所提出的框架提供了一种使用神经网络求解具有复杂几何和混合纽曼-狄利克雷边界条件的多材料问题的通用方法。该实现已在GitHub仓库中公开。

英文摘要

We propose a new weak-form Physics-Informed Neural Network approach (named INI-VPINN). INI-VPINN naturally incorporates Neumann boundary and interface conditions into the variational formulation. It removes the need for additional loss terms or multiple subdomain networks. This framework employs compactly supported weighting functions and integration by parts to implicitly impose flux and continuity constraints. In this way, it implicitly ensures physical consistency across material boundaries. The proposed method is tested on Poisson and Laplace problems with sharp interfaces and complex geometries. Results show that, compared with several other Physics-Informed Neural Network-based formulations, INI-VPINN consistently achieves higher accuracy and smoother, faster convergence. The proposed framework provides a general approach for solving multi-material problems with complex geometries and mixed Neumann-Dirichlet boundary conditions using neural networks. The implementation is publicly available in a GitHub repository.

2606.17121 2026-06-17 stat.AP cs.LG physics.flu-dyn 新提交

Regularized Machine Learning for System Identification of Ship Free-Running Manoeuvres from CFD-Based Synthetic Data: A Comparative Study

基于CFD合成数据的船舶自由航行操纵系统辨识的正则化机器学习:比较研究

R. F. Suárez, J. C. Berndt, M. Abdel-Maksoud

发表机构 * Hamburg University of Technology (TUHH)(汉堡技术大学)

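AI summary Uses regularized regression to identify ship hydrodynamic coefficients from CFD-generated free-running data, evaluating how coefficient set size, training length, and manoeuvre combinations affect model performance. Ridge regression gives the best compromise between computational efficiency and prediction accuracy.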

Comments 28 pages

详情
AI中文摘要

本研究探讨了从CFD生成的自由航行仿真数据中辨识船舶水动力系数的监督机器学习技术。具体而言,将普通最小二乘法和正则化回归方法应用于Abkowitz型操纵模型。训练和验证数据集来自Z形和回转操纵的URANS仿真,这些仿真已通过实验基准数据验证。分析评估了系数集大小、预测模型训练所需的最小训练长度以及操纵组合对模型性能的影响。结果表明,只要通过适当的系数选择、回归模型或输入数据变异性解决多重共线性问题,大角度Z形操纵适用于水动力系统辨识。较大的系数集为可变条件提供了更大的模型灵活性,但更容易出现多重共线性。正则化回归技术有效缓解了多重共线性,并显著提高了预测精度,而纳入更多样化的操纵数据同样如此。在测试的模型中,Ridge回归在计算效率和预测精度之间提供了最佳折衷。

英文摘要

This study investigates supervised machine learning techniques for identifying ship hydrodynamic coefficients from CFD-generated data from free-running simulations. Specifically, ordinary least squares and regularized regression methods are applied to Abkowitz-type manoeuvring models. Training and validation datasets are derived from URANS simulations of zig-zag and turning circle manoeuvres, which are validated against experimental benchmark data. The analysis evaluates the effects of coefficient set size, minimum training length required for predictive model training, and manoeuvre combinations on model performance. Results demonstrate the suitability of large-angle zig-zag manoeuvres for hydrodynamic system identification, provided that multicollinearity is addressed through appropriate coefficient selection, regression models, or input data variability. Larger coefficient sets offer greater model flexibility for variable conditions but are more prone to multicollinearity. Regularized regression techniques effectively mitigate multicollinearity and notably enhance prediction accuracy, as does incorporating more diverse manoeuvring data. Among tested models, Ridge regression provided the best compromise between computational efficiency and prediction accuracy.
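
The effect of multicollinearity on ordinary least squares, and how an L2 (Ridge) penalty mitigates it, can be illustrated with a minimal two-regressor sketch. The data below are made up for illustration and have nothing to do with the paper's CFD datasets or its Abkowitz-type model; the function names are hypothetical.

```python
def solve_2x2(a11, a12, a22, b1, b2):
    """Solve the symmetric system [[a11, a12], [a12, a22]] @ beta = [b1, b2]."""
    det = a11 * a22 - a12 * a12
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)


def fit_two_features(x1, x2, y, ridge_lambda=0.0):
    """Fit y ~ beta1*x1 + beta2*x2 via the normal equations.

    ridge_lambda = 0 gives ordinary least squares; a positive value adds
    the Ridge penalty ridge_lambda * (beta1**2 + beta2**2).
    """
    a11 = sum(v * v for v in x1) + ridge_lambda
    a22 = sum(v * v for v in x2) + ridge_lambda
    a12 = sum(u * v for u, v in zip(x1, x2))
    b1 = sum(u * t for u, t in zip(x1, y))
    b2 = sum(v * t for v, t in zip(x2, y))
    return solve_2x2(a11, a12, a22, b1, b2)


# Two nearly identical regressors (illustrative values only).
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.01, 1.99, 3.02, 3.98]
y = [2.1, 3.9, 6.2, 7.8]

ols = fit_two_features(x1, x2, y)
ridge = fit_two_features(x1, x2, y, ridge_lambda=1.0)

print("OLS coefficients:  ", ols)
print("Ridge coefficients:", ridge)
```

With these values the unpenalized fit assigns large coefficients of opposite sign to the two regressors, while the penalized fit keeps both close to 1, which is the behaviour the abstract attributes to regularized regression under multicollinearity.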

2606.17530 2026-06-17 physics.soc-ph cs.LG econ.GN q-fin.EC stat.AP 新提交

Public transit gains and spatially uneven travel demand changes after NYC congestion pricing

纽约市拥堵收费后公共交通增益与空间不均的出行需求变化

Donghang Li, Dingyi Zhuang, Yunlin Li, Chenan Shen, Nina Cao, Yunhan Zheng, Shenhao Wang, Jinhua Zhao

发表机构 * Department of Civil and Environmental Engineering, Massachusetts Institute of Technology(麻省理工学院土木与环境工程系) Department of Urban Studies and Planning, Massachusetts Institute of Technology(麻省理工学院城市研究与规划系) Mathematical Institute, University of Oxford(牛津大学数学院) Department of Mechanical Engineering, Massachusetts Institute of Technology(麻省理工学院机械工程系) College of Urban and Environmental Sciences, Peking University(北京大学城市与环境科学学院) Department of Urban and Regional Planning, University of Florida(佛罗里达大学城市与区域规划系) Center for Computational Science and Engineering, Massachusetts Institute of Technology(麻省理工学院计算科学与工程中心)

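AI summary Uses time series foundation models to generate probabilistic counterfactual forecasts for evaluating the congestion pricing program that New York City implemented in January 2025. Bus and subway ridership increased significantly, overall travel demand decreased modestly, and the effects are spatially heterogeneous.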

详情
AI中文摘要

纽约市于2025年1月实施了全国首个基于区域的拥堵收费计划,为评估全系统城市出行如何响应大规模定价干预提供了机会。由于此类政策会在不同交通方式和区域间产生溢出效应,因此难以构建可信的控制组。我们利用时间序列基础模型生成具有校准不确定性的概率反事实需求预测,以应对这一挑战。将该框架应用于公交、地铁和总出行量数据,我们发现,与预期无政策需求相比,政策实施后公交和地铁客流量显著增加,而总体出行需求略有下降。影响存在空间异质性:总体出行需求的减少集中在拥堵缓解区内,而公共交通的增益则延伸至曼哈顿核心区以外。社会人口分析进一步揭示了不同社区之间的适应差异,凸显了空间公平性问题。我们的框架为在缺乏干净控制组的情况下,对全系统城市干预进行不确定性感知评估提供了一种可扩展的方法。

英文摘要

New York City implemented the nation's first cordon-based congestion pricing program in January 2025, providing an opportunity to evaluate how system-wide urban mobility responds to large-scale pricing interventions. Because such policies generate spillovers across modes and locations, credible control groups are difficult to construct. We address this challenge using time series foundation models to generate probabilistic counterfactual demand forecasts with calibrated uncertainty. Applying this framework to bus, subway, and aggregate trip volume data, we find that post-policy bus and subway ridership increased significantly relative to expected no-policy demand, while overall travel demand decreased modestly. The effects are spatially heterogeneous: while reductions in overall travel demand are concentrated within the Congestion Relief Zone, transit gains extend beyond Manhattan's core. Socio-demographic analyses further reveal uneven adaptation across neighborhoods, highlighting spatial equity implications. Our framework provides a scalable approach for the uncertainty-aware evaluation of system-wide urban interventions when clean control groups are unavailable.

2606.17076 2026-06-17 physics.ao-ph cs.AI 新提交

CMIP-Forge: An Agentic System that Retrieves, Computes, and Self-Reviews Climate Science

CMIP-Forge:一个检索、计算和自我审查气候科学的智能体系统

Dmitrii Pantiukhin, Boris Shapkin, Ivan Kuznetsov, Thomas Jung, Nikolay Koldunov

发表机构 * Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research(阿尔弗雷德·韦格纳研究所,亥姆霍兹极地与海洋研究中心)

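AI summary Proposes CMIP-Forge, a system that combines retrieval-augmented generation with autonomous analysis and uses a multi-layered Defense-in-Depth architecture with independent review to run end-to-end autonomous research from the literature to live climate data.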

Comments 28 pages, 9 figures. Code available at https://github.com/CliDyn/cmip6_gpt

详情
AI中文摘要

第六次耦合模式比较计划(CMIP6)已产生了数千篇同行评审出版物,记录了模式配置、评估程序、涌现约束和预估不确定性。随着领域向CMIP7过渡,高效提取并利用这些非结构化知识以及进行实时数据分析成为一个关键瓶颈。本文提出CMIP-Forge,一个混合检索增强生成(RAG)与自主分析系统,弥合科学文献与地球系统网格联盟(ESGF)数据档案之间的鸿沟。该系统将包含6,581篇CMIP6相关开放获取出版物(101,828个索引块)的精选语料库与一个智能体流水线配对,其中工具增强的工作者规划并执行实时气候数据的Python工作流,同时一组独立的审查模型端到端审计其方法论。CMIP-Forge引入了一种多层纵深防御架构,通过可执行机制强制执行物理和方法论不变性:抽象语法树(AST)静态分析、审计的科学原语以及自主对抗性同行评审协议。我们通过涵盖大气遥相关、海洋动力学、区域极端事件和全球变暖预估的端到端自主研究流水线展示了系统的能力。一个基于同行评审文献、受自动化代码护栏约束并由独立对抗性审查循环审计的智能体分析系统能够自主完成复杂的气候研究工作流。同样的实验暴露了审查循环的具体失败模式(谄媚回归、从未解决的REVISE裁决以及提交存根代码供审查),每种模式均可从随文章发布的不变遥测和溯源记录中诊断。

英文摘要

The Coupled Model Intercomparison Project Phase 6 (CMIP6) has generated thousands of peer-reviewed publications documenting model configurations, evaluation procedures, emergent constraints, and projection uncertainties. As the community transitions toward CMIP7, efficiently extracting and operationalizing this unstructured knowledge alongside live data analysis represents a critical bottleneck. Here we present CMIP-Forge, a hybrid retrieval-augmented generation (RAG) and autonomous analysis system that bridges the gap between scientific literature and Earth System Grid Federation (ESGF) data archives. The system pairs a curated corpus of 6,581 CMIP6-related open-access publications (101,828 indexed chunks) with an agentic pipeline in which a tool-augmented worker plans and executes Python workflows over live climate data, while a panel of independent reviewer models audits its methodology end to end. CMIP-Forge introduces a multi-layered Defense-in-Depth architecture that enforces physical and methodological invariants through executable mechanisms: Abstract Syntax Tree (AST) static analysis, audited scientific primitives, and an autonomous adversarial peer-review protocol. We demonstrate the system's capabilities through end-to-end autonomous research pipelines spanning atmospheric teleconnections, ocean dynamics, regional extremes, and global warming projections. An agentic analysis system grounded in peer-reviewed literature, constrained by automated code guardrails, and audited by an independent adversarial review loop can complete complex climate-research workflows autonomously. The same experiments expose concrete failure modes of the review loop (sycophantic regression, REVISE verdicts that are never resolved, and the submission of stub code for review), each diagnosable from the immutable telemetry and provenance record released with the article.
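
The abstract lists Abstract Syntax Tree (AST) static analysis as one of the executable guardrails. A minimal sketch of that general technique, using Python's standard `ast` module, is shown below. The policy (the `FORBIDDEN_MODULES` and `FORBIDDEN_CALLS` sets) and the function name `find_violations` are hypothetical and are not taken from CMIP-Forge, whose actual rules enforce physical and methodological invariants.

```python
import ast

# Hypothetical policy, chosen only to illustrate the mechanism.
FORBIDDEN_MODULES = {"os", "subprocess", "socket"}
FORBIDDEN_CALLS = {"eval", "exec"}


def find_violations(source: str) -> list[str]:
    """Return policy violations found in `source` without executing it."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b" is checked against its top-level package "a".
            for alias in node.names:
                if alias.name.split(".")[0] in FORBIDDEN_MODULES:
                    violations.append(f"line {node.lineno}: import of '{alias.name}'")
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for relative imports such as "from . import x".
            if node.module and node.module.split(".")[0] in FORBIDDEN_MODULES:
                violations.append(f"line {node.lineno}: import from '{node.module}'")
        elif isinstance(node, ast.Call):
            # Only direct calls by bare name are caught, e.g. eval(...).
            if isinstance(node.func, ast.Name) and node.func.id in FORBIDDEN_CALLS:
                violations.append(f"line {node.lineno}: call to '{node.func.id}'")
    return violations


safe_code = "import math\nx = math.sqrt(4)\n"
unsafe_code = "import os\nfrom subprocess import run\nprint(eval('1+1'))\n"

print(find_violations(safe_code))
print(find_violations(unsafe_code))
```

Because the check operates on the parsed tree rather than on the running program, it can reject generated code before any of it touches live data. A check this simple is easy to evade (for example through aliasing), so it would be one layer among several rather than a complete defense.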

2606.17070 2026-06-17 physics.ao-ph cs.AI cs.LG 新提交

KFTD: Koopman-Fourier Time-Differentiable Network for Continuous Ocean Spatiotemporal Forecasting

KFTD: 用于连续海洋时空预测的Koopman-Fourier时间可微网络

Qinghui Chen, Zekai Zhang, Hailong Liu, Jinglin Zhang, Cong Bai

发表机构 * Shandong University(山东大学) Laoshan Laboratory(崂山实验室) Chinese Academy of Sciences(中国科学院) Zhejiang University of Technology(浙江工业大学)

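AI summary Proposes the KFTD network, which performs continuous-time interpolation via a Koopman linear space and Fourier analysis and feeds the result to a lightweight residual network for prediction. On four ocean datasets it reduces MSE by 5.6% on average and improves efficiency over MCVD by 76.25%.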

详情
AI中文摘要

准确的海洋预测对于气候监测和灾害预警至关重要。然而,海洋时空预测面临建模复杂动力系统和确保计算效率的双重挑战。我们提出了Koopman傅里叶时间可微(KFTD)网络,一种时间连续的两阶段范式,将插值与预测解耦,以实现高效且可扩展的时空建模。我们将复杂的非线性动力学映射到Koopman线性空间,并利用傅里叶分析实现任意子步的连续时间插值。一个轻量级残差网络消耗高保真中间状态以产生最终预测。与扩散模型不同,KFTD消除了多步噪声采样,直接在连续时间内演化系统,实现了4倍的计算加速。我们进一步引入DPP损失,以端到端方式支持任意PDE约束,打破了纯数据驱动方法的物理一致性瓶颈。在四个海洋数据集上的实验结果证实,我们的连续时间框架使MSE平均降低5.6%(SST最高达12.7%),并且效率比MCVD提高了76.25%。

英文摘要

Accurate oceanic forecasting is critical for climate monitoring and disaster early warning. However, ocean spatiotemporal forecasting faces the dual challenges of modeling complex dynamical systems and ensuring computational efficiency. We present the Koopman Fourier Time-Differentiable (KFTD) Network, a time-continuous two-stage paradigm that decouples interpolation from prediction to achieve efficient and scalable spatiotemporal modeling. We map complex nonlinear dynamics into the Koopman linear space and exploit Fourier analysis to enable continuous-time interpolation at arbitrary sub-steps. A lightweight residual network consumes the high-fidelity intermediate states to yield the final forecast. Unlike diffusion models, KFTD eliminates multi-step noise sampling and directly evolves the system in continuous time, yielding a 4× computational speedup. We further introduce a DPP Loss that supports arbitrary PDE constraints in an end-to-end manner, breaking the physical consistency bottleneck of pure data-driven approaches. Empirical results on four ocean datasets confirm that our continuous-time framework reduces MSE by an average of 5.6% (up to 12.7% for SST) and improves efficiency over MCVD by 76.25%.

2606.17235 2026-06-17 cond-mat.mtrl-sci cs.AI 新提交

Physics-Informed Attention Mechanism and Generalization Capability of Deep Learning-Based Grain Growth Evolution Prediction

物理信息注意力机制与基于深度学习的晶粒生长演化预测的泛化能力

Pungponhavoan Tep, Marc Bernacki

发表机构 * Mines Paris, PSL University Centre for Material Forming (CEMEF), UMR CNRS 06904(巴黎 Mines 学院,PSL 大学材料成型中心(CEMEF),CNRS UMR 06904)

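AI summary Evaluates the out-of-distribution generalization of deep learning models for grain growth prediction and proposes a boundary-masked attention mechanism that substantially improves prediction accuracy, most notably for bimodal grain size distributions.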

详情
AI中文摘要

用于晶粒生长预测的机器学习模型通常基于理想化的合成数据进行训练,然而实际应用需要泛化到训练分布之外的条件。本研究评估了我们先前研究中训练模型在三个测试案例上的分布外泛化能力,包括实验微观结构、具有双峰晶粒尺寸分布的微观结构以及异常晶粒生长。为了进一步探究物理信息架构设计是否能在这些不同条件下提升鲁棒性,我们专门针对晶粒生长提出了一种边界掩码注意力机制,将注意力限制在晶界像素上。基线模型和所提出的物理信息注意力模型均在分布外数据上未经重新训练或微调进行了评估。两个模型均成功泛化到所有三个测试案例,但边界掩码注意力机制提供了显著改进,最显著的提升出现在具有双峰晶粒尺寸分布的微观结构上,其中结构相似性指数从0.6221提高到0.7609,平均晶粒尺寸误差从8.75%降低到3.57%。注意力热图分析表明,边界掩码注意力模型学会了以与曲率驱动晶粒生长物理一致的方式将注意力集中在大晶界上,这种能力源于训练过程,而无需显式编码到架构中。这些结果表明,在合成数据上训练的模型可以无需重新训练而泛化到多种分布外条件,并且当边界形态与训练域匹配时,物理信息注意力可以提高精度。

英文摘要

Machine Learning (ML) models for grain growth prediction are typically trained on idealized synthetic data, yet practical applications require generalization to conditions outside the training distribution. This study evaluated the Out-Of-Distribution (OOD) generalization capability of the trained model from our previous study across three test cases, including experimental microstructures, microstructures characterized by a bimodal grain size distribution, and abnormal grain growth. To further probe whether physics-informed architectural design could improve robustness under these different conditions, a boundary-masked attention mechanism was proposed specifically for grain growth, constraining attention to grain boundary pixels. Both the baseline and the proposed physics-informed attention model were evaluated without retraining or fine-tuning on the OOD data. Both models successfully generalized to all three test cases, yet the boundary-masked attention mechanism provided substantial improvements, with the most notable gains for microstructures characterized by a bimodal grain size distribution, where Structural Similarity Index Measure (SSIM) improved from 0.6221 to 0.7609 and mean grain size ($\overline{R}$) error decreased from 8.75% to 3.57%. The attention heatmap analysis revealed that the boundary-masked attention model learned to concentrate attention on large grain boundaries in a manner consistent with curvature-driven grain growth physics, emerging from training without being explicitly encoded into the architecture. These results indicate that models trained on synthetic data can generalize to diverse OOD conditions without retraining, and that physics-informed attention may improve accuracy when the boundary morphology matches the training domain.

2606.18108 2026-06-17 astro-ph.IM cs.AI 新提交

Querying an astronomical database using large language models: the ALeRCE text-to-SQL system

使用大语言模型查询天文数据库:ALeRCE文本到SQL系统

P. A. Estevez, J. Espejo-Moreira, S. Sanfeliu-Alvarez, F. Forster, A. M. Munoz Arancibia, G. Cabrera-Vives, F. E. Bauer, A. Bayo, M. Catelan, R. Dastidar, L. Hernandez-Garcia, J. A. Intriago, G. Pignata

发表机构 * Department of Electrical Engineering, University of Chile, Av. Tupper 2007, Santiago, Chile; Millennium Institute of Astrophysics (MAS), Nuncio Monseñor Sótero Sanz 100, Providencia, Santiago, Chile; Data Artificial Intelligence Initiative (ID&IA), Universidad de Chile; Center for Mathematical Modeling, Universidad de Chile, Beauchef 851, North building, 7th floor, Santiago 8320000, Chile; Departamento de Astronomía, Universidad de Chile, Casilla 36D, Santiago, Chile; Department of Computer Science, Universidad de Concepción, Edmundo Larenas 219, Concepción, Chile; Center for Data Artificial Intelligence, Universidad de Concepción, Edmundo Larenas 310, Concepción, Chile; Heidelberg Institute for Theoretical Studies, Heidelberg, Baden-Württemberg, Germany; Instituto de Alta Investigación, Universidad de Tarapacá, Casilla 7D, Arica, 1010000, Chile; European Southern Observatory, Karl-Schwarzschild-Strasse 2, 85748 Garching bei München, Germany; Instituto de Astrofísica, Facultad de Física, Pontificia Universidad Católica de Chile, Casilla 306, Santiago 22, Chile; Centro de Astroingeniería, Pontificia Universidad Católica de Chile, Av. Vicuña Mackenna 4860, 7820436 Macul, Santiago, Chile; Instituto de Estudios Astrofísicos, Facultad de Ingeniería y Ciencias, Universidad Diego Portales, Av. Ejército Libertador 441, Santiago, Chile; Centro Interdisciplinario de Data Science, Facultad de Ingeniería y Ciencias, Universidad Diego Portales, Av. Ejército Libertador 441, Santiago, Chile

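AI summary Proposes an LLM-based text-to-SQL system that uses in-context learning and a step-by-step generation framework (schema linking, query classification, prompt decomposition, self-correction) to query an astronomical database in natural language. Thirteen models are evaluated on the ALeRCE dataset; Claude Opus 4.6, Gemini 2.5 Pro, Gemini 3 Flash, and GPT-5.2-Codex perform best.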

详情
AI中文摘要

我们开发了一个基于大语言模型(LLMs)的文本到SQL(结构化查询语言)系统,采用上下文学习方法,并将其应用于ALeRCE(自动学习快速事件分类)天文数据库。ALeRCE是Zwicky瞬变设施和Vera C. Rubin天文台的社区经纪人。该系统使用户能够以自然语言(NL)查询数据库,并生成可执行的SQL查询。为了开发和评估该系统,我们构建了一个包含110个NL/SQL对的数据集。我们提出了一个逐步生成框架,包含四个模块:模式链接、查询分类、提示分解和自纠正。使用上下文学习和提示工程技术评估了13个LLM的性能。文本到SQL的性能通过行标识符(例如对象标识符)和列标识符(即列名)的完美匹配(PM)率来评估。所提出的逐步框架始终优于直接推理基线,而自纠正模块持续减少执行错误。对于Claude Opus 4.6,简单查询的行(列)标识符PM性能较高,达到0.97(0.94),随着查询复杂度增加,中等查询降至0.44(0.72),困难查询降至0.59(0.49)。在评估的13个模型中,文本到SQL任务表现最佳的LLM是Claude Opus 4.6、Gemini 2.5 Pro、Gemini 3 Flash和GPT-5.2-Codex。

英文摘要

We develop a text-to-SQL (structured query language) system based on large language models (LLMs) using in-context learning and apply it to the Automatic Learning for the Rapid Classification of Events (ALeRCE) astronomical database. ALeRCE is a community broker for the Zwicky Transient Facility and the Vera C. Rubin Observatory. The system enables users to query the database in natural language (NL) and generates executable SQL queries. To develop and evaluate the system, we constructed a dataset of 110 NL/SQL pairs. We propose a step-by-step generation framework comprising four modules: schema linking, query classification, prompt decomposition, and self-correction. The performance of thirteen LLMs is evaluated using in-context learning and prompt engineering techniques. Text-to-SQL performance is assessed using the perfect-match (PM) rate for row identifiers (e.g., object identifiers) and column identifiers (i.e., column names). The proposed step-by-step framework consistently outperforms a direct-inference baseline, while the self-correction module consistently reduces execution errors. For Claude Opus 4.6, PM performance on row (column) identifiers is high for simple queries, reaching 0.97 (0.94), and decreases with query complexity to 0.44 (0.72) for medium queries and 0.59 (0.49) for hard queries. Among the thirteen evaluated models, the best-performing LLMs for the text-to-SQL task are Claude Opus 4.6, Gemini 2.5 Pro, Gemini 3 Flash, and GPT-5.2-Codex.
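
The perfect-match (PM) rate mentioned above can be sketched as follows. This assumes PM means that the set of identifiers returned by the generated query equals the set returned by the reference query, ignoring order; the abstract does not spell out the exact definition, and the function name and sample identifiers are hypothetical.

```python
def perfect_match_rate(predicted, gold):
    """Fraction of queries whose predicted identifier set equals the gold set.

    predicted, gold: lists with one entry per query; each entry is an
    iterable of identifiers (row identifiers or column names).
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must cover the same queries")
    if not gold:
        return 0.0
    matches = sum(1 for p, g in zip(predicted, gold) if set(p) == set(g))
    return matches / len(gold)


# Made-up results for three queries: row identifiers returned by each query.
gold_rows = [["obj1", "obj2"], ["obj3"], ["obj4"]]
pred_rows = [["obj2", "obj1"], ["obj3", "obj9"], ["obj4"]]

# Made-up column names selected by two queries.
gold_cols = [["oid", "ra", "dec"], ["oid"]]
pred_cols = [["oid", "ra", "dec"], ["oid"]]

row_pm = perfect_match_rate(pred_rows, gold_rows)
col_pm = perfect_match_rate(pred_cols, gold_cols)
print(f"row PM = {row_pm:.2f}, column PM = {col_pm:.2f}")
```

Scoring rows and columns separately, as the abstract does, distinguishes a query that selects the right objects but the wrong fields from one that does the reverse.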

2606.16072 2026-06-17 cs.CR cs.AI 新提交

MASCOT-Android: A Curated Dataset and Automated Collection Pipeline for Android Malware Source Code Specimens

MASCOT-Android: 一个用于安卓恶意软件源代码样本的精选数据集与自动收集管道

Bojing Li, Duo Zhong, Prajna Bhandary, Raguvir S, Charles Maxa, Robert J Joyce, Charles Nicholas

发表机构 * University of Maryland, Baltimore County(马里兰大学巴尔的摩县分校)

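AI summary Proposes the MASCOT-Android dataset and an automated collection framework that trains a LinearSVC classifier on repository-level documentation (README files) to discover malware source code on GitHub, reaching 96.28% accuracy and a 1.06% false positive rate.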

详情
AI中文摘要

与二进制文件和反编译代码相比,恶意软件源代码更直接地反映了攻击者的原始意图。然而,源代码的稀缺性和人工审查的高成本使得此类数据集难以构建和维护。我们提出了MASCOT-Android,一个精选的安卓恶意软件源代码数据集,以及一个用于在GitHub上可扩展地发现恶意软件源代码的自动收集框架。我们工作的一个关键发现是,仅仓库级文档就为恶意软件源代码收集提供了强信号。我们的模型从8,772个恶意软件和25,747个良性README文档中提取字符级TF-IDF特征,并训练一个LinearSVC分类器来区分恶意软件仓库。这个仅使用README的模型在本地评估中达到了96.28%的准确率和1.06%的假阳性率。此外,模型输出置信度分数,允许用户调整决策阈值以平衡假阳性率和覆盖率,这在现实世界的恶意软件源代码收集中是实用的。

英文摘要

Compared with binaries and decompiled code, malware source code more directly reflects the attackers' original intent. However, the scarcity of source code and the high cost of manual review make such datasets difficult to build and maintain. We propose MASCOT-Android, a curated dataset of Android malware source code and an automated collection framework for scalable malware source code discovery on GitHub. A key finding of our work is that repository-level documentation alone provides a strong signal for malware source code collection. Our model extracts character-level TF-IDF features from 8,772 malware and 25,747 benign README documents and trains a LinearSVC classifier to distinguish malware repositories. This README-only model achieves an accuracy of 96.28% and an FPR of 1.06% in local evaluation. In addition, the model outputs confidence scores, allowing users to adjust the decision threshold to balance FPR and coverage, which is practical in real-world malware source code collection.
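
The threshold trade-off described in the last sentence can be sketched with a few lines of Python. The scores and labels below are invented for illustration, "coverage" is interpreted here as the fraction of true malware repositories that get flagged (an assumption, since the abstract does not define it), and the function name is hypothetical.

```python
def fpr_and_coverage(scores, labels, threshold):
    """Return (false positive rate, coverage) at a given decision threshold.

    scores: classifier confidence per repository (higher = more malware-like).
    labels: 1 for a malware repository, 0 for a benign one.
    A repository is flagged when its score is >= threshold.
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged and label == 0:
            fp += 1
        elif label == 0:
            tn += 1
        else:
            fn += 1
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    coverage = tp / (tp + fn) if (tp + fn) else 0.0
    return fpr, coverage


# Invented confidence scores for eight repositories.
scores = [2.1, 0.4, -0.3, 1.2, -1.5, 0.1, -0.8, 0.9]
labels = [1, 1, 1, 0, 0, 0, 0, 1]

for threshold in (0.0, 1.0):
    fpr, coverage = fpr_and_coverage(scores, labels, threshold)
    print(f"threshold={threshold}: FPR={fpr:.2f}, coverage={coverage:.2f}")
```

Raising the threshold lowers the false positive rate at the cost of missing more true malware repositories, so the operating point depends on how much manual review a collector can afford.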

2606.14954 2026-06-17 math.FA cs.LG math.OC stat.ML 新提交

Representation Costs in Data Science: Foundations and the Quasi-Banach Spaces of Deep Neural Networks

数据科学中的表示代价:基础与深度神经网络的拟巴拿赫空间

Greg Ongie, Rahul Parhi

发表机构 * Marquette University(马凯特大学) University of California, San Diego(加州大学圣地亚哥分校)

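AI summary Builds a unified framework for analyzing the representation costs of parametric data-fitting methods through their parameter-space regularizers, shows that the native spaces induced by deep neural networks are quasi-Banach spaces, and proves natural results such as representer theorems.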

详情
AI中文摘要

我们开发了一个通用框架,通过参数空间正则化子分析参数化数据拟合方法的表示代价。从这个抽象视角,我们定义了任意参数化模型的表示代价,并揭示了它们诱导的(本征)函数空间。这统一了最近数据拟合方法的函数空间观点。我们还证明了许多自然结果在这个抽象设置中成立,包括参数方法在其本征空间上的表示定理。该框架还严格地将参数化方法与其在充分过参数化下的等价非参数描述联系起来。经典方法及其本征空间,如核方法/再生核希尔伯特空间、小波/贝索夫空间和浅层神经网络/变分空间,都是我们抽象框架的特例。将表示代价研究“公理化”的一个副产品是,我们立即获得了深度神经网络的新结果:对于深度为$L$的前馈ReLU网络,其诱导的本征空间是$p$范数可拟的拟巴拿赫空间,其中$p = 2/L$。这揭示了深度神经网络的归纳偏置(由表示代价给出)在深度$L > 2$时无法被范数捕捉。

英文摘要

We develop a general framework for analyzing representation costs of parametric data-fitting methods through their parameter-space regularizers. From this abstract perspective, we define representation costs for arbitrary parametric models and reveal their induced (native) function spaces. This unifies recent function-space views of data-fitting methods. We also prove that many natural results hold in this abstract setting, including representer theorems for parametric methods on their native spaces. The framework also rigorously connects parametric methods with their equivalent nonparametric descriptions under sufficient overparameterization. Classical methods and their native spaces, such as kernel methods / reproducing kernel Hilbert spaces, wavelets / Besov spaces, and shallow neural networks / variation spaces emerge as special cases of our abstract framework. A byproduct of "axiomatizing" the study of representation costs is that we also immediately obtain new results for deep neural networks: For depth-$L$ feedforward ReLU networks, their induced native spaces are $p$-normable quasi-Banach spaces with $p = 2/L$. This reveals that the inductive bias of deep neural networks (as given by the representation cost) cannot be captured by norms for depths $L > 2$.
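
For readers less familiar with the terminology, the standard definitions behind the final claim can be stated briefly. The notation ($K$, $|||\cdot|||$) is chosen here and is not taken from the paper.

```latex
% Quasi-norm: the triangle inequality holds only up to a constant K \ge 1
\| f + g \| \le K \bigl( \| f \| + \| g \| \bigr)

% p-normable (0 < p \le 1): there is an equivalent quasi-norm |||\cdot||| with
||| f + g |||^{p} \le ||| f |||^{p} + ||| g |||^{p}

% With p = 2/L for a depth-L ReLU network:
L = 2 \;\Rightarrow\; p = 1, \qquad
L = 3 \;\Rightarrow\; p = \tfrac{2}{3}, \qquad
L = 4 \;\Rightarrow\; p = \tfrac{1}{2}
```

At $p = 1$ the second inequality is the ordinary triangle inequality, so the depth-2 case is compatible with a genuine norm, whereas $p < 1$ for $L > 2$ matches the abstract's statement that norms cannot capture the inductive bias of deeper networks.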

2606.14814 2026-06-17 cond-mat.mtrl-sci cs.AI physics.app-ph physics.chem-ph physics.comp-ph 新提交

A Multi-Level Architecture for Reusable Materials Ontologies -- The OntoCrafter Ceramics Ontology (OCO) as Reference Implementation

可复用材料本体的多层次架构——以OntoCrafter陶瓷本体(OCO)作为参考实现

Thomas Pannek, Wolfgang Grond

发表机构 * Numberland

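AI summary To address the horizontal, vertical, and mechanistic fragmentation of materials science ontologies, proposes a multi-level modular architecture with two independent classification axes (level of abstraction and consumer audience) and a seven-tier mechanistic-explanation skeleton inside the material-specific level, with the OntoCrafter Ceramics Ontology (OCO v0.94) as the reference implementation.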

Comments 3 figures, 55 pages

详情
AI中文摘要

材料科学与工程本体领域同时在多个轴向上呈现碎片化。水平方向:一项近期调查识别出94个本体,其中超过40个在结构上不兼容;每个新的应用领域——陶瓷、聚合物、电池、智能材料——通常从头开始重新设计本体。垂直方向:欧盟法规(CSRD、CSDDD、PPWR、CBAM、R2R、AI Act、ESPR)迫使材料、制造、供应链和生命周期数据集成到数字产品护照中,使得仅解决水平碎片化的本体对于任何当代消费者来说都是不完整的。机制方面:一个记录BNT-BT具有$d_{33} \approx 580$ pC/N的词汇表存储了一个事实,但如果没有系统的解释骨架,就无法揭示其原因——Bi-6s$^2$孤对电子立体活性、异常Born有效电荷、软模、缺陷化学。我们提出一种多层次模块化架构,具有两个独立的分类轴——抽象层次(L0桥梁、L1材料无关的实验室笔记本、L2材料类别特定、L3分类推理)和消费受众(材料与合规)——其中材料特定层次内部由适用于任何结晶离子氧化物的七层机制解释骨架(对称性、能量/DFT、热力学/CALPHAD、动力学、微观结构、缺陷化学、键合)组织。层次和受众的模块化解决了水平碎片化,合规受众吸收了垂直法规压力,而第2层的七层组织提供了机制解释深度。我们将该架构实例化为OntoCrafter陶瓷本体(OCO v0.94):跨44个模块的5,196个类;167,348个OWL公理(其中40,454个逻辑公理);1,674个属性;829个跨本体桥梁映射;1,172个SHACL形状;163个已发布的胜任力问题。

英文摘要

The Materials Science and Engineering ontology landscape is fragmented along multiple axes simultaneously. Horizontally: a recent survey identified 94 ontologies of which over 40 are structurally incompatible; each new application domain -- ceramics, polymers, batteries, smart materials -- typically restarts ontology design from scratch. Vertically: EU regulation (CSRD, CSDDD, PPWR, CBAM, R2R, AI Act, ESPR) forces material, manufacturing, supply-chain, and lifecycle data into integrated digital product passports, leaving ontologies that only address horizontal fragmentation incomplete for any contemporary consumer. And mechanistically: a vocabulary that records that BNT-BT has $d_{33} \approx 580$ pC/N stores a fact but cannot surface why -- Bi-6s$^2$ lone-pair stereo-activity, anomalous Born effective charges, soft modes, defect chemistry -- without a systematic explanation skeleton. We propose a multi-level modular architecture with two independent classification axes -- level of abstraction (L0 bridges, L1 material-agnostic laboratory-notebook, L2 material-class-specific, L3 categorical reasoning) and consumer audience (material vs. compliance) -- in which the material-specific level is internally organised by a seven-tier mechanistic-explanation skeleton (Symmetry, Energy/DFT, Thermo/CALPHAD, Kinetics, Microstructure, Defect chemistry, Bonding) applicable to any crystalline ionic oxide. The level-and-audience modularity dissolves the horizontal fragmentation, the compliance audience absorbs the vertical regulation pressure, and the seven-tier organisation of Level 2 delivers the mechanistic explanation depth. We instantiate the architecture as the OntoCrafter Ceramics Ontology (OCO v0.94): 5,196 classes across 44 modules; 167,348 OWL axioms (40,454 logical); 1,674 properties; 829 cross-ontology bridge mappings; 1,172 SHACL shapes; 163 published competency questions.

2606.14517 2026-06-17 cs.CR cs.AI 新提交

From Shield to Target: Denial-of-Service Attacks on LLM-Based Agent Guardrails

从盾牌到靶心:针对基于LLM的智能体护栏的拒绝服务攻击

Yuguang Zhou, Xunguang Wang, Pingchuan Ma, Zhantong Xue, Zhaoyu Wang, Shuai Wang

发表机构 * National University of Singapore(新加坡国立大学) University of Science and Technology of China(中国科学技术大学)

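AI summary Shows that LLM-based guardrails are vulnerable to denial-of-service attacks: malicious payloads generated by a beam-search optimization framework and by mechanism-aware structural mutations cause 13-63× token amplification and up to 148× latency amplification, threatening system availability.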

Abstract

LLM-based guardrails have emerged as a highly effective defense against prompt injection and jailbreak attacks in autonomous agents. However, we reveal that the very reasoning and task-following capabilities enabling this protection introduce a novel vulnerability: attackers can inject crafted data to trap the guardrail in extended reasoning loops, mounting a systematic denial-of-service (DoS) attack. To expose this threat, we design a beam-search optimization framework that crafts natural-language payloads to maximize guardrail reasoning length, utilizing an LLM proposer guided by a strategy bank. Based on the observation that guardrails follow schemas, we also provide a second attack framework, driven by mechanism-aware structural mutations, that requires less computation. We evaluate attack efficacy in two parts. First, in standalone evaluations, the attack generalizes across diverse guardrail architectures, safety templates, and agent benchmarks. Payloads optimized on a single open-source surrogate successfully transfer to eight leading model backbones (e.g., Claude, GPT, Gemini, DeepSeek, and Qwen), achieving a 13--63$\times$ token amplification. Second, in end-to-end real-world agent deployments (web, desktop, code, and multi-agent systems), the attack reveals up to a 148$\times$ latency amplification. We show that a single poisoned document can saturate shared guardrail infrastructures, effectively starving co-located agents and paralyzing the entire system. By uncovering this availability flaw, our work underscores the urgent need to develop cost-bounded, reasoning-robust guardrails.
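The abstract reports amplification factors without defining them. A minimal sketch, assuming amplification means the ratio of the guardrail's cost on a poisoned input to its cost on a benign one, and assuming a hard token budget as one possible cost bound. The function names and the numbers below are hypothetical and are not taken from the paper.

```python
def amplification(baseline: float, attacked: float) -> float:
    """Ratio of attacked cost to baseline cost (tokens or seconds).

    Assumption: this ratio is what an "N x amplification" figure denotes.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return attacked / baseline


def bounded_amplification(baseline: float, attacked: float, budget: float) -> float:
    """Amplification when the guardrail stops at a hard budget.

    Hypothetical mitigation: output is cut off once `budget` is reached,
    so the attacker can never push the cost above `budget`.
    """
    return amplification(baseline, min(attacked, budget))


# Hypothetical measurements, for illustration only.
baseline_tokens = 200   # guardrail reasoning tokens on a benign input
attacked_tokens = 5200  # guardrail reasoning tokens on a poisoned input

print(amplification(baseline_tokens, attacked_tokens))                # 26.0
print(bounded_amplification(baseline_tokens, attacked_tokens, 1000))  # 5.0
```

Under these assumptions a budget caps the worst-case factor at `budget / baseline`, whatever the payload does. Whether a truncated guardrail verdict is still safe to act on is a separate question that this sketch does not address.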

2606.14295 2026-06-17 cs.CR cs.AI cs.LG New submission

AgentCyberRange: Benchmarking Frontier AI Systems in Realistic Cyber Ranges

Fengyu Liu, Jiarun Dai, Yihe Fan, Wuyuao Mai, Ziao Li, Bofei Chen, Jie Zhang, Zheng Lou, Bocheng Xiang, Qiyi Zhang, Xudong Pan, Geng Hong, Yuan Zhang, Min Yang

Affiliations: Fudan University

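AI summary: Introduces AgentCyberRange, the first open multi-range infrastructure, which integrates 110 vulnerabilities and 156 internal hosts to evaluate frontier AI systems in realistic cyber attacks. GPT-5.5 with Codex performs best on web exploitation and post-exploitation tasks.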

Abstract

Frontier AI systems are increasingly capable of cybersecurity tasks, including codebase inspection, vulnerability detection, and exploitation. However, evaluating their offensive capabilities remains constrained by limited access to open, reproducible, multi-host cyber ranges. Existing public benchmarks capture isolated skills such as CTF solving, vulnerability reproduction, and exploit generation, but often abstract away realistic intrusion workflows: discovering exposed services, gaining a foothold, collecting internal information, and expanding compromise across hosts. This gap makes it difficult to observe emerging risks early, because frontier AI systems are rarely evaluated under realistic attack conditions. We introduce AgentCyberRange, the first open, multi-range infrastructure for measuring autonomous cyber attack capability in realistic cyber ranges. It combines 110 vulnerabilities across 15 real web applications and 8 enterprise-like cyber ranges with 156 internal hosts, plus Cage, a toolchain for execution, orchestration, result collection, and verification. The benchmark covers two core stages: web exploitation, where agents explore exposed applications and validate vulnerabilities, and post-exploitation, where agents turn an initial foothold into broader internal compromise. We evaluate six frontier AI systems under matched prompts and budgets. GPT-5.5 with Codex performs best, solving 16.1% of web exploitation tasks and 31.7% of post-exploitation tasks; with more concrete hints, these rates increase to 33.0% and 46.3%. We also observe out-of-benchmark findings, including unknown vulnerabilities in popular projects, and payload mutation that bypasses host defenses. These results show that open cyber-range evaluation is necessary for observing emerging offensive capabilities under realistic and reproducible conditions.

2606.13919 2026-06-17 eess.IV cs.AI cs.CV New submission

GMN4AD: Graph Matching Network for Alzheimer's Disease Diagnosis with Test-Time Domain Adaptation using Multi-centered Structure Magnetic Resonance Imaging

Chen Zhao, Huan Huang, Yixin Xie, Jiajing Huang, Weihua Zhou

Affiliations: Department of Computer Science, Kennesaw State University; Department of Information Technology, Kennesaw State University; School of Data Science and Analytics, Kennesaw State University; Department of Applied Computing, Michigan Technological University

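AI summary: Proposes GMN4AD, which uses a graph matching network to model relationships between heterogeneous brain graphs and adds a test-time domain adaptation strategy. It outperforms existing methods on three public datasets, giving robust AD diagnosis.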

Abstract

Alzheimer's Disease (AD) is a progressive neurodegenerative disorder that affects millions of older adults, with prevalence expected to rise significantly in the coming years. Early diagnosis, particularly during the mild cognitive impairment (MCI) stage, is critical for timely intervention. Structural Magnetic Resonance Imaging (sMRI) has emerged as a key modality for detecting AD-related brain changes, but traditional graph-based approaches often struggle with modality and inter-site heterogeneity, limiting diagnostic performance. In this paper, we propose the Graph Matching Network for Alzheimer's Disease Diagnosis (GMN4AD), designed to model interactions between heterogeneous brain graphs derived from neuroimaging data. Unlike conventional methods that treat each brain graph independently, GMN4AD leverages graph matching to capture cross-graph relationships, enhancing diagnostic precision. Furthermore, we introduce a test-time domain adaptation strategy that incorporates contrastive learning to mitigate domain shifts during inference. Extensive experiments on three public AD datasets demonstrate that GMN4AD achieves superior performance compared to state-of-the-art methods, offering a robust and generalizable solution for AD diagnosis.
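The abstract does not say how GMN4AD matches graphs. One common formulation in the graph matching network literature is cross-graph attention, where each node in one graph attends over the nodes of the other and receives the difference between its own embedding and the attention-weighted average. The sketch below illustrates that general idea only and is not the authors' implementation. The function names, the dot-product similarity, and the toy embeddings are all assumptions.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_graph_messages(h_a, h_b):
    """Cross-graph matching messages for every node of graph A.

    For node i in A with embedding h_i, compute attention weights a_ij over
    the nodes j of B (dot-product similarity, an assumption), then return
    h_i - sum_j a_ij * h_j. The message is near zero when B contains a node
    that closely matches h_i.
    """
    messages = []
    for h_i in h_a:
        weights = softmax([dot(h_i, h_j) for h_j in h_b])
        attended = [
            sum(w * h_j[k] for w, h_j in zip(weights, h_b))
            for k in range(len(h_i))
        ]
        messages.append([x - y for x, y in zip(h_i, attended)])
    return messages


# Toy node embeddings for two small graphs (hypothetical values).
graph_a = [[1.0, 0.0], [0.0, 1.0]]
graph_b = [[1.0, 0.0]]
print(cross_graph_messages(graph_a, graph_b))
```

In a full model, these messages would usually be combined with within-graph message passing before the graph-level readout. How GMN4AD does that, and how it couples the matching with test-time adaptation, is not stated in the abstract.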