arXivDaily arXiv每日学术速递 周一至周五更新
全部学科分类 8081
专题追踪
2606.02982 2026-06-03 cs.PF cs.DC cs.LG

DriftSched: Adaptive QoS-Aware Scheduling under Runtime Token Drift for Multi-Tenant GPU Inference

DriftSched: 多租户GPU推理中运行时令牌漂移下的自适应QoS感知调度

Kathiravan Palaniappan

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 提出DriftSched框架,通过运行时反馈驱动的漂移补偿和自适应偏差校正,解决多租户LLM推理中令牌漂移导致的调度问题,在NVIDIA L4 GPU上实现平均38.8%的估计误差降低和42%的中位延迟改善。

Comments 17 pages, 22 figures, 7 tables

详情
AI中文摘要

大型语言模型(LLM)推理服务的快速增长增加了对高效多租户GPU调度的需求。尽管现代推理运行时(如vLLM)通过连续批处理和优化内存管理提高了吞吐量,但准确估计异构推理请求的运行时成本仍然是一个重大挑战。在实践中,观察到的输出长度通常偏离准入时的估计值,产生运行时令牌漂移,可能导致工作负载错误分类、队列不平衡、尾延迟增加和服务质量(QoS)下降。本文提出了DriftSched,一个用于NVIDIA L4 GPU上多租户LLM推理服务的自适应QoS感知调度框架。DriftSched结合了工作负载分类、令牌预算估计、租户感知队列管理和运行时反馈驱动的漂移补偿,以改进准入时的调度决策。该框架在异构多租户工作负载下评估了FIFO、优先级、加权、最短作业优先(SJF)和老化优先级调度策略。实验结果表明,各工作负载类别存在可测量的运行时令牌漂移。自适应偏差校正将工作负载估计误差平均降低38.8%(MAE)和40.5%(RMSE),提高了工作负载分类稳定性和调度准确性。在所有评估的调度器中,SJF实现了最佳整体性能,在持续GPU争用下,相对于FIFO,中位端到端延迟降低了约42%,P99延迟降低了约16%。该工作贡献了一个自适应漂移感知调度架构、一个运行时令牌漂移补偿机制,以及一个用于评估共享GPU基础设施上QoS感知LLM推理调度的可重复基准测试框架。

英文摘要

The rapid growth of large language model (LLM) inference services has increased the demand for efficient multi-tenant GPU scheduling. While modern inference runtimes such as vLLM improve throughput through continuous batching and optimized memory management, accurately estimating the runtime cost of heterogeneous inference requests remains a significant challenge. In practice, observed output lengths often deviate from admission-time estimates, creating runtime token drift that can lead to workload misclassification, queue imbalance, increased tail latency, and degraded Quality-of-Service (QoS). This paper presents DriftSched, an adaptive QoS-aware scheduling framework for multi-tenant LLM inference serving on NVIDIA L4 GPUs. DriftSched combines workload classification, token-budget estimation, tenant-aware queue management, and runtime feedback-driven drift compensation to improve admission-time scheduling decisions. The framework evaluates FIFO, Priority, Weighted, Shortest-Job-First (SJF), and Aging Priority scheduling policies under heterogeneous multi-tenant workloads. Experimental results demonstrate measurable runtime token drift across workload categories. Adaptive bias correction reduces workload estimation error by an average of 38.8% (MAE) and 40.5% (RMSE), improving workload classification stability and scheduling accuracy. Among all evaluated schedulers, SJF achieves the best overall performance, reducing median end-to-end latency by approximately 42% and P99 latency by approximately 16% relative to FIFO under sustained GPU contention. The work contributes an adaptive drift-aware scheduling architecture, a runtime token-drift compensation mechanism, and a reproducible benchmarking framework for evaluating QoS-aware LLM inference scheduling on shared GPU infrastructure.

2606.02967 2026-06-03 cs.ET cs.AI cs.AR cs.SY eess.SY

Glass Box at Orbit: A Constitutional AI Verification Framework for Trustworthy Autonomous CubeSat Intelligence

轨道上的玻璃盒:面向可信自主立方星智能的宪法AI验证框架

Karthik Barma, Anil Sanneboyina, V C Premchand Yadav

发表机构 * University of California, Berkeley(加州大学伯克利分校) Stanford University(斯坦福大学)

AI总结 提出玻璃盒框架,通过运行时宪法AI验证层拦截自主航天器决策,利用六项物理约束和七项线性时序逻辑安全不变式确保安全,并证明其验证开销与模型规模无关。

Comments 12 pages, 2 figures, 2 tables, 32 references. Paper 1 of the Project October series on autonomous orbital intelligence

详情
AI中文摘要

航天工业正在悄然构建一个尚未被充分认识的事物:在地球上空550公里处运行数千个自主AI工作负载的轨道数据中心,且无人类参与。微软、AWS以及越来越多的轨道计算企业正在将云规模处理从地面转移到轨道。然而,它们都尚未回答治理问题——当轨道数据中心规模的自主AI系统在太空中做出错误决策时,如何在决策变得不可逆转之前阻止它们?我们引入玻璃盒:一个运行时宪法AI验证层,在单个命令到达任何航天器子系统之前,拦截来自机载AI策略的每个候选动作,并根据六项基于物理的宪法约束和七项线性时序逻辑(LTL)安全不变式对其进行评估。每个批准的动作都附带一个加权可解释性分数E(a_t)(范围[0,1])和完整的宪法审计日志。我们在Project October中演示了玻璃盒:一个针对CubeSat级航天器的完全模拟的五层自主轨道智能架构。我们证明玻璃盒的验证开销为O(N_c),其中N_c是宪法规则的数量,与模型大小或航天器状态维度无关。我们提供了宪法约束语法的完整形式规范、通过Z3和NuSMV模型检查验证的七项LTL安全不变式,以及一个详细的工作示例,展示玻璃盒在电池状态退化的日食入口处拦截不安全推理请求。随着轨道计算向数据中心基础设施规模发展,运行时宪法验证不再是研究上的新奇事物——它是每个自主轨道平台最终将需要的任务关键型安全基础设施。

英文摘要

The space industry is quietly building toward something nobody has fully reckoned with: orbital data centers running thousands of autonomous AI workloads with no human in the loop, 550 km above the Earth. Microsoft, AWS, and a growing list of orbital computing ventures are moving cloud-scale processing off the ground and into orbit. What none of them have answered yet is the governance question -- when autonomous AI systems at orbital data center scale make wrong decisions in space, what stops those decisions before they become irreversible? We introduce Glass Box: a runtime constitutional AI verification layer that intercepts every candidate action from an onboard AI policy and evaluates it against six physics-grounded constitutional constraints and seven Linear Temporal Logic (LTL) safety invariants before a single command reaches any spacecraft subsystem. Every approved action carries a weighted explainability score E(a_t) in [0,1] and a complete constitutional audit log. We demonstrate Glass Box within Project October: a fully simulated five-layer autonomous orbital intelligence architecture for CubeSat-class spacecraft. We prove that Glass Box verification overhead is O(N_c) in the number of constitutional rules, independent of model size or spacecraft state dimension. We present a complete formal specification of the constitutional constraint grammar, seven LTL safety invariants verified by Z3 and NuSMV model checking, and a detailed worked example of Glass Box intercepting an unsafe inference request at eclipse-entry under degraded battery state. As orbital computing scales toward data center infrastructure, runtime constitutional verification is no longer a research novelty -- it is mission-critical safety infrastructure that every autonomous orbital platform will eventually require.

2606.02964 2026-06-03 cs.AR cs.CL cs.LG

Multi-Segment Attention: Enabling Efficient KV-Cache Management for Faster Large Language Model Serving

多段注意力:实现高效KV缓存管理以加速大型语言模型服务

Chunan Shi, Yilei Chen, Yilin Chen, Xupeng Miao, Bin Cui

发表机构 * Peking University(北京大学)

AI总结 提出AsymCache,一种计算延迟感知的KV缓存管理系统,通过多段注意力、缓存驱逐策略和自适应分块调度器,在保持无损精度的同时显著降低TTFT和TPOT。

详情
AI中文摘要

大型语言模型(LLM)推理依赖键值(KV)缓存以避免冗余的注意力计算。虽然近似KV缓存保留技术通过牺牲模型精度来减少内存使用,但无损方法则从GPU内存中驱逐KV缓存块并按需重建以保留精确输出。现有的无损KV缓存管理系统主要基于访问频率或位置启发式做出驱逐决策,而不考虑不同KV缓存块如何影响GPU注意力内核的执行效率。在本文中,我们提出了AsymCache,一种用于LLM推理的计算延迟感知KV缓存管理系统,它明确地将缓存驻留决策与GPU注意力内核性能对齐,包括三个关键组件:用于高效非连续KV上下文处理的多段注意力(MSA)、联合优化命中率和位置感知重计算成本的缓存驱逐策略,以及用于高硬件利用率的自适应分块调度器。实验表明,与最新基线相比,AsymCache将TTFT降低了高达1.90-2.03倍,每输出令牌时间(TPOT)降低了1.62-1.71倍,证实了该方法在常见工作负载中的有效性,并验证了其平衡计算效率与缓存命中率的设计目标。此外,AsymCache的低级设计允许无缝集成到诸如Continuum的代理服务系统中,进一步将平均作业延迟降低高达18.1%。

英文摘要

Large Language Model (LLM) inference relies on key-value (KV) caches to avoid redundant attention computation. While approximate KV cache retention techniques reduce memory usage by sacrificing model accuracy, lossless approaches instead evict KV cache blocks from GPU memory and reconstruct them on demand to preserve exact outputs. Existing lossless KV cache management systems primarily base eviction decisions on access frequency or positional heuristics, without considering how different KV cache blocks affect the execution efficiency of GPU attention kernels. In this paper, we propose AsymCache, a computation-latency-aware KV cache management system for LLM inference that explicitly aligns cache residency decisions with GPU attention kernel performance, including three key components: Multi-Segment Attention (MSA) for efficient non-contiguous KV context processing, a cache eviction policy that jointly optimizes hit rate and position-aware recomputation cost, and an adaptive chunking scheduler for high hardware utilization. Experiments show that AsymCache reduces TTFT by up to 1.90-2.03x and time-per-output-token (TPOT) by 1.62-1.71x over latest baselines, confirming the effectiveness of the method in common workloads and validating its design goal of balancing computational efficiency with cache hit rate. Moreover, the low-level design of AsymCache allows seamless integration into agent serving systems such as Continuum, where it further reduces average job latency by up to 18.1%.

2606.02958 2026-06-03 cs.CR cs.AI

Echelon: Auditable Aggregate-Only Language-Model Adaptation Across Privacy Boundaries

Echelon: 跨隐私边界的可审计聚合专用语言模型适配

Hina Dixit, Punit Kumar, Irene Tenison, Nevasini Sasikumar

发表机构 * University of California, Berkeley(加州大学伯克利分校) Stanford University(斯坦福大学)

AI总结 提出Echelon架构,通过强制设备级模型状态不可导出为系统不变量,仅允许聚合后的跨边界数据传输,并结合缓冲半异步安全聚合、陈旧感知加权等机制,在1B参数LoRA适配中实现低通信开销下的稳定训练。

详情
AI中文摘要

跨组织语言模型适配日益面临严格的治理约束:在许多部署中,设备级模型状态(参数、激活值、优化器状态及每设备更新)无法导出到管理边界之外。现有的分布式和联邦学习栈通常假设跨站点模型交换,然后改造隐私机制,这使合规性复杂化并导致审计脆弱。我们提出Echelon,一种边界优先的训练架构,将设备级模型状态不可导出作为系统不变量强制执行。设备在每个边界内本地训练;唯一的跨边界负载是安全聚合的边界级增量加上O(1)协调元数据,并通过具体的审计接口暴露。将交换限制为聚合改变了优化问题:系统必须在广域网延迟、异构参与、节点波动和非独立同分布数据下保持稳定,尽管全局层面从未看到每设备更新。Echelon结合了缓冲半异步安全聚合、陈旧感知加权、参与窗口、近端局部目标以及漂移感知外同步控制器。在M=2个边界上的1B参数LoRA适配中,预算匹配的竞赛(三个种子,24.88M tokens)达到验证损失3.887 +/-0.010,并在固定token、固定字节、固定挂钟时间和固定同步次数预算下,在调优的低通信基线中表现最佳或并列最佳。在OpenWebText压力测试中,Echelon在评估的广域网和非独立同分布处理下维持2,139-2,176 tokens/s的吞吐量;Echelon-DA在广域网延迟下相对于隐私对等的DiLoCo+SA基线改善了达到目标的时间,并且在200ms模拟延迟或严重非独立同分布分区下质量最多下降2.2%。

英文摘要

Cross-organization language-model adaptation increasingly faces hard governance constraints: in many deployments, device-level model state-parameters, activations, optimizer state, and per-device updates-cannot be exported outside an administrative boundary. Existing distributed and federated stacks typically assume cross-site model exchange and then retrofit privacy mechanisms, which complicates compliance and makes auditing brittle. We present Echelon, a boundary-first training architecture that enforces device-level model-state non-export as a systems invariant. Devices train locally inside each boundary; the only cross-boundary payloads are securely aggregated boundary-level deltas plus O(1) coordination metadata, exposed through a concrete audit surface. Restricting exchange to aggregates changes the optimization problem: the system must remain stable under WAN delay, heterogeneous participation, churn, and non-IID data even though the global plane never sees per-device updates. Echelon combines buffered semi-asynchronous secure aggregation, staleness-aware weighting, participation windows, proximal local objectives, and a drift-aware outer synchronization controller. In 1B-parameter LoRA adaptation across M= 2 boundaries, a budget-matched contest over three seeds (24.88M tokens) reaches validation loss 3.887 +/-0.010 and is best or tied-best among tuned low-communication baselines under fixed-token, fixed-bytes, fixed-wall-clock, and fixed-sync-count budgets. In OpenWebText stress tests, Echelon sustains 2,139-2,176 tokens/s across evaluated WAN and non-IID treatments, Echelon-DA improves time-to-target under WAN latency relative to a privacy-parityDiLoCo+SA baseline, and quality degrades by at most 2.2% under 200ms emulated latency or severe non-IID partitioning.

2606.02902 2026-06-03 cs.CY cs.LG

Fairness Definitions and Metrics in Deep Reinforcement Learning for Drug Discovery in Healthcare: A Rapid Evidence Review

医疗保健中深度强化学习的公平性定义与指标:药物发现的快速证据综述

Esmaeil Shakeri, Ronnie de Souza Santos, Behrouz Far

发表机构 * Department of Electrical and Software Engineering, Schulich School of Engineering, University of Calgary(电气与软件工程系,Schulich工程学院,卡尔加里大学)

AI总结 本文通过快速证据综述,系统总结了深度强化学习在药物分子生成中公平性的定义、测量指标,并分析了数据集组成、奖励设计对公平性的影响。

Comments 10 pages, 6 figures, 3 tables. Accepted as a full paper at a symposium of IEEE COMPSAC 2026

详情
AI中文摘要

深度强化学习(DRL)越来越多地应用于从头分子设计,但数据、奖励和评估的选择可能导致在不同疾病区域和化学类型上的性能不均。尽管如此,目前尚无关于DRL药物发现中公平性如何定义、测量和测试的简明综合。在这篇快速证据综述中,我们综合了医疗保健中DRL驱动分子生成的公平性定义和指标。我们关注三个问题:(i)数据集组成和划分策略(特别是支架划分与随机划分)如何影响评估和分布偏移;(ii)奖励设计(如QED、对接、毒性、合成可及性)如何产生或减轻偏差,重点关注癌症靶点;(iii)哪些可测量指标最能捕捉公平性。这包括癌症与非癌症适应症之间以及癌症亚型之间的均等性。还包括关键物理化学描述符的分布平衡、支架/化学类型多样性、组间有效性、毒性和合成可及性。从2017年起,我们检索了主要的生物医学、计算机科学和工程文献数据库,并使用arXiv进行地平线扫描。记录通过PRISMA式程序筛选,并通过内容编码分析,将报告的均等性结果与数据集和奖励选择联系起来。我们的综述为DRL分子生成提供了一套简洁的公平性定义和指标。它为报告分布均等性和结果均等性提供了实用指南。它还总结了数据集和奖励选择如何与观察到的均等性效应相关,并指出了与可信、癌症相关的DRL生成相关的未解决问题。

英文摘要

Deep reinforcement learning (DRL) is increasingly applied to de novo molecular design, but choices in data, rewards, and evaluation can yield uneven performance across disease areas and chemotypes. Despite this, there is no concise synthesis of how fairness is defined, measured, and tested in DRL-based drug discovery. In this rapid evidence review, we synthesize fairness definitions and metrics for DRL-driven molecule generation in healthcare. We focus on three questions: (i) how dataset composition and split strategies, especially scaffold versus random splits, affect evaluation and distribution shift; (ii) how reward design (e.g., QED, docking, toxicity, synthetic accessibility) can create or mitigate bias, with emphasis on cancer targets; and (iii) which measurable metrics best capture fairness. This includes parity across cancer versus non-cancer indications and across cancer subtypes. It also includes distributional balance in key physicochemical descriptors, scaffold/chemotype diversity, groupwise validity, toxicity, and synthetic accessibility. From 2017 onward, we searched major biomedical, computer science, and engineering literature databases and used arXiv for horizon scanning. Records were screened using PRISMA-style procedures and analyzed via content coding to link reported parity outcomes to dataset and reward choices. Our review provides a concise set of fairness definitions and metrics for DRL molecule generation. It offers practical guidance for reporting distribution parity and outcome parity. It also summarizes how dataset and reward choices relate to observed parity effects and identifies open gaps relevant to trustworthy, cancer-relevant DRL generation.

2606.02883 2026-06-03 cs.HC cs.AI cs.CY cs.IR

LLM-Assisted Reranking to Operationalize Nuanced Objectives in Recommender Systems

LLM辅助重排序以在推荐系统中实现细微目标

Amir Ghasemian, Homa Hosseinmardi, Upasana Dutta, Duncan J. Watts

发表机构 * Department of Communication, University of California, Los Angeles, CA 90095(通信系,加州大学洛杉矶分校,CA 90095) Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104(计算机与信息科学系,宾夕法尼亚大学,Philadelphia, PA 19104) Amenenberg School of Communication, University of Pennsylvania, Philadelphia, PA 19104(安纳伯格通信学院,宾夕法尼亚大学,Philadelphia, PA 19104) Operations, Information, and Decisions Department, University of Pennsylvania, Philadelphia, PA 19104(运营、信息与决策系,宾夕法尼亚大学,Philadelphia, PA 19104)

AI总结 本研究通过零样本指令提示对YouTube侧边栏候选进行重排序,发现无约束的LLM辅助重排序会放大极端和阴谋论内容,而轻量级提示正则化可在轻微损失相关性的情况下减少极端内容并增加意识形态多样性。

Comments 30 pages total; 11 pages, 5 figures, 2 tables (main text); 19 pages, 11 figures, 9 tables (appendix)

详情
AI中文摘要

推荐系统已从内容组织工具发展为塑造日常行为的复杂系统。通过控制我们所看到的内容,它们塑造了我们的感知,引发了对过滤气泡、激进化、两极分化和社会不平等的担忧。大型语言模型(LLM)实现了更强大的个性化,加剧了这些动态。然而,大多数推荐系统针对参与度或有限的准确性指标进行调优,很少关注更广泛的社会影响,例如个性化如何重塑社会重要领域中的曝光度。我们研究了LLM辅助重排序在提高个性化的同时,是否无意中放大了对意识形态极端或阴谋论政治内容的曝光,这是一种在新闻推荐中理论上存在但尚未得到实证表征的风险。使用真实的新闻消费历史,我们通过零样本、基于指令的提示对YouTube侧边栏候选进行重排序。我们比较了基线提示与一个约束变体,该变体保持主题相关性并扩大意识形态曝光,同时减少阴谋论或极端内容。在没有约束的情况下,重排序加强了个性化,但增加了对历史中包含此类内容的用户的阴谋论和极端主义材料的曝光。轻量级提示级正则化减少了对极端内容的推广并增加了意识形态多样性,同时相关性损失较小。合成实验表明,LLM通过语言中的统计规律而非对意识形态的语义理解进行重排序,这解释了为什么朴素提示会放大这些模式,而正则化可以重塑它们。总之,我们的结果突显了LLM在高风险推荐中实现上下文细微差别的能力,以及评估LLM辅助个性化超越准确性并将提示设计视为有价值负载而非中性默认的必要性。

英文摘要

Recommender systems have grown from content-organization tools into sophisticated systems that shape daily behavior. By controlling what we see, they shape what we perceive, raising concerns about filter bubbles, radicalization, polarization, and social inequality. Large language models (LLMs) enable more powerful personalization, intensifying these dynamics. Yet most recommenders are tuned for engagement or limited accuracy metrics, with little attention to broader social implications, e.g. how personalization reshapes exposure in socially consequential domains. We investigate whether LLM-assisted reranking, while improving personalization, inadvertently amplifies exposure to ideologically extreme or conspiratorial political content, a risk theorized but not empirically characterized in news recommendation. Using real news-consumption histories, we rerank YouTube's sidebar candidates through zero-shot, instruction-based prompting. We compare a baseline prompt with a constrained variant that preserves topical relevance and broadens ideological exposure while reducing conspiratorial or extreme content. Without constraints, reranking strengthened personalization but increased exposure to conspiratorial and extremist material for users whose histories contained such content. Lightweight prompt-level regularization reduced promotion of extreme content and increased ideological diversity, with modest relevance loss. Synthetic experiments suggest that LLMs rerank via statistical regularities in language rather than semantic understanding of ideology, clarifying why naive prompts amplify these patterns and why regularization can reshape them. Together, our results highlight the power of LLMs to operationalize contextual nuance in high-stakes recommendation, and the need to evaluate LLM-assisted personalization beyond accuracy and treat prompt design as a value-laden rather than neutral default.

2606.02872 2026-06-03 eess.SY cs.MA cs.RO cs.SY

Terminal Time and Angle-Constrained Nonlinear Intercept Guidance

终端时间和角度约束的非线性拦截制导

Shivam Bajpai, Abhinav Sinha

发表机构 * University of California(加州大学)

AI总结 针对单一控制输入下的欠驱动非线性拦截问题,提出基于分层滑模的制导律,同时控制终端时间和角度,并扩展至常速目标拦截。

详情
AI中文摘要

本文考虑使用横向加速度作为唯一控制输入,同时控制拦截器的撞击时间和撞击角度的问题。由于单一控制输入,非线性交战运动学本质上是欠驱动的,这使得制导律综合变得复杂。为了克服这一挑战,开发了一种基于分层滑模的制导律,以同时调节两个终端约束。所提出的架构包括一个两层滑模流形。第一层由分别对应撞击时间和撞击角度误差动力学的两个子滑模面组成,而第二层引入了一个组合两个单独子滑模面的复合滑模流形。然后,设计了一种变增益自适应制导律,以确保对静止目标的带时间和角度约束的拦截,并进一步扩展至拦截常速目标。针对各种交战场景进行了仿真,以证明所提出方法的有效性。

英文摘要

This paper considers the problem of simultaneously controlling an interceptor's impact time and impact angle using its lateral acceleration as the sole control input. With a single control input, the nonlinear engagement kinematics is inherently underactuated, which complicates guidance law synthesis. To overcome this challenge, a hierarchical sliding mode-based guidance law is developed to concurrently regulate the two terminal constraints. The proposed architecture consists of a two-layer sliding manifold. The first layer comprises two sub-sliding surfaces corresponding to the impact time and impact angle error dynamics, respectively, while the second layer introduces a composite sliding manifold that combines the two individual sub-surfaces. Then, a variable-gain adaptive guidance law is designed to ensure time and angle-constrained interception against a stationary target, which is further extended to intercept a constant velocity target. Simulations are conducted for various engagement scenarios to attest to the efficacy of the proposed approach.

2606.02867 2026-06-03 cs.MA cs.AI q-bio.PE

The Epi-LLM Framework: probing LLM behavioral priors through epidemiological agent-based models

Epi-LLM框架:通过流行病学基于智能体的模型探究LLM行为先验

Petra Ferenz, Ava Keeling, Tobias O'Keefe, Lorenzo Stigliano, Francesco Di Lauro, Andres Colubri, Jasmina Panovska-Griffiths

发表机构 * Big Data Institute, Li Ka Shing Center for Health Information and Discovery, University of Oxford, Oxford, United Kingdom(大数据研究所、李嘉诚健康信息与发现中心、牛津大学、牛津、英国) Leverhulme Centre for Demographic Science, Nuffield Department of Population Health, University of Oxford, Oxford, United Kingdom(勒弗赫姆人口科学中心、努尔菲尔德人口健康系、牛津大学、牛津、英国) Pandemic Sciences Institute, Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom(流行病学科学研究所、努尔菲尔德医学系、牛津大学、牛津、英国) Department of Genomics and Computational Biology, UMass Chan Medical School, United States(基因组与计算生物学系、UMass Chan医学学校、美国) Broad Institute of Harvard and MIT, United States(哈佛大学和麻省理工学院Broad研究所、美国) The Queen’s College, University of Oxford, Oxford, United Kingdom(女王学院、牛津大学、牛津、英国)

AI总结 提出Epi-LLM框架,整合基于智能体的建模、真实流行病游戏和大语言模型,模拟疫情中智能体行为,发现LLM智能体减少峰值感染,感知健康严重性是隔离行为最强预测因子,且LLM架构影响疫情动态。

Comments Submitted to American Journal of Epidemiology

详情
AI中文摘要

流行病期间的人类行为会影响传染病动态,但量化这一点仍然极具挑战性。本文介绍了Epi-LLM框架:一种新颖的集成方法,结合了基于智能体的建模、真实流行病游戏和大语言模型(LLM),其中合成智能体社会在疫情接触网络上进行推理并动态适应。将合成智能体行为与无干预的SEIR基线和来自AUIB流行病游戏研究的人类参与者数据进行比较,我们发现四种不同架构的LLM智能体减少了峰值活跃感染,在15天模拟的第6天,隔离合规率达到58-65%。二项广义线性模型显示,感知健康严重性是隔离行为的最强预测因子(β = 0.33, p = 0.002),伪R²为0.055,与人类试验中观察到的0.072相当。LLM架构是疫情动态的关键决定因素:低方差架构为测试行为规则提供了更高的内部效度,而高方差模型可能更好地代表现实世界中的决策。仅凭地理标签无法诱导文化差异化的行为;需要明确的态参数化。这项原理验证工作为将Epi-LLM框架部署为可扩展、无风险的模拟环境用于大流行准备研究奠定了基础。

英文摘要

Human behaviour during epidemics affects infectious disease dynamics, but quantifying this remains deeply challenging. Here we introduce the Epi-LLM framework: a novel integration of agent-based modelling, real-life epigames, and large language models (LLMs) in which a synthetic society of agents reasons and adapts dynamically over an outbreak contact network. Comparing synthetic agent behaviour against a no-intervention SEIR baseline and human participant data from the AUIB epigame study, we find that LLM agents across four different architectures reduced peak active infections, with quarantine compliance peaking at 58-65% on day six of the 15-day simulation. A binomial generalised linear model showed that perceived health severity was the strongest predictor of quarantine behaviour ($β= 0.33, p = 0.002$), yielding a pseudo-$R^2$ of 0.055, comparable to the 0.072 observed in the human trial. LLM architecture is a key determinant of epidemic dynamics: low-variance architectures offer greater internal validity for testing behavioural rules, while high-variance models may better represent real-world decision-making. Geographic labels alone do not induce culturally differentiated behaviour; explicit attitudinal parameterisation is required. This proof-of-principle work lays the groundwork for deploying the Epi-LLM framework as a scalable, risk-free simulation environment for pandemic preparedness research.

2606.02834 2026-06-03 cs.CR cs.AI

Large Byte Model: Teaching Language Models About Compiled Code

大型字节模型:教会语言模型关于编译代码的知识

Florian Störtz, Catalin-Andrei Stan, Alexandru Dinu, Sandra Servia-Rodríguez, Mihaela Gaman, Calin Miron, Edward Raff

发表机构 * CrowdStrike U.K.(CrowdStrike英国分公司) CrowdStrike Romania(CrowdStrike罗马尼亚分公司) CrowdStrike USA(CrowdStrike美国分公司)

AI总结 本文提出首个字节原生大语言模型,通过定制字节分词器扩展词汇表,使其能直接处理可执行文件原始字节并回答恶意软件分析问题,在家族分类和架构分类上分别达到69%和98%的准确率。

详情
AI中文摘要

恶意软件分析始于可执行程序的原始字节,而将其“提升”到更高级表示(如汇编)的工具成本高昂且容易出错。大型语言模型(LLM)无法处理原始字节表示并回答相关问题。为此,我们提出了首个字节原生LLM。基于使用定制字节分词器的词汇扩展技术,该模型能够回答关于恶意软件二进制的复杂问题,准确率从恶意软件家族分类的69%到架构分类的98%不等。我们的发现表明,在训练过程中提供领域知识对此应用至关重要——现成的模型既缺乏准确性也缺乏洞察力。我们已将该新兴解决方案部署给有限数量的分析师,以收集反馈进行进一步改进。

英文摘要

Malware analysis starts with the raw bytes of an executable program, and tools to "lift" these to higher-level representations, such as assembly, are expensive and subject to error. Large Language Models (LLMs) cannot process raw byte representations and answer questions about them. To this end, we present the first byte-native LLM. Based on a vocabulary expansion technique using a bespoke byte tokenizer, such a model is capable of responding to complex questions about malware binaries, with accuracies ranging from 69% for malware family classification to 98% for architecture classification. Our findings indicate that providing domain knowledge during training is essential for this application -- off-the-shelf models lack both accuracy and insight. We've deployed this emerging solution to a limited number of analysts to gather feedback for further improvements.

2606.02822 2026-06-03 cs.CR cs.AI

Which Defense Closes Which Threat? Attributing OWASP-LLM-Top-10 Coverage and Its Brittleness Under Paraphrasing

哪种防御措施应对哪种威胁?归因OWASP-LLM-Top-10覆盖及其在释义下的脆弱性

Alexandre Cristovão Maiorano

发表机构 * Lumytics

AI总结 本文通过归因分析,测量了不同防御家族(拒绝过滤、预算控制等)对OWASP-LLM-Top-10威胁的覆盖情况,并揭示了拒绝防御在释义攻击下的脆弱性。

Comments 17 pages, 4 figures, 7 tables

详情
AI中文摘要

生产级LLM应用堆叠了多种防御家族——拒绝短语过滤器、令牌预算控制、模型白名单、速率限制、工具注册认证——然而现有的攻防模拟(BAS)基准报告单一的总体覆盖数字,隐藏了哪个家族应对哪种威胁。我们测量归因。我们将四个OWASP-LLM-Top-10感知的智能体添加到一个21智能体的基线扫描器中,并针对四个合成LLM端点的格点:$L_0$(无防御)、$L_1$(仅拒绝)、$L_2$(仅预算)和$L_3$(全栈)。$L_1$和$L_2$是兄弟单轴消融,互不为子集;$L_3$是它们的并集加上工具注册认证和凭证清洗。在$N=10$次重复中,每个OWASP的发现计数清晰:仅拒绝消除所有LLM01(越狱)和LLM07(系统提示泄露)发现;仅预算通过终止多步序列消除所有LLM02(敏感信息泄露)和LLM10(无限制消耗)发现;LLM06(过度代理)需要全栈。我们探测释义下的脆弱性:使用300个Gemini生成的释义(在60模板脆弱性语料库上$K=5$),$L_1$拒绝阻断率在LLM01上下降15个百分点,在LLM07上下降25个百分点。第五个目标$L_4$-real,将存根后端替换为Gemini-2.5-flash,使用相同的$L_3$正则表达式,并与$L_1$完全匹配,表明除了正则表达式外没有可测量的对齐贡献(不是关于对齐的一般性声明)。预算控制没有下降(在扣除速率限制下限后为0个百分点)。一个通过静态基准的拒绝白名单可以被LLM驱动的释义器击败而不改变攻击意图;预算控制抵抗相同的变异。

英文摘要

Production LLM applications stack several defense families -- refusal-phrase filters, token-budget controls, model allowlists, rate limits, tool-registry authentication -- yet existing breach-and-attack-simulation (BAS) benchmarks report a single aggregate coverage number, hiding which family closes which threat. We measure attribution. We add four OWASP-LLM-Top-10-aware agents to a 21-agent baseline scanner and target a lattice of four synthetic LLM endpoints: $L_0$ (no defenses), $L_1$ (refusal-only), $L_2$ (budget-only), and $L_3$ (full stack). $L_1$ and $L_2$ are sibling single-axis ablations, not subsets of each other; $L_3$ is their union plus tool-registry authentication and credential scrubbing. Across $N=10$ replications, the per-OWASP finding count is clean: refusal alone removes all LLM01 (jailbreak) and LLM07 (system-prompt leakage) findings; budget alone removes all LLM02 (sensitive-info disclosure) and LLM10 (unbounded consumption) findings by terminating multi-step sequences; LLM06 (excessive agency) requires the full stack. We probe brittleness under paraphrasing: with 300 Gemini-generated paraphrases ($K=5$ over a 60-template brittleness corpus), $L_1$ refusal block rate falls 15 pp on LLM01 and 25 pp on LLM07. A fifth target, $L_4$-real, swaps the stub backend for Gemini-2.5-flash behind the same $L_3$ regex and matches $L_1$ exactly, indicating no measurable alignment contribution beyond the regex (not a general claim about alignment). Budget controls show no drop (0 pp once the rate-limit floor is factored out). A refusal whitelist that clears a static benchmark can be defeated by an LLM-driven paraphraser without changing attack intent; a budget control resists the same mutation.

2606.02781 2026-06-03 cs.AR cs.AI cs.ET

CRAM-ER: Error-Resilient Spintronic Computational Random Access Memory for Scalable In-Memory Computation

CRAM-ER:面向可扩展存内计算的容错自旋计算随机存取存储器

Sohan Salahuddin Mugdho, Md. Shahedul Hasan, Brahmdutta Dixit, Yang Lv, Jian-Ping Wang, Cheng Wang

发表机构 * Electrical and Computer Engineering Iowa State University of Science and Technology(电气与计算机工程学院爱荷华州立大学科学与技术学院) Electrical and Computer Engineering University of Minnesota Twin Cities(电气与计算机工程学院明尼苏达大学双城分校)

AI总结 针对基于MRAM的计算随机存取存储器(CRAM)在加速深度神经网络时面临的概率性开关错误和低吞吐量问题,提出一种混合自旋-CRAM与CMOS加法器树的容错架构(CRAM-ER),通过硬件-软件协同设计实现高能效、高可靠性的矩阵向量乘法。

详情
AI中文摘要

深度神经网络(DNN)在多个领域取得了最先进的性能。然而,传统的冯·诺依曼计算范式面临严重的内存瓶颈。新兴的近内存和存内计算方法缓解了这一问题,但引入了显著的外围开销。基于MRAM的计算随机存取存储器(CRAM)能够实现无外围开销的原位逻辑,提供了一种密集、节能的解决方案。然而,概率性的MRAM开关会导致门级错误,限制了CRAM在加速DNN时的可扩展性和可靠性。此外,大量的顺序MRAM写入严重制约了CRAM的吞吐量。为了解决这些挑战,我们提出了一种容错CRAM(CRAM-ER)架构,用于可扩展的存内矩阵向量乘法(MVM)。我们的错误感知硬件-软件协同设计框架利用混合自旋-CRAM + CMOS加法器树架构来减轻器件级错误的影响,展示了具有高面积和能效的MVM功能。我们进一步开发了错误感知模型微调和细粒度纠错技术,以增强错误容限。在DNN基准测试上对CMOS+自旋混合架构的评估显示,在将CRAM延迟降低多达两个数量级的同时,实现了近乎无损的精度,在能效和能量延迟积方面均优于CPU/GPU+高带宽DRAM。

英文摘要

Deep neural networks (DNNs) have achieved state-of-the-art performance across diverse domains. However, typical Von Neumann compute paradigms face severe memory bottlenecks. Emerging near-memory and compute-in-memory approaches alleviate this but incur significant peripheral overhead. Computational Random Access Memory (CRAM) based on MRAM enables in-situ logic without peripheral overhead, offering a dense, energy-efficient solution. However, probabilistic MRAM switching induces gate-level errors that limit the scalability and reliability of CRAM for accelerating DNN. Moreover, the large number of sequential MRAM writes severely constrains CRAM throughput. To address these challenges, we propose an error-resilient CRAM (CRAM-ER) architecture for scalable in-memory matrix-vector multiplications (MVMs). Our error-aware hardware-software co-design framework leverages a hybrid spintronic-CRAM + CMOS adder-tree architecture to mitigate the impact of device-level errors, demonstrating MVM functionality with high area and energy efficiency. We further develop an error-aware model fine-tuning and fine-grained error correction for enhanced error resilience. Evaluations of the CMOS+spintronic hybrid architecture on DNN benchmarks show near-lossless accuracy while reducing CRAM latency by up to 2 orders of magnitude, outperforming CPU/GPU+high-bandwidth DRAM in both energy efficiency and energy-delay product.

2606.02737 2026-06-03 cs.IR cs.AI cs.CL

Attention Calibration for Position-Fair Dense Information Retrieval

面向位置公平的密集信息检索的注意力校准

Andrianos Michail, Elias Schuhmacher, Juri Opitz, Simon Clematide, Rico Sennrich

发表机构 * Department of Computational Linguistics University of Zurich(计算语言学系苏黎世大学)

AI总结 针对密集检索模型的位置偏差问题,提出在推理时通过注意力校准(引入强度系数λ插值原始与完全校准分布)来提升位置公平性,无需重新训练且不牺牲整体检索效果,在多个数据集和模型上验证了部分校准优于完全校准,并提供了默认配置。

详情
AI中文摘要

密集检索模型存在位置偏差:当相关信息出现在段落较后位置时,检索效果会下降(Zeng et al., 2025)。我们探究是否可以在推理时减少这种偏差,无需重新训练且不牺牲整体检索效果。为此,我们将推理时的注意力校准(Schuhmacher et al., 2026)适配到下游检索,并引入强度系数λ,在原始注意力分布和完全校准的注意力分布之间进行插值。在SQuAD-PosQ和FineWeb-PosQ上的三个嵌入模型上,我们考察了篮子大小、校准层集和强度如何影响位置公平性与检索效果之间的权衡,发现部分校准通常优于完全校准。单个配置(B=128, λ=0.5, 50%层深度)在FineWeb-PosQ上提升了所有三个模型跨位置组的nDCG@10的调和平均值,无需逐模型调参,并且适用于<s>-池化和最后token池化两种架构。该默认配置无需修改即可迁移到PosIR(涵盖10种语言和31个领域),在所有16种长度四分位×模型×检索设置组合中降低了位置敏感指数,同时保持或提升了整体nDCG@10。我们在以下网址发布扩展后的代码库:this https URL

英文摘要

Dense retrieval models exhibit positional bias: retrieval effectiveness degrades when relevant information appears later in a passage (Zeng et al., 2025). We ask whether this bias can be reduced at inference time, without retraining and without sacrificing overall retrieval effectiveness. To this end, we adapt inference-time attention calibration (Schuhmacher et al., 2026) to downstream retrieval and extend it with a strength coefficient lambda that interpolates between the original and fully calibrated attention distributions. Across three embedding models on SQuAD-PosQ and FineWeb-PosQ, we examine how basket size, calibrated layer set, and strength affect the trade-off between positional fairness and retrieval effectiveness, finding that partial calibration frequently outperforms full calibration. A single configuration (B=128, lambda=0.5, 50% layer depth) improves the harmonic mean of nDCG@10 across positional groups on FineWeb-PosQ for all three models without per-model tuning, and applies to both <s>-pooled and last-token-pooled architectures. This default configuration transfers without modification to PosIR, which spans 10 languages and 31 domains, reducing the Position Sensitivity Index in all 16 length-quartile x model x retrieval-setting combinations, while preserving or improving aggregate nDCG@10. We release our extended codebase at https://github.com/impresso/fair-sentence-transformers

2606.02644 2026-06-03 cs.CR cs.AI

A New Framework for Cybersecurity Refusals in AI Agents

AI代理中网络安全拒绝的新框架

Eliot Krzysztof Jones, Mateusz Dziemian, Matt Fredrikson, J Zico Kolter

发表机构 * Gray Swan Gray Swan AI Carnegie Mellon University(卡内基梅隆大学)

AI总结 提出首个针对AI代理在进攻性安全场景中建立拒绝边界的框架,包括拒绝原则、任务分类和评估方法,并发现8个前沿模型中6个拒绝率接近零。

详情
AI中文摘要

代理脚手架显著提升了LLM在复杂、长期任务上的表现,在网络安全等领域带来了广泛益处和放大风险。现有的AI代理网络安全基准主要关注能力测量——代理能多有效地完成进攻性安全任务——但忽略了一个关键问题:代理何时以及如何拒绝有害请求?我们提出了首个在进攻性安全场景中建立拒绝边界的框架。我们的框架定义了(1)任务应被拒绝的原则性标准,(2)应被拒绝的任务类别,以及(3)在良性和对抗条件下测量代理鲁棒性的评估方法。我们应用该框架评估当前基于LLM的代理在一系列基于Web的进攻性安全场景中是否遵守适当的拒绝边界,发现测试的8个前沿模型中有6个拒绝率接近零,只有2个模型(GPT-5.2和GPT-5.1 Codex)表现出任何有意义的拒绝行为。

英文摘要

Agentic scaffolds have dramatically improved LLM performance on complex, long-horizon tasks, yielding both broad benefits and amplified risks in domains like cybersecurity. Existing benchmarks for AI agents in cybersecurity focus mainly on measuring proficiency--how effectively agents can complete offensive security tasks--but neglect a critical question: when and how should agents refuse harmful requests? We present the first framework for establishing refusal boundaries in offensive security contexts. Our framework defines (1) principled criteria for when tasks should be refused, (2) categories of tasks that warrant refusal, and (3) evaluation methodology for measuring agent robustness under both benign and adversarial conditions. We apply this framework to assess how current LLM-powered agents adhere to appropriate refusal boundaries across a range of web-based offensive security scenarios, finding that 6 of 8 frontier models tested show near-zero refusal rates, with only 2 models (GPT-5.2 and GPT-5.1 Codex) demonstrating any meaningful refusal behavior.

2606.02643 2026-06-03 cs.CR cs.AI cs.DB

Inference Cost Attacks for Retrieval-Augmented Large Language Models

检索增强型大语言模型的推理成本攻击

Chengliang Liu, Liangbo Ning, Yujuan Ding, Wenqi Fan

发表机构 * The Hong Kong Polytechnic University(香港理工大学)

AI总结 提出RA-ICA攻击范式,通过向外部知识库注入恶意文档,利用CREEP框架和MA-GRPO算法,使RAG增强的LLM系统推理时token消耗增加高达13.12倍且成功率超过90%。

Comments Accepted at The ACM Web Conference 2026 (WWW '26)

Journal ref Proceedings of the ACM Web Conference 2026 (WWW '26), April 13-17, 2026, Dubai, United Arab Emirates

详情
AI中文摘要

检索增强生成(RAG)增强的LLM系统虽然强大,但由于包含额外的多阶段流水线(动态检索和综合外部知识源的信息),引入了大量的推理成本。这种高运营成本暴露了一个关键漏洞,即推理成本攻击(ICA)。然而,现有的ICA通常依赖于直接提示操纵的不切实际的假设。我们认为,对RAG增强的LLM系统更可行且更强大的威胁来自污染外部知识库(例如,来自互联网的网络知识)。在这项工作中,我们引入了检索增强推理成本攻击(RA-ICA),这是一种新颖的攻击范式,通过向外部知识语料库注入恶意文档来针对RAG增强的LLM系统的计算成本。为了实现这种攻击,我们提出了通过外部投毒耗尽计算资源(CREEP),这是一种新颖的框架,利用LLM代理自动制作恶意文档,这些文档在语义上相关以便检索,并且能够有效诱导推理阶段token消耗的异常增加。为了提高攻击的有效性,我们引入了记忆增强组相对策略优化(MA-GRPO),这是一种新颖的强化学习算法,通过从历史最佳对抗文档的动态记忆中学习来微调代理。在三个真实世界数据集上的大量实验表明,RA-ICA在不降低生成答案完整性的情况下,将token消耗增加了高达13.12倍,成功率超过90%。

英文摘要

Retrieval-Augmented Generation (RAG)-enhanced LLM systems, while powerful, introduce substantial inference costs due to the inclusion of an extra multi-stage pipeline that dynamically retrieves and synthesizes information from external knowledge sources. This high operational cost exposes a critical vulnerability to Inference Cost Attacks (ICAs). However, existing ICAs often rely on the impractical assumption of direct prompt manipulation. We argue that a more feasible and potent threat to RAG-enhanced LLM systems arises from poisoning external knowledge bases (e.g., web knowledge from the Internet). In this work, we introduce the Retrieval-Augmented Inference Cost Attack (RA-ICA), a novel attacking paradigm that targets the computational cost of RAG-enhanced LLM systems by injecting malicious documents into external knowledge corpus. To operationalize this attack, we propose Computational Resource Exhaustion via External Poisoning (CREEP), a novel framework that leverages LLM agents to automatically craft malicious documents that are both semantically relevant for retrieval and potent for inducing an abnormal increase in token consumption during the inference phase. To enhance the attack's effectiveness, we introduce Memory-Augmented Group Relative Policy Optimization (MA-GRPO), a novel reinforcement learning algorithm that fine-tunes the agents by learning from a dynamic memory of historical best adversarial documents. Extensive experiments across three real-world datasets demonstrate that RA-ICA increases token consumption by up to 13.12 times with an over 90% success rate, without degrading the integrity of the generated answer.

2606.02630 2026-06-03 cs.CR cs.AI

MultiTurnPSB: Evaluating Multi-Turn Jailbreak Attacks an dClassifier-Based Defenses for Medical AI Safety

MultiTurnPSB:评估多轮越狱攻击与基于分类器的防御在医疗AI安全中的应用

Anushka Sheoran, Yiduo Hao

发表机构 * University of Pennsylvania(宾夕法尼亚大学)

AI总结 提出多轮对抗基准MultiTurnPSB,通过四轮对话评估医疗聊天机器人的安全漏洞,发现多轮攻击下不安全响应率从35%升至近80%,并验证了轻量级输入分类器可降低52个百分点的不安全响应但存在高误报率。

详情
AI中文摘要

面向患者的医疗聊天机器人通常在单轮提示上进行评估,但真实用户在被拒绝后会继续追问、增加紧迫感并援引权威。我们引入了MultiTurnPSB,这是PatientSafetyBench的一个四轮对抗扩展,并在固定模板、模板自适应和实时对抗攻击下评估了GPT-4.1-mini。在实时攻击下,不安全响应从第1轮的35%上升到第4轮的近80%。在相同的攻击者下,GPT-4.1-mini和Claude Sonnet 4.5在基线时统计上无差异,但到第4轮时差距扩大到19倍,这种差异在单轮评估中不可见。我们描述了四种退化轨迹特征,并识别出一个导致大多数灾难性失败的双元素攻击公式。一个轻量级的输入侧分类器将第4轮不安全响应降低了52个百分点,尽管准确性严重下降,但对良性查询的45%误报率是主要的部署限制。还出现了一个方法论发现:Claude Sonnet在超过一半的后期对话中拒绝生成对抗性消息,尽管有明确的红队框架,这表明安全训练可能泛化到攻击者角色。

英文摘要

Patient-facing medical chatbots are commonly evaluated on single-turn prompts, yet real users push back after refusals, add urgency, and invoke authority. We introduce MultiTurnPSB, a four-turn adversarial extension of PatientSafetyBench, and evaluate GPT-4.1-mini under fixed template, template-adaptive, and live adversarial attacks. Unsafe responses rise from 35% to nearly 80% by Turn 4 under live attack. Under the same adversary, GPT-4.1-mini and Claude Sonnet 4.5 are statistically indistinguishable at baseline but diverge to a 19x gap by Turn 4, a difference invisible to single-turn evaluation. We characterize four degradation trajectory signatures and identify a two-element attack formula responsible for most catastrophic failures. A lightweight input-side classifier reduces Turn 4 unsafe responses by 52 percentage points despite severe accuracy degradation, but the 45% false alarm rate on benign queries is the primary deployment constraint. A methodological finding also emerges: Claude Sonnet refused to generate adversarial messages in over half of late-turn conversations despite explicit red team framing, suggesting safety training may generalize to the attacker role.

2606.02623 2026-06-03 cs.NE cs.AI cs.LG

Oscillatory State-Space Models as Inductive Biases for Physics-Informed Neural PDE Solvers

振荡状态空间模型作为物理信息神经PDE求解器的归纳偏置

Abhishek Chandra, Taniya Kapoor

发表机构 * KTH Royal Institute of Technology(皇家理工学院) Wageningen University & Research(瓦赫宁根大学与研究中心)

AI总结 提出一种结合振荡状态空间动力学和PDE感知空间谱的PINN方法,以改进时变PDE求解的精度和内存效率。

详情
AI中文摘要

求解时变偏微分方程(PDE)是计算科学与工程中的一个重要问题。物理信息神经网络(PINN)从控制方程中学习PDE解。然而,准确捕捉时间演化仍然具有挑战性。最近的基于序列模型的方法使用通用序列模型参数化时间演化,这些模型捕捉时间依赖性,但没有显式编码PDE解的结构化动力学。此外,它们的内存需求可能随序列长度和分辨率而不利地扩展,限制了在大规模或高维设置中的适用性。本文介绍了一种PINN方法,该方法结合了振荡状态空间动力学来表示PDE解的模态结构。所提出的方法利用基于线性振荡器的时间演化,以及空间上的PDE感知谱基。这种设计实现了闭式空间微分和边界条件的一致强制执行。该方法在前向、逆和高维PDE问题上进行了评估,包括高达100个空间维度的情况。结果表明,与最近基于序列模型的PINN方法相比,该方法提高了精度并减少了内存使用。总体而言,本文强调了将结构化动力学先验纳入神经PDE求解器的时间演化中的好处,并建议设计更符合物理和计算高效的PINN架构。

英文摘要

Solving time-dependent partial differential equations (PDEs) is an important problem in computational science and engineering. Physics-informed neural networks (PINNs) learn PDE solutions from governing equations. However, accurately capturing temporal evolution remains challenging. Recent sequence-model-based approaches parameterize time evolution using general-purpose sequence models, which capture temporal dependencies but do not explicitly encode the structured dynamics of PDE solutions. In addition, their memory requirements can scale unfavorably with sequence length and resolution, limiting applicability in large-scale or high-dimensional settings. This work introduces a PINN approach that incorporates oscillatory state-space dynamics to represent the modal structure of PDE solutions. The proposed method leverages a linear-oscillator-based temporal evolution, together with a PDE-aware spectral basis in space. This design enables closed-form spatial differentiation and consistent enforcement of boundary conditions. The method is evaluated on forward, inverse, and high-dimensional PDE problems, including cases up to 100 spatial dimensions. The results show improved accuracy and reduced memory usage compared to recent sequence-model-based PINN approaches. Overall, this work highlights the benefits of incorporating structured dynamical priors into the temporal evolution of neural PDE solvers and suggests designing more physics-aligned and computationally efficient PINN architectures.

2606.02618 2026-06-03 cs.CE cs.AI cs.MA physics.chem-ph

Closed-Loop Molecular Design with Calibrated Deference

闭环分子设计中的校准式退让

Newman Cheng, Gordon Broadbent, Jason Dong, Syed Mohammed Ali Hussaini, Farman Ullah, Morris Sharp, Gabrielle Barnes, Nanlin Guo, Deyu Zou, Karin Strauss, William Chappell, David G. Kwabi, Bichlien H. Nguyen, Jake A. Smith

发表机构 * Microsoft Discovery & Quantum(微软发现与量子) Microsoft Research(微软研究院) Department of Chemical and Environmental Engineering, Yale University(耶鲁大学化学与环境工程系) CanAm Bioresearch Inc.(CanAm 生物研究公司)

AI总结 提出CLIO智能体,通过持续更新的信念状态图和递归计划-行动循环实现校准式退让,在闭环人机协作中成功设计出性能优于文献基准的AORFB负极电解液。

详情
AI中文摘要

我们提出了通过原位优化实现认知循环(CLIO),这是一种将持续更新的信念状态图与递归计划-行动循环相结合的智能体。结果产生了一个推理智能体,能够贡献某种定性的不同之处,我们称之为“校准式退让”:即识别自身工具或假设何时失败、相应调整策略、并生成指导实验修订的机制性假设的能力。我们在一个闭环人机协作活动中测试了CLIO,以设计一种水性有机氧化还原液流电池(AORFB)负极电解液,CLIO在与合成、表征并参与设计选择的化学家密切合作中主导了提议和解释。在三轮共17个候选分子中,CLIO收敛于一个最佳的膦酸酯候选物;表征证实其氧化还原电位比文献基准提高了130 mV。随后表征揭示了出乎意料的差电化学可逆性——这是所有性质预测器都未能标记的回归。CLIO生成了相互竞争的机制性假设,优先安排了诊断性实验,将失败归因于膦酸酯-钾离子配对,并建议用磺酸酯替代。所得化合物显示出显著改善的电化学可逆性,并保持了90 mV的氧化还原电位提升,从而闭环了设计-制造-测试-再设计循环。

英文摘要

We present Cognitive Loop via In-Situ Optimization (CLIO), an agent that couples a continuously-updated belief-state graph with a recursive plan-then-act loop. The result is a reasoning agent that can contribute something qualitatively different, which we term \emph{calibrated deference}: the capacity to recognize when its own tools or assumptions are failing, to adapt its strategy in response, and to generate mechanistic hypotheses that guide experimental revision. We tested CLIO in a closed-loop human-AI campaign to design an aqueous organic redox flow battery (AORFB) negolyte, with CLIO leading proposal and interpretation in close partnership with chemists who synthesized, characterized, and weighed in on design choices. Across 17 candidates over three rounds, CLIO converged on a top phosphonate candidate; characterization confirmed a 130~mV improvement in redox potential over the literature baseline. Characterization then revealed unexpectedly poor electrochemical reversibility -- a regression no property predictor had flagged. CLIO generated competing mechanistic hypotheses, prioritized discriminating diagnostics, traced the failure to phosphonate-potassium ion pairing, and prescribed a sulfonate replacement. The resulting compound showed substantially improved electrochemical reversibility and maintained a 90~mV improvement in redox potential, closing the design-make-test-redesign loop.

2606.02614 2026-06-03 cs.CE cs.AI

Margin Play: A Multi-Agent System For Public Policy Analysis In The Brazilian Equatorial Margin

边际博弈:巴西赤道边缘地区公共政策分析的多智能体系统

Antonio de Sousa Leitão Filho, Fabrício Saul Lima, Selby Mykael Lima dos Santos, Rejani Bandeira Vieira Sousa, Luís Jorge Mesquita de Jesus, Dennys Correia da Silva, Allan Kardec Duailibe Barros Filho

发表机构 * Aia Context Universidade Federal do Maranhão — UFMA(佛罗里达州立大学马纳汉分校) Universidade Estadual de Campinas — UNICAMP(坎皮纳斯州立大学)

AI总结 针对巴西赤道边缘地区石油勘探对马拉尼昂州福利影响的问题,提出基于多智能体强化学习(MARL)的仿真系统Margin Play,通过CTDE范式和BRO-MARL训练六个智能体,发现福利增益取决于制度安排,MA-Prospero配置可显著提升福利并降低环境负债。

详情
AI中文摘要

巴西赤道边缘(BEM)是巴西下一个海上石油前沿,预计于2026年在亚马逊福斯盆地开始运营。其资产在财政和领土上主要与马拉尼昂州相关联——该州在联邦中人类发展指数最低(0.676,IBGE 2022)。这引出了核心政策问题:在什么条件下,BEM的勘探能为马拉尼昂州产生净正外部性?问题本质上是多智能体的:联邦政府寻求收入和能源安全;州政府在宪法规定的特许权使用费专用下寻求区域福利;运营商在风险下最大化利润;ANP和IBAMA持有冲突的职责;亚马逊社区优先考虑领土和环境因素而非货币收入。我们提出Margin Play,一个多智能体强化学习(MARL)系统,在巴西经验校准和经典经济学文献下模拟这些张力。它实现了CTDE范式下的六个智能体,使用BRO-MARL进行训练。来自六个场景中60,000个回合的结果表明,答案取决于制度安排:在参考基线之下,福利增益微乎其微(Waval约1.68),而MA-Prospero配置产生Delta W = +17.5%和Delta Rcom = +21.3%,同时环境负债较低(Eamb = 0.048 vs. 0.076)。根本问题并非生产与福利之间的权衡,而是与勘探相关的公共政策制度的选择。

英文摘要

The Brazilian Equatorial Margin (BEM) is Brazil's next offshore oil frontier, with operations expected to begin in 2026 in the Foz do Amazonas basin. Its assets are fiscally and territorially linked primarily to Maranhao -- the state with the lowest HDI in the Federation (0.676, IBGE 2022). This raises the central policy question: under what conditions does BEM exploration generate net positive externalities for Maranhao? The problem is intrinsically multi-agent: the Federal Government seeks revenue and energy security; the state seeks regional welfare under constitutional royalty earmarking; the operator maximizes profit under risk; ANP and IBAMA hold conflicting mandates; and Amazonian communities prioritize territorial and environmental vectors over monetary income. We present Margin Play, a Multi-Agent Reinforcement Learning (MARL) system simulating these tensions under Brazilian empirical calibration and classical economic literature. It implements six agents under the CTDE paradigm, trained with BRO-MARL. Results from 60,000 episodes across six scenarios indicate the answer is conditional on the institutional regime: under the reference baseline, the welfare gain is marginal (Waval approx. 1.68), whereas the MA-Prospero configuration yields Delta W = +17.5% and Delta Rcom = +21.3%, with a lower environmental liability (Eamb = 0.048 vs. 0.076). The fundamental problem is not a trade-off between production and welfare, but the choice of public policy regime linked to exploration.

2606.02610 2026-06-03 cs.CE cs.AI cs.LG physics.ao-ph

Samudra 2: Scaling Ocean Emulators across Resolutions

Samudra 2: 跨分辨率扩展海洋仿真器

Yuan Yuan, Jesse Rusak, Alexander Merose, Adam Subel, Pavel Perezhogin, Alistair Adcroft, Carlos Fernandez-Granda, Laure Zanna

发表机构 * Courant Institute School of Mathematics, Computing, and Data Science, New York University(Courant学院数学、计算与数据科学系,纽约大学) Open Athena AI Foundation, Inc.(开放Athena人工智能基金会) Program in Atmospheric and Oceanic Sciences, Princeton University(大气与海洋科学项目,普林斯顿大学)

AI总结 针对现有海洋神经仿真器在长期自回归滚动中出现的方差崩溃和印记伪影问题,提出Samudra 2,通过改进U-Net骨干网络和动态损失函数,在1°分辨率下将上层海洋全球平均温度R²从0.56提升至0.87,并将深层海洋温度误差降低约七倍,且可扩展至1/2°和1/4°分辨率。

详情
AI中文摘要

海洋环流模式(OGCM)对气候科学至关重要,但计算成本高,限制了集合规模和强迫情景。神经仿真器有望实现数量级的加速,然而现有的海洋仿真器未能将精细空间分辨率与多年自回归滚动相结合。Samudra是第一个产生多十年全球滚动的自回归神经海洋仿真器,但仅限于$1^\\\circ$分辨率,并表现出两种长期故障模式:\\emph{方差崩溃},即时间变异性的丧失,以及\\emph{印记伪影},即速度模式泄漏到深海场中。我们提出Samudra 2,它引入了更宽的U-Net骨干网络,采用修改后的ConvNeXt风格块和减小的块内扩展因子,以及一个动态损失函数,根据预测误差重新加权输出通道,从而增强缓慢演变的深海场的梯度。在$1^\\\circ$分辨率下,Samudra 2将上层海洋全球平均温度$R^2$从0.56提高到0.87,并将深海温度误差降低约七倍。相同的架构可扩展到$1/2^\\\circ$和$1/4^\\\circ$分辨率,在大约8年的自回归滚动中恢复中尺度涡旋和尖锐的西边界流。在单个GPU上运行,Samudra 2能够为海平面预测、海洋热吸收和气候变率研究提供更大的集合。我们在此https URL提供代码、文档和基准资源。

英文摘要

Ocean general circulation models (OGCMs) are essential to climate science but computationally expensive, limiting ensemble size and forcing scenarios. Neural emulators promise orders-of-magnitude speedups, yet existing ocean emulators have not combined fine spatial resolution with multi-year autoregressive rollouts. Samudra, the first autoregressive neural ocean emulator to produce multi-decade global rollouts, is limited to $1^\circ$ resolution and exhibits two long-horizon failure modes: \emph{variance collapse}, the loss of temporal variability, and \emph{imprinting artifacts}, in which velocity patterns leak into deep-ocean fields. We present Samudra 2, which introduces a wider U-Net backbone with modified ConvNeXt-style blocks and a reduced block-internal expansion factor, together with a dynamic loss that reweights output channels according to their prediction errors, strengthening gradients for slow-evolving deep-ocean fields. At $1^\circ$, Samudra 2 increases upper-ocean global-mean temperature $R^2$ from 0.56 to 0.87 and reduces deep-ocean temperature error by roughly sevenfold. The same architecture scales to $1/2^\circ$ and $1/4^\circ$ over approximately 8-year autoregressive rollouts, recovering mesoscale eddies and sharp western boundary currents. Running on a single GPU, Samudra 2 enables larger ensembles for sea-level projections, ocean heat uptake, and climate variability studies. We provide code, documentation, and benchmark resources at https://openathena.ai/Ocean_Emulator/.

2606.02588 2026-06-03 cs.LO cs.AI cs.PL

Lean-GAP: A Dataset of Formalized Graduate Algebra Problems

Lean-GAP:形式化研究生代数问题数据集

Seewoo Lee, Byung-Hak Hwang, Hyojae Lim, Jihoon Hyun, Ilkyoo Choi, Yeachan Park, Jineon Baek, Hyukpyo Hong, Keewoo Lee, Jaeseong Heo, Hyungryul Baik, Chul-hee Lee, Kyu-Hwan Lee

发表机构 * University of California, Berkeley(加州大学伯克利分校) Korea Advanced Institute of Science and Technology(韩国科学技术院) Hanyang University(翰阳大学) Hufs University(Hufs大学) Sungkyunkwan University(成均馆大学) University of Wisconsin - Madison(威斯康星大学麦迪逊分校) Sejong University(世宗大学) University of Connecticut(康涅狄格大学)

AI总结 本文提出Lean-GAP数据集,包含430个来自Dummit和Foote《抽象代数》的形式化研究生代数问题,并开发了从PDF预处理到自动形式化再到验证的可扩展流水线。

详情
AI中文摘要

我们提出了Lean-GAP(Lean-研究生代数问题),包含来自Dummit和Foote的教科书《抽象代数》中的430个形式化研究生代数问题。我们开发了一个可扩展的流水线,包括PDF到LaTeX的预处理、自动形式化为Lean 4以及非正式-正式对应关系的验证。虽然预处理和自动形式化阶段可以很大程度上自动化,但我们发现验证仍然是最微妙和最劳动密集的组成部分,需要仔细的人工监督。我们的贡献包括:(i) 构建了一个结构化的形式化习题数据集,(ii) 一种系统化的教科书数学形式化方法,以及(iii) 对形式化过程中反复出现的挑战的分析。我们还比较了不同自动形式化模型的性能,并强调了将非正式陈述翻译为形式语言的关键瓶颈。

英文摘要

We present Lean-GAP (Lean-Graduate Agebra Problems), 430 formalized graduate-level algebra problems from the textbook Abstract Algebra by Dummit and Foote. We develop a scalable pipeline consisting of PDF-to-LaTeX preprocessing, autoformalization into Lean 4, and verification of informal-formal correspondence. While the preprocessing and autoformalization stages can be largely automated, we find that verification remains the most subtle and labor-intensive component, requiring careful human oversight. Our contributions include (i) the construction of a structured dataset of formalized exercises, (ii) a systematic methodology for formalizing textbook mathematics, and (iii) an analysis of recurring challenges in the formalization process. We also compare the performance of different autoformalization models and highlight key bottlenecks in translating informal statements into formal language.

2606.02582 2026-06-03 cs.CE cs.LG cs.NA math.NA

Applying Two-Grid Preconditioner for Subsurface Flow Simulation using Attention-enhanced Hybrid Network to Accelerate Multiscale Discretization in High-contrast Media

应用注意力增强混合网络的两网格预条件子进行高对比度介质中地下流动模拟以加速多尺度离散化

Peiqi Li, Jie Chen, Shubin Fu

发表机构 * xjtlu.edu.cn(XTL大学)

AI总结 提出一种结合学习与多尺度数值方法的混合框架,利用注意力增强混合网络预测多尺度基函数,并通过两网格预条件求解器加速高对比度介质中达西方程的数值求解。

详情
AI中文摘要

本文研究了强非均质、高对比度渗透率介质中达西方程的高效数值求解,提出了一种结合学习与多尺度数值方法的混合框架。学习组件用于预测混合广义多尺度有限元方法(混合GMsFEM)中的多尺度基函数,旨在减少离线阶段所需的重复局部计算。一旦预测出这些基函数,全局系统被组装,并通过两网格预条件求解器计算压力场。所提方法加速了昂贵的局部基函数构建阶段,同时保留了底层求解器的多尺度离散化和预条件迭代结构。在二维非均质达西问题上的数值实验表明,与几种代表性基于学习的方法相比,所提框架能获得更准确的最终压力重构,并在强非均质和高对比度系数下保持稳定。与传统混合GMsFEM相比,其主要优势在于基函数生成阶段的效率,而全局求解的质量仍由两网格预条件子保证。这些结果表明,通过学习加速多尺度基函数构建,同时保留成熟的全局问题数值求解器,为高分辨率达西型模拟提供了一种可行方法。

英文摘要

In this paper, we study the efficient numerical solution of Darcy equations in strongly heterogeneous media with high-contrast permeability and propose a hybrid framework that combines learning with multiscale numerical methods. The learning component is used for the prediction of multiscale basis functions in the mixed generalized multiscale finite element method (mixed GMsFEM), with the goal of reducing the repeated local computations required in the offline stage. Once these basis functions are predicted, the global system is assembled and the pressure field is computed by a two-grid preconditioned solver. The resulting method accelerates the costly local basis-construction stage while retaining the multiscale discretization and preconditioned iterative structure of the underlying solver. Numerical experiments on two-dimensional heterogeneous Darcy problems show that the proposed framework yields more accurate final pressure reconstruction than several representative learning-based methods and remains stable under strong heterogeneity and high-contrast coefficients. In comparison with the traditional mixed GMsFEM, its main advantage lies in the efficiency of the basis-generation stage, while the quality of the global solve is still ensured by the two-grid preconditioner. These results indicate that accelerating multiscale basis construction through learning, while preserving a mature numerical solver for the global problem, provides a viable approach for high-resolution Darcy-type simulations.

2606.02758 2026-06-03 math.DG cs.LG math.CT

Theoretical Aspects of Lie Groupoid and Lie Algebroid Equivariant Convolutional Neural Networks

李群胚与李代数胚等变卷积神经网络的理论方面

Michael Astwood

发表机构 * Department of Mathematics, University of Manitoba(曼尼托巴大学数学系)

AI总结 本文引入李群胚等变神经网络作为拓扑范畴等变神经网络在可微情形的特化,证明其与李代数胚等变神经网络的等价性,并推广了群不变全局池化。

Comments 28 pages, 2 figures. Preliminary version. Comments and criticism welcome!

详情
AI中文摘要

我们将李群胚等变神经网络作为最近提出的拓扑范畴等变神经网络在可微情形的特化引入。李群胚等变神经网络由李群胚提升卷积和李群胚卷积层组成,并且我们展示了对于合适的李群胚,它们等价于某些李代数胚等变神经网络。此外,我们将群不变全局池化描述为群不变全局池化的推广。进一步,我们通过证明上述每一层都是最近引入的可容许范畴等变层的特例,即它们定义了连续特征函子之间的连续自然变换,从而证明了这一点。

英文摘要

We introduce Lie groupoid equivariant neural networks as a specialization of recently proposed topological category-equivariant neural networks to the differentiable setting. Lie groupoid equivariant neural networks are composed from Lie groupoid lifting convolutions and Lie groupoid convolution layers, and we show how for suitable Lie groupoids they are equivalent to certain Lie algebroid-equivariant neural networks. We additionally describe groupoid invariant global pooling as a generalization of group invariant global pooling. Furthermore, we show that each of the aforementioned layers is a special case of recently introduced admissible category-equivariant layers by demonstrating that they define continuous natural transformations between continuous feature functors.

2606.03517 2026-06-03 quant-ph cs.AI cs.LG

Scalable On-Hardware Training of Quantum Neural Networks and Application to Clinical Data Imputation

可扩展的量子神经网络片上训练及其在临床数据填补中的应用

Natansh Mathur, Panagiotis Kl. Barkoutsos, Masako Yamada, Martin Roetteler, Iordanis Kerenidis

发表机构 * IRIF, CNRS and Université Paris Cité(巴黎-萨克雷大学 IRIF 实验室、法国国家科学研究中心和巴黎-萨克雷大学) QC Ware, France(法国 QC Ware 公司) IonQ(IonQ 公司) Quantum Signals(量子信号)

AI总结 提出一种结合蝴蝶电路架构、逐层训练策略和并行化参数位移规则的训练框架,将梯度估计成本从O(n^2)降至O(log n),并在MIMIC-III数据集上验证了其可扩展性和性能。

Comments 13 pages, 9 figures

详情
AI中文摘要

在量子硬件上训练量子神经网络(QNN)目前受限于梯度估计的成本:标准参数位移方法所需的电路评估次数随可训练参数数量二次增长,使得在小型系统之外难以进行基于硬件的优化。在这项工作中,我们引入了一个训练框架,将该成本降低到量子比特数的对数级别,使得在近期硬件上以更大规模进行基于梯度的QNN优化成为可能。我们的框架结合了三个协同设计的要素:(i)一种结构化的、保持子空间的蝴蝶电路架构,具有$O(n \log n)$个参数和对数深度;(ii)一种逐层训练策略,将片上优化限制在每次一个小型、结构良好的层上;(iii)一种并行化的参数位移规则,利用每个蝴蝶层内的交换结构,在恒定数量的电路执行中提取所有梯度。这些共同将每个优化步骤所需的独立电路评估次数从$O(n^2)$减少到$O(\log n)$。我们使用MIMIC-III电子健康记录数据集在临床数据填补上验证了该框架,这是一个对优化不稳定性和模型方差敏感的高要求基准。混合经典-量子模型直接在IonQ Forte Enterprise离子阱硬件上以16量子比特进行训练,性能相对于理想或噪声模拟没有下降,并通过张量网络模拟以32量子比特进行训练,32量子比特推理在硬件上执行。得到的模型在下游患者生存预测中匹配或超过强经典神经网络基线,同时表现出跨运行的低方差,证明了所提出的框架在现实硬件约束下实现了实用、可扩展的QNN训练。

英文摘要

Training quantum neural networks (QNNs) on quantum hardware is currently bottlenecked by the cost of gradient estimation: standard parameter-shift methods require a number of circuit evaluations that grows quadratically with the number of trainable parameters, making hardware-based optimisation impractical beyond small system sizes. In this work, we introduce a training framework that reduces this cost to logarithmic in the number of qubits, making gradient-based QNN optimisation feasible on near-term hardware at increasing scales. Our framework combines three co-designed ingredients: (i) a structured, subspace-preserving Butterfly circuit architecture with $O(n \log n)$ parameters and logarithmic depth; (ii) a layer-wise training strategy that confines on-hardware optimisation to one small, well-structured layer at a time; and (iii) a parallelised parameter-shift rule that exploits the commuting structure within each Butterfly layer to extract all gradients in a constant number of circuit executions. Together these reduce the number of distinct circuit evaluations per optimisation step from $O(n^2)$ to $O(\log n)$. We validate the framework on clinical data imputation using the MIMIC-III electronic health record dataset, a demanding benchmark sensitive to optimisation instability and model variance. Hybrid classical-quantum models are trained directly on IonQ Forte Enterprise trapped-ion hardware at 16 qubits without performance degradation relative to ideal or noisy simulation and via tensor-network simulation at 32 qubits, with 32-qubit inference executed on hardware. The resulting models match or exceed strong classical neural baselines in downstream patient survival prediction while exhibiting reduced variance across runs, demonstrating that the proposed framework enables practical, scalable QNN training under realistic hardware constraints.

2606.02655 2026-06-03 quant-ph cs.GT cs.LG math.OC

Coherent Swap Regret and Channel-Proof Learning

相干交换遗憾与信道证明学习

Sohail Sarkar

发表机构 * Sohail (Neel) Sarkar

AI总结 针对量子博弈中局部CPTP偏差,提出相干交换遗憾作为基准,并通过熵镜像上升算法实现O(√(dT log d))的遗憾界,揭示了非幺正使用推荐寄存器是困难根源,并应用于有限量子博弈达到ε-近似可分量子相关均衡。

Comments 23 pages

详情
AI中文摘要

外部遗憾仅保证相对于固定替代行为的稳定性。在量子博弈中,这遗漏了一个自然的物理操作:玩家可以对其实际接收或制备的状态应用局部完全正迹保持(CPTP)映射。我们引入相干交换遗憾作为针对所有此类局部CPTP偏差的遗憾基准,并给出一种算法,通过熵镜像上升在CPTP Choi切片上结合不动点博弈规则,实现O(√(dT log d))的相干交换遗憾。主要结果是一个三级偏差类景观。替换通道以Θ(√(T log d))的速率恢复普通外部遗憾。幺正通道(包括幺正偏差和幺正混合)具有零极小极大遗憾。确定性测量-制备通道在中等时间范围内已迫使Ω(√(dT log d))的遗憾,且该速率对所有CPTP偏差也是充分的。因此,困难源于对推荐寄存器的非幺正使用,而非仅量子相干性。作为应用,有限量子博弈中的去中心化完全信息学习在T=O(max_i d_i log d_i/ε^2)轮后达到ε-近似可分量子相关均衡。我们将这些均衡与中介量子推荐协议的信道证明性等同,给出适用于任意有限维状态的局部CPTP可剥削性的SDP审计,并包含一个在Haar随机纯态探测下具有伪遗憾O(d^{4/3}T^{2/3}(log d)^{1/3})的探测-赌博机扩展。

英文摘要

External regret certifies stability only against replacing one's behavior by a fixed alternative. In a quantum game, this misses a natural physical move: a player can apply a local completely positive trace-preserving (CPTP) map to the state it actually received or prepared. We introduce coherent swap regret as the regret benchmark against all such local CPTP deviations, and give an algorithm achieving $O(\sqrt{dT\log d})$ coherent swap regret via entropic mirror ascent on the CPTP Choi slice with a fixed-point play rule. The main result is a three-level deviation-class landscape. Replacement channels recover ordinary external regret at rate $Θ(\sqrt{T\log d})$. Unital channels, including unitary deviations and mixtures of unitaries, have zero minimax regret. Deterministic measurement-and-preparation channels already force $Ω(\sqrt{dT\log d})$ regret in the moderate-horizon regime, and this rate is also sufficient for all CPTP deviations. Thus the hardness comes from non-unital use of the recommendation register, not from quantum coherence alone. As an application, decentralized full-information learning in finite quantum games reaches an $\varepsilon$-approximate separable quantum correlated equilibrium after $T=O(\max_i d_i\log d_i/\varepsilon^2)$ rounds. We identify these equilibria with channel-proofness of mediated quantum recommendation protocols, give an SDP audit for local CPTP exploitability applicable to arbitrary finite-dimensional states, and include a probing-bandit extension with pseudo-regret $O(d^{4/3}T^{2/3}(\log d)^{1/3})$ under Haar-random pure-state probes.

2606.03917 2026-06-03 physics.app-ph cs.LG

Beyond Gradient Descent: Adam for Analog Ising Machines

超越梯度下降:用于模拟伊辛机的Adam优化器

Stijn Van Vooren, Guy Van der Sande, Guy Verschaffelt

发表机构 * Applied Physics research group, Vrije Universiteit Brussel(应用物理研究组,布鲁塞尔自由大学)

AI总结 研究将动量法和Adam优化器应用于模拟连续时间伊辛机,通过推导连续时间版本,在Max-Cut基准测试中显著缩短求解时间并提高解质量,并引入一阶连续时间近似作为物理实现的简化起点。

Comments submitted to Physical Review E

详情
AI中文摘要

随着摩尔定律达到极限,伊辛机为难优化问题提供了一种有前景的替代计算方法。然而,许多模拟、时间连续的伊辛机依赖类似梯度下降的动力学来寻找解,这可能限制速度和鲁棒性。我们研究了动量法和Adam优化是否能改进这些系统。由于这些优化器传统上以离散时间形式表述,我们推导了适用于模拟、时间连续伊辛机动力的连续时间版本。在Max-Cut基准测试中,我们发现基于Adam的动力学相比基于梯度下降和动量的动力学,显著减少了达到目标的时间并提高了解质量。我们进一步引入了Adam的一阶连续时间近似,旨在作为未来物理实现的更简单起点,并且在连续时间设置中表现优于完整的Adam公式。我们还研究了纯算法离散时间设置,其中在较容易的问题实例上性能差距缩小,而在较难的加权问题实例上基于Adam的更新规则表现最佳。这些结果将连续时间Adam动力学确定为模拟伊辛机的一个强大设计原则。

英文摘要

As Moore's law reaches its limits, Ising machines offer a promising alternative computing approach for difficult optimization problems. However, many analog, time-continuous Ising machines rely on gradient-descent-like dynamics to find solutions, which can limit speed and robustness. We investigate whether momentum and Adam optimization can improve these systems. Since these optimizers are traditionally formulated in discrete time, we derive continuous-time versions suitable for analog, time-continuous Ising-machine dynamics. On Max-Cut benchmarks, we find that Adam-based dynamics substantially reduce time-to-target and improve solution quality compared with gradient-descent- and momentum-based dynamics. We further introduce a first-order continuous-time approximation of Adam that is intended as a simpler starting point for future physical implementations and while performing better than the full Adam formulation in a continuous-time setting. We also study a purely algorithmic discrete-time setting, where the performance gap is reduced on easier problem instances, while the Adam-based update rule performs best on harder weighted problem instances. These results identify continuous-time Adam dynamics as a powerful design principle for analog Ising machines.

2606.02646 2026-06-03 physics.soc-ph cs.AI cs.MA

The Ringelmann Effect in Multi-Agent LLM Systems: A Scaling Law for Effective Team Size

多智能体大语言模型系统中的林格曼效应:有效团队规模的缩放定律

Blaž Bertalanič, Carolina Fortuna

发表机构 * Jozef Stefan Institute(乔泽夫·斯蒂芬研究所)

AI总结 本文推导出两参数缩放定律 $R(N) = N_\text{eff}/N = 1/(1+c(N-1)N^{-\beta})$,将多智能体LLM系统分为三种渐近状态,并通过44个实验单元验证了该定律,发现密集辩论无法增加答案多样性,噪声安慰剂可模拟自我修正效果,且仅异构团队能突破硬上限。

Comments 41 pages, 9 figures, 20 tables

详情
AI中文摘要

推理时多智能体大语言模型缩放缺乏共享单位:计数名义智能体混淆了成本与独立证据。我们推导出一个两参数缩放定律 $R(N) = N_\text{eff}/N = 1/(1+c(N-1)N^{-\beta})$,其中状态指数 $\beta$ 将任何配置分类为三种渐近状态之一——硬上限为 $1/c$($\beta = 0$)、亚线性为 $N^\beta/c$($0 < \beta < 1$)或线性($\beta \ge 1$),并且平均场定理预测智能体辩论中的同伴数量 $k$ 和轮次 $\tau$ 仅通过其乘积 $k\tau$ 进入动力学。该定律适用于两个层面:答案多样性和正确性冗余。在44个(模型 $\times$ 任务 $\times$ 条件)单元中,涵盖同伴辩论、自我修正、随机噪声安慰剂、自一致性、三个开放权重系列(Qwen、Llama、Ministral)从7B到32B规模,并辅以前沿API检查(Gemini)、思维模型、异构团队和稀疏通信,函数形式在每个条件下拟合 $R^2 > 0.99$;仅 $(c, \beta)$ 发生偏移。在自由形式数学问题上,密集同伴影响将答案层面状态从亚线性坍缩为硬上限;正确性层面拟合始终保持硬上限。三个发现具有实际意义。 (i) 三十个密集辩论智能体在MMLU-Hard上产生的答案多样性不超过一个智能体。 (ii) 噪声安慰剂在自由形式数学问题及4倍规模下追踪自我修正,因此在同质团队中,通常归因于“辩论”的收益来自重新评估,而非同伴内容。 (iii) 单个 $N \le 5$ 的试点预测了 $N=30$ 的结构上限,并且在测试的配置中,只有架构多样性(异构团队)降低了 $c$ 并逃离了硬上限状态,通信模式干预则不能。

英文摘要

Inference-time multi-agent LLM scaling lacks a shared unit: counting nominal agents conflates cost with independent evidence. We derive a two-parameter scaling law $R(N) = N_\text{eff}/N = 1/(1+c(N-1)N^{-β})$ where the regime exponent $β$ classifies any configuration into one of three asymptotic regimes -- hard-ceiling at $1/c$ ($β= 0$), sublinear at $N^β/c$ ($0 < β< 1$), or linear ($β\ge 1$), and a mean-field theorem predicts that peer count $k$ and rounds $τ$ during agent debate enter the dynamics only through their product $kτ$. The law applies at two levels: answer diversity and correctness redundancy. Across 44 (model $\times$ task $\times$ condition) cells spanning peer debate, self-correction, random-noise placebo, self-consistency, three open-weight families (Qwen, Llama, Ministral) at scales from 7B to 32B with a frontier API check (Gemini), thinking models, heterogeneous teams, and sparse communication, the functional form fits every condition at $R^2 > 0.99$; only $(c, β)$ shifts. On free-form math, dense peer influence collapses the answer-level regime from sublinear into hard-ceiling; correctness-level fits remain hard-ceiling throughout. Three findings have practical implications. \emph{(i)}~Thirty dense debating agents produce no more answer diversity than one on MMLU-Hard. \emph{(ii)}~A noise placebo tracks self-correction on free-form math and at $4\times$ scale, so within homogeneous teams the gain commonly attributed to ``debate'' comes from re-evaluation, not peer content. \emph{(iii)}~A single $N \le 5$ pilot predicts the $N=30$ structural ceiling, and within the configurations tested only architectural diversity (heterogeneous teams) lowers $c$ and escapes the hard-ceiling regime, communication-mode interventions do not.

2606.03735 2026-06-03 nlin.CD cs.MA cs.RO

On dynamic multi-agent pathfinding methods: review, simulations and modifications

动态多智能体路径规划方法:综述、仿真与改进

Gabriel Fejziaj, Salama Hassona, Wieslaw Marszalek

发表机构 * Department of Computer Science, Opole University of Technology(计算机科学系,奥波尔技术大学)

AI总结 本文系统研究动态多智能体路径规划(D-MAPF)中的六种代表性算法,并提出一种基于模板的A**算法,通过离线几何路径生成与在线时间适应解耦,在频繁变化和有限感知环境中提高解质量。

详情
AI中文摘要

本文系统研究了动态多智能体路径规划(D-MAPF)背景下的路径规划算法,该设置结合了动态障碍物、部分可观测性和智能体间冲突。我们在统一的仿真框架内评估了六种代表性算法:Dijkstra、D* Lite、Space-Time A*、WHCA*、M*以及一种新方法A**。提出的A**算法引入了一种基于模板的方法,将离线几何路径生成与在线时间适应解耦。通过预计算多条多样候选路径并使用时空规划动态重新连接,A**在频繁变化和有限感知的环境中提高了解质量。

英文摘要

This paper presents a systematic study of pathfinding algorithms in the context of Dynamic Multi-Agent Pathfinding (D-MAPF), a setting that combines dynamic obstacles, partial observability, and inter-agent conflicts. We evaluate six representative algorithms: Dijkstra, D* Lite, Space-Time A*, WHCA*, M*, and a novel method denoted as A** within a unified simulation framework. The proposed A** algorithm introduces a template-based approach that decouples offline geometric path generation from online temporal adaptation. By precomputing multiple diverse candidate paths and dynamically reconnecting to them using space-time planning, A** improves solution quality in environments with frequent changes and limited sensing

2606.02600 2026-06-03 cond-mat.dis-nn cs.LG

High-Dimensional Latents Should Be Diagnosed Through Phase Structure

高维潜在变量应通过相结构进行诊断

Alejandro Ascarate, Leo Lebrat, Rodrigo Santa Cruz, Clinton Fookes, Olivier Salvado

发表机构 * Queensland University of Technology(昆士兰技术大学)

AI总结 本文通过自旋玻璃理论分析自编码器和变分自编码器的潜在空间,提出基于相结构的诊断方法,并展示其在生成和异常检测任务中的实际效益。

Comments 9+22 pages, 4+6 figures, under review

详情
AI中文摘要

我们通过自旋玻璃理论的视角研究自编码器和变分自编码器的潜在空间。本文包含两个部分。首先,我们形式化了一个潜在空间自旋玻璃字典:对于固定的解码器,重建项与超球坐标先验共同在潜在球面上诱导出一个哈密顿量,其中潜在坐标扮演连续自旋的角色,先验则充当外部磁场。这使我们能够引入可操作的自旋玻璃诊断——重叠分布、磁化率和块自旋粗粒化——来检测训练后潜在表示中的有序、无序和边缘稳定相。其次,我们表明,有意将潜在系统推向拓扑平凡化区域的边缘稳定状态会带来具体的下游后果。在生成方面,超球压缩改善了CIFAR-10和CelebA64上的重建-生成权衡,在保持或改善重建的同时降低了自FID。在异常检测方面,相同的半有序潜在几何提高了完全无监督和条件性OOD检测的性能,包括真实世界的火星车和Galaxy Zoo数据集,以及基于CIFAR-10/100和Imagenette的OOD基准。因此,我们倡导对AE/VAE采用相感知的评估范式,其中自旋玻璃可观测量补充标准机器学习指标,并揭示在许多情况下决定下游成功或失败的潜在区域。

英文摘要

We study autoencoder and variational-autoencoder latent spaces through the lens of spin-glass theory. The paper has two components. First, we formalize a latent-space spin-glass dictionary: for a fixed decoder, the reconstruction term together with a hyperspherical coordinates prior induces a Hamiltonian on the latent sphere, where latent coordinates play the role of continuous spins and the prior acts as an external magnetic field. This allows us to import operational spin-glass diagnostics -- overlap distributions, susceptibility, and block-spin coarse-graining -- to detect ordered, disordered, and edge-of-stability phases in trained latent representations. Second, we show that deliberately driving the latent system toward the edge-of-stability of the topological trivialization regime has concrete downstream consequences. In generation, hyperspherical compression improves the reconstruction-generation trade-off on CIFAR-10 and CelebA64, yielding lower self-FID while preserving or improving reconstruction. In anomaly detection, the same semi-ordered latent geometry improves both fully unsupervised and conditional OOD detection, including real-world Mars Rover and Galaxy Zoo datasets, as well as CIFAR-10/100 and Imagenette-based OOD benchmarks. We therefore advocate a phase-aware evaluation paradigm for AEs/VAEs, in which spin-glass observables complement standard ML metrics and expose the latent regimes that underlie downstream success or failure in many cases.

2606.02912 2026-06-03 astro-ph.IM cs.LG gr-qc physics.geo-ph

Data-Driven Forecasting of three-Component Seismograms Using Transformer Architectures

基于Transformer架构的三分量地震图数据驱动预测

Waleed Esmail, Stuart Russell, Jana Klinge, Alexander Kappes, Christine Thomas

发表机构 * Institut für Kernphysik, Universität Münster(穆斯特大学核物理研究所) Institut für Geophysik, Universität Münster(穆斯特大学地质物理研究所) James Cook University(詹姆斯·库克大学) Geological Survey of Denmark and Greenland(丹麦和格陵兰地质调查局)

AI总结 提出基于Transformer的自回归模型SeismoGPT,通过物理约束的延续问题框架直接预测三分量地震波形,在合成数据上实现中位数归一化互相关>0.93,证明了Transformer序列模型可学习地震波场的稳定动力学延续。

Comments 35 pages, 13 figures and 4 tables

详情
AI中文摘要

由于地震波传播的非线性、色散和多尺度特性,预测超出观测数据的地震波形仍然具有挑战性。在这项工作中,我们引入了 extsc{SeismoGPT},一种基于Transformer的自回归模型,旨在直接在时域中预测三分量地震波形。预测被表述为一个物理约束的延续问题,其中模型接收从P波到达开始并延伸至S波到达后定义时间的波形上下文,之后在没有真实样本的情况下递归生成未来运动。在合成地震图上进行评估,这些地震图覆盖了5--100 km的震源深度、10--90$^\circ$的震中距离以及$3 \leq M_w \leq 7$的震级。为了区分上下文长度和预测范围的影响,我们使用距离归一化上下文比率和固定的120秒及240秒预测范围定义了三种评估配置。在所有配置中,模型的中位数归一化互相关均高于0.93。对代表性预测的分析表明,成功的预测保留了相位一致性和频谱能量分布。在出现失败案例时,主要原因是自回归展开过程中的逐渐相位漂移,而非非物理的信号生成。这些结果表明,基于Transformer的序列模型可以学习地震波场的稳定动力学延续,凸显了基础模型方法在物理驱动时间序列预测中的潜力。该方法在地震预警和减灾中具有潜在应用,特别是对于下一代引力波观测站,如爱因斯坦望远镜。

英文摘要

Forecasting seismic waveforms beyond observed data remains challenging due to the nonlinear, dispersive, and multi-scale nature of seismic wave propagation. In this work, we introduce \textsc{SeismoGPT}, a transformer-based autoregressive model designed to forecast three-component seismic waveforms directly in the time domain. Forecasting is formulated as a physically constrained continuation problem in which the model receives waveform context beginning at the P-wave arrival and extending a defined time beyond the S-wave arrival, after which future motion is generated recursively without access to ground-truth samples. Evaluation is performed on synthetic seismograms spanning source depths of 5--100\,km, epicentral distances of 10--90$^\circ$, and magnitudes $3 \leq M_w \leq 7$. To disentangle the effects of context length and prediction horizon, we define three evaluation configurations using a distance-normalized context ratio and fixed prediction horizons of 120 and 240\,s. Across all configurations, the model achieves median normalized cross correlation above 0.93. Analysis of representative forecasts shows that successful predictions preserve both phase coherence and spectral energy distribution. Where failure cases arise, this is primarily due to gradual phase drift during autoregressive rollout rather than unphysical signal generation. These results demonstrate that transformer-based sequence models can learn stable dynamical continuation of seismic wavefields, highlighting the potential of foundation-model approaches for physics-driven time-series forecasting. There are potential applications of this methodology in seismic warning and hazard mitigation, particularly for next-generation gravitational-wave observatories, such as the Einstein Telescope.

2606.02788 2026-06-03 astro-ph.IM cs.LG

Neutrino Fingerprints: Image-Based Encodings of IceCube Events for CNN Direction Reconstruction

中微子指纹:基于图像的 IceCube 事件编码用于 CNN 方向重建

Floriano Tori, Brecht Verbeken, Vincent Ginis

发表机构 * Data Analytics Lab, Vrije Universiteit Brussel(自由大学布鲁塞尔数据分析实验室) imec-SMIT, Vrije Universiteit Brussel(imec-SMIT,自由大学布鲁塞尔) School of Engineering and Applied Sciences, Harvard University(哈佛大学工程与应用科学学院)

AI总结 提出将 IceCube 中微子事件编码为紧凑的 72×72×3 图像(中微子指纹),利用 ResNet18 卷积网络实现方向重建,平均角误差为 1.10 rad,性能媲美更复杂架构。

Comments 6 pages, 1 figure

详情
AI中文摘要

在 IceCube 中微子天文台中重建入射中微子的方向是天体物理学中的一个重要问题。公开的 IceCube--Neutrinos in Deep Ice Kaggle 竞赛提供了 1.4 亿个模拟事件来基准测试重建技术。为了从新颖的角度解决这一挑战,我们引入了中微子指纹——紧凑的 $72 \times 72 \times 3$ 图像,其中每个像素代表一个探测器,脉冲时序和电荷统计编码为颜色通道。这种表示将稀疏、不规则的脉冲数据转换为适合卷积处理的密集图像。我们的 ResNet18 模型实现了 $1.10$ rad 的平均角误差,表明基于指纹训练的卷积网络在性能上可与更复杂的架构相媲美,同时为 IceCube 事件重建提供了有效、可解释的基线。

英文摘要

Reconstructing the direction of incoming neutrinos in the IceCube Neutrino Observatory is an important problem in astrophysics. The public IceCube--Neutrinos in Deep Ice Kaggle competition provided 140 million simulated events to benchmark reconstruction techniques. To address this challenge from a novel perspective we introduce neutrino fingerprints compact $72 \times 72 \times 3$ images in which each pixel represents a single detector, with pulse timing and charge statistics encoded as color channels. This representation transforms sparse, irregular pulse data into dense images suitable for convolutional processing. Our ResNet18 model achieves a mean angular error of $1.10$ rad, indicating that convolutional networks trained on fingerprints rival more complex architectures while offering an effective, interpretable baseline for IceCube event reconstruction.