arXivDaily: arXiv daily academic digest, updated Monday through Friday
2508.15943 2026-06-11 cs.AI

T-ILR: a Neurosymbolic Integration for LTLf

Riccardo Andreoni, Andrei Buliga, Alessandro Daniele, Chiara Ghidini, Marco Montali, Massimiliano Ronzani

Affiliations Fondazione Bruno Kessler; Free University of Bozen-Bolzano; University of Bozen-Bolzano

AI summary Proposes the T-ILR framework, which incorporates LTLf temporal logic specifications directly into deep learning architectures and improves accuracy and efficiency on sequence-based tasks.

Comments Accepted for presentation at NeSy 2025. 10 pages

Journal ref
Proceedings of The 19th International Conference on Neurosymbolic Learning and Reasoning (NeSy 2025)

Abstract

State-of-the-art approaches for integrating symbolic knowledge with deep learning architectures have demonstrated promising results in static domains. However, methods to handle temporal logic specifications remain underexplored. The only existing approach relies on an explicit representation of a finite-state automaton corresponding to the temporal specification. Instead, we aim at proposing a neurosymbolic framework designed to incorporate temporal logic specifications, expressed in Linear Temporal Logic over finite traces (LTLf), directly into deep learning architectures for sequence-based tasks. We extend the Iterative Local Refinement (ILR) neurosymbolic algorithm, leveraging the recent introduction of fuzzy LTLf interpretations. We name this proposed method Temporal Iterative Local Refinement (T-ILR). We assess T-ILR on an existing benchmark for temporal neurosymbolic architectures, consisting of the classification of image sequences in the presence of temporal knowledge. The results demonstrate improved accuracy and computational efficiency compared to the state-of-the-art method.
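
To make the idea of a fuzzy LTLf interpretation more concrete, the following is a minimal sketch that evaluates temporal operators over a finite trace of truth degrees in [0, 1]. It assumes min/max (Gödel) semantics, which the abstract does not specify, and all function names are hypothetical. It is an illustration only, not the T-ILR implementation.

```python
from typing import List

Trace = List[float]  # truth degree of one proposition at each time step


def eventually(phi: Trace, i: int = 0) -> float:
    """Fuzzy 'F phi' at position i: the best degree from i to the end of the trace."""
    return max(phi[i:])


def always(phi: Trace, i: int = 0) -> float:
    """Fuzzy 'G phi' at position i: the worst degree from i to the end of the trace."""
    return min(phi[i:])


def strong_next(phi: Trace, i: int = 0) -> float:
    """Fuzzy 'X phi' at position i: 0.0 when the finite trace has no next step."""
    return phi[i + 1] if i + 1 < len(phi) else 0.0


def until(phi: Trace, psi: Trace, i: int = 0) -> float:
    """Fuzzy 'phi U psi': psi holds at some j >= i, with phi holding on [i, j)."""
    best = 0.0
    for j in range(i, len(psi)):
        held = min(phi[i:j], default=1.0)  # an empty prefix is vacuously true
        best = max(best, min(psi[j], held))
    return best


a = [0.9, 0.8, 0.2, 0.1]
b = [0.0, 0.1, 0.7, 0.3]
print(eventually(b), always(a), strong_next(a), until(a, b))
```

Because every operator is built from min and max, a satisfaction degree computed this way stays in [0, 1], which is what lets a refinement procedure treat the specification as a soft constraint rather than a hard automaton check.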

2505.11308 2026-06-11 cs.LG physics.comp-ph

Reinforcement Learning Closures for Underresolved Partial Differential Equations using Synthetic Data

Lothar Heimbach, Sebastian Kaltenbach, Petr Karnakov, Francis J. Alexander, Petros Koumoutsakos

Affiliations ETH Zurich / Harvard University; Harvard University; Argonne National Laboratory

AI summary Proposes building closure models for under-resolved PDEs from synthetic data and reinforcement learning, validates the method on the Burgers' and advection equations, and shows that closure models trained on inhomogeneous equations generalize to homogeneous ones.

Abstract

Partial Differential Equations (PDEs) describe phenomena ranging from turbulence and epidemics to quantum mechanics and financial markets. Despite recent advances in computational science, solving such PDEs for real-world applications remains prohibitively expensive because of the necessity of resolving a broad range of spatiotemporal scales. In turn, practitioners often rely on coarse-grained approximations of the original PDEs, trading off accuracy for reduced computational resources. To mitigate the loss of detail inherent in such approximations, closure models are employed to represent unresolved spatiotemporal interactions. We present a framework for developing closure models for PDEs using synthetic data acquired through the method of manufactured solutions. These data are used in conjunction with reinforcement learning to provide closures for coarse-grained PDEs. We illustrate the efficacy of our method using the one-dimensional and two-dimensional Burgers' equations and the two-dimensional advection equation. Moreover, we demonstrate that closure models trained for inhomogeneous PDEs can be effectively generalized to homogeneous PDEs. The results demonstrate the potential for developing accurate and computationally efficient closure models for systems with scarce data.
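
The method of manufactured solutions mentioned above can be sketched as follows: pick a solution by hand, then derive the source term that makes it satisfy the (now inhomogeneous) equation. The example assumes the 1D viscous Burgers' equation u_t + u u_x - nu u_xx = f with an arbitrary viscosity and an arbitrary choice of u. None of these choices come from the paper.

```python
import math

NU = 0.1  # viscosity, chosen arbitrarily for this illustration


def u_exact(x: float, t: float) -> float:
    """Manufactured solution picked by hand: u(x, t) = sin(x) * exp(-t)."""
    return math.sin(x) * math.exp(-t)


def forcing(x: float, t: float) -> float:
    """Source term f = u_t + u * u_x - NU * u_xx, derived analytically from u_exact."""
    u = u_exact(x, t)
    u_t = -u
    u_x = math.cos(x) * math.exp(-t)
    u_xx = -u
    return u_t + u * u_x - NU * u_xx


def residual(x: float, t: float, h: float = 1e-4) -> float:
    """Finite-difference check that u_exact satisfies the forced equation."""
    u = u_exact(x, t)
    u_t = (u_exact(x, t + h) - u_exact(x, t - h)) / (2 * h)
    u_x = (u_exact(x + h, t) - u_exact(x - h, t)) / (2 * h)
    u_xx = (u_exact(x + h, t) - 2 * u + u_exact(x - h, t)) / (h * h)
    return u_t + u * u_x - NU * u_xx - forcing(x, t)


print(abs(residual(0.7, 0.3)))
```

Since the exact solution is known everywhere by construction, synthetic training data of this kind can be generated at any resolution without running an expensive fine-grid solver.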

2407.08035 2026-06-11 cs.CL cs.IR

FsPONER: Few-shot Prompt Optimization for Named Entity Recognition in Domain-specific Scenarios

Yongjian Tang, Rakebul Hasan, Thomas Runkler

Affiliations Technical University of Munich; Siemens AG

AI summary Proposes FsPONER, which optimizes prompts through three few-shot selection strategies; in data-scarce industrial manufacturing and maintenance scenarios it achieves higher F1 scores than fine-tuned models.

Comments accepted in the main track at the 27th European Conference on Artificial Intelligence (ECAI-2024)

Abstract

Large Language Models (LLMs) have provided a new pathway for Named Entity Recognition (NER) tasks. Compared with fine-tuning, LLM-powered prompting methods avoid the need for training, conserve substantial computational resources, and rely on minimal annotated data. Previous studies have achieved comparable performance to fully supervised BERT-based fine-tuning approaches on general NER benchmarks. However, none of the previous approaches has investigated the efficiency of LLM-based few-shot learning in domain-specific scenarios. To address this gap, we introduce FsPONER, a novel approach for optimizing few-shot prompts, and evaluate its performance on domain-specific NER datasets, with a focus on industrial manufacturing and maintenance, while using multiple LLMs -- GPT-4-32K, GPT-3.5-Turbo, LLaMA 2-chat, and Vicuna. FsPONER consists of three few-shot selection methods based on random sampling, TF-IDF vectors, and a combination of both. We compare these methods with a general-purpose GPT-NER method as the number of few-shot examples increases and evaluate their optimal NER performance against fine-tuned BERT and LLaMA 2-chat. In the considered real-world scenarios with data scarcity, FsPONER with TF-IDF surpasses fine-tuned models by approximately 10% in F1 score.
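
As a rough illustration of TF-IDF-based few-shot selection, the sketch below ranks a pool of annotated sentences by cosine similarity to the query and keeps the top k as prompt examples. The tokenization, the smoothed IDF formula, and the example sentences are assumptions made for this sketch; the abstract does not describe FsPONER at this level of detail.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Sparse TF-IDF vectors: term frequency times a smoothed inverse document frequency."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}
    return [{t: c / len(doc) * idf[t] for t, c in Counter(doc).items()} for doc in docs]


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def select_few_shot(query: str, pool: List[str], k: int) -> List[str]:
    """Return the k pool sentences most similar to the query under TF-IDF cosine similarity."""
    tokens = [s.lower().split() for s in pool + [query]]
    vectors = tfidf_vectors(tokens)
    query_vec, pool_vecs = vectors[-1], vectors[:-1]
    ranked = sorted(range(len(pool)), key=lambda i: cosine(query_vec, pool_vecs[i]), reverse=True)
    return [pool[i] for i in ranked[:k]]


# Hypothetical maintenance-style sentences, invented for this example.
pool = [
    "replace the hydraulic pump on press line two",
    "schedule the quarterly safety training",
    "the spindle motor on the milling machine overheats",
]
print(select_few_shot("the hydraulic pump leaks oil", pool, k=1))
```

A random-sampling strategy would simply draw k sentences from the pool, and a combined strategy could mix both sources; how the paper combines them is not stated in the abstract.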

2411.10077 2026-06-11 cs.CV

Hierarchical Mutual Distillation for Multi-View Fusion: Learning from All Possible View Combinations

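Jiwoong Yang, Haejun Chung, Ikbeom Jang

Affiliations Hanyang University; Hankuk University of Foreign Studies

AI summary Proposes a novel multi-view uncertainty-weighted mutual distillation method that improves prediction consistency through hierarchical mutual distillation, exploiting the information in each view while mitigating the impact of uncertain predictions.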

Journal ref
Pattern Recognition 178 (2026) 113432

Abstract

Multi-view learning often faces challenges in effectively leveraging images captured from different angles and locations. This challenge is particularly pronounced when addressing inconsistencies and uncertainties between views. In this paper, we propose a novel Multi-View Uncertainty-Weighted Mutual Distillation (MV-UWMD) method. Our method enhances prediction consistency by performing hierarchical mutual distillation across all possible view combinations, including single-view, partial multi-view, and full multi-view predictions. This introduces an uncertainty-based weighting mechanism through mutual distillation, allowing effective exploitation of unique information from each view while mitigating the impact of uncertain predictions. We extend a CNN-Transformer hybrid architecture to facilitate robust feature learning and integration across multiple view combinations. We conducted extensive experiments using a large, unstructured dataset captured from diverse, non-fixed viewpoints. The results demonstrate that MV-UWMD improves prediction accuracy and consistency compared to existing multi-view learning approaches.
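
The phrase "all possible view combinations" can be made concrete with a small enumeration: for V views there are 2^V - 1 non-empty subsets, which split into single-view, partial multi-view, and full multi-view groups. The sketch below only enumerates these groups. The view names are hypothetical, and the distillation and uncertainty weighting themselves are not modeled.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def view_combinations(views: List[str]) -> Dict[str, List[Tuple[str, ...]]]:
    """Group every non-empty subset of views into single, partial and full combinations."""
    groups: Dict[str, List[Tuple[str, ...]]] = {"single": [], "partial": [], "full": []}
    for size in range(1, len(views) + 1):
        for combo in combinations(views, size):
            if size == 1:
                groups["single"].append(combo)
            elif size == len(views):
                groups["full"].append(combo)
            else:
                groups["partial"].append(combo)
    return groups


groups = view_combinations(["front", "side", "top"])
total = sum(len(g) for g in groups.values())
print(groups, total)
```

The count grows exponentially with the number of views, which is one reason a hierarchical organization of the combinations is a natural design choice.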

2502.07990 2026-06-11 cs.LG physics.comp-ph physics.flu-dyn

Learning Effective Dynamics across Spatio-Temporal Scales of Complex Flows

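Han Gao, Sebastian Kaltenbach, Petros Koumoutsakos

Affiliations Harvard SEAS

AI summary Proposes the Graph-LED framework, which uses graph neural networks and an attention-based autoregressive model to extract effective dynamics from a small amount of simulation data and forecast the spatio-temporal physics of complex flows.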

Comments Conference on Parsimony and Learning (CPAL)

Abstract

Modeling and simulation of complex fluid flows with dynamics that span multiple spatio-temporal scales is a fundamental challenge in many scientific and engineering domains. Full-scale resolving simulations for systems such as highly turbulent flows are not feasible in the foreseeable future, and reduced-order models must capture dynamics that involve interactions across scales. In the present work, we propose a novel framework, Graph-based Learning of Effective Dynamics (Graph-LED), that leverages graph neural networks (GNNs), as well as an attention-based autoregressive model, to extract the effective dynamics from a small amount of simulation data. GNNs represent flow fields on unstructured meshes as graphs and effectively handle complex geometries and non-uniform grids. The proposed method combines a GNN-based dimensionality reduction for variable-size unstructured meshes with an autoregressive temporal attention model that can learn temporal dependencies automatically. We evaluated the proposed approach on a suite of fluid dynamics problems, including flow past a cylinder and flow over a backward-facing step, over a range of Reynolds numbers. The results demonstrate robust and effective forecasting of spatio-temporal physics; in the case of the flow past a cylinder, both the small-scale effects that occur close to the cylinder and its wake are accurately captured.

2402.00972 2026-06-11 cs.LG cs.MA physics.comp-ph

Closure Discovery for Coarse-Grained Partial Differential Equations Using Grid-based Reinforcement Learning

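Jan-Philipp von Bassewitz, Sebastian Kaltenbach, Petros Koumoutsakos

Affiliations ETH Zurich; Harvard SEAS

AI summary Proposes grid-based reinforcement learning to systematically identify closure terms in coarse-grained PDEs; numerical solutions of the advection and Burgers' equations demonstrate accurate predictions and a speedup over resolving all scales.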

Comments Conference on Parsimony and Learning (CPAL)

Abstract

Reliable predictions of critical phenomena, such as weather, wildfires and epidemics, often rely on models described by Partial Differential Equations (PDEs). However, simulations that capture the full range of spatio-temporal scales described by such PDEs are often prohibitively expensive. Consequently, coarse-grained simulations are usually deployed that adopt various heuristics and empirical closure terms to account for the missing information. We propose a novel and systematic approach for identifying closures in under-resolved PDEs using grid-based Reinforcement Learning. This formulation incorporates inductive bias and exploits locality by deploying a central policy represented efficiently by a Fully Convolutional Network (FCN). We demonstrate the capabilities and limitations of our framework through numerical solutions of the advection equation and the Burgers' equation. Our results show accurate predictions for in- and out-of-distribution test cases as well as a significant speedup compared to resolving all scales.

2412.12231 2026-06-11 cs.RO cs.LG

Demonstrating Data-to-Knowledge Pipelines for Connecting Production Sites in the World Wide Lab

Leon Gorißen, Jan-Niklas Schneider, Mohamed Behery, Philipp Brauner, Moritz Lennartz, David Kötter, Thomas Kaster, Oliver Petrovic, Christian Hinke, Thomas Gries, Gerhard Lakemeyer, Martina Ziefle, Christian Brecher, Constantin Häfner

Affiliations Chair for Laser Technology, RWTH Aachen University; Knowledge Based Systems Group, RWTH Aachen University; Communication Science, RWTH Aachen University; Chair of Textile Technology, RWTH Aachen University; Laboratory for Machine Tools and Production Engineering, RWTH Aachen University; Human Computer Interaction Center, RWTH Aachen University; Fraunhofer Institute for Laser Technology

AI summary Proposes Data-to-Knowledge pipelines for connecting production sites in the World Wide Lab, using a network of Digital Shadows for data integration and storage to improve industrial efficiency and scalability.

Comments 15 pages, 6 figures, submitted to CAiSE 2025

Journal ref
MDPI MAKE (Machine Learning and Knowledge Extraction) (2026), 8(5)

Abstract

The digital transformation of production requires new methods of data integration and storage, as well as decision making and support systems that work vertically and horizontally throughout the development, production, and use cycle. In this paper, we propose Data-to-Knowledge (and Knowledge-to-Data) pipelines for production as a universal concept building on a network of Digital Shadows (a concept augmenting Digital Twins). We show a proof of concept that builds on and bridges existing infrastructure to 1) capture and semantically annotate trajectory data from multiple similar but independent robots in different organisations and use cases in a data lakehouse and 2) run an independent process that dynamically queries matching data for training an inverse dynamic foundation model for robotic control. The article discusses the challenges and benefits of this approach and how Data-to-Knowledge pipelines contribute efficiency gains and industrial scalability in a World Wide Lab as a research outlook.

2408.00157 2026-06-11 cs.LG physics.comp-ph physics.flu-dyn

Generative Learning of the Solution of Parametric Partial Differential Equations Using Guided Diffusion Models and Virtual Observations

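Han Gao, Sebastian Kaltenbach, Petros Koumoutsakos

Affiliations School of Engineering and Applied Sciences, Harvard University

AI summary Proposes a generative learning framework that models high-dimensional parametric systems using gradient guidance and virtual observations; two case studies, one on an unstructured mesh and one on a structured mesh, show that it reduces computational cost and enables efficient forecasting of flow dynamics.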

Abstract

We introduce a generative learning framework to model high-dimensional parametric systems using gradient guidance and virtual observations. We consider systems described by Partial Differential Equations (PDEs) discretized with structured or unstructured grids. The framework integrates multi-level information to generate high-fidelity time sequences of the system dynamics. We demonstrate the effectiveness and versatility of our framework with two case studies: incompressible, two-dimensional, low-Reynolds-number cylinder flow on an unstructured mesh and incompressible turbulent channel flow on a structured mesh, both parameterized by the Reynolds number. Our results illustrate the framework's robustness and ability to generate accurate flow sequences across various parameter settings, significantly reducing computational costs and allowing for efficient forecasting and reconstruction of flow dynamics.

2312.11540 2026-06-11 cs.LG

On the Trade-off between the Number of Nodes and the Number of Trees in a Random Forest

Tatsuya Akutsu, Avraham A. Melkman, Atsuhiro Takasu

Affiliations Bioinformatics Center, Institute for Chemical Research, Kyoto University; Department of Computer Science, Ben-Gurion University of the Negev; National Institute of Informatics, Chiyoda-ku, Tokyo, Japan

AI summary Studies how, in the prediction phase of a random forest, a bag of decision trees can be represented by a smaller bag; shows that the majority function of n variables can be represented by T polynomial-size trees when n-T is a constant, where n and T must be odd to avoid ties.

Abstract

In this paper, we focus on the prediction phase of a random forest and study the problem of representing a bag of decision trees using a smaller bag of decision trees, where we only consider binary decision problems on the binary domain and simple decision trees in which an internal node is limited to querying the Boolean value of a single variable. As a main result, we show that the majority function of $n$ variables can be represented by a bag of $T$ ($< n$) decision trees each with polynomial size if $n-T$ is a constant, where $n$ and $T$ must be odd (in order to avoid the tie break). We also show that a bag of $n$ decision trees can be represented by a bag of $T$ decision trees each with polynomial size if $n-T$ is a constant and a small classification error is allowed. A related result on the $k$-out-of-$n$ functions is presented too.
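
The smallest instance of this trade-off can be checked by brute force. In the sketch below, a bag of n = 3 one-node trees (tree i simply outputs variable i) is compared against a single hand-written tree (T = 1, so n - T = 2) that queries one variable per internal node. The tree layout is an illustrative construction, not the one used in the paper's proof.

```python
from itertools import product
from typing import Sequence


def majority_by_vote(x: Sequence[int]) -> int:
    """Bag of n one-node trees: tree i outputs x[i], and the bag takes the majority vote."""
    return int(sum(x) > len(x) // 2)


def single_tree_majority3(x: Sequence[int]) -> int:
    """One decision tree for n = 3; each internal node queries a single Boolean variable."""
    if x[0] == 1:
        if x[1] == 1:
            return 1      # two ones already decide the vote
        return x[2]       # x[0] = 1, x[1] = 0: a final query on x[2] decides
    if x[1] == 1:
        return x[2]       # x[0] = 0, x[1] = 1: a final query on x[2] decides
    return 0              # two zeros already decide the vote


agree = all(
    majority_by_vote(x) == single_tree_majority3(x) for x in product((0, 1), repeat=3)
)
print(agree)
```

The single tree needs more nodes than any tree in the original bag, which is the nodes-versus-trees trade-off in miniature.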

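2606.12387 2026-06-11 cs.DB cs.AI new submission

TAHOE: Text-to-SQL with Automated Hint Optimization from Experience

Zhiyi Chen, Jie Song, Peng Li

Affiliations ByteDance Inc.; Georgia Institute of Technology

AI summary Presents the TAHOE system, which uses an error-driven hint learning pipeline to turn debugging traces into a structured Hint Bank and adds a Strategy Layer to model conflicting user intents, substantially improving Text-to-SQL on Spider 2.0-Snow without updating model parameters.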

Abstract

Large Language Models (LLMs) have democratized database access through Text-to-SQL, but moving from prototypes to production remains difficult. Real deployments must handle strict SQL dialects, massive schemas, and evolving user preferences, while supervised fine-tuning is costly and rigid and agentic test-time scaling is expensive. We present Tahoe, a system that treats prompt optimization as a dynamic data management problem. Tahoe uses an error-driven hint learning pipeline across Development and Deployment to consolidate debugging traces into a structured Hint Bank. Compiler feedback is distilled into reusable Syntax Hints for dialect-specific rules, while execution and user feedback are converted into Semantic Hints for schema- and user-specific logic. Tahoe further introduces a Strategy Layer that models conflicting user intents as competing strategies under shared natural-language triggers, with recency signals and post-learning attribution statistics that summarize empirical success, harm, inertness, and support. At inference time, Tahoe retrieves relevant hints and guides the LLM through Logic Planning followed by SQL Synthesis. We implement and evaluate the development-phase workflow, leaving deployment-time human-feedback updates for future work. On Spider 2.0-Snow, Tahoe substantially improves Text-to-SQL without updating model parameters. On 113 supervised Spider 2.0-Snow-0212 examples using GPT-5.5, Tahoe raises pass rate from 61.95 percent to 79.42 percent and pass-at-4 from 72.57 percent to 87.61 percent, achieves 100 percent Snowflake syntax pass rate, and reduces average compiler-feedback critic rounds from 2.79 to 0.12 per sampled candidate. The same Hint Bank also transfers to weaker backbones, including a 19.7 percentage-point pass-rate gain on Doubao-2.0-lite.

2606.12382 2026-06-11 cs.NE cs.AI new submission

SPEA2$^+$: Improved Density Estimation in SPEA2 with Provable Runtime Guarantees

Duc-Cuong Dang, Andre Opris, Dirk Sudholt

Affiliations University of Dortmund

AI summary SPEA2 fails to maintain diversity when handling dominated solutions; the proposed SPEA2$^+$ improves density estimation by using all pairwise distances and achieves the same performance guarantees as other prominent algorithms on the OneTrapZeroTrap benchmark.

Comments To appear in the Proceedings of PPSN 2026

Abstract

The Strength Pareto Evolutionary Algorithm 2 (SPEA2) is a popular and prominent evolutionary algorithm for solving multi-objective optimisation problems. Despite its popularity, theoretical analyses of SPEA2 have only appeared recently. Moreover, these analyses focus exclusively on how SPEA2 handles non-dominated solutions and disregard the algorithmic components responsible for handling dominated solutions. We conduct a first runtime analysis of SPEA2 for which these components are analysed. We prove that, unlike other prominent algorithms, including NSGA-II, NSGA-III and SMS-EMOA under the same setting of constant population size and duplicate elimination, SPEA2 is unable to cover the Pareto front of the OneTrapZeroTrap benchmark efficiently. Our results indicate that using k-th nearest-neighbour distance in the fitness assignment provides an insufficient signal to maintain diversity among dominated individuals. To address this issue, we propose an improved variant, SPEA2$^+$, that considers all pairwise distances. The new algorithm achieves the same performance guarantees as the other prominent algorithms on OneTrapZeroTrap, while matching the performance of the original SPEA2 on simpler problems. Experimental results complement our theoretical findings.
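
A small numerical example shows why a single k-th nearest-neighbour distance can be a weak diversity signal. The sketch uses the density term 1 / (sigma_k + 2) from the usual description of SPEA2, with sigma_k the distance to the k-th nearest neighbour. The comparison against the sum of all distances is only an illustration and should not be read as the SPEA2$^+$ rule, which the abstract does not spell out. The populations are made up.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # objective vector of one individual


def sorted_distances(i: int, population: List[Point]) -> List[float]:
    """Distances from individual i to every other individual, in increasing order."""
    return sorted(math.dist(population[i], q) for j, q in enumerate(population) if j != i)


def kth_nn_density(i: int, population: List[Point], k: int) -> float:
    """Density 1 / (sigma_k + 2), where sigma_k is the k-th nearest-neighbour distance."""
    return 1.0 / (sorted_distances(i, population)[k - 1] + 2.0)


# Two populations that differ only in how far the most distant individual lies.
pop_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
pop_b = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (9.0, 0.0)]

same_density = kth_nn_density(0, pop_a, k=2) == kth_nn_density(0, pop_b, k=2)
different_spread = sum(sorted_distances(0, pop_a)) != sum(sorted_distances(0, pop_b))
print(same_density, different_spread)
```

Individual 0 receives the same density value in both populations even though the populations are spread very differently, whereas any measure that looks at all pairwise distances can tell them apart.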

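2606.12337 2026-06-11 math.NA cs.LG cs.NA new submission

Adjoint Method versus Physics-Informed Neural Networks in PDE-Constrained Inverse Problems

Zhen Zhang, Alessandro Alla, George Em Karniadakis

Affiliations Brown University; University of Rome 1

AI summary A fair comparison of adjoint optimization and PINNs for PDE-constrained inverse problems finds that the representation of the unknown largely determines the preferred method: grid-based fields favor the adjoint, while neural representations favor PINNs; PINNs are cheaper for time-dependent problems and can warm-start the adjoint.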

Comments 35 pages, 10 figures

Abstract

Inverse problems governed by partial differential equations (PDEs) are central to computational mechanics and are commonly solved by adjoint-based optimization, while physics-informed neural networks (PINNs) have emerged as a flexible alternative. Their relative performance remains difficult to assess because the two approaches are often compared under different formulations, parameterizations, optimizers, and regularization choices. We present a fair comparison of adjoint optimization and PINNs for PDE-constrained inverse problems. From a common abstract formulation, we instantiate both methods on identical domains, governing equations, observation models, and regularization terms, while matching the optimizer, unknown parameterization, and arithmetic precision wherever applicable. The benchmarks include unsteady Burgers, noisy Darcy permeability inversion, three-dimensional Allen--Cahn reaction identification, and unsteady Navier--Stokes viscosity identification. The results show that the representation of the unknown largely determines the preferred method: grid-based fields favor the discrete adjoint, whereas neural representations are native to PINNs and relevant for closure and constitutive modeling. For time-dependent problems, adjoint inversion can be dominated by trajectory storage and differentiation, while PINNs provide satisfactory reconstructions at lower cost. A PINN-warm-started adjoint strategy then recovers adjoint-level accuracy at substantially reduced cost.
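
The remark that adjoint inversion for time-dependent problems is dominated by trajectory storage can be seen in a toy setting. The sketch below assumes a scalar decay equation du/dt = -theta * u discretized with explicit Euler and a single final-time observation; the model, parameter values, and function names are all invented for illustration and are far simpler than the paper's benchmarks.

```python
from typing import List


def forward(theta: float, u0: float, dt: float, steps: int) -> List[float]:
    """Explicit Euler for du/dt = -theta * u; the whole trajectory is stored."""
    traj = [u0]
    for _ in range(steps):
        traj.append((1.0 - dt * theta) * traj[-1])
    return traj


def loss(theta: float, u0: float, dt: float, steps: int, target: float) -> float:
    """Misfit J = 0.5 * (u_N - target)^2 between the final state and one observation."""
    return 0.5 * (forward(theta, u0, dt, steps)[-1] - target) ** 2


def adjoint_gradient(theta: float, u0: float, dt: float, steps: int, target: float) -> float:
    """dJ/dtheta from one forward solve plus one backward (adjoint) sweep."""
    traj = forward(theta, u0, dt, steps)
    lam = traj[-1] - target              # terminal condition: lambda_N = dJ/du_N
    grad = 0.0
    for n in reversed(range(steps)):
        grad += lam * (-dt * traj[n])    # lambda_{n+1} * d(u_{n+1})/d(theta)
        lam *= 1.0 - dt * theta          # lambda_n = lambda_{n+1} * d(u_{n+1})/d(u_n)
    return grad


args = (1.0, 0.05, 20, 0.3)  # u0, dt, steps, target (arbitrary illustration values)
g_adj = adjoint_gradient(0.8, *args)
eps = 1e-6
g_fd = (loss(0.8 + eps, *args) - loss(0.8 - eps, *args)) / (2 * eps)
print(g_adj, g_fd)
```

The backward sweep reads every stored state `traj[n]`, so memory grows with the number of time steps; for large PDE discretizations this is the storage cost the abstract refers to.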

2606.12287 2026-06-11 cs.NE cs.AI new submission

SpikeDecoder: Realizing the GPT Architecture with Spiking Neural Networks

Claas Beger, Florian Walter, Alois Knoll

Affiliations Chair of Robotics, Artificial Intelligence and Real-time Systems

AI summary Proposes SpikeDecoder, a Transformer decoder based on spiking neural networks (SNNs) for natural language processing; by replacing ANN blocks with spike-based alternatives and comparing embedding methods, it reduces theoretical energy consumption by 87% to 93% relative to the ANN baseline.

Abstract

The Transformer architecture is widely regarded as the most powerful tool for natural language processing, but due to a high number of complex operations, it inherently faces the issue of high energy consumption. To address this issue, we consider Spiking Neural Networks (SNNs), which are an energy-efficient alternative to conventional Artificial Neural Networks (ANNs) due to their naturally event-driven approach to processing information. However, this inherently makes them difficult to train. Often, many SNN-based models circumvent this issue by converting pre-trained ANNs. More recently, attempts have been made to design directly trainable SNN-based adaptations of the Transformer model structure. Although the results showed great promise, the application field was computer vision. Moreover, the proposed model incorporates only encoder blocks. In this paper, we propose SpikeDecoder, a fully SNN-based implementation of the Transformer decoder block, for applications in natural language processing. In a series of experiments, we analyze the impact of exchanging different blocks of the ANN model with spike-based alternatives to identify trade-offs and significant sources of performance loss. We further investigate the role of residual connections and the selection of SNN-compatible normalization techniques. Besides the work on the model architecture, we formulate and compare different embedding methods to project text data into spikes. Finally, we demonstrate that our proposed SNN-based decoder block reduces the theoretical energy consumption by 87% to 93% compared to the ANN baseline.

2606.12281 2026-06-11 cs.MA cs.AI cs.LG new submission

CCKS: Consensus-based Communication and Knowledge Sharing

Jinyuan Zu, Xiaowei Lv, Yongcai Wang, Deying Li, Yunjun Han, Wenping Chen, Fengyi Zhang, Naiqi Wu

Affiliations Public Computing Cloud, Renmin University of China; School of Information, Renmin University of China; State Key Laboratory of Multimodal Artificial Intelligence Systems, Beijing Engineering Research Center of Intelligent Systems and Technology, Institute of Automation, Chinese Academy of Sciences; The Information Science Academy, China Electronics Technology Group Corporation; Department of Mechatronics Engineering, Guangdong University of Technology

AI summary To address over-reliance on teacher guidance in action advising for multi-agent reinforcement learning, proposes a consensus-based communication and knowledge sharing framework that builds consensus models with contrastive learning, balancing exploration against learning from teachers and improving cooperation efficiency and performance.

Abstract

In Decentralized Training and Decentralized Execution (DTDE) for cooperative Multi-Agent Reinforcement Learning (MARL), action-advising-based knowledge sharing promotes interpretable and scalable cooperation among agents. However, current action advising approaches often adhere too much to the teacher's guidance without evaluating teacher-student compatibility, which causes excessive advising, suboptimal stability, and degraded performance. To overcome these challenges, this paper presents a Consensus-based Communication and Knowledge Sharing (CCKS) framework, which allows agents to adopt recommendations based on consensus-derived constraints and to follow the teacher's instructions more smartly. This mechanism enables agents to balance exploration and learning from experienced teachers, improving overall performance. The key is the consensus model construction, for which we propose to employ contrastive learning to construct consensus models based on local observations in the agents' training phase. In action selection, agents score and choose actions based on consensus and shared knowledge. Designed as a plug-and-play solution, CCKS integrates seamlessly with existing DTDE algorithms. Experiments conducted in the Google Research Football environment and the complex StarCraft II Multi-Agent Challenge demonstrate that the integration with CCKS significantly improves cooperation efficiency, learning speed, and overall performance compared with current DTDE baselines. The code is available at https://github.com/yuanxpy/CCKS.

2606.12279 2026-06-11 cs.NE cs.AI cs.LG new submission

Mathematical perspective on genetic algorithms with optimization guided operators

Anna Brandenberger, Ilan Doron-Arad, Elchanan Mossel

Affiliations Department of Mathematics, MIT

AI summary Models genetic algorithms mathematically, formulates optimization as a query-complexity problem, shows that some problems require generation, mutation, and recombination, and captures the key role of diversity in the solution pool.

Comments 18 pages, 1 figure

Abstract

Recent work in ML applies genetic algorithms at inference time to iteratively improve solutions to optimization problems. The basic mutation and recombination operators involved are qualitatively different from those studied classically. Mutations are no longer random; an ML algorithm mutates a solution with the goal of improving an objective. Similarly, recombination is not based on random collages of parent solutions. Instead, it is an ML optimization-based operator whose goal is to synthesize improved solutions from its inputs. Thus, these mutation and recombination operators are more likely to improve the objective, but their computational cost is much higher. We introduce a general model of genetic algorithms and formulate optimization in this model as a query-complexity problem, using the language of reinforcement learning. We then study specialized models. We show that some optimization problems require generation, mutation, and recombination to be solved. We then obtain qualitatively tight algorithms for a family of problems within this framework that captures the nontrivial role of diversity in the solution pool, a key feature of practical ML genetic algorithms.

2606.12260 2026-06-11 econ.TH cs.AI cs.GT cs.LG stat.ML 新提交

Market Design for AI: Beyond the Copyright Binary

人工智能的市场设计:超越版权二元论

Yan Dai, Maryam Farboodi, Negin Golrezaei, Sepehr Shahshahani

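Affiliations * MIT Operations Research Center; MIT Sloan School of Management; Washington University School of Law

AI summary Using static and dynamic game models, this paper shows how both the "free-for-all" and "strong intellectual property rights" regimes fail in the market for AI training data, and proposes a market design in which a data intermediary internalizes externalities and subsidizes innovative contributions.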

详情
AI中文摘要

我们如何设计一个用于训练AI模型的人类生成内容市场,既能促进技术进步,又能保留个人创作高质量内容的激励?现有方法采取两极立场:基于合理使用的“自由使用”模式和“强知识产权”模式。我们证明两者均失败:自由使用不补偿创作者,而通过建模为静态Stackelberg博弈,强知识产权也削弱了创作激励。我们发现这对更具创新性的创作者尤其如此,我们将此现象称为“原创性惩罚”。将这一见解扩展到动态模型,我们发现另一种市场失灵会损害AI模型性能,即使对于初始良好的模型也是如此:此类模型导致人类更依赖AI辅助创作,导致同质化内容反馈到训练中,从而降低模型性能——即“精确性诅咒”。我们进一步提出一种市场设计,通过数据中介内部化跨创作者外部性并补贴创新贡献,从而恢复效率。

英文摘要

How can we design a market of human-generated content for use in training AI models that both enables technological progress and preserves individual incentives for high-quality content creation? Existing approaches take polar positions: a "free-for-all" model based on fair use and a "strong intellectual property rights" model. We show that both fail: Free-for-all does not compensate creators, and -- by modeling as a static Stackelberg game -- strong intellectual property rights also underpower creative incentives. We find this especially true for more innovative creators, a phenomenon we term the "originality penalty." Extending this insight to a dynamic model, we find another market failure undermining AI model performance, even for an initially good model: Such a model induces greater reliance by humans on AI-assisted creation, resulting in homogenized content feeding back into training, which degrades the model performance -- a "curse of precision." We further propose a market design with a data intermediary internalizing cross-creator externalities and subsidizing innovative contributions, thereby restoring efficiency.

2606.12247 2026-06-11 cs.CY cs.CL 新提交

Beyond Third-Person Audits: Situated Interaction Auditing for User-Centered LLM Bias Research

超越第三人称审计:以用户为中心的LLM偏见研究的场景交互审计

Andrés Abeliuk, Cinthia Sanchez Macias, Valentina Alarcón, Álvaro Madariaga, Claudia Lopez

Affiliations * Department of Computer Science, University of Chile, Center for Artificial Intelligence (CENIA), Santiago, Chile; Center for Artificial Intelligence (CENIA), Santiago, Chile; Institute of Sociology, Pontificia Universidad Católica de Chile, Santiago, Chile

AI summary Proposes Situated Interaction Auditing (SIA), a user-centered framework for studying LLM bias by analyzing how user profile signals (sociodemographic markers, writing style, and stated identity) systematically affect the quality, content, and tone of LLM responses.

详情
AI中文摘要

大型语言模型(LLM)的偏见研究主要集中在第三人称审计上,即研究模型如何作为外部主体表征或评估人口群体。然而,这种范式忽略了一个结构性盲点:用户不在审计中。在实践中,LLM用于开放式的个人交互,在此过程中模型隐式地代表用户并相应调整其响应。当相同的请求因提问者不同而产生不同响应时,偏见不仅体现在模型如何描述他人,还体现在它如何对待对话者。我们提出场景交互审计(SIA),这是一个以用户为中心的框架,用于研究用户画像信号——隐式社会人口统计标记、写作风格和陈述身份——如何系统性地塑造LLM响应质量、内容和语气。我们通过一个案例研究来展示该框架,该案例研究跨多个任务领域交叉了性别和社会经济地位信号,并概述了SIA作为自然语言处理新使命的研究议程。

英文摘要

Research on bias in large language models (LLMs) has predominantly focused on third-person audits, which study how models represent or evaluate demographic groups as external subjects. However, this paradigm overlooks a structural blind spot because the user is absent from the audit. In practice, LLMs are used in open-ended, personal interactions, during which the model implicitly represents the user and adjusts its responses accordingly. When identical requests yield different responses depending on who is asking, bias manifests not in how the model describes others but in how it treats its interlocutor. We propose Situated Interaction Auditing (SIA), a user-centered framework for studying how user profile signals -- implicit sociodemographic markers, writing style, and stated identity -- systematically shape LLM response quality, content, and tone. We demonstrate the framework through a case study that intersects gender and socioeconomic status signals across multiple task domains and outline a research agenda for SIA as a new mission for natural language processing.

2606.12245 2026-06-11 cs.IR cs.AI 新提交

DiffCold: A Diffusion-based Generative Model for Cold-Start Item Recommendation

DiffCold: 基于扩散的生成模型用于冷启动物品推荐

Kangning Zhang, Yingjie Qin, Weinan Zhang, Yong Yu, Jianghao Lin

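Affiliations * Shanghai Jiao Tong University; Xiaohongshu Inc.

AI summary To address the seesaw dilemma in cold-start item recommendation, proposes DiffCold, a conditional-diffusion generative model that reconstructs warm item embeddings from content while preserving manifold structure, and combines a retrieval-enhanced aggregator with a simulation-based representation alignment module to unify cold and warm item representations.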

Comments Accepted by ECML-PKDD 2026

详情
AI中文摘要

冷启动物品推荐由于缺乏交互历史,在现实系统中仍然是一个持续的挑战。虽然先前的模型尝试利用物品内容特征来弥合这一差距,但它们普遍遭受跷跷板困境:提升冷物品的性能不可避免地会降低温物品的性能,反之亦然。我们发现这一困境源于根本的分布差异:温物品嵌入占据由丰富交互信号塑造的复杂“行为流形”,而冷物品嵌入则被限制在仅从辅助内容导出的“语义流形”上。现有方法通常强制在这些不一致空间之间进行刚性映射,导致模型为了适应冷物品而牺牲温表示的精度。为了解决这个问题,我们提出DiffCold,一种基于扩散的生成模型,统一了温表示和冷表示。与GAN或VAE不同,DiffCold利用条件扩散从内容重建温物品嵌入,保留底层流形结构而不退化。我们进一步针对这一范式设计了两个特定模块:一个检索增强聚合器,利用语义相似的温物品初始化生成,以绕过低效的噪声;以及一个基于模拟的表示对齐模块,通过对比学习强制生成嵌入与真实嵌入之间的分布一致性。在三个基准上的实验证实,DiffCold解决了跷跷板困境,在所有指标上持续优于最先进的方法。

英文摘要

Cold-start item recommendation remains a persistent challenge in real-world systems due to the absence of interaction histories. While prior models attempt to bridge this gap using item content features, they universally suffer from the seesaw dilemma: enhancing performance for cold items inevitably degrades performance for warm items, and vice versa. We identify that this dilemma stems from a fundamental distributional disparity: warm item embeddings occupy a complex "behavioral manifold" shaped by rich interaction signals, whereas cold item embeddings are constrained to a "semantic manifold" derived solely from auxiliary content. Existing methods often force a rigid mapping between these inconsistent spaces, causing the model to sacrifice the precision of warm representations to accommodate cold ones. To address this, we propose DiffCold, a diffusion-based generative model that unifies warm and cold representations. Unlike GANs or VAEs, DiffCold leverages conditional diffusion to reconstruct warm item embeddings from content, preserving the underlying manifold structure without degradation. We further tailor this paradigm with two specific designs: a Retrieval-enhanced Aggregator that initializes generation using semantically similar warm items to bypass inefficient noise, and a Simulation-based Representation Alignment module that enforces distribution consistency between generated and real embeddings via contrastive learning. Experiments on three benchmarks confirm that DiffCold resolves the seesaw dilemma, consistently outperforming state-of-the-art methods across all metrics.

2606.12231 2026-06-11 cs.SE cs.AI 新提交

Rule Taxonomy and Evolution in AI IDEs: A Mining and Survey Study

AI IDE中的规则分类与演化:挖掘与调查研究

Guangzong Cai, Ruiyin Li, Peng Liang, Zengyang Li, Mojtaba Shahin

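Affiliations * School of Computer Science, Wuhan University; School of Computer Science, Central China Normal University; School of Computing Technologies, RMIT University

AI summary By mining 7,310 rules from 83 open-source projects and surveying 99 practitioners, the study builds a rule taxonomy with 5 primary and 25 secondary categories. It finds that developers value architectural constraints while actual configurations consist mostly of low-level workflow and code-formatting rules, that rule evolution is driven mainly by constructive context expansion and enrichment, and that updating rules raises the average artifact compliance rate by 22.99 percentage points.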

Comments 52 pages, 21 images, 8 tables, Manuscript submitted to a Journal (2026)

详情
AI中文摘要

AI驱动的集成开发环境(AI IDE)的采用引入了“规则”作为一种新颖的软件工件,允许开发者将项目特定的约束和架构指导原则持久地注入到大语言模型(LLM)的上下文中。尽管这些规则在使AI行为与开发者意图对齐方面发挥作用,但它们的分类、演化及实际影响仍基本未被探索。为填补这一空白,我们对AI IDE规则进行了混合方法实证研究。通过挖掘83个开源项目并提取7,310条规则,我们建立了一个包含5个主类和25个子类的全面分类法。随后,我们将这些工件与99名从业者的调查反馈进行三角验证。我们的分析发现开发者优先级与实际配置之间存在反差:虽然从业者认为架构约束非常重要,但仓库中的规则文件主要由低级工作流和代码格式约束组成。此外,我们对1,540个规则演化事件的分析表明,规则更新频繁。仓库数据进一步表明,规则演化主要由建设性上下文扩展(29.17%)和丰富(26.59%)驱动。相比之下,受访开发者报告修改规则主要是为了纠正AI错误(77.78%),通常通过添加新的负面约束而非编辑现有约束。最后,对160个规则演化事件的工件合规性评估显示,更新规则显著提高了软件工件的合规性,更新后平均工件合规率从49.14%提升至72.13%,提高了22.99个百分点。我们的研究提供了实证见解,可帮助开发者优化提示策略,并指导工具构建者为AI IDE设计自动冲突检测和上下文管理机制。

英文摘要

The adoption of AI-powered Integrated Development Environments (AI IDEs) has introduced "Rules" as a novel software artifact, allowing developers to persistently inject project-specific constraints and architectural guidelines into the context of Large Language Models (LLMs). Despite their role in aligning AI behavior with developer intent, the taxonomy, evolution, and practical impact of these rules remain largely unexplored. To bridge this gap, we conducted a mixed-methods empirical study on AI IDE rules. By mining 83 open-source projects and extracting 7,310 rules, we established a comprehensive taxonomy comprising 5 primary and 25 secondary categories. We then triangulated these artifacts with survey responses from 99 practitioners. Our analysis identified a contrast between developer priorities and actual configurations: while practitioners rate architectural constraints as highly important, rule files in repositories primarily consist of low-level workflow and code formatting constraints. Furthermore, our analysis of 1,540 rule evolution events revealed that rules are updated frequently. Repository data further indicate that rule evolution is primarily driven by constructive context expansions (29.17%) and enrichments (26.59%). In contrast, surveyed developers reported modifying rules primarily to correct AI errors (77.78%), typically by adding new negative constraints rather than editing existing ones. Finally, an artifact compliance assessment of 160 rule evolution events revealed that updating rules significantly improves the adherence of software artifacts, with the average artifact compliance rate increasing by 22.99 percentage points (from 49.14% to 72.13%) following an update. Our study provides empirical insights that can help developers optimize prompting strategies and guide tool builders in designing automated conflict-detection and context-management mechanisms for AI IDEs.
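The 22.99 figure is the difference between the two reported compliance rates (72.13 minus 49.14), so it is an absolute gain in percentage points. A short calculation, using only the two rates from the abstract, separates that from the relative increase; the variable names are illustrative.

```python
before, after = 49.14, 72.13  # average compliance rates (%) from the abstract

point_gain = after - before                # absolute change, in percentage points
relative_gain = point_gain / before * 100  # change relative to the starting rate

print(f"{point_gain:.2f} percentage points")      # 22.99 percentage points
print(f"{relative_gain:.1f}% relative increase")  # 46.8% relative increase
```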

2606.12211 2026-06-11 quant-ph cs.LG 新提交

Quantum Occam Learning: Sample-Supported Expressibility for Circuit-Based Quantum Learning

量子奥卡姆学习:基于电路的量子学习中样本支持的表达能力

Jeongho Bang, Kyoungho Cho, Jeongwoo Jae

Affiliations * Institute for Convergence Research and Education in Advanced Technology, Yonsei University, Seoul 03722, Republic of Korea; Department of Quantum Information, Yonsei University, Incheon 21983, Republic of Korea; Department of Statistics and Data Science, Yonsei University, Seoul 03722, Republic of Korea; Department of Physics, Hanyang University, Seoul 04763, Republic of Korea

AI summary Develops an information-theoretic Occam theory for data generated by finite-size quantum circuits and proves a sample-supported expressibility law: at trace-distance accuracy $\epsilon$, $M$ samples support at most about $M\epsilon^2$ gates, which turns circuit complexity into an adaptive statistical resource.

Comments 22 pages (main text + appendix), 2 figures

详情
AI中文摘要

量子机器学习的一个核心原则是,ansatz 应具有足够的表达能力来表示感兴趣的量子数据。然而,只有当能够从有限数量的未知量子态副本中学习时,表达能力才具有统计意义。在这项工作中,我们为有限大小量子电路生成的量子数据开发了一种信息论奥卡姆理论。对于最多使用 $G$ 个双量子比特门可制备的 $n$ 量子比特纯态类 $S_{n,G}$,度量熵论证给出了在电路受限情况下的可实现样本律 $\widetilde{\Theta}(G/\epsilon^2)$。对于任意源 $\hat{\rho}$,我们引入了最佳 $G$ 门近似误差 $d_G(\hat{\rho})$ 和近似电路复杂度 $C_\eta(\hat{\rho})$。我们证明了一个不可知的量子奥卡姆定理:使用 $M$ 个副本,可以学习到最佳 $G$ 门近似误差加上统计惩罚 $\widetilde{O}(\sqrt{G/M})$。然后,通过一个自适应模型选择定理消除了预先知道 $G$ 的需要,该定理的 oracle 不等式选择了数据所证明的电路复杂度。匹配的下界给出了一个样本支持的表达能力定律:在迹距离精度 $\epsilon$ 下,$M$ 个样本只能支持 $G_{\rm supported} \simeq M\epsilon^2$ 个门,直到对数因子和 $2^n$ 的层析饱和。因此,电路复杂度成为一种自适应统计资源,而不是静态承诺。我们的框架将有界电路复杂度转化为量子机器学习的模型选择原则。

英文摘要

A central principle in quantum machine learning is that an ansatz should be expressive enough to represent the quantum data of interest. Yet, the expressibility is statistically meaningful only insofar as it can be learned from finitely many copies of an unknown quantum state. In this work, we develop an information-theoretic Occam theory for quantum data generated by finite-size quantum circuits. For the class $S_{n,G}$ of $n$-qubit pure states preparable with at most $G$ two-qubit gates, a metric-entropy argument gives the realizable sample law $\widetilde{\Theta}(G/\epsilon^2)$ in the circuit-limited regime. For an arbitrary source $\hat{\rho}$, we introduce the best $G$-gate approximation error $d_G(\hat{\rho})$ and the approximate circuit complexity $C_\eta(\hat{\rho})$. We prove an agnostic quantum Occam theorem: with $M$ copies, one can learn up to the best $G$-gate approximation error plus a statistical penalty $\widetilde{O}(\sqrt{G/M})$. We then remove the need to know $G$ in advance through an adaptive model-selection theorem whose oracle inequality selects the circuit complexity justified by the data. Matching lower bounds yield a sample-supported expressibility law: at trace-distance accuracy $\epsilon$, $M$ samples can support only $G_{\rm supported} \simeq M\epsilon^2$ gates, up to logarithmic factors and tomography saturation at $2^n$. Thus, the circuit complexity becomes an adaptive statistical resource rather than a static promise. Our framework turns bounded circuit complexity into a model-selection principle for quantum machine learning.
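The expressibility law can be read as the sample law solved for the gate count. The lines below are a heuristic reading of the abstract only, ignoring logarithmic factors and the tomography saturation at $2^n$. They are not the paper's proof, and the numerical instance at the end is purely illustrative.

```latex
% Sample law for G-gate states at accuracy \epsilon, solved for G:
M \;\simeq\; \frac{G}{\epsilon^{2}}
\quad\Longleftrightarrow\quad
G_{\rm supported} \;\simeq\; M\,\epsilon^{2}.

% The same scaling follows from keeping the statistical penalty below \epsilon:
\sqrt{\frac{G}{M}} \;\lesssim\; \epsilon
\quad\Longleftrightarrow\quad
G \;\lesssim\; M\,\epsilon^{2}.

% Illustrative instance: M = 10^{6} copies at \epsilon = 0.1 give
% G_{\rm supported} \simeq 10^{6} \times 10^{-2} = 10^{4} gates.
```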

2606.12199 2026-06-11 eess.AS cs.CL cs.SD 新提交

Which Speech Representation Better Matches Text-Native Reasoning? A Study of Speech-Text Alignment on Frame Rate and Representation

哪种语音表示更匹配文本原生推理?帧率和表示对语音-文本对齐的研究

Zhen Ye, Xu Tan, Yiming Li, Guangyan Zhang, Chimin Chan, Haohe Liu, Zhengxi Liu, Hongzhan Lin, Zheqi Dai, Xinshen Zhang, Peiwen Sun, Qiuqiang Kong, Wei Xue

Affiliations * Hong Kong University of Science and Technology, Hong Kong SAR; Tencent, China; University of Surrey, United Kingdom; Chinese University of Hong Kong, Hong Kong SAR; Hong Kong Baptist University, Hong Kong SAR; Hong Kong Polytechnic University, Hong Kong SAR; Independent Researcher

AI summary Studies the temporal-granularity mismatch behind the speech-text modality gap, introduces factorized FSQ and a lightweight non-autoregressive audio LM head to make low frame rates feasible, and finds that a 4.17 Hz frame rate with intermediate-layer representation alignment works best for speech question answering.

Comments Accepted by Interspeech 2026 long paper

详情
AI中文摘要

口语对话模型通常以文本LLM骨干网络为基础,但在以语音而非文本为条件时,推理能力往往会下降。我们将这种模态差异部分归因于时间粒度不匹配:在语义匹配的情况下,语音标记在时间上是冗余的,且远长于文本,这稀释了每个标记的语义密度,削弱了文本原生的推理动态。我们将语音标记设计视为一个表示选择问题,并在固定信息速率下,在冻结的LLM骨干网络中扫描帧率。为了实现低帧率,我们引入了因子化FSQ和一个轻量级的非自回归音频LM头,在不牺牲高效预测的情况下将容量扩展到近300比特/帧。在消除瓶颈后,我们扫描帧率(50→2.08 Hz)和对齐深度,并观察到在4.17 Hz帧率下,结合中间层表示对齐,语音问答存在一致的最佳区域。

英文摘要

Spoken dialogue models typically start from text LLM backbones, yet reasoning often degrades when conditioning on speech instead of text. We attribute part of this modality gap to a temporal-granularity mismatch: speech tokens are temporally redundant and far longer than text under matched semantics, diluting per-token semantic density and weakening text-native reasoning dynamics. We study speech token design as a representation selection problem and sweep frame rates under a frozen LLM backbone with a fixed information rate. To make low frame rates feasible, we introduce factorized FSQ and a lightweight non-autoregressive audio LM head, scaling capacity to nearly 300 bits/frame without sacrificing efficient prediction. With the bottleneck removed, we sweep frame rates (from 50 Hz down to 2.08 Hz) and alignment depth, and observe a consistent best regime for speech QA at 4.17 Hz with intermediate-layer representation alignment.
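Sweeping frame rates at a fixed information rate means each frame must carry more bits as the rate drops, which is why per-frame capacity has to scale. The arithmetic below is a rough sketch. The 625 bits/s information rate is a hypothetical value, chosen only so that the lowest frame rate lands near 300 bits/frame, and is not a figure from the paper. The reported 4.17 Hz and 2.08 Hz are close to 50/12 and 50/24.

```python
def bits_per_frame(info_rate_bps, frame_hz):
    """Bits each frame must carry to sustain a fixed information rate."""
    return info_rate_bps / frame_hz

def frames_for(duration_s, frame_hz):
    """Approximate number of speech frames (tokens) for an utterance."""
    return duration_s * frame_hz

INFO_RATE_BPS = 625.0  # hypothetical fixed information rate, not from the paper

for divisor in (1, 12, 24):  # 50 Hz, about 4.17 Hz, about 2.08 Hz
    hz = 50.0 / divisor
    print(f"{hz:5.2f} Hz: {bits_per_frame(INFO_RATE_BPS, hz):6.1f} bits/frame, "
          f"{frames_for(10.0, hz):6.1f} frames per 10 s")
```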

2606.12075 2026-06-11 cs.CR cs.LG 新提交

Categorical Robustness Assessment for Machine Learning based Network Intrusion Detection Systems

基于机器学习的网络入侵检测系统的分类鲁棒性评估

Mayank Raj, Nathaniel D. Bastian, Lance Fiondella, Gokhan Kul

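Affiliations * University of Massachusetts Dartmouth; United States Military Academy

AI summary Systematically compares the robustness of three classifiers (CNN, LSTM, and Random Forest) under adversarial attack, finding that Random Forest has high baseline accuracy but is easily broken, while the CNN is the most robust.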

详情
AI中文摘要

网络入侵检测系统(NIDS)广泛使用机器学习(ML),但ML模型可能受到对抗性攻击的操纵。这些攻击向网络流量数据添加精心设计的扰动,导致误分类。虽然先前的工作已经证明了孤立环境下的对抗性漏洞,但在受控攻击条件下,跨架构以及基于攻击类别和类型的系统比较仍然有限,这使得从业者在对抗性环境中部署哪些模型缺乏明确指导。本文提出了一个简单的问题:当攻击者试图操纵系统时,哪种分类器架构实际上能够保持稳定?我们对三种流行架构进行了测试:一维卷积神经网络(CNN)、长短期记忆网络(LSTM)和随机森林(RF)集成。使用ACI-IoT-2023数据集(超过120万个样本,涵盖12种攻击类型),我们使用FGSM和PGD对抗攻击对每个模型进行攻击,这些攻击在归一化特征空间中应用基于梯度的扰动,符合既定的对抗性ML评估协议,扰动预算范围为$\epsilon=0.01$到$\epsilon=0.1$。令人惊讶的是,随机森林实现了近乎完美的基线准确率(99.98%),但在攻击下灾难性地崩溃,在我们测试的最小扰动下下降了73个百分点。另一方面,CNN在$\epsilon=0.01$时保持了95.5%的准确率,并且随着扰动的增加而优雅地退化。LSTM介于两者之间。这些发现颠覆了传统观念:如果模型在对抗压力的第一个迹象下就崩溃,那么高基线准确率毫无意义。对于在对抗性环境中部署入侵检测的从业者,我们推荐基于CNN的架构,并提供特定场景的部署指导。

英文摘要

Network Intrusion Detection Systems (NIDS) rely heavily on Machine Learning (ML), but ML models can be manipulated via adversarial attacks. These attacks add carefully crafted perturbations to network traffic data that lead to misclassifications. While prior work has demonstrated adversarial vulnerabilities in isolated settings, systematic comparisons across architectures, attack classes, and attack categories under controlled attack conditions remain limited, leaving practitioners without clear guidance on which models to deploy in adversarial environments. This paper asks a simple question: which classifier architectures actually hold up when attackers try to manipulate the system? We put three popular architectures through their paces: a 1D Convolutional Neural Network (CNN), a Long Short-Term Memory (LSTM) network, and a Random Forest (RF) ensemble. Using the ACI-IoT-2023 dataset (over 1.2 million samples spanning 12 attack types), we subject each model to FGSM and PGD adversarial attacks, which apply gradient-based perturbations in normalized feature space consistent with established adversarial ML evaluation protocols, at perturbation budgets ranging from $\epsilon=0.01$ to $\epsilon=0.1$. Surprisingly, Random Forest achieved near-perfect baseline accuracy (99.98%), yet collapsed catastrophically under attack, dropping 73 percentage points at the smallest perturbation we tested. CNN, on the other hand, retained 95.5% accuracy at $\epsilon=0.01$ and degraded gracefully as perturbations increased. LSTM fell somewhere in between. These findings challenge the conventional wisdom: high baseline accuracy means nothing if a model shatters at the first sign of adversarial pressure. For practitioners deploying intrusion detection in adversarial environments, we recommend CNN-based architectures and provide scenario-specific deployment guidance.
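For readers unfamiliar with the two attacks, the sketch below shows the standard FGSM and PGD update rules under an L-infinity budget on features normalized to [0, 1]. It is a toy illustration and not the paper's pipeline: the target is a hand-written logistic-regression model standing in for the CNN, LSTM, and RF classifiers, and the weights, input, and step size are hypothetical. How the paper obtains gradients for the non-differentiable Random Forest is not stated in the abstract and is not modeled here.

```python
import math

def predict(x, w, b):
    """Probability of class 1 under a logistic-regression stand-in model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def loss_grad_x(x, y, w, b):
    """Gradient of the binary cross-entropy loss with respect to the input."""
    p = predict(x, w, b)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def clip(v, lo, hi):
    return min(hi, max(lo, v))

def fgsm(x, y, w, b, eps):
    """One signed-gradient step of size eps, kept inside the [0, 1] range."""
    g = loss_grad_x(x, y, w, b)
    return [clip(xi + eps * sign(gi), 0.0, 1.0) for xi, gi in zip(x, g)]

def pgd(x, y, w, b, eps, iters=10):
    """Iterated small steps, projected back into the eps-ball around x."""
    step = eps / 4  # hypothetical step size
    x_adv = list(x)
    for _ in range(iters):
        g = loss_grad_x(x_adv, y, w, b)
        x_adv = [xi + step * sign(gi) for xi, gi in zip(x_adv, g)]
        x_adv = [clip(xi, x0 - eps, x0 + eps) for xi, x0 in zip(x_adv, x)]
        x_adv = [clip(xi, 0.0, 1.0) for xi in x_adv]
    return x_adv

# Hypothetical model and sample with true label 1.
w, b = [2.0, -1.0, 0.5], -0.5
x, y = [0.5, 0.5, 0.5], 1
print(predict(x, w, b), predict(fgsm(x, y, w, b, 0.1), w, b))
```

FGSM spends the whole budget in one step, while PGD takes several smaller steps and re-projects after each. For a linear model like this one the two end up at nearly the same point, but for deep models PGD is usually the stronger attack.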

2606.12073 2026-06-11 cs.SI cs.AI 新提交

"That's AI Slop, You Bot!" Studying Accusations, Evidence, and Credibility in Online Discourse Towards LLM-Generated Comments

“那就是AI垃圾,你这个机器人!”:研究针对LLM生成评论的指责、证据与可信度

Jason Miklian, John E. Katsos

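Affiliations * University of Oslo; American University of Sharjah

AI summary Analyzes 25 million Hacker News and Reddit comments from 2023-2026 and finds that the pejorative-label share of AI accusations rose more than tenfold. Prose features that statistically distinguish AI from human text do not predict which human comments get accused, so the accusations act as social gatekeeping of perceived authenticity and do not actually screen for AI.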

详情
AI中文摘要

生成式AI使得流畅的散文变得廉价易得,打破了“好文章意味着真思考”的旧承诺。读者如何回应?这能告诉我们关于反AI态度变化的什么信息?我们分析了来自Hacker News和Reddit(2023-2026年)的2500万条评论,结合了对7500个抽样AI使用指责的LLM判断、情感轨迹、300个确认AI使用指责的言语行为编码,以及被指责与未被指责的父评论的匹配对照测试。我们发现,两个平台上指责中贬义标签的份额增长了十倍以上,而2022年前的不真实性词汇(如shill、astroturf)的安慰剂词汇则没有。这一转变反映了一个快速增长的趋势:将任何可疑或看似不真实的散文标记为“AI垃圾”。AI垃圾框架现在占贬义提及的94%,主导评论的语气从嘲笑转向把关和结构性抗议。关键惊喜来自匹配对照测试,该测试发现,统计上区分AI与人类文本的散文特征并不能预测哪些人类文本会被指责为AI。新的指责作为感知真实性的社会把关,实际上并不筛查AI。这项研究扩展了信号理论,表明当底层检测问题无法在非专家层面解决时,即使不准确,社会使用的替代信号也会增长。它表明,AI对写作的影响从读者侧来看与生产(作者)侧不同。检测技术无法解决这种动态,因为指责的社会功能日益表现为社会把关和群体内信号传递,而非识别AI生成的写作。

英文摘要

Generative AI has made fluent prose cheap to produce, breaking the old promise to readers that good writing meant real thinking. How have readers responded, and what can this tell us about changing anti-AI attitudes? We analyzed 25 million comments from Hacker News and Reddit (2023-2026), combining LLM judgment on 7,500 sampled accusations of AI use, sentiment trajectories, speech-act coding of 300 confirmed accusations of AI use, and a matched-control test of accused versus non-accused parent comments. We found that the pejorative-label share of accusations rose more than tenfold on both platforms while a placebo vocabulary of pre-2022 inauthenticity terms (shill, astroturf) did not. This shift reflected a fast-growing trend of branding any suspicious or seemingly inauthentic prose as "AI slop". The slop frame now constitutes 94 percent of pejorative mentions, with the dominant comments shifting in tone from mockery toward gatekeeping and structural protest. The key surprise comes from a matched-control test which found that prose features that statistically distinguish AI from human text do not predict which human text gets accused as AI. The new accusations work as social gatekeeping of perceived authenticity without actually screening for AI. This research extends signaling theory by showing that substitute signals used socially can grow even when inaccurate if the underlying detection problem cannot be solved at the non-expert level. It shows that AI's effects on writing from the reader side are distinct from those on the production (writer) side. Detection technology cannot resolve this dynamic because the social function of accusations is increasingly to perform social gatekeeping and in-group signaling as opposed to identifying AI-generated writing.

2606.12071 2026-06-11 cs.DL cs.AI 新提交

On the Limits of LLM-as-Judge for Scientific Novelty Assessment

论LLM作为评审在科学新颖性评估中的局限性

Soumitra Sinhahajari, Navonil Majumder, Soujanya Poria

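Affiliations * DeCLaRe Lab, Nanyang Technological University

AI summary Using the RQ-Bench benchmark, this paper finds that LLM judges produce a novelty mirage for model-generated research questions while human experts reach the opposite conclusion, exposing reliability problems in LLM-based assessment of scientific novelty.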

详情
AI中文摘要

LLM越来越多地被用于生成和评判科学想法。这使得新颖性评估成为一个核心问题。完整想法的评估很困难,因为它通常需要判断方法、可行性及其经验前景。因此,我们研究一个更清晰的上游对象:研究问题(RQ)。RQ生成是科学构思的前提,并且RQ可以与真实论文中探讨的问题进行比较。我们引入了RQ-Bench,一个基于近期arXiv论文构建的基准。对于每篇论文,我们从其引用的背景、空白和贡献中重建作者锚定的RQ。这些RQ并非针对同一背景的唯一有效问题。它们是用于测试新颖性判断的作者锚定参考点。我们使用独立LLM评审、比较LLM评审和人类专家评估来评估模型生成的RQ。LLM评审一致地将模型生成的RQ评为高度新颖,产生新颖性幻觉;在比较评估中,这种偏好甚至更强。然而,领域专家得出相反结论,更偏好作者锚定的参考问题。我们进一步发现,许多生成的RQ狭窄或受限于来源,这是LLM评审通常忽略的维度,除非明确测试。总体而言,LLM评审与人类专家之间矛盾的新颖性评估引发了关于使用LLM评估研究问题科学新颖性可靠性的严重担忧。

英文摘要

LLMs are increasingly used to generate and judge scientific ideas. This makes novelty evaluation a central problem. Full idea evaluation is difficult because it often requires judging a method, its feasibility, and its empirical promise. We therefore study a cleaner upstream object: the research question (RQ). RQ generation is a prerequisite for scientific ideation, and RQs can be compared against questions pursued in real papers. We introduce RQ-Bench, a benchmark built from recent arXiv papers. For each paper, we reconstruct author-anchored RQs from its cited background, gaps, and contributions. These RQs are not the only valid questions for the same background. They are author-anchored reference points for testing novelty judgments. We evaluate model-generated RQs with standalone LLM judging, comparative LLM judging, and human expert evaluation. LLM judges consistently rate model-generated RQs as highly novel, producing a novelty mirage; in comparative evaluations, this preference becomes even stronger. Domain experts, however, reach the opposite conclusion and prefer the author-anchored reference questions. We further find that many generated RQs are narrow or source-bound, a dimension that LLM judges often miss unless explicitly tested. Overall, the contradictory novelty evaluations between LLM judges and human experts raise a serious concern about the reliability of using LLMs to assess the scientific novelty of research questions.

2606.12058 2026-06-11 stat.ML cond-mat.dis-nn cs.LG 新提交

Phase Transitions in Attention: A Bayesian Theory of Copy Head Emergence

注意力中的相变:复制头涌现的贝叶斯理论

Itay Lavie, Kirsten Fischer, Andrey Lekov, Frederic Van Maele, Zohar Ringel, Moritz Helias

Affiliations * Racah Institute of Physics, Hebrew University of Jerusalem; John A. Paulson School of Engineering and Applied Sciences, Harvard University; Institute for Advanced Simulation (IAS-6), Computational and Systems Neuroscience, Jülich Research Center; Institute of AI for Health, Helmholtz Munich; RWTH Aachen University; Department of Physics, Faculty 1, RWTH Aachen University

AI summary By analyzing a single-layer softmax attention network trained on a copy task, proposes a Bayesian theory showing a phase transition in the posterior over the attention matrix. A comparison with linear attention shows that softmax attention undergoes a first-order phase transition.

详情
AI中文摘要

注意力是Transformer中上下文学习的关键机制,经验上观察到注意力模式在训练过程中突然涌现。我们提出了注意力中特征学习的贝叶斯理论;然后通过分析在复制任务上训练的单层softmax注意力网络,专注于归纳头第一层中复制子电路的学习方式。我们推导出注意力矩阵上的闭式后验,并将其简化为低维序参数空间。这种简化揭示了训练数据量上的相变,我们通过贝叶斯采样和使用Adam的标准训练验证了这一点。我们将结果与线性注意力对比,发现softmax注意力表现出一阶相变,而在线性注意力中,初始的二阶相变之后是向结构化注意力模式的平滑连续演化(交叉)。我们的工作为复制子电路的突然涌现提供了第一性原理的理论解释,这让人联想到在大语言模型训练中观察到的现象。

英文摘要

Attention is the key mechanism underlying in-context learning in transformers, and attention patterns have been observed empirically to emerge abruptly during training. We present a Bayesian theory of feature learning in attention; we then focus on how the copy subcircuit in the first layer of an induction head is learned by analyzing a single-layer softmax attention network trained on a copy task. We derive a closed-form posterior over the attention matrix and reduce it to a low-dimensional order parameter space. This reduction reveals a phase transition in the amount of training data, which we verify using both Bayesian sampling and standard training with Adam. We contrast our results with linear attention and find that softmax attention exhibits a first-order phase transition while in linear attention an initial second-order phase transition is followed by a smooth, continuous evolution toward the structured attention pattern (crossover). Our work provides a first-principles theoretical account of the abrupt emergence of the copy subcircuit, reminiscent of the one observed in training large language models.

2606.12022 2026-06-11 cs.FL cs.AI 新提交

Runtime Enforcement of Hybrid System Properties

混合系统属性的运行时强制执行

Mir Md Sajid Sarwar, Srinivas Pinisetty, Rajarshi Ray, Thierry Jéron

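Affiliations * Indian Institute of Technology Bhubaneswar; Indian Association for the Cultivation of Science; Univ Rennes, Inria, CNRS, IRISA

AI summary Proposes a runtime enforcement framework that combines discrete-event editing with continuous-time monitoring, models safety requirements as hybrid automata, synthesizes safe corrective actions through runtime reachability analysis, and validates the approach on an adaptive cruise control system.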

详情
AI中文摘要

运行时强制执行已成为确保在不确定和动态环境中运行的自主和网络物理系统安全的一种有前景的方法。与传统的运行时验证不同,运行时强制执行通过在执行期间主动干预,修改不安全系统行为以防止属性违反。现有的强制执行框架主要关注无时间或离散时间规范,并且通常仅限于延迟或抑制事件,这使得它们对于表现出复杂连续动态的反应式系统不充分。在本文中,我们提出了一种运行时强制执行框架,其中安全需求使用混合自动机(HA)建模。该框架将离散事件编辑与连续时间监控相结合,以支持在任意时间点执行抑制、延迟和插入事件等强制执行操作。在观察环境输入后,自动机被初始化,并使用运行时可达性分析来综合安全纠正动作。我们正式定义了安全混合自动机的强制执行问题,建立了可强制执行条件,并提出了一种用于反应式系统的在线强制执行算法。关于自适应巡航控制(ACC)系统的详细案例研究证明了所提出方法在不安全控制器行为下维护安全属性的有效性。实验结果表明,该框架在实时确保持续符合安全要求的同时,引入了最小的计算开销。

英文摘要

Runtime enforcement has emerged as a promising approach for ensuring the safety of autonomous and cyber-physical systems operating in uncertain and dynamic environments. Unlike traditional runtime verification, runtime enforcement actively intervenes during execution to prevent property violations by modifying unsafe system behaviors. Existing enforcement frameworks primarily focus on untimed or discrete-time specifications and are often limited to delaying or suppressing events, making them inadequate for reactive systems exhibiting complex continuous dynamics. In this paper, we propose a runtime enforcement framework where safety requirements are modeled using Hybrid Automata (HA). The framework combines discrete-event editing with continuous-time monitoring to support enforcement actions such as suppression, delay, and insertion of events at arbitrary time instants. Upon observing environmental inputs, the automaton is initialized, and runtime reachability analysis is used to synthesize safe corrective actions. We formally define the enforcement problem for safety hybrid automata, establish enforceability conditions, and present an online enforcement algorithm for reactive systems. A detailed case study on an Adaptive Cruise Control (ACC) system demonstrates the effectiveness of the proposed approach in maintaining safety properties under unsafe controller behaviors. Experimental results show that the framework introduces minimal computational overhead while ensuring continuous compliance with safety requirements in real time.

2606.11976 2026-06-11 cs.SE cs.AI 新提交

Exploration Structure in LLM Agents for Multi-File Change Localization

LLM代理中的探索结构用于多文件变更定位

Akeela Darryl Fattha, Kia Ying Chua, Lingxiao Jiang, Laura Wynter

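Affiliations * School of Computing and Information Systems, Singapore Management University

AI summary For changes that span multiple subsystems, proposes a non-linear, domain-scoped parallel agentic exploration structure. On the SWE-Bench Pro benchmark, parallel spawning of domain agents with a small Haiku-class model achieves a high micro-F1 score and outperforms linear sequential exploration.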

详情
AI中文摘要

软件工程工具越来越依赖基于LLM的代理来定位需要更改的文件以解决软件问题。大多数AI代理以线性方式探索仓库,即每步访问一个目录或文件。我们假设这对于跨越多个子系统的变更存在结构上的不匹配。我们比较了线性顺序探索与非线性的、领域范围的并行代理探索。使用SWE Bench Pro作为初始基准,我们专注于ansible作为示例。我们构建了一种方法,用于在单个基础提交上对GitHub问题进行持久会话评估。我们将我们的非线性领域代理文件遍历系统与没有直接仓库访问权限的基础LLM、具有持久Python REPL的单代理递归语言模型(RLM)基线以及使用Codex 5.5 High的外部CLI基线进行比较。使用小型Haiku类模型的领域范围并行代理生成在Haiku类模型中实现了最高的微F1分数,且领先幅度较大。在我们自己的扩展基准(包括2025年和2026年更近期的PR)上,领域代理仅次于更大的Codex 5.5 High。在原始、精选的2020年SWE-bench Pro基准上,较大的Sonnet普通LLM基线通过预测少量文件获得了更高的微F1分数,从而实现了更高的精确度,但所有黄金召回率显著较低。我们还提出了三个额外发现。首先,文档演化是所有方法都未解决的潜在依赖关系。其次,天真的文件系统访问可能会因测试文件过度预测而降低定位性能。最后,强制多代理协商没有明显帮助,并且会大幅增加令牌成本。

英文摘要

Software engineering tools increasingly rely on LLM-based agents to localize the files that must change to resolve a software issue. Most AI agents explore repositories linearly, that is, visiting one directory or file per step. We postulate that this is a structural mismatch for changes that span several subsystems. We compare linear sequential exploration against non-linear, domain-scoped parallel agentic exploration. Using SWE-Bench Pro as an initial benchmark, we focus on ansible as an exemplar. We construct an approach for persistent-session evaluation of GitHub issues anchored at a single base commit. We compare our non-linear domain-agent file traversal system against a base LLM without direct repository access, a single-agent Recursive Language Model (RLM) baseline with a persistent Python REPL, and an external CLI baseline using Codex 5.5 High. Domain-scoped parallel agent spawning with a small Haiku-class model achieves the highest micro-F1 among Haiku-class models by a large margin. On our own expanded benchmark, which includes more recent PRs from 2025 and 2026, the domain-agent system ranks second, behind only the much larger Codex 5.5 High. On the original, curated, 2020 SWE-Bench Pro benchmark, a larger Sonnet plain LLM baseline attains higher micro-F1 by predicting few files, which yields higher precision but significantly lower all-gold recall. We also present three additional findings. First, documentation evolution is a latent dependency unresolved by any approach. Second, naive file system access can degrade localization through test-file over-prediction. Lastly, forced multi-agent consultation does not measurably help and raises token cost substantially.
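The trade-off noted in this abstract, where predicting few files raises precision and micro-F1 but lowers all-gold recall, can be made concrete with a small metric sketch. Micro-F1 below is the standard pooled definition over file sets. "All-gold recall" is read here as the share of issues for which every gold file was predicted, which is an assumption about the paper's metric. The file names and predictions are invented for illustration.

```python
def micro_prf(preds, golds):
    """Micro-averaged precision, recall, and F1 over per-issue file sets."""
    tp = fp = fn = 0
    for pred, gold in zip(preds, golds):
        pred, gold = set(pred), set(gold)
        tp += len(pred & gold)
        fp += len(pred - gold)
        fn += len(gold - pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def all_gold_recall(preds, golds):
    """Share of issues whose gold files are all predicted (assumed reading)."""
    hits = sum(set(gold) <= set(pred) for pred, gold in zip(preds, golds))
    return hits / len(golds)

# Invented example: two issues with three and one gold files.
golds = [{"a.py", "b.py", "c.py"}, {"d.py"}]
few = [{"a.py"}, {"d.py"}]  # conservative predictor
many = [{"a.py", "b.py", "c.py", "x.py", "y.py", "u.py", "v.py"},
        {"d.py", "z.py", "w.py"}]  # broad predictor

for name, preds in (("few", few), ("many", many)):
    p, r, f1 = micro_prf(preds, golds)
    print(name, round(p, 3), round(r, 3), round(f1, 3),
          all_gold_recall(preds, golds))
```

In this toy case the conservative predictor wins on micro-F1 (about 0.667 against about 0.571) while fully covering only one of the two issues, and the broad predictor covers both.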

2606.11946 2026-06-11 cs.DB cs.CC cs.LG cs.LO 新提交

Neuro-Relational Programs: Unifying Queries and Neural Computation over Structured Data

神经关系程序:统一结构化数据上的查询与神经计算

Arie Soeteman, Balder ten Cate, Maurice Funk, Benny Kimelfeld, Carsten Lutz, Moritz Schönherr

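Affiliations * ILLC, University of Amsterdam; Leipzig University; ScaDS.AI Center; Technion; RelationalAI

AI summary Proposes Neuro-Relational Programs (NRPs), a declarative query language that extends Datalog rules with embedding operations, fusing relational reasoning with learnable neural components for general neural computation over relational data.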

Comments 37 pages

详情
AI中文摘要

在关系数据库上进行深度学习的传统方法是将图神经网络(GNN)等神经模型应用于数据库的图表示。最近的方法则直接操作数据库,将元组与嵌入关联,并扩展查询机制以联合处理嵌入和关系内容。受这些发展的启发,我们引入了神经关系程序(NRP),这是一种针对关系数据库的声明式查询语言,其事实携带数值向量嵌入。NRP扩展了Datalog风格的规则,增加了组合、聚合和转换嵌入的操作,从而在单一形式主义中交错关系推理和可学习神经组件。这产生了一种对关系数据进行神经计算的通用方法:NRP既可以看作带有可训练组件的查询计划,也可以看作内置关系结构的神经架构。NRP的自然语法片段恢复了现有架构和查询形式主义。零元NRP对应于非自适应查询算法;一元NRP推广了GNN风格的消息传递,并精确捕捉了深度同态网络,我们将这一联系扩展到带有行ID的数据库上的前沿保护NRP。我们通过FOCQ(一阶逻辑在实权重结构上的计数扩展)刻画了带有ReLU-FFN变换的无限制NRP的表达能力,从而建立了与有序数据库上的均匀TC$^0$的精确联系。这些结果共同确立了NRP作为关系数据上查询和神经计算的广泛声明式框架。

英文摘要

The conventional approach to deep learning over relational databases applies neural models, such as Graph Neural Networks (GNNs), to a graph representation of the database. Recent approaches instead operate on databases directly, associating tuples with embeddings and extending query mechanisms to jointly process embeddings and relational content. Inspired by these developments, we introduce Neuro-Relational Programs (NRPs), a declarative query language for relational databases whose facts carry numeric vector embeddings. NRPs extend Datalog-style rules with operations that combine, aggregate, and transform embeddings, thereby interleaving relational reasoning and learnable neural components within a single formalism. This yields a general approach to neural computation over relational data: an NRP can be read both as a query plan with trainable components and as a neural architecture with relational structure built in. Natural syntactic fragments of NRPs recover existing architectures and query formalisms. Zero-ary NRPs correspond to non-adaptive query algorithms; monadic NRPs generalize GNN-style message passing and precisely capture Deep Homomorphism Networks, a connection that we extend to frontier-guarded NRPs over databases with row-ids. We characterize the expressive power of unrestricted NRPs with ReLU-FFN transformations by FOCQ, an extension of first-order logic with counting interpreted over real-weighted structures, yielding a precise connection with uniform TC$^0$ over ordered databases. Together, these results establish NRPs as a broad declarative framework for querying and neural computation over relational data.
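To give a feel for how a monadic rule over facts with embeddings can generalize GNN-style message passing, the sketch below applies one such rule to a tiny database. It is not NRP syntax or semantics from the paper: the informal rule in the docstring, the sum aggregation, the ReLU-of-linear transformation, and every name are assumptions made for illustration.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def vec_add(a, b):
    return [x + y for x, y in zip(a, b)]

def apply_rule(node_emb, edges, W_self, W_nbr):
    """One application of an informal rule of the form
         H'(x) <- H(x), sum over y with Edge(x, y) of H(y)
    where facts H(x) carry vector embeddings and Edge is an ordinary relation."""
    dim = len(next(iter(node_emb.values())))
    out = {}
    for x, hx in node_emb.items():
        agg = [0.0] * dim
        for src, dst in edges:  # relational join on Edge(x, y)
            if src == x:
                agg = vec_add(agg, node_emb[dst])  # aggregate neighbor embeddings
        # Combine and transform; the weights would be learnable in practice.
        out[x] = relu(vec_add(matvec(W_self, hx), matvec(W_nbr, agg)))
    return out

# Tiny hypothetical database: unary facts with embeddings and a binary relation.
H = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
Edge = {("a", "b"), ("a", "c"), ("b", "c")}
W_self = [[1.0, 0.0], [0.0, 1.0]]
W_nbr = [[1.0, 0.0], [0.0, -1.0]]

print(apply_rule(H, Edge, W_self, W_nbr))
```

Read one way this is a query plan (join on `Edge`, then aggregate), and read the other way it is one message-passing layer, which is the dual reading the abstract describes.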

2606.11916 2026-06-11 cs.SE cs.AI New submission

Characterizing Software Aging in GPU-Based LLM Serving Systems

Characterization of Software Aging in GPU-Based Large Language Model Serving Systems

Domenico Cotroneo, Bojan Cukic

Affiliations * College of Computing and Informatics, University of North Carolina at Charlotte

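AI summary Proposes an empirical methodology for studying software aging in GPU-based LLM serving systems. A 216-hour experiment finds significant memory aging in all deployments, with leak rates strongly tied to the runtime and configuration, and the work provides a reproducible framework.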

Comments 7 pages

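Details
AI Chinese abstract (translated)

This paper proposes an empirical methodology for studying software aging in GPU-based large language model (LLM) serving systems. Traditional aging studies focus on CPU-centric software with relatively regular workloads. LLM serving differs: it spans a Python host and a CUDA device, handles requests whose cost varies by several orders of magnitude, and depends on a rapidly evolving software stack. We run a 216-hour experiment on six co-located deployments under identical stress conditions, monitor host, device, and client metrics in parallel, and apply a statistical pipeline that accounts for autocorrelation and multiple comparisons. The results show statistically significant memory aging in all deployments, with leak rates that depend strongly on the serving runtime and the deployment configuration. Beyond these findings, we provide a reproducible framework that opens a research direction at the intersection of the software aging and rejuvenation field and the LLM serving community.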

English abstract

This paper proposes an empirical methodology to study software aging in GPU-based LLM serving systems. Traditional aging studies focus on CPU-centric software with relatively regular workloads; LLM serving is different, spanning a Python host and a CUDA device, handling requests whose cost varies by orders of magnitude, and relying on rapidly evolving software stacks. We run a 216-hour campaign across six co-located deployments under identical stress conditions, monitor host, device, and client metrics in parallel, and apply a statistical pipeline that accounts for autocorrelation and multiple testing. Our results reveal statistically significant memory aging in all deployments, with leak rates strongly dependent on the serving runtime and deployment configuration. Beyond these findings, we provide a reproducible framework that opens a research direction at the intersection of the software aging and rejuvenation and LLM serving communities.
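
The abstract does not name the tests in its statistical pipeline. As a hedged sketch of what one stage of such a pipeline might look like, the code below estimates a leak rate with a Theil-Sen slope (median of pairwise slopes) and applies a Holm-Bonferroni correction across several tests. Both technique choices, the function names, and all numbers are assumptions for illustration, and the sketch does not handle autocorrelation.

```python
from itertools import combinations
from statistics import median

def theil_sen_slope(times, values):
    """Robust trend estimate: the median of all pairwise slopes."""
    slopes = [(v2 - v1) / (t2 - t1)
              for (t1, v1), (t2, v2) in combinations(zip(times, values), 2)
              if t2 != t1]
    return median(slopes)

def holm_reject(p_values, alpha=0.05):
    """Holm-Bonferroni step-down procedure; returns one decision per test."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject

# Made-up toy series: resident memory (MB) sampled once per hour.
hours = [0, 1, 2, 3, 4]
rss_mb = [100.0, 102.0, 104.5, 106.0, 108.0]
slope = theil_sen_slope(hours, rss_mb)  # MB per hour

# Made-up p-values, e.g. one trend test per monitored metric.
decisions = holm_reject([0.001, 0.04, 0.03, 0.2])
```

A median-of-slopes estimator is a common choice for noisy memory traces because a few spikes do not dominate the trend, while the step-down correction keeps the family-wise error rate at `alpha` when many metrics are tested at once.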

2606.11914 2026-06-11 eess.SP cs.LG New submission

NARRAS: Edge-Triggered Distributed Inference for CSI-Based Localization in Vehicular IoT Networks

NARRAS: Edge-Triggered Distributed Inference for CSI-Based Localization in the Vehicular Internet of Things

Rodrigo Oliver, Ricardo Vazquez Alvarez, Alejandro Lancho, Stefano Rini

Affiliations * Signal Theory and Communications Department, Universidad Carlos III de Madrid; Gregorio Marañón Health Research Institute; Department of Electrical and Computer Engineering, National Yang-Ming Chiao-Tung University; German Aerospace Center (DLR)

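AI summary To address resource constraints in CSI-based localization with distributed antenna arrays, proposes NARRAS, an edge-triggered distributed inference policy in which each array decides locally whether to report its observation. It achieves budget control through differentiable activity penalties and channel-chart regularization, and improves localization accuracy at low activity rates.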

Comments 10 pages, 5 figures, 5 tables. Under review at the IEEE Internet of Things Journal

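Details
AI Chinese abstract (translated)

CSI-based localization with spatially distributed antenna arrays involves a basic resource trade-off. Each array can provide a rich view of the channel, but forwarding the observations of all arrays to a fusion center is wasteful when only a few arrays carry useful information, and the shared uplink supports only a limited number of simultaneous transmissions. We let each array decide locally whether its current observation is worth reporting, subject to a budget on the average number of active transmitters. We call this abstraction Edge-Triggered Distributed Inference (ETDI). It captures a broader class of task-oriented communication problems in which resource-constrained devices share an access channel to complete a common inference task. We instantiate ETDI for CSI-based localization, a common scenario in the vehicular Internet of Things. Spatially distributed remote antenna arrays (RAAs) encode local channel state information (CSI) from user equipment (UE) transmissions into latent features, and the fusion center estimates the UE position from the subset of features that are reported. We propose NARRAS, a decentralized reporting policy in which each RAA combines a recurrent summary of its recent observations with a memory of the last latent it transmitted. Training controls an explicit activity budget through differentiable activity penalties and validation-calibrated deterministic thresholds, and uses channel-chart regularization to shape the latent geometry. Experiments show that, at comparable uplink activity, NARRAS improves localization accuracy over learned and heuristic sparse-reporting strategies, while dense full-report models remain useful budget-free references. At low activity rates, chart regularization further reduces high-percentile localization errors, indicating that geometry-aware latent representations are more robust under sparse reporting.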

English abstract

CSI-based localization with spatially distributed antenna arrays exposes a basic resource trade-off. Each array can provide a rich view of the channel, but forwarding observations from all arrays to a fusion center is wasteful when only a few carry useful information, and the shared uplink supports only a limited number of simultaneous transmissions. We let each array decide locally whether its current observation is worth reporting, subject to a budget on the average number of active transmitters. We refer to this abstraction as Edge-Triggered Distributed Inference (ETDI). It captures a broader class of task-oriented communication problems where resource-constrained devices share an access channel for a common inference task. We instantiate ETDI for CSI-based localization, a common scenario in vehicular IoT networks. Spatially distributed remote antenna arrays (RAAs) encode local channel state information (CSI) from user equipment (UE) transmissions into latent features, and the fusion center estimates the UE position from the subset of reported features. We propose NARRAS, a decentralized reporting policy in which each RAA combines a recurrent summary of its recent observations with a memory of the last latent it transmitted. Training controls an explicit activity budget through differentiable activity penalties and validation-calibrated deterministic thresholds, and uses channel-chart regularization to shape the latent geometry. Experiments show that, at comparable uplink activity, NARRAS improves localization accuracy over learned and heuristic sparse-reporting strategies, while dense full-report models remain useful budget-free references. In low-activity regimes, chart regularization further reduces high-percentile localization errors, suggesting that geometry-aware latent representations are more robust under sparse reporting.
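
The phrase "validation-calibrated deterministic thresholds" can be pictured with a small sketch: given per-array reporting scores on a validation set, pick a single threshold so that the average number of reporting arrays per time step stays within the budget. This is only one plausible reading, not the NARRAS procedure; the function names `calibrate_threshold` and `active_arrays`, the single shared threshold, and the scores are assumptions.

```python
def calibrate_threshold(val_scores, budget):
    """Pick tau so that 'report if score > tau' meets the activity budget.

    val_scores: list of time steps, each a list with one score per array.
    budget: allowed average number of active arrays per time step.
    """
    flat = sorted((s for step in val_scores for s in step), reverse=True)
    max_reports = int(budget * len(val_scores))  # total reports allowed
    if max_reports >= len(flat):
        return float("-inf")  # budget is loose: every array may report
    # At most max_reports scores are strictly greater than this value.
    return flat[max_reports]

def active_arrays(scores, tau):
    """Indices of arrays that would transmit at this time step."""
    return [i for i, s in enumerate(scores) if s > tau]

# Made-up validation scores: 4 time steps, 3 arrays.
val_scores = [
    [0.90, 0.10, 0.40],
    [0.20, 0.80, 0.30],
    [0.70, 0.60, 0.05],
    [0.15, 0.25, 0.95],
]
budget = 1.0
tau = calibrate_threshold(val_scores, budget)
avg_active = sum(len(active_arrays(s, tau)) for s in val_scores) / len(val_scores)
```

Because the comparison is strict, ties at the threshold never push the validation activity above the budget. At deployment each array only needs `tau` and its own score, so the decision stays local.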