arXivDaily: arXiv daily academic digest, updated Monday through Friday
2601.22400 2026-05-11 quant-ph cs.AI

Spectral Filtering for Complex Linear Dynamical Systems

Elad Hazan, Annie Marsden

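Affiliations: Princeton University; Google; DeepMind

AI summary: This paper studies the problem of learning complex-valued linear dynamical systems (CLDS) with sector-bounded spectrum, a class that arises in signal processing, structured state space models, and quantum systems. The authors propose a spectral filtering method based on the Slepian basis and show that learnability is governed by an effective dimension that is independent of the state-space dimension. The method yields dimension-free regret bounds for sequence prediction in CLDS, giving theoretical guarantees for efficient learning of such systems.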

English abstract

We study the problem of learning complex-valued linear dynamical systems (CLDS) with sector-bounded spectrum. This class captures oscillatory and long-memory dynamics arising in signal processing, structured state space models, and quantum systems. We introduce a spectral filtering method based on the Slepian basis and show that learnability is governed by an effective dimension independent of the ambient state dimension. As a consequence, we obtain dimension-free regret bounds for sequence prediction in CLDS with spectrum contained in a sector of the unit disk.

2601.21951 2026-05-11 stat.ML cs.LG stat.CO

Diffusion Path Samplers via Sequential Monte Carlo

James Matthew Young, Paula Cordero-Encinar, Sebastian Reich, Andrew Duncan, O. Deniz Akyildiz

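Affiliations: Department of Mathematics, Imperial College London, UK

AI summary: This paper proposes diffusion-path-based samplers for target distributions known only up to a normalising constant. It builds a diffusion path from a simple base distribution to the target and combines it with sequential Monte Carlo to efficiently estimate the scores and densities of the time-varying distributions. To reduce the variance of the score estimates, the authors also design practical control variate schedules, and they apply the framework to several diffusion-path models. Theoretical analysis and experiments both support the method's effectiveness.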

English abstract

We develop diffusion-based samplers for target distributions known up to a normalising constant. To this end, we rely on the well-known diffusion path that smoothly interpolates between a simple base distribution and the target, popularised by diffusion models. We tackle the score estimation problem by developing an efficient sequential Monte Carlo sampler that evolves auxiliary variables from conditional distributions along the path, providing principled score and density estimates for time-varying distributions. To control the variance of score estimates, we further propose practical control variate schedules that incur minimal overhead. We adapt this general framework to paths induced by the Ornstein-Uhlenbeck (OU) time-reversal process, stochastic interpolants, and diffusion annealed Langevin dynamics, outlining their trade-offs. Finally, we provide theoretical guarantees and empirically demonstrate the effectiveness of our method on several synthetic and real-world datasets.

2512.19408 2026-05-11 math.NA cs.CE cs.NA cs.RO cs.SY eess.SY math.DS

Mixed formulation and structure-preserving discretization of Cosserat rod dynamics in a port-Hamiltonian framework

Philipp L. Kinon, Simon R. Eugster, Peter Betsch

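Affiliations: Karlsruhe Institute of Technology (KIT); Eindhoven University of Technology (TU/e)

AI summary: This paper proposes an energy-based modeling framework for the nonlinear dynamics of spatial Cosserat rods undergoing large displacements and rotations. The mixed formulation treats displacement, velocity, and stress variables independently, and a director formulation describes finite rotations, which avoids singularities and keeps the mass matrix constant. The result is an infinite-dimensional port-Hamiltonian system with a quadratic energy functional. A structure-preserving finite element discretization yields a finite-dimensional system that retains the port-Hamiltonian structure, which facilitates the design of energy-momentum consistent integration schemes and naturally accommodates dissipative material behavior and non-standard actuation. The framework offers a new approach to energy-momentum consistent formulations in computational mechanics involving finite rotations.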

Comments 39 pages, 16 figures

English abstract

An energy-based modeling framework for the nonlinear dynamics of spatial Cosserat rods undergoing large displacements and rotations is proposed. The mixed formulation features independent displacement, velocity and stress variables and is further objective and locking-free. Finite rotations are represented using a director formulation that avoids singularities and yields a constant mass matrix. This results in an infinite-dimensional nonlinear port-Hamiltonian (PH) system governed by partial differential-algebraic equations with a quadratic energy functional. Using a time-differentiated compliance form of the stress-strain relations allows for the imposition of kinematic constraints, such as inextensibility or shear-rigidity. A structure-preserving finite element discretization leads to a finite-dimensional system with PH structure, thus facilitating the design of an energy-momentum consistent integration scheme. Dissipative material behavior (via the generalized-Maxwell model) and non-standard actuation approaches (via pneumatic chambers or tendons) integrate naturally into the framework. As illustrated by selected numerical examples, the present framework establishes a new approach to energy-momentum consistent formulations in computational mechanics involving finite rotations.

2512.14018 2026-05-11 cs.SE cs.AI

PerfCoder: Large Language Models for Interpretable Code Performance Optimization

Jiuding Yang, Shengyao Lu, Hongxuan Liu, Shayan Shirahmad Gale Bagi, Zahra Fazel, Tomasz Czajkowski, Di Niu

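Affiliations: University of Alberta; University of Victoria; Huawei Technologies Ltd.

AI summary: PerfCoder is a large language model designed to generate high-performance code, addressing the limited ability of current models to optimize code performance. It is fine-tuned on real-world optimization trajectories with human-readable annotations and then aligned by reinforcement fine-tuning on runtime measurements, so that it proposes and directly applies targeted, interpretable performance improvements. Experiments show that PerfCoder significantly outperforms existing models on the PIE code performance benchmark, and that its interpretable feedback can improve the results of larger models on code optimization tasks.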

English abstract

Large language models (LLMs) have achieved remarkable progress in automatic code generation, yet their ability to produce high-performance code, a critical requirement in real-world software systems, remains limited. We argue that current LLMs struggle not only due to data scarcity but, more importantly, because they lack supervision that guides interpretable and effective performance improvements. In this work, we introduce PerfCoder, a family of LLMs specifically designed to generate performance-enhanced code from source code via interpretable, customized optimizations. PerfCoder is fine-tuned on a curated collection of real-world optimization trajectories with human-readable annotations, and preference-aligned by reinforcement fine-tuning using runtime measurements, enabling it to propose input-specific improvement strategies and apply them directly without relying on iterative refinement. On the PIE code performance benchmark, PerfCoder surpasses all existing models in both runtime speedup and effective optimization rate, demonstrating that performance optimization cannot be achieved by scale alone but requires optimization strategy awareness. In addition, PerfCoder can generate interpretable feedback about the source code, which, when provided as input to a larger LLM in a planner-and-optimizer cooperative workflow, can further improve outcomes. Specifically, we elevate the performance of 32B models and GPT-5 to new levels on code optimization, substantially surpassing their original performance.

2512.05967 2026-05-11 cs.IR cs.AI cs.CL cs.LG

Enhancing Retrieval-Augmented Generation with Entity Linking for Educational Platforms

Francesco Granata, Francesco Poggi, Misael Mongiovì

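Affiliations: Department of Mathematics and Computer Science, University of Catania, Italy; Institute of Cognitive Sciences and Technologies (ISTC), National Research Council of Italy (CNR)

AI summary: In the era of large language models, retrieval-augmented generation (RAG) architectures have drawn attention for their ability to ground generated text in reliable knowledge sources, but in specialized domains RAG systems that rely only on semantic similarity often lose retrieval accuracy because of terminological ambiguity. This paper proposes ELERAG, an enhanced RAG architecture that incorporates entity linking to improve the factual accuracy of educational question-answering systems, specifically in Italian. With a Wikidata-based entity linking module and a hybrid re-ranking strategy, ELERAG significantly outperforms conventional methods on the domain-specific dataset, which supports the use of domain-adapted hybrid strategies to improve factual precision in educational RAG systems.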

Journal ref Big Data and Cognitive Computing, 10(4), 120. 2026

English abstract

In the era of Large Language Models (LLMs), Retrieval-Augmented Generation (RAG) architectures are gaining significant attention for their ability to ground language generation in reliable knowledge sources. Despite their effectiveness, RAG systems based solely on semantic similarity often fail to ensure factual accuracy in specialized domains, where terminological ambiguity can affect retrieval relevance. This study proposes ELERAG, an enhanced RAG architecture that integrates a factual signal derived from Entity Linking to improve the accuracy of educational question-answering systems in Italian. The system includes a Wikidata-based Entity Linking module and implements a hybrid re-ranking strategy based on Reciprocal Rank Fusion (RRF). To validate our approach, we compared it against standard baselines and state-of-the-art methods, including a Weighted-Score Re-ranking, a standalone Cross-Encoder and a combined RRF+Cross-Encoder pipeline. Experiments were conducted on two benchmarks: a custom academic dataset and the standard SQuAD-it dataset. Results show that, in domain-specific contexts, ELERAG significantly outperforms both the baseline and the Cross-Encoder configurations. Conversely, the Cross-Encoder approaches achieve the best results on the general-domain dataset. These findings provide strong experimental evidence of the domain mismatch effect, highlighting the importance of domain-adapted hybrid strategies to enhance factual precision in educational RAG systems without relying on computationally expensive models trained on disparate data distributions. They also demonstrate the potential of entity-aware RAG systems in educational environments, fostering adaptive and reliable AI-based tutoring tools.
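The abstract names Reciprocal Rank Fusion (RRF) as the basis of the hybrid re-ranking step. As a rough illustration of RRF in general (not the ELERAG implementation), the sketch below fuses two ranked lists by summing `1 / (k + rank)` per document. The function name, the document ids, and the constant `k = 60` (a commonly used default) are assumptions made for this example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    Each document scores sum(1 / (k + rank)) over the lists that contain it,
    with ranks starting at 1. k=60 is a common default, not a value from the paper.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical inputs: one list from dense retrieval, one from entity overlap.
semantic = ["d1", "d2", "d3"]
entity = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([semantic, entity])
```

Because RRF uses only ranks, it needs no score normalization across the two signals, which is one reason it is a popular choice for combining heterogeneous retrievers.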

2510.22944 2026-05-11 cs.CR cs.AI

Is Your Prompt Poisoning Code? Defect Induction Rates and Security Mitigation Strategies

Bin Wang, YiLu Zhong, MiDi Wan, WenJie Yu, YuanBing Ouyang, Yenan Huang, Hui Li

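AI summary: This paper studies how benign but poorly formulated prompts affect the security of code generated by large language models. It proposes a prompt-quality evaluation framework covering goal clarity, information completeness, and logical consistency, and builds the CWE-BENCH-PYTHON benchmark dataset. Experiments show that the lower a prompt's normativity, the less secure the generated code, while advanced prompting techniques such as Chain-of-Thought and Self-Correction substantially improve code security. The study highlights improving user prompt quality as a key strategy for strengthening the security of AI-generated code.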

Comments Accepted for publication in Empirical Software Engineering (EMSE) Journal

English abstract

Large language models (LLMs) have become indispensable for automated code generation, yet the quality and security of their outputs remain a critical concern. Existing studies predominantly concentrate on adversarial attacks or inherent flaws within the models. However, a more prevalent yet underexplored issue concerns how the quality of a benign but poorly formulated prompt affects the security of the generated code. To investigate this, we first propose an evaluation framework for prompt quality encompassing three key dimensions: goal clarity, information completeness, and logical consistency. Based on this framework, we construct and publicly release CWE-BENCH-PYTHON, a large-scale benchmark dataset containing tasks with prompts categorized into four distinct levels of normativity (L0-L3). Extensive experiments on multiple state-of-the-art LLMs reveal a clear correlation: as prompt normativity decreases, the likelihood of generating insecure code consistently and markedly increases. Furthermore, we demonstrate that advanced prompting techniques, such as Chain-of-Thought and Self-Correction, effectively mitigate the security risks introduced by low-quality prompts, substantially improving code safety. Our findings highlight that enhancing the quality of user prompts constitutes a critical and effective strategy for strengthening the security of AI-generated code.

2510.00322 2026-05-11 cs.CR cs.CC cs.DS cs.LG

Privately Estimating Black-Box Statistics

Günter F. Steinke, Thomas Steinke

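Affiliations: University of Canterbury; Google DeepMind

AI summary: This paper presents a scheme for differentially private estimation of black-box functions that trades off statistical efficiency against oracle efficiency, and shows that the scheme is near-optimal.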

English abstract

Standard techniques for differentially private estimation, such as Laplace or Gaussian noise addition, require guaranteed bounds on the sensitivity of the estimator in question. But such sensitivity bounds are often large or simply unknown. Thus we seek differentially private methods that can be applied to arbitrary black-box functions. A handful of such techniques exist, but all are either inefficient in their use of data or require evaluating the function on exponentially many inputs. In this work we present a scheme that trades off between statistical efficiency (i.e., how much data is needed) and oracle efficiency (i.e., the number of evaluations). We also present lower bounds showing the near-optimality of our scheme.
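To make the "standard technique" in the first sentence concrete, here is a minimal sketch of Laplace noise addition for a clipped mean. It illustrates the baseline that requires a known sensitivity bound, not the paper's black-box scheme. The function names and the replace-one-record neighbouring relation are assumptions of this example.

```python
import random


def laplace_mechanism(value, sensitivity, epsilon, rng=random):
    """Return value + Laplace(0, sensitivity / epsilon) noise.

    This is epsilon-differentially private only if `sensitivity` really bounds
    how much `value` can change when a single record changes.
    """
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return value + noise


def private_mean(data, lower, upper, epsilon, rng=random):
    """Clip each value to [lower, upper] so the mean has a known sensitivity."""
    clipped = [min(max(x, lower), upper) for x in data]
    # Replacing one record moves the clipped mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / len(clipped)
    return laplace_mechanism(sum(clipped) / len(clipped), sensitivity, epsilon, rng)
```

The clipping step is exactly where the sensitivity bound comes from. For an arbitrary black-box estimator no such bound is available, which is the gap the abstract describes.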

2509.08350 2026-05-11 physics.soc-ph cs.LG math.AT

Chordless cycle filtrations for dimensionality detection in complex networks via topological data analysis

Aina Ferrà Marcús, Robert Jankowski, Meritxell Vila Miñana, Carles Casacuberta, M. Ángeles Serrano

Affiliations: Universitat de Barcelona Institute of Complex Systems (UBICS), Universitat de Barcelona, Barcelona, Spain; Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, 2628 CD, Delft, Netherlands; Center for Complex Networks and Systems Research, Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, USA

AI summary: This paper proposes a topological data analysis weighting scheme based on chordless cycles for data-driven estimation of network dimensionality. By combining algebraic topology and machine learning, it offers a robust method for uncovering the hidden geometry of complex networks.

English abstract

Many complex networks, ranging from social to biological systems, exhibit structural patterns consistent with an underlying hyperbolic geometry. Revealing the dimensionality of this latent space can disentangle the structural complexity of communities, impact efficient network navigation, and fundamentally shape connectivity and system behavior. We introduce a topological data analysis weighting scheme for graphs based on chordless cycles to estimate network dimensionality in a data-driven way. We further show that the resulting descriptors can effectively estimate network dimensionality using a neural network architecture trained on a synthetic graph database constructed for this purpose, which requires no retraining to transfer effectively to real-world networks. Thus, by combining cycle-aware filtrations, algebraic topology, and machine learning, our approach provides a robust and effective method for uncovering the hidden geometry of complex networks and guiding accurate modeling and low-dimensional embedding.

2508.10880 2026-05-11 cs.CR cs.AI cs.CL

Searching for Privacy Risks in LLM Agents via Simulation

Yanzhe Zhang, Diyi Yang

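Affiliations: Georgia Tech; Stanford University

AI summary: This paper uses simulation-based search to improve attack and defense strategies. It finds that attacks escalate from direct requests to sophisticated tactics such as impersonation and consent forgery, while defenses evolve from rule-based constraints to identity-verification state machines, offering insights for developing privacy-aware agents.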

Comments ICLR 2026

English abstract

The widespread deployment of LLM-based agents is likely to introduce a critical privacy threat: malicious agents that proactively engage others in multi-turn interactions to extract sensitive information. However, the evolving nature of such dynamic dialogues makes it challenging to anticipate emerging vulnerabilities and design effective defenses. To tackle this problem, we present a search-based framework that alternates between improving attack and defense strategies through the simulation of privacy-critical agent interactions. Specifically, we employ LLMs as optimizers to analyze simulation trajectories and iteratively propose new agent instructions. To explore the strategy space more efficiently, we further utilize parallel search with multiple threads and cross-thread propagation. Through this process, we find that attack strategies escalate from direct requests to sophisticated tactics, such as impersonation and consent forgery, while defenses evolve from simple rule-based constraints to robust identity-verification state machines. The discovered attacks and defenses generalize across diverse scenarios and backbone models, providing useful insights for developing privacy-aware agents.

2508.02001 2026-05-11 cs.NI cs.LG

Versatile yet Efficient Network Traffic Analysis: Offloading Network Foundation Model to SmartNIC

Chungang Lin, Xuying Meng, Tianyu Zuo, Weiyao Zhang, Meng Shen, Ruijie Zhao, Guanming Che, Ruiqi Meng, Ziyue Huang, Haitong Luo, Zhiwei Xu, Yujun Zhang

Affiliations: Institute of Computing Technology, Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China; University of Virginia, USA; Beijing Institute of Technology, China; Southeast University, China; Northeastern University, China; Haihe Lab of ITAI, China

AI summary: This paper presents Nepco, a system that offloads network foundation models to a SmartNIC to deliver versatile yet efficient network traffic analysis with reduced end-to-end latency.

Comments Under review

English abstract

Pervasive encryption makes large-scale labeling infeasible for traffic analysis, while security operations demand edge analysis to avert service degradation and further vulnerabilities. These pressures have produced two disjoint research lines: 1) versatile analysis, via network foundation models for low label dependency, and 2) efficient analysis, via hardware offloading for low analysis latency. However, versatility and efficiency have appeared fundamentally incompatible to co-achieve, with prior work consistently sacrificing one for the other, yet we show that this incompatibility is a consequence of polarized design choices across the three components of traffic analysis systems, i.e., traffic processing, model architecture, and analysis execution. In response, we present Nepco, a versatile yet efficient network traffic analysis system that offloads network foundation models to SmartNIC. Our key observation is that discriminative traffic information is concentrated in localized byte regions, motivating versatile yet efficient localized byte-sequence modeling rather than inefficient global modeling. To exploit this without incurring the latency bottlenecks of complex encoding steps, we employ a hardware-friendly processing pipeline that directly embeds raw byte sequences. Crucially, to maintain versatility across diverse tasks, we propose a pattern-aware convolutional architecture equipped with dedicated scoring and gating mechanisms. By exploiting translation invariance, this design dynamically locates and extracts salient semantic signatures. We prototype Nepco on the Nvidia BlueField-3 SmartNIC with multiengine collaborative analysis execution. The experimental results demonstrate that Nepco achieves macro F1 competitive with the best performances achieved by 8 state-of-the-art network foundation models, while reducing end-to-end latency by 328x to the millisecond scale.

2506.04565 2026-05-11 cs.MA cs.CL

From Standalone LLMs to Integrated Intelligence: A Survey of Compound AI Systems

Jiayi Chen, Junyi Ye, Guiling Wang

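Affiliations: New Jersey Institute of Technology

AI summary: This survey covers Compound AI Systems. It examines how they integrate large language models with external components, analyzes four foundational paradigms, and identifies challenges such as scalability and interoperability along with directions for future research.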

English abstract

Compound AI Systems (CAIS) are an emerging paradigm that integrates large language models (LLMs) with external components, including retrievers, agents, tools, and orchestrators, to overcome the limitations of standalone models in tasks requiring memory, reasoning, real-time grounding, and multimodal understanding. These systems enable more capable and context-aware behaviors by composing multiple specialized modules into cohesive workflows. Despite growing adoption in both academia and industry, the CAIS landscape remains fragmented and lacks a unified framework for analysis, taxonomy, and evaluation. In this survey, we define the concept of CAIS, propose a multi-dimensional taxonomy based on component roles and orchestration strategies, and analyze four foundational paradigms: Retrieval-Augmented Generation (RAG), LLM Agents, Multimodal LLMs (MLLMs), and Orchestration. We review representative systems, compare design trade-offs, and summarize evaluation methodologies across these paradigms. Finally, we identify key challenges - including scalability, interoperability, benchmarking, and coordination - and outline promising directions for future research. This survey aims to provide researchers and practitioners with a comprehensive foundation for understanding, developing, and advancing the next generation of system-level artificial intelligence.

2505.11325 2026-05-11 stat.ME cs.AI cs.LG stat.CO stat.ML

Uncertainty Quantification for Prior-Data Fitted Networks using Martingale Posteriors

Thomas Nagler, David Rügamer

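Affiliations: Department of Statistics, LMU Munich; Munich Center for Machine Learning (MCML)

AI summary: This paper proposes an efficient, tuning-free sampling procedure based on martingale posteriors that constructs Bayesian posteriors to quantify uncertainty in the predictive means, quantiles, and similar quantities of prior-data fitted networks. Simulated and real-world data examples demonstrate the method's efficiency and calibration.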

English abstract

Prior-data fitted networks (PFNs) have emerged as promising foundation models for prediction from tabular datasets, achieving state-of-the-art performance on small to moderate data sizes without tuning. While PFNs are motivated by Bayesian ideas, they do not provide any uncertainty quantification for predictive means, quantiles, or similar quantities. We propose a principled, efficient, and tuning-free sampling procedure to construct Bayesian posteriors for such estimates based on martingale posteriors, and prove its convergence. Several simulated and real-world data examples showcase the efficiency and calibration of our method in inference applications.

2504.12922 2026-05-11 math.OC cs.LG math.LO math.PR

An abstract effective convergence theorem for stochastic processes, with applications to stochastic approximation

Morenikeji Neri, Nicholas Pischke, Thomas Powell

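Affiliations: Department of Mathematics, Technische Universität Darmstadt; Department of Computer Science, University of Bath

AI summary: This paper presents a general theorem on the asymptotic behavior of stochastic processes that satisfy a relaxed supermartingale condition, giving quantitative convergence guarantees at a higher level of generality, and demonstrates its applications to stochastic approximation.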

Comments 25 pages

English abstract

We provide a general theorem on the asymptotic behavior of stochastic processes that conform to a relaxed supermartingale condition. The distinguishing feature of our result is that it provides quantitative convergence guarantees at a much higher level of abstraction and generality than is typically seen in the stochastic approximation literature, formulated in particular in terms of a general modulus $\tau$ that, on an intuitive level, captures an effective variant of the uniqueness in expectation of associated solutions. Our convergence rate is highly uniform, depending on very few data beyond $\tau$. We then demonstrate the utility of our result as a unifying framework by deriving new quantitative versions of several key concepts and theorems from stochastic approximation, including the Robbins-Siegmund theorem, Dvoretzky's convergence theorem, and the convergence of stochastic quasi-Fejér monotone sequences, the latter formulated in a novel and highly general metric context. Throughout, we isolate and discuss special cases of our results which allow for the construction of fast, and in particular linear, rates. Various applications of our results and our general methodology to stochastic approximation are discussed, and in particular explicitly derived in related work of the authors.
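For orientation, the classical qualitative Robbins-Siegmund theorem mentioned in the abstract is usually stated roughly as follows. This is the standard textbook form with symbols chosen here, not the paper's relaxed condition or its quantitative version.

```latex
% Classical Robbins-Siegmund theorem (textbook form; the notation is an assumption of this note).
Let $(X_n)$, $(a_n)$, $(b_n)$, $(v_n)$ be nonnegative processes adapted to a
filtration $(\mathcal{F}_n)$ such that
\[
  \mathbb{E}\left[ X_{n+1} \mid \mathcal{F}_n \right]
  \le (1 + a_n)\, X_n + b_n - v_n
  \quad \text{almost surely for all } n,
\]
with $\sum_n a_n < \infty$ and $\sum_n b_n < \infty$ almost surely.
Then $X_n$ converges almost surely to a finite random variable, and
$\sum_n v_n < \infty$ almost surely.
```

The classical statement only asserts convergence. A quantitative version, as described in the abstract, additionally supplies explicit rate information.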

2204.05551 2026-05-11 math.OC cs.LG cs.SY eess.SY math.DS

Near-Optimal Distributed Linear-Quadratic Regulator for Networked Systems

Sungho Shin, Yiheng Lin, Guannan Qu, Adam Wierman, Mihai Anitescu

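Affiliations: Mathematics and Computer Science Division, Argonne National Laboratory; California Institute of Technology; Department of Electrical and Computer Engineering, Carnegie Mellon University; Department of Statistics, University of Chicago

AI summary: This paper studies the trade-off between the degree of decentralization and controller performance in a linear-quadratic control setting. By analyzing a system of agents interconnected over a graph and a controller called κ-distributed control, it shows that, under mild assumptions, the performance gap between κ-distributed control and centralized optimal control shrinks exponentially in κ, so a moderate degree of decentralization achieves near-optimal performance.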

Journal ref SIAM Journal on Control and Optimization, 2023

English abstract

This paper studies the trade-off between the degree of decentralization and the performance of a distributed controller in a linear-quadratic control setting. We study a system of interconnected agents over a graph and a distributed controller, called $\kappa$-distributed control, which lets the agents make control decisions based on the state information within distance $\kappa$ on the underlying graph. This controller can tune its degree of decentralization using the parameter $\kappa$ and thus allows a characterization of the relationship between decentralization and performance. We show that under mild assumptions, including stabilizability, detectability, and a subexponentially growing graph condition, the performance difference between $\kappa$-distributed control and centralized optimal control becomes exponentially small in $\kappa$. This result reveals that distributed control can achieve near-optimal performance with a moderate degree of decentralization, and thus it is an effective controller architecture for large-scale networked systems.
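The information structure described here, where each agent only sees states within graph distance κ, can be sketched with a breadth-first search. The code below is an illustration under stated assumptions: scalar agent states, a hypothetical dense gain matrix, and a simple truncation of that gain. The abstract does not say how the κ-distributed controller is actually constructed, so this should not be read as the paper's method.

```python
from collections import deque


def k_hop_neighbors(adjacency, source, kappa):
    """Return the set of nodes within graph distance `kappa` of `source` (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == kappa:
            continue  # do not expand beyond radius kappa
        for nxt in adjacency[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return set(dist)


def truncate_gain(gain, adjacency, kappa):
    """Zero out gain[i][j] whenever node j is farther than `kappa` from node i."""
    n = len(gain)
    allowed = [k_hop_neighbors(adjacency, i, kappa) for i in range(n)]
    return [[gain[i][j] if j in allowed[i] else 0.0 for j in range(n)]
            for i in range(n)]


# Hypothetical 4-agent path graph 0 - 1 - 2 - 3 with an all-ones gain matrix.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
dense_gain = [[1.0] * 4 for _ in range(4)]
local_gain = truncate_gain(dense_gain, path, kappa=1)
```

With κ = 0 every agent uses only its own state, and with κ at least the graph diameter the mask keeps the full centralized gain, which matches the abstract's description of κ as a dial between the two extremes.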

2005.06674 2026-05-11 math.OC cs.LG math.DS

On the Convergence of Overlapping Schwarz Decomposition for Nonlinear Optimal Control

Sen Na, Sungho Shin, Mihai Anitescu, Victor M. Zavala

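Affiliations: Department of Statistics, University of Chicago

AI summary: This paper studies the convergence of an overlapping Schwarz decomposition algorithm for solving nonlinear optimal control problems. It proves local linear convergence, shows that the convergence rate improves exponentially with the overlap size, and establishes global convergence results for quadratic programming.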

Comments 16 pages

Journal ref IEEE Transactions on Automatic Control, 2022

English abstract

We study the convergence properties of an overlapping Schwarz decomposition algorithm for solving nonlinear optimal control problems (OCPs). The algorithm decomposes the time domain into a set of overlapping subdomains, and solves all subproblems defined over subdomains in parallel. The convergence is attained by updating primal-dual information at the boundaries of overlapping subdomains. We show that the algorithm exhibits local linear convergence, and that the convergence rate improves exponentially with the overlap size. We also establish global convergence results for general quadratic programming, which enables the application of the Schwarz scheme inside second-order optimization algorithms (e.g., sequential quadratic programming). The theoretical foundation of our convergence analysis is a sensitivity result of nonlinear OCPs, which we call "exponential decay of sensitivity" (EDS). Intuitively, EDS states that the impact of perturbations at domain boundaries (i.e., initial and terminal time) on the solution decays exponentially as one moves into the domain. Here, we expand a previous analysis available in the literature by showing that EDS holds for both primal and dual solutions of nonlinear OCPs, under a uniform second-order sufficient condition, a controllability condition, and a boundedness condition. We conduct experiments with a quadrotor motion planning problem and a PDE control problem to validate our theory, and show that the approach is significantly more efficient than ADMM and as efficient as the centralized solver Ipopt.
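The time-domain decomposition step can be pictured with a small helper that splits a discrete horizon into blocks and widens each block by a fixed overlap. This only sketches the partitioning. The subproblem solves and the primal-dual boundary updates are not shown, and the function name and the even-split rule are assumptions of this example.

```python
def overlapping_subdomains(horizon, num_blocks, overlap):
    """Split time steps 0..horizon-1 into `num_blocks` contiguous blocks and
    widen each block by `overlap` steps on both sides, clipped to the horizon.

    Returns a list of (start, end) pairs with `end` exclusive.
    """
    if num_blocks <= 0 or horizon < num_blocks or overlap < 0:
        raise ValueError("need 0 < num_blocks <= horizon and overlap >= 0")
    base, extra = divmod(horizon, num_blocks)
    domains = []
    start = 0
    for b in range(num_blocks):
        # The first `extra` blocks get one additional step so sizes differ by at most 1.
        end = start + base + (1 if b < extra else 0)
        domains.append((max(0, start - overlap), min(horizon, end + overlap)))
        start = end
    return domains


# Hypothetical horizon of 10 steps, 3 subdomains, overlap of 1 step.
domains = overlapping_subdomains(10, 3, 1)
```

Each subproblem on a widened interval can be solved independently, which is what makes the parallel solve in the abstract possible. Larger `overlap` values trade more work per subproblem for the faster convergence the paper proves.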

2605.07517 2026-05-11 cs.IR cs.AI

LARAG: Link-Aware Retrieval Strategy for RAG Systems in Hyperlinked Technical Documentation

Giorgia Bolognesi, Claudio Estatico, Ulderico Fugacci, Isabella Mastroianni, Claudio Muselli, Luca Oneto

Affiliations: Rulex s.r.l.; Department of Mathematics (DIMA), University of Genoa; Department of Computer Science, Bioengineering, Robotics, and Systems Engineering (DIBRIS), University of Genoa

AI summary: LARAG improves retrieval in RAG systems by exploiting the hyperlink structure already present in HTML documentation, raising answer quality while lowering resource consumption.

English abstract

Retrieval-Augmented Generation (RAG) enhances the factual grounding of Large Language Models by conditioning their outputs on external documents. However, standard embedding-based retrievers treat naturally structured corpora, such as technical manuals, as flat collections of passages, thereby overlooking the hyperlink topology that users rely on when navigating such content. We introduce LARAG (Link-Aware RAG): a lightweight, link-aware retrieval strategy that leverages the author-defined hyperlink structure already present in HTML documentation, encoding hyperlink relations as metadata in the chunk representations and exploiting them to perform a form of graph-like retrieval of locally relevant content. In a benchmark of twenty expert-designed queries over Rulex Platform technical documentation and four prompting strategies, LARAG consistently improves answer quality, achieving the highest BERTScore F1, while retrieving fewer chunks and generating fewer tokens than a baseline RAG architecture used for comparison. These results show that directly leveraging the existing hyperlink topology of technical documentation, even without explicit graph construction or inference, enables an implicit form of graph-like retrieval that yields a more faithful and efficient RAG pipeline, providing better grounding at lower cost.
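One way to picture "hyperlink relations as metadata" driving a graph-like retrieval is a one-hop expansion: pick the top-scoring chunks, then add chunks they link to. The sketch below is a guess at the general idea, not LARAG's actual procedure. The function name, the `top_k` and `max_extra` parameters, and the chunk ids are all hypothetical.

```python
def link_aware_retrieve(scores, links, top_k=2, max_extra=1):
    """Select chunks by similarity, then follow their hyperlinks one hop.

    scores: dict mapping chunk id -> similarity to the query (assumed precomputed).
    links:  dict mapping chunk id -> list of chunk ids it hyperlinks to.
    """
    # Seed set: the top_k chunks by similarity (ties broken by id).
    seeds = sorted(scores, key=lambda c: (-scores[c], c))[:top_k]
    # Candidates: chunks linked from a seed that are not already selected.
    candidates = {t for s in seeds for t in links.get(s, []) if t not in seeds}
    # Keep the best-scoring linked chunks, up to max_extra of them.
    extra = sorted(candidates, key=lambda c: (-scores.get(c, 0.0), c))[:max_extra]
    return seeds + extra


# Hypothetical documentation chunks and their outgoing links.
scores = {"install": 0.9, "api": 0.7, "config": 0.4, "faq": 0.1}
links = {"install": ["config", "faq"], "api": ["config"]}
context = link_aware_retrieve(scores, links)
```

Here `config` would be missed by a pure top-2 similarity cut, but it is pulled in because both seed chunks link to it. No explicit graph is built, since the links stored alongside each chunk are enough.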

2605.07481 2026-05-11 cs.CR cs.AI

Vaporizer: Breaking Watermarking Schemes for Large Language Model Outputs

Jonathan Hong Jin Ng, Anh Tu Ngo, Anupam Chattopadhyay

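Affiliations: College of Computing and Data Science; Nanyang Technological University

AI summary: This paper examines the effectiveness of recent watermarking schemes for LLM outputs, applies a range of attack strategies to expose the strengths and weaknesses of existing watermarking systems, and suggests ways to improve their security.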

Abstract

In this paper, we investigate recent state-of-the-art schemes for watermarking large language model (LLM) outputs. These techniques are claimed to be robust, scalable, and production-grade, and aim to promote responsible usage of LLMs. We analyse the effectiveness of these watermarking techniques against an extensive collection of modified-text attacks, which perform targeted semantic changes without altering the general meaning of the text content. Our approach encompasses multiple attack strategies, including lexical alterations, machine translation, and even neural paraphrasing. Attack efficacy is measured with two target criteria: successful removal of the watermark and preservation of semantic content. We evaluate semantic preservation through BERT scores, text complexity measures, grammatical errors, and Flesch Reading Ease indices. The experimental results reveal varying levels of effectiveness among different watermarking models, with the same underlying result: it is possible to remove the watermark with reasonable effort. This study sheds light on the strengths and weaknesses of existing LLM watermarking systems and suggests how they should be constructed to improve the security of available schemes.

2605.07472 2026-05-11 cs.CR cs.AI cs.MA

HBEE: Human Behavioral Entropy Engine -- Pre-Registered Multi-Agent LLM Simulation of Peer-Suspicion-Based Detection Inversion

Vickson Ferrel

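Affiliations: Faculty of Computer Science & Information Technology, Universiti Malaysia Sarawak; Vixero Technology Enterprise

AI summary: Using a pre-registered multi-agent LLM simulation, this paper studies peer-suspicion-based detection inversion. It finds that at T_60 the adaptive adversary draws less suspicion than a randomly selected innocent agent, and that the two detection signals decouple under adaptive adversary behavior.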

Comments 14 pages, 6 figures. Pre-registration document and full deviation log included in artifact

Abstract

Insider threat detection assumes that an adaptive insider leaves behavioral residue distinguishing them from legitimate users. We test this assumption against an LLM-driven adaptive insider in a controlled multi-agent simulator. Our pre-registered five-condition study isolates defender mode (cascade vs. blind UEBA) crossed with adversary type (naive vs. adaptive OPSEC) plus a no-mole control, across 100 runs (95 valid after pre-committed exclusions). The primary finding is a detection inversion: at T_60, the adaptive mole's suspicion in-degree is statistically lower than that of a randomly selected innocent agent (Cliff's delta = -0.694, 95% BCa CI [-0.855, -0.519], Mann-Whitney p << 0.01). The pre-registered prediction was in the opposite direction. A pre-registered equivalence test (H2) shows adaptive OPSEC produces no detectable shift in the mole's UEBA rank under either defender mode. The two detection signals (peer suspicion graph in-degree and per-agent UEBA rank) decouple under adaptive adversary behavior. We bound generalization explicitly: a pre-registered Gini calibration check (H4) returns FAIL, with HBEE pairwise message-exposure Gini (0.213) diverging from the SNAP Enron reference (0.730) by |Delta Gini| = 0.52, exceeding the equivalence bound by 5x. The paper makes a narrow but surprising claim: in a controlled environment where adaptive OPSEC is implementable as an LLM directive, peer-suspicion-cascade detection inverts. We release the simulator, pre-registration document, frozen scenarios, raw telemetry, and analysis pipeline under an open-source license.
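
For readers unfamiliar with the effect size quoted in this abstract, Cliff's delta compares two samples by counting cross-group pairs: the share of pairs in which the first sample is larger minus the share in which it is smaller. The following is a minimal Python sketch of that standard definition. The function name and the toy in-degree values are illustrative assumptions, not the paper's data or code.

```python
from itertools import product


def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all cross-group pairs."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for x, y in product(xs, ys) if x > y)
    less = sum(1 for x, y in product(xs, ys) if x < y)
    return (greater - less) / (len(xs) * len(ys))


# Hypothetical suspicion in-degrees (not taken from the paper).
mole_in_degree = [0, 1, 1, 2]
innocent_in_degree = [1, 2, 3, 3]

delta = cliffs_delta(mole_in_degree, innocent_in_degree)
print(delta)  # negative: the "mole" sample tends to have lower in-degree
```

A value near -1 means almost every mole observation falls below almost every innocent observation, while a value near 0 means the two samples overlap heavily.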

2605.07158 2026-05-11 cs.IR cs.CL cs.LG

Topic Is Not Agenda: A Citation-Community Audit of Text Embeddings

Junseon Yoo

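Affiliations: Pluto Labs

AI summary: By building an augmented citation graph over 3.58 million scientific papers, this paper finds that cosine similarity between text embeddings tracks relatedness reasonably well at the sub-field level but breaks down at the finer research-agenda level, exposing a key weakness in scientific RAG systems.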

Comments 16 pages, 4 figures, 4 tables

Abstract

Vector search and retrieval-augmented generation (RAG) rest on the assumption that cosine similarity between text embeddings reflects conceptual relatedness. We measure where this assumption breaks. We build an augmented citation graph over 3.58M scientific papers and partition it via Leiden CPM at two granularities: sub-field (L1) and research-agenda (L2, hierarchical inside each L1). Four state-of-the-art embeddings (Gemini, Qwen3-8B, Qwen3-0.6B, SPECTER2) clear the L1 bar reasonably (45-52% top-10 same-rate) but stop working at L2: only 15-21% of top-10 neighbors share the query's research agenda. In absolute terms, 8 of every 10 retrieved papers are off-agenda. The failure is universal across eight scientific domains and all four models; SPECTER2, despite its citation-based contrastive training, is the weakest. As a diagnostic probe, we test whether the same augmented graph also functions as a retrieval signal: a deliberately simple citation-count rerank reaches 57.7% top-1 L2 on top of LLM-expanded Boolean retrieval and 59.6% on top of plain BM25, on 80 curated agenda queries -- about 9 points above the best cosine retriever (Gemini, 50.6%) and 20 points above BM25 alone (39.3%). The probe isolates a slice of the agenda-matching signal the graph carries but the embeddings miss, connecting recent theoretical limits on single-vector retrieval to a concrete failure mode of scientific RAG.
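
One plausible reading of the "top-10 same-rate" figure is the average fraction of a paper's k nearest neighbours (by cosine similarity) that carry the same community label as the query paper. The sketch below computes that quantity in plain Python under this assumption. The function names, toy embeddings, and labels are hypothetical, and the paper's exact evaluation protocol may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_same_rate(embeddings, community, k):
    """Mean fraction of each item's k nearest neighbours in its own community."""
    ids = list(embeddings)
    rates = []
    for query in ids:
        others = [p for p in ids if p != query]
        others.sort(key=lambda p: cosine(embeddings[query], embeddings[p]),
                    reverse=True)
        top = others[:k]
        rates.append(sum(community[p] == community[query] for p in top) / len(top))
    return sum(rates) / len(rates)


# Toy 2-D embeddings and community labels (hypothetical).
embeddings = {"a": (1.0, 0.0), "b": (0.9, 0.1), "c": (0.0, 1.0), "d": (0.1, 0.9)}
community = {"a": "A1", "b": "A1", "c": "A2", "d": "A1"}

rate = top_k_same_rate(embeddings, community, k=1)
print(rate)
```

In this toy example papers `c` and `d` are close in embedding space but belong to different communities, which is the kind of mismatch the abstract reports at the research-agenda level.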

2605.07145 2026-05-11 cond-mat.mtrl-sci cs.CV

Fine-tuning a vision-language model for fracture-surface morphology recognition

Quanliang Liu, Jungtaek Kim, Kangwook Lee, Hyunseok Oh

Affiliations: Department of Materials Science & Engineering, University of Wisconsin–Madison; Department of Electrical & Computer Engineering, University of Wisconsin–Madison; KRAFTON Ludo Robotics

AI summary: This paper fine-tunes an open-source vision-language model on a curated dataset of 13,168 fracture-surface images, improving recognition of fracture-surface morphology and showing its potential for materials characterization.

Abstract

Vision-language models (VLMs) have shown strong potential for scientific image understanding, but general-purpose models often lack the domain-specific visual knowledge required for reliable materials characterization. In this work, we fine-tuned an open-source VLM (Qwen3-VL-32B-Instruct) for fracture-surface image analysis using a curated dataset of 13,168 open-source, literature-mined fracture-surface images. Morphology annotations were generated by GPT-5.2-Reasoning (high) from both the images and relevant excerpts of their source papers, and the dataset was further enriched with targeted manual collection and rotation-based augmentation. The resulting specialist model outperforms flagship proprietary multimodal models on a benchmark of 100 manually annotated images. It achieves a precision of 0.92, compared to 0.35 for the base Qwen3-VL-32B-Instruct, 0.58 for GPT-5.5-Reasoning (high), and 0.78 for Gemini 3.1 Pro-Reasoning (high). Dataset ablations show that manual collection of rare-feature images and augmentation via image rotation are both beneficial to improve recognition of less common fracture morphology features. We further discuss integrated use of the fine-tuned model with proprietary models to combine fracture-specific visual accuracy with broader multimodal reasoning for autonomous fractography. Although focused on fracture-surface images, this work demonstrates how VLMs can be adapted through targeted collection and fine-tuning on novel feature images to recognize those features and support downstream decision-making in autonomous microscopy workflows.

2605.07129 2026-05-11 cs.IR cs.AI cs.LG

RRCM: Ranking-Driven Retrieval over Collaborative and Meta Memories for LLM Recommendation

Shijun Li, Wooseong Yang, Yu Wang, Tianxin Wei, Joydeep Ghosh

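Affiliations: The University of Texas at Austin; University of Illinois at Chicago; Capital One AI Foundations; University of Illinois Urbana-Champaign

AI summary: RRCM is a ranking-driven retrieval-and-reasoning framework that addresses the challenge of building relevant context for LLM-based recommendation. It represents collaborative and metadata memories in natural language, enabling flexible evidence acquisition and better recommendation quality.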

Abstract

Large Language Models (LLMs) have emerged as a promising paradigm for next-generation recommender systems, offering strong semantic understanding and natural-language reasoning abilities. Despite recent progress, current LLM-based recommenders still face key challenges in constructing decision-relevant contexts from heterogeneous evidence. First, existing methods often rely on fixed context construction strategies: collaborative behavioral evidence and item-side metadata are typically incorporated through predefined prompts, static retrieval pipelines, or handcrafted injection mechanisms, making it difficult to determine what information is truly beneficial for each instance. Second, heterogeneous evidence introduces a severe context-efficiency bottleneck. Rich metadata and collaborative interaction records can quickly overwhelm the context window, while aggressive compression or heuristic filtering may discard fine-grained evidence critical for accurate recommendation. To address these challenges, we propose RRCM, a ranking-driven retrieval-and-reasoning framework over collaborative and metadata memories for LLM-based agentic recommendation. RRCM starts from a lightweight user-history context and learns whether to recommend directly, retrieve collaborative evidence, retrieve item metadata, or interleave both through reasoning. Both memories are represented in natural language and accessed through a unified retrieval interface, enabling flexible evidence acquisition without handcrafted CF injection or fixed retrieval rules. We optimize this memory-reading policy with an outcome-only ranking reward, instantiated using group relative policy optimization, so that retrieval decisions are directly driven by final top-k recommendation quality. Extensive experiments show that RRCM significantly outperforms traditional baselines and diverse LLM-based recommendation approaches.

2605.07125 2026-05-11 cs.IR cs.AI

An Embarrassingly Simple Graph Heuristic Reveals Shortcut-Solvable Benchmarks for Sequential Recommendation

Haoyu Han, Li Ma, Hanbing Wang, Bingheng Li, Daochen Zha, Chun How Tan, Huiji Gao, Xin Liu, Stephanie Moyerman, Sanjeev Katariya, Hui Liu, Jiliang Tang

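Affiliations: Michigan State University; Airbnb, Inc.

AI summary: This paper proposes a simple graph heuristic that, without a sequence encoder or other complex components, retrieves candidates from a local item-transition graph and ranks them by feature similarity. It performs strongly on sequential recommendation benchmarks, revealing shortcut-solvable structure in those benchmarks.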

Abstract

Sequential recommendation has increasingly shifted toward generative recommenders that combine sequential patterns with semantic item information. Yet these methods are often evaluated on a small set of widely used benchmarks, raising a key question: do these benchmarks actually require the advanced modeling capabilities that modern generative recommenders claim to provide? We conduct a benchmark audit with an intentionally simple graph heuristic. Starting from only the last one or two interacted items, it retrieves candidates from a few-hop item-transition graph and ranks them by item-feature similarity. Despite using no sequence encoder, generative objective, or training, this heuristic matches or outperforms many modern baselines, with relative NDCG@10 improvements of 38.10% and 44.18% over the best competing baseline on Amazon Review Sports and CDs. We show that this behavior reflects shortcut solvability rather than an artifact of one heuristic. We identify three shortcut structures that can make next-item prediction easier than expected: low-branching local transitions, feature-smooth transitions, and limited dependence on long user histories. These shortcuts need not appear together; even one or two strong signals can make simple local retrieval highly competitive, while weakening them makes the benefits of more sophisticated models clearer. Across 14 datasets, model rankings vary substantially with dataset properties, yet the heuristic remains competitive on 10 of them. Our findings suggest that strong performance on standard benchmarks does not always demonstrate advanced sequential, semantic, or generative modeling ability. We call for more careful dataset selection and dataset-level diagnostic analysis when using benchmarks to support claims about new recommendation models.
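
The retrieve-then-rank idea described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not give the similarity function, hop count, or tie-breaking rule, so the Jaccard similarity over tag sets, the default of two hops, and all names and toy data below are hypothetical.

```python
from collections import defaultdict


def build_transition_graph(sequences):
    """Directed graph with an edge prev -> next for each consecutive pair."""
    graph = defaultdict(set)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            graph[prev].add(nxt)
    return graph


def few_hop_candidates(graph, start, hops):
    """Items reachable from `start` within `hops` forward transitions."""
    frontier, reached = {start}, set()
    for _ in range(hops):
        frontier = {nxt for item in frontier for nxt in graph.get(item, ())}
        frontier -= reached | {start}
        reached |= frontier
    return reached


def jaccard(a, b):
    """Jaccard similarity of two sets (assumed stand-in for feature similarity)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def recommend(sequences, features, history, hops=2, top_n=3):
    """Rank few-hop neighbours of the last item by feature similarity to it."""
    graph = build_transition_graph(sequences)
    last = history[-1]
    candidates = few_hop_candidates(graph, last, hops) - set(history)
    ranked = sorted(candidates,
                    key=lambda c: (-jaccard(features[last], features[c]), c))
    return ranked[:top_n]


# Toy interaction sequences and item tags (hypothetical).
sequences = [["tent", "stove", "lantern"], ["tent", "sleeping_bag"], ["stove", "fuel"]]
features = {
    "tent": {"camping", "shelter"},
    "stove": {"camping", "cooking"},
    "lantern": {"camping", "light"},
    "sleeping_bag": {"camping", "shelter", "sleep"},
    "fuel": {"cooking"},
}

recs = recommend(sequences, features, history=["tent"])
print(recs)
```

Nothing here is trained: the graph is counted directly from the interaction sequences, which is what makes a strong result from such a heuristic informative about the benchmark rather than the model.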

2605.07119 2026-05-11 stat.ML cs.LG

Classification Fields: Arbitrarily Fine Recursive Hierarchical Clustering From Few Examples

Yicen Li, Ruiyang Hong, Anastasis Kratsios, Haitz Sáez de Ocáriz Borde, Paul D. McNicholas

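Affiliations: Department of Mathematics and Statistics, McMaster University, Canada; Vector Institute, Canada; University of Cambridge, United Kingdom

AI summary: This paper introduces classification fields, infinite-depth hierarchical cluster structures generated by a local parent-to-child refinement rule. It proves exponential truncation convergence in the completed cell metric and validates the approach experimentally under recursive rollout.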

Abstract

Classical clustering methods usually return either a finite partition of the observed data or a finite dendrogram over it. This finite-sample view is inadequate when the hierarchy of interest is a recursive geometric object with fine-scale refinements that continue beyond the levels directly observed. We introduce classification fields: infinite-depth hierarchical cluster structures on $\mathbb{R}^d$ generated by a local parent-to-child refinement rule. A classification field generator maps each parent centre to an ordered, bounded, and separated tuple of child residuals. Together with a root and a scale factor, this rule recursively generates cluster centres, Voronoi cells, and a metric DAG encoding the hierarchy. Given only a finite prefix of such a hierarchy, we learn a classification field predictor that approximates the generator and can be rolled out to unseen depths. We prove exponential truncation convergence in the completed cell metric and ReLU realizability with width $O(\varepsilon^{-\gamma})$ and depth $\widetilde{O}(\varepsilon^{-3\gamma/2})$, where $\gamma=\log K/(-\log s)$, up to finite-window aspect-ratio factors. The approximation holds at the level of the induced compact metric structures, measured in the completed cell-metric Hausdorff distance. Experimental validation on matched CFG-generated hierarchies, IFS fractals, and image-induced recursive clustering hierarchies shows that learned predictors preserve ordered child slots, unordered geometry, and hierarchy-level path metrics under recursive rollout. These results support the claim that finite hierarchical observations can reveal local refinement rules capable of generating substantially deeper classification fields.

2605.07100 2026-05-11 stat.ML cs.LG

TRACE: Transport Alignment Conformal Prediction via Diffusion and Flow Matching Models

Zhenhan Fang, Aixin Tan, Jian Huang

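Affiliations: Department of Statistics and Actuarial Science, University of Iowa; Departments of Data Science and AI, and Applied Mathematics, The Hong Kong Polytechnic University

AI summary: This paper proposes TRACE, a framework that defines nonconformity scores through transport alignment in diffusion and flow matching models, enabling valid conformal prediction regions for multi-dimensional outputs and demonstrating effectiveness in complex generative settings.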

Comments 22 pages, 5 figures and 5 tables

Abstract

Constructing valid and informative conformal prediction regions for multi-dimensional outputs remains a fundamental challenge. While conformal prediction provides finite-sample, distribution-free coverage guarantees, its practical performance critically depends on the choice of nonconformity score. Existing approaches often rely on restrictive geometric assumptions or require explicit likelihood evaluation and invertible transformations, limiting their applicability in complex generative settings. In this work, we introduce TRACE (TRansport Alignment Conformal Estimation), a conformal prediction framework that defines nonconformity through transport alignment in diffusion and flow matching models. Rather than evaluating likelihoods, we measure how well a candidate output aligns with the learned generative dynamics by averaging denoising or velocity-matching errors along stochastic transport trajectories. The resulting transport-based scores are scalar-valued and can be calibrated using split conformal prediction, yielding valid marginal coverage under exchangeability. We further analyze the statistical properties of the proposed scores and their sensitivity to computational budget. Experiments on synthetic and real datasets demonstrate valid coverage and show that the resulting regions adapt naturally to multimodal and non-convex conditional distributions.
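
The calibration step mentioned in this abstract, split conformal prediction, works with any scalar nonconformity score. The sketch below shows the standard finite-sample quantile rule in Python. It does not implement the transport-based score itself; the calibration scores are made-up numbers standing in for whatever score a fitted model would produce, and the function names are hypothetical.

```python
import math


def conformal_quantile(cal_scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores.

    Uses the ceil((n + 1) * (1 - alpha))-th smallest score; if that rank
    exceeds n, the threshold is infinite and the region is the whole space.
    """
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(cal_scores)[rank - 1]


def in_region(score, threshold):
    """A candidate output is kept when its score does not exceed the threshold."""
    return score <= threshold


# Made-up nonconformity scores on a held-out calibration set.
cal_scores = [0.12, 0.40, 0.25, 0.31, 0.08, 0.55, 0.19, 0.47, 0.36]

threshold = conformal_quantile(cal_scores, alpha=0.25)
print(threshold, in_region(0.30, threshold), in_region(0.50, threshold))
```

Because the threshold depends only on the ranks of exchangeable scores, the marginal coverage guarantee holds regardless of how the score is defined, which is why the choice of score affects the shape and size of the region rather than its validity.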

2605.07097 2026-05-11 stat.ML cs.LG cs.NE math.LO math.ST stat.TH

Every Feedforward Neural Network Definable in an o-Minimal Structure Has Finite Sample Complexity

Anastasis Kratsios, Gregory Cousins, Haitz Sáez de Ocáriz Borde, Bum Jun Kim, Simone Brugiapaglia

Affiliations: Department of Mathematics & Statistics, McMaster University; Department of Mathematics & Statistics, Concordia University; University of Cambridge; Graduate School of Engineering, The University of Tokyo

AI summary: This paper proves that a broad class of feedforward neural networks definable in an o-minimal structure has finite sample complexity in the PAC model, covering standard fixed-size MLPs, CNNs, GNNs, and Transformers together with their common operations.

Abstract

We show that, in a precise sense, a broad class of feedforward neural networks learn (have finite sample complexity) in the PAC model: every fixed finite feedforward architecture whose layers are definable in an o-minimal structure has finite sample complexity in the agnostic PAC setting, even with unbounded parameters. This covers standard fixed-size MLPs, CNNs, GNNs, and transformers with fixed sequence length, together with the operations and layers typically used in such architectures, including linear projections, residual connections, attention mechanisms, pooling layers, normalization layers, and admissible positional encodings. Hence, distribution-free learnability for modern non-recurrent architectures is not an exceptional property of particular activations or architecture-specific VC arguments, but a consequence of tame feedforward computation. Our results reposition finite-sample PAC learnability as a baseline rather than a differentiator: they shift the focus of architectural comparison toward inductive biases, symmetries and geometric priors, scalability, and optimization behaviour.

2605.07065 2026-05-11 stat.ML cs.AI cs.LG econ.EM

Causal EpiNets: Precision-corrected Bounds on Individual Treatment Effects using Epistemic Neural Networks

Gandharv Patil, Keyi Tang, Raquel Aoki, Leo Guelman

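Affiliations: RBC Borealis; McGill University/Mila

AI summary: This paper proposes a neural framework for finite-sample PNS estimation. An anchored neural architecture guarantees the structural constraints, and Epistemic Neural Networks are used to correct extremum bias, improving coverage and constraint validity in high-dimensional settings.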

Abstract

Individual treatment effects are not point-identified from data. The Probability of Necessity and Sufficiency (PNS) circumvents this limitation by characterizing individual-level causality through intersection bounds derived from combined experimental and observational data. In finite samples, however, standard plug-in estimators systematically fail: they violate structural probability constraints and suffer from extremum bias induced by max-min operators, yielding spuriously narrow intervals. We propose a neural framework for finite-sample PNS estimation that resolves both pathologies. We introduce an anchored neural architecture that guarantees structural constraint satisfaction by construction. To correct extremum bias, we employ precision-corrected intersection-bound inference, leveraging Epistemic Neural Networks for scalable, high-dimensional uncertainty quantification. Empirical evaluations confirm that this approach maintains nominal coverage and exact constraint validity in high-dimensional regimes where standard estimators systematically undercover.
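
To make the "max-min operators" in this abstract concrete, the sketch below evaluates the classical Tian-Pearl bounds on PNS from experimental data alone: max(0, P(y_x) - P(y_x')) <= PNS <= min(P(y_x), 1 - P(y_x')). This is a simplification chosen for illustration. The paper works with tighter bounds from combined experimental and observational data and adds a precision correction, neither of which is reproduced here, and the function name and input probabilities are hypothetical.

```python
def pns_bounds_experimental(p_y_do_x, p_y_do_not_x):
    """Tian-Pearl bounds on PNS using only interventional probabilities.

    p_y_do_x     : P(Y = 1 | do(X = 1))
    p_y_do_not_x : P(Y = 1 | do(X = 0))
    """
    for p in (p_y_do_x, p_y_do_not_x):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    lower = max(0.0, p_y_do_x - p_y_do_not_x)
    upper = min(p_y_do_x, 1.0 - p_y_do_not_x)
    return lower, upper


# Hypothetical interventional probabilities.
lower, upper = pns_bounds_experimental(0.7, 0.2)
print(lower, upper)
```

When the inputs are replaced by noisy finite-sample estimates, taking a max for the lower bound and a min for the upper bound pushes the two ends toward each other, which is the extremum bias the abstract says yields spuriously narrow intervals.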

2605.07062 2026-05-11 cs.SE cs.AI

From Assistance to Agency: Rethinking Autonomy and Control in CI/CD Pipelines

Marcus Emmanuel Barnes, Taher A. Ghaleb, Safwat Hassan

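Affiliations: Faculty of Information, University of Toronto, Toronto, Ontario, Canada; Department of Computer Science, Trent University, Peterborough, Ontario, Canada; University of Toronto; Trent University

AI summary: This paper rethinks autonomy and control in CI/CD pipelines. It frames authority transfer as the central design challenge, analyzes how current systems are largely confined to the data plane, and proposes research directions for control-plane safety and governance mechanisms.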

Comments Accepted to the 3rd ACM International Conference on AI-Powered Software (AIware 2026), Main Track, Montreal, Canada, July 6-7, 2026. 5 pages

Abstract

AI agents are assuming active roles in Continuous Integration and Continuous Deployment (CI/CD) workflows, yet the research community lacks a shared vocabulary for describing what it means for CI/CD to be agentic, how much decision authority is delegated, and where control should reside. This paper presents a vision of agentic CI/CD in which the central challenge is not improving task performance but designing authority transfer, defined as the delegation of operational decisions from human-controlled pipelines to agent systems under specified constraints and recourse mechanisms. To structure this argument, we introduce a distinction between data-plane authority (localized interventions such as patch generation and test reruns) and control-plane authority (modifications to pipeline configuration, deployment policies, and approval gates). Drawing on research prototypes and industrial platforms, we show that current systems operate mainly at the data plane under bounded autonomy, with safety achieved through surrounding governance infrastructure rather than intrinsic agent guarantees. We identify three recurring patterns: constrained autonomy as the dominant design, external governance as the primary safety mechanism, and a widening gap between deployment momentum and evaluation methodology. We propose a research agenda in which control-plane safety and governance mechanisms represent the most urgent open problem, followed by formalization of autonomy boundaries, evaluation frameworks, and human-agent coordination.

2605.07052 2026-05-11 eess.SY cs.LG cs.SY

A Behavioral Framework for Data-Driven Modeling of Nonlinear Systems in Vector-Valued Reproducing Kernel Hilbert Spaces

Boya Hou, Maxim Raginsky

Affiliations: Carl R. Woese Institute for Genomic Biology and the Coordinated Science Laboratory, University of Illinois Urbana-Champaign; Department of Electrical and Computer Engineering and the Coordinated Science Laboratory, University of Illinois Urbana-Champaign

AI summary: This paper generalizes Jan Willems' behavioral approach to discrete-time nonlinear systems in vector-valued reproducing kernel Hilbert spaces, and connects it to minimum-norm interpolation and subspace identification for data-driven modeling.

Comments 12 pages

Abstract

We generalize Jan Willems' behavioral approach to a class of discrete-time nonlinear systems in a vector-valued reproducing kernel Hilbert space (RKHS). Apart from linear time-invariant systems, this class covers nonlinear systems modeled by Volterra series and their autoregressive variants, as well as systems admitting Hammerstein-type state-space realizations. We apply the proposed framework to the problem of data-driven modeling of such systems, i.e., when simulation or control objectives for an unknown system are carried out without an explicit system identification step. To that end, we link the behavioral approach to two data-driven modeling methods in a vector-valued RKHS: (1) minimum-norm interpolation and (2) subspace identification.

2605.07046 2026-05-11 stat.ML cs.AI cs.LG

An Interpretable and Scalable Framework for Evaluating Large Language Models

Xinhao Qu, Qiang Heng, Hao Zeng, Xiaoqian Liu

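Affiliations: Department of Statistics, University of California, Riverside; School of Statistics and Data Science, Southeast University; Department of Statistics and Data Science, Southern University of Science and Technology

AI summary: This paper proposes an interpretable and scalable framework for evaluating large language models based on the majorization-minimization principle. Solving a sequence of constrained matrix factorization subproblems yields stable and efficient parameter estimation, and experiments show the method outperforms existing approaches in scalability and interpretability.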

Abstract

Evaluation of large language models (LLMs) is increasingly critical, yet standard benchmarking methods rely on average accuracy, overlooking both the inherent stochasticity of LLM outputs and the heterogeneity of benchmark items. Item Response Theory (IRT) offers a principled framework for modeling latent model abilities and item characteristics, but conventional methods are computationally expensive and numerically unstable, limiting large-scale implementations. To address these challenges, we propose an interpretable and scalable framework for LLM evaluation based on the majorization-minimization principle. Our approach reformulates the problem as a sequence of constrained matrix factorization subproblems, enabling stable and efficient parameter estimation with theoretical guarantees for identifiability and convergence. Experiments on synthetic and real-world datasets, including MATH-500 and six Open LLM Leaderboard benchmarks, demonstrate that our method achieves superior scalability and interpretability. It delivers orders-of-magnitude speedups over competing methods while maintaining comparable or even higher estimation accuracy. Our results align with established scaling laws and offer insights into item difficulty and discrimination, informing more principled benchmark design.
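
As background for the "item difficulty and discrimination" parameters mentioned in this abstract, the textbook two-parameter logistic (2PL) IRT model gives the probability that a model with a given latent ability answers an item correctly. The Python sketch below shows that response function only. Whether the paper uses exactly this parameterization is an assumption, the majorization-minimization estimation procedure is not reproduced, and the function name and example values are hypothetical.

```python
import math


def p_correct(ability, discrimination, difficulty):
    """2PL item response function: sigmoid(discrimination * (ability - difficulty))."""
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))


# One hypothetical model (ability 1.0) on an easy and a hard item
# that share the same discrimination.
easy = p_correct(ability=1.0, discrimination=2.0, difficulty=0.0)
hard = p_correct(ability=1.0, discrimination=2.0, difficulty=2.0)
print(round(easy, 3), round(hard, 3))
```

Average accuracy treats the easy and hard items as interchangeable, whereas this model separates how hard an item is from how sharply it distinguishes stronger from weaker models.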

2605.07034 2026-05-11 cs.CR cs.LG

Beyond the Wrapper: Identifying Artifact Reliance in Static Malware Classifiers using TRUSTEE

Riyazuddin Mohammed, Lan Zhang

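Affiliations: School of Informatics, Computing, and Cyber Systems, Northern Arizona University

AI summary: This paper proposes a two-part framework that combines the post-hoc interpretability tool TRUSTEE with manual feature analysis, and finds that static malware classifiers are sensitive to dataset composition and can mistake packing artifacts for malicious behavior.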

Abstract

Modern cybersecurity relies heavily on static machine-learning-based malware classifiers. However, transformations such as packing and other non-semantic modifications applied to executable files limit their reliability. Malware classifiers often learn these unnecessary artifacts rather than the true binary behavior because of the high association between maliciousness and packing. Moreover, these malware classifiers are black boxes, making it difficult to understand what they learn. To address this issue, we propose a two-part framework using the post-hoc interpretability XAI tool TRUSTEE, followed by a manual analysis of the top features. We conducted several controlled experiments by varying the dataset composition ratios to understand their impact on the results. The top-ranked features across all experiments, identified by TRUSTEE, were predominantly packing artifacts, portable executable (PE) metadata, and n-grams at the string level, rather than malicious semantics. These results suggest that these malware classifiers are highly sensitive to dataset composition and can misinterpret packing as malicious behavior. Our proposed framework allows for the reproducible diagnosis of such biases and provides guidance for building more robust and semantically meaningful malware detection models.