arXivDaily: arXiv daily academic digest, updated Monday to Friday
2508.18166 2026-06-15 cs.IR cs.LG (updated version)

PCR-CA: Parallel Codebook Representations with Contrastive Alignment for Multiple-Category App Recommendation

PCR-CA: 基于对比对齐的并行码本表示用于多类别应用推荐

Bin Tan, Wangyao Ge, Yidi Wang, Xin Liu, Jeff Burtoft, Hao Fan, Hui Wang

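Affiliations: Microsoft, Suzhou, China; Microsoft, Beijing, China; Microsoft, Redmond, WA, USA

AI summary: Proposes the PCR-CA framework, which learns discrete semantic representations of multiple-category apps through a Parallel Codebook VQ-AE module and combines a contrastive alignment loss with dual-attention fusion to improve CTR prediction, with especially strong gains for long-tail apps.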

Comments Accepted by KDD 2026, oral

AI Chinese abstract

现代应用商店推荐系统在处理多类别应用时面临挑战,因为传统分类法无法捕捉重叠语义,导致个性化效果不佳。我们提出PCR-CA(并行码本表示与对比对齐),一个用于改进CTR预测的端到端框架。PCR-CA首先从应用文本中提取紧凑的多模态嵌入,然后引入并行码本VQ-AE模块,该模块并行学习多个码本上的离散语义表示——不同于层次残差量化(RQ-VAE)。这种设计能够独立编码不同方面(如游戏玩法、艺术风格),更好地建模多类别语义。为了桥接语义信号和协同信号,我们在用户和项目层面采用对比对齐损失,增强长尾项目的表示学习。此外,双注意力融合机制结合了基于ID的特征和语义特征,以捕捉用户兴趣,特别是对于长尾应用。在大规模数据集上的实验表明,PCR-CA在强基线上实现了+0.76%的AUC提升,其中长尾应用的AUC增益达到+2.15%。在线A/B测试进一步验证了我们的方法,CTR提升+10.52%,CVR提升+16.30%,证明了PCR-CA在实际部署中的有效性。该新框架现已完全部署在Microsoft Store上。

English abstract

Modern app store recommender systems struggle with multiple-category apps, as traditional taxonomies fail to capture overlapping semantics, leading to suboptimal personalization. We propose PCR-CA (Parallel Codebook Representations with Contrastive Alignment), an end-to-end framework for improved CTR prediction. PCR-CA first extracts compact multimodal embeddings from app text, then introduces a Parallel Codebook VQ-AE module that learns discrete semantic representations across multiple codebooks in parallel -- unlike hierarchical residual quantization (RQ-VAE). This design enables independent encoding of diverse aspects (e.g., gameplay, art style), better modeling multiple-category semantics. To bridge semantic and collaborative signals, we employ a contrastive alignment loss at both the user and item levels, enhancing representation learning for long-tail items. Additionally, a dual-attention fusion mechanism combines ID-based and semantic features to capture user interests, especially for long-tail apps. Experiments on a large-scale dataset show PCR-CA achieves a +0.76% AUC improvement over strong baselines, with +2.15% AUC gains for long-tail apps. Online A/B testing further validates our approach, showing a +10.52% lift in CTR and a +16.30% improvement in CVR, demonstrating PCR-CA's effectiveness in real-world deployment. The new framework has now been fully deployed on the Microsoft Store.
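The contrast the abstract draws between parallel codebooks and hierarchical residual quantization can be illustrated with a small sketch. This is not the PCR-CA implementation: the function names, the nearest-neighbour assignment, and the choice to feed the same embedding to every codebook are assumptions made for illustration only.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest_code(z: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codebook entry closest to z (squared Euclidean distance)."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(z, codebook[i]))


def parallel_quantize(z: Vector, codebooks: Sequence[Sequence[Vector]]) -> List[int]:
    """Parallel scheme (assumed form): every codebook quantizes the same input independently."""
    return [nearest_code(z, cb) for cb in codebooks]


def residual_quantize(z: Vector, codebooks: Sequence[Sequence[Vector]]) -> List[int]:
    """Residual scheme, for contrast: each level quantizes what earlier levels left unexplained."""
    residual = list(z)
    codes = []
    for cb in codebooks:
        idx = nearest_code(residual, cb)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, cb[idx])]
    return codes


# Two toy codebooks with identical entries, used only to show the difference.
toy_codebooks = [[[0.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [1.0, 1.0]]]
parallel_codes = parallel_quantize([1.0, 1.0], toy_codebooks)
residual_codes = residual_quantize([1.0, 1.0], toy_codebooks)
```

In the parallel scheme no code depends on another, which is what lets each codebook specialise on a different aspect; in the residual scheme the second code only describes what the first one missed.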

2312.14889 2026-06-15 stat.ML cs.CR cs.LG math.ST stat.TH (updated version)

On Rate-Optimal Partitioning Classification from Observable and from Privatised Data

关于可观测数据和私有数据的最优划分分类方法

Balázs Csanád Csáji, László Györfi, Ambrus Tamás, Harro Walk

Affiliations: HUN-REN Institute for Computer Science and Control (SZTAKI); Department of Probability Theory and Statistics, Institute of Mathematics, Eötvös Loránd University (ELTE); Department of Computer Science and Information Theory, Budapest University of Technology and Economics (BME); Institute for Stochastics and Applications, University of Stuttgart

AI summary: Revisits partitioning classification and, under relaxed conditions (without the strong density assumption), derives convergence rates for the classification error probability on both observable and privatised data; the rates depend only on the intrinsic dimension of the continuous inputs.

AI Chinese abstract

在本文中,我们重新审视了划分分类的经典方法,并在宽松条件下证明了新的收敛速率,既适用于可观测(非私有化)数据,也适用于私有化数据。我们考虑在 $d$ 维欧几里得空间中的分类问题。先前关于划分分类器的结果依赖于强密度假设(SDA),我们通过简单示例表明该假设具有限制性。在此,我们在更温和的假设下研究该问题。我们预设输入分布是绝对连续分布和离散分布的混合,使得绝对连续分量集中在 $d_a$ 维子空间上。除了标准的 Lipschitz 和边际条件外,还引入了绝对连续分量的一个新特征,据此计算分类误差概率的收敛速率,包括二元和多类情况。该界可以达到使用 SDA 所能达到的极小极大最优收敛速率,但在更温和的分布假设下。有趣的是,该收敛速率仅依赖于连续输入的内在维度 $d_a$,而非 $d$。在隐私约束下,数据无法直接观测,构建的分类器是合适的局部差分隐私机制随机结果的函数。在本文中,我们将拉普拉斯分布噪声添加到特征向量所有可能位置的离散化及其标签中。再次,可以在不使用 SDA 的情况下推导出分类误差概率收敛速率的紧上界,使得该速率依赖于 $2d_a$。

English abstract

In this paper we revisit the classical method of partitioning classification and prove novel convergence rates under relaxed conditions, both for observable (non-privatised) and for privatised data. We consider the problem of classification in a $d$ dimensional Euclidean space. Previous results on the partitioning classifier worked with the strong density assumption (SDA), which is restrictive, as we demonstrate through simple examples. Here, we study the problem under much milder assumptions. We presuppose that the distribution of the inputs is a mixture of an absolutely continuous and a discrete distribution, such that the absolutely continuous component is concentrated on a $d_a$ dimensional subspace. In addition to the standard Lipschitz and margin conditions, a novel characteristic of the absolutely continuous component is introduced, by which the convergence rate of the classification error probability is computed, both for the binary and for the multi-class cases. This bound can reach the minimax optimal convergence rate achievable using SDA, but under much milder distributional assumptions. Interestingly, this convergence rate depends only on the intrinsic dimension of the continuous inputs, $d_a$, and not on $d$. Under privacy constraints, the data cannot be directly observed, and the constructed classifiers are functions of the randomised outcome of a suitable local differential privacy mechanism. In this paper we add Laplace distributed noises to the discretisations of all possible locations of the feature vector and to its label. Again, tight upper bounds on the convergence rate of the classification error probability can be derived, without using SDA, such that this rate depends on $2d_a$.
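For readers unfamiliar with the classical partitioning classifier that this abstract revisits, a minimal non-private sketch follows. It assumes a cubic partition with cell side `h` and a majority vote per cell; the function names and the `default` label for empty cells are hypothetical, and the Laplace-noise privatisation step described in the abstract is deliberately not implemented here.

```python
import math
from collections import Counter, defaultdict


def cell_of(x, h):
    """Map a point to the index of the cubic cell (side length h) that contains it."""
    return tuple(math.floor(xi / h) for xi in x)


def fit_partition_classifier(X, y, h):
    """Count the labels that fall into each cell of the partition."""
    votes = defaultdict(Counter)
    for x, label in zip(X, y):
        votes[cell_of(x, h)][label] += 1
    return votes


def predict(votes, x, h, default=0):
    """Predict by majority vote within the cell of x; fall back to `default` for empty cells."""
    counts = votes.get(cell_of(x, h))
    if not counts:
        return default
    return counts.most_common(1)[0][0]


# Toy data: three points in cell (0, 0) with labels 0, 0, 1 and one point in cell (1, 1).
X_train = [(0.1, 0.1), (0.2, 0.3), (0.9, 0.8), (0.3, 0.2)]
y_train = [0, 0, 1, 1]
model = fit_partition_classifier(X_train, y_train, h=0.5)
```

Calling `predict(model, (0.4, 0.4), h=0.5)` votes among the three training points in the same cell, while a query far from all data returns the fallback label.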

2506.06542 2026-06-15 stat.ML cs.LG (updated version)

Direct Fisher Score Estimation for Likelihood Maximization

直接Fisher得分估计用于似然最大化

Sherman Khoo, Yakun Wang, Song Liu, Mark Beaumont

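Affiliations: School of Mathematics, University of Bristol; School of Biological Sciences, University of Bristol

AI summary: For problems where the likelihood is intractable but model simulations are readily available, proposes a sequential gradient-based optimization method that models the Fisher score directly via local score matching, enabling fast and efficient likelihood maximization.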

AI Chinese abstract

我们研究当似然函数难以处理但模型模拟易于获得时的似然最大化问题。我们提出一种顺序的、基于梯度的优化方法,该方法基于局部得分匹配技术直接建模Fisher得分,该技术使用来自每个参数迭代周围局部区域的模拟。通过对代理得分模型采用线性参数化,我们的技术允许闭式最小二乘解。这种方法提供了一种快速、灵活且高效的Fisher得分近似,有效平滑了似然目标,并缓解了复杂似然景观带来的挑战。我们为得分估计器提供了理论保证,包括平滑引入的偏差界限。在一系列合成和真实世界问题上的实证结果表明,与现有基准相比,我们的方法具有优越的性能。

English abstract

We study the problem of likelihood maximization when the likelihood function is intractable but model simulations are readily available. We propose a sequential, gradient-based optimization method that directly models the Fisher score based on a local score matching technique which uses simulations from a localized region around each parameter iterate. By employing a linear parameterization to the surrogate score model, our technique admits a closed-form, least-squares solution. This approach yields a fast, flexible, and efficient approximation to the Fisher score, effectively smoothing the likelihood objective and mitigating the challenges posed by complex likelihood landscapes. We provide theoretical guarantees for our score estimator, including bounds on the bias introduced by the smoothing. Empirical results on a range of synthetic and real-world problems demonstrate the superior performance of our method compared to existing benchmarks.

2504.03686 2026-06-15 cs.NI cs.AI cs.LG (updated version)

Revisiting Outage for Edge Inference Systems

重新审视边缘推理系统的中断问题

Zhanwei Wang, Qunsong Zeng, Haotian Zheng, Kaibin Huang

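Affiliations: Department of Electrical and Computer Engineering, The University of Hong Kong

AI summary: Targeting the end-to-end reliability of edge inference systems, proposes an inference outage probability framework that quantifies the probability that inference accuracy falls below a threshold, and optimizes the tradeoff between communication overhead and inference reliability.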

AI Chinese abstract

第六代(6G)移动网络的关键任务之一是在网络边缘部署大规模人工智能(AI)模型,为边缘设备提供远程推理服务。由此产生的平台称为边缘推理,将支持广泛的物联网应用,如自动驾驶、工业自动化和增强现实。鉴于这些任务的关键性和时间敏感性,设计既可靠又能满足严格端到端(E2E)延迟约束的边缘推理系统至关重要。现有研究主要关注以信道中断概率为特征的通信可靠性,可能无法保证E2E性能,特别是在E2E推理精度和延迟方面。为解决这一局限,我们提出一个理论框架,引入并数学刻画了推理中断(InfOut)概率,该概率量化了E2E推理精度低于目标阈值的可能性。在E2E延迟约束下,该框架建立了通信开销(即上传更多传感器观测)与以InfOut概率量化的推理可靠性之间的基本权衡。为了找到优化这种权衡的可行方法,我们通过对接收判别增益的分布应用高斯近似,推导出InfOut概率的精确替代函数。实验结果表明,所提出的设计在E2E推理可靠性方面优于传统的以通信为中心的方法。

English abstract

One of the key missions of sixth-generation (6G) mobile networks is to deploy large-scale artificial intelligence (AI) models at the network edge to provide remote-inference services for edge devices. The resultant platform, known as edge inference, will support a wide range of Internet-of-Things applications, such as autonomous driving, industrial automation, and augmented reality. Given the mission-critical and time-sensitive nature of these tasks, it is essential to design edge inference systems that are both reliable and capable of meeting stringent end-to-end (E2E) latency constraints. Existing studies, which primarily focus on communication reliability as characterized by channel outage probability, may fail to guarantee E2E performance, specifically in terms of E2E inference accuracy and latency. To address this limitation, we propose a theoretical framework that introduces and mathematically characterizes the inference outage (InfOut) probability, which quantifies the likelihood that the E2E inference accuracy falls below a target threshold. Under an E2E latency constraint, this framework establishes a fundamental tradeoff between communication overhead (i.e., uploading more sensor observations) and inference reliability as quantified by the InfOut probability. To find a tractable way to optimize this tradeoff, we derive accurate surrogate functions for InfOut probability by applying a Gaussian approximation to the distribution of the received discriminant gain. Experimental results demonstrate the superiority of the proposed design over conventional communication-centric approaches in terms of E2E inference reliability.
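The Gaussian approximation mentioned in the abstract can be pictured with a generic sketch: if a quantity is treated as Gaussian, the probability that it falls below a threshold is a normal CDF. The function below is only that generic calculation; treating the outage event as "received discriminant gain below a threshold", and the parameter names, are assumptions, and the surrogate functions derived in the paper may take a different form.

```python
import math


def gaussian_outage_probability(mean_gain: float, std_gain: float, gain_threshold: float) -> float:
    """P(G < gain_threshold) for G ~ Normal(mean_gain, std_gain^2), via the error function."""
    if std_gain <= 0:
        raise ValueError("std_gain must be positive")
    z = (gain_threshold - mean_gain) / std_gain
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# A threshold equal to the mean gives an outage probability of one half.
p_at_mean = gaussian_outage_probability(mean_gain=4.0, std_gain=1.0, gain_threshold=4.0)
# A threshold three standard deviations below the mean gives a small probability.
p_low = gaussian_outage_probability(mean_gain=4.0, std_gain=1.0, gain_threshold=1.0)
```

Under this reading, uploading more observations would raise the mean gain and lower the outage probability, at the cost of more communication time within the latency budget.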

2504.16173 2026-06-15 cs.AR cs.AI (updated version)

FPGA-Based Neural Network Accelerators for Space Applications: A Survey

基于FPGA的神经网络加速器在空间应用中的综述

Pedro Antunes, Artur Podobas

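Affiliations: KTH Royal Institute of Technology

AI summary: Surveys FPGA-based neural network accelerators for space missions, analyzing existing literature, trends, and gaps, and proposes future research directions for improving onboard computing systems.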

Comments Manuscript under review at ACM CSUR. Pre-print updated after 1st Major Revision

AI Chinese abstract

空间任务正变得越来越雄心勃勃,需要高性能的星载计算系统。为此,现场可编程门阵列(FPGA)因其灵活性、成本效益和潜在的辐射容错能力而引起了广泛兴趣。同时,神经网络(NN)因其执行自主操作、传感器数据分析和数据压缩等空间任务的能力而受到认可。本综述为旨在空间应用中实现基于FPGA的NN加速器的研究人员提供了宝贵资源。通过分析现有文献、识别趋势和空白,并提出未来研究方向,本文强调了这些加速器在增强星载计算系统方面的潜力。

English abstract

Space missions are becoming increasingly ambitious, necessitating high-performance onboard spacecraft computing systems. In response, field-programmable gate arrays (FPGAs) have garnered significant interest due to their flexibility, cost-effectiveness, and radiation tolerance potential. Concurrently, neural networks (NNs) are being recognized for their capability to execute space mission tasks such as autonomous operations, sensor data analysis, and data compression. This survey serves as a valuable resource for researchers aiming to implement FPGA-based NN accelerators in space applications. By analyzing existing literature, identifying trends and gaps, and proposing future research directions, this work highlights the potential of these accelerators to enhance onboard computing systems.

2409.04843 2026-06-15 eess.AS cs.SD (updated version)

Leveraging Sound Source Trajectories for Universal Sound Separation

利用声源轨迹进行通用声音分离

Donghang Wu, Xihong Wu, Tianshu Qu

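Affiliations: National Key Laboratory of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University

AI summary: Proposes a method that exploits the mutual facilitation between sound source localization and separation, achieving accurate separation of moving sources through iterative tracking and beamforming.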

Comments Published in IEEE Transactions on Audio, Speech and Language Processing (TASLP)

Journal ref IEEE Transactions on Audio, Speech and Language Processing, vol. 33, pp. 2337-2348, 2025

AI Chinese abstract

现有利用空间信息进行声源分离的方法需要预先知道声源的到达方向(DOA)或使用估计但不精确的定位结果,这损害了分离性能,尤其是当声源移动时。实际上,声源定位和分离是相互关联的问题,即声源定位有助于声音分离,而声音分离有助于改进源定位。本文提出了一种利用声源定位与分离相互促进机制的方法,用于移动声源。所提方法包括三个阶段。第一阶段是初始跟踪,基于源信号包络估计从音频混合中跟踪每个声源。这些跟踪结果可能缺乏足够的精度。第二阶段涉及相互促进:使用初步的声源跟踪结果进行声音分离。随后,对分离信号进行声源跟踪,从而提高跟踪精度。改进的轨迹进一步提高分离性能。这种相互促进过程可以迭代多次。第三阶段,神经波束形成器基于改进的跟踪轨迹和多通道分离输出估计精确的单通道分离结果。在混响条件和移动声源下进行的仿真实验表明,所提方法能够基于改进的跟踪结果实现更精确的分离。

English abstract

Existing methods utilizing spatial information for sound source separation require prior knowledge of the direction of arrival (DOA) of the source or utilize estimated but imprecise localization results, which impairs the separation performance, especially when the sound sources are moving. In fact, sound source localization and separation are interconnected problems, that is, sound source localization facilitates sound separation while sound separation contributes to refined source localization. This paper proposes a method utilizing the mutual facilitation mechanism between sound source localization and separation for moving sources. The proposed method comprises three stages. The first stage is initial tracking, which tracks each sound source from the audio mixture based on the source signal envelope estimation. These tracking results may lack sufficient accuracy. The second stage involves mutual facilitation: Sound separation is conducted using preliminary sound source tracking results. Subsequently, sound source tracking is performed on the separated signals, thereby refining the tracking precision. The refined trajectories further improve separation performance. This mutual facilitation process can be iterated multiple times. In the third stage, a neural beamformer estimates precise single-channel separation results based on the refined tracking trajectories and multi-channel separation outputs. Simulation experiments conducted under reverberant conditions and with moving sound sources demonstrate that the proposed method can achieve more accurate separation based on refined tracking results.

2402.16388 2026-06-15 stat.ML cs.LG (updated version)

Leave-One-Out-, Bootstrap- and Cross-Conformal Anomaly Detectors

留一法、自助法和交叉共形异常检测器

Oliver Hennhöfer, Christine Preisach

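Affiliations: German Federal Ministry for Economic Affairs and Climate Action

AI summary: To address the shortage of calibration data in anomaly detection, proposes leave-one-out, bootstrap, and cross-conformal methods based on conformal prediction that improve data efficiency while controlling the Type I error rate.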

Comments Published in 2024 IEEE International Conference on Knowledge Graph (ICKG)

Journal ref Proc. 2024 IEEE ICKG 15(1): 110-119 (February 2025)

AI Chinese abstract

异常检测系统中不确定性量化的需求日益重要。在此背景下,有效控制这些系统的第一类错误率而不增加第二类错误率,可以建立信任并减少与错误发现相关的成本。共形异常检测领域通过模型校准提供统计和有限样本有效性保证,成为一种有前景的方法。然而,对校准数据的依赖带来了实际限制,尤其是在低数据场景中。在本工作中,我们基于共形预测领域的方法,正式定义并评估了用于共形异常检测的留一法、自助法和交叉共形方法。超越经典的拆分共形方法,我们展示了用于计算重抽样共形$p$值的派生方法在全共形(直推式)方法的数据效率与拆分共形(归纳式)方法的计算效率之间提供了实用的折衷。我们验证了派生方法,并量化了它们在一类分类器和数据集上的改进。

English abstract

The need for uncertainty quantification in anomaly detection systems has become increasingly important. In this context, effectively controlling Type I error rates without inflating Type II error rates in these systems can build trust and reduce costs associated with false discoveries. The field of conformal anomaly detection emerges as a promising approach for providing respective statistical and finite-sample validity guarantees through model calibration. However, reliance on calibration data imposes practical limitations, especially in low-data regimes. In this work, we formally define and evaluate leave-one-out-, bootstrap-, and cross-conformal methods for conformal anomaly detection, building on methods from the field of conformal prediction. Looking beyond the classical split-conformal approach, we show that derived methods for calculating resampling-conformal $p$-values offer a practical compromise between the data efficiency of full-conformal (transductive) approaches and the computational efficiency of split-conformal (inductive) methods. We validate derived methods and quantify their improvements for a range of one-class classifiers and datasets.
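The classical split-conformal baseline that the abstract uses as its point of comparison can be sketched in a few lines. This shows only the standard split-conformal $p$-value, assuming higher scores mean "more anomalous"; the function name is hypothetical, and the leave-one-out, bootstrap, and cross-conformal variants defined in the paper are not reproduced here.

```python
from typing import Sequence


def split_conformal_p_value(calibration_scores: Sequence[float], test_score: float) -> float:
    """Standard split-conformal p-value: rank of the test score among held-out normal scores.

    calibration_scores are anomaly scores of held-out normal data (higher = more anomalous).
    """
    n = len(calibration_scores)
    at_least_as_extreme = sum(1 for s in calibration_scores if s >= test_score)
    return (1 + at_least_as_extreme) / (n + 1)


calibration = [0.1, 0.2, 0.3, 0.4]
p_typical = split_conformal_p_value(calibration, 0.0)   # no more extreme than any calibration point
p_unusual = split_conformal_p_value(calibration, 1.0)   # more extreme than every calibration point
```

The smallest attainable value is $1/(n+1)$, which is why small calibration sets limit how strongly an anomaly can be flagged and why reusing data through resampling is attractive in low-data regimes.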

2305.07609 2026-06-15 cs.IR cs.CL cs.CY (updated version)

Is ChatGPT Fair for Recommendation? Evaluating Fairness in Large Language Model Recommendation

ChatGPT 在推荐中是否公平?评估大语言模型推荐的公平性

Jizhi Zhang, Keqin Bao, Yang Zhang, Wenjie Wang, Fuli Feng, Xiangnan He

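Affiliations: University of Science and Technology of China; National University of Singapore

AI summary: To examine possible bias in recommendation via LLM (RecLLM), proposes the fairness benchmark FaiRLLM, comprising carefully designed metrics and a dataset covering eight sensitive attributes; the evaluation finds that ChatGPT still exhibits unfairness in recommendation.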

Comments Accepted by RecSys 2023 (Short). Typo corrections

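AI abstract (translated from Chinese)

The remarkable achievements of large language models (LLMs) have given rise to a novel recommendation paradigm, recommendation via LLM (RecLLM). However, LLMs may contain social biases, so the fairness of recommendations made by RecLLM requires further study. To avoid the potential risks of RecLLM, it is necessary to evaluate its fairness with respect to various sensitive attributes on the user side. Because the RecLLM paradigm differs from the traditional recommendation paradigm, directly using the fairness benchmarks of traditional recommendation is problematic. To resolve this dilemma, we propose a new benchmark called Fairness of Recommendation via LLM (FaiRLLM). The benchmark comprises carefully designed metrics and a dataset that accounts for eight sensitive attributes in two recommendation scenarios, music and movies. Using our FaiRLLM benchmark, we evaluated ChatGPT and found that it still exhibits unfairness toward some sensitive attributes when generating recommendations. Our code and dataset can be found at https://github.com/jizhi-zhang/FaiRLLM.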

English abstract

The remarkable achievements of Large Language Models (LLMs) have led to the emergence of a novel recommendation paradigm -- Recommendation via LLM (RecLLM). Nevertheless, it is important to note that LLMs may contain social prejudices, and therefore, the fairness of recommendations made by RecLLM requires further investigation. To avoid the potential risks of RecLLM, it is imperative to evaluate the fairness of RecLLM with respect to various sensitive attributes on the user side. Due to the differences between the RecLLM paradigm and the traditional recommendation paradigm, it is problematic to directly use the fairness benchmark of traditional recommendation. To address the dilemma, we propose a novel benchmark called Fairness of Recommendation via LLM (FaiRLLM). This benchmark comprises carefully crafted metrics and a dataset that accounts for eight sensitive attributes in two recommendation scenarios: music and movies. By utilizing our FaiRLLM benchmark, we conducted an evaluation of ChatGPT and discovered that it still exhibits unfairness to some sensitive attributes when generating recommendations. Our code and dataset can be found at https://github.com/jizhi-zhang/FaiRLLM.

2112.04573 2026-06-15 cs.DL cs.AI cs.LG (updated version)

Application of Artificial Intelligence and Machine Learning in Libraries: A Systematic Review

人工智能与机器学习在图书馆中的应用:系统综述

Rajesh Kumar Das, Mohammad Sharif Ul Islam

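Affiliations: University of Nebraska - Lincoln; Noakhali Science and Technology University; University of Dhaka

AI summary: Through a systematic review of 32 articles, summarizes the application domains, techniques, and current state of AI and ML in libraries, finding that current research is mainly theoretical, with some work on practical case studies.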

AI Chinese abstract

随着人工智能和机器学习等前沿技术的概念和实施变得相关,学者、研究人员和信息专业人员涉足这一领域的研究。本系统文献综述旨在综合探讨人工智能和机器学习在图书馆中应用的实证研究。为实现研究目标,基于Kitchenham等人(2009)提出的原始指南进行了系统文献综述。数据来自Web of Science、Scopus、LISA和LISTA数据库。经过严格/既定的筛选过程,最终选定、审阅并分析了32篇文章,以总结图书馆中最常使用的AI和ML领域及技术。结果表明,当前与LIS领域相关的AI和ML研究主要集中于理论工作。然而,一些研究人员也强调了实施项目或案例研究。本研究将为研究人员、实践者和教育工作者提供图书馆中AI和ML的全景视图,以推动更多技术导向的方法,并预见未来的创新路径。

English abstract

As the concepts and implementations of cutting-edge technologies like artificial intelligence and machine learning have become relevant, academics, researchers and information professionals have taken up research in this area. The objective of this systematic literature review is to provide a synthesis of empirical studies exploring the application of artificial intelligence and machine learning in libraries. To achieve the objectives of the study, a systematic literature review was conducted based on the original guidelines proposed by Kitchenham et al. (2009). Data were collected from the Web of Science, Scopus, LISA and LISTA databases. Following a rigorous, established selection process, a total of thirty-two articles were selected, reviewed and analyzed to summarize the AI and ML domains and techniques most often used in libraries. Findings show that current AI and ML research relevant to the LIS domain mainly focuses on theoretical works. However, some researchers also emphasized implementation projects or case studies. This study will provide a panoramic view of AI and ML in libraries for researchers, practitioners and educators, furthering more technology-oriented approaches and anticipating future innovation pathways.

2603.20821 2026-06-15 cs.DC cs.AI cs.LG

Compass: Optimizing Compound AI Workflows for Dynamic Adaptation

Compass: 为动态适应优化复合AI工作流

Milos Gravara, Juan Luis Herrera, Stefan Nastic

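Affiliations: University of California, Berkeley; ETH Zurich

AI summary: Proposes the Compass framework, which dynamically switches the configuration of compound AI workflows through offline optimization and online adaptation, improving the balance among accuracy, latency, and cost.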

Comments 10 pages, 7 figures; accepted at the 26th IEEE International Symposium on Cluster, Cloud, and Internet Computing (CCGrid 2026)

Journal ref In Proceedings of the 26th IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 2026

AI Chinese abstract

复合AI是一种分布式智能方法,通过整合专用AI/ML模型与工程软件组件形成AI工作流。复合AI生产部署必须在变化负载下满足准确性、延迟和成本目标。然而,许多部署运行在固定基础设施上,无法水平扩展。现有方法仅优化准确性,未考虑负载变化。我们发现复合AI系统可切换配置以适应基础设施容量,根据当前负载在准确性与延迟之间进行权衡。这需要从组合搜索空间中发现多个帕累托最优配置,并在运行时确定切换时机。本文提出Compass框架,通过离线优化和在线适应实现动态配置切换。Compass包含三个组件:COMPASS-V算法用于配置发现,Planner用于切换策略推导,Elastico控制器用于运行时适应。COMPASS-V利用有限差分引导搜索和爬山与横向扩展结合的方法发现准确性可行的配置。Planner在目标硬件上对这些配置进行剖析,并利用基于排队理论的模型推导切换策略。Elastico监控队列深度并根据推导的阈值切换配置。在两个复合AI工作流中,COMPASS-V在减少57.5%的配置评估的同时实现100%召回率,效率提升达95.3%。运行时适应在动态负载模式下实现90-98%的SLO合规性,比静态高精度基线提升71.6%的SLO合规性,同时比静态快速基线提高3-5%的精度。

English abstract

Compound AI is a distributed intelligence approach that represents a unified system orchestrating specialized AI/ML models with engineered software components into AI workflows. Compound AI production deployments must satisfy accuracy, latency, and cost objectives under varying loads. However, many deployments operate on fixed infrastructure where horizontal scaling is not viable. Existing approaches optimize solely for accuracy and do not consider changes in workload conditions. We observe that compound AI systems can switch between configurations to fit infrastructure capacity, trading accuracy for latency based on current load. This requires discovering multiple Pareto-optimal configurations from a combinatorial search space and determining when to switch between them at runtime. We present Compass, a novel framework that enables dynamic configuration switching through offline optimization and online adaptation. Compass consists of three components: COMPASS-V algorithm for configuration discovery, Planner for switching policy derivation, and Elastico Controller for runtime adaptation. COMPASS-V discovers accuracy-feasible configurations using finite-difference guided search and a combination of hill-climbing and lateral expansion. Planner profiles these configurations on target hardware and derives switching policies using a queuing theory based model. Elastico monitors queue depth and switches configurations based on derived thresholds. Across two compound AI workflows, COMPASS-V achieves 100% recall while reducing configuration evaluations by 57.5% on average compared to exhaustive search, with efficiency gains reaching 95.3% at tight accuracy thresholds. Runtime adaptation achieves 90-98% SLO compliance under dynamic load patterns, improving SLO compliance by 71.6% over static high-accuracy baselines, while simultaneously improving accuracy by 3-5% over static fast baselines.
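The runtime step described in the abstract, monitoring queue depth and switching configurations at derived thresholds, can be sketched as a simple lookup. This is an illustrative guess at the control logic, not the Elastico controller itself: the function name, the configuration names, and the threshold values are all hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def select_config(queue_depth: int, thresholds: Sequence[int], configs: Sequence[str]) -> str:
    """Pick a configuration from the current queue depth.

    thresholds must be sorted in ascending order; configs are ordered from most accurate
    (used when the queue is short) to fastest (used when the queue is long), so there is
    exactly one more config than thresholds.
    """
    if len(configs) != len(thresholds) + 1:
        raise ValueError("need exactly one more config than thresholds")
    return configs[bisect_right(thresholds, queue_depth)]


# Hypothetical policy: switch to a cheaper configuration at depths 10 and 50.
policy_thresholds = [10, 50]
policy_configs = ["high_accuracy", "balanced", "fast"]
light_load = select_config(3, policy_thresholds, policy_configs)
heavy_load = select_config(120, policy_thresholds, policy_configs)
```

In this sketch the thresholds are fixed inputs; in the framework described above they would come from the Planner's profiling and queuing model rather than being chosen by hand.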

2503.15496 2026-06-15 cs.HC cs.RO

Fast Multi-Party Open-Ended Conversation with a Social Robot

快速多方开放性对话与社交机器人

Giulio Antonio Abbo, Maria Jose Pinto-Bernal, Martijn Catrycke, Tony Belpaeme

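Affiliations: University of Amsterdam

AI summary: Presents a multi-party conversational system combining multimodal perception with a large language model; the evaluation shows high engagement and accuracy in parallel conversations and group discussion, with technical limitations such as speech recognition errors and response latency.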

Comments 15 pages, 5 figures, 4 tables; 2 appendices

Journal ref Front. Robot. AI 13:1766383 (2026)

AI Chinese abstract

多方开放性对话在人机交互中仍是一个重大挑战,特别是当机器人需要识别说话者、分配发言权并在对话重叠或快速变化时保持连贯回应。本文提出一种多方对话系统,结合多模态感知(语音方向到达、说话人分离、面部识别)与大语言模型进行回应生成。在Furhat机器人上实现后,该系统在两个场景中对30名参与者进行了评估:(i)平行独立对话和(ii)共享小组讨论。结果表明,该系统能维持连贯且吸引人的对话,在平行设置中实现高收件人准确率(92.6%)和强面部识别可靠性(80-94%)。参与者报告了清晰的社会存在感和积极的参与度,尽管语音基于说话人识别错误和响应延迟等技术障碍影响了小组互动的流畅性。结果突显了基于LLM的多方交互的潜力和局限性,并概述了未来社交机器人改进多模态提示整合和响应能力的具体方向。

English abstract

Multi-party open-ended conversation remains a major challenge in human-robot interaction, particularly when robots must recognise speakers, allocate turns, and respond coherently under overlapping or rapidly shifting dialogue. This paper presents a multi-party conversational system that combines multimodal perception (voice direction of arrival, speaker diarisation, face recognition) with a large language model for response generation. Implemented on the Furhat robot, the system was evaluated with 30 participants across two scenarios: (i) parallel, separate conversations and (ii) shared group discussion. Results show that the system maintains coherent and engaging conversations, achieving high addressee accuracy in parallel settings (92.6%) and strong face recognition reliability (80-94%). Participants reported clear social presence and positive engagement, although technical barriers such as audio-based speaker recognition errors and response latency affected the fluidity of group interactions. The results highlight both the promise and limitations of LLM-based multi-party interaction and outline concrete directions for improving multimodal cue integration and responsiveness in future social robots.

2508.10827 2026-06-15 astro-ph.EP cs.LG

Accelerating exoplanet climate modelling: A machine learning approach to complement 3D GCM grid simulations

加速系外行星气候建模:一种机器学习方法用于补充3D GCM网格模拟

Alexander Plaschzug, Amit Reza, Ludmila Carone, Sebastian Gernjak, Christiane Helling

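Affiliations: Space Research Institute, Austrian Academy of Sciences; Institute for Theoretical Physics and Computational Physics, Graz University of Technology; Institute of Physics, University of Graz

AI summary: Uses machine learning to predict the 3D temperature and wind structure of exoplanets by training a neural network and a decision tree algorithm, providing an efficient tool for exoplanet climate modelling and improving the interpretation of observational data from space missions.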

Journal ref A&A Volume 706, February 2026

AI Chinese abstract

随着望远镜技术的发展,观测系外行星大气的能力不断增强,对更精确的3D气候模型需求增加。然而,通用环流模型(GCMs)计算密集且耗时,难以模拟多种系外行星大气。本文研究了机器学习算法能否预测任意潮汐锁定气态系外行星的3D温度和风结构。引入了一个新的3D GCM网格,模拟了60颗膨胀的热木星围绕A、F、G、K和M型恒星。通过训练密集神经网络(DNN)和决策树算法(XGBoost),预测局部气体温度及水平和垂直风。通过WASP-121 b、HATS-42 b、NGTS-17 b、WASP-23 b和NGTS-1 b等目标测试,验证了DNN预测气体温度的可靠性,所有但一个行星的光谱计算误差在32 ppm以内。所开发的机器学习模拟器能够可靠预测围绕A到M型恒星的膨胀温暖至超热潮汐锁定木星的3D温度场,为系外行星集合研究提供快速工具。预测质量足以保证对气体相化学、云形成和传输光谱的影响极小。

English abstract

With the development of ever-improving telescopes capable of observing exoplanet atmospheres in greater detail and number, there is a growing demand for enhanced 3D climate models to support and help interpret observational data from space missions like CHEOPS, TESS, JWST, PLATO, and Ariel. However, the computationally intensive and time-consuming nature of general circulation models (GCMs) poses significant challenges in simulating a wide range of exoplanetary atmospheres. This study aims to determine whether machine learning (ML) algorithms can be used to predict the 3D temperature and wind structure of arbitrary tidally-locked gaseous exoplanets in a range of planetary parameters. A new 3D GCM grid with 60 inflated hot Jupiters orbiting A, F, G, K, and M-type host stars modelled with ExoRad has been introduced. A dense neural network (DNN) and a decision tree algorithm (XGBoost) are trained on this grid to predict local gas temperatures along with horizontal and vertical winds. To ensure the reliability and quality of the ML model predictions, WASP-121 b, HATS-42 b, NGTS-17 b, WASP-23 b, and NGTS-1 b-like planets, which are all targets for PLATO observation, are selected and modelled with ExoRad and the two ML methods as test cases. The DNN predictions for the gas temperatures are accurate enough that the calculated spectra agree within 32 ppm for all but one planet, for which only a single HCN feature reaches a 100 ppm difference. The developed ML emulators can reliably predict the complete 3D temperature field of an inflated warm to ultra-hot tidally locked Jupiter around A to M-type host stars. They provide a fast tool to complement and extend traditional GCM grids for exoplanet ensemble studies. The quality of the predictions is such that no or minimal effects on the gas phase chemistry, hence on the cloud formation and transmission spectra, are to be expected.

2501.15196 2026-06-15 stat.ML cs.LG

A Review on Self-Supervised Learning for Time Series Anomaly Detection: Recent Advances and Open Challenges

时间序列异常检测中自监督学习的综述:最新进展与开放挑战

Aitor Sánchez-Ferrera, Borja Calvo, Jose A. Lozano

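Affiliations: University of the Basque Country UPV/EHU; Basque Center for Applied Mathematics (BCAM)

AI summary: Reviews recent self-supervised learning methods for time series anomaly detection, proposes a taxonomy for understanding their diversity, and provides a GitHub repository for future updates.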

AI Chinese abstract

时间序列异常检测面临诸多挑战,这源于时间依赖数据的序列性和动态性。传统无监督方法常在泛化能力上遇到困难,往往过度拟合训练期间观察到的已知正常模式,难以适应未见过的正常情况。为解决这一限制,时间序列的自监督技术引起了关注,作为克服这一障碍并提升异常检测器性能的潜在解决方案。本文综述了近期利用自监督学习进行时间序列异常检测的方法。提出了一种分类体系,根据其主要特征对这些方法进行分类,有助于清晰理解该领域内的多样性。本文调查中包含的信息,以及将定期更新的额外细节,可在以下GitHub仓库中找到:https://github.com/Aitorzan3/Awesome-Self-Supervised-Time-Series-Anomaly-Detection。

English abstract

Time series anomaly detection presents various challenges due to the sequential and dynamic nature of time-dependent data. Traditional unsupervised methods frequently encounter difficulties in generalization, often overfitting to known normal patterns observed during training and struggling to adapt to unseen normality. In response to this limitation, self-supervised techniques for time series have garnered attention as a potential solution to overcome this obstacle and enhance the performance of anomaly detectors. This paper presents a comprehensive review of the recent methods that make use of self-supervised learning for time series anomaly detection. A taxonomy is proposed to categorize these methods based on their primary characteristics, facilitating a clear understanding of their diversity within this field. The information contained in this survey, along with additional details that will be periodically updated, is available on the following GitHub repository: https://github.com/Aitorzan3/Awesome-Self-Supervised-Time-Series-Anomaly-Detection.

2606.14695 2026-06-15 cs.LG cs.CL (new submission)

Persona-Pruner: Sculpting Lightweight Models for Role-Playing

Persona-Pruner: 为角色扮演雕琢轻量级模型

Jinsu Kim, Jihoon Tack, Noah Lee, Jongheon Jeong

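AI summary: Proposes Persona-Pruner, a framework that prunes language models by isolating persona-specific sub-networks from a single description, greatly reducing computational cost while preserving role-playing performance; the performance drop is reduced by up to 93.8% relative to the strongest baseline.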

Comments 25 pages; ICML 2026; Code is available at https://github.com/jsu-kim/Persona-Pruner

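AI abstract (translated from Chinese)

Language models (LMs) have shown remarkable potential as role-playing chatbots, delivering consistent, stylized interactions when given a character or user persona specification. However, applying these capabilities to real-world applications (e.g., ecosystems in which numerous NPCs interact simultaneously) exposes a critical efficiency problem because of the excessive computational cost. In this paper, we question the necessity of dedicating a full, generalist model to a single persona, hypothesizing that a specific character identity relies on only a small fraction of the model's total capacity. We observe that naively pruning an LM usually severely degrades role-playing performance for a specific persona; it cannot distinguish redundant knowledge from essential character traits. We propose Persona-Pruner, a framework that sculpts a lightweight role-playing model by isolating persona-specific sub-networks from a single description. Our experiments consistently show that Persona-Pruner preserves role-playing performance far more effectively than existing state-of-the-art LLM pruning techniques, reducing the performance drop from the dense model by up to 93.8% relative to the strongest baseline on RoleBench under LLM-as-a-judge scoring, while still maintaining general LLM capabilities. Code is available at https://github.com/jsu-kim/Persona-Pruner.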

English abstract

Language Models (LMs) have shown remarkable potential as role-playing chatbots, delivering consistent, stylized interactions when given a specification of a character or user persona. However, applying these capabilities to real-world applications (e.g., ecosystems with numerous NPCs interacting simultaneously) exposes a critical inefficiency due to the excessive computational cost. In this paper, we question the necessity of dedicating a full, generalist model to a single persona, hypothesizing that a specific character identity relies on only a fraction of the model's total capacity. We observe that naively pruning LMs often severely degrades the role-playing performance for a specific persona; it does not distinguish between redundant knowledge and essential character traits. We propose Persona-Pruner, a framework that sculpts a lightweight role-playing model by isolating persona-specific sub-networks from a single description. Our experiments consistently show that Persona-Pruner preserves role-playing performance substantially more effectively than existing state-of-the-art LLM pruning techniques, reducing the performance drop from the dense model by up to 93.8% over the strongest baseline on RoleBench in LLM-as-a-judge score, while still maintaining general LLM capabilities. Code is available at https://github.com/jsu-kim/Persona-Pruner.

2606.14684 2026-06-15 cs.CV cs.LG (new submission)

HumP-KD: A Hybrid Uncertainty-Aware Multi-Stage Progressive Knowledge Distillation Framework for Efficient Fire Classification

HumP-KD: 一种混合不确定性感知的多阶段渐进式知识蒸馏框架用于高效火灾分类

Mohammed Arif Mainuddin, Najifa Tabassum, Omar Ibne Shahid, Riasat Khan

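AI summary: Proposes the HumP-KD framework, which uses hierarchical progressive knowledge distillation and multi-stage distillation to transfer knowledge from two frozen heterogeneous transformer teachers (Swin-Tiny and ViT-Base) and their ensemble into a lightweight MobileViT-S student, significantly improving fire classification performance while keeping a low parameter count and real-time inference speed.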

详情
AI中文摘要

实时火灾分类系统需要模型同时具备准确性、计算效率以及可在资源受限硬件上部署的能力。本文提出HumP-KD,一种混合不确定性感知的多阶段渐进式知识蒸馏框架,用于高效火灾分类。使用了两个数据集:FlameVision(8600张图像)和Dataset-II(31309张图像)。在标准预处理、在线增强、高斯噪声和运动模糊鲁棒性条件下,应用了多种CNN和Transformer基线模型。所提出的HumP-KD模型通过三个紧密集成的组件,将两个冻结的异构Transformer教师(Swin-Tiny和ViT-Base)及其Meta-MLP集成的知识蒸馏到轻量级MobileViT-S学生中。层次化渐进式知识蒸馏采用层次化特征构建器,生成融合的空间注意力掩码,以选择性地引导蒸馏到判别性区域。多阶段知识蒸馏在训练过程中逐步激活三个蒸馏阶段。在Dataset-II上,HumP-KD在10次独立试验中平均F1分数达到$0.9876 \pm 0.0063$,显著优于未使用蒸馏训练的MobileViT-S基线($0.9537 \pm 0.0351$),独立t检验($p = 0.0195$)和Wilcoxon符号秩检验($W = 1$,$p = 0.0039$)均证实了统计显著性。所提出的方法还展示了跨数据集的强泛化能力和在退化视觉条件下的鲁棒性。学生模型仅保留4.94M参数和19.01Mb模型大小,相比Swin-Tiny参数减少$5.7\times$,相比ViT-Base减少$17.5\times$,同时达到37.72 CPU FPS,适合实时部署。

英文摘要

Real-time fire classification systems require models that are simultaneously accurate, computationally efficient, and deployable on resource-constrained hardware. This work proposes HumP-KD, a Hybrid Uncertainty-aware Multi-stage Progressive Knowledge Distillation framework for efficient fire classification. Two datasets, FlameVision and Dataset-II, containing 8,600 and 31,309 images, are used. Various CNN and transformer baselines are applied under standard preprocessing, online augmentation, Gaussian noise and motion blur robustness conditions. The proposed HumP-KD model distills knowledge from two frozen heterogeneous transformer teachers, Swin-Tiny and ViT-Base, along with their Meta-MLP ensemble, into a lightweight MobileViT-S student via three tightly integrated components. Hierarchical Progressive Knowledge Distillation employs a Hierarchical Feature Builder that generates a fused spatial attention mask to selectively guide distillation toward discriminative regions. Multi-Stage Knowledge Distillation progressively activates three distillation stages across training. On Dataset-II, HumP-KD achieves a mean F1 score of $0.9876 \pm 0.0063$ across 10 independent trials, significantly outperforming the MobileViT-S baseline trained without distillation ($0.9537 \pm 0.0351$), with statistical significance confirmed by both independent t-test ($p = 0.0195$) and Wilcoxon signed-rank test ($W = 1$, $p = 0.0039$). The proposed method also demonstrates strong generalization across datasets and robustness under degraded visual conditions. The student model retains only 4.94M parameters and 19.01Mb model size, representing a $5.7\times$ parameter reduction over Swin-Tiny and a $17.5\times$ reduction over ViT-Base, while achieving 37.72 CPU FPS, making it suitable for real-time deployment.
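To make the two-teacher setup concrete, here is a minimal Python sketch of a temperature-softened distillation loss. It is an illustration under assumptions, not the authors' loss: the temperature, the equal teacher weighting, and all function names are hypothetical, and the attention-mask guidance and stage scheduling described in the abstract are omitted.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(student_logits, teacher_logits_list, temperature=4.0):
    """Average KL(teacher || student) over all teachers on softened outputs.

    Equal teacher weights and T=4 are assumptions for illustration.
    """
    student = softmax(student_logits, temperature)
    losses = [
        kl_divergence(softmax(t, temperature), student)
        for t in teacher_logits_list
    ]
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * sum(losses) / len(losses)

# Two-class example (fire / no fire) with two hypothetical teachers.
loss = distillation_loss([2.0, 0.5], [[3.0, 0.2], [2.5, 0.1]])
```

The loss is zero when the student's softened distribution matches every teacher and grows as the distributions diverge.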

2606.14606 2026-06-15 cs.RO cs.SY eess.SY 新提交

Impedance MPC with Disturbance Estimation for Dexterous Hand Control

用于灵巧手控制的阻抗MPC与扰动估计

Yongyan Cao

AI总结 提出一种执行器无关的阻抗模型预测控制框架,通过代数前馈将肌腱传动简化为常系数双积分器,结合编码器增强卡尔曼扰动估计,实现高精度轨迹跟踪与安全接触力控制。

详情
AI中文摘要

灵巧手必须同时跟踪精确的手指轨迹并保持安全、柔顺的接触——这对于任何固定增益控制器来说都是相互矛盾的目标。我们提出了一种执行器无关的灵巧手指阻抗模型预测控制(Impedance MPC)框架,实例化了为物理人机交互(pHRI)建立的恒定$A_d$无偏移架构;通过保留架构假设,其稳定性、递归可行性和输入-状态稳定性保证得以继承。代数前馈将肌腱传动——液压、缆绳、气动、扭绳或串联弹性——简化为常系数双积分器,因此QP代价逆矩阵可离线预计算,一个10步滚动时域二次规划以500 Hz运行,同时强制执行接触力(ISO/TS 15066)、驱动限制和加加速度的硬约束。仅使用编码器的增广卡尔曼扰动状态使任何恒定接触负载下的稳态误差为零。在液压驱动手指上——作为工作示例平台,增加了压力和空化约束——500 Hz卡尔曼MPC在1.5 Nm接触下实现了0.5 mrad RMS、0.1 mrad稳态和6.6 mrad峰值偏差:比经典阻抗分别好183倍、1500倍和23倍。实现的首次运动刚度(随更新率从18变化到323 Nm/rad)得到独立验证。该架构可扩展到16自由度LEAP Hand MuJoCo仿真,在0.7秒内从2.5 N抓取负载扰动中恢复。

英文摘要

Dexterous hands must simultaneously track precise finger trajectories and maintain safe, compliant contact -- objectives in tension for any fixed-gain controller. We present an actuator-agnostic Impedance Model Predictive Control (Impedance MPC) framework for dexterous fingers, instantiating the constant-$A_d$ offset-free architecture established for physical human-robot interaction (pHRI); its stability, recursive-feasibility, and input-to-state-stability guarantees are inherited by preserving the architectural assumptions. An algebraic feedforward reduces the tendon transmission -- hydraulic, cable, pneumatic, twisted-string, or series-elastic -- to a constant-coefficient double integrator, so the QP cost inverse is precomputed offline and a 10-step receding-horizon quadratic program runs at 500 Hz while enforcing hard constraints on contact force (ISO/TS 15066), actuation limits, and jerk. An encoder-only augmented-Kalman disturbance state drives steady-state error to zero under any constant contact load. On a hydraulically actuated finger -- the worked example platform, adding pressure and cavitation constraints -- the 500 Hz Kalman MPC attains 0.5 mrad RMS, 0.1 mrad steady-state, and 6.6 mrad peak deflection under 1.5 Nm contact: 183$\times$, 1500$\times$, and 23$\times$ better than classical impedance. The realized first-move stiffness (18$\to$323 Nm/rad with update rate) is independently verified. The architecture scales to a 16-DOF LEAP Hand MuJoCo simulation, recovering from 2.5 N grasp-load disturbances within 0.7 s.
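The constant-coefficient model in the abstract can be sketched as a discrete double integrator with one augmented disturbance state. The block assumes a unit-inertia joint whose acceleration is the input plus the disturbance, with zero-order-hold inputs at 500 Hz; the names `A_D`, `B_D`, `step`, and `simulate` and the simulated numbers are hypothetical, and neither the Kalman estimator nor the QP is shown.

```python
DT = 1.0 / 500.0  # 500 Hz update rate from the abstract

# Augmented state x = [angle, velocity, disturbance]; the disturbance is
# modeled as constant, so its row is [0, 0, 1].
A_D = [
    [1.0, DT, 0.5 * DT * DT],
    [0.0, 1.0, DT],
    [0.0, 0.0, 1.0],
]
B_D = [0.5 * DT * DT, DT, 0.0]

def step(x, u):
    """Advance the augmented double integrator by one sample."""
    return [
        sum(a * xi for a, xi in zip(row, x)) + b * u
        for row, b in zip(A_D, B_D)
    ]

def simulate(x0, u, n_steps):
    """Apply a constant input u for n_steps samples."""
    x = list(x0)
    for _ in range(n_steps):
        x = step(x, u)
    return x

# An input that cancels the disturbance leaves the joint at rest.
held = simulate([0.0, 0.0, 1.5], u=-1.5, n_steps=500)
# With no compensation, the same disturbance accelerates the joint for 1 s.
free = simulate([0.0, 0.0, 1.5], u=0.0, n_steps=500)
```

Because `A_D` and `B_D` never change, any matrix built from them can be computed once ahead of time, which is the property the abstract relies on for offline precomputation.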

2606.14504 2026-06-15 cs.CV 新提交

Scratched Lenses, Shifted Depth: Passive Camera-Side Optical Attacks

划痕镜头,深度偏移:被动式相机侧光学攻击

Qinlin He, Zeming Zhuang, Yongji Wu, Lan Zhang, Xiaoyong Yuan

AI总结 提出一种被动式镜头划痕攻击SLASH,通过光学伪影扭曲深度线索,在单目深度估计和3D目标检测中实现高达32%的相对深度偏移。

详情
AI中文摘要

视觉系统上的物理对抗攻击通常通过场景操纵进行研究,例如对抗性补丁或投影,其中攻击者控制相机观察的内容。使用贴纸或辅助光学的相机侧攻击也已被探索,但它们将攻击视为来自设计模式的图像空间扰动。这忽略了物理缺陷如何与场景相关的光照和光学相互作用。我们识别出一种威胁:被动的镜头侧损伤,它持久存在但具有触发条件,产生在特定视觉条件下偏置几何推理的光学伪影。我们通过划痕诱导的镜头对抗性条纹劫持(SLASH)实例化这种威胁,这是一种由相机镜头或保护罩上的小划痕引起的物理世界攻击。划痕与明亮光源和镜面反射相互作用,产生扭曲深度线索的结构化条纹伪影。由于扰动在光路中固定但由场景触发,它既持久又具有选择性。我们在光学空间中制定攻击,将划痕模式建模为触发条件光学通道,并优化一个固定配置以适应不同的观看条件。我们在数字和真实世界环境中评估SLASH对单目深度估计和单目3D目标检测的效果。在固定划痕约束下,单目深度估计的方向性深度偏移达到高达32%的相对误差,对单目3D目标检测具有一致的影响。物理实验证实了向真实相机记录的迁移,诱导的深度偏移高于模型的自然预测基线。这些发现揭示了一个攻击面,其中看似无害的硬件缺陷充当潜在的、场景触发的对抗机制,挑战了关于物理鲁棒性的假设,并激励了安全视觉系统的防御措施。

英文摘要

Physical adversarial attacks on vision systems are typically studied through scene manipulation, such as adversarial patches or projections, where the adversary controls what the camera observes. Camera-side attacks using stickers or auxiliary optics have also been explored, but they treat attacks as image-space perturbations from designed patterns. This misses how physical imperfections interact with scene-dependent lighting and optics. We identify a threat: passive lens-side damage that is persistent yet trigger-conditioned, producing optical artifacts that bias geometric inference under particular visual conditions. We instantiate this threat through Scratch-induced Lens Adversarial Streak Hijacking (SLASH), a physical-world attack caused by small scratches on a camera lens or protective cover. Scratches interact with bright light sources and specular reflections to create structured streak artifacts that distort depth cues. Since the perturbation is fixed in the optical path but triggered by the scene, it is both persistent and selective. We formulate the attack in optical space, model the scratch pattern as a trigger-conditioned optical channel, and optimize one fixed configuration across diverse viewing conditions. We evaluate SLASH on monocular depth estimation and monocular 3D object detection in digital and real-world settings. Under the fixed-scratch constraint, directional depth shifts reach up to 32% relative error for monocular depth estimation, with consistent effects on monocular 3D object detection. Physical experiments confirm transfer to real camera recordings, inducing depth shifts above the model's natural prediction baseline. These findings reveal an attack surface where benign-looking hardware imperfections act as latent, scene-triggered adversarial mechanisms, challenging assumptions about physical robustness and motivating defenses for secure vision systems.

2606.14257 2026-06-15 cs.CL 新提交

The Linguistics Olympiads: Towards a New Corpus for Linguistics Research?

语言学奥林匹克:语言学研究的语料库新方向?

Vlad A. Neacsu

AI总结 本文探讨语言学奥林匹克问题作为语言学语料库的潜力,分析其优势与局限,并提出在学术研究中负责任使用的标准。

Comments Accepted for publication in LingBaW. Linguistics Beyond and Within (Volume 12, 2026)

详情
AI中文摘要

语言学奥林匹克问题(LOPs)是一类自足的谜题,包含一个缩小的语料库,代表特定的语言现象,解题者需从中推断出语言的基本规则集,然后翻译一组新元素。语言学奥林匹克(LOs)已成为全球现象,有43个不同地区参加2025年国际语言学奥林匹克(IOL)。尽管LOPs的类型和解题策略已被分析,但其科学层面及与学术语言学的联系尚待探索。LOPs直接关联许多语言学领域,如语言类型学、语言相对论和语言学田野调查。最近,LOPs作为大语言模型的基准成为研究焦点,凸显其在计算语言学中的实用性。然而,它们尚未被纳入主流语言学研究。本文试图通过提供LOPs作为语言数据源的结构化评估,开辟将这类特殊谜题纳入学术研究的新方向,并提出在学术研究中负责任使用的标准。基于超过1800个LOPs的数据集,本研究批判性地考察了LOPs作为语言学新语料库的潜力,讨论了它们作为工具的优势和局限性,以及这些谜题可能适用的语言学领域。这项工作为更广泛的倡议奠定了基础,旨在通过为LOPs建立稳健的理论框架,弥合LOs与学术语言学之间的鸿沟。

英文摘要

Linguistics olympiad problems (LOPs) are a category of self-sufficient puzzles consisting of a scaled-down corpus representative of certain linguistic phenomena, from which the solver must deduce a primitive set of rules of the language and then translate a new set of elements. The linguistics olympiads (LOs) have become a worldwide phenomenon with 43 different territories taking part in the International Linguistics Olympiad (IOL) 2025. While the typology and solving strategies of LOPs have been analysed, their scientific facet and connections to academic linguistics have yet to be explored. LOPs are directly connected to many linguistic fields, e.g., linguistic typology, linguistic relativity, and linguistics fieldwork. Recently, LOPs have become a research focus as benchmarks for large language models, thus highlighting their usefulness in computational linguistics. Nevertheless, they have not yet been integrated into mainstream linguistics research. This paper attempts to open new directions of including this particular type of puzzle in academic research by offering a structured evaluation of LOPs as linguistic data sources and proposes criteria for their responsible use in academic research. Starting from a set of over 1800 LOPs, this study critically examines the potential of LOPs as a novel corpus for linguistics research by discussing their strengths and limitations as tools, as well as the areas of linguistics into which these problems could fit. This work forms the foundation for a broader initiative aimed at bridging the gap between LOs and academic linguistics, by establishing a robust theoretical framework for LOPs.

2606.13991 2026-06-15 cs.CL 新提交

Fusing Stylometric and Embedding Systems to Estimate Authorship Likelihood Ratios in Japanese

融合文体特征与嵌入系统以估计日语作者身份似然比

Praju Ghatpande, Satoru Tsuge, Shunichi Ishihara, Wataru Zaitsu, Mitsuyuki Inaba

AI总结 首次将似然比框架应用于日语数字文本,通过融合文体特征系统与基于嵌入的系统,提高了似然比量值和判别能力,最佳融合的log-likelihood-ratio成本为0.32484。

详情
AI中文摘要

似然比框架被广泛认为是法医学证据分析的逻辑和法律基础,其在文本证据的作者身份分析中的重要性日益得到认可。然而,迄今为止,其应用仅限于英语文本。同时,作者身份归属传统上依赖于多种文体特征,而预训练大语言模型的兴起使得新的上下文嵌入方法成为可能。通过融合这些不同方法有望提升性能,但尚未在似然比范式下应用于整合文体特征系统与基于嵌入的系统。本研究首次将基于似然比的法医文本比较应用于日语数字文本,使用约1000字符的博客摘录,以1)评估系统性能和似然比量级,2)评估融合文体特征系统与基于嵌入的系统的影响。结果表明,融合系统在保持优秀校准的同时,1)增加了与事实一致的似然比量级;2)减少了与事实相反的似然比量级;3)提高了整体判别能力。最佳融合实现了0.32484的对数似然比成本,既证明了似然比框架在日语中的可行性,也展示了异构系统融合的优势。

英文摘要

The likelihood ratio framework is widely recognized as the logically and legally sound basis for evidential analysis across forensic sciences, and its importance is increasingly acknowledged in analyses of authorship in textual evidence. To date, however, its application has been confined to English-language texts. Meanwhile, authorship attribution has traditionally relied on a diverse array of stylometric features, even as the rise of pre-trained large language models enables new contextual-embedding approaches. Combining these diverse approaches through fusion promises enhanced performance, yet it has not been applied to integrate stylometric-feature systems with embedding-based systems within the likelihood ratio paradigm. This study is the first to apply likelihood ratio-based forensic text comparison to Japanese digital texts, using ~1,000-character excerpts from blogs, to 1) evaluate system performance and likelihood ratio magnitudes and 2) assess the impact of fusing stylometric-feature systems with embedding-based systems. The results demonstrate that the fused system maintains excellent calibration while 1) increasing consistent-with-fact likelihood ratio magnitudes; 2) decreasing contrary-to-fact likelihood ratio magnitudes; and 3) improving overall discriminability. The best-performing fusion achieved a log-likelihood-ratio cost of 0.32484, illustrating both the feasibility of the likelihood ratio framework for Japanese and the benefits of fusion across heterogeneous systems.
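For readers unfamiliar with the reported metric, the log-likelihood-ratio cost can be computed as below. This is a sketch of the standard definition; the function name and the likelihood ratios in the example are hypothetical and are not data from the study.

```python
import math

def cllr(same_author_lrs, different_author_lrs):
    """Log-likelihood-ratio cost from likelihood ratios of known-truth trials."""
    # Same-author trials are penalized when the LR is small.
    same_term = sum(math.log2(1.0 + 1.0 / lr) for lr in same_author_lrs)
    # Different-author trials are penalized when the LR is large.
    diff_term = sum(math.log2(1.0 + lr) for lr in different_author_lrs)
    return 0.5 * (
        same_term / len(same_author_lrs)
        + diff_term / len(different_author_lrs)
    )

# An uninformative system (every LR equal to 1) scores exactly 1.
baseline = cllr([1.0, 1.0], [1.0, 1.0])
# Hypothetical LRs that point the right way score below 1.
example = cllr([20.0, 8.0, 3.0], [0.05, 0.2, 0.5])
```

Lower values are better: a cost below 1 means the system carries useful information, and strongly misleading likelihood ratios push the cost above 1.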

2606.13884 2026-06-15 cs.AI 新提交

Capability Minimization as a Safety Primitive: Risk-Aware Causal Gating for Least-Privilege LLM Agents

能力最小化作为安全原语:风险感知因果门控实现最小特权LLM代理

Laxmipriya Ganesh Iyer, Rahul Suresh Babu

AI总结 提出风险感知因果门控(RACG)框架,通过因果效应估计与校准风险控制决定是否采纳模型预测,显著降低高成本错误,同时保持非门控策略的大部分效用。

详情
AI中文摘要

现代决策系统越来越依赖学习组件,其输出可能自信但错误,导致下游行动面临代价高昂的错误。我们引入风险感知因果门控(RACG),该框架通过结合因果效应估计与校准风险控制,决定是否对模型预测采取行动、推迟或放弃。RACG对从候选行动到结果的因果路径进行建模,并根据估计的反事实风险而非原始预测置信度对每个决策进行门控。为使门控可靠,我们推导了在高风险条件下行动概率的分布无关界限,并展示了这些界限如何转化为满足用户指定安全约束的操作阈值。我们进一步提出一种自适应门控策略,通过监测预测结果与实际结果之间的差异来适应分布偏移,在因果假设看似被违反时收紧门控。在模拟干预和真实世界决策基准测试中,RACG大幅减少了高成本错误,同时保留了非门控策略的大部分效用,并且在匹配的弃权率下优于基于置信度和选择性预测的基线方法。我们的结果表明,明确分离因果风险与预测不确定性可以产生更安全、更透明的决策系统,为高风险场景中的可信自动化提供了一种原则性机制。

英文摘要

Modern decision systems increasingly rely on learned components whose outputs may be confident yet wrong, exposing downstream actions to costly errors. We introduce Risk-Aware Causal Gating (RACG), a framework that decides whether to act on, defer, or abstain from a model's prediction by combining causal effect estimation with calibrated risk control. RACG models the causal pathway from candidate actions to outcomes and gates each decision according to an estimated counterfactual risk rather than raw predictive confidence. To make gating reliable, we derive distribution-free bounds on the probability of acting under high-risk conditions and show how these bounds translate into operating thresholds that satisfy user-specified safety constraints. We further propose an adaptive gating policy that adjusts to distribution shift by monitoring discrepancies between predicted and realized outcomes, tightening the gate when causal assumptions appear violated. Across simulated interventions and real-world decision benchmarks, RACG reduces high-cost errors substantially while preserving most of the utility of an ungated policy, and it outperforms confidence-based and selective-prediction baselines at matched abstention rates. Our results indicate that explicitly separating causal risk from predictive uncertainty yields decision systems that are both safer and more transparent, offering a principled mechanism for trustworthy automation in high-stakes settings.

2604.24117 2026-06-15 cs.AI 版本更新

An Analysis of the Coordination Gap between Joint and Modular Learning for Job Shop Scheduling with Transportation Resources

带运输资源的作业车间调度中联合学习与模块化学习协调差距分析

Moritz Link, Jonathan Hoss, Noah Klarmann

AI总结 通过资源稀缺性和时间主导性分析,量化联合训练与模块化训练在带运输资源的作业车间调度中的性能差距,发现联合训练在多数情况下更优,但在瓶颈环境下差距缩小。

Comments Supported by the Chips Joint Undertaking and its members, including top-up funding by National Authorities, within the Cynergy4MIE project (Grant Agreement No. 101140226). This work has been submitted to the IEEE for possible publication

详情
AI中文摘要

带运输资源的高效作业车间调度对于高性能制造至关重要。随着“去中心化工厂”的兴起,多智能体强化学习已成为生产与运输任务联合调度的一种有前景的方法。先前的工作主要集中于开发新颖的合作架构,而忽视了何时需要联合训练的问题。联合训练指同时训练作业和自动导引车调度智能体,而模块化训练则涉及独立训练每个智能体后进行事后集成。在本研究中,我们系统地调查了在带运输资源的作业车间调度问题中,联合训练对于最优性能至关重要的条件。通过对资源稀缺性和时间主导性的严格敏感性分析,我们量化了协调差距——这两种训练模式之间的性能差异。在我们的评估中,联合训练优于大多数调度规则组合和模块化训练方法。然而,在瓶颈环境中,特别是在严重的运输和处理约束下,协调差距的优势会减弱。这些发现表明,在单个调度任务占主导地位的环境中,模块化训练是一种可行的替代方案。总体而言,我们的工作为根据环境条件选择训练模式提供了实用指导,使决策者能够优化基于强化学习的调度性能。

英文摘要

Efficient job-shop scheduling with transportation resources is critical for high-performance manufacturing. With the rise of "decentralized factories", multi-agent reinforcement learning has emerged as a promising approach for the combined scheduling of production and transportation tasks. Prior work has largely focused on developing novel cooperative architectures while overlooking the question of when joint training is necessary. Joint training denotes the simultaneous training of job and automatic guided vehicle scheduling agents, whereas modular training involves independently training each agent followed by post-hoc integration. In this study, we systematically investigate the conditions under which joint training is essential for optimal performance in the job-shop scheduling problem with transportation resources. Through a rigorous sensitivity analysis of resource scarcity and temporal dominance, we quantify the coordination gap -- the performance difference between these two training modalities. In our evaluation, joint training outperforms the majority of dispatching rule combinations and modular training approaches. However, the coordination gap advantage diminishes in bottleneck environments, particularly under severe transport and processing constraints. These findings indicate that modular training represents a viable alternative in environments where a single scheduling task dominates. Overall, our work provides practical guidance for selecting between training modalities based on environmental conditions, enabling decision-makers to optimize reinforcement learning-based scheduling performance.

2510.02695 2026-06-15 cs.LG cs.AI 版本更新

RAMAC: Multimodal Risk-Aware Offline Reinforcement Learning and the Role of Behavior Regularization

RAMAC: 多模态风险感知离线强化学习及行为正则化的作用

Kai Fukazawa, Kunal Mundada, Iman Soltani

AI总结 提出RAMAC框架,结合分布性评论家与生成式演员(如扩散模型),通过条件风险价值与行为克隆的复合目标实现离线强化学习中的风险敏感学习,抑制分布外动作并提升CVaR。

Comments ICML 2026

详情
AI中文摘要

在安全关键领域中,当在线数据收集不可行时,离线强化学习(RL)只有在策略能够实现高回报且避免灾难性的下尾风险时才具有吸引力。先前关于风险厌恶离线RL的工作在实现安全性时,要么以(i)基于值/模型的悲观主义为代价,要么以(ii)限制表达能力的受限策略类为代价,而基于扩散/流的表达性生成策略大多只用于风险中性设置。我们引入了风险感知多模态演员-评论家(RAMAC),一个简单、模块化、无模型的框架,它将表达性生成演员(例如扩散/流)与分布性评论家相结合,并优化一个结合条件风险价值(CVaR)与行为克隆(BC)的复合目标,从而在复杂的多模态场景中实现风险敏感学习。由于分布外(OOD)动作是离线RL中灾难性失败的主要驱动因素,我们进一步提供了一个目标层面的分析,表明通过BC控制行为发散可以抑制OOD动作并稳定CVaR。使用扩散演员实例化RAMAC,我们在二维风险赌博机上展示了这些见解,并在Stochastic-D4RL上进行了评估,观察到在保持高回报的同时,$\mathrm{CVaR}_{0.1}$的一致提升。代码和实验结果可在项目网站(https://kaifukazawa.github.io/ramac-project/)上获取。

英文摘要

In safety-critical domains where online data collection is infeasible, offline reinforcement learning (RL) is attractive only if policies achieve high returns without catastrophic lower-tail risk. Prior work on risk-averse offline RL achieves safety at the cost of either (i) value/model-based pessimism or (ii) restricted policy classes that limit expressiveness, whereas diffusion/flow-based expressive generative policies have largely been used in risk-neutral settings. We introduce Risk-Aware Multimodal Actor-Critic (RAMAC), a simple, modular, model-free framework that couples an expressive generative actor (e.g., diffusion/flow) with a distributional critic and optimizes a composite objective that combines Conditional Value-at-Risk (CVaR) with behavioral cloning (BC), enabling risk-sensitive learning in complex multimodal scenarios. Since out-of-distribution (OOD) actions are a major driver of catastrophic failures in offline RL, we further provide an objective-level analysis showing that controlling behavior divergence via BC suppresses OOD actions and stabilizes CVaR. Instantiating RAMAC with a diffusion actor, we illustrate these insights on a 2-D risky bandit and evaluate on Stochastic-D4RL, observing consistent gains in $\mathrm{CVaR}_{0.1}$ while maintaining strong returns. The code and experimental results are available on the project website (https://kaifukazawa.github.io/ramac-project/).
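The risk measure named in the abstract can be illustrated with an empirical estimate. This is a minimal sketch, assuming lower-tail CVaR is estimated as the mean of the worst alpha-fraction of sampled returns; the additive objective, `bc_weight`, and the sample returns are hypothetical and are not the paper's formulation.

```python
def cvar(returns, alpha=0.1):
    """Mean of the worst alpha-fraction of returns (lower-tail CVaR)."""
    ordered = sorted(returns)
    k = max(1, int(alpha * len(ordered)))  # keep at least one sample in the tail
    tail = ordered[:k]
    return sum(tail) / len(tail)

def composite_objective(returns, bc_loss, alpha=0.1, bc_weight=1.0):
    """Loss to minimize: negative CVaR plus a weighted behavior-cloning term.

    The additive form and bc_weight are assumptions for illustration.
    """
    return -cvar(returns, alpha) + bc_weight * bc_loss

# Hypothetical returns: mostly good outcomes with one catastrophic episode.
returns = [10.0, 9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, -20.0]
mean_return = sum(returns) / len(returns)
tail_risk = cvar(returns, alpha=0.1)
```

Here the mean return is positive while the 10% tail estimate equals the single catastrophic episode, which is the gap a risk-sensitive objective is meant to expose.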

2505.16077 2026-06-15 cs.LG 版本更新

Ensembling Sparse Autoencoders

集成稀疏自编码器

Soham Gadgil, Chris Lin, Su-In Lee

AI总结 针对单个稀疏自编码器只能捕获激活空间中有限特征的问题,提出通过朴素Bagging和Boosting集成多个SAE,理论证明可降低重构误差,实验表明集成方法在重构质量、稳定性和下游任务上优于扩展单个SAE。

Comments Accepted to ICML 2026

详情
AI中文摘要

稀疏自编码器(SAEs)用于将神经网络激活分解为人类可解释的特征。通常,单个SAE学习到的特征用于下游应用。然而,最近研究表明,单个SAE只能捕获从激活空间中提取的特征的有限子集。受此限制,我们引入并形式化了SAE集成。此外,我们提出通过朴素Bagging和Boosting集成多个SAE。在朴素Bagging中,集成使用不同权重初始化训练的SAE;而在Boosting中,集成顺序训练以最小化残差误差的SAE。理论上,朴素Bagging和Boosting被证明是减少重构误差的方法。实证上,我们在三种语言模型和SAE架构设置下评估了我们的集成方法。我们的实证结果表明,与匹配集成中特征数量的扩展SAE相比,集成SAE改善了语言模型激活的重构以及SAE稳定性。此外,在概念检测和虚假相关性去除等下游任务中,SAE集成实现了更好的性能,显示出改进的实际效用。

英文摘要

Sparse autoencoders (SAEs) are used to decompose neural network activations into human-interpretable features. Typically, features learned by a single SAE are used for downstream applications. However, it has recently been shown that a single SAE captures only a limited subset of features that can be extracted from the activation space. Motivated by this limitation, we introduce and formalize SAE ensembles. Furthermore, we propose to ensemble multiple SAEs through naive bagging and boosting. In naive bagging, SAEs trained with different weight initializations are ensembled, whereas in boosting, SAEs sequentially trained to minimize the residual error are ensembled. Theoretically, naive bagging and boosting are justified as approaches to reduce reconstruction error. Empirically, we evaluate our ensemble approaches with three settings of language models and SAE architectures. Our empirical results demonstrate that, compared to an expanded SAE that matches the number of features in the ensemble, ensembling SAEs improves the reconstruction of language model activations along with SAE stability. Additionally, on downstream tasks such as concept detection and spurious correlation removal, SAE ensembles achieve better performance, showing improved practical utility.

2508.18967 2026-06-15 cs.RO cs.CV

Enhanced UAV Path Planning Using the Tangent Intersection Guidance (TIG) Algorithm

利用切线交点引导算法(TIG)增强的无人机路径规划

Hichem Cheriet, Khellat Kihel Badra, Chouraqui Samira

AI总结 本文提出TIG算法,通过椭圆切线交点方法生成可行路径,结合启发式规则和二次贝塞尔曲线平滑技术,在静态和动态环境中实现高效安全的无人机路径规划。

Comments Accepted for publication in JAMRIS Journal

Journal ref Journal of Automation, Mobile Robotics and Intelligent Systems, 20(2), 30-52 (2026)

详情
AI中文摘要

高效且安全的无人机导航对于各种应用至关重要,包括战斗支援、包裹递送和搜索救援。本文介绍了切线交点引导(TIG)算法,一种用于静态和动态环境中的无人机路径规划的先进方法。该算法使用椭圆切线交点方法生成可行路径。它为每个威胁生成两条子路径,根据启发式规则选择最佳路线,并迭代优化路径,直到达到目标。考虑到无人机的运动学和动力学约束,采用基于二次贝塞尔曲线的改进平滑技术生成平滑且高效的路径。实验结果表明,在静态环境中,与A*、PRM、RRT*、切线图和静态APPATT算法相比,TIG算法能够以更短的时间(最低0.01秒)生成最短路径,且转向角度更少。此外,在完全未知和部分已知环境中,TIG展示了高效的实时避障路径规划能力,优于APF和动态APPATT算法。

英文摘要

Efficient and safe navigation of Unmanned Aerial Vehicles (UAVs) is critical for various applications, including combat support, package delivery, and search and rescue operations. This paper introduces the Tangent Intersection Guidance (TIG) algorithm, an advanced approach for UAV path planning in both static and dynamic environments. The algorithm uses the elliptic tangent intersection method to generate feasible paths. It generates two sub-paths for each threat, selects the optimal route based on a heuristic rule, and iteratively refines the path until the target is reached. Considering the UAV kinematic and dynamic constraints, a modified smoothing technique based on quadratic Bézier curves is adopted to generate a smooth and efficient route. Experimental results show that the TIG algorithm can generate the shortest path in less time, starting from 0.01 seconds, with fewer turning angles compared to A*, PRM, RRT*, Tangent Graph, and Static APPATT algorithms in static environments. Furthermore, in completely unknown and partially known environments, TIG demonstrates efficient real-time path planning capabilities for collision avoidance, outperforming APF and Dynamic APPATT algorithms.
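The smoothing step builds on quadratic Bézier curves, which can be sketched as follows. This shows only the standard curve evaluation; the `smooth_corner` helper, its choice of control points, and the waypoints are hypothetical and do not reproduce the modified technique from the paper.

```python
def quadratic_bezier(p0, p1, p2, t):
    """Point on the quadratic Bezier curve with control points p0, p1, p2."""
    a = (1.0 - t) ** 2
    b = 2.0 * (1.0 - t) * t
    c = t ** 2
    return tuple(a * x0 + b * x1 + c * x2 for x0, x1, x2 in zip(p0, p1, p2))

def smooth_corner(prev_pt, corner, next_pt, samples=5):
    """Replace a sharp waypoint with samples of a Bezier arc.

    Using the corner itself as the middle control point is an assumption
    for illustration; the curve starts at prev_pt and ends at next_pt.
    """
    return [
        quadratic_bezier(prev_pt, corner, next_pt, i / (samples - 1))
        for i in range(samples)
    ]

# A sharp turn at (1, 1) becomes an arc that passes below the corner.
arc = smooth_corner((0.0, 0.0), (1.0, 1.0), (2.0, 0.0))
```

The arc keeps the original endpoints but cuts inside the corner, trading a small deviation from the waypoint for a path without an abrupt heading change.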

2606.14565 2026-06-15 cs.CE cs.LG physics.comp-ph 新提交

CANN-EUCLID: unsupervised constitutive artificial neural network model discovery from full-field data

CANN-EUCLID:基于全场数据的无监督本构人工神经网络模型发现

Benjamin Alheit, Siddhant Kumar, Mathias Peirlinck

AI总结 提出CANN-EUCLID框架,结合本构人工神经网络与无监督全场数据发现方法,直接从位移场和反作用力识别稀疏超弹性本构模型,无需应力测量或预设模型。

详情
AI中文摘要

本构人工神经网络(CANN)提供了可解释的材料模型发现方法,但迄今为止仅用于基于均匀测试的表观应力-应变数据的应力监督设置。由于每个测试仅采样狭窄的加载路径并提供均匀化而非局部应力信息,稳健的发现通常需要多种加载模式来约束多维响应。这对于软生物组织具有挑战性,因为重复测试、损伤和样本变异性限制了单个标本的可靠信息。在这里,我们将CANN与应力无监督的全场发现框架EUCLID相结合,直接从位移场和反作用力中识别稀疏超弹性本构律,仅需一个诱导异质性的加载案例。CANN-EUCLID通过稀疏促进正则化最小化平衡不平衡,选择紧凑的活跃项,无需局部应力测量或预设本构律。我们在具有预设真实本构律的各向同性和各向异性基准上评估了该方法。当真实本构律可由所选CANN基表示时,我们的方法以近乎精确的精度恢复正确项,包括带有嵌入参数的指数项。当真实本构律不包含在基中时,该方法保留共享项并使用可用基函数近似缺失贡献。泛化能力强烈依赖于采样的变形状态:当充分探测时,指数应变硬化项可以准确恢复,但当硬化区域位于采样域之外时,可能产生较大的外推误差。正向有限元验证模拟表明,发现的行为准确复制了真实本构律。这些结果确立了应力无监督CANN发现作为可解释的全场本构模型识别的有前景框架。

英文摘要

Constitutive artificial neural networks (CANNs) provide interpretable material model discovery, but have so far been used in stress-supervised settings based on apparent stress-strain data from homogeneous tests. Because each test samples only a narrow loading path and provides homogenized rather than local stress information, robust discovery typically requires multiple loading modes to constrain the multidimensional response. This is challenging for soft biological tissues, where repeated testing, damage, and sample variability limit reliable information from a single specimen. Here, we combine CANNs with the stress-unsupervised full-field discovery framework EUCLID to identify sparse hyperelastic laws directly from displacement fields and reaction forces in one heterogeneity-inducing loading case. CANN-EUCLID minimizes equilibrium imbalance with sparsity-promoting regularization selecting compact active terms, without local stress measurements or a prescribed law. We evaluate the approach on isotropic and anisotropic benchmarks with prescribed ground-truth laws. When the ground truth is representable by the chosen CANN basis, our method recovers the correct terms with near-exact accuracy, including exponential terms with embedded parameters. When it is not contained in the basis, the method retains shared terms and approximates missing contributions using available basis functions. Generalization depends strongly on sampled deformation states: exponential strain-stiffening terms can be recovered accurately when sufficiently probed, but can produce large extrapolation errors when the stiffening regime lies outside the sampled domain. Forward FE validation simulations show that the discovered behavior accurately replicates the ground truth. These results establish stress-unsupervised CANN discovery as a promising framework for interpretable full-field constitutive model identification.

2606.14350 2026-06-15 cs.DC cs.AI 新提交

Design Methodology and Performance Trade-offs Management for Distributed and Compound AI Systems

分布式与复合AI系统的设计方法论及性能权衡管理

Milos Gravara, Andrija Stanisic, Stefan Nastic

AI总结 提出从模型中心转向系统中心的设计方法论,通过工作流拓扑和配置选择两个维度组织设计空间,识别八种设计模式以克服单体部署局限,实验表明复合AI配置在接近精度同时显著降低延迟和成本。

详情
AI中文摘要

人工智能系统通常必须满足包括准确性、延迟和成本在内的服务级别目标。当前以模型为中心的方法在设计时选择单一模型,并对所有输入应用相同的计算,无法将任务分解到专门组件,且知识在训练时固定。在运行时,这可能导致性能下降和成本增加。由于模型是主要设计变量,它决定了系统的大部分行为,将操作目标耦合到单一设计时选择。解决这些限制需要从以模型为中心转向以系统为中心的设计。复合AI系统通过显式控制逻辑将多个模型、算法和工具编排为分布式AI系统,实现了这一转变。此类系统的性能取决于其工作流拓扑、分配给每个任务的模型以及控制运行时行为的参数。我们提出了一种设计方法论,沿工作流拓扑和配置选择两个维度组织这一空间,并识别出八种设计模式,每种模式整合了解决单体部署特定限制的技术。我们通过三个案例研究验证了该方法论。在我们的案例研究中,复合AI配置的准确性接近单体模型(相差2.5至4个百分点),同时延迟降低高达60%,成本降低高达71%。我们表明模型选择和参数配置共同决定系统性能,但随着工作流组合更多模式和组件,产生的设计空间呈组合增长。因此,我们识别出五个开放挑战,这些挑战定义了从手动配置原型到自动发现并维护复合与分布式AI系统中SLO合规性的系统的路线图。

英文摘要

Artificial Intelligence (AI) systems must typically satisfy service-level objectives including accuracy, latency, and cost. The prevailing model-centric approaches select a monolithic model at design time and apply identical computation regardless of input difficulty, cannot decompose tasks across specialized components, and have knowledge that is fixed at training time. During runtime, this can lead to performance degradation and increasing costs. Because the model is the main design variable, it determines the majority of system behavior, coupling operational objectives to a single design-time choice. Addressing these limitations requires shifting from model-centric to system-centric design. Compound AI systems realize this shift by orchestrating multiple models, algorithms, and tools as distributed AI systems through explicit control logic. The performance of such systems depends on their workflow topology, the models assigned to each task, and the parameters governing runtime behavior. We present a design methodology that organizes this space along two dimensions, workflow topology and configuration selection, and identifies eight design patterns, each consolidating techniques to address a specific limitation of monolithic deployment. We validate our methodology through three case studies. Across our case studies, Compound AI configurations approach accuracy of monolithic models within 2.5 to 4 percentage points while reducing latency by up to 60% and cost by up to 71%. We show that model selection and parameter configuration jointly determine system performance, but the resulting design space grows combinatorially, as workflows compose more patterns and components. Thus, we identify five open challenges that define a roadmap from manually configured prototypes towards systems that automatically discover and maintain SLO-compliance in Compound and Distributed AI systems.

2606.12918 2026-06-15 cs.CR cs.AI 新提交

MAStrike: Shapley-Guided Collusive Red-Teaming on Multi-Agent Systems

MAStrike: 基于Shapley值的多智能体系统合谋红队测试

Chejian Xu, Zhaorun Chen, Jingyang Zhang, Freddy Lecue, Avni Kothari, Sarah Tan, Wenbo Guo, Bo Li

AI总结 提出MAStrike框架,通过Shapley值分析识别多智能体系统中脆弱智能体联盟,生成角色感知的对抗攻击,并迭代优化以绕过防御,显著优于启发式基线。

详情
AI中文摘要

分层多智能体系统(MAS)正迅速部署在金融和软件工程等高危工作流中。在这些系统中,安全本质上是分布在不同角色智能体上的,显著扩大了攻击面,特别是在特权提升和跨智能体合谋等协调对抗行为下。现有的MAS红队测试方法仍然有限:它们依赖启发式选择目标智能体并扰动孤立的消息流,留下了关键问题未解答,即哪些智能体对系统安全最负责,以及受损智能体如何协调以绕过防御。我们提出MAStrike,一个用于分层MAS中合谋红队测试的闭环框架。我们首次提出针对MAS的智能体级Shapley值分析,量化每个智能体在任务特定分布下对系统鲁棒性的边际贡献。在此归因指导下,MAStrike识别脆弱智能体联盟并生成协调的、角色感知的对抗操纵。这些攻击通过结构化因果诊断迭代优化,将失败案例归因于阻止对抗尝试的未受损智能体。我们进一步构建了全面的MAS红队测试基准和可控环境,涵盖不同的分层拓扑和领域,包括金融、软件工程和CRM。在多个前沿模型构建的MAS上进行的广泛实验表明,MAStrike显著优于启发式基线。我们的分析进一步揭示了智能体间非平凡的Shapley值分布和高阶交互结构,揭示了先前单智能体或基于模板的方法忽略的关键漏洞和协调模式。

英文摘要

Hierarchical multi-agent systems (MAS) are rapidly being deployed in high-stakes workflows across domains such as finance and software engineering. In these systems, safety and security are inherently distributed across role-specialized agents, significantly expanding the attack surface, particularly under coordinated adversarial behaviors such as privilege escalation and cross-agent collusion. Existing red-teaming approaches for MAS remain limited: they rely on heuristic selection of target agents and perturb isolated message streams, leaving critical questions unanswered, such as which agents are most responsible for system safety and how compromised agents can coordinate to bypass defenses. We propose MAStrike, a closed-loop framework for collusive red-teaming in hierarchical MAS. We propose the first agent-level Shapley value analysis for MAS, quantifying each agent's marginal contribution to system robustness under task-specific distributions. Guided by this attribution, MAStrike identifies vulnerable agent coalitions and generates coordinated, role-aware adversarial manipulations. These attacks are iteratively refined through structured causal diagnosis, attributing failure cases to uncompromised agents that block adversarial attempts. We further build a comprehensive MAS red-teaming benchmark and controllable environments spanning diverse hierarchical topologies and domains, including finance, software engineering, and CRM. Extensive experiments across MAS built on multiple frontier models show that MAStrike substantially outperforms heuristic baselines. Our analysis further uncovers non-trivial Shapley value distributions and higher-order interaction structures among agents, revealing critical vulnerabilities and coordination patterns that are overlooked by prior single-agent or template-based methods.
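Agent-level Shapley attribution can be illustrated on a toy coalition game. This is a minimal sketch using exact enumeration over orderings, which is only practical for a handful of agents; the agent names and the `attack_success` value function are hypothetical, and the paper's actual value function and estimation procedure are not shown.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values by averaging marginal contributions over orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining the current coalition.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}

# Hypothetical game: value is the attack success rate when a set of agents
# is compromised. The attack needs the planner and is stronger with the
# executor; the reviewer never changes the outcome.
def attack_success(compromised):
    if "planner" not in compromised:
        return 0.0
    return 0.9 if "executor" in compromised else 0.3

phi = shapley_values(["planner", "executor", "reviewer"], attack_success)
```

In this toy game the planner receives the largest share, the executor a smaller one, and the reviewer zero, and the shares add up to the success rate of compromising all three agents.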

2605.18250 2026-06-15 physics.data-an cs.LG 版本更新

A Unified Framework for Structured Flow Modeling: From Representation to Verification and Model Discovery

结构化流建模的统一框架:从表示到验证与模型发现

Diego Casadei

AI总结 提出一个统一框架,通过连接Helmholtz-Hodge分解与离散及数据驱动表示,实现结构化流的建模,并引入跨域验证策略以评估模型复杂度、可解释性和预测性能之间的权衡。

Comments 26 pages, 1 figure

详情
AI中文摘要

许多动力系统可以用结合源/汇行为、循环动力学和拓扑约束输运的结构化流来描述。这些特征广泛出现在物理系统、工程系统和数据驱动系统中。本工作的目标是为这类系统建立统一视角,识别在表达力、可解释性、计算复杂度和数据需求之间取得平衡的建模方法,并研究如何利用高表达性模型揭示观测动力学背后的主导机制。从连续向量场的Helmholtz-Hodge分解出发,我们回顾了最近提出的图向量场(GVF)框架及其在单纯复形上的离散表示。然后,我们引入了一系列替代方法,包括参数条件模型、线性图动力系统和约化Hodge表示。最后,我们提出了一种验证与确认方法论,其基础是来自已被充分理解的物理系统的基准数据集,以及系统性的模型约简和消融研究。由此得到的结构化流模型族处于同一框架之内,涵盖从低维参数表示到完整GVF表述的各类模型,并支持一种诊断方法论:通过消融研究系统地评估梯度、旋度、调和及拓扑贡献。这一过程能够识别观测动力学背后的主导机制,并指导构建适应可用数据和运行约束的简化模型。通过区分结构验证、行为验证和特定领域确认,所提出的方法为跨多个应用领域对复杂动力系统进行可扩展、可解释的分析提供了基础。

英文摘要

Many dynamical systems can be described in terms of structured flows combining source/sink behavior, cyclic dynamics, and topology-constrained transport. These features arise across a wide range of physical, engineered, and data-driven systems. The objective of this work is to establish a unified perspective on such systems, to identify modeling approaches that balance expressivity, interpretability, computational complexity, and data requirements, and to investigate how highly expressive models can be used to uncover the dominant mechanisms underlying observed dynamics. Starting from the Helmholtz-Hodge decomposition of continuous vector fields, we review the recently proposed Graph Vector Field (GVF) framework and its discrete representation on simplicial complexes. We then introduce a hierarchy of alternative approaches, including parametric conditional models, linear graph dynamical systems, and reduced Hodge representations. Finally, we propose a verification and validation methodology based on benchmark datasets from well-understood physical systems and on systematic model-reduction and ablation studies. The resulting family of structured-flow models within a common framework, ranging from low-dimensional parametric representations to full GVF formulations, supports a diagnostic methodology in which gradient, curl, harmonic, and topological contributions are systematically assessed through ablation studies. This process enables the identification of dominant mechanisms underlying the observed dynamics and guides the construction of simplified models tailored to the available data and operational constraints. By separating structural verification, behavioral verification, and domain-specific validation, the proposed approach provides a foundation for scalable and interpretable analysis of complex dynamical systems across multiple application domains.
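
For orientation, the discrete Hodge decomposition that underlies the gradient, curl, and harmonic split on a simplicial complex can be written as follows. The notation is the standard one from discrete Hodge theory and is an assumption here, since the abstract does not fix symbols: \(f\) is a flow on the edges, \(B_1\) is the node-to-edge incidence matrix, \(B_2\) is the edge-to-triangle incidence matrix, \(\phi\) is a node potential, and \(\psi\) is a triangle potential.

\[
f = \underbrace{B_1^{\top}\phi}_{\text{gradient}} + \underbrace{B_2\,\psi}_{\text{curl}} + \underbrace{h}_{\text{harmonic}},
\qquad B_1 h = 0, \quad B_2^{\top} h = 0 .
\]

The three components are mutually orthogonal because \(B_1 B_2 = 0\), which is what allows each contribution to be removed separately in an ablation study of the kind the abstract describes.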

2602.02355 2026-06-15 cs.DC cs.IT cs.LG math.IT Updated version

Mitigating Heterogeneity-Induced Drift in Hierarchical Sign-Based Federated Learning

Mitigating Heterogeneity-Induced Drift in Hierarchical Sign-Based Federated Learning

Amirreza Kazemi, Seyed Mohammad Azimi-Abarghouyi, Gabor Fodor, Carlo Fischione

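AI summary To address the model drift caused by inter-cluster data heterogeneity in hierarchical federated learning, proposes the DC-HierSignSGD algorithm, which mitigates the resulting bias through cloud-assisted gradient correction and improves accuracy while preserving binary communication.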

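Details
AI abstract (translated from Chinese)

Hierarchical federated learning (HFL) is well suited to large-scale wireless and Internet of Things systems, where devices communicate with nearby edge servers before reaching the cloud. In these settings, uplink bandwidth and latency impose strict communication constraints, making aggressive gradient compression necessary. One-bit sign-based stochastic gradient descent methods offer an attractive solution in flat federated settings, but their behavior in hierarchical edge-cloud architectures is still not well understood, especially under inter-cluster data heterogeneity. To fill this gap, we develop a sign-based HFL framework in which devices transmit binary stochastic-gradient signs to edge servers, edge servers apply majority voting, and the cloud periodically aggregates the edge models. Our analysis shows that inter-cluster heterogeneity introduces a persistent bias term in the convergence bound, reflecting the drift of edge models toward local objectives. This term cannot be eliminated by increasing the number of training rounds or by tuning standard hyperparameters alone. We therefore propose \(\mathtt{DC\text{-}HierSignSGD}\), a drift-corrected sign-based HFL algorithm in which devices apply a cloud-assisted gradient correction before taking the sign. We show that this pre-sign correction mitigates the non-vanishing heterogeneity-induced bias while preserving binary device-edge communication during the repeated local sign-update steps. Experiments under severe inter-cluster heterogeneity show that \(\mathtt{DC\text{-}HierSignSGD}\) improves the stability and accuracy of sign-based HFL and achieves performance comparable to full-precision hierarchical SGD with substantially less device-edge communication.

English abstract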

Hierarchical federated learning (HFL) is well suited for large-scale wireless and Internet of Things systems, where devices communicate with nearby edge servers before reaching the cloud. In these environments, uplink bandwidth and latency impose strict communication constraints, making aggressive gradient compression essential. One-bit sign-based stochastic gradient descent methods provide an attractive solution in flat federated settings, but their behavior in hierarchical edge--cloud architectures remains insufficiently understood, especially under inter-cluster data heterogeneity. To address this gap, we develop a sign-based HFL framework in which devices transmit binary stochastic-gradient signs to edge servers, edge servers apply majority voting, and the cloud periodically aggregates edge models. Our analysis reveals that inter-cluster heterogeneity induces a persistent bias term in the convergence bound, reflecting the drift of edge models toward local objectives. This term cannot be removed by increasing the number of training rounds or by tuning standard hyperparameters alone. We therefore propose \(\mathtt{DC\text{-}HierSignSGD}\), a drift-corrected sign-based HFL algorithm in which devices apply a cloud-assisted gradient correction before taking the sign. We show that this pre-sign correction mitigates the non-vanishing heterogeneity-induced bias while preserving binary device--edge communication during the repeated local sign-update steps. Experiments under severe inter-cluster heterogeneity demonstrate that \(\mathtt{DC\text{-}HierSignSGD}\) improves the stability and accuracy of sign-based HFL and achieves performance comparable to full-precision hierarchical SGD with substantially lower device--edge communication.
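
The device-to-edge step described above (devices send one-bit gradient signs, the edge server takes a coordinate-wise majority vote) can be sketched as below. This is an illustration under stated assumptions, not the paper's algorithm: the function names are hypothetical, ties and zeros are mapped to +1 by convention, and the additive form of `correction` is an assumption, since the abstract only says that a cloud-assisted correction is applied before taking the sign.

```python
def sign(x):
    """One-bit sign; mapping zero to +1 is an assumed tie-breaking convention."""
    return 1 if x >= 0 else -1

def device_sign(grad, correction=None):
    """Sign of the (optionally corrected) local gradient.

    The additive `correction` term is a hypothetical stand-in for the
    cloud-assisted correction; its actual form is not given in the abstract.
    """
    if correction is None:
        correction = [0.0] * len(grad)
    return [sign(g + c) for g, c in zip(grad, correction)]

def edge_majority_vote(sign_vectors):
    """Coordinate-wise majority vote over the sign vectors sent by devices."""
    return [sign(sum(column)) for column in zip(*sign_vectors)]

def edge_step(model, sign_vectors, lr):
    """One sign-based update of the edge model using the voted direction."""
    vote = edge_majority_vote(sign_vectors)
    return [w - lr * v for w, v in zip(model, vote)]

# Three devices with two-dimensional stochastic gradients (made-up numbers).
grads = [[0.2, -1.0], [0.5, 0.3], [-0.1, -0.4]]
signs = [device_sign(g) for g in grads]
vote = edge_majority_vote(signs)
new_model = edge_step([0.0, 0.0], signs, lr=0.1)
corrected = device_sign([0.2, -1.0], correction=[-0.5, 2.0])
```

Only the sign vectors cross the device-to-edge link, so each device uploads one bit per coordinate. The last line shows how a correction applied before the sign can flip the transmitted bits without changing what is sent over the link.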

2606.14704 2026-06-15 astro-ph.EP New submission

The Edges of Planetary Systems: Falling Off the Kuiper Cliff in a Dissipating Gas Disk

The Edges of Planetary Systems: Falling Off the Kuiper Cliff in a Dissipating Gas Disk

Rixin Li, Eugene Chiang

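AI summary Uses 1D time-dependent models to study how a gas disk dispersing from the inside out leaves behind a planetesimal disk with a cliff-like outer edge, reproducing the Kuiper Cliff and the surface density of the Cold Classical Kuiper belt.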

Comments Submitted to AAS Journals. Comments welcome

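Details
AI abstract (translated from Chinese)

The last planetesimals to form in the solar nebula were most likely the Cold Classical Kuiper belt objects (CCKBOs). Because they have remained isolated and unchanged since birth, CCKBOs offer direct insight into nebular processes. Their number density drops abruptly at a heliocentric radius of about 47 au, a feature known as the "Kuiper Cliff". Using global, 1D (radial), time-dependent models, we show how gaseous protoplanetary disks dispersed by magnetic and photoevaporative winds leave behind planetesimal disks with cliff-like outer edges. The gas disperses from the inside out, forming transitional disks whose inner cavities expand from less than 1 au to more than 100 au. Gas at the cavity boundary presents a pressure maximum toward which dust particles drift, triggering the streaming instability, which clumps dust into planetesimals large enough to decouple from the gas. The receding cavity wall thus lays down a planetesimal disk that is truncated when the dust and gas run out. Without fine-tuning, we show how a generic gas disk clearing from the inside out reproduces the Kuiper Cliff and the CCKB surface density. Connecting these global 1D results with published local 3D simulations of dust and gas, we see how many properties of the CCKB (its radial extent, total mass, individual object sizes, and binary statistics) follow from the streaming instability operating in a late-stage transitional disk.

English abstract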

Probably the last planetesimals to have formed from dust in the solar nebula are Cold Classical Kuiper belt objects (CCKBOs). To the extent that they are isolated and unchanged since birth, CCKBOs offer direct insights into nebular processes. Their population density drops abruptly beyond a heliocentric radius of $\sim$47 au, a feature known as the "Kuiper Cliff". We show with global, 1D (radial), time-dependent models how gaseous protoplanetary disks that disperse from magnetic and photoevaporative winds leave behind planetesimal disks with Cliff-like outer edges. The gas disperses from the inside out, creating transitional disks whose inner cavities expand from $\lesssim$ 1 au to $\gtrsim$ 100 au. Gas at the cavity boundary presents a pressure maximum toward which dust particles drift, triggering the streaming instability which clumps dust into planetesimals massive enough to decouple from gas. The receding cavity wall thus paves a disk of planetesimals which truncates when dust and gas are spent. With no fine-tuning, we show how a generic gas disk clearing from the inside out reproduces the Kuiper Cliff and the CCKB surface density. Connecting these global 1D results with published local 3D simulations of dust and gas, we see how many properties of the CCKB -- its radial extent, total mass, individual object sizes, and binary statistics -- follow from the streaming instability at work in a late-stage transition disk.