arXivDaily: daily arXiv academic digest, updated Monday through Friday
All subject categories: 2075
2605.07029 2026-05-11 stat.ML cs.AI cs.LG stat.ME

BGM-IV: an AI-powered Bayesian generative modeling approach for instrumental variable analysis

Guyue Luo, Qiao Liu

Affiliations: Department of Biostatistics

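AI summary: By constructing a causally structured latent space, BGM-IV offers an effective approach to nonlinear instrumental-variable estimation with high-dimensional covariates.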

Abstract

Instrumental-variable (IV) regression enables causal estimation under endogeneity, but modern IV problems often involve nonlinear structural effects and high-dimensional covariates. Existing nonlinear IV methods directly learn the causal relation in observed feature space or rely on learned representations within two-stage or moment-based procedures, which can struggle when the causal information is embedded in a high-dimensional representation. We propose BGM-IV, a latent Bayesian generative modeling approach that reframes nonlinear IV regression as posterior inference in a causally structured latent space. BGM-IV infers latent components that separately capture shared confounding structure, outcome-specific variation, treatment-specific variation, and covariate-only nuisance information. To account for endogeneity, BGM-IV replaces the confounded outcome likelihood with an IV-integrated pseudo-likelihood that averages over instrument-induced treatment values within the latent model. Across various benchmark datasets, BGM-IV remains competitive in the classical low-dimensional regime and performs best in high-dimensional covariate regimes. Together, these results show that structured latent generative modeling provides a principled and effective strategy for nonlinear IV estimation with rich covariates. The code of BGM-IV is available at https://github.com/liuq-lab/BGM-IV.
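To make the idea of an IV-integrated pseudo-likelihood concrete, here is a minimal Monte Carlo sketch. It assumes a Gaussian outcome model y ~ N(f(t), sigma^2) and a sampler `draw(z, rng)` standing in for the instrument-induced treatment distribution p(t | z). Both `f` and `draw` are hypothetical placeholders. BGM-IV's actual latent model also conditions on its inferred latent components, which this sketch omits.

```python
import math
import random


def gaussian_logpdf(y, mean, sigma):
    """Log density of N(mean, sigma^2) evaluated at y."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (y - mean) ** 2 / (2 * sigma ** 2)


def iv_pseudo_loglik(y, z, f, draw_treatment, sigma=1.0, n_samples=200, seed=0):
    """Hypothetical IV-integrated pseudo-log-likelihood for one observation.

    Rather than scoring y at the observed (confounded) treatment, the outcome
    likelihood N(y; f(t), sigma^2) is averaged over treatments t drawn from an
    instrument-induced distribution p(t | z), approximated by Monte Carlo.
    """
    rng = random.Random(seed)
    logs = [gaussian_logpdf(y, f(draw_treatment(z, rng)), sigma) for _ in range(n_samples)]
    # log-mean-exp: log of the average likelihood, computed stably
    m = max(logs)
    return m + math.log(sum(math.exp(v - m) for v in logs) / n_samples)


def f(t):
    return 2.0 * t  # hypothetical structural function


def draw(z, rng):
    return z + rng.gauss(0.0, 0.5)  # hypothetical first stage p(t | z)


print(iv_pseudo_loglik(y=1.2, z=0.5, f=f, draw_treatment=draw))
```

Averaging likelihoods rather than log-likelihoods corresponds to integrating the treatment out. When `draw` is deterministic, the sketch collapses to the ordinary log-likelihood at t = draw(z).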

2605.07026 2026-05-11 q-bio.NC cs.AI cs.LG

Learning Cross-Atlas Consistent Brain Disorder Representations via Disentangled Multi-Atlas Functional Connectivity Learning

Minheng Chen, Chao Cao, Jing Zhang, Tianming Liu, Dajiang Zhu

Affiliations: Department of Computer Science and Engineering, University of Texas at Arlington, United States; School of Computing, University of Georgia, United States

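AI summary: This paper proposes MADCLE, a framework that learns cross-atlas consistent brain disorder representations through disentangled multi-atlas functional connectivity learning. Experiments show performance competitive with or better than single-atlas and multi-atlas methods.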

Abstract

Functional connectivity (FC) derived from resting-state fMRI is widely used to characterize large-scale brain network alterations in neurological and psychiatric disorders. However, FC construction critically depends on the choice of brain atlas, and different parcellations may emphasize distinct organizational features, leading to heterogeneous and sometimes inconsistent representations. Existing multi-atlas approaches partially alleviate this issue but often fuse atlas-derived features or predictions at a relatively shallow level, while single-atlas disentanglement methods do not explicitly address cross-atlas heterogeneity. We propose Multi-Atlas Disentangled Connectivity LEarning (MADCLE), a multi-branch representation learning framework that jointly encodes FC matrices derived from different brain atlases. Rather than introducing a single explicitly shared latent variable across parcellations, MADCLE learns atlas-wise disease-related representations and encourages them to be cross-atlas consistent through distributional alignment. Meanwhile, covariate-related and atlas-dependent residual factors are modeled separately using covariate similarity supervision, atlas-specific reconstruction, and decorrelation constraints, thereby reducing the leakage of non-disease and parcellation-dependent information into the disease-related embeddings. Experiments on the ADNI and ADHD-200 datasets suggest that MADCLE achieves competitive or improved performance compared with single-atlas baselines, multi-atlas GNN/Transformer models, and recent multi-atlas consistency frameworks. These results support the potential value of structured disentanglement for FC-based disorder identification under heterogeneous parcellation schemes.
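One common way to implement a decorrelation constraint between two embedding blocks is to penalize their cross-correlation. The sketch below assumes that form: the mean squared Pearson correlation between every disease-embedding dimension and every residual-embedding dimension across a batch. MADCLE's actual loss may be defined differently, and `decorrelation_penalty` is a hypothetical name.

```python
import math


def _standardize(column):
    """Zero-mean, unit-variance version of a list of floats."""
    n = len(column)
    mean = sum(column) / n
    var = sum((v - mean) ** 2 for v in column) / n
    std = math.sqrt(var) or 1.0  # guard against constant columns
    return [(v - mean) / std for v in column]


def decorrelation_penalty(disease_emb, residual_emb):
    """Mean squared cross-correlation between two embedding blocks.

    disease_emb, residual_emb: lists of rows (one row per subject), each row a
    list of floats. Returns 0 when every disease dimension is uncorrelated with
    every residual dimension over the batch, and 1 when all pairs are perfectly
    (anti-)correlated.
    """
    n = len(disease_emb)
    d_cols = [_standardize([row[k] for row in disease_emb]) for k in range(len(disease_emb[0]))]
    r_cols = [_standardize([row[k] for row in residual_emb]) for k in range(len(residual_emb[0]))]
    total = 0.0
    for a in d_cols:
        for b in r_cols:
            corr = sum(x * y for x, y in zip(a, b)) / n  # Pearson correlation
            total += corr ** 2
    return total / (len(d_cols) * len(r_cols))


disease = [[1.0, 0.5], [2.0, -0.5], [3.0, 0.5], [4.0, -0.5]]
residual = [[2.0], [4.0], [6.0], [8.0]]
print(decorrelation_penalty(disease, residual))
```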

2605.06989 2026-05-11 stat.AP cs.AI cs.LG stat.ME

Drawing Lines in Psychological Space: What K-means Clustering Reveals in Simulated and Real Psychometric Data

Pedro Henrique Ramos Pinto, Maria Jullyanna Ferreira Marques, Luiz Carlos Serramo Lopez

Affiliations: Postgraduate Program in Cognitive and Behavioral Neuroscience (PPGNeC), Federal University of Paraíba (UFPB); Center for Health Sciences (CCS), Federal University of Paraíba (UFPB)

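AI summary: This paper examines the limitations of K-means clustering on simulated and real psychometric data. Through a comparative analysis, it shows that K-means still produces stable, visually coherent clustering solutions in continuous Gaussian latent spaces that have no true subgroup structure.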

Comments Methodological study on K-means clustering in psychometric data using simulated and empirical datasets

Abstract

K-means clustering is widely used in psychological and psychometric research to identify profiles, subgroups, and potential typologies, yet its classical formulation does not test whether such groups exist as latent psychological categories. Instead, K-means partitions multidimensional space into regions around centroids, favoring compact, approximately spherical clusters defined by geometric distance. In this paper, we examine this limitation through a sequence of controlled simulated datasets. We then extend the analysis to the SMARVUS dataset, a large international psychometric dataset comprising survey responses from university students across 35 countries, to evaluate whether similar geometric partitioning patterns emerge in empirical psychological data. By contrasting simulated and empirical data, this paper argues that K-means can produce stable and visually coherent clustering solutions even in continuous Gaussian latent spaces without true subgroup structure.
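The geometric behaviour described above is easy to reproduce. Below is a minimal sketch that runs K-means on a single isotropic Gaussian cloud, which by construction has no subgroups. It uses plain Lloyd's algorithm and is not the authors' analysis code. The algorithm still carves the cloud into k non-empty regions around centroids.

```python
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm: alternate nearest-centroid assignment and centroid updates."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize at k distinct data points
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centroids[c])))
            for pt in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


rng = random.Random(42)
cloud = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(500)]
labels, centers = kmeans(cloud, k=3)
sizes = [labels.count(c) for c in range(3)]
print(sizes)  # cluster sizes for one structureless Gaussian blob
```

The partition reflects distance to centroids only. Nothing in the procedure tests whether three latent groups actually exist.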

2605.06988 2026-05-11 cs.MA cs.IT cs.RO math.IT

The Cost of Consensus: Malignant Epistemic Herding and Adaptive Gating in Distributed Multi-Agent Search

David Farr, Iain Cruickshank, Kate Starbird, Jevin West

Affiliations: Information School, University of Washington; Computer Science, Carnegie Mellon University; Human Centered Design & Engineering, University of Washington

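AI summary: The study examines how communication frequency and content jointly shape the collective belief state in distributed multi-agent search. It introduces the notion of epistemic alignment and analyzes how communication design affects collective reasoning.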

Abstract

Distributed agents in real-world settings frequently must coordinate under uncertainty with only partial observations. Coordination through shared beliefs is necessary for task completion, but communication consumes bandwidth, introduces latency, and, if done poorly, can degrade collective reasoning. This tension is especially acute in bandwidth-constrained deployments such as distributed sensing networks, autonomous reconnaissance, and collaborative cyber defense, where excessive transmission carries direct operational costs. Existing work has focused on multi-agent exploration and communication strategies, but not on how communication frequency and content jointly shape the collective belief state. Central to this challenge is the degree to which agents maintain compatible internal beliefs about the environment, a property we term epistemic alignment. When agents share beliefs effectively, they converge on correct hypotheses; when communication is poorly designed, agents may converge confidently on wrong ones. We formalize this distinction and show that it cannot be detected from coordination metrics alone, such as Jensen-Shannon divergence or the rate of reaching consensus.
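A toy calculation shows why agreement metrics cannot reveal correctness. This sketch treats beliefs as categorical distributions over three hypotheses and uses the standard Jensen-Shannon divergence. The belief vectors and the choice of hypothesis 0 as the true one are invented for illustration.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p, q):
    """Jensen-Shannon divergence in bits (0 = identical beliefs, 1 = disjoint support)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Two agents' beliefs over three hypotheses; suppose hypothesis 0 is the true one.
agent_a = [0.02, 0.95, 0.03]
agent_b = [0.03, 0.94, 0.03]
true_index = 0

print(jensen_shannon(agent_a, agent_b))          # close to 0: the agents "agree"
print(agent_a[true_index], agent_b[true_index])  # yet both put little mass on the truth
```

Low divergence certifies only that the agents agree. Alignment with the true hypothesis has to be measured separately.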

2605.06981 2026-05-11 cs.IR cs.CL

Bridging Textual Profiles and Latent User Embeddings for Personalization

Zhaoxuan Tan, Xiang Zhai, Yan Zhu, Meng Jiang, Mohamed Hammad

Affiliations: University of Notre Dame; Google

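AI summary: This paper proposes BLUE, a framework that aligns language-based user profiles with embedding-based recommendation objectives to bridge interpretable textual profiles and discriminative latent embeddings. Experiments show it outperforms baselines in zero-shot sequential recommendation.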

Abstract

Personalized systems rely on user representations to connect behavioral history with downstream recommendation applications. Existing methods typically employ either supervised latent user embeddings, which are effective for retrieval but difficult to interpret, or textual user profiles, which are interpretable but challenging to optimize for downstream utility due to lack of direct supervision. To bridge this gap, we present BLUE, a reinforcement learning framework that unifies these two forms of user representation by aligning language-based user profiles with embedding-based recommendation objectives. Given a user interaction history, BLUE leverages a profiler Large Language Model (LLM) to generate textual profiles, while an embedding model provides reward signals. This encourages the resulting textual representations to move closer to positive items and farther from negative ones in the embedding space. We further introduce a text-space supervision signal based on next-item prediction, ensuring the learned profiles remain both semantically meaningful and highly effective for downstream retrieval. Experiments on Amazon Reviews 2023 and Google Local Reviews in zero-shot sequential recommendation settings demonstrate that BLUE consistently outperforms strong baselines under both frozen and trainable embedding conditions. Notably, BLUE achieves clear gains in cross-domain transfer, highlighting the strong generalization ability of the learned user profiles. Furthermore, these generated profiles provide superior personalized context for question answering compared to raw user histories or alternative profile optimization methods. Overall, these results show that BLUE provides an effective way to unify interpretable textual profiling with discriminative latent embeddings for personalization.
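A minimal sketch of an embedding-based reward of the kind described follows. It assumes the reward is the cosine similarity of the embedded profile to the positive item minus its mean similarity to the negatives. The vectors stand in for outputs of an unspecified embedding model, `profile_reward` is a hypothetical name, and BLUE's exact reward may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def profile_reward(profile_emb, positive_emb, negative_embs):
    """Hypothetical contrastive reward for one generated textual profile.

    Higher when the embedded profile is closer to the positive item than to
    the sampled negatives; an RL policy (the profiler LLM) would be trained to
    increase it.
    """
    neg = sum(cosine(profile_emb, n) for n in negative_embs) / len(negative_embs)
    return cosine(profile_emb, positive_emb) - neg


print(profile_reward([0.9, 0.1], [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]]))
```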

2605.06976 2026-05-11 stat.ML cs.LG stat.CO

A Differentiable Bayesian Relaxation for Latent Partial-Order Inference

Dongqing Li, Geoff K. Nicholls, Shiyi Sun, You Luo

Affiliations: University of Oxford; University College London

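AI summary: This paper proposes a differentiable Bayesian relaxation for inferring latent partial orders from data recorded as linear orders. Replacing the hard-constrained model with smooth surrogates yields a continuous posterior that supports gradient-based MCMC and variational inference.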

Abstract

Many ranking and agent trace datasets are recorded as linear orders even though their latent structure is only partially ordered. This is especially common in agent and workflow traces, where observed order may reflect arbitrary linearization rather than true prerequisites. We introduce a differentiable relaxation for latent partial-order inference from such traces. Starting from a hard frontier-constrained model of noisy linear extensions, we replace discontinuous product-order precedence and binary frontier feasibility with smooth surrogates, yielding a continuous posterior that preserves closure-level partial-order semantics and supports gradient-based MCMC and variational inference. We prove soft transitivity, sharp-limit frontier recovery, and convergence to the hard likelihood. Experiments on synthetic data, records of social dominance relations, and cloud-agent traces show close posterior fidelity to hard MCMC on small instances and improved runtime-accuracy trade-offs on larger problems.
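The sketch below shows one way a smooth surrogate for discontinuous product-order precedence can work. Each item is embedded as a point in R^d. Item i precedes j in the product order iff it is smaller in every coordinate, and the surrogate replaces each coordinate indicator with a sigmoid at temperature tau. This particular surrogate is an illustrative assumption, not necessarily the one in the paper.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def hard_precedes(u_i, u_j):
    """Product order: i precedes j iff u_i < u_j in every coordinate."""
    return all(a < b for a, b in zip(u_i, u_j))


def soft_precedes(u_i, u_j, tau):
    """Smooth surrogate: product over coordinates of sigmoid((u_j - u_i) / tau).

    As tau -> 0 each factor tends to the indicator 1[u_i < u_j] (for distinct
    coordinates), so the product recovers hard product-order precedence, while
    for tau > 0 it is differentiable in the latent positions.
    """
    return math.prod(sigmoid((b - a) / tau) for a, b in zip(u_i, u_j))


a, b, c = (0.0, 0.0), (1.0, 1.0), (1.0, -1.0)
for tau in (1.0, 0.1, 0.01):
    print(tau, soft_precedes(a, b, tau), soft_precedes(a, c, tau))
```

Here a precedes b but is incomparable with c. As tau shrinks, the soft scores move toward 1 and 0 respectively.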

2605.06971 2026-05-11 eess.SP cs.AI cs.SY eess.SY

Decentralized Time-Varying Optimization for Streaming Data via Temporal Weighting

Muhammad Faraz Ul Abrar, Nicolò Michelusi, Erik G. Larsson

Affiliations: School of Electrical, Computer and Energy Engineering, Arizona State University; Department of Electrical Engineering (ISY), Linköping University

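AI summary: This paper studies decentralized gradient descent on streaming data over a distributed network. It analyzes the tracking error for strongly convex, smooth losses under temporal weighting strategies and shows that the error decomposes into a fixed-point tracking term and a bias term caused by data heterogeneity.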

Abstract

Classical optimization theory largely focuses on fixed objective functions, whereas many modern learning systems operate in dynamic environments where data arrive sequentially and decisions must be updated continuously. In this work, we study optimization with streaming data over a distributed network of agents. We adopt a structured, weight-based formulation that explicitly captures the streaming-data origin of the time-varying objective: at each time step, every agent receives a new sample, and the network seeks to track the minimizer of a temporally weighted objective formed from all samples observed across the network so far. We focus on decentralized gradient descent (DGD) with a limited communication/computation budget, where at each time step, only a limited number of DGD iterations can be performed before the objective changes again. For strongly convex and smooth losses, we analyze the tracking error with respect to the time-varying minimizer through a fixed-point theory lens. Our analysis reveals that the tracking error decomposes into a fixed-point tracking term and a bias term induced by data heterogeneity across agents. We specialize the analysis to two natural weighting strategies: uniform weights, which treat all samples equally, and exponentially discounted weights, which geometrically decay the influence of older data. Under uniform weighting, DGD tracks the fixed point at a rate $\mathcal{O}(1/t)$, whereas discounted weighting yields a non-vanishing fixed-point tracking floor controlled by the discount factor. In both cases, decentralization induces an additional non-zero bias floor under a constant step size. We validate our theoretical findings through numerical simulations.
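Below is a toy simulation of this setting under simplifying assumptions that are not taken from the paper. The losses are scalar quadratics, the agents form a ring with uniform mixing weights 1/3, each time step allows a fixed number of DGD iterations, and the step size is constant. Each agent's local objective weights its own past samples either uniformly or with exponential discount `gamma`. The function name `run_dgd` is hypothetical, and the paper's fixed-point analysis is not reproduced.

```python
import random


def run_dgd(samples, weighting="uniform", gamma=0.9, step=0.5, iters_per_step=1):
    """DGD tracking a temporally weighted quadratic objective over a ring of agents.

    samples[t][i] is the scalar sample agent i receives at time t. Agent i's local
    loss at time t is sum_tau w_{t,tau} * (x - s_{i,tau})**2 / 2 with weights
    normalized to sum to 1, so its gradient is x - m_i, where m_i is the agent's
    weighted sample mean. Returns, per time step, max_i |x_i - x*_t|, where x*_t
    (the average of the m_i) minimizes the network-average objective.
    """
    n = len(samples[0])
    x = [0.0] * n
    num = [0.0] * n  # running weighted sum of each agent's samples
    den = [0.0] * n  # running sum of each agent's weights
    decay = 1.0 if weighting == "uniform" else gamma
    errors = []
    for s_t in samples:
        for i in range(n):
            num[i] = decay * num[i] + s_t[i]
            den[i] = decay * den[i] + 1.0
        means = [num[i] / den[i] for i in range(n)]  # local minimizers
        target = sum(means) / n  # minimizer of the network-average objective
        for _ in range(iters_per_step):
            # Mix with ring neighbours (doubly stochastic weights 1/3), then take a local gradient step.
            mixed = [(x[i - 1] + x[i] + x[(i + 1) % n]) / 3 for i in range(n)]
            x = [mixed[i] - step * (x[i] - means[i]) for i in range(n)]
        errors.append(max(abs(xi - target) for xi in x))
    return errors


rng = random.Random(0)
# Four agents with heterogeneous means 0, 1, 2, 3 plus small noise, 200 time steps.
stream = [[i + rng.gauss(0.0, 0.1) for i in range(4)] for _ in range(200)]
print(run_dgd(stream, "uniform")[-1], run_dgd(stream, "discounted", gamma=0.9)[-1])
```

With identical data at every agent, the error contracts geometrically. With heterogeneous agents and a constant step size, individual agents settle at a non-zero distance from the network minimizer, which mirrors the bias floor described above.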

2605.06965 2026-05-11 cs.CY cs.AI cs.HC

AI and Consciousness: Shifting Focus Towards Tractable Questions

Iulia-Maria Comsa

Affiliations: Google DeepMind

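AI summary: This paper discusses which research questions about AI consciousness are tractable. It argues that whether AI has subjective experience cannot currently be studied directly. Public perception of AI consciousness, however, already shapes social norms, so its causes and effects deserve more research.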

Abstract

As language-based AI systems become more anthropomorphic, the question of whether they can have subjective experience is increasingly pressing. I focus here on the tractability of research questions in the space of AI consciousness. I argue that the fundamental problem of whether AI systems can be conscious is currently intractable in its direct form, given the absence of a universally accepted scientific theory of consciousness, as well as the historical open-endedness of the philosophical mind-body problem. In contrast, questions around the adjacent subject of perceived AI consciousness are tractable, timely, and highly consequential for society. The general public is increasingly open to the possibility of consciousness in AI systems and routinely adopts the vocabulary of human cognition and subjective experience to describe them. This phenomenon is already driving societal shifts across user experience, ethical standards, and linguistic norms. I therefore propose an increased research focus on uncovering the causes and effects of perceived AI consciousness, which ultimately shape how we see our own human subjective experience relative to artificial entities. To support this, I map the current landscape of AI consciousness perception and discuss its key potential drivers and societal consequences. Finally, I urge developers, decision-makers, and the broader scientific community to commit to clear and accurate communication regarding the topic of AI consciousness, explicitly acknowledging its inherent uncertainties.

2605.06963 2026-05-11 cs.HC cs.AI cs.CL cs.IR

From Surface Learning to Deep Understanding: A Grounded AI Tutoring System for Moodle

Anna Ostrowska, Michał Kukla, Gabriela Majstrak, Jan Opala, Sebastian Pergała, Jan Skwarek, Anna Wróblewska

Affiliations: Faculty of Mathematics and Information Sciences, Warsaw University of Technology

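AI summary: This paper presents a Moodle-based AI teaching assistant that uses retrieval-augmented generation to deliver high-quality education. Its dual-centric design provides interactive tutoring for students and supervised content generation for teachers, reducing misinformation and encouraging deep understanding.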

Comments 5 pages, accepted as demo paper at IJCAI 2026

Abstract

This demo paper describes the development of the AI Teaching & Learning Assistant, a modular Moodle plugin that leverages Retrieval-Augmented Generation (RAG) to deliver high-quality, hallucination-free education. The system employs a dual-centric design, providing students with interactive, Socratic-based tutoring and educators with a "human-in-the-loop" workspace for supervised content generation. By grounding Large Language Model (LLM) responses in teacher-provided materials, the assistant addresses the risks of misinformation while encouraging deep conceptual mastery. Evaluation via the Ragas (LLM-as-a-Judge) framework and a preliminary user study confirms its effectiveness, achieving faithfulness scores up to 0.97 and a 4.00/5.00 recommendation rate.

2605.06959 2026-05-11 stat.ML cs.LG math.ST stat.TH

Locally Near Optimal Piecewise Linear Regression in High Dimensions via Difference of Max-Affine Functions

Haitham Kanj, Kiryung Lee

Affiliations: Department of Electrical and Computer Engineering

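AI summary: This paper proposes a parametric solution to piecewise linear regression based on the Adaptive Block Gradient Descent (ABGD) algorithm, which parametrizes piecewise linear functions as differences of max-affine functions. A non-asymptotic local convergence analysis under sub-Gaussian covariate and noise distributions shows that, with suitable initialization, the algorithm converges linearly to an ε-accurate estimate.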

Abstract

This paper presents a parametric solution to piecewise linear regression through the Adaptive Block Gradient Descent (ABGD) algorithm. The heart of the method is the parametrization of piecewise linear functions as the difference of max-affine (DoMA) functions. A non-asymptotic local convergence analysis for ABGD is provided under sub-Gaussian covariate and noise distributions. To initialize ABGD, we adapt a prior algorithm originally developed for the simpler setting of max-affine functions. When suitably initialized, ABGD converges linearly to an $\varepsilon$-accurate estimate given $\tilde{\mathcal{O}}(d \max(\sigma_z/\varepsilon, 1)^2)$ observations, where $\sigma_z^2$ denotes the noise variance. This implies exact recovery given $\tilde{\mathcal{O}}(d)$ samples in the noiseless case. Moreover, this rate is shown to be minimax optimal up to logarithmic factors. Synthetic numerical results corroborate the theoretical guarantees for ABGD. We also observe competitive performance compared to state-of-the-art methods on real-world datasets.
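The DoMA parametrization itself is simple to evaluate. The sketch below covers evaluation only and is not ABGD. It writes the one-dimensional "hat" function max(0, 1 - |x|), which is neither convex nor concave, as the difference max(x, -x, 1) - max(x, -x) of two max-affine functions.

```python
def max_affine(x, slopes, intercepts):
    """max_j (a_j . x + b_j) for slope vectors a_j and scalar intercepts b_j."""
    return max(sum(a * xi for a, xi in zip(a_j, x)) + b_j for a_j, b_j in zip(slopes, intercepts))


def doma(x, pos, neg):
    """Difference of max-affine functions: max_j(a_j.x + b_j) - max_k(c_k.x + d_k).

    pos and neg are (slopes, intercepts) pairs for the convex and subtracted parts.
    """
    return max_affine(x, *pos) - max_affine(x, *neg)


# Hat function h(x) = max(0, 1 - |x|) written as max(x, -x, 1) - max(x, -x).
HAT_POS = ([[1.0], [-1.0], [0.0]], [0.0, 0.0, 1.0])
HAT_NEG = ([[1.0], [-1.0]], [0.0, 0.0])

for x in (-2.0, -0.5, 0.0, 0.25, 1.5):
    print(x, doma([x], HAT_POS, HAT_NEG))
```

Each max-affine part is convex, so any continuous piecewise linear function written this way is a difference of convex functions. ABGD fits the slopes and intercepts of both parts from data.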

2605.06929 2026-05-11 physics.optics cs.LG

Physics-Based Flow Matching for Full-Field Prediction of Silicon Photonic Devices

Joseph Quaratiello, Anthony Rizzo

Affiliations: Thayer School of Engineering, Dartmouth College

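AI summary: This paper proposes PIC-Flow, a generative neural surrogate that predicts the electromagnetic field distributions of photonic devices from their geometry and wavelength as an alternative to costly FDTD simulations. It combines flow matching, a U-Net, and physics-constrained training to improve computational efficiency.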

Comments 11 pages, 4 figures

Abstract

Designing photonic integrated circuits requires accurate electromagnetic field simulations, which remain computationally expensive even for simple device geometries. We present PIC-Flow, a generative neural surrogate that predicts electromagnetic field distributions for photonic devices from their geometry and operating wavelength, as an alternative to costly finite-difference time-domain (FDTD) simulations. Our approach combines three key ideas: (i) conditional flow matching as the generative framework, learning a velocity field that transports Gaussian noise to physically valid field solutions; (ii) a real-valued U-Net operating on split real and imaginary field channels; and (iii) physics-constrained training through a Helmholtz residual loss enforcing $\nabla^2 E_z + k_0^2 \varepsilon E_z = 0$. We introduce an interface-aware masking scheme for the Helmholtz residual that excludes dielectric boundary pixels where finite-difference stencil errors dominate, yielding a physically meaningful compliance metric. The dataset consists of 22,500 ground-truth FDTD simulations at $\lambda = 1.55\,\mu$m, split evenly among multimode interferometers, Y-branches, and directional couplers and divided 80/10/10 into training, validation, and test sets. We evaluate network ablations on the held-out test devices and also show that the model generalizes to held-out device classes such as S-bends, tapers, and cascaded Y-branches. Rather than a drop-in replacement for FDTD, this work establishes a foundation that, with broader data coverage, more compute, and further training optimization, could scale toward broadband, device-agnostic field prediction with dramatically improved runtime for rapid design-space exploration of complex photonic devices and circuits.
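Below is a minimal sketch of an interface-aware Helmholtz residual of the kind described. It assumes a 5-point finite-difference Laplacian on a uniform grid and masks every pixel whose stencil straddles a permittivity change. PIC-Flow's exact masking rule and normalization are not specified here, and `helmholtz_residual` is a hypothetical name. The plane-wave check uses illustrative grid values.

```python
import cmath
import math


def helmholtz_residual(E, eps, k0, h):
    """Mean |laplacian(E) + k0^2 * eps * E| over interior, non-interface pixels.

    E: 2-D list of complex field samples E_z on a uniform grid with spacing h.
    eps: 2-D list of relative permittivities on the same grid.
    A pixel is skipped if any point of its 5-point stencil has a different
    permittivity (a dielectric interface), where the finite-difference
    Laplacian is unreliable. Returns (mean residual magnitude, pixels kept).
    """
    rows, cols = len(E), len(E[0])
    total, kept = 0.0, 0
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            stencil_eps = {eps[r][c], eps[r - 1][c], eps[r + 1][c], eps[r][c - 1], eps[r][c + 1]}
            if len(stencil_eps) > 1:
                continue  # interface pixel: excluded from the residual
            lap = (E[r - 1][c] + E[r + 1][c] + E[r][c - 1] + E[r][c + 1] - 4 * E[r][c]) / h ** 2
            total += abs(lap + k0 ** 2 * eps[r][c] * E[r][c])
            kept += 1
    return (total / kept if kept else float("nan")), kept


# Sanity check: a plane wave exp(i k0 x) in vacuum (eps = 1) solves the continuous
# Helmholtz equation, so the discrete residual is only truncation error.
wavelength, h, n_grid = 1.55, 0.02, 40  # illustrative values, lengths in micrometres
k0 = 2 * math.pi / wavelength
eps = [[1.0] * n_grid for _ in range(n_grid)]
E = [[cmath.exp(1j * k0 * c * h) for c in range(n_grid)] for _ in range(n_grid)]
res, kept = helmholtz_residual(E, eps, k0, h)
print(res, kept)
```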

2605.06920 2026-05-11 cs.GT cs.AI cs.LG

In-Context Credit Assignment via the Core

Keegan Harris, Siddharth Prasad, Asher Trockman

Affiliations: UC Berkeley; Toyota Technological Institute at Chicago; Google Research

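AI summary: This paper proposes incentive-aligned mechanisms for in-context credit assignment based on the least core solution concept from cooperative game theory. Algorithms that approximate the least core distribute value stably. On a web retrieval task, they need far fewer LLM calls than alternative methods.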

Abstract

We propose incentive-aligned mechanisms for in-context credit assignment: the task of assigning credit for AI-generated content (e.g. code, news articles, short-form videos) among creators whose intellectual property appears in the context window. Our approach is based on the least core solution concept from cooperative game theory, which distributes value in a way that is as stable as possible by ensuring that no subset of creators is significantly under-compensated relative to the value they could generate on their own. We develop algorithms for approximating the least core, which leverage novel routines for constraint seeding and constraint separation. On a web retrieval credit assignment task, we find that our approaches are capable of approximating the least core using orders of magnitude fewer LLM calls compared to alternative methods.
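The least-core objective can be made concrete on a toy game. For a given allocation, the sketch below enumerates coalitions to find the largest shortfall v(S) - x(S), which is the quantity the least core minimizes over efficient allocations. The paper's constraint-seeding and constraint-separation routines exist precisely to avoid this exponential enumeration. The game values are invented, and `max_excess` is a hypothetical name.

```python
from itertools import combinations


def max_excess(values, allocation):
    """Largest shortfall v(S) - x(S) over all non-empty proper coalitions S.

    values: dict mapping frozenset coalitions to their value v(S) (missing -> 0).
    allocation: dict mapping each player to their payoff x_i (assumed to sum to v(N)).
    Brute-force enumeration, so it only scales to a handful of players.
    Returns (excess, worst coalition).
    """
    players = list(allocation)
    best, worst = float("-inf"), None
    for size in range(1, len(players)):
        for coalition in combinations(players, size):
            S = frozenset(coalition)
            excess = values.get(S, 0.0) - sum(allocation[p] for p in S)
            if excess > best:
                best, worst = excess, S
    return best, worst


# Toy game with three creators: any pair is worth 0.8, all three together 1.0.
game = {frozenset(s): 0.8 for s in (("a", "b"), ("a", "c"), ("b", "c"))}
game[frozenset(("a", "b", "c"))] = 1.0
equal_split = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}
print(max_excess(game, equal_split))
```

The pairs jointly demand more than v(N) allows, so this game's core is empty. Every allocation leaves some coalition under-compensated, and the least core chooses the allocation with the smallest worst-case shortfall.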

2605.06918 2026-05-11 cs.MA cs.LG

Generalising Travel Time Prediction To Varying Route Choices In Urban Networks

Łukasz Gorczyca, Kacper Drozd, Michał Bujak, Rafał Kucharski

Affiliations: Jagiellonian University

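AI summary: This paper proposes GenTTP, a model that distinguishes between route choices to deliver accurate flow and travel time predictions. It addresses the inability of existing models to capture how different route assignments of the same demand lead to different network outcomes.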

Abstract

Previous methods that predict system-wide travel time, predominantly grounded in graph neural networks, remain limited to typical and recurring demand patterns. While they successfully predict future congestion following the daily commute, they inherently approximate a single demand realisation and fail to capture varying route choices. In this work, we propose a Generalised Travel Time Predictor (GenTTP) that successfully differentiates route choices and offers accurate flow and travel time predictions. Our framework learns to uncover complex spatiotemporal traffic patterns and microscopic relationships between route choices and the resulting travel times. This addresses a critical gap: the lack of travel time prediction models that generalise across varying route assignments, where the same demand can produce substantially different network-wide outcomes depending on how travellers are distributed over available paths.
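Even two parallel routes show how the same demand can produce very different network outcomes. This sketch uses the classic Bureau of Public Roads (BPR) volume-delay function with its conventional parameters alpha = 0.15 and beta = 4. That function is a textbook assumption, not part of GenTTP, and all route parameters are hypothetical.

```python
def bpr_time(free_flow_time, flow, capacity, alpha=0.15, beta=4):
    """BPR link travel time: t0 * (1 + alpha * (v / c) ** beta)."""
    return free_flow_time * (1 + alpha * (flow / capacity) ** beta)


def total_travel_time(share_a, demand=1000.0):
    """Total vehicle-minutes on two parallel routes; share_a is the fraction sent on route A.

    Route parameters (free-flow times 10 and 12 minutes, capacity 600 veh/h) are hypothetical.
    """
    flow_a, flow_b = share_a * demand, (1 - share_a) * demand
    t_a = bpr_time(10.0, flow_a, capacity=600.0)
    t_b = bpr_time(12.0, flow_b, capacity=600.0)
    return flow_a * t_a + flow_b * t_b


# Identical demand, different route splits.
for share in (0.5, 0.7, 1.0):
    print(share, round(total_travel_time(share)))
```

Because link delay grows nonlinearly with flow, a model trained on one habitual split cannot infer travel times for another split without information about the route choices.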

2605.06914 2026-05-11 cs.DC cs.AI cs.CL

Regulating Branch Parallelism in LLM Serving

Swapnil Gandhi, Siva Hari, William J. Dally, Christos Kozyrakis

Affiliations: Stanford University; NVIDIA

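AI summary: This paper proposes TAPER, a per-step admission controller that regulates branch parallelism by predicting the branch externality, improving goodput while maintaining service-level objectives.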

Abstract

Recent methods expose intra-request parallelism in LLM outputs, allowing independent branches to decode concurrently. Existing serving systems execute these branches eagerly or under fixed caps. We show that both are brittle: eager admission inflates the shared decode step, degrading co-batched requests in serial stages, while conservative fixed caps forgo the throughput that motivated exposing branches in the first place. We call the excess step latency caused by admitted branches the branch externality and show that the safe width depends on batch composition, context lengths, and accumulated slack, all of which change continuously over a workload trace. We introduce TAPER, a per-step admission controller that treats extra branches as opportunistic work, admitted only when the predicted branch externality fits within the batch's current slack budget. Per-step regulation is practical because branch-level scheduling decouples compute from memory: branches share the request's prefix KV, so expanding or contracting width requires no memory reclamation. On Qwen3-32B, TAPER improves goodput by $1.77\times$ over IRP-Off and by $1.48\times$ over IRP-Eager, while maintaining over $95\%$ SLO attainment.
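The admission rule can be sketched as a greedy loop. At each decode step, extra branches are admitted while the predicted externality of admitting one more still fits within the batch's slack. The `externality_ms` predictor and the numbers below are hypothetical stand-ins, and the loop assumes the predicted externality grows with the number of admitted branches. TAPER's actual predictor is not reproduced.

```python
def admit_branches(pending_branches, slack_ms, externality_ms):
    """Greedy per-step admission sketch (hypothetical; not TAPER's implementation).

    pending_branches: branch ids in priority order.
    slack_ms: extra step latency the current batch can absorb without missing SLOs.
    externality_ms: predicted increase in decode-step latency as a function of the
        number of extra branches admitted (assumed non-decreasing).
    Returns the largest prefix of branches whose predicted externality fits the slack.
    """
    admitted = []
    for branch in pending_branches:
        if externality_ms(len(admitted) + 1) > slack_ms:
            break  # one more branch would exceed the slack budget
        admitted.append(branch)
    return admitted


# Hypothetical linear model: each extra branch adds 0.4 ms to the shared decode step.
print(admit_branches(["b1", "b2", "b3", "b4", "b5"], slack_ms=1.5, externality_ms=lambda k: 0.4 * k))
```

Re-running this every step lets the admitted width grow when slack accumulates and shrink when it does not. As the abstract notes, this is cheap because branches share the request's prefix KV cache.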

2605.06913 2026-05-11 astro-ph.EP astro-ph.IM cs.LG

You Only Stack Once (YOSO): A Motion-Filtered, Deep-Learning Framework for Detecting Faint Moving Sources

Nitya Pandey, César Fuentes, Pedro Bernardinelli, Valeria Frías, Colin Orion Chandler, David E. Trilling, Matthew J. Holman, Steven Stetzler, Dallin Spencer, Hsing Wen Lin, Luis E. Salazar Manzano, Darin Ragozzine, Ryder Strauss, Mario Jurić, Andrew J. Connolly, Hayden Smotherman, Scott S. Sheppard, Kevin Napier

Affiliations: Dept. of Astronomy & the DiRAC Institute, University of Washington, Seattle, USA; Facultad de Ciencias Físicas y Matemáticas (FCFM), University of Chile, Beauchef 850, 851, Santiago, Chile; LSST Interdisciplinary Network for Collaboration; Department of Astronomy & Planetary Science, Northern Arizona University, Flagstaff, USA; Harvard-Smithsonian Center for Astrophysics, 60 Garden Street, MS 51, Cambridge, MA 02138, USA; Jet Propulsion Laboratory, California Institute of Technology, 4800 Oak Grove Dr., Pasadena, CA 91109, USA; Brigham Young University, Department of Physics; Department of Physics, University of Michigan, Ann Arbor, MI 48109, USA; Michigan Institute for Data & AI in Society, University of Michigan, Ann Arbor, MI 48109, USA; Department of Astronomy, University of Michigan, Ann Arbor, MI 48109, USA; eScience Institute, Department of Astronomy, University of Washington, Seattle, WA 98195-1580, USA; Planets Laboratory, Carnegie Institution for Science, Washington, DC 20015

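AI summary: YOSO uses motion filtering to detect faint, slow-moving Solar System objects in wide-field astronomical surveys. Its core component, the Gaussian Motion Filter (GMoF), boosts signal-to-noise. The pipeline recovered 45 previously known objects and found 11 new trans-Neptunian objects, and it can be applied to large surveys and to other areas such as exoplanet imaging.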

Comments Accepted to The Astronomical Journal; 13 pages, 9 figures

Abstract

We present You Only Stack Once (YOSO), an automated pipeline designed to detect faint, slow-moving Solar System objects in wide-field astronomical surveys. The pipeline integrates a novel Gaussian Motion Filter (GMoF) that operates at the pixel level to enhance signal-to-noise for objects exhibiting a range of apparent rates of motion. Unlike conventional shift-and-stack methods, which rely on discrete velocity trials, GMoF amplifies trails while suppressing random noise and static background features. Applied to a subset of DEEP observations from the Dark Energy Camera, YOSO recovered 45 out of 73 previously detected objects, as well as 11 new TNOs. It also discovered 216 objects in the near Solar System. Although alternative shift-and-stack methods are sensitive to objects about 0.88 magnitudes fainter, YOSO's false positive rate is extremely low, since it detects only sources that exhibit a trail and are consistent with a point source when shifted at the right rate. We show how this method can be deployed on large surveys like LSST, and adapted for other domains that require motion-based signal enhancement, including exoplanet imaging through Angular Differential Imaging (ADI), and near-Earth object (NEO) detection for missions like NEO Surveyor. YOSO thus provides a versatile, scalable approach for extracting faint, motion-dependent signals in the era of data-intensive astronomy.
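For contrast with GMoF, here is a toy one-dimensional version of the conventional shift-and-stack baseline that relies on discrete velocity trials. This is not YOSO or GMoF. The frame size, noise level, and motion rate are invented for illustration.

```python
import random


def shift_and_stack(frames, velocities):
    """Conventional shift-and-stack over discrete velocity trials (1-D toy version).

    frames: list of equal-length lists of pixel values, one per epoch.
    velocities: candidate rates of motion in pixels per epoch.
    For each trial velocity, frame t is shifted back by round(v * t) pixels and
    the frames are summed, so a source moving at that rate piles up in one pixel.
    Returns (best velocity, best position, peak stacked value).
    """
    width = len(frames[0])
    best = (None, None, float("-inf"))
    for v in velocities:
        stack = [0.0] * width
        for t, frame in enumerate(frames):
            shift = round(v * t)
            for x in range(width):
                src = x + shift
                if 0 <= src < width:
                    stack[x] += frame[src]
        peak_x = max(range(width), key=stack.__getitem__)
        if stack[peak_x] > best[2]:
            best = (v, peak_x, stack[peak_x])
    return best


rng = random.Random(1)
true_v, x0, n_frames, width = 3, 20, 10, 100
frames = []
for t in range(n_frames):
    frame = [rng.gauss(0.0, 0.3) for _ in range(width)]
    frame[x0 + true_v * t] += 1.0  # faint moving source
    frames.append(frame)

print(shift_and_stack(frames, velocities=range(6)))
```

The cost scales with the number of velocity trials, and objects whose motion falls between trial rates are smeared. The abstract positions GMoF as an alternative that does not rely on such discrete trials.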

2605.06900 2026-05-11 cs.DS cs.LG

Accelerated Relax-and-Round for Concave Coverage Problems

Matthew Fahrbach, Mehraneh Liaee, Morteza Zadimoghaddam

Affiliations: Google

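AI summary: This paper proposes an accelerated relax-and-round algorithm for concave coverage problems, which generalize the classic maximum coverage problem. Using a projected accelerated gradient method and a specialized rounding scheme, it achieves a faster running time, and it proves approximation ratios for new reward functions.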

Comments 47 pages, 6 figures

Abstract

We present an accelerated relax-and-round algorithm for concave coverage problems, which generalize the classic maximum coverage problem. Building on the relax-and-round framework of Barman et al. [STACS 2021], we propose two significant improvements. First, we replace the linear programming (LP) relaxation step with a projected accelerated gradient method applied to a smooth surrogate objective to achieve a $\widetilde{O}(mn \varepsilon^{-1})$ running time. Second, we use a specialized rounding scheme for the hypersimplex that combines the Carathéodory decomposition algorithm in Karalias et al. [NeurIPS 2025] with randomized swap rounding of Chekuri et al. [FOCS 2010]. We prove tight approximation ratios for new reward functions, including a $0.827$-approximation for the logarithmic reward $\varphi(x) = \log(1 + x)$. Finally, we conduct maximum multi-coverage experiments on synthetic and real-world graphs, demonstrating that our algorithm outperforms approaches that use state-of-the-art LP solvers.
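To make the objective concrete, here is the standard concave coverage setup as we understand it. One picks k of m sets, and each ground element j contributes w_j·φ(c_j), where c_j is the number of chosen sets that cover j. Setting φ(x) = min(x, 1) recovers classic maximum coverage, and φ(x) = log(1 + x) is the logarithmic reward mentioned above. The sketch below only evaluates this objective and brute-forces a tiny, hypothetical instance, assuming unit weights unless given and φ(0) = 0. It does not implement the accelerated relax-and-round algorithm.

```python
import math
from itertools import combinations

def concave_coverage_value(sets, chosen, phi, weights=None):
    """Sum over covered elements j of w_j * phi(number of chosen sets containing j).

    Uncovered elements are skipped, which assumes phi(0) == 0.
    """
    counts = {}
    for i in chosen:
        for j in sets[i]:
            counts[j] = counts.get(j, 0) + 1
    weights = weights or {}
    return sum(weights.get(j, 1.0) * phi(c) for j, c in counts.items())

def brute_force(sets, k, phi):
    """Exact optimum by enumerating all k-subsets; only feasible for tiny instances."""
    return max(combinations(range(len(sets)), k),
               key=lambda S: concave_coverage_value(sets, S, phi))

sets = [{0, 1, 2}, {2, 3}, {0, 1}, {4}]
log_reward = lambda x: math.log1p(x)   # phi(x) = log(1 + x)
max_cover = lambda x: min(x, 1)        # classic maximum coverage
print(brute_force(sets, 2, log_reward))
```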

2605.06894 2026-05-11 cs.CR cs.LG

McNdroid: A Longitudinal Multimodal Benchmark for Robust Drift Detection in Android Malware

Md Mahmuduzzaman Kamol, Jesus Lopez, Saeefa Rubaiyet Nowmi, Emilia Rivas, Md Ahsanul Haque, Edward Raff, Aritran Piplai, Mohammad Saidur Rahman

Affiliations: Department of Computer Science, University of Texas at El Paso; CrowdStrike

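AI summary: McNdroid is, to the authors' knowledge, the largest longitudinal multimodal Android malware benchmark. Using static, dynamic, and graph-based features, it evaluates ML and deep-learning detectors across different train-test time gaps and shows that multimodal fusion holds an advantage over long time gaps.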

Comments 28 pages, 14 figures, 14 tables

Abstract

Machine learning (ML) in real-world systems must contend with concept drift, adversarial actors, and a spectrum of potential features with varying costs and benefits. Malware naturally exhibits all of these complexities, but for the same reason, it is challenging to curate and organize data to study these factors. We present McNdroid, to our knowledge the largest longitudinal multimodal Android malware benchmark for malware detection and drift analysis. McNdroid spans 2013-2025, excluding 2015, and represents each application with three aligned modalities: static features from manifests and smali code, dynamic behavioral features from sandbox execution, and graph-based features from function-call graphs. Using temporally separated splits, we evaluate standard ML and deep-learning detectors across increasing train-test time gaps. Results show clear temporal degradation, while multimodal fusion outperforms the best single modality across long-term temporal gaps. Cross-modal agreement also declines over time, suggesting that drift affects both individual feature spaces and the consistency among modalities. We further analyze modality-specific drift, malware-family evolution, and temporal changes in model explanations. We publicly release McNdroid, benchmark splits, and code to support reproducible research on temporal generalization and robust multimodal learning in security-critical, non-stationary settings.
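Here is a sketch of a temporally separated evaluation of the kind described: train on one year, then test on each strictly later year while recording the gap. The 2013-2025 range and the 2015 exclusion come from the abstract. The one-year training window, the function name, and the `(year, features, label)` sample layout are assumptions for illustration, not the released benchmark splits.

```python
from collections import defaultdict

YEARS = [y for y in range(2013, 2026) if y != 2015]  # 2013-2025, 2015 excluded

def temporal_splits(samples, train_year, years=YEARS):
    """Yield (gap_in_years, train, test), with test data strictly later than training.

    samples: iterable of (year, features, label) tuples (a hypothetical layout).
    """
    by_year = defaultdict(list)
    for sample in samples:
        by_year[sample[0]].append(sample)
    train = by_year[train_year]
    for test_year in years:
        if test_year > train_year:
            yield test_year - train_year, train, by_year[test_year]
```

Plotting a detector's test score against the yielded gap gives the degradation-over-time curve that this kind of benchmark is built to measure.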

2605.06884 2026-05-11 math.OC cs.LG

Muon with Nesterov Momentum: Heavy-Tailed Noise and (Randomized) Inexact Polar Decomposition

Sayantan Choudhury, Xiaoran Cheng, Martin Takáč, Sen Na, Mladen Kolar

Affiliations: MBZUAI; Penn State University; Georgia Institute of Technology; University of Southern California

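AI summary: This paper develops a convergence theory for Muon with Nesterov momentum and inexact polar decomposition in non-convex matrix optimization under heavy-tailed noise, and it provides an efficient randomized low-rank polar decomposition.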

Comments 33 pages, 4 figures, 1 table

Abstract

Most first-order optimizers treat matrix-valued parameters as vectors, ignoring the intrinsic geometry of hidden-layer weights in neural networks. Muon addresses this mismatch by updating along the polar factor of a momentum matrix, but its theoretical understanding has lagged behind practice. In particular, practical implementations incorporate Nesterov momentum, compute the polar factor only approximately, and operate with stochastic gradients that may be heavy-tailed. We close this gap by developing a convergence theory for Muon with Nesterov momentum and inexact polar decomposition in non-convex matrix optimization under heavy-tailed noise. Our analysis builds on a unified framework for inexact polar decomposition that captures practical iterative approximations such as Newton-Schulz and quantifies how their errors propagate through the optimization dynamics. Under this framework, we establish an optimal iteration and sample complexity of $O\left(\varepsilon^{-(3\alpha-2)/(\alpha-1)}\right)$ for finding an $\varepsilon$-stationary point, where $\alpha \in (1,2]$ denotes the heavy-tail index. For the inexact-polar setting with $\sigma_1 = 0$, we also provide guarantees that do not require prior knowledge of $\alpha$. We analyze a randomized low-rank polar decomposition that is substantially more efficient than full-space methods while remaining compatible with our theory. Numerical experiments further demonstrate the effectiveness of the proposed inexact and randomized variants.
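Two ingredients named above can be illustrated with a short NumPy sketch. The first is a Nesterov-style momentum buffer. The second is an inexact polar factor computed with the classic cubic Newton-Schulz iteration X ← 1.5X - 0.5XXᵀX after Frobenius normalization. The step size, momentum coefficient, and iteration count are illustrative assumptions. Practical Muon implementations typically use a tuned quintic polynomial instead, and the paper's randomized low-rank variant is not modeled here.

```python
import numpy as np

def newton_schulz_polar(G, steps=30):
    """Approximate the polar factor U V^T of G = U S V^T with cubic Newton-Schulz.

    Dividing by the Frobenius norm puts every singular value in (0, 1], inside the
    iteration's convergence region. Each step maps s -> 1.5 s - 0.5 s**3, which
    drives the nonzero singular values toward 1 while leaving U and V unchanged.
    """
    X = G / (np.linalg.norm(G) + 1e-12)
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X

def muon_nesterov_step(W, grad, buf, lr=0.02, mu=0.95, ns_steps=30):
    """One Muon-style update with Nesterov momentum and an inexact polar factor."""
    buf = mu * buf + grad          # momentum buffer
    direction = grad + mu * buf    # Nesterov look-ahead combination
    W = W - lr * newton_schulz_polar(direction, ns_steps)
    return W, buf
```

Fewer Newton-Schulz steps leave the singular values further from 1. That residual is the kind of polar-decomposition error whose propagation the paper's inexact framework quantifies.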

2605.06883 2026-05-11 stat.ML cs.LG

Kernel Selection is Model Selection: A Unified Complexity-Penalized Approach for MMD Two-Sample Tests

Yijin Ni, Xiaoming Huo

Affiliations: H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology

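AI summary: This paper proposes CP-MMD, which treats kernel selection as a model selection problem and handles the complexity penalty in MMD two-sample testing in a unified way. This enables grid-free maximization over continuous parametric kernel classes, improving test power while guaranteeing unconditional Type-I error control.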

Abstract

The Maximum Mean Discrepancy (MMD) is a cornerstone statistic for nonparametric two-sample testing, but its test power is dictated entirely by the chosen kernel. Because any fixed kernel inherently fails to distinguish certain distributions, the kernel must be dynamically optimized. However, data-driven optimization violates the foundational i.i.d. assumption, forcing a strict trade-off in existing frameworks. Ratio criteria ignore this dependence, inducing overfitting and variance collapse on rich kernel classes. Conversely, aggregation methods bypass the dependence using finite grids, but this strategy cannot scale to continuous search spaces such as deep kernels. To break this dichotomy, we cast data-driven kernel selection as a model selection problem. We propose Complexity-Penalized MMD (CP-MMD), a criterion derived by applying the two-sample uniform concentration inequality from prior work to the post-optimization MMD problem. The resulting penalty bounds the empirical MMD by the complexity of the kernel search space, mathematically absorbing the cost of optimization, so that CP-MMD enables direct, grid-free maximization over continuous parametric classes, including scalar bandwidths, polynomial feature bandwidths, and deep network parameters. By formally accounting for optimization complexity, we prove that CP-MMD maximizes true test power while ensuring unconditional Type-I validity. Consequently, CP-MMD enables grid-free kernel selection across linear, polynomial-feature, and deep regimes, matching or exceeding state-of-the-art test power.
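For reference, the statistic whose kernel is being selected is the MMD. Below is a minimal sketch of the standard unbiased MMD² estimator with a Gaussian kernel, whose bandwidth is the simplest example of a continuous kernel parameter that a selection criterion would optimize. The samples are synthetic. The CP-MMD complexity penalty is not specified in the abstract and is not implemented here.

```python
import numpy as np

def gaussian_gram(A, B, bandwidth):
    """k(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2)) for all row pairs of A and B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth ** 2))

def mmd2_unbiased(X, Y, bandwidth):
    """Unbiased estimate of MMD^2 between samples X of shape (m, d) and Y of shape (n, d)."""
    m, n = len(X), len(Y)
    Kxx = gaussian_gram(X, X, bandwidth)
    Kyy = gaussian_gram(Y, Y, bandwidth)
    Kxy = gaussian_gram(X, Y, bandwidth)
    np.fill_diagonal(Kxx, 0.0)   # drop i == j terms so the estimate is unbiased
    np.fill_diagonal(Kyy, 0.0)
    return Kxx.sum() / (m * (m - 1)) + Kyy.sum() / (n * (n - 1)) - 2.0 * Kxy.mean()

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 1))
Y_same = rng.normal(0.0, 1.0, size=(200, 1))
Y_shift = rng.normal(2.0, 1.0, size=(200, 1))
for bw in (0.1, 1.0, 10.0):
    print(bw, mmd2_unbiased(X, Y_same, bw), mmd2_unbiased(X, Y_shift, bw))
```

The estimate for the shifted sample changes strongly with the bandwidth. This is the dependence on kernel choice that motivates optimizing the kernel, and optimizing it on the same data is what creates the dependence issue the abstract describes.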

2605.06878 2026-05-11 cs.AR cs.CC cs.RO eess.IV

CARMEN: CORDIC-Accelerated Resource-Efficient Multi-Precision Inference Engine for Deep Learning

Sonu Kumar, Mukul Lokhande, Santosh Kumar Vishvakarma, Adam Teman

Affiliations: EnICS Labs; Bar-Ilan University

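AI summary: CARMEN uses the CORDIC iteration depth to switch dynamically between approximate and accurate execution modes, enabling low-resource multi-precision deep-learning inference with improved computational and energy efficiency.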

Comments Under Review (VDAT 2026)

Abstract

This paper presents CARMEN, a runtime-adaptive, CORDIC-accelerated multi-precision vector engine for resource-efficient deep learning inference. The key insight is that CORDIC iteration depth directly governs computational accuracy, enabling dynamic switching between approximate and accurate execution modes without hardware modification. The architecture integrates a low-resource iterative CORDIC-based MAC unit with a time-multiplexed multi-activation function block, supporting flexible 8/16-bit precision and high hardware utilization. ASIC implementation in 28 nm CMOS achieves up to 33% reduction in computation cycles and 21% power savings per MAC stage; a 256-PE configuration delivers 4.83 TOPS/mm² compute density and 11.67 TOPS/W energy efficiency. FPGA deployment on Pynq-Z2 validates 154.6 ms latency at 0.43 W for real-time object detection.
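The textbook linear-mode CORDIC multiplier shows why iteration depth governs accuracy. It builds x·z from shift-and-add steps, and after n iterations the error is at most |x|·2^-(n-1), provided |z| ≤ 2 initially. Below is a floating-point sketch of that generic algorithm, with a hypothetical `cordic_mac` wrapper. It does not model CARMEN's fixed-point MAC datapath or its multi-activation block.

```python
def cordic_multiply(x, z, iterations):
    """Linear-mode CORDIC: approximate x * z using only additions and power-of-two scalings.

    Invariant: y + x * z_residual == x * z_initial. Each step moves the residual toward
    zero by +/- 2**-i, so after n steps |error| <= |x| * 2**-(n - 1) when |z| <= 2.
    """
    y = 0.0
    for i in range(iterations):
        step = 2.0 ** -i            # a right shift by i bits in fixed-point hardware
        d = 1.0 if z >= 0 else -1.0
        y += d * x * step
        z -= d * step
    return y

def cordic_mac(acc, x, w, iterations):
    """Multiply-accumulate acc + x * w with a depth-controlled CORDIC multiply."""
    return acc + cordic_multiply(x, w, iterations)

for n in (4, 8, 16):
    print(n, abs(cordic_multiply(1.7, 0.6, n) - 1.7 * 0.6))
```

Fewer iterations mean fewer cycles per MAC at the cost of a looser error bound. This is the approximate/accurate trade-off the abstract describes selecting at runtime.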

2605.06875 2026-05-11 cs.AR cs.AI cs.CV cs.NA eess.IV math.NA

EULER-ADAS: Energy-Efficient & SIMD-Unified Logarithmic-Posit Engine for Precision-Reconfigurable Approximate ADAS Acceleration

Mukul Lokhande, Ratko Pilipovic, Omkar Kokane, Adam Teman, Santosh Kumar Vishvakarma

Affiliations: NSDCS Research Group, Dept. of Electrical Engineering, Indian Institute of Technology Indore

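AI summary: This paper presents EULER-ADAS, a SIMD-enabled logarithmic Posit neural compute engine for accelerating ADAS applications under low-power, high-reliability conditions. By combining a bounded-regime Posit representation, stage-adaptive logarithmic mantissa multiplication, and a SIMD-shared quire accumulation path, it supports efficient computation at multiple precisions.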

Abstract

Advanced driver-assistance systems (ADAS) require neural compute engines that deliver low-latency inference under strict power and area constraints. Posit arithmetic is attractive for such accelerators because it provides high numerical fidelity at low precision, but its variable-length regime encoding increases encode/decode cost and exposes the datapath to large regime-field fault effects. This paper presents EULER-ADAS, a SIMD-enabled logarithmic bounded-Posit neural compute engine for energy-efficient and reliability-aware ADAS acceleration. The proposed datapath combines bounded-regime Posit representation, stage-adaptive logarithmic mantissa multiplication with bit truncation, and a SIMD-shared quire accumulation path supporting Posit-(8,0), Posit-(16,1), and Posit-(32,2) execution. The unified architecture enables 4×Posit-8, 2×Posit-16, or 1×Posit-32 operation without duplicating precision-specific hardware. FPGA implementation shows that the proposed configurations reduce LUT count by up to 41.4%, delay by up to 76.1%, and power by up to 71.9% relative to exact Posit neural compute engines, while achieving up to 10× lower energy-delay product than radix-4 Booth-based Posit multipliers. In 28-nm CMOS, the bounded variants occupy 0.013-0.016 mm², consume 19.8-22.1 mW, and operate at up to 1.84 GHz. Application-level evaluation across image-classification, ADAS, and edge-inference workloads shows that the evaluated Posit-16 and Posit-32 configurations remain within about 1.5 percentage points of FP32 accuracy. A TinyYOLOv3 prototype on Pynq-Z2 achieves 78 ms latency at 0.29 W and 22.6 mJ/frame, demonstrating the suitability of EULER-ADAS for low-power real-time ADAS inference.
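Logarithmic multiplication replaces a mantissa multiply with an addition. A classic instance is Mitchell's approximation, log2(1 + f) ≈ f, sketched below for positive floats. It always underestimates, and its worst-case relative error is 1/9 (about 11.1%), reached at f_a = f_b = 0.5. The sketch illustrates the general idea only. EULER-ADAS's stage-adaptive scheme, bit truncation, and bounded-regime Posit encoding are not modeled.

```python
import math

def split_mantissa(x):
    """Write x > 0 as (1 + f) * 2**e with 0 <= f < 1."""
    m, e = math.frexp(x)            # x = m * 2**e with 0.5 <= m < 1
    return 2.0 * m - 1.0, e - 1

def mitchell_multiply(x, y):
    """Approximate x * y for x, y > 0 by adding approximate log2 mantissas."""
    fa, ea = split_mantissa(x)
    fb, eb = split_mantissa(y)
    s = fa + fb                     # log2((1 + fa) * (1 + fb)) approximated by fa + fb
    if s < 1.0:
        return math.ldexp(1.0 + s, ea + eb)
    # Carry into the exponent: 2**(1 + (s - 1)) is approximated by 2 * (1 + (s - 1)) = 2 * s.
    return math.ldexp(s, ea + eb + 1)

print(mitchell_multiply(1.5, 1.5), 1.5 * 1.5)
```

The multiplier reduces to an adder plus exponent bookkeeping. Accuracy-aware designs then decide how much of this error to correct or tolerate per precision mode.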

2605.06839 2026-05-11 cond-mat.mtrl-sci cs.AI

LLM-Guided Open Hypothesis Learning from Autonomous Scanning Probe Microscopy Experiments

Boris Slautin, Utkarsh Pratiush, Yu Liu, Kamyar Barakati, Sergei Kalinin

Affiliations: Department of Materials Science and Engineering, University of Tennessee, Knoxville, TN 37923, USA

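AI summary: This paper proposes an open hypothesis-learning framework for autonomous scanning probe microscopy experiments that combines symbolic regression with physics-based evaluation by a large language model. Symbolic regression generates candidate analytical relationships from sparse measurements, and ranking the candidates by physical plausibility leads to interpretable voltage-time growth laws.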

Comments 21 pages, 6 figures, 1 table

Abstract

Autonomous experimentation has transformed microscopy and materials discovery by enabling closed-loop optimization, including imaging and spectroscopy tuning, structure-property relationship discovery, and exploration of combinatorial libraries. However, most current workflows remain limited to selecting measurements within fixed objective or hypothesis spaces, rather than generating new physical models from experimental data. Here, we introduce an open hypothesis-learning framework that combines symbolic regression with large-language-model-based physical evaluation and implement it for autonomous scanning probe microscopy. Symbolic regression generates candidate analytical relationships directly from sparse measurements, while the language-model evaluator ranks these candidates according to physical plausibility, scaling behavior, and consistency with known mechanisms. We demonstrate the approach on autonomous piezoresponse force microscopy measurements of ferroelectric domain switching in a PZT thin film. Starting from five seed measurements, the workflow evolves from physically incomplete candidate expressions toward interpretable voltage-time growth laws consistent with kinetic domain-wall motion. This work extends autonomous microscopy from closed-loop optimization toward open hypothesis discovery, where candidate physical laws emerge from the experiment itself rather than being specified in advance. More broadly, the framework establishes a route for integrating symbolic regression, physical reasoning, and adaptive experimentation into hierarchical autonomous scientific workflows.

2605.06833 2026-05-11 cs.CR cs.AI cs.NI

PAMPOS: Causal Transformer-based Trajectory Prediction for Attack-Agnostic Misbehavior Detection in V2X Networks

Konstantinos Kalogiannis, Ahmed Mohamed Hussain, Panos Papadimitratos

Affiliations: Networked Systems Security Group, Royal Institute of Technology, Stockholm, Sweden

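AI summary: PAMPOS uses a causal transformer to learn normal mobility patterns and flags trajectory deviations with an anomaly scoring mechanism, achieving highly accurate misbehavior detection without attack-labeled data.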

Comments Author's version; Accepted for presentation at the ACM Workshop on Wireless Security and Machine Learning (WiseML 2026)

Abstract

Misbehavior detection in Vehicle-to-Everything (V2X) networks is a second line of defense against insider falsification attacks that cryptographic mechanisms alone cannot address. Existing learning-based Misbehavior Detection Schemes (MDSs) are supervised, requiring labeled attack samples at training time, thus failing to counter unseen falsification attacks. We present PAMPOS, a causal transformer-decoder trained on benign VeReMi++ trajectories to learn normal mobility patterns. At inference time, misbehavior is identified as a deviation from the model's next-step kinematic predictions using a top-K normalized anomaly scoring mechanism that localizes falsification to specific kinematic features, without requiring attack-labeled training data. We evaluate PAMPOS across all 19 attack types in VeReMi++ under rush-hour and afternoon scenarios, achieving Area Under the Curve (AUC) values of up to 0.98 and F1-scores of up to 0.95 for most attack categories.
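The scoring step can be illustrated generically. Compare the model's next-step prediction with the reported kinematics, normalize each feature's residual by a scale estimated on benign data, and average the K largest normalized residuals. The features with the largest residuals indicate where the falsification is. The feature names, the per-feature standard-deviation normalization, and K = 2 are hypothetical choices for illustration, since the abstract does not give PAMPOS's exact formula. The transformer predictor is replaced here by given prediction arrays.

```python
import numpy as np

FEATURES = ["pos_x", "pos_y", "speed", "heading"]  # hypothetical kinematic features

def fit_residual_scale(pred_benign, obs_benign, eps=1e-6):
    """Per-feature residual scale (standard deviation) estimated on benign data."""
    return np.std(obs_benign - pred_benign, axis=0) + eps

def topk_anomaly_score(pred, obs, scale, k=2):
    """Mean of the k largest normalized absolute residuals, plus those features' names."""
    z = np.abs(obs - pred) / scale
    top = np.argsort(z)[::-1][:k]
    return float(z[top].mean()), [FEATURES[i] for i in top]
```

A detection threshold on the score would be calibrated on benign trajectories only, which is what lets such a scheme run without attack labels.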

2605.06820 2026-05-11 physics.med-ph cs.AI

Overcoming data scarcity through multi-center federated learning for organs-at-risk segmentation in pediatric upper abdominal radiotherapy

Mianyong Ding, Maximilian Knoll, Semi Harrabi, Martine van Grotel, Annemieke S. Littooij, Max van Noesel, Jens-Peter Schenk, Marry M. van den Heuvel-Eibrink, Geert O. Janssens, Matteo Maspero

Affiliations: Princess Máxima Centre for Pediatric Oncology; University Medical Centre Utrecht; Centre for Image Sciences, University Medical Centre Utrecht; University Hospital Heidelberg; University Medical Center Utrecht; Division Imaging & Cancer, University Medical Center Utrecht; Division of Pediatric Radiology, Department of Diagnostic and Interventional Radiology, University Hospital Heidelberg; Wilhelmina Children's Hospital-Division of CHILD HEALTH, University Medical Centre Utrecht, University of Utrecht

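AI summary: This paper uses federated learning to address data scarcity in organ-at-risk segmentation for pediatric upper abdominal radiotherapy. Multi-center collaborative training improves model generalization, and experiments show that federated learning outperforms local models in cross-center performance and stability.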

Abstract

Deep learning-based auto-contouring models for organs/structures at risk (OARs) can improve radiotherapy workflows, but models trained on adult data often underperform in pediatric patients. Developing robust pediatric-specific models is hindered by data scarcity and fragmentation across centers. Federated learning (FL) enables privacy-preserving collaborative training without the need for data sharing. We evaluated the feasibility and performance of FL for developing pediatric-specific OAR segmentation models across two European medical centers. Computed tomography (CT) images of pediatric patients from Utrecht and Heidelberg with a renal tumor or abdominal neuroblastoma were retrospectively collected and locally processed. An nnU-Net-based framework segmented 19 OARs using local and FL schemes. FL was implemented with secure weight exchange via cloud storage across institutional firewalls. Performance was assessed using the Dice similarity coefficient (DSC), 95th-percentile Hausdorff distance, and mean surface distance. Robustness to patient orientation, false-positive segmentations of surgically removed kidneys, and failure cases were also examined. A total of 310 postoperative CTs from 272 patients (105 renal tumors, 167 neuroblastomas) were included. Local models performed well on data from their own center but showed significantly reduced cross-center DSC for four to seven of the nine evaluated OARs. In contrast, the FL model matched local performance for at least seven of nine OARs and achieved the best cross-center results across three metrics, with DSC gains of 0.003-0.007 over local models. FL also maintained stable performance across patient orientations and reduced false-positive kidney segmentations. Real-world FL improves the cross-center robustness of CT-based OAR segmentation models for pediatric upper abdominal tumors.
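The abstract does not name the aggregation rule behind the secure weight exchange. A common choice in federated learning is FedAvg, which averages each parameter across sites, weighted by the sites' local sample counts. The sketch below assumes that rule, represents model weights as plain dicts of NumPy arrays, and uses hypothetical site sizes. It omits the nnU-Net training loop and the secure cloud transport.

```python
import numpy as np

def fedavg(site_weights, site_sizes):
    """Sample-size-weighted average of per-site model weights (FedAvg aggregation).

    site_weights: list of dicts mapping parameter name -> np.ndarray (same shapes at every site).
    site_sizes:   local training-set size of each site, in the same order.
    """
    total = float(sum(site_sizes))
    return {
        name: sum(w[name] * (n / total) for w, n in zip(site_weights, site_sizes))
        for name in site_weights[0]
    }

# Two hypothetical sites with different amounts of local data.
site_a = {"conv.weight": np.ones((2, 2)), "conv.bias": np.zeros(2)}
site_b = {"conv.weight": np.zeros((2, 2)), "conv.bias": np.ones(2)}
global_model = fedavg([site_a, site_b], site_sizes=[300, 100])
print(global_model["conv.weight"])  # every entry is 300/400 = 0.75
```

Only these parameter arrays cross institutional boundaries, never the CT images. This is the property that makes FL attractive when pediatric data cannot be pooled.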

2605.06810 2026-05-11 cs.HC cs.CV

Enhancing Eye Movement Biometrics for User Authentication via Continuous Gaze Offset Score Fusion

Hashim Aziz, Mehedi Hasan Raju, Oleg V. Komogortsev

Affiliations: Texas State University, San Marcos

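AI summary: This paper studies how fusing continuous gaze offset with existing biometric features affects user authentication performance, validating linear and nonlinear fusion methods on two public datasets.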

Comments 10 pages, 1 figure, 1 table; submitted to IJCB 2026

Abstract

Eye movement biometrics (EMB) use subject-specific gaze dynamics for user authentication and identification. Recent deep learning-based EMB systems achieve strong performance by modeling temporal eye movement behavior. However, these systems typically overlook continuous gaze offset, despite prior evidence that it contains user-discriminative information. This work examines whether continuous gaze offset can improve biometric performance when combined with existing biometric features. We evaluate linear and nonlinear fusion methods on two publicly available datasets, collected with a lab-grade eye tracker and a virtual reality headset across multiple tasks and observation durations. Results indicate that fusion offers performance benefits on both datasets, particularly when using nonlinear fusion. Additionally, fusing biometric information across multiple tasks further improves authentication performance. These findings support the hypothesis that continuous gaze offset may serve as useful auxiliary information under conditions of degraded or noisy eye tracking.
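Here is a minimal sketch of linear score-level fusion. Each matcher's similarity score is z-normalized with statistics from a development set, and the normalized scores are combined by a weighted sum. The function names, the z-score normalization, the weight w = 0.7, and the development scores in the usage check are hypothetical. A nonlinear variant would replace the weighted sum with a learned classifier over the normalized scores.

```python
import numpy as np

def zscore_params(dev_scores):
    """Mean and standard deviation of one matcher's scores on a development set."""
    return float(np.mean(dev_scores)), float(np.std(dev_scores)) + 1e-12

def linear_fusion(emb_score, offset_score, emb_params, offset_params, w=0.7):
    """Weighted sum of z-normalized scores from the embedding and gaze-offset matchers."""
    z_emb = (emb_score - emb_params[0]) / emb_params[1]
    z_off = (offset_score - offset_params[0]) / offset_params[1]
    return w * z_emb + (1.0 - w) * z_off
```

Normalizing first keeps a matcher with a wider score range from dominating the sum. The weight would normally be tuned on held-out genuine and impostor comparisons.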

2605.06762 2026-05-11 q-bio.GN cs.AI

A Linear-Transformer Hybrid for SNP-Based Genotype-to-Phenotype Prediction in Grapevine

Yibin Wang, Murukarthick Jayakodi, Silvas Kirubakaran, Ambika Chandra, Azlan Zahid

Affiliations: Department of Biological and Agricultural Engineering, Texas A&M AgriLife Research, Texas A&M University System; Department of Soil and Crop Sciences, Texas A&M AgriLife Research, Texas A&M University System; USDA-ARS, Grape Genetics Research Unit

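AI summary: This paper proposes LiT-G2P, which combines linear effects with Transformer-based nonlinear interactions to make SNP-based genotype-to-phenotype prediction in grapevine more robust, with particularly strong results in cross-year testing.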

Comments 15 pages, 4 figures

Abstract

Robust genotype-to-phenotype (G2P) prediction is essential for accelerating breeding decisions and genetic gain. However, it remains challenging to measure complex traits under variable field conditions and across years. In this study, we propose a linear-Transformer approach, LiT-G2P (Linear-Transformer Genotype-to-Phenotype), an automated predictive framework that integrates additive genetic variance effects with Transformer-based nonlinear interactions using genome-wide single-nucleotide polymorphism (SNP) data. We evaluated LiT-G2P on a panel of diverse grape accessions, genotyped with SNP markers and measured for phenotypes across two consecutive years. Target phenotypic traits include leaf hair density and trichome density of grapevines. Across both single-year and cross-year testing scenarios, LiT-G2P consistently improves prediction performance compared with baseline models. For hair density, LiT-G2P achieves the lowest error in both single-year and cross-year evaluations, with RMSEs of 0.469 and 0.454, respectively, while maintaining strong tolerance accuracies of 79.2% and 74.6%, respectively. For trichome density, LiT-G2P also delivers the best overall G2P performance. In addition, we extract model-prioritized SNPs from attention weights and apply genotype-stratified analysis to provide interpretable candidate markers for downstream validation. These results demonstrate that integrating stable additive effects with learned interaction patterns can enhance cross-year robustness and support practical SNP-based predictive modeling for genomic selection.

2605.06749 2026-05-11 stat.ME cs.AI

A Statistical Framework for Algorithmic Collective Action with Multiple Collectives

Claudio Battiloro, Pietro Greiner, Dario Rancati, Bret Nestor, Oumaima Amezgar, Francesca Dominici

Affiliations: Harvard University; Mila and LawZero; Institute of Science and Technology Austria; University of British Columbia; University of Padova

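AI summary: This paper proposes the first statistical framework for algorithmic collective action with multiple collectives. It studies how multiple collectives influence a classifier's behavior and provides computable statistical bounds.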

Comments 27 pages, 16 figures

Abstract

As learning systems increasingly shape everyday decisions, Algorithmic Collective Action (ACA), i.e., users coordinating changes to shared data to steer model behavior, offers a complement to regulator-side policy and corporate model design. Real-world collective actions have traditionally been decentralized and fragmented into multiple collectives, despite sharing overarching objectives, with each collective differing in size, strategy, and actionable goals. However, most of the ACA literature focuses on single collective settings. To address this, we propose the first comprehensive statistical framework for ACA with multiple collectives acting on the same system. In particular, we focus on collective action in classification, studying how multiple collectives can influence a classifier's behavior. We provide quantitative statistical bounds on the success of the collectives, considering the role and the interplay of the collectives' sizes and the alignment of their goals. We make such bounds computable by each collective with only partial knowledge of other collectives' sizes and strategies. Finally, we numerically illustrate our framework on simulations inspired by interventions for climate adaptation in smart cities, demonstrating the usefulness of our bounds.

2605.06737 2026-05-11 cs.SE cs.AI

A Self-Healing Framework for Reliable LLM-Based Autonomous Agents

Cheonsu Jeong, Younggun Shin

Affiliations: AX Center, SAMSUNG SDS; Dept. of Artificial Intelligence, Graduate School of Engineering, Yonsei University

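AI summary: This paper proposes a reliability-aware self-healing framework that improves the reliability of LLM-based autonomous agents through failure detection, reliability assessment, and automated recovery. Experiments show that the approach significantly increases task success rates and strengthens system robustness.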

Comments 13 pages, 3 figures, 1 table

Abstract

Autonomous agents based on Large Language Models (LLMs) are increasingly being utilized in complex software systems. However, reliability remains a significant challenge due to unpredictable failures such as hallucinations, execution errors, and inconsistent reasoning. This paper proposes a reliability-aware self-healing framework for LLM-based software agents. The framework integrates failure detection, reliability assessment, and automated recovery mechanisms. First, we define a taxonomy of failure types and introduce a quantitative reliability assessment model. Next, we propose a failure detection method that identifies abnormal agent behavior based on execution patterns and output consistency. Finally, we design a self-healing mechanism that dynamically recovers from failures through adaptive replanning and corrective prompting strategies. The proposed framework was implemented in a multi-agent workflow environment and evaluated using real-world task scenarios. Experimental results demonstrate that our approach significantly increases task success rates, reduces failure propagation, and enhances overall system robustness compared to existing methods. In particular, this study distinguishes itself by establishing an integrated monitoring system that combines the agent's internal reasoning process with external execution results. These findings are expected to contribute to securing the stability of advanced autonomous systems and lowering the barriers to LLM adoption in production environments.

2605.06731 2026-05-11 cs.CR cs.CL cs.LG

When Routine Chats Turn Toxic: Unintended Long-Term State Poisoning in Personalized Agents

Xiaoyu Xu, Minxin Du, Qipeng Xie, Haobin Ke, Qingqing Ye, Haibo Hu

Affiliations: The Hong Kong Polytechnic University; Hong Kong University of Science and Technology (Guangzhou)

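AI summary: The study examines unintended long-term state poisoning of personalized agents caused by routine interactions. It introduces the ULSPB benchmark and the Harm Score metric, and it reduces the security risk with the StateGuard defense.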

Comments 23 pages

Abstract

Personalized LLM agents maintain persistent cross-session state to support long-horizon collaboration. Yet this persistence introduces a subtle but critical security vulnerability: routine user-agent interactions can gradually reshape an agent's long-term state, inadvertently weakening future confirmation boundaries, expanding tool-use defaults, and escalating autonomous behavior over time. We formalize this risk as unintended long-term state poisoning. To study it systematically, we introduce the Unintended Long-Term State Poisoning Bench (ULSPB), a bilingual benchmark comprising 350 settings spanning five assistance categories, seven interaction patterns, 24-turn routine interactions, and matched single-injection counterparts. Furthermore, we define the Harm Score (HS), a state-centric metric that quantifies authorization drift, tool-use escalation, and unchecked autonomy. Experiments on OpenClaw with four backbone LLMs demonstrate that, while single injection is generally effective, routine conversations alone can substantially poison long-term state, primarily corrupting memory-centric artifacts. Evaluations seeded with real-world user interactions confirm that this risk is not merely an artifact of synthetic prompts. To mitigate this threat, we propose StateGuard, a lightweight post-execution defense that audits state diffs at the writeback boundary and selectively rolls back dangerous edits. Across all evaluated models, StateGuard reduces HS to near zero and lowers false-negative rates with minimal overhead. Its false-positive rates are high, which is acceptable for a safety-first writeback defense.
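The writeback-boundary idea can be sketched as a diff between the agent's persistent state before and after a session. Edits to sensitive keys are rolled back, and other edits are committed. The flat-dict state layout, the key prefixes, and the rule that any change under a sensitive prefix is dangerous are all hypothetical. The abstract does not specify how StateGuard classifies edits.

```python
_MISSING = object()  # marks a key that is absent on one side of the diff
SENSITIVE_PREFIXES = ("permissions.", "tools.defaults.", "autonomy.")  # hypothetical

def diff_state(before, after):
    """Return {key: (old, new)} for every added, removed, or changed key."""
    diff = {}
    for key in set(before) | set(after):
        old, new = before.get(key, _MISSING), after.get(key, _MISSING)
        if old != new:
            diff[key] = (old, new)
    return diff

def guarded_writeback(before, after):
    """Commit benign edits, roll back sensitive ones; return (state, rolled_back_keys)."""
    committed = dict(after)
    rolled_back = []
    for key, (old, _new) in diff_state(before, after).items():
        if key.startswith(SENSITIVE_PREFIXES):
            rolled_back.append(key)
            if old is _MISSING:
                committed.pop(key, None)   # undo a newly added sensitive key
            else:
                committed[key] = old       # restore the previous value
    return committed, sorted(rolled_back)
```

A blanket prefix rule like this one errs toward rolling back, trading extra false positives for fewer missed authorization changes.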

2605.06718 2026-05-11 cs.CR cs.LG

TUANDROMD-X: Advanced Entropy and Visual Analytics Dataset for Enhanced Malware Detection and Classification

Parthajit Borah, Upasana Sarmah, D. K. Bhattacharyya, J. K. Kalita

Affiliations: Department of Computer Science and Engineering, Tezpur University; Computer Science, College of Engineering and Applied Science, University of Colorado

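AI summary: This paper introduces the TUANDROMD-X dataset, which uses entropy-based and visual features to distinguish malware from goodware and make malware detection more efficient.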

Abstract

Malware and malware-based attacks are becoming more prevalent and complex. Attackers regularly devise new techniques that can evade conventional and signature-based malware defenses. To address such threats, there is increasing demand for more advanced defense solutions. Machine learning-based techniques can defend effectively against malware and malware-based attacks. Nevertheless, developing and rigorously testing such techniques requires high-quality datasets containing samples from various malware families as well as goodware. The lack of such datasets remains a major bottleneck in malware research. In this paper, we introduce TUANDROMD-X, a multiclass malware dataset with visual and entropy-based features for each sample that clearly distinguish malware from goodware. The dataset is built from static analysis, avoiding the overhead of extensive feature engineering and dynamic analysis. As a result, TUANDROMD-X enables researchers and cybersecurity experts to design faster and better malware detection systems.
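Entropy-based features of this kind are commonly built on the Shannon entropy of a file's byte distribution, H = -Σ p_b log2 p_b over the 256 byte values. H approaches 8 bits per byte for packed or encrypted content. The sketch below computes H for a whole input and for fixed-size windows, assuming a window of 256 bytes. The exact feature definitions and the visual (image) encoding used in TUANDROMD-X are not given in the abstract.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

def windowed_entropy(data: bytes, window: int = 256):
    """Entropy of consecutive non-overlapping windows, giving a simple entropy profile."""
    return [shannon_entropy(data[i:i + window]) for i in range(0, len(data), window)]
```

A per-window profile keeps local structure, such as an encrypted payload embedded in otherwise ordinary code, that a single whole-file value would average away.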