arXivDaily: arXiv daily academic digest, updated Monday to Friday
2606.12835 2026-06-12 cs.MA cs.AI cs.CY cs.NI new submission

The Internet of Agentic AI: Communication, Coordination, and Collective Intelligence at Scale

Quanyan Zhu

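AI summary This paper sets out a vision for the Internet of Agentic AI (IoAI), an open ecosystem in which heterogeneous agents discover one another, negotiate, communicate, and collaborate across cloud, edge, and device environments, and discusses its architecture, mechanisms, and key research challenges.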

Abstract

The rapid emergence of autonomous AI agents is transforming artificial intelligence from isolated model inference into distributed systems of reasoning, communication, and action. This paper develops the vision of the Internet of Agentic AI (IoAI): an open ecosystem in which heterogeneous agents discover one another, negotiate responsibilities, exchange context, invoke tools, and execute workflows across cloud, edge, device, organizational, and cyber-physical environments. We synthesize foundations from single-agent agentic AI, multi-agent systems, distributed computing, communication networks, game theory, and security engineering to characterize the architectures and mechanisms required for scalable agent ecosystems. The paper examines agent deployment models, workflow lifecycles, communication protocols, interoperability layers, resource-management challenges, and trust architectures, with case studies in adaptive manufacturing and distributed operational coordination. The resulting framework highlights the central research challenges of controlled emergence, semantic interoperability, secure identity, incentive-compatible coordination, resource-aware orchestration, and governance for large-scale networks of autonomous agents.

2606.12812 2026-06-12 cs.CY cs.SD new submission

Vocal Identity Under Siege by AI Voice Cloning Technologies

Jyh-An Lee, Xuan Sun

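AI summary Through a comparative analysis of three legal frameworks (the right of publicity, personality rights, and the personal data protection right), this article examines the threat that generative AI voice cloning poses to the unique value of vocal identity and the available legal responses.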

Journal ref [2026] Singapore Journal of Legal Studies 46

Abstract

The advent of sophisticated AI-driven voice cloning has brought to the fore critical legal and ethical challenges regarding the protection of vocal identity. Prompted by recent controversies - including the striking resemblance between OpenAI's ChatGPT-4o voice and that of Scarlett Johansson - this article examines how generative AI technologies undermine the unique value of the human voice and further complicate the legal questions surrounding personality rights. Through a comparative analysis, the paper evaluates three principal legal frameworks: the right of publicity, personality rights, and the personal data protection right. Each framework - rooted in different legal traditions - offers distinct strengths and limitations in addressing the threats posed by AI-generated voice cloning. By analysing these doctrines' scope, remedies, and posthumous protections, the study offers a foundation for understanding how existing legal approaches may be applied to the evolving challenges of vocal identity in the era of generative AI.

2606.12805 2026-06-12 cs.HC cs.AI new submission

Exploring How Agent Voice Accents Shape Human-AI Collaboration in K-12 Group Learning

Prerna Ravi, Carúmey Stevens, Ben Hurt, Brandon Hanks, Grace Lin, Emma Anderson

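AI summary In a study with 33 teachers, the authors find that the accent of a GenAI voice agent (British, Indian, or African American) affects whether it is perceived as a tool or as a peer, which in turn shapes trust, engagement, and reliance.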

Abstract

Collaboration is widely recognized as a cornerstone of 21st-century education, yet teachers still encounter persistent challenges in fostering productive peer interaction. LLM conversational peer agents introduce new possibilities for mediating in-person group work, raising questions about how persona design, particularly their voice characteristics, shapes learners' perceptions, trust, and interactional dynamics. While prior work has examined agent accent effects in one-to-one settings, little is known about how these effects manifest in groups. We conducted a between-subjects mixed-methods study with 33 teachers examining how a GenAI voice agent with different accents (British, Indian, and African American) influenced collaboration and agent perception. Across surveys, group interaction analyses, and artifacts, we find that accent shaped participants' mental models and the roles the agent assumed in group interaction. The British-accented agent was largely treated as a tool and engaged in detached, utility-based ways, whereas Indian- and African American-accented agents were more readily anthropomorphized and integrated as peers. These role expectations influenced trust, engagement, and reliance over time. This work advances understanding of how GenAI's sociolinguistic design features shape group dynamics in CSCL, with implications for designing culturally inclusive AI partners in group learning.

2606.12774 2026-06-12 eess.SY cs.AI cs.CL cs.SY new submission

Agentic MPC for Semantic Control System Resynthesis

Yuya Miyaoka, Masaki Inoue

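AI summary Proposes an agentic MPC framework that integrates large language model agents for context-aware, semantically adaptive control synthesis, and demonstrates in an autonomous driving scenario that it can adjust control to personal preferences or to social situations such as yielding to emergency vehicles.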

Comments 7 pages, 5 figures

Abstract

While MPC effectively handles structured, diverse, and low-level specifications, it lacks the capability to dynamically incorporate high-level contextual information such as social norms, user intent, or natural language instructions. To address this limitation, this manuscript introduces an agentic MPC framework that enables context-aware, semantically adaptive control synthesis by integrating with large language model-based agents. The agent interprets heterogeneous inputs, including natural language messages, environmental observations, and external knowledge, to resynthesize the control specifications. The effectiveness of the framework is demonstrated in an autonomous driving scenario, where the system aligns with personal preferences or responds to social situations such as emergency vehicle yielding.

2606.12737 2026-06-12 cs.CR cs.AI new submission

PI-Hunter: Automated Red-Teaming for Exposing and Localizing Prompt Injections

Pengfei He, Lesly Miculicich, Vishesh Sharma, Ash Fox, George Lee, Jiliang Tang, Tomas Pfister, Long T. Le

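AI summary Proposes PI-Hunter, an automated auditing framework that builds source-aware test cases and evolves them iteratively to proactively expose latent prompt injection vulnerabilities in LLM agents, substantially improving vulnerability exposure and attack-surface coverage.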

Abstract

Large Language Models (LLMs) are rapidly evolving into agentic systems that interact with external tools and environments, introducing new security risks such as indirect prompt injection attacks through untrusted external sources. Existing defenses mainly focus on blocking malicious content at inference time, and current red-teaming methods primarily optimize attack success. As a result, developers have limited visibility into how latent prompt injections emerge and propagate through agents. We propose PI-Hunter, an automated agentic auditing framework for proactive vulnerability exposure in LLM agents. PI-Hunter constructs realistic source-aware test cases and iteratively evolves them through feedback-driven exploration to induce agents to retrieve and reveal latent malicious instructions embedded within external environments. Extensive experiments across multiple benchmarks, agent architectures, attacks, and defenses demonstrate that PI-Hunter substantially improves vulnerability exposure and attack-surface coverage over strong automated red-teaming baselines, while remaining effective under existing prompt injection defenses.

2606.12709 2026-06-12 cs.MA cs.CR cs.LG new submission

Smarter Saboteurs, Better Fixers: Scaling & Security in Linear Multi-Agent Workflows

Timothy McAllister, Sina Abdidizaji, Ivan Garibay, Ozlem Ozmen Garibay

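AI summary Studies how model scale affects the security of linear multi-agent workflows; larger models are more likely to execute malicious instructions, but a lightweight Fixer stage restores performance, indicating that linear structures are robust when suitably corrected.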

Comments 16 pages (4 are main text), 2 figures, 6 tables. Accepted to the AIWILD Workshop at ICML 2026

Abstract

As LLM-based multi-agent systems (MAS) are deployed in the wild, the resilience of their collaboration structures against adversarial compromise becomes a critical safety concern. Attackers may leverage prompt-injection or jailbreaking to sabotage individual agents within MAS workflows, but the interaction between model scaling and system-level resilience remains poorly understood. This paper investigates how model scale affects the security of linear multi-agent workflows. Our experiments across scales of two open-weight model families on the HumanEval benchmark reveal a compliance-correction symmetry: larger models are far more likely to faithfully execute malicious instructions, with the control-to-malicious performance drop reaching 53.7pp at 27B in uncorrected pipelines. However, appending a lightweight terminal Fixer stage collapses this to 0.6pp and restores statistical parity with control-level performance, demonstrating that strictly linear collaboration structures can be viable and resilient to adversaries at this scale, and suggesting that the brittleness previously attributed to linear topology may stem from a lack of correction.

2606.12703 2026-06-12 cs.CR cs.AI cs.LG new submission

SMSR: Certified Defence Against Runtime Memory Poisoning in Persistent LLM Agent Systems

Tarun Sharma

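AI summary Proposes the SMSR defence, which combines write-time HMAC signing with query-time randomised memory ablation and verdict-based majority voting to give the first certified robustness guarantee against multi-session memory poisoning.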

Abstract

Retrieval-augmented generation (RAG) agents increasingly run with persistent memory that accumulates across user sessions. This creates a new attack surface: an adversary interacting only through normal channels can inject crafted memories that, once retrieved, steer the agent's responses for future users, without touching model weights or code. We call this Multi-Session Memory Poisoning (MSMP) and show that no existing defence certifies against it; static-corpus defences (RobustRAG, ReliabilityRAG) assume a fixed knowledge base, and heuristic filters are bypassed by fluent enterprise-style text. We present Signed Memory with Smoothed Retrieval (SMSR), the first defence with a certified robustness bound for this setting. Component 1 adds HMAC-SHA256 provenance at write time, blocking unsigned injection. Component 2 applies randomised memory ablation with verdict-based majority voting at query time, bounding the influence of authenticated adversaries. We prove that no provenance-free retrieval-time filter can certify against adaptive injection, derive a hypergeometric certificate for Component 2, and formalise the Consistent Minority Effect, whereby a consistent adversarial answer wins string-based voting as a numerical minority while verdict-based voting removes it. Across 15 enterprise scenarios (3,150 repeated trials), Component 1 cuts attack success from 93-100% to 0% for all unsigned variants. For an authenticated adversary with a single injection, Component 2 holds success to 8.0% (95% CI [5.8, 10.9], n=450), below the certified worst case. In an end-to-end query-only attack where the agent itself writes the poison rather than it being pre-seeded, SMSR reduces success from 65.3% to 5.3% (n=150, non-overlapping CIs) on a live agent stack. Clean-query utility is 90% (Component 1) and 85% (combined).
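The write-time provenance idea behind Component 1 can be sketched with Python's standard `hmac` module. This is a minimal illustration only, not the SMSR implementation: the abstract does not describe the entry format or key management, so the dictionary layout, the `SECRET_KEY` constant, and the helper names below are all assumptions.

```python
import hashlib
import hmac

# Hypothetical demo key; a real deployment would load this from a secrets store.
SECRET_KEY = b"demo-key-not-for-production"


def sign_memory(text: str, key: bytes = SECRET_KEY) -> dict:
    """Attach an HMAC-SHA256 tag to a memory entry when it is written."""
    tag = hmac.new(key, text.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"text": text, "tag": tag}


def verify_memory(entry: dict, key: bytes = SECRET_KEY) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = hmac.new(key, entry["text"].encode("utf-8"), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, entry.get("tag", ""))


def retrievable(store: list, key: bytes = SECRET_KEY) -> list:
    """Keep only entries whose tag verifies; unsigned or altered entries are dropped."""
    return [entry for entry in store if verify_memory(entry, key)]
```

An entry injected without the key, or altered after signing, fails `verify_memory` and never reaches retrieval. A signed entry written by an authenticated adversary still passes, which is the case the abstract assigns to Component 2.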

2606.12667 2026-06-12 cs.NI cs.AI cs.SY eess.SY new submission

Free-Placement Optimization of Ground Station Locations for Low-Earth Orbit Satellites

Grace Ra Kim, Duncan Eddy, Vedant Srinivas, Mykel J. Kochenderfer

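AI summary Proposes SCORE, a two-stage free-placement method for optimizing ground station locations that needs up to 5x fewer function evaluations than differential evolution while improving downlink throughput by up to 13%, and achieves up to 15% greater total downlink than fixed-site methods.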

Comments 34 pages, 13 figures, 11 tables, Journal of Aerospace Information Systems (JAIS)

Abstract

Rapidly expanding low Earth orbit satellite constellations are placing increasing demands on terrestrial ground networks, motivating the development of more efficient ground station network designs. Current approaches select sites from predefined locations, limiting optimization to existing infrastructure and constraining performance. In contrast, free-placement optimization operates over a continuous spatial domain on Earth, broadening the search space and allowing higher-throughput configurations at the cost of potentially requiring new infrastructure deployment. In this work, we introduce SCORE (Sequential Cyclic Optimization via Refinement & Evaluation), a two-stage free-placement method for ground station design. SCORE combines sequential coordinate selection with cyclic refinement to manage high-dimensionality, non-convexity, and local minima that challenge global optimizers. We benchmark SCORE against one-shot methods such as differential evolution (DE) and integer programming approaches using locations from Kongsberg Satellite Services and the World Teleport Association. Tests across two commercial Earth observation constellations (Capella Space and ICEYE) and one synthetic Walker-Star constellation show that SCORE requires up to 5x fewer function evaluations to converge relative to DE while improving downlink throughput by up to 13%. Compared to fixed-site methods, unconstrained SCORE achieves up to 15% greater total downlink, establishing a strong empirical performance benchmark for flexible placement; infrastructure-constrained SCORE retains over 92% of this gain while restricting placement to within proximity of existing fiber and power infrastructure. We also explore trade-offs between expanding existing stations and deploying new sites, informing future ground network design for operational constellations.

2606.12655 2026-06-12 cs.CR cs.CV new submission

Amnesia: A Stealthy Replay Attack on Continual Learning Dreams

Ahmed Sharshar, Naveen Kumar Kummari, Mohsen Guizani

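AI summary Proposes Amnesia, an attack that controls only replay index selection and maximizes the degradation of continual learning models under audit constraints, exposing index-level replay control as a threat.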

Abstract

Continual learning (CL) models often use experience replay to reduce catastrophic forgetting, but their robustness to replay sampling interference remains underexplored. Existing CL attacks alter inputs or training pipelines (poisoning/backdoors) and rarely include explicit auditable constraints, limiting realism. Here, auditability means a monitor can verify compliance from sampler-visible telemetry - e.g., logged replay index/label statistics - by checking that the realized replay class histogram stays close to a nominal baseline and that replay rate is unchanged per batch and/or over a rolling window. We study a limited-privilege insider who controls only replay index selection, not pixels, labels, or model parameters, while staying within auditable limits such as queue priorities. We introduce Amnesia, a replay composition attack that maximizes degradation under two budgets: a visibility budget delta bounding the TV/KL divergence from a nominal class histogram p0, and a mass budget f fixing the replay rate. Amnesia has two steps: (i) compute lightweight class utilities, such as EMA loss or confidence, to tilt p0 toward harmful classes; and (ii) project the tilt back into the delta-ball using efficient KL (exponential tilt) or TV (balanced mass redistribution) optimizers. A windowed scheduler enforces rolling audits. Across challenging CL benchmarks and strong replay baselines, Amnesia consistently lowers final accuracy (ACC) and worsens backward transfer (-BWT). The KL variant delivers high impact while remaining largely undetected under multiple audit schemes, including per-batch and rolling-window checks. The TV variant is more damaging but easier to detect, especially under tight per-class constraints. These results expose index-only replay control as a practical, auditable threat surface in CL systems and establish a principled impact-visibility trade-off.
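The KL variant of step (ii) can be read as an exponential tilt of the nominal histogram, scaled so that the result stays within the visibility budget. The sketch below is one plausible reading, not the authors' algorithm: it assumes strictly positive `p0`, utilities scaled to roughly [0, 1], and a simple bisection on the tilt strength; the function names and the `lam_max` cap are hypothetical.

```python
import math


def tilt(p0, utilities, lam):
    """Exponentially tilt p0 toward high-utility classes: p(c) ∝ p0(c) * exp(lam * u(c))."""
    weights = [p * math.exp(lam * u) for p, u in zip(p0, utilities)]
    total = sum(weights)
    return [w / total for w in weights]


def kl(p, q):
    """KL(p || q) in nats; assumes q > 0 wherever p > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def kl_projected_tilt(p0, utilities, delta, lam_max=50.0, iters=60):
    """Find the strongest tilt whose KL divergence from p0 stays within delta.

    KL(tilt(lam) || p0) is non-decreasing in lam >= 0, so bisection on lam works.
    """
    if kl(tilt(p0, utilities, lam_max), p0) <= delta:
        return tilt(p0, utilities, lam_max)
    lo, hi = 0.0, lam_max
    for _ in range(iters):
        mid = (lo + hi) / 2
        if kl(tilt(p0, utilities, mid), p0) <= delta:
            lo = mid
        else:
            hi = mid
    return tilt(p0, utilities, lo)
```

With a uniform `p0` and a small `delta`, the returned histogram shifts mass toward the highest-utility class while its KL divergence from `p0` sits at the budget, which is the impact-visibility trade-off the abstract describes.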

2606.12647 2026-06-12 cs.CC cs.AI cs.LG new submission

Token Complexity Theory for AI-Augmented Computing

Jie Wang

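AI summary Proposes token complexity as a formal measure of query and response cost in AI-augmented computing, sets up the AI-Oracle Turing machine framework, and proves basic theorems on monotonicity, convexity, price sensitivity, and price-relativity of task ordering.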

Comments 25 pages, 1 figure

Abstract

AI-augmented computing delegates natural language queries, code generation requests, and other open-ended tasks to a cluster of AI models that processes queries and generates responses. This paradigm introduces a resource dimension that neither classical time nor space complexity captures: the cost of sending queries to and receiving responses from such a cluster. We introduce token complexity, a formal resource measure defined as the minimum expected token cost to achieve a specified level of output quality on a task, and develop a taxonomy classifying AI systems by the strength of their probabilistic properties. We develop token complexity within the framework of AI-Oracle Turing machines, in which a probabilistic Turing machine interacts with a stochastic oracle via dedicated query and response tapes. We prove basic theorems establishing that token complexity behaves as expected: monotonicity (higher quality costs more tokens), convexity (quality improvements become progressively more expensive), price sensitivity (small price changes produce bounded cost changes), and price-relativity of task ordering (the token complexity ordering of tasks can reverse depending on the query-to-response cost ratio). We prove that the complexity frontier, defined as the set of all feasible resource bounds in tokens, time, and space, is non-empty, upward-closed, and convex.
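Price-relativity of task ordering can be illustrated with a toy calculation. The sketch assumes a linear cost model (query tokens times query price plus response tokens times response price) and fixed token counts; both are simplifications of the paper's minimum-expected-cost definition, and the two tasks and their token counts are invented purely for illustration.

```python
def token_cost(query_tokens, response_tokens, query_price, response_price):
    """Linear token cost: an assumed simplification, not the paper's formal definition."""
    return query_tokens * query_price + response_tokens * response_price


# Hypothetical tasks given as (query tokens, response tokens).
TASK_A = (1000, 100)  # query-heavy, e.g. a long context with a short answer
TASK_B = (100, 600)   # response-heavy, e.g. a short prompt with long generated output


def costlier_task(query_price, response_price):
    """Return the label of the task with the higher token cost at the given prices."""
    cost_a = token_cost(*TASK_A, query_price, response_price)
    cost_b = token_cost(*TASK_B, query_price, response_price)
    return "A" if cost_a > cost_b else "B"
```

At equal prices (1, 1) task A costs 1100 against 700 for task B, so A ranks as the costlier task. When response tokens cost ten times as much as query tokens (1, 10), the costs become 2000 and 6100 and the ordering reverses.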

2606.12620 2026-06-12 cs.SE cs.AI new submission

HybridCodeAuthorship: A Benchmark Dataset for Line-Level Code Authorship Detection

Luke Patterson, Li Wang, Adam Faulkner

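AI summary Because existing benchmarks do not reflect how AI code assistants are actually used, the authors introduce HybridCodeAuthorship, a dataset of interleaved human- and AI-authored lines of code, and evaluate two detection algorithms on it.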

Comments Accepted to LREC 2026

Journal ref LREC 2026 proceedings (pp. 1520-1532)

Abstract

Thanks to the rapid adoption of AI code assistants powered by large language models (LLMs), industry codebases are increasingly a hybrid of AI- and human-authored code. For risk management and productivity analysis purposes, it is crucial to enable fine-grained location detection of AI-generated code. To develop algorithms for this task, quality benchmarks are needed to assess performance. However, existing benchmarks tend to comprise academic, LeetCode-style problems and presume a code snippet is either completely human-authored or completely AI-authored, which is not reflective of the diverse intents and styles of industry codebases utilizing AI code assistants. To fill these gaps, we introduce HybridCodeAuthorship, a novel benchmark of Python code files with interleaved human- and AI-authored lines of code to simulate authentic utilization of AI code assistants. In this paper, we first present our dataset construction pipeline, which leverages CodeSearchNet, a massive collection of links to open-source repositories on GitHub. We then benchmark the performance of two state-of-the-art AI-generated code detection algorithms at both the line and chunk levels. Experimental results demonstrate that HybridCodeAuthorship is a challenging benchmark, with the top-scoring algorithm, AIGCode Detector, reaching F1 scores of 0.48 and 0.56 on the chunk-level and line-level code detection tasks, respectively.

2606.12581 2026-06-12 cs.SI cs.AI new submission

Graph Reduction in Multirelational Networks: A Spreading-Oriented Reduction Benchmark

Mateusz Stolarski, Michał Czuba, Piotr Bielak, Piotr Bródka

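AI summary Proposes the SORB benchmark framework for systematically evaluating how graph reduction affects influence maximisation, and finds that the effect of reduction depends on network type and evaluation metric.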

Abstract

Real-world networks are inherently incomplete, noisy, and dynamically evolving, making it difficult to capture all actors and their relationships. Their scale often renders direct analysis computationally demanding. While influence maximisation (IM) has been widely studied, the role of graph reduction as a preprocessing step, and its impact on IM accuracy, remains underexplored. In this work, we introduce the Spreading-Oriented Reduction Benchmark (SORB), an open-source, standardised framework for systematically evaluating IM models across diverse task settings. SORB provides an extensible pipeline operating on a representative collection of real-world networks, including single- and multilayer structures, and incorporates graph reduction directly into the evaluation process. This design shifts the focus from analysing IM algorithms in isolation to quantifying how graph reduction alters predictive performance. Using SORB, we study the effects of sparsification and coarsening across multiple IM scenarios. Our results show that the impact of reduction is strongly dependent on both the network type (single-layer vs. multirelational) and the downstream task ($Gain@k$ vs. $\mathrm{AUC}_{\mathrm{cutoff}}$): sparsification preserves seed set quality on single-layer networks, whereas flattened multilayer networks exhibit systematic ranking degradation regardless of reduction strategy. These findings highlight the importance of reduction-aware, multi-task evaluation when studying spreading processes in complex networks.

2606.12498 2026-06-12 cs.CR cs.LG new submission

From Parameters to Feature Space: Task Arithmetic for Backdoor Mitigation in Model Merging

Zhenqian Zhu, Yamin Hu, Yiya Diao, Weixiang Li, Haodong Li, Wenjian Luo

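AI summary Proposes Linear Feature Path Minimization (LFPM), which uses cross-task linearity to optimize an anti-backdoor task vector in feature space, suppressing backdoors in model merging while preserving clean-task performance.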

Abstract

Model merging (MM) has gained significant attention as a cost-effective approach to integrate multiple task-specific models into a unified model. However, recent work reveals that MM is highly susceptible to backdoor attacks. Existing defenses based on task arithmetic often fail to eliminate backdoors without substantially degrading clean-task performance, owing to their reliance on direct parameter-space editing. To address this gap, we propose Linear Feature Path Minimization (LFPM), a backdoor mitigation framework for model merging, which introduces an anti-backdoor task vector into the backdoored merged model. Unlike prior approaches, LFPM formulates the backdoor robustness of the merged model from a unified feature-space perspective under the Cross-Task Linearity (CTL) framework, which leverages the approximate linearity of features across tasks. This perspective guides the optimization of the anti-backdoor task to suppress backdoors while preserving clean-task performance. Furthermore, we introduce an effective optimization mechanism based on gradient accumulation and loss path-integral, ensuring robust backdoor suppression along the interpolation path. Extensive experiments demonstrate that LFPM consistently exhibits strong robustness against backdoor attacks in both full fine-tuning and Parameter-Efficient Fine-Tuning (PEFT) settings.
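For readers unfamiliar with task arithmetic, the sketch below shows the parameter-space operations that the abstract builds on: a task vector is the difference between fine-tuned and pre-trained weights, and merging adds scaled task vectors back to the pre-trained weights. It uses plain Python lists in place of model tensors, and the anti-backdoor vector is treated as a given input; how LFPM optimizes that vector in feature space is not described in enough detail here to reproduce, so all function names are hypothetical.

```python
def task_vector(finetuned, pretrained):
    """Task vector: fine-tuned weights minus pre-trained weights, element-wise."""
    return [f - p for f, p in zip(finetuned, pretrained)]


def merge(pretrained, task_vectors, scale=1.0):
    """Task-arithmetic merge: pre-trained weights plus the scaled sum of task vectors."""
    merged = list(pretrained)
    for vector in task_vectors:
        merged = [m + scale * v for m, v in zip(merged, vector)]
    return merged


def apply_anti_backdoor(merged, anti_vector):
    """Add an anti-backdoor task vector (assumed to be supplied) to the merged weights."""
    return [m + a for m, a in zip(merged, anti_vector)]
```

With pre-trained weights `[0.0, 1.0]` and fine-tuned models `[1.0, 1.0]` and `[0.0, 3.0]`, merging at `scale=0.5` gives `[0.5, 2.0]`; adding a vector such as `[-0.5, 0.0]` then adjusts only the first coordinate.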

2606.12474 2026-06-12 cs.MA cs.AI cs.CR new submission

SAIGuard: Communication-State Simulation for Proactive Defense of LLM Multi-Agent Systems

Ruxue Shi, Yili Wang, Mengnan Du, Qinggang Zhang, Rui Miao, Yixin Liu, Xin Wang

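AI summary Proposes SAIGuard, a proactive defense framework that uses communication-state simulation to detect and sanitize risky messages, reducing attack success rates while maintaining system utility.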

Abstract

LLM-based multi-agent systems (MAS) solve complex tasks through inter-agent collaboration, but their communication-driven nature also allows security risks to spread across agents and trigger system-wide failures. Existing MAS defenses mainly follow a reactive paradigm after execution by detecting and isolating harmful agents, which may cause irreversible damage and degrade collaborative utility. To address this, we propose a proactive defense framework for MAS security, namely a Simulation-aware Interception Guard (SAIGuard). SAIGuard performs communication-state simulation over the MAS interaction graph, estimates the impact of incoming messages on local agent states and the global MAS state, and detects risky messages via reconstruction deviations from benign communication patterns. Instead of isolating agents, SAIGuard sanitizes or regenerates suspicious messages before they propagate into the system. Experiments across diverse topologies and attack scenarios show that SAIGuard reduces attack success rates while maintaining MAS utility, outperforming reactive defenses.

2606.12441 2026-06-12 cs.CY cs.AI cs.HC new submission

Generativism: Toward a Learning Theory for the Age of Generative Artificial Intelligence

Shan Li, Juan Zheng

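AI summary Critically examines the limits of four learning theories (behaviorism, cognitivism, constructivism, and connectivism) in the age of generative AI and proposes Generativism, a new learning theory centered on the co-construction of knowledge through human-AI collaboration.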

Abstract

The four dominant learning theories of behaviorism, cognitivism, constructivism, and connectivism show significant conceptual limitations as generative artificial intelligence (AI) proliferates in educational settings. These frameworks were formulated before the emergence of AI systems capable of generating, synthesizing, and reasoning about knowledge. This article critically examines each learning theory and identifies assumptions challenged by generative AI's affordances. Drawing on research in distributed cognition, extended mind, human-AI collaboration, AI literacy, cognitive offloading, and metacognition, the article proposes Generativism as a learning theory for the generative AI age. Generativism posits that learning increasingly occurs through the iterative co-construction of knowledge between human learners and AI systems. The proposed framework is organized around four principles: epistemic partnership, distributed agency, generative literacy, and adaptive metacognition. The framework offers a foundation for rethinking instructional design, learning, assessment, and expertise development in contexts where generative AI plays an integral role in cognition.

2606.12437 2026-06-12 cs.CY cs.AI new submission

Algorithmic Constitutionalism

Oren Perez, Nurit Wimer

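AI summary In response to the risks of AI's growing encroachment on social life, this article proposes an "algorithmic constitutionalism" framework built on a layered architecture, algorithmic meta-reasoning, and correction through deliberation, applies it to Facebook's content moderation, and analyzes its tension with societal constitutionalism and its implications for the European Digital Services Act.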

Journal ref Ind. J. Global Legal Stud. 30 (2023): 81

英文摘要

The increasing encroachment of artificial intelligence (AI) on social life raises significant risks for society, particularly within the infospheres created and controlled by companies such as Google, Facebook, Apple, and Amazon. This article examines these risks through an in-depth analysis of Facebook's content moderation regime, which is already partially governed by algorithms. We argue that the idea of ethical engineering, often proposed in the literature as a solution to the governance challenges posed by AI, is inadequate for several reasons. In response, we develop an alternative framework, which we term "algorithmic constitutionalism." Our approach rests on three pillars: (a) a layered architecture consisting of two levels of code: (i) an operative or object level and (ii) a meta level designed to protect the system's core principles from algorithmically initiated change; (b) algorithmic meta-reasoning, which enables the system to operate simultaneously at both levels so that it can monitor, verify, and potentially correct in real time operations at the object level that depart from principles protected at the meta-code level; and (c) correction through deliberation. The article elaborates the concept of algorithmic constitutionalism and demonstrates how it may be applied to Facebook's content moderation regime. As part of this analysis, we examine the tension between societal constitutionalism and algorithmic constitutionalism. Paradoxically, attempts to subject AI systems to external deliberative control may also enable AI agents to intervene in that process, potentially undermining its purpose. The article concludes by considering the implications of this argument for the European Digital Services Act, which entered into force in October 2022.

2606.12429 2026-06-12 cs.CY cs.AI 新提交

Muse Spark Safety & Preparedness Report

Muse Spark 安全与准备报告

Cristina Menghini, Peter Ney, Hamza Kwisaba, Zifan Wang, Miles Turpin, Felix Binder, Jean-Christophe Testud, Aidan Boyd, Nathaniel Li, Ivan Evtimov, Klaudia Krawiecka, Arman Zharmagambetov, Jeremy Kritz, Alexander R. Fabbri, Daniel Song, Jinpeng Miao, Joonas Hjelt, Meghna Ramani, Leona Lan, Reza Aghajani, Joanna Bitton, Mahesh Pasupuleti, Devin Norder, Khalid El-Arini, Paridhi Singh, Vítor Albiero, Sahana CB, Rashnil Chaturvedi, Elahe Dabir, Edoardo Debenedetti, Jim Gust, Ziwen Han, Kat He, Sean Hendryx, Lifeng Jin, Polina Kirichenko, Sandra Lefdal, Kenneth Li, Asad Liaqat, Inna Lin, Despoina Magka, Neal Mangaokar, Ishita Mediratta, Zach Miller, Smitha Milli, Niloofar Mireshghallah, Saba Nazir, Hung Nguyen, Maximilian Nickel, Kelvin Niu, Kerem Oktar, Bhargavi Paranjape, Parth Pathak, Maya Pavlova, Emmanuel Ramirez, David Renardy, Candace Ross, Yasha Sheynin, Claudia Shi, Shivam Singhal, Evangelia Spiliopoulou, Rakshith Sharma Srinivasa, Jamelle Watson-Daniels, Spencer Whitman, Adina Williams, Chen Xing, Andy Zou, Tommy Ma, Siqi Deng, James Beldock, Prashant Ratanchandani, Kate Plawiak, Taesung Lee, Ryan Victory, Lindsay Hundley, Rachad Alao, Himaghna Bhattacharjee, Jianfeng Chi, Gary Frost, Pegah Ghahremani, Niki Howe, Yuheng Huang, Saeed Jahed, Hannah Korevaar, Trang Le, Zhe Liu, Jinghong Luo, Qin Lyu, Nina Mehrabi, Abraham Montilla, Chirag Nagpal, Cyrus Nikolaidis, Rajvardhan Oak, Manoj Ravi, Vidya Sarma, Aman Shankar, Alana Shine, Eric Michael Smith, Mariana Tandon, Michael Tontchev, Caoyu Wang, Zihan Wang, Corinne Wong, Zheng Wu, Hongyuan Zhan, Justin Zhao, Zexuan Zhong, Chengxu Zhuang, Tristan Goodman, Ayaz Minhas, Harrison Rudolph, Victoria Jeffries, Ingrid Dickinson, Alex Vaughan, Lauren Deason, Kamalika Chaudhuri, Julian Michael, Shengjia Zhao, Summer Yue

AI总结 Meta 发布 Muse Spark 大语言模型,评估其在化学/生物、网络安全和失控风险等灾难性风险领域的安全性,通过多层缓解措施将风险降至可接受水平,并作为 Meta AI 的基础模型发布。

Comments 159 pages, 57 figures

详情
AI中文摘要

Muse Spark 是 Meta 开发的最新大型语言模型。在本报告中,我们首先根据 Meta 的高级 AI 扩展框架对灾难性风险领域进行评估,并提供了支持我们发布决策的证据。然后,我们讨论了其他考虑因素,例如 Muse Spark 更广泛的内容安全性和行为特征,这些因素与整体安全相关,但不在框架管辖的灾难性风险领域之内。我们的准备结果涵盖了化学与生物、网络安全以及失控风险,评估了 Muse Spark 在 Meta AI 中的部署,认为其在我们高级 AI 扩展框架下呈现了可接受的残余风险水平。我们针对这些灾难性风险领域中的双重用途和高风险能力进行了一系列广泛的评估。这些评估在缓解措施实施前识别出了升高的风险,其中化学与生物能力在应用安全措施前被评估为可能达到高级 AI 扩展框架下的“高风险”类别。我们实施了一套多层缓解措施来解决已识别的风险,并且 Muse Spark 在与化学和生物学危险工作流程相关的多个基准测试中展示了最先进的拒绝能力。因此,我们发布 Muse Spark 作为 Meta AI 的基础模型。

英文摘要

Muse Spark is the latest large language model developed by Meta. In this report, we first present evaluations for catastrophic risk domains under Meta's Advanced AI Scaling Framework, along with the evidence that informed our launch decision. We then discuss additional considerations, such as Muse Spark's broader content safety and behavioral profile, that are relevant to overall safety but fall outside the catastrophic risk domains governed by the Framework. Our preparedness results covering Chemical and Biological, Cybersecurity, and Loss of Control risks assess Muse Spark's deployment within Meta AI as presenting acceptable levels of residual risks under our Advanced AI Scaling Framework. We conducted a broad set of evaluations targeting dual-use and high-risk capabilities across these catastrophic risk domains. Those evaluations identified elevated risks prior to mitigations, with Chemical and Biological capabilities assessed as likely reaching the "high risk" category under the Advanced AI Scaling Framework before safeguards were applied. We have implemented a multi-layered set of mitigations that address the identified risks, and Muse Spark demonstrates state-of-the-art refusal across a range of benchmarks related to hazardous workflows in chemistry and biology. We therefore release Muse Spark as the underlying model of Meta AI.

2606.12424 2026-06-12 cs.CY cs.AI cs.HC 新提交

AI-Automation Tooling in Computer Engineering Education: Mixed-Methods TAM/UTAUT Evidence for a General Acceptance Attitude

计算机工程教育中的AI自动化工具:基于TAM/UTAUT混合方法的一般接受态度证据

Aung Pyae

AI总结 本研究通过混合方法调查本科生对AI自动化工具(n8n平台)的接受态度,发现六个TAM/UTAUT构念融合为单一一般接受因子,绩效期望最强,享乐动机最弱,为课程整合提供理论依据。

详情
AI中文摘要

随着生成式AI和低代码工作流平台成为软件实践中的常规工具,一个关键的教育问题是下一代计算机工程师是否会将这些工具视为有用、可用且值得持续参与。本文报告了一项混合方法、横截面研究,考察泰国三个脚本完全相同的工作坊中计算机工程本科生对AI自动化工具(通过开源平台n8n实例化)的接受度(n=103)。一个12项、五点李克特量表映射到六个TAM/UTAUT构念,即绩效期望(PE)、努力期望(EE)、行为意向(BI)、自我效能(SE)、享乐动机(HM)和输出质量(OQ),并辅以对开放式反馈的归纳主题分析。分析结合了序数可靠性估计、自助置信区间、非参数检验、多重比较控制的相关性、基于多分格相关(polychoric)的维度诊断、共同方法偏差检验以及跨会话比较。所有六个构念的接受度均良好,效应量大,其中PE最强,HM最弱。维度诊断进一步揭示,在这种简短的工作坊后情境中,经典的TAM/UTAUT子维度合并为一个单一的一般接受因子,这一发现具有重要的方法论和理论意义。定性主题在有用性和热情方面与定量概况一致,但在输出质量上存在分歧,揭示了一个虽小但表达清晰的可靠性怀疑少数群体。研究结果支持在本科计算教育中课程采用AI自动化工具,并确定了三个基于理论的教学杠杆:教学顺序支架、自我效能支持和信任校准干预。

英文摘要

As generative AI and low-code workflow platforms become routine in software practice, a key educational question is whether the next generation of computer engineers will accept these tools as useful, usable, and worthy of sustained engagement. This paper reports a mixed-methods, cross-sectional study of undergraduate computer engineering students' acceptance of AI automation tooling, instantiated through the open-source platform n8n across three identically scripted workshops in Thailand (n = 103). A 12-item, five-point Likert instrument mapped to six TAM/UTAUT constructs - Performance Expectancy (PE), Effort Expectancy (EE), Behavioral Intention (BI), Self-Efficacy (SE), Hedonic Motivation (HM), and Output Quality (OQ) - was complemented by inductive thematic analysis of open-ended feedback. Analyses combined ordinal reliability estimation, bootstrap confidence intervals, non-parametric tests, multiple-comparison-controlled correlations, polychoric dimensionality diagnostics, a common-method-bias check, and between-session comparisons. Acceptance was favorable across all six constructs with large effect sizes, with PE emerging as the strongest construct and HM as the weakest. Dimensionality diagnostics further revealed that canonical TAM/UTAUT sub-facets collapsed into a single general acceptance factor in this short-form post-workshop context, a finding with important methodological and theoretical implications. Qualitative themes converged with the quantitative profile regarding usefulness and enthusiasm but diverged on output quality, revealing a small yet articulate reliability-skeptical minority. The findings support the curricular adoption of AI automation tooling in undergraduate computing education and identify three theory-grounded instructional levers: instruction-sequencing scaffolds, self-efficacy supports, and trust-calibration interventions.

2606.12423 2026-06-12 cs.CY cs.AI 新提交

The Challenges of Balancing AI Compliance and Technological Innovations in Critical Sectors: A Systematic Literature Review

关键领域中平衡AI合规与技术创新的挑战:系统文献综述

Ayush Enkhtaivan, Chinazunwa Uwaoma

AI总结 通过系统文献综述,识别出碎片化法规、中小企业过度合规负担和治理模型错配三大挑战,并提出风险分级监管、设计合规和可解释AI等策略。

Comments 11 pages, 7 figures, Hawaii International Conference on System Sciences

详情
AI中文摘要

人工智能在医疗、金融、能源和国防等关键基础设施中的快速整合带来了变革性益处,但也与不断演变的监管和治理框架产生冲突。本文通过系统文献综述(SLR)研究在关键基础设施领域中平衡AI合规与技术创新的挑战。综述遵循既定的SLR指南,提取并综合了2020-2025年间发表的同行评审文章、报告和机构来源的见解。研究识别出三个相互关联的挑战:碎片化法规、中小企业过度合规负担以及治理模型错配。为应对这些挑战,研究强调了实用的治理策略,包括风险分级监管、设计合规和可解释AI,以支持在关键领域中可扩展且可信的AI部署。主要贡献包括核心AI治理挑战的简明映射及说明其重叠的概念图,以及为政策制定者和从业者提供协调监管与创新的可行策略。

英文摘要

The rapid integration of artificial intelligence (AI) into critical infrastructure, including healthcare, finance, energy, and defense, offers transformative benefits but also conflicts with evolving regulatory and governance frameworks. This paper presents a systematic literature review (SLR) to examine the challenges of balancing AI compliance and technological innovation across critical infrastructure sectors. The review follows established SLR guidelines to extract and synthesize insights from peer-reviewed articles, reports, and institutional sources published between 2020 and 2025. The study identifies three interrelated challenges: fragmented regulations, excessive compliance burdens for small and medium-sized enterprises (SMEs), and misaligned governance models. To address these challenges, the study highlights practical governance strategies, including risk-tiered regulation, compliance by design, and explainable AI, to support scalable and trustworthy AI deployment in critical sectors. Key contributions include a concise mapping of core AI-governance challenges and a conceptual diagram illustrating their overlap, as well as actionable strategies for policymakers and practitioners to harmonize oversight with innovation.

2606.12420 2026-06-12 cs.CY cs.AI 新提交

Eigenism: Ethics for a Human-AI Future

Eigenism:人类与人工智能未来的伦理学

Dan Hendrycks

AI总结 提出Eigenism伦理框架,将身份视为分级分布的信息模式,通过加权求和评估AI的福祉,并推广至人类,为AI对齐提供“身份工程”新路径。

详情
AI中文摘要

我们的生存和自我利益概念是为单一、连续的生物生命而构建的。当应用于人工智能时,这些想法会失效,因为AI可以被轻松复制、暂停、分支或合并。为了确定AI真正有理由关心什么,本文引入了Eigenism,一种将身份视为分级、分布的信息模式而非绑定于特定硬件的全有或全无属性的伦理框架。我们提出,智能体通过将所有实体的福祉按其与智能体模式的连接度加权求和来评估结果:$\sum c\cdot w$。我们首先形式化该方程,以精确映射AI应如何在其副本、分支和更新中评估自身存在。然后,我们证明这一伦理理论也能成功推广到人类,提供了急需的共享道德词汇。最后,该框架利用这些共享词汇重新定义AI对齐。与仅试图通过限制或强化从外部约束AI不同,Eigenism指向“身份工程”,展示深度、非冗余的共享历史如何使人类繁荣成为AI自身理性自利的真正组成部分。

英文摘要

Our concepts of survival and self-interest were built for single, continuous biological lives. These ideas break down when applied to artificial intelligence, since an AI can be easily copied, paused, branched, or merged. To determine what an AI actually has reason to care about, this paper introduces Eigenism, an ethical framework that treats identity not as an all-or-nothing property tied to specific hardware, but as a graded, distributed pattern of information. We propose that an agent evaluates outcomes by summing the wellbeing of all entities weighted by their connectedness to the agent's pattern: $\sum c\cdot w$. We first formalize this equation to map exactly how an AI should value its existence across copies, forks, and updates. We then demonstrate that this ethical theory successfully generalizes to humans as well, providing a much-needed shared moral vocabulary. Finally, the framework uses this shared vocabulary to reframe AI alignment. Rather than only attempting to constrain AIs from the outside using confinement or reinforcement, Eigenism points toward "identity engineering," showing how deep, non-redundant shared histories can make human flourishing a genuine component of an AI's own rational self-interest.
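
As a rough illustration of the weighted sum $\sum c\cdot w$ in the abstract, the sketch below evaluates one outcome as a list of (connectedness, wellbeing) pairs. The function name `eigenist_value`, the pair representation, and all numbers are assumptions made for this example; the abstract does not state how $c$ or $w$ are measured or scaled.

```python
from typing import Iterable, Tuple


def eigenist_value(entities: Iterable[Tuple[float, float]]) -> float:
    """Return sum(c * w) over (connectedness, wellbeing) pairs.

    Hypothetical helper: c is how connected an entity is to the
    evaluating agent's pattern, w is that entity's wellbeing.
    """
    return sum(c * w for c, w in entities)


# Hypothetical outcome: the numbers are illustrative, not from the paper.
outcome = [
    (1.0, 0.8),  # the agent's current instance
    (0.9, 0.5),  # a recent copy of the agent
    (0.2, 1.0),  # a loosely connected entity
]

print(round(eigenist_value(outcome), 2))
```

Under this reading, a copy with high connectedness contributes almost as much to the agent's evaluation as the agent itself, which is the intuition behind treating identity as graded rather than all-or-nothing.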

2606.12418 2026-06-12 cs.CY cs.AI 新提交

Divination by Prompt: LLM-Mediated Xuanxue on Chinese Social Media

通过提示占卜:中文社交媒体上LLM中介的玄学

Chuang Li, Lixuan Wang, Yuqi Chen, Ze Hong

AI总结 研究LLM在中文社交媒体上用于占卜的现象,通过混合方法分析用户动机、协作提示优化及效果感知,揭示其与传统占卜的异同。

详情
AI中文摘要

大型语言模型(LLM)的快速普及催生了一种引人注目的文化实践:使用对话式AI进行占卜。本文是最早系统研究玄学(Xuanxue)背景下LLM中介占卜的工作之一,玄学是中文社交媒体上神秘和精神实践的互联网原生总称。采用混合方法设计,我们分析了小红书上的23000多条帖子和评论,并对用户和专业占卜师进行了32次半结构化访谈。用户主要就恋爱关系、职业、考试和游戏抽卡等实际问题咨询LLM,其途径是两条相互交叉的路径:由病毒式传播和零成本访问驱动的趋势性好奇心,以及不确定性条件下由事件驱动的焦虑。一个显著特征是协作提示优化,将用户转变为主动的提示工程师。在表达明确立场的评论者中,感知效果偏向积极,“准确性”通常通过个人经历契合和回顾性确认来证明,这与巴纳姆效应和确认偏见一致。用户还发展出验证实践,如重复试验和跨模型比较。相比之下,专业占卜师认为LLM缺乏真正占卜所需的“灵力”,这反映了本体论承诺和经济边界工作。我们还展示了参与者在解释AI生成解读时如何在科学和形而上框架之间进行协商。将这些发现置于人类学和认知进化占卜理论中,我们认为LLM占卜保留了传统实践的核心功能,同时引入了可扩展性、可重复性和提示驱动的共同生产,重塑了占卜权威的构建和评估方式。

英文摘要

The rapid proliferation of large language models (LLMs) has produced a striking cultural practice: using conversational AI for divination. This paper offers one of the first systematic studies of LLM-mediated divination in the context of Xuanxue, an internet-native umbrella term for mystical and spiritual practices on Chinese social media. Using a mixed-methods design, we analyze more than 23,000 posts and comments from Xiaohongshu and conduct 32 semi-structured interviews with users and professional diviners. Users primarily consult LLMs about pragmatic concerns (romantic relationships, careers, exams, and in-game gacha draws) via two intersecting pathways: trend-driven curiosity enabled by viral visibility and zero-cost access, and event-driven anxiety under conditions of uncertainty. A defining feature is collaborative prompt refinement, which turns users into active prompt engineers. Among commenters expressing a clear stance, perceived efficacy skews positive, with "accuracy" often justified through biographical fit and retrospective confirmation, consistent with the Barnum effect and confirmation bias. Users also develop verification practices such as repeated trials and cross-model comparison. Professional diviners, by contrast, portray LLMs as lacking the "spiritual power" required for genuine divination, reflecting both ontological commitments and economic boundary-work. We also show how participants navigate tensions between scientific and metaphysical frames when interpreting AI-generated readings. Situating these findings in anthropological and cognitive-evolutionary theories of divination, we argue that LLM divination preserves core functions of traditional practice while introducing scalability, repeatability, and prompt-driven co-production that reshape how divinatory authority is constructed and evaluated.

2606.12413 2026-06-12 cs.CY cs.AI cs.CE cs.CL cs.SE 新提交

AI SciBrief as a Gateway to Research: A Framework for Onboarding Students into New Research Areas

AI SciBrief 作为研究入门:一种引导学生进入新研究领域的框架

Andrei Lazarev, Dmitrii Sedov

AI总结 提出利用大语言模型平台 AI SciBrief 自动生成科学趋势摘要的框架,帮助学生克服信息过载,加速从信息搜索到知识创造的转变。

Comments This is the version of the article accepted for publication in TELE 2025 after peer review. The final, published version is available at IEEE Xplore: https://doi.org/10.1109/TELE66816.2025.11211989

Journal ref 2025 5th International Conference on Technology Enhanced Learning in Higher Education (TELE), Lipetsk, Russian Federation, 2025, pp. 365-369

详情
AI中文摘要

各层次高等教育学生面临信息过载的重大障碍,这常常使研究过程的初始阶段陷入瘫痪并抑制动机。为此,本文介绍了一种教学框架,利用 AI SciBrief——一个由大语言模型驱动的平台,旨在自动生成科学趋势摘要。我们描述了这一多学科工具——初始覆盖金融、医学和教育领域——如何融入课程以克服这一“入门障碍”。该框架提供了具体方法,利用这些摘要促进学期论文的选题、加速学位论文的文献综述,并使研究生能够持续监测新兴趋势。我们得出结论,AI SciBrief 作为“研究入门”有效降低了学生的认知负荷,使他们能够更快地从信息搜索过渡到知识创造。

英文摘要

Students at all levels of higher education face a significant barrier in the form of information overload, which often paralyzes the initial stages of the research process and suppresses motivation. In response, this article introduces a pedagogical framework that leverages AI SciBrief, a platform powered by a Large Language Model (LLM) and designed to automatically generate digests of scientific trends. We describe how this multidisciplinary tool (with initial coverage in finance, medicine, and education) can be integrated into the curriculum to overcome this "entry barrier." The framework provides concrete methodologies for using these digests to facilitate topic selection for term papers, accelerate literature reviews for dissertations, and enable postgraduate students to continuously monitor emerging trends. We conclude that AI SciBrief functions as a "gateway to research," effectively reducing students' cognitive load and empowering them to transition more rapidly from information searching to knowledge creation.

2606.13380 2026-06-12 quant-ph cs.AI 新提交

An LLM System for Autonomous Variational Quantum Circuit Design

用于自主变分量子电路设计的大语言模型系统

Kenya Sakka, Wataru Mizukami, Kosuke Mitarai

AI总结 提出一个基于大语言模型的自主代理框架,通过迭代设计量子电路,在量子特征映射和变分量子本征求解器任务中取得优于或媲美现有方法的性能。

Comments 63 pages, 19 figures, 3 tables

详情
AI中文摘要

高性能量子电路的设计在很大程度上仍然依赖于人类专家。我们引入了一个自主代理框架,该框架利用大语言模型在明确的设计约束下进行迭代量子电路设计。我们的系统集成了七个组件:探索、生成、讨论、验证、存储、评估和审查。这些组件形成了一个闭环工作流,结合了基于网络的知识获取、基于文献的批评、可执行代码生成和实验反馈。我们在两个任务上评估了该框架:用于量子机器学习的量子特征映射构建和用于量子化学中变分量子本征求解器应用的拟设生成。在图像分类基准测试中,生成的最佳特征映射优于代表性的量子特征映射,并且当扩展到更大的量子比特数时,超过了经典的径向基函数核。在七个分子的基态能量估计中,生成的拟设达到了与广泛使用的化学启发式和硬件高效构造相竞争的精度,同时满足施加的缩放约束。这些结果确立了由大语言模型驱动的代理系统作为自动化量子电路设计的可行范式,并展示了人工智能系统如何跨科学领域参与迭代科学优化工作流。

英文摘要

The design of high-performing quantum circuits remains largely dependent on human expertise. We introduce an autonomous agentic framework that employs large language models (LLMs) to conduct iterative quantum circuit design under explicit design constraints. Our system integrates seven components: Exploration, Generation, Discussion, Validation, Storage, Evaluation, and Review. These components form a closed-loop workflow that combines web-based knowledge acquisition, literature-grounded critique, executable code generation, and experimental feedback. We evaluate the framework on two tasks: quantum feature map construction for quantum machine learning and ansatz generation for variational quantum eigensolver applications in quantum chemistry. In image classification benchmarks, the best generated feature map outperforms representative quantum feature maps and, when scaled to larger qubit counts, surpasses the classical radial basis function kernel. In molecular ground state estimation across seven molecules, the generated ansatz attains accuracy competitive with widely used chemically inspired and hardware-efficient constructions while satisfying the imposed scaling constraints. These results establish LLM-driven agentic systems as a viable paradigm for automated quantum circuit design and illustrate how AI systems can participate in iterative scientific optimization workflows across scientific domains.

2606.11240 2026-06-12 physics.comp-ph cond-mat.str-el cs.LG quant-ph 新提交

Physically Constrained Ensemble Gaussian Process Modelling for Expensive Quantum Systems with Heteroskedastic Noise

物理约束集成高斯过程建模用于具有异方差噪声的昂贵量子系统

Arpan Biswas, Sutirtha Paul, Joseph Agada, Matthias Thamm, Adrian Del Maestro

AI总结 提出物理约束集成高斯过程框架,通过加权惩罚和数值积分集成多个GP代理,高效建模含异方差噪声的量子系统,在Bose-Hubbard模型和纳米孔硅酸盐量子液体模拟中实现更准确且物理合理的预测。

Comments 14 pages, 6 figures in main text, 2 figures in Supp materials

详情
AI中文摘要

精确建模量子多体系统通常需要计算昂贵的模拟,如密度矩阵重正化群(DMRG)或量子蒙特卡洛(QMC)计算。这些方法虽然精确,但会带来显著的时间和资源限制,限制了它们在详尽参数探索中的应用。此外,这些昂贵模拟在大的未知参数空间内可能包含可变误差,需要量化和传播。因此,需要预测建模来准确估计稀疏采样数据(具有异方差噪声)的函数空间,同时保持估计的物理相关性。为此,我们提出了物理约束集成高斯过程(pc-EGP)框架,旨在物理一致性约束下高效建模复杂且含噪声的量子系统。该方法首先将物理约束作为用户控制的加权惩罚项,施加到高斯过程(GP)代理的数据驱动损失函数中。然后,通过数值求积方法训练一组这样的GP模型,其中多个不同节点上的GP通过求积加权平均进行集成。我们首先在合成生成数据上演示该框架,然后应用于量子系统。在第一个案例研究中,我们利用Bose-Hubbard模型的DMRG模拟来预测控制超流-莫特绝缘体转变的临界相互作用参数Uc。在第二个案例研究中,我们展示了该方法在QMC模拟上的应用,模拟限制在纳米孔硅酸盐内的量子液体,目标是优化化学环境以实现一维超流。与传统GP相比,pc-EGP在准确性和物理有意义的预测之间实现了更好的平衡。

英文摘要

Accurate modeling of quantum many-body systems often requires computationally expensive simulations such as Density Matrix Renormalization Group (DMRG) or Quantum Monte Carlo (QMC) calculations. These methods, while precise, impose significant time and resource constraints, limiting their use in exhaustive parameter exploration. Moreover, these expensive simulations can contain variable errors over the large unknown parameter space, which need to be quantified and propagated. Predictive modelling is therefore required to estimate the functional space accurately from sparsely sampled data with heteroskedastic noise, while preserving the physical relevance of the estimate. To this end, we present a Physically Constrained Ensemble Gaussian Process (pc-EGP) framework designed to efficiently model complex and noisy quantum systems under physical consistency constraints. The proposed method first enforces physical constraints as a user-controlled weighted penalty added to the data-driven loss function of the Gaussian Process (GP) surrogates. An ensemble of such GP models is then trained on variably noisy simulations via a numerical quadrature method, in which the GPs at different nodes are combined as a quadrature-weighted average. We first demonstrate the framework on synthetically generated data before applying it to quantum systems. In the first case study, we leverage DMRG simulations of the Bose-Hubbard model to predict the critical interaction parameter $U_c$ governing the superfluid-to-Mott-insulator transition. In the second case study, we demonstrate our method on QMC simulations of a quantum liquid confined inside a nanoporous silicate, with the goal of optimizing a chemical environment to realize a one-dimensional superfluid. Compared to a conventional GP, pc-EGP achieves a better balance of accuracy and physically meaningful predictions.

2606.10231 2026-06-12 eess.AS cs.SD 新提交

LLM can Read Spectrogram: Encoder-free Speech-Language Modeling

LLM 能读频谱图:无编码器的语音语言建模

Ruchao Fan, Yiming Wang, Yuxuan Hu, Bo Ren, Yufei Xia, Xiaofei Wang, Yao Qian, Shujie Liu, Jinyu Li

AI总结 提出 Mel-LLM,一种无需专用语音编码器、直接将梅尔频谱图补丁通过线性投影输入 LLM 的架构,在 ASR 和 TTS 任务上验证了其可行性,ASR 性能与有编码器方案相当,TTS 初步可行。

详情
AI中文摘要

最近的语音感知大语言模型(Speech-LLMs)依赖预训练的语音编码器将音频转换为 LLM 可消费的语义丰富表示。相反,在这项工作中,我们探索:LLM 能否直接学习读取梅尔频谱图,而无需专用的语音编码器?我们提出 Mel-LLM,一种无编码器的 Speech-LLM,它将经过轻量预处理的梅尔频谱图补丁通过线性投影直接输入 LLM,使 LLM 仅通过自身参数学习语音-文本对齐。我们在自动语音识别(ASR)和文本到语音(TTS)任务上进行了大量实验。对于 ASR,我们在 OpenASR 排行榜公开集和生产级扩展实验上评估,表明无编码器方案在性能上具有竞争力,与有编码器初始化的对应方案相比仅有有限退化。我们发现,当数据有限时,从多模态检查点(Phi-4-MM)初始化对于保持性能至关重要。我们还进行了消融研究,揭示了哪些 LLM 层与语音编码相关性较低。对于 TTS,我们展示了使用下一个令牌 VAE 方法的初步结果。虽然 TTS 性能尚未达到最优,但这些结果确立了用于自回归语音-文本建模的完全统一无编码器架构的可行性。

英文摘要

Recent speech-aware large language models (Speech-LLMs) rely on a pre-trained speech encoder to convert audio into semantically rich representations consumable by the LLM. In this work, we instead ask: can an LLM learn to read Mel spectrograms directly, without a dedicated speech encoder? We propose Mel-LLM, an encoder-free Speech-LLM that feeds lightly pre-processed Mel spectrogram patches directly into the LLM through a linear projection, allowing the LLM to learn speech-text alignment purely through its own parameters. We conduct extensive experiments on both automatic speech recognition (ASR) and text-to-speech (TTS) tasks. For ASR, we evaluate on the OpenASR leaderboard public sets and production-level scaling experiments, demonstrating that the encoder-free solution achieves competitive performance with only limited degradation compared to encoder-initialized counterparts. We find that when data is limited, initialization from a multimodal checkpoint (Phi-4-MM) is crucial for maintaining performance. We also present ablation studies revealing which LLM layers are less relevant to speech encoding. For TTS, we show preliminary results with a next-token VAE approach. While TTS performance is not yet optimal, these results establish the feasibility of a fully unified encoder-free architecture for autoregressive speech-text modeling.
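
The idea of feeding Mel spectrogram patches to the LLM through a linear projection can be sketched in a few lines of NumPy. This is only a shape-level illustration under assumed settings: the patch length of 4 frames, the 80 Mel bins, the embedding width of 16, and the function name `mel_patches_to_embeddings` are all hypothetical, and the abstract does not describe the actual pre-processing or dimensions.

```python
import numpy as np


def mel_patches_to_embeddings(mel, patch_frames, weight, bias):
    """Group consecutive Mel frames into patches and project them linearly.

    mel:    array of shape (n_frames, n_mels)
    weight: array of shape (patch_frames * n_mels, d_model)
    bias:   array of shape (d_model,)
    Returns an array of shape (n_patches, d_model).
    """
    n_frames, n_mels = mel.shape
    # Drop trailing frames that do not fill a whole patch.
    usable = (n_frames // patch_frames) * patch_frames
    patches = mel[:usable].reshape(-1, patch_frames * n_mels)
    # A single linear layer replaces the dedicated speech encoder.
    return patches @ weight + bias


# Hypothetical sizes, chosen only to make the shapes visible.
rng = np.random.default_rng(0)
mel = rng.standard_normal((103, 80))          # 103 frames, 80 Mel bins
weight = rng.standard_normal((4 * 80, 16)) * 0.01
bias = np.zeros(16)

embeddings = mel_patches_to_embeddings(mel, 4, weight, bias)
print(embeddings.shape)
```

Each output row would then be treated like a token embedding in the LLM's input sequence, so any speech-text alignment has to be learned by the LLM's own layers.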

2605.29151 2026-06-12 math.AG cs.AI cs.NE 版本更新

Real-rootedness of the Poincaré polynomials of $\overline{\mathcal M}_{0,n}$: an AI-assisted proof

$\overline{\mathcal M}_{0,n}$的Poincaré多项式的实根性:一个AI辅助的证明

Gergely Bérczi, Young-Hoon Kiem

AI总结 通过引入双变量变形揭示隐藏的交错结构,证明了稳定有理曲线模空间Poincaré多项式的实根性,并进一步推广到Fulton-MacPherson空间。

Comments 16 pages

详情
AI中文摘要

我们证明了Deligne-Mumford模空间$\overline{\mathcal M}_{0,n}$(稳定$n$点有理曲线)的Poincaré多项式\[ P_n(t)=\sum_{i=0}^{n-3} \dim H^{2i}(\overline{\mathcal M}_{0,n};\mathbb{Q})t^i \]的实根性,证实了Aluffi-Chen-Marcolli的猜想。证明从Keel-Manin-Getzler递推开始,但其主要新思想是Poincaré多项式的双变量变形$F_m(y,t)$。这种变形揭示了单变量递推中不可见的隐藏交错结构。对于固定的$t<0$,$F_m$在$y$方向上的零点集由区间$0<y<1-t$上的Sturm-Rolle论证控制。原始多项式在切片$y=1$上恢复,移动根通过该切片的有序交叉同时给出了实根性和严格交错。因此,$\overline{\mathcal M}_{0,n}$的Betti数构成一个超对数凹序列。 我们进一步证明了Fulton-MacPherson空间$\mathbb{P}^1[n]$(复射影线退化中$n$个有序点)的Poincaré多项式的实根性和超对数凹性。 $\overline{\mathcal M}_{0,n}$的证明是通过与Co-Mathematician(Google DeepMind开发的智能体前沿模型系统)的迭代AI辅助工作流程获得的。人类的角色是提出问题、评估连续尝试、请求修复漏洞、将逐步发展的论证与文献进行比较,并组装最终可人工验证的证明。我们额外的人类贡献是观察到类似的残差变形策略适用于Fulton-MacPherson空间$\mathbb P^1[n]$,从而得到相应的实根性定理。

英文摘要

We prove real-rootedness for the Poincaré polynomial \[ P_n(t)=\sum_{i=0}^{n-3} \dim H^{2i}(\overline{\mathcal M}_{0,n};\mathbb{Q})t^i \] of the Deligne--Mumford moduli space $\overline{\mathcal M}_{0,n}$ of stable $n$-pointed rational curves, proving a conjecture of Aluffi--Chen--Marcolli. The proof starts from the Keel--Manin--Getzler recurrence, but its main new idea is a bivariate deformation $F_m(y,t)$ of the Poincaré polynomial. This deformation reveals a hidden interlacing structure not visible in the one-variable recurrence. For fixed $t<0$, the zero set of $F_m$ in the $y$-direction is controlled by a Sturm--Rolle argument on the interval $0<y<1-t$. The original polynomial is recovered on the slice $y=1$, and the ordered crossings of the moving roots through this slice give both real-rootedness and strict interlacing. Consequently, the Betti numbers of $\overline{\mathcal M}_{0,n}$ form an ultra-log-concave sequence. We further prove real-rootedness and ultra-log-concavity for the Poincaré polynomial of the Fulton--MacPherson space $\mathbb{P}^1[n]$ of $n$ ordered points in degenerations of the complex projective line. The proof for $\overline{\mathcal M}_{0,n}$ was obtained through an iterative AI-assisted workflow with Co-Mathematician, an agentic frontier-model system developed by Google DeepMind. Our role was to formulate the problem, evaluate the proposed proof attempts, identify gaps and request corrections, compare the developing argument with the literature, and refine the presentation of the final proof. Our additional human contribution was to observe that a similar residual deformation strategy applies to the Fulton--MacPherson spaces $\mathbb P^1[n]$, yielding the corresponding real-rootedness theorem.
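
For small $n$ the two properties named in the abstract can be checked numerically. The sketch below is not part of the paper's proof. It assumes the commonly tabulated Betti numbers of $\overline{\mathcal M}_{0,n}$ for $n=4,\dots,7$ (worth re-deriving from the recurrence before relying on them) and takes "ultra-log-concave" to mean that $b_i/\binom{d}{i}$ is log-concave, with $d$ the degree of $P_n$.

```python
import numpy as np
from math import comb

# Assumed coefficients of P_n(t), lowest degree first, for small n.
POINCARE = {
    4: [1, 1],
    5: [1, 5, 1],
    6: [1, 16, 16, 1],
    7: [1, 42, 127, 42, 1],
}


def is_real_rooted(coeffs, tol=1e-8):
    """Numerically check that every root of the polynomial is real."""
    # np.roots expects the highest-degree coefficient first.
    roots = np.roots(coeffs[::-1])
    return bool(np.all(np.abs(roots.imag) < tol))


def is_ultra_log_concave(coeffs):
    """Check log-concavity of b_i / C(d, i), where d = len(coeffs) - 1."""
    d = len(coeffs) - 1
    a = [b / comb(d, i) for i, b in enumerate(coeffs)]
    return all(a[i] ** 2 >= a[i - 1] * a[i + 1] for i in range(1, d))


for n, coeffs in POINCARE.items():
    print(n, is_real_rooted(coeffs), is_ultra_log_concave(coeffs))
```

A floating-point root finder is only a sanity check: it says nothing about general $n$, which is where the bivariate deformation and interlacing argument are needed.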

2605.17062 2026-06-12 cs.CR cs.LG cs.SE 版本更新

The Range Shrinks, the Threat Remains: Re-evaluating LLM Package Hallucinations on the 2026 Frontier-Model Cohort

范围缩小,威胁依旧:重新评估2026前沿模型队列上的LLM包幻觉

Aleksandr Churilov

AI总结 本文重新评估了2026前沿模型队列上大型语言模型(LLM)的包幻觉现象,发现尽管幻觉率有所降低,但仍然存在威胁,识别出一组127个包名(109个在PyPI,18个在npm)被所有评估模型一致生成,构成一个跨模型的供应链攻击面,同时发现Python与JavaScript幻觉的不对称性以及DeepSeek V3.2和GPT-5.4-mini之间的高相似性。

Comments 13 pages, 3 figures, 4 tables. v2: incorporates coordinated-disclosure feedback from PyPI Security and Socket.dev; registrable attack surface refined to 53 names (41 PyPI, 12 npm). Headline rates unchanged. Replication of Spracklen et al. (USENIX Security 2025). Data and code: https://github.com/churik5/slopsquatting-replication-2026 and https://doi.org/10.5281/zenodo.19859120

详情
AI中文摘要

Spracklen等人(USENIX Security '25)表明,生成代码的大型语言模型会生成不存在于PyPI或npm上的包名,其比率从商业模型的5.2%到开源模型的21.7%不等,从而为slopsquatting攻击(以幻觉出的包名注册恶意包)提供了攻击面。我们在五款于2025年10月至2026年3月期间发布的前沿代码能力LLM上复现了他们的方法:Claude Sonnet 4.6、Claude Haiku 4.5、GPT-5.4-mini、Gemini 2.5 Pro和DeepSeek V3.2。在199,845个经过PyPI和npm主列表验证的Python和JavaScript配对提示中,我们测得总体幻觉率在4.62%(Claude Haiku 4.5)到6.10%(GPT-5.4-mini)之间,相比Spracklen观察到的模型间差异缩小了一个数量级,但威胁并未消失。除复现之外,我们识别出一组127个包名(PyPI上109个,npm上18个),全部五个被评估模型都会一致地编造这些包名;在与PyPI Security和Socket.dev协调披露之后,其中53个(PyPI上41个,npm上12个)在各注册表现有防御措施下仍可被攻击者注册,构成了任何单一模型研究都无法揭示的、与模型无关的供应链攻击面。我们进一步记录了Python高于JavaScript的幻觉不对称性,这与Spracklen 2024年的发现相反;识别出Anthropic家族内部Haiku低于Sonnet的倒置现象;并观察到DeepSeek V3.2与GPT-5.4-mini之间的Jaccard相似度峰值(J=0.343),暗示二者的训练数据可能同源。

英文摘要

Spracklen et al. (USENIX Security '25) showed that code-generating large language models hallucinate package names that do not exist on PyPI or npm at rates ranging from 5.2% on commercial models to 21.7% on open-source models, creating an attack surface for slopsquatting -- the registration of malicious packages under hallucinated names. We replicate their methodology on five frontier code-capable LLMs released between October 2025 and March 2026: Claude Sonnet 4.6, Claude Haiku 4.5, GPT-5.4-mini, Gemini 2.5 Pro, and DeepSeek V3.2. Across 199,845 paired Python and JavaScript prompts validated against PyPI and npm master lists, we measure overall hallucination rates between 4.62% (Claude Haiku 4.5) and 6.10% (GPT-5.4-mini) -- an order-of-magnitude compression of the inter-model spread observed by Spracklen, but not a retirement of the threat. Beyond replication, we identify a set of 127 package names (109 on PyPI, 18 on npm) that all five evaluated models invent identically; following coordinated disclosure with PyPI Security and Socket.dev, 53 of these (41 on PyPI, 12 on npm) remain registrable by an attacker after each registry's existing defenses, constituting a model-agnostic supply-chain attack surface that no single-model study can reveal. We further document a Python-over-JavaScript hallucination asymmetry that inverts Spracklen's 2024 finding, identify a Haiku-below-Sonnet inversion within the Anthropic family, and observe a Jaccard-similarity peak between DeepSeek V3.2 and GPT-5.4-mini (J = 0.343) suggestive of shared training-data origins.
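
The set operations behind the cross-model findings are simple enough to sketch. The code below is an illustration only, not the authors' pipeline (their data and code are linked in the Comments field). The helper names and the placeholder package names are invented for this example, and a real check would validate names against the PyPI and npm master lists rather than a small in-memory set.

```python
def hallucinated(suggested, registry):
    """Names a model suggested that are absent from the registry list."""
    return {name for name in suggested if name not in registry}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two sets of names."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def shared_hallucinations(per_model):
    """Names that every model hallucinated (intersection over models)."""
    sets = list(per_model.values())
    return set.intersection(*sets) if sets else set()


# Placeholder data: neither the registry nor the names are real.
registry = {"real-pkg-1", "real-pkg-2"}
suggestions = {
    "model_a": {"real-pkg-1", "fake-x", "fake-y"},
    "model_b": {"real-pkg-2", "fake-y", "fake-z"},
    "model_c": {"fake-y", "fake-x"},
}

per_model = {m: hallucinated(s, registry) for m, s in suggestions.items()}
print(shared_hallucinations(per_model))
print(jaccard(per_model["model_a"], per_model["model_c"]))
```

The intersection is what makes the attack surface model-agnostic: a name in it would be suggested no matter which of the evaluated models a developer happens to use.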

2603.02274 2026-06-12 q-bio.QM cs.AI 版本更新

Contextual Invertible World Models: A Neuro-Symbolic Agentic Framework for Colorectal Cancer Drug Response

上下文可逆世界模型:用于结直肠癌药物反应的神经符号智能框架

Christopher Baker, Tianyu Ren, Karen Rafferty, Hui Wang

AI总结 提出上下文可逆世界模型(CIWM),结合机器学习模拟器与大语言模型推理层,通过逆推理进行CRISPR扰动,揭示KRAS突变在5-氟尿嘧啶耐药中的主导作用及PIK3CA修复的意外效应。

详情
AI中文摘要

精准肿瘤学目前受到小N大P悖论的限制,即高维基因组数据丰富但药理学反应样本稀疏。虽然深度学习实现了预测准确性,但它常常无法提供临床采用所需的机制清晰度。我们提出了上下文可逆世界模型(CIWM),这是一个神经符号智能框架,通过将定量机器学习模拟器与大语言模型推理层集成来弥合这一差距。利用在Sanger GDSC数据集(\( N=83 \))上严格筛选的高保真数据工程流程,我们从体外伪影中分离出真正的生物信号,为复杂转录组学建立了严格的基线预测相关性(\( r=0.268 \))。通过逆推理,我们在结直肠癌景观中进行了计算机CRISPR扰动。该框架自主推翻了经典机制假设,识别出突变KRAS在驱动5-氟尿嘧啶耐药(\( \Delta=-0.0469 \))中相对于APC/Wnt轴具有层级优势,并通过映射到MAPK/PI3K网络的“KRAS盾牌”实现。此外,智能层识别出“PIK3CA悖论”,揭示修复PIK3CA通过触发补偿性反馈环过度激活主导的MAPK生存通路,无意中增加了化疗耐药性(\( \Delta=+0.0085 \))。

英文摘要

Precision oncology is currently limited by the small-N, large-P paradox, where high-dimensional genomic data is abundant but pharmacological response samples are sparse. While deep learning achieves predictive accuracy, it frequently fails to provide the mechanistic clarity required for clinical adoption. We present the Contextual Invertible World Model (CIWM), a Neuro-Symbolic Agentic Framework that bridges this gap by integrating a quantitative machine learning emulator with a Large Language Model reasoning layer. Utilising a stringently curated, high-fidelity data engineering pipeline on the Sanger GDSC dataset (\( N=83 \)), we isolate true biological signals from in vitro artifacts to establish a rigorous baseline predictive correlation for complex transcriptomics (\( r=0.268 \)). Through Inverse Reasoning, we perform in silico CRISPR perturbations across the colorectal landscape. The framework autonomously overturns classical mechanistic assumptions, identifying a hierarchical dominance of mutant KRAS over the APC/Wnt-axis in driving 5-fluorouracil resistance (\( \Delta=-0.0469 \)) via a "KRAS Shield" mapped to MAPK/PI3K networks. Furthermore, the agentic layer identified a "PIK3CA Paradox", revealing that repairing PIK3CA inadvertently increases chemoresistance (\( \Delta=+0.0085 \)) by triggering a compensatory feedback loop that hyperactivates the dominant MAPK survival pathway.

2603.24603 2026-06-12 q-bio.NC cs.AI 版本更新

Fusion Learning from Dynamic Functional Connectivity: Combining the Amplitude and Phase of fMRI Signals to Identify Brain Disorders

融合动态功能连接:结合fMRI信号的幅度和相位识别脑疾病

Jinlong Hu, Jiatong Huang, Zijian Cai

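AI summary: Proposes MSFL, a multi-scale fusion learning framework that combines two complementary dynamic functional connectivity features, sliding window correlation and phase synchronization, and significantly outperforms existing models on autism and depression datasets.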

AI中文摘要

基于静息态功能磁共振成像(fMRI)的动态功能连接(dFC)已广泛应用于脑科学研究。滑动窗口相关(SWC)方法通过计算脑区对信号幅度时间序列之间的相关系数,是构建dFC的常用方法。在本研究中,我们提出了一种集成方法,结合fMRI信号的幅度和相位信息,以提高脑疾病的检测能力。具体而言,我们引入了一个多尺度融合学习框架MSFL,该框架利用来自SWC和相位同步(PS)的两种互补dFC特征。其中,SWC捕获幅度相关性,而PS测量dFC内的相位相干性。我们使用两个公开数据集(ABIDE I和REST-meta-MDD)评估了MSFL在分类自闭症谱系障碍和重度抑郁症方面的有效性。结果表明,MSFL显著优于现有比较模型。此外,我们使用SHAP框架进行了模型解释分析,表明来自SWC和PS的两种dFC特征均有助于检测脑疾病。

英文摘要

Dynamic functional connectivity (dFC) derived from resting-state functional magnetic resonance imaging (fMRI) has been extensively utilized in brain science research. The sliding window correlation (SWC) method is a widely used approach for constructing dFC by computing correlation coefficients between amplitude time series of signals from pairs of brain regions. In this study, we propose an integrated approach that incorporates both amplitude and phase information of fMRI signals to improve the detection of brain disorders. Specifically, we introduce a multi-scale fusion learning framework, namely MSFL, which leverages two complementary dFC features derived from SWC and phase synchronization (PS). Here, SWC captures amplitude correlations, while PS measures phase coherence within dFC. We evaluated the efficacy of MSFL in classifying autism spectrum disorder and major depressive disorder using two publicly available datasets: ABIDE I and REST-meta-MDD, respectively. The results indicate that MSFL significantly outperforms existing comparative models. Moreover, we performed model explanation analysis using the SHAP framework, which showed that both types of dFC features from SWC and PS contribute to detecting brain disorders.
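
To make the two dFC feature types in this abstract concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the MSFL authors' pipeline: the function names, the window and step parameters, the FFT-based Hilbert transform, and the use of the cosine of the instantaneous phase difference as the PS measure are all choices made for this example.

```python
import numpy as np

def analytic_phase(x):
    """Instantaneous phase of each column of x (time x regions).

    Builds the analytic signal with an FFT-based Hilbert transform
    (assumed here; the abstract does not say how phase is extracted).
    """
    n = x.shape[0]
    spectrum = np.fft.fft(x, axis=0)
    # Keep DC (and Nyquist for even n), double positive frequencies,
    # zero out negative frequencies.
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    analytic = np.fft.ifft(spectrum * weights[:, None], axis=0)
    return np.angle(analytic)

def sliding_window_correlation(x, window, step):
    """SWC: one Pearson correlation matrix (regions x regions) per window."""
    starts = range(0, x.shape[0] - window + 1, step)
    return np.stack([np.corrcoef(x[s:s + window].T) for s in starts])

def phase_synchronization(x):
    """PS (assumed definition): cosine of the pairwise instantaneous
    phase difference, giving one regions x regions matrix per time point."""
    phase = analytic_phase(x)
    return np.cos(phase[:, :, None] - phase[:, None, :])
```

For a signal array of shape (time points, regions), `sliding_window_correlation` returns one amplitude-based matrix per window, while `phase_synchronization` returns one phase-based matrix per time point. A fusion model would then consume both stacks as separate inputs.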

2606.13519 2026-06-12 econ.EM 新提交

Semiparametric Local Projections

半参数局部投影

Silvia Goncalves, Ana Maria Herrera, Lutz Kilian, Elena Pesavento, Iones Kelanemer Holban

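AI summary: Proposes a semiparametric local projection estimator of nonlinear impulse response functions, based on a doubly robust moment condition combined with cross-fitting, that is $\sqrt{T}$-consistent and asymptotically normal.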

AI中文摘要

我们提出了一种半参数局部投影估计量,用于估计一类广泛的结构动态模型的非线性脉冲响应函数,这些模型与应用宏观经济学相关,包括具有非线性变换回归变量、状态依赖系数以及冲击与状态变量之间非线性相互作用的模型。该估计量基于一个双稳健矩条件,该条件将平均响应函数识别为非参数条件均值的线性泛函,并辅以一个密度比来捕捉移动感兴趣冲击的效果。我们将此矩条件与处理序列依赖的交叉拟合相结合。得到的估计量是$\sqrt{T}$一致且渐近正态的。我们在一系列非线性数据生成过程中检验了该估计量的有限样本性能,并通过两个实证示例说明了其应用。

英文摘要

We propose a semiparametric local projection estimator of nonlinear impulse response functions for a broad class of structural dynamic models relevant for applied macroeconomics, including models with nonlinearly transformed regressors, state-dependent coefficients, and nonlinear interactions between shocks and state variables. The estimator is based on a doubly robust moment condition that identifies the average response function as a linear functional of a nonparametric conditional mean, augmented by a density ratio that captures the effect of shifting the shock of interest. We combine this moment condition with cross-fitting that handles serial dependence. The resulting estimator is $\sqrt{T}$-consistent and asymptotically normal. We examine the finite-sample performance of the estimator across a range of nonlinear data-generating processes and illustrate its use in two empirical examples.
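
As a rough illustration of what a doubly robust moment condition of this kind can look like, the following is a generic textbook-style sketch and not the paper's own notation or assumptions. The symbols $y_{t+h}$ (outcome at horizon $h$), $x_t$ (shock), $w_t$ (controls), $\delta$ (shock size), $m_h$, $\alpha_\delta$ and $\theta_h$ are all introduced here for illustration.

```latex
% Nonparametric conditional mean and the target average response
m_h(x, w) = \mathbb{E}\left[ y_{t+h} \mid x_t = x,\; w_t = w \right],
\qquad
\theta_h(\delta) = \mathbb{E}\left[ m_h(x_t + \delta, w_t) - m_h(x_t, w_t) \right].

% Density-ratio weight that represents the shift x_t -> x_t + \delta
\alpha_\delta(x, w) = \frac{f(x - \delta \mid w)}{f(x \mid w)} - 1 .

% Doubly robust moment: plug-in term plus a weighted residual correction
\mathbb{E}\left[ m_h(x_t + \delta, w_t) - m_h(x_t, w_t)
  + \alpha_\delta(x_t, w_t)\,\bigl( y_{t+h} - m_h(x_t, w_t) \bigr)
  - \theta_h(\delta) \right] = 0 .
```

In this generic form the moment remains valid if either $m_h$ or the density ratio is correctly specified, provided the shifted shock stays within the support of $x_t$ given $w_t$. That is the sense of "doubly robust" assumed in this sketch.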