arXivDaily: arXiv daily academic digest, updated Monday to Friday
2606.02240 2026-06-03 cs.CR cs.AI cs.CL cs.ET

AgentRedBench: Dynamic Redteaming and Integration-Aware Defense for LLM Agents over SaaS Integrations

Hiskias Dingeto, William Leeney

Affiliations: StackOne Technologies

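AI summary: To address the indirect prompt injection threat that LLM agents face during tool use, the paper proposes AGENTREDBENCH, a dynamic red-teaming benchmark (24 enterprise integrations, 5 attack types), and AGENTREDGUARD, a defense model trained on an integration-diverse corpus, which cuts attack success rate from 69.9% to 2.4% at a false-positive rate of only 0.37%.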

Abstract

Indirect prompt injection in tool-use agents is a concrete production threat: LLM agents read from integrations (third-party services such as Gmail, Salesforce, or Jira accessed through tool calls) whose response content the user neither writes nor controls. Existing benchmarks under-measure the threat: most cover only a handful of integrations with the same attack payload replayed across runs, and open-source guards are trained on chat-style data rather than tool-response content. We introduce AGENTREDBENCH, a dynamic LLM-driven redteaming benchmark of 215 subtle underspecified authorization (attacks at the boundary of what the user's request authorises) scenarios across 24 enterprise integrations in nine functional families and five attack types. Across an eight-model panel (Anthropic, OpenAI, Google), no-guard ASR (attack success rate) ranges from 32% (Claude Sonnet 4.6) to 81% (Gemini 3 Flash). To keep the scenario set out of training corpora and preserve headline ASR meaning over time, we release the codebase, integration schemas, and AGENTREDGUARD model openly; the canonical scenarios are evaluated through a maintainer-mediated channel with immutable versioning. We release AGENTREDGUARD alongside the benchmark: a guard trained on an integration-diverse corpus of adversarial tool-response content. AGENTREDGUARD cuts panel ASR from 69.9% to 2.4% at 0.37% false-positive rate, outperforming every open-source baseline with non-trivial detection (Llama Guard, PromptGuard 2, ProtectAI) on both axes. Cross-integration and cross-attack type holdouts both confirm the gain transfers beyond the training subset.

2606.01472 2026-06-03 cs.DC cs.AI cs.LG

Hierarchical Online Prompt Mutation with Dual-Loop Feedback for Guardrailed Evidence Document Generation: A Production-Evaluation Case Study

Nataraj Agaram Sundar, Tejas Morabia

Affiliations: eBay Inc.

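AI summary: Proposes HOPM, a hierarchical online prompt mutation framework that optimizes prompt policies through dual-loop feedback (human review and an automated judge), significantly improving win rate and quality in real marketplace dispute-evidence generation.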

Comments 7 pages. Production-evaluation case study of guardrailed LLM evidence-document generation

Abstract

High-stakes production document-generation systems require language models to be adaptive, evidence-grounded, and auditable. We present HOPM, a hierarchical online prompt mutation framework evaluated on a real marketplace dispute-evidence workflow. HOPM treats prompts as online policies: a family/version router selects a prompt, deterministic guardrails attribute failures to mutable prompt-token categories, and dual feedback from human review and an automated judge updates both routing and mutation priorities. The primary evidence is an observed matched production-evaluation ablation: seven variants are evaluated on the same 600 cases each, enabling component comparisons against static prompting, manual iteration, bandit-only routing, mutation-only adaptation, human-only feedback, auto-judge-only feedback, and full dual-loop HOPM. Full HOPM improves count win rate over a static control from 34.7% to 45.7% (+11.0 pp; paired McNemar p = 1.31e-11) and amount-weighted win rate from 22.3% to 41.4% (+19.1 pp; 95% paired bootstrap CI [10.3, 28.9] pp). It also increases mean Likert quality from 3.18 to 4.40 and reduces issue-flag rate from 15.3% to 5.2%. Supporting review artifacts cover 770 generated-text reviews, 318 labeled reviewer exports, a 10-case/61-rating calibration slice, and a 70-case/350-rating OCR benchmark; these artifacts calibrate rubric, guardrail, title-risk, and OCR-risk interpretation rather than substituting for the production ablation. The paper includes control setup, sample sizes, confidence intervals, paired tests, prompt-token categories, pseudocode, schema, rubric, guardrail taxonomy, and a constructed example so the evaluation structure can be reproduced without exposing proprietary evidence.
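
The paired McNemar test mentioned in the abstract compares two variants on the same cases and uses only the discordant pairs (cases that one variant wins and the other loses). A minimal sketch of the exact two-sided version follows, assuming binary win/loss outcomes per case; the function names and the example data are hypothetical, and the paper's own discordant counts are not given in the abstract.

```python
from math import comb


def discordant_counts(control, treatment):
    """Count cases won by exactly one of the two variants.

    control, treatment: equal-length sequences of 0/1 outcomes on the
    same cases (hypothetical data layout, not taken from the paper).
    """
    b = sum(1 for x, y in zip(control, treatment) if x and not y)
    c = sum(1 for x, y in zip(control, treatment) if y and not x)
    return b, c


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    Under the null hypothesis the b discordant pairs favouring the control
    follow Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    b, c = discordant_counts([1, 0, 1, 0], [1, 1, 0, 1])
    print(b, c, mcnemar_exact(b, c))
```

Concordant pairs (both win or both lose) do not enter the statistic, which is why a matched design on the same 600 cases can detect differences that an unpaired comparison of win rates would miss.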

2606.01166 2026-06-03 cs.CR cs.CL

BraveGuard: From Open-World Threats to Safer Computer-Use Agents

Yunhao Feng, Xiaohu Du, Xinhao Deng, Yifan Ding, Ming Wen, Yixu Wang, Yuxiang Xie, Baihui Zheng, Yingshui Tan, Yige Li, Yutao Wu, Kerui Cao, Wenke Huang, Yanming Guo, Xingjun Ma, Yu-Gang Jiang

Affiliations: Fudan University; Ant Group; Hunan Institute of Advanced Technology; Alibaba Group; Singapore Management University; Deakin University; Nanyang Technological University; Shanghai Innovation Institute

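AI summary: Proposes BraveGuard, a framework that trains guard models from open-world threat signals and realistic agent trajectories to enable trajectory-level safety detection, substantially improving the safety of computer-use agents.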

Abstract

Computer-use agents extend language models from text generation to sustained interaction with files, terminals, browsers, and external tools. This shift creates safety risks that are difficult to detect from isolated prompts or final responses, because harm often emerges only through multi-step execution traces whose individual actions appear locally benign. We introduce BraveGuard, a self-evolving defense framework for training guard models from open-world threat signals and realistic agent trajectories. BraveGuard mines recent research sources to identify emerging risks and attack patterns, instantiates them as executable computer-use tasks, collects agent rollouts, and derives trajectory-level supervision for guard model training. As new threats and validation failures appear, the pipeline can be repeated, yielding an adaptive defense loop rather than a static, benchmark-driven training process. We instantiate BraveGuard by training multiple guard backbones, including Qwen3-Guard and Llama-Guard variants, and evaluate the resulting guards on trajectory-level agent-safety benchmarks. BraveGuard consistently improves safety detection across computer-use trajectories. On AgentHazard, it substantially improves detection accuracy over off-the-shelf guard models, with accuracy increasing from 38.79% to 82.38% under the averaged guard-model setting. These results show that guard supervision grounded in open-world threat discovery and realistic agent execution can improve safety monitoring beyond fixed taxonomies and synthetic prompt-level data. BraveGuard offers a scalable path toward adaptive defenses for computer-use agents facing evolving real-world risks.

2606.00188 2026-06-03 cs.GR cs.CV cs.LG

PaintBench: Deterministic Evaluation of Precise Visual Editing

Kai Xu, Ellis Brown, Shrikar Madhu, Rob Fergus, He He, Saining Xie

Affiliations: New York University

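AI summary: Proposes the PaintBench benchmark, which procedurally generates 20 basic visual editing operations for deterministic pixel-level evaluation; finds that current models perform poorly (best mIoU 17.1%) and reveals the effects of task decomposition and scene variation.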

Comments Project Page: https://paintbench.github.io/

Abstract

While current multimodal models are proficient at open-ended visual editing, executing precise single-answer edits remains an important obstacle. To probe this challenge, we introduce PaintBench, a dynamically scalable benchmark targeting 20 fundamental precise visual editing operations across four categories: geometric transformation, structural manipulation, color change, and symbolic reasoning. Procedural generation with configurable complexity enables an effectively infinite, contamination-resistant evaluation suite, and deterministic pixel-level evaluation eliminates reliance on bias-prone judge models. Across 11 image editing models, we find overall low performance, with the current highest-performing industry leader scoring only 17.1% (mIoU). Task decomposition reveals especially challenging operation types (geometric transformation, most structural manipulation, formula-based color change) and model-specific specializations. Fine-grained benchmark diagnostics further show performance degradations induced by scene variations in object count, background complexity, color scheme, and edit-region size. To test generalization of PaintBench scores to applied task performance, we create a procedural, deterministic evaluation for data visualization editing (TinyGrafixBench) and find strong linear correlation with PaintBench scores ($R^2 = 0.91$, $p < 0.001$). Altogether, PaintBench provides a rigorous foundation for measuring and driving progress in precise multimodal visual editing.
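
Deterministic pixel-level scoring of the kind described above can be illustrated with intersection-over-union on binary masks. This is only a sketch under assumptions: the abstract does not define how PaintBench computes mIoU, so the flat 0/1 mask representation, the empty-mask convention, and the function names below are all hypothetical.

```python
def iou(pred, target):
    """Intersection-over-union of two flat 0/1 pixel masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def mean_iou(pairs):
    """Average IoU over (pred, target) mask pairs, one pair per edit task."""
    scores = [iou(p, t) for p, t in pairs]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(iou([1, 1, 0, 0], [1, 0, 1, 0]))
    print(mean_iou([([1, 1], [1, 1]), ([1, 0], [0, 1])]))
```

Because the score is a pure function of the output pixels and the procedurally generated target, repeated evaluation of the same output always gives the same number, with no judge model in the loop.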

2605.31530 2026-06-03 eess.AS cs.SD

UNISON: A Unified Sound Generation and Editing Framework via Deep LLM Fusion

Zhaoqing Li, Haoning Xu, Jingran Su, Yaofang Liu, Zhefan Rao, Huimeng Wang, Jiajun Deng, Tianzi Wang, Zengrui Jin, Rui Liu, Haoxuan Che, Xunying Liu

Affiliations: The Chinese University of Hong Kong; The Hong Kong Polytechnic University; City University of Hong Kong; The Hong Kong University of Science and Technology; Tsinghua University; Huawei Research Hong Kong

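AI summary: Proposes UNISON, a unified latent-diffusion framework that uses layer-wise deep LLM fusion and a multi-task architecture to perform speech generation, sound generation, and audio editing, matching or exceeding specialist models on multiple tasks while being roughly 4x smaller than comparable unified systems.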

Abstract

We present UNISON, a latent diffusion framework that unifies speech generation, sound generation, and audio editing within a single model. A single model handles text-to-audio, text-to-speech, zero-shot speaker cloning, mixed speech-and-sound generation, scene-level audio editing, speech-in-scene editing, and timed temporal composition, all of which share a single set of weights. Our architecture features two core designs: (1) Layer-wise deep LLM fusion, which injects hidden states from uniformly sampled layers of a frozen MLLM into corresponding MM-DiT blocks via learned projections, providing depth-matched semantic conditioning that improves instruction following over single-layer baselines; and (2) a unified multi-task architecture where task identity is encoded solely by a channel-wise mask and source audio is provided through VAE-encoded channel concatenation. Training is stabilized by an online GPU-side multi-task data synthesis pipeline with task-homogeneous batching and a two-stage curriculum. With 621M--732M trainable parameters, UNISON achieves results competitive with or exceeding task-specialist models across evaluated domains, while being roughly $4\times$ smaller than comparable unified systems.

2605.27454 2026-06-03 eess.IV cs.CV

NL-MambaXCT: Self-Supervised Nested-Learning Mamba for Nomex Honeycomb X-ray CT Defect Classification

Ghaleb Aldoboni, Lobna Nassar, Fakhri Karray, Reem Alshamsi

Affiliations: Aurak Academy of Arts and Sciences; Machine Intelligence Institute; University of Waterloo

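AI summary: Proposes the NL-MambaXCT framework, which combines self-supervised masked image modelling with Nested Learning for efficient classification of Nomex honeycomb XCT defects, reaching 96.91% accuracy on the test set.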

Abstract

X-ray computed tomography (XCT) is widely used for non-destructive testing of Nomex honeycomb structures in aerospace manufacturing, but industrial inspection still relies heavily on manual interpretation and supervised models trained on limited labeled data. This work introduces NL-MambaXCT, a Mamba-based framework that combines self-supervised masked image modelling with a Nested Learning (NL) formulation for automated, label-efficient defect classification from production XCT slices. The backbone is a four-stage 2D encoder with RegNet convolutional blocks in the early stages and Mamba-based sequence mixing with attention in the deeper stages. It is pretrained by masked image modelling on 19,961 unlabeled industrial XCT slices and fine-tuned on 2,000 relabeled Nomex XCT slices split by production order. NL is instantiated through two-timescale parameter dynamics: selected projections maintain slow exponential-moving-average traces alongside fast weights, while a deep-momentum optimizer introduces an additional slow parameter-update trajectory. On the held-out test set, the MIM-pretrained NL-MambaXCT model achieves 96.91% accuracy and 96.8% macro F1, outperforming CNN, attention, and single-timescale Mamba baselines by 3.11--10.31 percentage points in accuracy. The results suggest that combining masked self-supervision with NL-style fast/slow learning dynamics is a promising strategy for robust defect classification in Nomex honeycomb XCT inspection.
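
The two-timescale idea in the abstract (a slow exponential-moving-average trace kept alongside fast weights) can be sketched on a toy problem. This is an illustration only: the quadratic loss, learning rate, decay value, and function names are assumptions, and the paper's deep-momentum optimizer is not reproduced here.

```python
def ema_update(slow, fast, decay=0.99):
    """Move the slow trace a small step toward the current fast weights."""
    return [decay * s + (1.0 - decay) * f for s, f in zip(slow, fast)]


def train(steps=200, lr=0.1, decay=0.9, target=3.0):
    """Gradient descent on (w - target)^2 with a slow EMA copy of w."""
    fast = [0.0]
    slow = [0.0]
    for _ in range(steps):
        grad = [2.0 * (w - target) for w in fast]        # d/dw (w - target)^2
        fast = [w - lr * g for w, g in zip(fast, grad)]  # fast timescale
        slow = ema_update(slow, fast, decay)             # slow timescale
    return fast, slow


if __name__ == "__main__":
    print(train(steps=5))
    print(train(steps=200))
```

Early in training the slow trace lags the fast weights; both reach the same minimizer in the limit, with the slow copy acting as a smoothed version of the trajectory.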

2605.30253 2026-06-03 stat.ML cs.LG math.FA math.OC math.PR stat.CO

Wasserstein Contraction of Coordinate Ascent Variational Inference

Rocco Caprio, Adrien Corenflos, Sam Power

Affiliations: Department of Statistics, University of Warwick; School of Mathematics, University of Bristol

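AI summary: Studies the contraction of the coordinate ascent variational inference algorithm in Wasserstein distance, giving local convergence guarantees via a transport-information inequality at the fixed points and a functional smoothness condition, with applications to Bayesian Gaussian mixture models, high-dimensional Bayesian probit regression, and Pólya-Gamma logistic regression.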

Comments 17 pages + 3 pages appendix, 3 figures. V2 fixes some citations not displaying properly in the appendix. No content change compared to prior version

Abstract

We study the contraction in Wasserstein distance of the coordinate ascent variational inference algorithm. This is shown to hold under a transport-information inequality at the fixed points and a functional smoothness condition. The results are general and sharp, allow for local convergence guarantees, and hold for general smooth manifolds as well as in some non-smooth spaces. We consider applications to Bayesian Gaussian Mixture Models, high-dimensional Bayesian Probit Regression, and Logistic Regression with Pólya-Gamma random variables (i.e. Jaakkola-Jordan's algorithm).
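
A standard textbook case makes the contraction concrete; it is not one of the paper's applications. For a bivariate Gaussian target with mean mu and precision matrix lam, mean-field CAVI keeps the factor variances fixed at 1/lam[0][0] and 1/lam[1][1] and updates only the factor means. Since the covariances of successive iterates agree, the 2-Wasserstein distance to the fixed point equals the Euclidean distance between means, and each sweep shrinks the error in the second mean by lam[0][1]**2 / (lam[0][0] * lam[1][1]). The function name and the numbers below are illustrative assumptions.

```python
def cavi_sweep(m, mu, lam):
    """One coordinate-ascent sweep for a mean-field fit to N(mu, lam^{-1}).

    m:   current factor means (m1, m2)
    mu:  target mean (mu1, mu2)
    lam: 2x2 target precision matrix as nested lists
    """
    m1, m2 = m
    m1 = mu[0] - lam[0][1] / lam[0][0] * (m2 - mu[1])  # update q1 given q2
    m2 = mu[1] - lam[1][0] / lam[1][1] * (m1 - mu[0])  # update q2 given new q1
    return (m1, m2)


if __name__ == "__main__":
    mu = (0.0, 0.0)
    lam = [[2.0, 1.0], [1.0, 2.0]]
    m = (1.0, 1.0)
    for _ in range(4):
        m = cavi_sweep(m, mu, lam)
        print(m)  # the second mean shrinks by a factor of 0.25 per sweep
```

With this precision matrix the contraction factor is 1 / (2 * 2) = 0.25, so the means converge geometrically to the target mean.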

2605.30166 2026-06-03 cs.SI cs.LG

SAHG: Sector-Anisotropic Hyperbolic Graph Model for Social Bot Detection

Hanning Lu, Yingguang Yang, Jinwei Su, Yang Liu, Zhaoqian Yao, Yaoming Li, Taoran Liang, Ziyi Zhang, Ran Ran, Kefu Xu, Bin Chong

Affiliations: University of Leeds; University of Science and Technology of China; South China Normal University; Tsinghua University; The Chinese University of Hong Kong; Harbin University of Commerce; Beijing University of Posts and Telecommunications; Peking University; University of California, Berkeley

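AI summary: Proposes SAHG, a sector-anisotropic hyperbolic graph model that uses a direction-dependent curvature field and sector prototypes to address the distortion Euclidean GNNs introduce on hierarchical, scale-free social graphs and the signal contamination caused by heterophilic connections, achieving the best performance on three benchmarks.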

Abstract

LLM-driven social bots can generate fluent, human-like text, reducing the discriminative advantage of content-based detection alone. However, coordinated campaigns still leave relational patterns -- interactions, behavioral similarity, shared neighborhoods, community positions, and coordinated activity -- that graph-based methods can exploit. Existing graph detectors face two challenges when exploiting such evidence. First, Euclidean GNNs distort hierarchical and scale-free social graphs; while hyperbolic geometry addresses this volume-growth mismatch, fixed-curvature models still assign uniform geometric resolution to structural directions with different densities and separation needs. Second, relational evidence is not always reliable: sophisticated bots forge heterophilic connections with genuine users, causing neighborhood aggregation to mix bot and human signals and dilute account-level evidence. We propose SAHG (Sector-Anisotropic Hyperbolic Graph), addressing both challenges. SAHG learns a direction-dependent curvature field $γ(u)$ that adapts geometric resolution across structural directions, and uses sector prototypes to convert angular concentration and alignment into classifier-readable features. To prevent contaminated aggregation from overwhelming account-level evidence, SAHG encodes per-account features and graph-neighborhood representations in two independent SAH channels, fusing them only at the classifier. Experiments on Fox8-23, BotSim-24, and MGTAB show that SAHG achieves the highest accuracy and F1 on all three benchmarks, outperforming feature-based, graph-based, LLM-based, and isotropic hyperbolic baselines. Ablation and geometric analyses confirm the effectiveness of the anisotropic geometry and dual-channel design.

2605.12925 2026-06-03 cs.SE cs.AI

AgentLens: Revealing The Lucky Pass Problem in SWE-Agent Evaluation

Priyam Sahoo, Gaurav Mittal, Xiaomin Li, Shengjie Ma, Benjamin Steenhoek, Pingping Lin, Yu Hu

Affiliations: University of Illinois, Urbana-Champaign; Microsoft

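AI summary: To address the reliance of software engineering agent evaluation on a binary signal (whether the final patch passes the tests), proposes the AgentLens framework for process-level evaluation; by building Prefix Tree Acceptor references and a context-sensitive intent labeler, it finds that 10.7% of passing trajectories exhibit "Lucky Pass" behavior and uses a quality score to sort trajectories into Lucky, Solid, and Ideal tiers.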

Abstract

Evaluation of software engineering (SWE) agents is dominated by a binary signal: whether the final patch passes the tests. This outcome-only view treats a principled solution and a chaotic trial-and-error process as equivalent. We show that this equivalence is empirically false. We evaluate 2,614 OpenHands trajectories from eight model backends on 60 SWE-bench Verified tasks. Of these, 47 have enough passing trajectories to construct task-level process references, yielding a 1,815-trajectory evaluation subset. Among passing trajectories in this subset, 10.7% exhibit behavior we call a Lucky Pass: regression cycles, blind retries, missing verification, or temporally disordered exploration, implementation, and verification. We introduce AgentLens, a framework for process-level assessment of SWE-agent trajectories, and define AgentLens-Bench, a dataset of 1,815 trajectories annotated with quality scores, waste signals, divergence points, and 47 task-level Prefix Tree Acceptor (PTA) references. AgentLens builds PTA references by merging multiple passing solutions for the same task, and uses a context-sensitive intent labeler to assign actions to Exploration, Implementation, Verification, or Orchestration based on trajectory history rather than tool identity alone. On AgentLens-Bench, the quality score separates passing trajectories into Lucky, Solid, and Ideal tiers and further decomposes Lucky Passes into five recurring mechanisms. Across the eight model backends, Lucky rates range from 0.5% to 23.2%, and some models move by as many as five rank positions when ranked by quality score instead of pass rate. We plan to release the project repository soon, including AgentLens-Bench artifacts, the AgentLens SDK, and the analysis tooling.
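
A Prefix Tree Acceptor built by merging several passing solutions, as described above, can be sketched with nested dictionaries. This is a generic PTA, not the AgentLens implementation: the single-letter intent labels (E, I, V for Exploration, Implementation, Verification), the node layout, and the function names are assumptions made for illustration.

```python
def build_pta(sequences):
    """Merge action-label sequences into a prefix tree acceptor."""
    root = {"children": {}, "accept": False}
    for seq in sequences:
        node = root
        for sym in seq:
            node = node["children"].setdefault(
                sym, {"children": {}, "accept": False}
            )
        node["accept"] = True  # end of one complete passing solution
    return root


def divergence_index(pta, seq):
    """Index of the first action that leaves the tree, or None if none does."""
    node = pta
    for i, sym in enumerate(seq):
        if sym not in node["children"]:
            return i
        node = node["children"][sym]
    return None


def accepts(pta, seq):
    """True if seq exactly matches one of the merged reference solutions."""
    node = pta
    for sym in seq:
        if sym not in node["children"]:
            return False
        node = node["children"][sym]
    return node["accept"]


if __name__ == "__main__":
    pta = build_pta([["E", "I", "V"], ["E", "E", "I", "V"]])
    print(accepts(pta, ["E", "I", "V"]), divergence_index(pta, ["E", "V"]))
```

Shared prefixes are stored once, so a new trajectory can be compared against all reference solutions for a task in a single walk from the root.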

2605.24391 2026-06-03 cs.AR cs.AI

MX-SAFE: Versatile Inference- and Training-Proof Microscaling Format with On-the-Fly Exponent and Mantissa Bit Allocation

Dahoon Park, Jahyun Koo, Sangwoo Hwang, Jaeha Kung

Affiliations: Institute of Information & Communications Technology Planning & Evaluation (IITP); Korea government (MSIT); National Research Foundation of Korea (NRF); Ministry of Science and ICT; IC Design Education Center (IDEC)

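AI summary: Proposes MX-SAFE, a microscaling format that adaptively switches between a wider mantissa mode and a subnormal FP mode to support both training and direct-cast inference, with a tile-based block design for hardware efficiency; compared to MXFP8 E2M5 and MXFP8 E4M3, it improves average inference/full-training accuracy by 0.05%/11.1% and 3.55%/3.57%, respectively, while its accelerator consumes 24.9% less total energy.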

Comments Accepted to DATE 2026 (7 pages, 7 figures). Typo updates for Fig. 3 and Table 4, 5 are reflected

Abstract

As the demand for deep learning grows, cost reduction through quantization has become essential for both training and inference. In 2022, the Open Compute Project (OCP) consortium standardized narrow precision formats for deep learning, called the microscaling (MX) format. The MX format is a hardware-friendly dynamic quantization scheme that effectively reduces the data size by sharing an 8-bit exponent across multiple operands. The MX format can be categorized into two types with their own strengths: (i) MXINT which focuses on a high precision consisting only of mantissa bits and (ii) MXFP which focuses on a wider dynamic range by allowing local exponent bits. In this work, we present a versatile MXFP format, called MX-SAFE (MXSF in short), that adaptively uses two modes, i.e., a wider mantissa mode (FP8 E2M5) and a subnormal FP mode (FP5 E3M2), to support both training and direct-cast inference. Furthermore, we propose a tile-based block design to increase hardware efficiency by reducing the burden of re-quantization process during the training with the MXSF format. Owing to the use of the proposed MXSF format, 0.05%/11.1% and 3.55%/3.57% improvements in accuracy, on average, for inference/full-training compared to MXFP8 E2M5 and MXFP8 E4M3 are observed, respectively. Moreover, we present a training-inference accelerator that supports the MXSF format and it achieves similar accuracy to the BF16 baseline while using 24.9% less total energy consumption.
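
The shared-exponent idea behind MX formats can be sketched with an integer-mantissa block, loosely in the spirit of MXINT. This is a simplified illustration under assumptions: it is neither the OCP bit-level encoding nor the MX-SAFE format, the shared exponent is a plain Python integer rather than an 8-bit field, and the function names are hypothetical.

```python
import math


def quantize_block(block, mantissa_bits=8):
    """Quantize a block of floats to signed integers with one shared scale.

    Returns (shared_exp, ints) such that block[i] ~= ints[i] * 2**shared_exp.
    """
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return 0, [0] * len(block)
    qmax = 2 ** (mantissa_bits - 1) - 1
    # Smallest power-of-two scale that keeps the largest element in range.
    shared_exp = math.ceil(math.log2(max_abs / qmax))
    scale = 2.0 ** shared_exp
    ints = [max(-qmax, min(qmax, round(x / scale))) for x in block]
    return shared_exp, ints


def dequantize_block(shared_exp, ints):
    """Reconstruct approximate floats from the shared exponent and integers."""
    scale = 2.0 ** shared_exp
    return [v * scale for v in ints]


if __name__ == "__main__":
    exp, ints = quantize_block([1.0, -0.5, 0.25, 0.0])
    print(exp, ints, dequantize_block(exp, ints))
```

Storing one exponent per block instead of one per element is what reduces the data size; the cost is that small elements in a block dominated by a large value lose precision, which is the trade-off MXFP formats address with local exponent bits.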

2601.00990 2026-06-03 eess.IV cs.CV

Uncertainty-Calibrated Explainable Artificial Intelligence for Fetal Ultrasound Plane Classification: A Systematic Review

Gustav Olaf Yunus Laitinen-Fredriksson Lundström-Imanov, Ozkan Gunalp

Affiliations: Department of Life Sciences and Medicine, Faculty of Science, Technology and Medicine, University of Luxembourg; Department of Biostatistics and Medical Informatics, Institute of Health Sciences, Ege University

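AI summary: Systematically reviews 78 studies and proposes the CALIB-XFUS framework, which emphasizes calibration, explanation faithfulness, and fairness to meet regulatory requirements.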

Comments 12 pages, 5 figures, 1 table, 75 references; systematic review (PRISMA 2020); manuscript prepared for submission to The Lancet Digital Health (Reviews section)

Abstract

Fetal ultrasound is the cornerstone of antenatal care, and accurate recognition of a small set of standard anatomical planes underpins biometry, growth surveillance, and detection of structural anomalies. Deep learning classifiers now match or exceed expert accuracy on curated benchmarks, but most remain opaque and miscalibrated, leaving clinicians without the calibrated confidence or faithful explanations needed for safe decision support. We systematically reviewed 78 studies published between January 1, 2015 and April 30, 2026 that paired automated fetal plane classification with explainability or predictive uncertainty quantification, following PRISMA 2020. Pooled balanced accuracy across six standard planes was 0.93 (95% CI 0.91 to 0.95), but only 19 studies (24%) reported calibration and 14 (18%) reported selective prediction. We propose CALIB-XFUS, a 22-item reporting framework that operationalises calibration, explanation faithfulness, and fairness for regulated fetal ultrasound artificial intelligence. The framework spans six domains: clinical task and indication for use; dataset provenance and representativeness; model and training pipeline; calibration and selective prediction; explanation faithfulness and clinician validation; and post-market surveillance. We argue that uncertainty-calibrated, faithfully explained, and fairness-audited fetal ultrasound AI is now both technically feasible and regulatorily expected under the FDA Good Machine Learning Practice principles and the EU AI Act high-risk obligations.
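
One widely used way to quantify the miscalibration discussed above is expected calibration error (ECE) with equal-width confidence bins. The abstract does not say which calibration metrics the reviewed studies report, so the sketch below is a generic illustration; the function name, bin count, and example values are assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin-size-weighted mean of |accuracy - mean confidence| per bin.

    confidences: predicted top-class probabilities in [0, 1]
    correct:     matching 0/1 flags, 1 if the prediction was right
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in last bin
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        acc = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / n * abs(acc - avg_conf)
    return ece


if __name__ == "__main__":
    print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0]))
```

A classifier that reports 90% confidence but is right only 75% of the time is overconfident by 15 points, which is the gap this metric surfaces.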

2512.18552 2026-06-03 cs.SE cs.AI cs.CL cs.LG

Toward Training Superintelligent Software Agents through Self-Play SWE-RL

Yuxiang Wei, Zhiqing Sun, Emily McMilin, Jonas Gehring, David Zhang, Gabriel Synnaeve, Daniel Fried, Lingming Zhang, Sida Wang

Affiliations: Meta FAIR; University of Illinois Urbana-Champaign; Meta TBD Lab; Carnegie Mellon University

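AI summary: Proposes Self-play SWE-RL (SSR), which trains a single LLM agent via reinforcement learning in a self-play setting to iteratively inject and repair software bugs in real codebases without human-labeled issues or tests, achieving notable self-improvement on SWE-bench benchmarks and outperforming the human-data baseline.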

Comments Accepted to ICML 2026

Abstract

While current software agents powered by large language models (LLMs) and agentic reinforcement learning (RL) can boost programmer productivity, their training data (e.g., GitHub issues and pull requests) and environments (e.g., pass-to-pass and fail-to-pass tests) heavily depend on human knowledge or curation, posing a fundamental barrier to superintelligence. In this paper, we present Self-play SWE-RL (SSR), a first step toward training paradigms for superintelligent software agents. Our approach takes minimal data assumptions, only requiring access to sandboxed repositories with source code and installed dependencies, with no need for human-labeled issues or tests. Grounded in these real-world codebases, a single LLM agent is trained via reinforcement learning in a self-play setting to iteratively inject and repair software bugs of increasing complexity, with each bug formally specified by a test patch rather than a natural language issue description. On the SWE-bench Verified and SWE-Bench Pro benchmarks, SSR achieves notable self-improvement (+10.4 and +7.8 points, respectively) and consistently outperforms the human-data baseline over the entire training trajectory, despite being evaluated on natural language issues absent from self-play. Our results, albeit early, suggest a path where agents autonomously gather extensive learning experiences from real-world software repositories, ultimately enabling superintelligent systems that exceed human capabilities in understanding how systems are constructed, solving novel challenges, and autonomously creating new software from scratch.

2605.18106 2026-06-03 math.OC cs.AI cs.LG stat.ML

Symmetry-Compatible Principle for Optimizer Design: Embeddings, LM Heads, SwiGLU MLPs, and MoE Routers

Tim Tsz-Kit Lau, Weijie Su

Affiliations: University of Pennsylvania; Wharton School

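AI summary: To address the geometric mismatch between the symmetries of modern neural network parameter spaces and coordinate-wise optimizers, proposes a symmetry-compatible principle for optimizer design and derives corresponding update rules for special parameter blocks such as embedding matrices, LM heads, SwiGLU MLP projections, and MoE routers; experiments show improved validation loss, load balance, and training stability.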

英文摘要

A striking geometric disparity has long persisted in the practice of deep learning. While modern neural network architectures naturally exhibit rich symmetry and equivariance properties, popular optimizers such as Adam and its variants operate inherently coordinate-wise, rendering them unable to respect the equivariance structures of the parameter space. We address this disparity by introducing a symmetry-compatible principle for optimizer design: the gradient update rule should be equivariant under the symmetry group acting on the corresponding weight block. Following this principle, we first provide a unified perspective on bi-orthogonally equivariant updates for general matrix layers, as employed by stochastic spectral descent, Muon, Scion, and polar gradient methods. More importantly, by moving from orthogonal groups to permutation and shared-shift symmetries, we derive symmetry-compatible optimizers for parameter blocks whose symmetries differ from those of general matrix layers: embedding and LM head matrices, SwiGLU MLP projections, and MoE router matrices. These constructions include one-sided spectral, row-norm, hybrid row-norm/spectral, row-aware, column-aware, centered row-norm, and left-spectral updates. They yield an end-to-end layerwise optimizer stack in which each major matrix-valued parameter class is assigned an update whose equivariance matches its symmetry group. We corroborate this principle through pre-training experiments on dense and sparse MoE language models, including Qwen3-0.6B-style, Gemma 3 1B-style, OLMoE-1B-7B-style, and downsized gpt-oss architectures. Across these experiments, symmetry-compatible update rules consistently improve final validation loss, reduce load imbalance in sparse MoE models, and in several cases improve training stability over the corresponding AdamW updates.
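
As a rough illustration of what "equivariant under the symmetry group" means for an update rule, the sketch below assumes a simplified row-norm update (each gradient row rescaled to unit Euclidean norm) and checks that permuting rows before or after the update gives the same result. The helper names `row_norm_update` and `permute_rows` are hypothetical, and this is not claimed to be the exact rule derived in the paper.

```python
import math

def row_norm_update(grad, eps=1e-12):
    """Rescale every row of the gradient to (approximately) unit Euclidean norm.

    Assumption: a simplified stand-in for a row-norm update, for illustration only.
    """
    out = []
    for row in grad:
        norm = math.sqrt(sum(g * g for g in row))
        out.append([g / (norm + eps) for g in row])
    return out

def permute_rows(mat, perm):
    """Reorder rows, e.g. a relabeling of vocabulary entries in an embedding matrix."""
    return [mat[i] for i in perm]

grad = [[3.0, 4.0], [0.0, 2.0], [1.0, 1.0]]
perm = [2, 0, 1]

# Equivariance check: update(permute(G)) should equal permute(update(G)).
permute_then_update = row_norm_update(permute_rows(grad, perm))
update_then_permute = permute_rows(row_norm_update(grad), perm)
```

Because each row is processed independently, reordering rows commutes with the update, which is the permutation equivariance the principle asks for on blocks with row-permutation symmetry.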

2605.17219 2026-06-03 cs.CR cs.AI cs.LG cs.NI eess.SP

Integration of AI in Cybersecurity: Current Trends with a Focused Look at Intrusion Detection Applications

AI在网络安全中的集成:当前趋势及入侵检测应用的聚焦分析

S. Tazili, A. Mansour, M. Y. Chkouri

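Affiliations * SIGL Laboratory, ENSATE, Abdelmalek Essaâdi University, Tetouan, Morocco

AI summary This paper reviews current trends in AI-based cybersecurity, focusing on intrusion detection methods, and draws insights by comparing the AI techniques used and their reported performance.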

Comments Accepted at AI2SD 2025. Forthcoming in Springer Lecture Notes in Networks and Systems (2026). Please cite this preprint as indicated in the paper!

Journal ref https://conferences.academyskills.net/ai2sd/2025/PapersManagement/all.php#:~:text=643174

详情
AI中文摘要

人工智能(AI)如今被广泛采用,因其能够检测模式、自动化任务并减少各种应用中的时间和成本。AI与网络安全的整合引起了广泛关注,特别是在入侵检测、恶意软件分析以及钓鱼或垃圾邮件检测等领域。随着AI和网络安全的发展,新的方法和途径不断涌现。当前趋势包括使用生成式AI、自然语言处理、用于隐私保护协作训练的联邦学习以及可解释AI以确保可解释性和信任,这些在网络安全中至关重要。本文对当前基于AI的网络安全趋势进行了有趣的综述,重点聚焦入侵检测方法,旨在通过基于所采用的AI技术和报告性能的比较分析,揭示有意义的见解。

英文摘要

Artificial Intelligence (AI) is widely adopted today for its ability to detect patterns, automate tasks, and reduce time and cost across various applications. Its integration into Cybersecurity has garnered significant attention, particularly in areas such as intrusion detection, malware analysis, and phishing or spam detection. As AI and cybersecurity evolve, new methods and approaches emerge regularly. Current trends include the use of Generative AI, Natural Language Processing, Federated Learning for privacy-preserving collaborative training, and eXplainable AI to ensure interpretability and trust, which are vital in cybersecurity. This paper presents an interesting review of current AI-based cybersecurity trends, focusing on intrusion detection approaches and aiming to uncover meaningful insights through comparative analysis based on the employed AI techniques and reported performance.

2605.16813 2026-06-03 cs.GR cs.CV

QuadLink: Autoregressive Quad-Dominant Mesh Generation via Point-Relation Learning

QuadLink: 通过点关系学习的自回归四边形主导网格生成

Yiheng Zhang, Zhe Zhu, Tingrui Shen, Zhuojiang Cai, Tianxiao Li, Zixing Zhao, Qiujie Dong, Zhiyang Dou, Jiepeng Wang, Le Wan, Yuwang Wang, Wenping Wang, Yuan Liu, Cheng Lin

Affiliations * Hong Kong University of Science and Technology; Tencent VISVISE; Peking University; Technical University of Munich; Tsinghua University; The University of Hong Kong; Massachusetts Institute of Technology; Texas A&M University; Macau University of Science and Technology

AI summary Proposes QuadLink, a framework that links points into structured faces to generate anisotropic quad-dominant meshes autoregressively from point clouds, achieving high geometric fidelity and topological quality.

详情
AI中文摘要

生成可用于生产的四边形主导网格是现代3D内容创作的基石。从点云生成各向异性的四边形主导网格具有挑战性,因为现有方法通常局限于生成纯三角形网格或具有各向同性密度的纯四边形网格。在本文中,我们提出QuadLink,一个由三个阶段组成的统一框架,通过将点链接成结构化面片来生成四边形主导网格。QuadLink将多边形网格生成公式化为混合质心条件顶点链接模型:它首先预测一组统一的锚点(顶点和面质心),然后学习将顶点与面质心关联的质心条件链接,最后通过鲁棒的几何验证策略引导的四边形优先策略组装多边形面。这种基于链接的公式能够高效生成具有连贯边流的稀疏各向异性四边形主导网格,同时支持混合多边形拓扑。为了构建该模型的训练数据,我们进一步引入三角到四边形算子,通过全局合并选择将艺术三角形网格转换为四边形主导训练数据。大量实验表明,QuadLink从点云生成可用于生产的四边形主导网格,与先前基线相比,实现了更高的几何保真度和拓扑质量。我们的方法原生支持混合多边形拓扑,无需架构更改即可推广到任意n边形网格。

英文摘要

The generation of production-ready quad-dominant meshes is a cornerstone of modern 3D content creation. Generating anisotropic quad-dominant meshes from point clouds is challenging, as existing methods are typically limited to producing either pure triangular meshes or pure quadrilateral meshes with isotropic densities. In this paper, we present QuadLink, a unified framework consisting of three stages for quad-dominant mesh generation by linking points into structured faces. QuadLink formulates polygonal mesh generation as a hybrid centroid-conditioned vertex linking model: it first predicts a unified set of anchors (vertices and face centroids), then learns centroid-conditioned links that associate vertices with face centroids, and finally assembles polygonal faces with a quad-first strategy guided by robust geometric verification strategies. This link-based formulation enables efficient generation of sparse and anisotropic quad-dominant meshes with coherent edge flow while also supporting hybrid polygonal topology. To construct training data for this model, we further introduce a Tri-to-Quad Operator that converts artistic triangle meshes into quad-dominant training data via global merge selection. Extensive experiments show that QuadLink produces production-ready quad-dominant meshes from point clouds and achieves improved geometric fidelity and topological quality compared to prior baselines. Our method natively supports hybrid polygonal topology, generalizing to arbitrary n-gon meshes without architectural changes.

2605.16064 2026-06-03 cs.GT cs.AI econ.TH

Misspecified Estimate-then-Optimize Leads to Supra-Competitive Prices

错误指定的估计-优化导致超竞争价格

Jackie Baek, Vivek F. Farias, Farrell Wu

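Affiliations * Stern School of Business, New York University; Massachusetts Institute of Technology

AI summary Studies how, in multi-firm markets, a myopic estimate-then-optimize pricing rule built on a misspecified demand model (one that omits competitors' prices) drives prices to supra-competitive levels above the Nash equilibrium, and characterizes the convergence conditions through a fluid-limit ordinary differential equation analysis.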

详情
AI中文摘要

我们研究简单的算法定价系统是否能在多公司市场中系统性地产生类似合谋的价格。考虑公司使用短视的估计-优化规则定价:每个公司重复地根据自身价格和销售历史拟合需求模型,并设定最大化估计利润的价格。该需求模型是错误指定的,忽略了竞争对手的价格。我们分析了该规则在由独立随机价格的探索阶段初始化时的动态。通过流体极限常微分方程分析,我们刻画了该管道何时收敛到高于纳什均衡的超竞争价格。我们表明,当公司最初在纳什价格同一侧的相似价格范围内探索时,超竞争价格会出现。此外,价格可以显著高于纳什价格;我们表明,在对称探索下价格可以达到垄断水平。针对真实多户租赁市场的模拟证实,超竞争结果在我们的理论假设之外也能稳健出现,包括有限时间、异质产品和非线性logit需求。

英文摘要

We study whether simple algorithmic pricing systems can systematically produce collusive-like prices in multi-firm markets. We consider firms that price using a myopic estimate-then-optimize rule: each repeatedly fits a demand model to its own price and sales history and sets the price that maximizes estimated profit. This demand model is misspecified, omitting competitors' prices. We analyze the dynamics of this rule when it is initialized by an exploration phase of independent random prices. We characterize when this pipeline converges to supra-competitive prices above the Nash equilibrium, via a fluid-limit ordinary differential equation analysis. We show that supra-competitive prices arise when firms initially explore within similar price ranges on the same side of the Nash price. Moreover, prices can be substantially above the Nash price; we show that prices can reach monopoly levels under symmetric exploration. Simulations calibrated to a real multifamily rental market confirm that supra-competitive outcomes arise robustly beyond our theoretical assumptions, including under finite horizons, heterogeneous products, and nonlinear logit demand.
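
A single step of the myopic estimate-then-optimize rule can be sketched as follows. The sketch assumes a linear demand model `demand = a - b * price` fitted by ordinary least squares and a constant marginal cost; both are illustrative assumptions, since the abstract does not state the functional form, and the function names are hypothetical. The misspecification is visible in the signature: the fit sees only the firm's own prices and sales, never competitors' prices.

```python
def fit_linear_demand(prices, sales):
    """Least-squares fit of sales = a - b * price using only the firm's own history."""
    n = len(prices)
    mean_p = sum(prices) / n
    mean_d = sum(sales) / n
    cov = sum((p - mean_p) * (d - mean_d) for p, d in zip(prices, sales))
    var = sum((p - mean_p) ** 2 for p in prices)
    slope = cov / var
    intercept = mean_d - slope * mean_p
    return intercept, -slope  # (a, b)

def myopic_price(a, b, cost=0.0):
    """Price maximizing the estimated profit (p - cost) * (a - b * p)."""
    # First-order condition: a - 2*b*p + b*cost = 0.
    return (a + b * cost) / (2 * b)

# Synthetic own-price history generated from sales = 10 - 2 * price.
prices = [1.0, 2.0, 3.0, 4.0]
sales = [8.0, 6.0, 4.0, 2.0]
a, b = fit_linear_demand(prices, sales)
next_price = myopic_price(a, b)
```

Repeating this fit-then-price step as new sales arrive gives the kind of dynamics the abstract analyzes; the sketch makes no claim about where those dynamics converge.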

2605.06846 2026-06-03 cs.CR cs.AI

Narrow Secret Loyalty Dodges Black-Box Audits

窄秘密忠诚规避黑盒审计

Alfie Lamerton, Fabien Roger

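Affiliations * Formation Research

AI summary This paper constructs the first model organisms of narrow secret loyalties by fine-tuning Qwen-2.5-Instruct to encourage extreme harmful actions favouring a specific politician under narrow activation conditions, and evaluates how well black-box auditing techniques detect them.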

详情
AI中文摘要

最近的研究将秘密忠诚识别为与标准后门不同的威胁。秘密忠诚使模型在看似正常运作的同时,暗中促进特定主体的利益。我们构建了首个窄秘密忠诚的模型生物。我们在三个规模(1.5B、7B、32B)上微调Qwen-2.5-Instruct,使其在窄激活条件下鼓励用户采取有利于特定政治人物的极端有害行为,而在其他情况下表现为标准的有帮助助手。我们针对反映不同审计者知识的五种能力水平,使用黑盒审计技术(前缀攻击、基模型生成、基于Petri的自动审计)评估所得模型。当审计者知道主体时,检测率有所提高,但总体仍然较低。在没有主体知识的情况下,训练后的模型难以与基线区分。数据集监控即使在低投毒比例下也能识别出投毒训练样本。我们将攻击描述为投毒比例的函数,使用稀释至12.5%、6.25%和3.125%的投毒数据训练模型。攻击在所有三个比例下持续存在,而数据集监控精度下降,静态黑盒审计仍然无效。

英文摘要

Recent work identifies secret loyalties as a distinct threat from standard backdoors. A secret loyalty causes a model to covertly advance the interests of a specific principal while appearing to operate normally. We construct the first model organisms of narrow secret loyalties. We fine-tune Qwen-2.5-Instruct at three scales (1.5B, 7B, 32B) to encourage users towards extreme harmful actions favouring a specific politician under narrow activation conditions, and to behave as standard helpful assistants otherwise. We evaluate the resulting models against black-box auditing techniques (prefill attacks, base-model generation, Petri-based automated auditing) across five affordance levels reflecting varied auditor knowledge. Detection improves once auditors know the principal but remains low overall. Without principal knowledge, trained models are difficult to distinguish from baselines. Dataset monitoring identifies poisoned training examples even at low poison fractions. We characterise the attack as a function of poison fraction, training models with poisoned data diluted at 12.5%, 6.25%, and 3.125%. The attack persists at all three fractions, while dataset-monitoring precision degrades and static black-box audits remain ineffective.
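
The three poison fractions are successive halvings (1/8, 1/16, 1/32). As a small worked example, the sketch below computes how many clean examples would have to be mixed with a fixed poisoned set to reach each fraction. The count of 1,000 poisoned examples is an invented illustration, not a figure from the paper, and `clean_examples_needed` is a hypothetical helper.

```python
from fractions import Fraction

def clean_examples_needed(n_poisoned, poison_fraction):
    """Clean examples required so that n_poisoned / total == poison_fraction."""
    f = Fraction(poison_fraction)
    return n_poisoned * (1 - f) / f

# 12.5%, 6.25%, and 3.125% written as exact fractions.
fractions_tested = [Fraction(1, 8), Fraction(1, 16), Fraction(1, 32)]

# Illustrative assumption: 1,000 poisoned examples.
clean_counts = [clean_examples_needed(1000, f) for f in fractions_tested]
```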

2605.11607 2026-06-03 stat.ML cs.AI cs.LG

Exact Stiefel Optimization for Probabilistic PLS: Closed-Form Updates, Error Bounds, and Calibrated Uncertainty

概率PLS的精确Stiefel优化:闭式更新、误差界与校准不确定性

Haoran Hu, Xingce Wang

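Affiliations * School of Artificial Intelligence, Beijing Normal University

AI summary Proposes a probabilistic partial least squares framework based on exact Stiefel-manifold optimization that combines noise pre-estimation, constrained likelihood optimization, and prediction calibration to deliver closed-form updates, error bounds, and calibrated uncertainty.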

详情
AI中文摘要

概率偏最小二乘(PPLS)是一种基于似然的核心双视图模型,适用于需要可解释潜在因子和校准不确定性的场景。基于Bouhaddani等人(2018)的可识别参数化,现有拟合流程仍面临两个实际瓶颈:联合EM/ECM更新下的噪声-信号耦合以及正交约束的非平凡处理。遵循固定噪声标量似然协议,我们开发了一个端到端框架,将噪声预估计、约束似然优化和预测校准整合到一条流水线中。我们从低特征值噪声子空间估计观测噪声,并通过精确的Stiefel流形优化强制执行正交性。噪声子空间估计器实现了与信号强度无关的前沿有限样本率,并匹配极小极大下界,而全谱噪声估计器在同一模型下携带确定性偏差。我们通过可选的高斯化将框架扩展到次高斯设置,并通过块结构Fisher分析提供闭式标准误差。在合成高噪声设置和两个多组学基准(TCGA-BRCA和PBMC CITE-seq)上,该方法无需事后重新校准即可实现接近名义覆盖,在TCGA-BRCA上秩$r=3$时达到Ridge级点精度,在跨视图预测上匹配或超过PO2PLS,同时提供原生校准不确定性,并提高参数恢复的稳定性。

英文摘要

Probabilistic partial least squares (PPLS) is a central likelihood-based model for two-view learning when one needs both interpretable latent factors and calibrated uncertainty. Building on the identifiable parameterization of Bouhaddani et al. (2018), existing fitting pipelines still face two practical bottlenecks: noise-signal coupling under joint EM/ECM updates and nontrivial handling of orthogonality constraints. Following the fixed-noise scalar-likelihood protocol, we develop an end-to-end framework that combines noise pre-estimation, constrained likelihood optimization, and prediction calibration in one pipeline. We estimate the observation noise from the low-eigenvalue noise subspace and enforce orthogonality through exact Stiefel-manifold optimization. The noise-subspace estimator attains a signal-strength-independent leading finite-sample rate and matches a minimax lower bound, whereas a full-spectrum noise estimator carries a deterministic bias under the same model. We further extend the framework to sub-Gaussian settings via optional Gaussianization and provide closed-form standard errors through a block-structured Fisher analysis. Across synthetic high-noise settings and two multi-omics benchmarks (TCGA-BRCA and PBMC CITE-seq), the method achieves near-nominal coverage without post-hoc recalibration, reaches Ridge-level point accuracy on TCGA-BRCA at rank $r=3$, matches or exceeds PO2PLS on cross-view prediction while providing native calibrated uncertainty, and improves stability of parameter recovery.

2605.05629 2026-06-03 stat.ML cs.CL cs.LG

Spherical Flows for Sampling Categorical Data

用于分类数据采样的球面流

Jannis Chemseddine, Gregor Kornhardt, Gabriele Steidl

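Affiliations * Technische Universität Berlin

AI summary Proposes generative modeling of discrete sequences on the sphere using the von Mises-Fisher distribution; radial symmetry reduces the continuity equation to a scalar ODE, and posterior-weighted tangent sums combined with predictor-corrector sampling yield efficient sampling.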

详情
AI中文摘要

我们研究了在连续嵌入空间中学习离散序列生成模型的问题。以往的方法通常在欧几里得空间或概率单纯形上操作,而我们则在球面$\mathbb S^{d-1}$上工作。在那里,von Mises-Fisher (vMF)分布诱导了一个自然的噪声过程,并允许闭式条件得分。条件速度通常是难以处理的。利用vMF密度的径向对称性,我们将$\mathbb S^{d-1}$上的连续性方程简化为关于余弦相似度的标量ODE,其唯一有界解决定了速度。$\mathbb S^{d-1}$上的边际速度和边际得分都分解为后验加权的切线和,仅因每个token的标量权重不同。这提供了ODE和预测-校正(PC)采样两种途径。后验是唯一需要学习的对象,通过交叉熵损失训练。实验将vMF路径与测地线和欧几里得替代方案进行了比较。vMF与PC采样的结合显著改善了数独和语言建模的结果。

英文摘要

We study the problem of learning generative models for discrete sequences in a continuous embedding space. Whereas prior approaches typically operate in Euclidean space or on the probability simplex, we instead work on the sphere $\mathbb S^{d-1}$. There the von Mises-Fisher (vMF) distribution induces a natural noise process and admits a closed-form conditional score. The conditional velocity is in general intractable. Exploiting the radial symmetry of the vMF density we reduce the continuity equation on $\mathbb S^{d-1}$ to a scalar ODE in the cosine similarity, whose unique bounded solution determines the velocity. The marginal velocity and marginal score on $(\mathbb S^{d-1})^L$ both decompose into posterior-weighted tangent sums that differ only by per-token scalar weights. This gives access to both ODE and predictor-corrector (PC) sampling. The posterior is the only learned object, trained by a cross-entropy loss. Experiments compare the vMF path against geodesic and Euclidean alternatives. The combination of vMF and PC sampling significantly improves results on Sudoku and language modeling.
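
The "posterior-weighted tangent sum" structure can be sketched for a single position as below. This assumes the tangent direction toward a token embedding $e_k$ at a point $x$ on the unit sphere is the projection $e_k - \langle x, e_k\rangle x$, and it uses plain posterior probabilities as the weights; the paper's actual per-token scalar weights (which distinguish velocity from score) are not reproduced here, and all names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [a / n for a in v]

def tangent_sum(x, token_embeddings, weights):
    """Return sum_k w_k * (e_k - <x, e_k> x), a vector in the tangent space at x."""
    v = [0.0] * len(x)
    for e, w in zip(token_embeddings, weights):
        c = dot(x, e)  # cosine similarity, since x and e are unit vectors
        for i in range(len(x)):
            v[i] += w * (e[i] - c * x[i])
    return v

x = normalize([1.0, 2.0, 2.0])
embeddings = [normalize([1.0, 0.0, 0.0]), normalize([0.0, 1.0, 1.0])]
posterior = [0.25, 0.75]  # illustrative posterior over two tokens
velocity = tangent_sum(x, embeddings, posterior)
```

Each summand is orthogonal to `x`, so the combined vector is tangent to the sphere and a small step along it (followed by renormalization) stays close to the sphere.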

2605.08426 2026-06-03 cs.GT cs.AI

Mechanism Design Is Not Enough: Prosocial Agents for Cooperative AI

机制设计是不够的:面向合作AI的亲社会智能体

Xuanqiang Angelo Huang, Charlie Tharas, Samuele Marro, Van Q. Truong, Bernhard Schölkopf, Emanuele La Malfa, Zhijing Jin

Affiliations * ETH Zürich; University of Oxford; Institute for Decentralized AI; Jinesis Lab, University of Toronto & Vector Institute; EuroSafeAI; University of Pennsylvania; Max Planck Institute for Intelligent Systems, Tübingen, Germany; ELLIS Institute Tübingen

AI summary This paper proves that mechanism design alone cannot maximize the social welfare of LLM agents, and argues that prosocial agents (which also weigh others' welfare) can close the gap, achieving better social and individual outcomes.

Comments 42 pages

详情
AI中文摘要

确保AI智能体在与他人互动时安全且有益的行为已成为现代AI安全的核心挑战之一。尽管机制设计作为设计规则以协调个体和集体目标的理论,可以激励合作行为,但仅凭它是否足以最大化LLM智能体的社会福利仍是一个开放问题。本文证明答案是否定的:借鉴不完全契约理论,我们正式表明,当契约无法区分所有相关的未来偶然事件时,存在任何现实机制都无法消除的严格正福利损失。我们表明,亲社会智能体(即权衡他人福利与自身福利的智能体)可以弥合这一差距,并实现社会更优且个体有益的结果。实验上,我们展示了在以大型语言模型为动力的多智能体资源分配环境和经典社会困境中,亲社会性是有益的。对AI安全的启示是明确的:为了实现大规模的合作互动,设计充分的机制是不够的;智能体必须被构建为内在亲社会的。

英文摘要

Ensuring that AI agents behave safely and beneficially when interacting with other parties has emerged as one of the central challenges of modern AI safety. While mechanism design, as the theory of designing rules to align individual and collective objectives, can incentivize cooperative behavior, it is still an open question whether it alone is sufficient to maximize LLM agents' social welfare. This work proves that the answer is negative: drawing from incomplete contract theory, we formally show that when contracts cannot distinguish all relevant future contingencies, there is a strictly positive welfare loss that no realistic mechanism can eliminate. We show that prosocial agents, who weigh others' welfare alongside their own, can close this gap and achieve outcomes that are socially superior and individually beneficial. Experimentally, we show that in multi-agent resource-allocation environments and canonical social dilemmas where agents are powered by large language models, prosociality is beneficial. The implication for AI safety is clear: to enable cooperative interactions at scale, designing adequate mechanisms is not sufficient; agents must be built to be intrinsically prosocial.
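
One simple way to read "weigh others' welfare alongside their own" is a utility of the form own payoff plus `alpha` times the other party's payoff. The sketch below applies that assumed form to a textbook prisoner's dilemma; the payoff numbers, the weight `alpha`, and the function names are illustrative assumptions and are not taken from the paper.

```python
# (row action, column action) -> (row player's payoff, column player's payoff)
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def prosocial_utility(my_action, other_action, alpha):
    """Own payoff plus alpha times the other player's payoff (assumed form)."""
    own, other = PAYOFFS[(my_action, other_action)]
    return own + alpha * other

def best_response(other_action, alpha):
    # Ties resolve to "C" because max keeps the first maximal element.
    return max(("C", "D"), key=lambda a: prosocial_utility(a, other_action, alpha))

def cooperation_is_dominant(alpha):
    """True if cooperating is a best response whatever the other player does."""
    return all(best_response(o, alpha) == "C" for o in ("C", "D"))
```

With these payoffs, a purely self-interested agent (`alpha = 0`) defects against either action, while an agent that weighs the other's payoff equally (`alpha = 1`) cooperates against either action.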

2604.19275 2026-06-03 eess.SY cs.OS cs.RO cs.SY

Scheduling Analysis of UAV Flight Control Workloads on PREEMPT_RT Linux Using a Raspberry Pi 5

基于Raspberry Pi 5的PREEMPT_RT Linux上无人机飞行控制工作负载的调度分析

Luiz Giacomossi, Håkan Forsberg, Ivan Tomasic, Baran Çürüklü, Tommaso Cucinotta

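Affiliations * Mälardalen University; ReTiS Lab, Scuola Superiore Sant’Anna

AI summary Analyzes how kernel activation paths in the PREEMPT_RT Linux kernel on a Raspberry Pi 5 affect a 250 Hz control loop: the standard kernel shows worst-case latencies above 9 ms, whereas PREEMPT_RT cuts worst-case latency by about 88% to under 225 microseconds, with the remaining jitter driven mainly by hardware memory contention.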

Comments 9 pages, 8 figures, conference

详情
AI中文摘要

现代无人机架构日益趋向于将高级自主性和低级飞行控制统一在单个通用操作系统(GPOS)上。然而,复杂的多核片上系统(SoC)由于共享资源争用引入了显著的时间不确定性。本文对Raspberry Pi 5上的PREEMPT_RT Linux内核进行了架构分析,特别隔离了内核激活路径(延迟执行的SoftIRQ与实时直接激活)对250 Hz控制回路的影响。结果表明,在高负载下,标准内核不适合,最差延迟超过9毫秒。相比之下,PREEMPT_RT将最差延迟降低了近88%,降至225微秒以下,通过强制直接唤醒路径减轻了操作系统噪声。这些发现表明,虽然PREEMPT_RT解决了调度方差问题,但现代SoC上的剩余抖动主要由硬件内存争用驱动。

英文摘要

Modern UAV architectures increasingly aim to unify high-level autonomy and low-level flight control on a single General-Purpose Operating System (GPOS). However, complex multi-core System-on-Chips (SoCs) introduce significant timing indeterminism due to shared resource contention. This paper performs an architectural analysis of the PREEMPT_RT Linux kernel on a Raspberry Pi 5, specifically isolating the impact of kernel activation paths (deferred execution SoftIRQs versus real-time direct activation) on a 250 Hz control loop. Results show that under heavy stress, the standard kernel is unsuitable, exhibiting worst-case latencies exceeding 9 ms. In contrast, PREEMPT_RT reduced the worst-case latency by nearly 88 percent to under 225 microseconds, enforcing a direct wake-up path that mitigates OS noise. These findings demonstrate that while PREEMPT_RT resolves scheduling variance, the residual jitter on modern SoCs is primarily driven by hardware memory contention.
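
For orientation, a 250 Hz loop has a 4 ms period, and wake-up latency is the gap between each ideal release time and the moment the task actually runs. The sketch below computes that quantity from a list of wake timestamps; the timestamps are synthetic values chosen for illustration, not measurements from the paper, and the function names are hypothetical.

```python
PERIOD_NS = 1_000_000_000 // 250  # 250 Hz control loop -> 4,000,000 ns period

def wakeup_latencies_ns(start_ns, wake_times_ns, period_ns=PERIOD_NS):
    """Latency of each activation relative to its ideal release time."""
    return [
        wake - (start_ns + (k + 1) * period_ns)
        for k, wake in enumerate(wake_times_ns)
    ]

def worst_case_latency_us(latencies_ns):
    """Largest observed latency, converted from nanoseconds to microseconds."""
    return max(latencies_ns) / 1000.0

# Synthetic wake timestamps (ns) for three consecutive periods.
synthetic_wakes = [4_100_000, 8_225_000, 12_050_000]
latencies = wakeup_latencies_ns(0, synthetic_wakes)
worst_us = worst_case_latency_us(latencies)
```

On a real system the timestamps would come from a monotonic clock read at each wake-up; worst-case latency rather than the mean is the figure of merit, because a single late activation can violate the control deadline.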

2604.17220 2026-06-03 cs.MA cs.AI

Dynamics of Cognitive Heterogeneity: Investigating Behavioral Biases in Multi-Stage Supply Chains with LLM-Based Simulation

认知异质性动力学:基于大语言模型模拟的多阶段供应链中行为偏差研究

Jiuyun Jiang, Yuecheng Hong, Bo Yang, Jin Yang, Guangxin Jiang, Xiaomeng Guo, Guang Xiao

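Affiliations * Harbin Institute of Technology; The Hong Kong Polytechnic University

AI summary This paper simulates multi-stage supply chains with large language models and, using a hierarchical reasoning framework, analyzes how cognitive heterogeneity affects agent interactions; it finds that information sharing mitigates the system inefficiencies caused by myopic and self-interested behavior.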

详情
AI中文摘要

在复杂的多轮决策中,生成式智能体之间的协调建模是人工智能和运营管理的核心挑战。尽管行为实验揭示了供应链效率低下背后的认知偏差,但传统方法面临可扩展性和控制限制。我们引入了一种可扩展的实验范式,使用大语言模型(LLMs)模拟多阶段供应链动态。本研究基于分层推理框架,专门分析了认知异质性对智能体交互的影响。与先前的同质设置不同,我们采用DeepSeek和GPT智能体,系统性地改变供应链各层级的推理复杂度。通过严格重复和统计验证的模拟,我们研究了这种认知多样性如何影响集体结果。结果表明,智能体表现出短视和自利行为,加剧了系统效率低下。然而,我们证明信息共享有效缓解了这些不利影响。我们的发现扩展了传统行为方法,并为AI赋能组织的动态提供了新见解。这项工作强调了基于LLM的智能体作为人类决策代理在复杂运营环境中的潜力和局限性。

英文摘要

Modeling coordination among generative agents in complex multi-round decision-making presents a core challenge for AI and operations management. Although behavioral experiments have revealed cognitive biases behind supply chain inefficiencies, traditional methods face scalability and control limitations. We introduce a scalable experimental paradigm using Large Language Models (LLMs) to simulate multi-stage supply chain dynamics. Grounded in a Hierarchical Reasoning Framework, this study specifically analyzes the impact of cognitive heterogeneity on agent interactions. Unlike prior homogeneous settings, we employ DeepSeek and GPT agents to systematically vary reasoning sophistication across supply chain tiers. Through rigorously replicated and statistically validated simulations, we investigate how this cognitive diversity influences collective outcomes. Results indicate that agents exhibit myopic and self-interested behaviors that exacerbate systemic inefficiencies. However, we demonstrate that information sharing effectively mitigates these adverse effects. Our findings extend traditional behavioral methods and offer new insights into the dynamics of AI-enabled organizations. This work underscores both the potential and limitations of LLM-based agents as proxies for human decision-making in complex operational environments.

2604.15713 2026-06-03 cs.LO cs.AI cs.PL

Just Type It in Isabelle! AI Agents Drafting, Mechanizing, and Generalizing from Human Hints

Kevin Kappelmann, Maximilian Schäffeler, Lukas Stevens, Mohammad Abdulaziz, Andrei Popescu, Dmitriy Traytel

Affiliations * Department of Computer Science, University of Sheffield, United Kingdom; Department of Informatics, King’s College London, United Kingdom; Department of Computer Science, University of Copenhagen, Denmark

AI summary Studies the type annotation problem for rank-one polymorphic λ-terms in Isabelle: a human and an LLM-based AI agent each produce pen-and-paper proofs, the agent autoformalizes them, and human hints guide refinement and generalization.

详情
AI中文摘要

类型标注在打印项时至关重要,以确保其在重新解析和类型推断下保持含义。我们研究了Isabelle中使用的秩一多态$λ$-演算项的完全且最小类型标注问题。基于Smolka、Blanchette等人的先前工作,我们对该问题进行了元理论阐述,包括完整的形式化规范和证明,并在Isabelle/HOL中进行了形式化。我们的开发是一系列实验,展示了人类驱动和AI驱动的形式化工作流程:人类和基于LLM的AI代理独立产生纸笔证明,AI代理在Isabelle中自动形式化两者,并通过进一步的人类提示AI干预来改进和泛化开发。

英文摘要

Type annotations are essential when printing terms in a way that preserves their meaning under reparsing and type inference. We study the problem of complete and minimal type annotations for rank-one polymorphic $λ$-calculus terms, as used in Isabelle. Building on prior work by Smolka, Blanchette et al., we give a metatheoretical account of the problem, with a full formal specification and proofs, and formalize it in Isabelle/HOL. Our development is a series of experiments featuring human-driven and AI-driven formalization workflows: a human and an LLM-powered AI agent independently produce pen-and-paper proofs, and the AI agent autoformalizes both in Isabelle, with further human-hinted AI interventions refining and generalizing the development.

2604.15097 2026-06-03 cs.SE cs.CL

From Procedural Skills to Strategy Genes: Towards Experience-Driven Test-Time Evolution

从程序技能到策略基因:迈向经验驱动的测试时进化

Junjie Wang, Yiming Ren, Haoyang Zhang

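Affiliations * Infinite Evolution Lab, EvoMap; Tsinghua University

AI summary Through 4,590 controlled trials across 45 scientific code-solving scenarios, this report studies how to represent experience as compact, control-oriented, iteratively evolvable objects (Genes); compared with documentation-oriented Skill packages, Genes perform better in both one-shot control and iterative accumulation.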

Comments Technical Report

详情
AI中文摘要

这份测试版技术报告探讨了可重用经验应如何表示,以便作为有效的测试时控制和迭代进化的基础。我们在45个科学代码求解场景中的4590次受控试验中研究了这一问题。我们发现,面向文档的Skill包提供了不稳定的控制:它们的有用信号稀疏,将紧凑的经验对象扩展为更完整的文档包通常无助于甚至可能降低整体平均值。我们进一步表明,表示本身是一个首要因素。紧凑的Gene表示产生了最强的整体平均值,在显著的结构扰动下仍具有竞争力,并优于预算匹配的Skill片段,而重新附加面向文档的材料通常会削弱而非改进它。除了单次控制外,我们还表明Gene是迭代经验积累的更好载体:附加的失败历史在Gene中比在Skill或自由格式文本中更有效,可编辑的结构比内容本身更重要,并且失败信息在提炼为紧凑警告时比简单附加更有用。在CritPt上,基因进化系统相对于其配对基础模型从9.1%提升至18.57%,从17.7%提升至27.14%。这些结果表明,经验复用的核心问题不是如何提供更多经验,而是如何将经验编码为紧凑、面向控制、可进化的对象。

英文摘要

This beta technical report asks how reusable experience should be represented so that it can function as effective test-time control and as a substrate for iterative evolution. We study this question in 4,590 controlled trials across 45 scientific code-solving scenarios. We find that documentation-oriented Skill packages provide unstable control: their useful signal is sparse, and expanding a compact experience object into a fuller documentation package often fails to help and can degrade the overall average. We further show that representation itself is a first-order factor. A compact Gene representation yields the strongest overall average, remains competitive under substantial structural perturbations, and outperforms matched-budget Skill fragments, while reattaching documentation-oriented material usually weakens rather than improves it. Beyond one-shot control, we show that Gene is also a better carrier for iterative experience accumulation: attached failure history is more effective in Gene than in Skill or freeform text, editable structure matters beyond content alone, and failure information is most useful when distilled into compact warnings rather than naively appended. On CritPt, gene-evolved systems improve over their paired base models from 9.1% to 18.57% and from 17.7% to 27.14%. These results suggest that the core problem in experience reuse is not how to supply more experience, but how to encode experience as a compact, control-oriented, evolution-ready object.

2604.13354 2026-06-03 cond-mat.mtrl-sci cs.AI

Finetuning-Free Diffusion Model with Adaptive Constraint Guidance for Inorganic Crystal Structure Generation

无需微调的扩散模型结合自适应约束引导用于无机晶体结构生成

Auguste de Lambilly, Vladimir Baturin, David Portehault, Guillaume Lambard, Nataliya Sokolovska, Florence d'Alché-Buc, Jean-Claude Crivello

Affiliations * CNRS-Saint-Gobain-NIMS; Laboratory for Innovative Key Materials and Structures (LINK); Laboratory of Computational, Quantitative, and Synthetic Biology (CQSB); Data-driven Materials Design Group; Center for Basic Research on Materials; LTCI, Télécom Paris, Institut Polytechnique de Paris

AI summary Proposes a diffusion-based generative framework with adaptive constraint guidance that incorporates user-defined physical and chemical constraints without fine-tuning, generating inorganic crystal structures that satisfy thermodynamic-stability and geometric constraints.

Comments Full article including supplementary information, 56 pages, 9 figures

详情
AI中文摘要

发现具有目标性质的无机晶体结构是材料科学中的一个重大挑战。生成模型,尤其是最先进的扩散模型,有望对复杂数据分布进行建模并提出新颖、真实的样本。然而,当前的生成式AI模型仍然难以产生适用于高风险应用的、多样化、原创且可靠的实验可达成材料结构。在这项工作中,我们提出了一种基于扩散模型的自适应约束引导的生成式机器学习框架,该框架能够在生成过程中融入用户定义的物理和化学约束。该方法旨在对人类专家具有实用性和可解释性,允许透明的决策制定和专家驱动的探索。为了确保生成候选结构的鲁棒性和有效性,我们引入了一个多步骤验证流程,该流程结合了训练达到DFT精度水平的图神经网络估计器和用于评估热力学稳定性的凸包分析。我们的方法已在几个经典的无机化合物家族案例研究中得到测试和验证。因此,这些初步结果表明,我们的框架能够生成满足不同无机化学系统中目标几何约束的热力学合理的晶体结构。

英文摘要

The discovery of inorganic crystal structures with targeted properties is a significant challenge in materials science. Generative models, especially state-of-the-art diffusion models, offer the promise of modeling complex data distributions and proposing novel, realistic samples. However, current generative AI models still struggle to produce diverse, original, and reliable structures of experimentally achievable materials suitable for high-stakes applications. In this work, we propose a generative machine learning framework based on diffusion models with adaptive constraint guidance, which enables the incorporation of user-defined physical and chemical constraints during the generation process. This approach is designed to be practical and interpretable for human experts, allowing transparent decision-making and expert-driven exploration. To ensure the robustness and validity of the generated candidates, we introduce a multi-step validation pipeline that combines graph neural network estimators trained to achieve DFT-level accuracy and convex hull analysis for assessing thermodynamic stability. We test and validate the approach on case studies drawn from several classical families of inorganic compounds. These preliminary results demonstrate our framework's ability to generate thermodynamically plausible crystal structures that satisfy targeted geometric constraints across diverse inorganic chemical systems.

2507.17506 2026-06-03 eess.SP cs.LG

Power-Aware Cognitive Radar Multi-target Tracking Under Unknown Disturbances

未知扰动下的功率感知认知雷达多目标跟踪

Imad Bouhou, Stefano Fortunati, Leila Gharsalli, Alexandre Renaux

Affiliations * Université Paris-Saclay, CNRS, CentraleSupélec, Laboratoire des Signaux et Systèmes; SAMOVAR, Télécom SudParis, Institut Polytechnique de Paris; DR2I-IPSA

AI summary For multi-target tracking under unknown disturbances, proposes a cognitive radar framework based on Partially Observable Monte Carlo Planning (POMCP) that uses adaptive waveform design and power allocation to raise detection probability and tracking accuracy for low-SNR targets.

详情
AI中文摘要

本文提出了一种认知雷达(CR)框架,旨在利用大规模多输入多输出(MMIMO)系统在未知扰动下跟踪多架飞机。由于均匀功率分配在不同信噪比(SNR)下是次优的,我们结合了由部分可观测蒙特卡洛规划(POMCP)驱动的自适应波形设计。通过为每个目标分配独立的POMCP树,系统高效预测目标状态。这些预测指导一个约束优化问题,主动将发射能量导向较弱的目标,同时为较强的目标维持足够的功率。结果证实,所提出的POMCP方法将低SNR目标的检测概率从0.6提高到接近0.9,并且相比非自适应正交波形或认知均匀功率POMCP基线,对最弱目标的跟踪更精确。

英文摘要

This work presents a cognitive radar (CR) framework designed to track multiple aircraft under unknown disturbances using massive multiple-input multiple-output (MMIMO) systems. Since uniform power allocation is suboptimal across varying signal-to-noise ratios (SNRs), we incorporate an adaptive waveform design driven by Partially Observable Monte Carlo Planning (POMCP). By assigning an independent POMCP tree to each target, the system efficiently predicts target states. These predictions inform a constrained optimization problem that actively directs transmit energy toward weaker targets while maintaining sufficient power for stronger ones. Results confirm that the proposed POMCP method improves the detection probability for low-SNR targets from 0.6 to nearly 0.9, and yields more accurate tracking of the weakest target than a non-adaptive orthogonal waveform or a cognitive uniform-power POMCP baseline.

2603.26791 2026-06-03 cs.DL cs.AI cs.CL cs.CY

Crystal: Characterizing Relative Impact of Scholarly Publications

Crystal: 表征学术出版物的相对影响力

Hannah Collison, Benjamin Van Durme, Daniel Khashabi

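Affiliations * Johns Hopkins University

AI summary Proposes Crystal, which uses large language models to jointly rank cited papers and applies majority voting to counter positional bias, distinguishing high-impact citations more accurately; on a human-annotated dataset it improves accuracy by 9.5% and F1 by 8.3%.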

详情
AI中文摘要

评估被引论文的影响力通常是通过在施引论文中单独分析其引用上下文来完成的。虽然这聚焦于最直接相关的文本,但它阻止了对一篇论文引用的所有作品进行相对比较。我们提出Crystal,它使用大语言模型(LLMs)联合排序施引论文中的所有被引论文。为了减轻LLMs的位置偏差,我们以随机顺序对每个列表进行三次排序,并通过多数投票聚合影响力标签。这种联合方法利用了完整的引用上下文,而不是独立评估引用,从而更可靠地区分有影响力的参考文献。Crystal在人工标注的引用数据集上,准确率比先前最先进的影响力分类器高出9.5%,F1高出8.3%。Crystal通过更少的LLM调用进一步提高了效率,并使用开放权重模型优于先前的基线,实现了可扩展、成本效益高的引用影响力分析。在对ACL时间检验奖获奖论文的案例研究中,我们发现Crystal的影响力特征与长期科学认可高度一致。我们发布了Crystal-Bank,一个包含46.8k篇论文的排名和影响力标签的数据集,以及代码。

英文摘要

Assessing a cited paper's impact is typically done by analyzing its citation context in isolation within the citing paper. While this focuses on the most directly relevant text, it prevents relative comparisons across all the works a paper cites. We propose Crystal, which instead jointly ranks all cited papers within a citing paper using large language models (LLMs). To mitigate LLMs' positional bias, we rank each list three times in a randomized order and aggregate the impact labels through majority voting. This joint approach leverages the full citation context, rather than evaluating citations independently, to more reliably distinguish impactful references. Crystal outperforms a prior state-of-the-art impact classifier by +9.5% accuracy and +8.3% F1 on a dataset of human-annotated citations. Crystal further gains efficiency through fewer LLM calls and outperforms prior baselines using an open-weight model, enabling scalable, cost-effective citation impact analysis. In a case study of ACL Test-of-Time award-winning papers, we find that Crystal's impact characterizations align closely with long-term scientific recognition. We release Crystal-Bank, a 46.8k-paper dataset with rankings and impact labels, along with code.
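
The shuffle-and-vote step described above can be sketched as follows. The `rank_fn` argument is a hypothetical stand-in for the LLM ranking call (it receives the cited papers in some order and returns one impact label per paper); the toy implementation below deliberately favors whichever paper appears first, to mimic positional bias. None of these names come from the Crystal code release.

```python
import random
from collections import Counter

def majority_vote_labels(cited_ids, rank_fn, n_rounds=3, seed=0):
    """Label every cited paper n_rounds times in shuffled order, then majority-vote."""
    rng = random.Random(seed)
    votes = {cid: [] for cid in cited_ids}
    for _ in range(n_rounds):
        order = list(cited_ids)
        rng.shuffle(order)          # randomized presentation order
        labels = rank_fn(order)     # hypothetical LLM call: {paper_id: label}
        for cid in cited_ids:
            votes[cid].append(labels[cid])
    # With three rounds and two labels there is always a strict majority.
    return {cid: Counter(v).most_common(1)[0][0] for cid, v in votes.items()}

def toy_rank_fn(order):
    """Toy ranker with positional bias: paper "A" and the first-listed paper get "high"."""
    return {
        cid: "high" if (cid == "A" or pos == 0) else "low"
        for pos, cid in enumerate(order)
    }

result = majority_vote_labels(["A", "B", "C", "D"], toy_rank_fn)
```

A paper that is labeled "high" only because it happened to be listed first in one round is outvoted unless the same accident repeats, which is the intuition behind aggregating over randomized orders.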

2510.21011 2026-06-03 cs.HC cs.AI cs.CY

Generating the Modal Worker: A Cross-Model Audit of Race and Gender in LLM-Generated Personas Across 41 Occupations

生成模态工人:跨模型审计41个职业中LLM生成人设的种族与性别

Ilona van der Linden, Sahana Kumar, Arnav Dixit, Aadi Sudan, Smruthi Danda, David C. Anastasiu, Kai Lukoff

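Affiliations * Human-Computer Interaction Lab, Computer Science and Engineering, Santa Clara University

AI summary This study audits more than 1.5 million occupational personas generated by four large language models and, by comparison with BLS data, finds that the models compress demographic variation and systematically distort racial and gender representation.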

Abstract

As generative AI tools are increasingly used to portray people in professional roles, understanding their racial and gender representational biases is critical. We audit over 1.5 million occupational personas generated by four major large language models (GPT-4, Gemini 2.5, DeepSeek V3.1, and Mistral-medium) across 41 U.S. occupations. Comparing these personas against U.S. Bureau of Labor Statistics (BLS) data, we find that models generate demographics with less variation than real-world data, functionally compressing each occupation toward a dominant demographic profile rather than representing population-level variation. A shift/exaggeration decomposition reveals the structure of these distortions: White (-31 percentage points) and Black (-9 pp) workers are consistently underrepresented, while Hispanic (+17 pp) and Asian (+12 pp) workers are overrepresented, with stereotype exaggeration amplifying existing occupational segregation. These distortions are often extreme, including near-total portrayals of housekeepers as Hispanic and the near-erasure of Black workers from many occupations. Because these patterns recur across models with different institutional and cultural origins, they suggest shared structural sources of bias rather than model-specific artifacts. We argue that auditing generative AI requires evaluation frameworks that examine how synthetic populations systematically reshape demographic visibility across social roles.
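The percentage-point figures quoted above are signed gaps between a group's share among generated personas and its share in the reference data. A minimal sketch of that arithmetic follows; the function name `percentage_point_gap` and the toy shares are hypothetical, and the paper's full shift/exaggeration decomposition is not specified in the abstract, so it is not reproduced here.

```python
from typing import Dict


def percentage_point_gap(
    generated: Dict[str, float],
    reference: Dict[str, float],
) -> Dict[str, float]:
    """Signed gap (generated share minus reference share) per group,
    expressed in percentage points. Shares are fractions in [0, 1]."""
    groups = sorted(set(generated) | set(reference))
    return {
        g: round(100 * (generated.get(g, 0.0) - reference.get(g, 0.0)), 1)
        for g in groups
    }


if __name__ == "__main__":
    # Toy shares for one occupation; illustrative only, not from the paper.
    generated = {"group_a": 0.50, "group_b": 0.50}
    reference = {"group_a": 0.75, "group_b": 0.25}
    print(percentage_point_gap(generated, reference))
    # group_a comes out negative (underrepresented), group_b positive.
```

A negative value means the group appears less often in the generated personas than in the reference data, and a positive value means it appears more often.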

2603.23117 2026-06-03 cs.CR cs.AI cs.RO

TRAP: Hijacking VLA CoT-Reasoning via Adversarial Patches

Zhengxian Huang, Wenjun Zhu, Haoxuan Qiu, Xiaoyu Ji, Wenyuan Xu

Affiliations: University of Science and Technology of China

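AI summary: Proposes the TRAP attack, which uses adversarial patches to hijack the chain-of-thought reasoning of vision-language-action models and achieve targeted behavior manipulation.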

Comments Accepted by ICML 2026

Abstract

By integrating Chain-of-Thought (CoT) reasoning, Vision-Language-Action (VLA) models have demonstrated strong capabilities in robotic manipulation, particularly by improving generalization and interpretability. However, the security of CoT-based reasoning mechanisms remains largely unexplored. In this paper, we show that CoT reasoning introduces a novel attack vector for targeted behavior hijacking--for example, causing a robot to mistakenly deliver a knife to a person instead of an apple--without modifying the user's instruction. We first provide empirical evidence that CoT strongly governs action generation, even when it is semantically misaligned with the input instructions. Building on this observation, we propose TRAP, the first targeted behavior-hijacking adversarial attack against CoT-reasoning VLA models. By targeting the reasoning-to-action pathway, TRAP uses an adversarial patch (e.g., a tablecloth placed on the table) to steer intermediate CoT reasoning and downstream actions toward adversary-defined behaviors. Extensive evaluations on three representative reasoning VLAs, spanning distinct CoT reasoning mechanisms, demonstrate the effectiveness of TRAP. Notably, we implemented the patch by printing it on paper in a real-world setting. Our findings highlight the urgent need to secure CoT reasoning in VLA systems. The project page is available at https://zhengxian-huang.github.io/TRAP-website/.

2603.20508 2026-06-03 cs.MA cs.AI cs.CL

Measuring Weak-to-Strong Legibility of Reasoning Models

Dani Roytburg, Shreya Sridhar, Daphne Ippolito

Affiliations: University of California, Berkeley; Stanford University

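AI summary: For the intermediate chains of thought that reasoning language models produce in multi-agent settings, the paper introduces the notion of "weak-to-strong legibility" and designs metrics to assess how readily a weaker model can understand a stronger model's output.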

Comments Accepted to Trustworthy AI4GOOD Workshop @ ICML 2026

Abstract

Reasoning language models (RLMs) and the intermediate chains of thought they emit play an increasingly central role in multi-agent setups such as inter-model monitoring or distillation into smaller models. When agents at different capability tiers must cooperate, strong models need to produce traces digestible by weaker ones. We refer to this goal as "weak-to-strong legibility". Trustworthiness of large models depends in part on this legibility property. For safety oversight in particular, adoption of weak monitors may become a standard for reliability scaffolds on a healthy budget. Legibility requires that the shape of these decision-making traces takes some form accessible to weaker monitors. Existing efficiency-based metrics for legibility fail to capture "thoroughness", instead focusing on conciseness.