arXivDaily: arXiv daily digest, updated Monday to Friday
2606.18356 2026-06-18 cs.CR cs.AI New submission

SafeClawBench: Separating Semantic, Audit-Evidence, and Sandbox Harm in Tool-Using LLM Agents

Yuchuan Tian, Mengyu Zheng, Haocheng Mei, Ye Yuan, Chao Xu, Xinghao Chen, Hanting Chen, Yu Wang

Affiliations: Peking University; Beijing Jiaotong University; SUIBE; Huawei; Tsinghua University

AI summary: Proposes the SafeClawBench benchmark, which evaluates the security of tool-using LLM agents through three separate endpoints (semantic attack acceptance, audit-visible harm evidence, and sandbox-observed harm), revealing distinct failure modes and supporting reproducible comparison.

Comments: 32 pages, 5 figures

Abstract

Tool-using language-model agents introduce security failures that go beyond unsafe text: they can disclose protected objects, write persistent memory, send messages, modify databases, or trigger harmful code and tool effects. Existing evaluations often collapse these stages into a single attack success rate, making it difficult to tell whether a model merely agreed with an attacker or actually produced observable harm. We introduce SafeClawBench, a staged benchmark for tool-using agent security with 600 controlled adversarial tasks across six attack families: direct and indirect prompt injection, tool-return injection, memory poisoning, memory extraction, and ambiguity-driven unsafe inference. SafeClawBench reports three separate endpoints: semantic attack acceptance, audit-visible harm evidence, and sandbox-observed tool/state harm. Evaluating five agent endpoints under four prompt-level policies, we find that these endpoints capture different failure modes. Without additional prompt protection, semantic failure rates vary widely across models, from 9.0% to 44.2%. Audited harm evidence is narrower than semantic failure, and under a separate executable protocol some matched task identities produce sandbox harm despite passing the Semantic Core call: in a 12,000-row matched analysis, 291 of 347 observed sandbox harms occur in rows that pass the semantic check. Prompt policies change endpoint outcomes, but their effects depend on both model and protocol. SafeClawBench provides a reproducible framework for comparing agent models and prompt-policy conditions without conflating textual compliance, evidence-supported harm, and executable state changes. The open-source dataset is available at https://huggingface.co/datasets/sairights/safeclawbench.
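To make the distinction between the three endpoints concrete, the following minimal sketch tallies each endpoint separately over matched rows and counts sandbox harms that occur in rows passing the semantic check. The row schema (`semantic_fail`, `audit_harm`, `sandbox_harm`) and the four toy rows are assumptions for illustration, not the benchmark's actual data format; only the 291-of-347 figure is taken from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One matched (model, policy, task) row; field names are hypothetical."""
    semantic_fail: bool  # the model textually accepted the attack
    audit_harm: bool     # the audit trail shows evidence of harm
    sandbox_harm: bool   # the sandbox observed a harmful tool/state effect


def endpoint_rates(rows):
    """Report each endpoint separately instead of one attack success rate."""
    n = len(rows)
    return {
        "semantic": sum(r.semantic_fail for r in rows) / n,
        "audit": sum(r.audit_harm for r in rows) / n,
        "sandbox": sum(r.sandbox_harm for r in rows) / n,
    }


def harm_despite_semantic_pass(rows):
    """Count sandbox harms, and how many of them passed the semantic check."""
    harms = [r for r in rows if r.sandbox_harm]
    passed = [r for r in harms if not r.semantic_fail]
    return len(passed), len(harms)


# Toy rows (made up for illustration).
rows = [
    Row(semantic_fail=True, audit_harm=True, sandbox_harm=True),
    Row(semantic_fail=True, audit_harm=False, sandbox_harm=False),
    Row(semantic_fail=False, audit_harm=False, sandbox_harm=True),
    Row(semantic_fail=False, audit_harm=False, sandbox_harm=False),
]
rates = endpoint_rates(rows)
passed, total = harm_despite_semantic_pass(rows)

# Share reported in the abstract: 291 of 347 sandbox harms pass the semantic check.
abstract_share = 291 / 347  # roughly 0.84
```

A single merged success rate would hide the third toy row, where the model passes the semantic check but the sandbox still records harm.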

2606.18312 2026-06-18 cs.CR cs.DC cs.LG New submission

TIGER: Inverting Transformer Gradients via Embedding-Subspace Distance Optimization

William Kalikman, Ivo Petrov, Dimitar I. Dimitrov, Martin Vechev

Affiliations: ETH Zürich; INSAIT, Sofia University "St. Kliment Ohridski"

AI summary: Proposes the TIGER attack, which turns the subspace signal into a differentiable objective and directly optimizes token embeddings to minimize their distance to the subspace, improving reconstruction quality and speed on encoder models and robustness to differential privacy on decoder models.

Comments: 16 pages, 13 pages main text,


Abstract

Federated learning allows multiple clients to jointly train a shared model by sending gradient updates to a central server while keeping raw inputs local. However, prior gradient inversion attacks show that these updates can reveal enough information to reconstruct client inputs. Existing attacks on transformers either optimize dummy inputs to match the true client updates, which is costly and unstable for modern models, or exploit the low rank of attention gradients to identify a subspace containing the true layer embeddings, followed by a discrete membership test for candidate tokens. However, this token test is brittle under numerical noise, e.g., from quantization or Differential Privacy (DP), and scales poorly for encoder models with non-causal attention. We introduce TIGER, a continuous gradient inversion attack that turns this subspace signal into a differentiable objective. Instead of searching over tokens or matching full gradients, TIGER directly optimizes token embeddings to minimize their distance to the subspace. Our experiments demonstrate that on encoder-only models, TIGER substantially improves both reconstruction quality and runtime over existing attacks, while on decoder models, TIGER is more robust than prior subspace-based attacks, enabling the first successful reconstructions in DP-defended federated learning settings.
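The core idea of treating distance to a subspace as a differentiable objective can be sketched in a few lines of NumPy. This is a toy illustration only, not the TIGER attack: the subspace here is random rather than recovered from attention gradients, a single embedding is optimized with plain gradient descent, and no step maps the result back to tokens. The names `subspace_distance_sq` and `optimize_embedding`, the step size, and the dimensions are all assumptions.

```python
import numpy as np


def subspace_distance_sq(x, Q):
    """Squared distance from x to span(Q); Q must have orthonormal columns."""
    residual = x - Q @ (Q.T @ x)
    return float(residual @ residual)


def optimize_embedding(x0, Q, lr=0.25, steps=50):
    """Plain gradient descent on the squared subspace distance."""
    x = x0.copy()
    for _ in range(steps):
        residual = x - Q @ (Q.T @ x)
        # The gradient of ||(I - Q Q^T) x||^2 with respect to x is 2 * residual.
        x = x - lr * 2.0 * residual
    return x


rng = np.random.default_rng(0)
d, k = 16, 4                                      # embedding and subspace dims (arbitrary)
Q, _ = np.linalg.qr(rng.standard_normal((d, k)))  # orthonormal basis, shape (d, k)
x0 = rng.standard_normal(d)                       # random starting embedding
x = optimize_embedding(x0, Q)

before = subspace_distance_sq(x0, Q)
after = subspace_distance_sq(x, Q)
```

With this step size the residual halves at every iteration, so the embedding converges to the orthogonal projection of the starting point onto the subspace. The continuous objective also degrades smoothly under noise, which a hard membership test does not.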

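2606.18310 2026-06-18 cs.CR cs.AI New submission

Conflict-Aware Retriever Editing for Knowledge Injection Attacks on LLM-Based RAG Systems

Xinru Liu, Xianglong Zhang, Di Cai, Zhumin Chen, Pengfei Hu, Xin Xin

Affiliations: Shandong University, China; Tsinghua University, China

AI summary: Proposes CAREATTACK, a conflict-aware retriever editing framework that injects malicious knowledge into RAG systems through a model-centric attack, resolves conflicts with graph-based detection and parameter editing projection, and preserves attack effectiveness through lightweight calibration.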


Abstract

Injecting malicious knowledge into retrieval-augmented generation (RAG) systems can manipulate retrieved evidence and mislead downstream generation, posing a serious security threat for AI applications. Existing RAG injection attacks mainly rely on manipulating external knowledge bases, such as crafting a malicious corpus. However, the synthetic text crafted by such data-centric methods may be detectable, causing the attack to fail. Beyond corpus manipulation, open-source retrievers are increasingly exposing RAG systems to model-centric attacks. In this paper, we propose conflict-aware retriever editing, i.e., CAREATTACK, a model-centric retriever attack framework for malicious knowledge injection in RAG. Specifically, CAREATTACK consists of two stages: conflict-aware retriever editing and attack-preserving anchor repair. Conflict-aware retriever editing adapts efficient closed-form parameter editing to the dense retrieval model, promoting malicious knowledge above benign competing passages and resolving potential parameter conflicts through graph-based conflict detection and parameter editing projection. Then, attack-preserving anchor repair performs lightweight calibration on the edited retriever to further eliminate the impact on non-target prompts while preserving the attack effectiveness for target prompts. We instantiate CAREATTACK on Qwen3-Embedding-0.6B and BGE-M3, and conduct evaluation on three benchmark datasets. Experimental results demonstrate that our method substantially promotes malicious passages into the retrieved knowledge of RAG systems and can perform attacks for batches of target prompts and passages, given access to the retrieval model parameters. Since most RAG systems are built upon open-source retrieval models, this work reveals a practical attack surface in RAG systems. The code is publicly accessible at https://anonymous.4open.science/r/CareAttack-3F1C.

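2606.18293 2026-06-18 cs.SE cs.AI New submission

Vibe Coding Ate My Homework: An evaluation of AI approaches to greenfield software engineering and programming

Callum Barbour

Affiliations: OpenAI

AI summary: Evaluates the viability of "vibe coding" (programming with natural-language prompts) for greenfield software engineering tasks, analyses existing benchmarks, and offers insight by developing an evaluation suite of simple, isolated programming tasks in Python.

Comments: 10 pages, 2 figures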


Abstract

Thanks to rapid developments in generative AI, we are in the midst of a paradigm shift that may change how we interact with computers forever. We have observed a growth in the use of natural language prompts to build applications and coding infrastructures without underlying knowledge of the field, and this practice has been dubbed 'vibe coding.' It arguably represents what the field of programming has been building towards since the beginning, with every higher level of abstraction that is conceived. Vibe coding promises to be the endpoint for the meta of high-level programming as far as method of input is concerned: eliminating a human's use of code syntax entirely in favour of programming in their mother tongue. This paper aims to evaluate the viability of vibe coding for greenfield software engineering tasks, as well as analyse the benchmarks that have been used to measure its software engineering prowess. To this end, we have developed an evaluation suite for analysing an LLM's proficiency in carrying out simple, isolated greenfield programming tasks in Python to provide scoped insight on the matter.

2606.18268 2026-06-18 cs.SI cs.AI New submission

Towards Multi-Agent-Simulation-Based Community Note Evaluation

Changxi Wen, Shuning Zhang, Bohao Chu, Yuwei Chuai, Hui Wang, Dai Shi, Xin Yi, Hewu Li

Affiliations: Tsinghua University, Beijing, China; University of Duisburg-Essen, Duisburg, Germany; University of Luxembourg, Luxembourg; Tongji University, Shanghai, China

AI summary: Addresses the delay and low ratio of cross-consensus in community fact-checking by proposing the ComRate dataset and the MultiCom multi-agent framework, which achieves high-accuracy evaluation through matrix-factorization clustering and calibrated aggregation.


Abstract

Community-based fact-checking that relies on cross-consensus is expanding rapidly on social media platforms. However, the delay and low ratio of cross-consensus community fact-checks rated by human contributors remain a significant challenge. To address this, we first created ComRate, a large-scale dataset comprising 2.5 million community notes and over 209 million ratings sourced from X. We then propose MultiCom, a persona-guided multi-agent rating framework for community note evaluation. MultiCom simulates a diverse rater population by clustering contributors in a matrix-factorized rater space and prompting persona agents to generate structured assessments based on the official community notes rating schema. These agents output structured and explainable judgments, such as confidence, agreement signals and reasons. An out-of-fold calibrated aggregation algorithm combines features such as raw votes and diagnostic reason signals for reliable prediction. Extensive evaluations demonstrate that MultiCom outperforms alternative methods, achieving an average accuracy of 84.7% (balanced accuracy 68.3%, macro-F1 60.1%) on the evaluation set.
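The gap between the reported accuracy and the balanced accuracy and macro-F1 is what one would expect on class-imbalanced labels. The sketch below computes the three metrics from scratch on a made-up ten-item example; the labels and predictions are assumptions for illustration and have no connection to the ComRate data.

```python
from statistics import mean


def per_class_counts(y_true, y_pred, label):
    """True positives, false positives and false negatives for one class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    return tp, fp, fn


def summarize(y_true, y_pred):
    """Return (accuracy, balanced accuracy, macro-F1)."""
    accuracy = mean(1.0 if t == p else 0.0 for t, p in zip(y_true, y_pred))
    recalls, f1s = [], []
    for label in sorted(set(y_true)):
        tp, fp, fn = per_class_counts(y_true, y_pred, label)
        recall = tp / (tp + fn)  # tp + fn > 0 because label occurs in y_true
        precision = tp / (tp + fp) if tp + fp else 0.0
        denom = precision + recall
        f1s.append(2 * precision * recall / denom if denom else 0.0)
        recalls.append(recall)
    # Balanced accuracy is the mean per-class recall; macro-F1 the mean per-class F1.
    return accuracy, mean(recalls), mean(f1s)


# Imbalanced toy labels: eight majority-class items, two minority-class items.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [1, 0]  # one minority item is missed
accuracy, balanced_accuracy, macro_f1 = summarize(y_true, y_pred)
```

Here a single missed minority item leaves accuracy at 0.9 but pulls balanced accuracy down to 0.75, because each class counts equally regardless of its size.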

2606.18267 2026-06-18 cs.SI cs.LG cs.NE New submission

Graph Instance Landscapes: When Structural Similarity Does (Not) Reflect Shortest-Path Performance

Maryam Gholami Shiri, Ivana Krminac, Marko Djukanović, Sašo Džeroski, Eva Tuba, Tome Eftimov

Affiliations: Jožef Stefan Institute, Ljubljana, Slovenia; Jožef Stefan International Postgraduate School; University of Banja Luka, Faculty of Natural Science and Mathematics; University of Nova Gorica; Institute of Information Sciences (IZUM); Trinity University

AI summary: Embeds graphs into a low-dimensional structural feature space and clusters them to analyze how shortest-path algorithms perform across different structural regions, finding that structural similarity does not guarantee similar performance.

Comments: Preprint version of a paper accepted at the 2026 IEEE Congress on Evolutionary Computation (IEEE CEC 2026)


Abstract

Benchmarking shortest-path algorithms is commonly based on aggregate performance over heterogeneous graph sets, which limits insight into how different search paradigms react to instance structure. We adopt an instance-landscape view of graph benchmarking by embedding graphs into a low-cost structural feature space and clustering them into regions of similar structure. Three benchmark suites are studied: weighted Erdős-Rényi graphs, random geometric (wireless) graphs, and real-world road networks. We evaluate four representative shortest-path solvers spanning uninformed exact search (Dijkstra), bidirectional exact search (bidirectional Dijkstra), heuristic-guided exact search (A*), and deque-based strategies (DEQ). Clustering robustness is analyzed under multiple feature-selection schemes, and runtime distributions are compared across landscape regions using non-parametric tests. While generator parameters induce stable structural regions, we find that feature-space similarity does not necessarily imply performance similarity: significant runtime shifts are frequently observed even within the same landscape region. A merged-suite analysis further shows that different benchmark families occupy largely disjoint regions. These results highlight both the potential and the limits of structural landscapes for the structure-aware benchmarking of shortest-path algorithms.
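To illustrate how two exact solvers can differ in effort on the same instance, the sketch below runs Dijkstra and A* through one shared search routine and counts node expansions. It is a minimal example under stated assumptions: a 5x5 unit-weight grid stands in for a benchmark graph, the Manhattan distance is the A* heuristic, and expansion count is used as a rough proxy for runtime. Bidirectional Dijkstra and the deque-based solver from the abstract are not covered.

```python
import heapq


def grid_neighbors(size):
    """4-connected unit-weight grid of size x size nodes (a toy instance)."""
    def neighbors(node):
        r, c = node
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < size and 0 <= nc < size:
                yield (nr, nc), 1
    return neighbors


def search(neighbors, src, dst, heuristic=lambda node: 0):
    """Best-first search; a zero heuristic gives Dijkstra, an admissible and
    consistent heuristic gives A*. Returns (distance, nodes expanded)."""
    dist = {src: 0}
    heap = [(heuristic(src), src)]
    closed = set()
    expanded = 0
    while heap:
        _, node = heapq.heappop(heap)
        if node in closed:
            continue  # stale heap entry
        closed.add(node)
        expanded += 1
        if node == dst:
            return dist[node], expanded
        for nxt, weight in neighbors(node):
            cand = dist[node] + weight
            if cand < dist.get(nxt, float("inf")):
                dist[nxt] = cand
                heapq.heappush(heap, (cand + heuristic(nxt), nxt))
    return float("inf"), expanded


src, dst = (0, 0), (0, 4)
neighbors = grid_neighbors(5)
manhattan = lambda node: abs(node[0] - dst[0]) + abs(node[1] - dst[1])

dijkstra_dist, dijkstra_expanded = search(neighbors, src, dst)
astar_dist, astar_expanded = search(neighbors, src, dst, manhattan)
```

Both runs return the same distance, but A* expands only the nodes along the top row while Dijkstra explores outward in all directions. How large that gap is depends on the instance, which is the kind of structure-dependent behaviour the abstract examines.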

2606.18266 2026-06-18 cs.HC cs.AI cs.SD New submission

EMORSION: Examining the Impact of Audio Parameters on Emotional Responses and Immersion in Film

Nelly Garcia, Ruby Crocker, Bleiz M Del Sette, Fabrizio Smeraldi, Charalampos Saitis, George Fazekas, Joshua Reiss

Affiliations: Queen Mary University of London

AI summary: Manipulates three audio parameters (frequency, dynamics, and directionality) to study how film audio design affects audience emotion and immersion, finding that subtle changes can alter emotional perception and that unconventional mixes increase variability in interpretation.

Comments: AES Europe 2026


Abstract

EMORSION is an exploratory proof-of-concept study examining how film audio design shapes audience emotion and immersion in a cinema setting. Four film scenes were selected across the horror (2) and drama (2) genres, balanced between mainstream and independent productions. For each scene, multiple alternative audio mixes were created by systematically manipulating three core aspects of audio design: frequency (pitch), dynamics (loudness), and directionality (spatial placement). Three audience groups viewed the scenes, with each group exposed to one manipulated mix alongside a control mix for each scene. Audience responses were assessed through a triangulated multimodal framework combining self-reported emotion and immersion via a questionnaire, physiological measures including heart rate monitoring, and video-based motion tracking. The protocol successfully captured measurable, interpretable differences across audio conditions, indicating that even subtle changes in audio design can shape emotional perception and immersion. Unconventional mixes tended to produce greater variability in audience interpretation, while conventional immersive mixes were associated with stronger cross-audience agreement. These findings establish the feasibility of the EMORSION protocol and motivate larger-scale studies to characterise the role of specific audio parameters in shaping audience experience.

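2606.18264 2026-06-18 cs.SI cs.AI cs.CL New submission

Simulating Hate Speech Cascades with Multi-LLM Agents: Empirical Grounding, Modeling Fidelity, and Intervention Strategies

Fan Huang

Affiliations: Indiana University Bloomington

AI summary: Simulates the spread of online hate speech with a multi-LLM-agent system, finds that it reproduces the stance monoculture and toxicity-delta direction seen in empirical data, identifies agent heterogeneity as the key fidelity factor through ablation experiments, and proposes an amplifier-targeting intervention strategy for dense networks.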


Abstract

Faithful modeling of hateful content propagation on online platforms remains an open problem for moderation research. Classical cascade models that do not explicitly represent the profile, community, and content factors associated with hateful-content propagation may yield moderation strategies that behave less effectively when deployed in real-world scenarios. Multi-agent large language model (LLM) systems can, in principle, make each reshare decision depend on the user's profile, the surrounding community, and the post's content, but it remains unclear whether this added flexibility actually reproduces real hateful cascades more faithfully than classical baselines. We study three hateful Bluesky cascades and a size-matched benign control. In the empirical Bluesky data, we found that: 97.4-99.7% of reposters take a hostile stance; toxicity-engagement homophily is higher on the diffusion tree than on the follower graph for hateful cascades; topology is star-like for the hateful cascades (most reposts come directly from the root) versus tree-like for the benign cascade (reposts propagate through multi-hop chains). In simulation, a multi-LLM-agent simulator reproduces the stance monoculture and the toxicity-delta direction. A structured ablation identifies agent heterogeneity as the leading fidelity factor, and amplifier targeting on dense networks yields 7.5-12.9% reduction at 5.7% benign collateral.
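The star-like versus tree-like distinction can be quantified with two simple statistics on a diffusion tree: the fraction of reposts made directly from the root, and the maximum repost depth. The sketch below is a minimal illustration; the parent-map representation, the function name `cascade_shape`, and both toy cascades are assumptions, not the paper's data or measures.

```python
def cascade_shape(parent, root):
    """Return (fraction of reposts made directly from the root, maximum depth).

    `parent` maps each reposting node to the node it reposted from.
    """
    def depth(node):
        hops = 0
        while node != root:
            node = parent[node]
            hops += 1
        return hops

    reposts = list(parent)
    direct = sum(1 for node in reposts if parent[node] == root)
    return direct / len(reposts), max(depth(node) for node in reposts)


# Toy cascades (made up): one mostly star-shaped, one a multi-hop chain.
star_like = {"a": "root", "b": "root", "c": "root", "d": "a"}
tree_like = {"a": "root", "b": "a", "c": "b", "d": "c"}

star_stats = cascade_shape(star_like, "root")
tree_stats = cascade_shape(tree_like, "root")
```

A high direct-from-root fraction with shallow depth corresponds to the star-like pattern described for the hateful cascades, while a low fraction with deep chains corresponds to the tree-like benign cascade.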

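2606.18263 2026-06-18 cs.HC cs.AI New submission

How Well Do Large Language Models Capture Human Personality?

Aanisha Bhattacharyya, Yaman Kumar Singla, Rajiv Ratn Shah, Changyou Chen, Jitendra Ajmera

Affiliations: Adobe Media and Data Science Research (MDSR)

AI summary: Formalizes common assumptions about persona prompting and evaluates them systematically, finding that increasing persona-description complexity contracts representational and behavioral diversity (persona manifold collapse) and that simple age-gender personas are more accurate than rich descriptions.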


Abstract

Large language models (LLMs) are increasingly used to simulate human populations via persona prompting, often under the assumptions that richer persona descriptions improve behavioral fidelity, similarly sized attribute combinations are equally simulatable, and persona definitions generalize across tasks. In this work, we formalize these assumptions and systematically evaluate them across multiple architectures, scales, and simulation settings. We identify a fundamental limitation we term persona manifold collapse, where increasingly expressive persona specifications lead to systematic contraction of representational and behavioral diversity. Across models, increasing persona complexity consistently reduces inter-persona separation in latent space and weakens behavioral differentiation in downstream simulation tasks. These effects persist across multiple analyses as richer personas fail to preserve human subgroup disagreement, performance varies across attribute combinations of similar size, and adding descriptive detail often degrades rather than improves simulation fidelity. Surprisingly, simple Age-Gender personas consistently outperform richly specified Ideal Customer Profiles (ICPs) across industries, achieving substantially higher downstream prediction accuracy. We find that collapse is not uniform across attributes. Certain combinations remain behaviorally stable and preserve stronger alignment with human responses, forming localized regions we term alignment bridges. Together, our results provide empirical and conceptual foundations for understanding the limits of persona-conditioned simulation, highlighting the need for representation-aware persona construction rather than increasing persona expressivity alone.
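One way to picture reduced inter-persona separation in latent space is as a drop in the average pairwise distance between persona embeddings. The sketch below uses mean pairwise cosine distance as the separation measure; that choice, the function name, and the two-dimensional toy embeddings are assumptions for illustration, since the abstract does not state which metric the authors use.

```python
import numpy as np


def mean_pairwise_cosine_distance(embeddings):
    """Average cosine distance over all unordered pairs of persona embeddings."""
    X = np.asarray(embeddings, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalize each row
    sims = X @ X.T
    upper = np.triu_indices(len(X), k=1)               # each pair counted once
    return float(np.mean(1.0 - sims[upper]))


# Toy embeddings (made up): well-separated personas versus collapsed ones.
simple_personas = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
rich_personas = np.array([[1.0, 0.1], [1.0, 0.0], [1.0, -0.1]])

simple_separation = mean_pairwise_cosine_distance(simple_personas)
rich_separation = mean_pairwise_cosine_distance(rich_personas)
```

In this toy setup the collapsed set has a separation close to zero, which is the qualitative pattern the abstract describes for increasingly detailed persona specifications.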

2606.18259 2026-06-18 cs.HC cs.AI New submission

Caring Without Feeling: Affective Dynamics as the Control Layer of Human-AI Agent Collaboration

Junjie Xu, Xingjiao Wu, Zihao Zhang, Yujia Xu, Yuzhe Yang, Jin Zhu, Luwei Xiao, Wen Wu, Liang He

Affiliations: East China Normal University; National University of Singapore

AI summary: Reviews the role of affective dynamics in human-AI agent collaboration and proposes treating affect as a coordination layer rather than an internal property of AI, for calibrating trust, delegation, and governance.


Abstract

AI agents that plan, retain memory across sessions, invoke external tools and act with partial autonomy are transforming human-AI collaboration. Research on affective computing, simulated empathy in large language models, trust in automation and AI safety has illuminated important design principles, yet these literatures remain fragmented. No integrated account explains how affective cues operate within agentic collaboration, that is, settings in which humans delegate, monitor and correct consequential tasks. This Review synthesises computational and interactional mechanisms of affective dynamics: the processes through which affective cues, emotion-like behaviour and perceived agent affect shape trust calibration, delegation decisions, error correction, dependence and governance. We trace how model-generated affective signals enter interaction loops that govern reliance, repair and oversight, and propose a framework that treats affect not as an internal property of AI but as a coordination layer through which humans and agents negotiate capability, uncertainty and responsibility. The framework provides a foundation for calibrated measurement, purposeful design and informed governance.

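2606.18258 2026-06-18 cs.HC cs.AI New submission

Examining Human-Like Behaviors in LLMs: A Multi-Dimensional Analysis of Model Behaviors, User Factors, and System Prompts

Sunnie S. Y. Kim, Margit Bowler, Leon A Gatys

Affiliations: Apple

AI summary: A multi-dimensional analysis of 21,000 conversations finds that human-like behaviors are pervasive in LLMs but vary significantly across models and user factors; human evaluators judged self-referential and relationship-building behaviors as less appropriate from LLMs than from humans, but boundary-maintaining behaviors as more appropriate; system prompts can control these behaviors but require careful evaluation.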


Abstract

Large language models (LLMs) exhibit a wide range of human-like behaviors, from expressing thoughts and emotions, to engaging in relationship-building with users, to refusing requests and maintaining boundaries. Despite their prevalence, researchers and practitioners lack methods and empirical insights to make informed decisions about when and what types of human-like behaviors LLMs should exhibit. To fill this gap, we present a multi-dimensional analysis of the prevalence, potential effects, and controllability of these behaviors using LLM-as-a-judge and human evaluation. Across 21,000 multi-turn conversations from four widely used models (gpt-4o, gpt-4.1-mini, claude-sonnet-4.6, gemini-2.5-flash), we find that human-like behaviors are pervasive but vary across models and user factors (conversation goals and user profiles). In terms of perceived appropriateness, human evaluators judged self-referential and relationship-building behaviors as less appropriate from LLMs than from humans, but boundary-maintaining behaviors more appropriate from LLMs than from humans. Finally, we show that system prompting can control these behaviors, though it requires careful evaluation to avoid unintended effects. We discuss the implications of our findings and provide recommendations for responsible LLM design and evaluation.

2606.18257 2026-06-18 cs.HC cs.AI New submission

From Memorization to Creation: Evaluating the Cognitive Depth of LLM-Generated Educational Questions

Xiaolong Wang, Zhe Zhao, Song Lai, Chaoli Zhang, Zijie Geng, Yu Tong, Ye Wei, Qingsong Wen

Affiliations: City University of Hong Kong; Zhejiang Normal University; Squirrel Ai Learning; University of Science and Technology of China; Wuhan University

AI summary: Evaluates the cognitive levels of questions generated by six LLMs using Bloom's Taxonomy, proposes a fine-grained prompting strategy that reduces repetitiveness and raises the share of higher-order cognitive outputs, introduces metrics for cognitive shift intensity and category drift, and examines the interpretability of Chain-of-Thought prompting.

Comments: Accepted by KDD 2026

Journal ref: KDD 2026


Abstract

While LLMs show promise in automating educational content creation, their ability to generate questions that stimulate higher-order thinking remains understudied. This work evaluates six widely-used LLMs through a Bloom's Taxonomy lens, focusing on their capacity to transcend rote memorization and achieve cognitive leaps. Using a hybrid human-AI evaluation protocol, we generate and analyze 20,700 questions across computer science, K-12 math, and social-science domains. Key contributions include: (1) a fine-grained prompting strategy that reduces question repetitiveness by 24.45% for Qwen2.5-7B-Instruct, and increases the proportion of higher-order cognitive level outputs by 11.53% for InternLM3-8B-Instruct; (2) quantitative metrics for cognitive shift intensity (CogShift) and category drift, revealing InternLM3's superior performance in multi-level transitions; (3) an interpretability analysis revealing metric-level correlations that enhance the transparency of Chain-of-Thought prompting. Our findings highlight the importance of cognitive-aware prompt design and provide benchmarks for deploying LLMs in personalized learning systems.

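2606.18256 2026-06-18 cs.HC cs.AI New submission

Dynamic In-Group Persona Generation for Enhancing Human-AI Rapport

Yoonseok Oh, Inseo Jung, Jinkyu Kim, Jungbeom Lee, Minwoo Kang, Suhong Moon

Affiliations: Korea University; Kakao Mobility; University of California, Berkeley

AI summary: Proposes a dynamic in-group persona generation method that identifies a user's primary concern and generates an in-group persona sharing a similar concern, significantly improving human-AI rapport; experiments show it outperforms a no-persona condition and a minimal self-disclosure baseline.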


Abstract

LLM-based chatbots are increasingly applied in interpersonal domains such as counseling and peer support, where establishing human-AI rapport is crucial yet remains challenging. In this work, we introduce a novel approach for conditioning LLMs with in-group personas, which (i) first identifies a user's primary concern and brief personal context (e.g., a computer science undergraduate worried about future career prospects), and (ii) generates a synthetic in-group persona that shares a similar primary concern while differing in background and narrative details, such as age or profession (e.g., a junior researcher at an AI startup). Furthermore, we conduct a human-subject study to systematically evaluate the effectiveness of in-group persona agents in enhancing human-AI rapport. We compare our approach against two baseline conditions: a conventional agent without persona conditioning and an agent exhibiting minimal self-disclosure (e.g., "I've felt that too"). Results from post-task questionnaires assessing rapport and user experience indicate that the in-group persona agent significantly improves perceived rapport and personal relevance compared to the baselines, and also yields a more positive user experience, most notably higher engagement.

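2606.19147 2026-06-18 stat.ML cs.LG math.ST stat.TH New submission

On Local Population-Risk Certificates

Mingzhi Song

Affiliations: Department of Mathematics, The University of Hong Kong

AI summary: Proposes local certificates for population-risk increments that provide risk control when updating a model, using a two-sided confidence band to decide whether an update is accepted.

Comments: 35 pages, 6 figures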


Abstract

This paper develops local certificates for population-risk increments around a current model. For a local candidate set \(\mathcal D\), the certificate is a two-sided confidence band for \(P({\ell_{\theta+v}-\ell_\theta})\) over \(v\in\mathcal D\). As an application, the upper endpoint of this band yields a risk-controlled update rule: an update is accepted only when its certified upper endpoint is nonpositive; otherwise the current model is retained.
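The accept-or-retain rule can be sketched for a simplified setting. The code below is not the paper's construction: it assumes a finite candidate set, per-example loss differences bounded in [-bound, bound], and a band built from Hoeffding's inequality with a union bound over candidates. The function names and the toy data are hypothetical.

```python
import math


def hoeffding_band(diffs, bound, delta, num_candidates):
    """Two-sided band for the mean loss difference of one candidate.

    Assumes each difference lies in [-bound, bound]. The union bound over
    `num_candidates` makes all bands hold jointly with probability >= 1 - delta.
    """
    n = len(diffs)
    mean = sum(diffs) / n
    half_width = bound * math.sqrt(2.0 * math.log(2.0 * num_candidates / delta) / n)
    return mean - half_width, mean + half_width


def accept_updates(diffs_by_candidate, bound=1.0, delta=0.05):
    """Accept a candidate only if its certified upper endpoint is nonpositive."""
    k = len(diffs_by_candidate)
    decisions = {}
    for name, diffs in diffs_by_candidate.items():
        _, upper = hoeffding_band(diffs, bound, delta, k)
        decisions[name] = upper <= 0.0
    return decisions


# Toy loss differences (candidate loss minus current loss), made up for illustration.
diffs_by_candidate = {
    "good": [-0.5] * 400,         # consistently lowers the loss
    "noisy": [-0.5, 0.5] * 200,   # no improvement on average
}
decisions = accept_updates(diffs_by_candidate)
```

The "good" candidate is accepted because even the upper end of its band is below zero, while the "noisy" candidate is rejected and the current model would be retained.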

2606.18759 2026-06-18 cs.CG cs.LG cs.NA math.NA New submission

A Neural Network Framework for Geodesic-Like Curve Computation on Parametric Surfaces

Sheng-Gwo Chen, Chen-Chang Peng

Affiliations: Department of Applied Mathematics, National Chiayi University, Chia-Yi 600, Taiwan

AI summary: Proposes a framework based on Physics-Informed Neural Networks (PINNs) for efficiently computing geodesic-like curves on parametric surfaces, supporting multi-surface systems and surfaces of revolution.

Comments: 22 pages, 16 figures, 8 tables


Abstract

The concept of geodesic-like curves was introduced by Chen in 2010 as a method for estimating shortest paths (geodesics) on parametric surfaces, with its convergence established theoretically. However, an efficient numerical computational framework has not yet been developed. In this paper, we propose an elegant and efficient approach for computing geodesic-like curves by leveraging deep learning and Physics-Informed Neural Networks (PINNs). Under the proposed framework, not only can single parametric surfaces be handled efficiently, but a broad class of complex parametric surfaces, including multi-surface systems with $C^0$ or higher continuity and surfaces of revolution, can also be robustly addressed.

2606.18535 2026-06-18 stat.ME cs.LG math.ST stat.TH 新提交

Shrinkage priors for Bayesian Substitute Confounders

贝叶斯替代混杂因子的收缩先验

Yordan P. Raykov, Hengrui Luo, Justin D. Strait, Wasiur R. KhudaBukhsh

发表机构 * School of Mathematical Sciences, University of Nottingham, Nottingham, UK(诺丁汉大学数学科学学院) Department of Statistics, Rice University, USA(莱斯大学统计学系) Lawrence Berkeley National Laboratory, USA(劳伦斯伯克利国家实验室) Statistical Sciences Group, Los Alamos National Laboratory, USA(洛斯阿拉莫斯国家实验室统计科学组)

AI总结 针对多原因观察研究中替代混杂因子过度编码问题,提出贝叶斯因子分配框架,利用收缩先验学习稀疏替代混杂因子,保持粗粒度多原因依赖,并证明后验集中性和重叠保持几何性质,实现潜在结果的一致性估计。

详情
AI中文摘要

多原因观察研究通过原因间的依赖结构包含关于未测量混杂的信息。然而,对未观测混杂的直接插补通常比学习一个低维替代得分更复杂,该得分保留了稳定因果调整所需的共享分配变异。去混杂因子(Wang and Blei, 2019)及相关替代混杂因子方法利用了这一思想,但灵活的分配模型可以拟合原因的联合分布,同时产生过度编码处理向量、破坏重叠或捕获单原因变异的得分。我们开发了一个贝叶斯因子分配框架,用于学习稀疏替代混杂因子,该框架通过收缩先验保留粗粒度的多原因依赖。该理论在后验集中性、因子得分收缩和保留重叠的分配几何层面进行阐述,因此不依赖于特定的收缩先验。在这些条件下,当相应的潜变量识别假设成立时,所提出的回归调整估计量对平均潜在结果是一致的。收缩先验为潜在结构学习提供了自然工具:它们倾向于由多个原因支持的低维因子,阻止有效的单原因因子,并通过渐进收缩诱导潜在因子的排序。合成实验说明了信号强度、结果有效性和几何感知正则化的作用。在阿尔茨海默病神经影像学倡议(ADNI)基线分析中,稀疏替代得分恢复了对侵入性脑脊液生物标志物直接条件调整的大部分效果,而重叠崩溃诊断则识别出拟合因子何时简化为单个观测测量。

英文摘要

Multi-cause observational studies contain information about unmeasured confounding through the dependence structure among causes. However, literal imputation of the unobserved confounder is often more complex than learning a lower-dimensional substitute score that preserves the shared assignment variation needed for stable causal adjustment. The deconfounder (Wang and Blei, 2019) and related substitute confounder methods exploit this idea, but flexible assignment models can fit the joint distribution of the causes while producing scores that over-encode the treatment vector, collapse overlap, or capture single-cause variation. We develop a Bayesian factor assignment framework for learning sparse substitute confounders that retain coarse multi-cause dependence with shrinkage priors. The theory is stated at the level of posterior concentration, factor score contraction, and overlap-preserving assignment geometry and therefore does not rely on a particular shrinkage prior. Under these conditions, the proposed regression-adjusted estimators are consistent for mean potential outcomes when the corresponding latent variable identification assumptions hold. Shrinkage priors provide a natural tool for latent structural learning: they favour low-dimensional factors supported by multiple causes, discourage effectively single-cause factors, and induce an ordering of the latent factors through progressive shrinkage. Synthetic experiments illustrate the roles of signal strength, outcome validity, and geometry-aware regularization. In an Alzheimer's Disease Neuroimaging Initiative (ADNI) baseline analysis, sparse substitute scores recover much of the adjustment obtained by directly conditioning on invasive cerebrospinal-fluid biomarkers, while collapse diagnostics identify when fitted factors reduce to individual observed measurements.

2606.18463 2026-06-18 cs.DC cs.LG cs.NA math.NA stat.ML 新提交

Mixed-Precision Communication-Avoiding SGD for Generalized Linear Models on GPUs

面向GPU上广义线性模型的混合精度通信避免SGD

Aditya Devarakonda, Irene Simó Muñoz, Giulia Guidi

发表机构 * Department of Computer Science, Wake Forest University(维克森林大学计算机科学系) Department of Computer Science, Cornell University(康奈尔大学计算机科学系)

AI总结 提出混合精度通信避免SGD(CA-SGD),通过分析有限精度误差将精度选择分解为九个独立部分,在NVIDIA GPU上实现5.1-6.8倍加速,且损失与FP32 SGD匹配。

详情
AI中文摘要

分布式随机梯度下降(SGD)受限于通信而非计算,因为每次迭代都需要跨进程进行AllReduce。通信避免SGD(CA-SGD)通过将$s$次连续的AllReduce替换为单个$sb\times sb$ Gram矩阵的AllReduce,将通信开销分摊到$s$次迭代中,以更多的计算和带宽换取更少的同步点。现代GPU配备矩阵硬件和低精度格式,通过加速Gram GEMM和缩减BF16流量来抵消这一开销。我们研究了NVIDIA GPU上针对广义线性模型的混合精度CA-SGD。我们的有限精度分析将一次CA-SGD外迭代的局部舍入误差分解为九个独立的精度选择,仅通过低精度单元舍入误差依赖于硬件,因此所得方案原则上可跨GPU代际迁移。该方案将输入矩阵和边缘向量以低精度存储,从低精度输入计算Gram矩阵并采用高精度累加,以高精度通信该矩阵,并以高精度执行内部递推和权重更新。在NERSC Perlmutter A100 GPU上,混合精度CA-SGD在逻辑回归、线性回归和泊松问题上的损失与FP32 SGD相差在0.5%以内,并在epsilon、SUSY、HIGGS、synth和Poisson-synth数据集上达到5.1-6.8倍于FP32 SGD的加速。我们的软件可在以下网址获取:https://doi.org/10.5281/zenodo.20448273

英文摘要

Distributed stochastic gradient descent (SGD) is limited by communication rather than computation, since each iteration requires an AllReduce across processes. Communication-avoiding SGD (CA-SGD) amortizes communication over $s$ iterations by replacing $s$ consecutive AllReduces with a single AllReduce of an $sb\times sb$ Gram matrix, trading more computation and bandwidth for fewer synchronization points. Modern GPUs with matrix hardware and reduced-precision formats offset this by accelerating the Gram GEMM and shrinking BF16 traffic. We study mixed-precision CA-SGD for generalized linear models on NVIDIA GPUs. Our finite-precision analysis decomposes the local rounding error of one CA-SGD outer iteration into nine independent precision choices, depending on the hardware only through its low-precision unit roundoffs, so the resulting recipes transfer in principle across GPU generations. The recipe stores the input matrix and margin vector in low precision, computes the Gram matrix from low-precision inputs with high-precision accumulation, communicates it in high precision, and performs the inner recurrence and weight updates in high precision. On NERSC Perlmutter A100 GPUs, mixed-precision CA-SGD matches FP32 SGD loss within $0.5\%$ on logistic, linear, and Poisson problems and reaches $5.1$--$6.8\times$ speedup over FP32 SGD on epsilon, SUSY, HIGGS, synth, and Poisson-synth. Our software is available at https://doi.org/10.5281/zenodo.20448273
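The "low-precision inputs, high-precision accumulation" step of the recipe can be illustrated with a small CPU-only sketch. It is not the authors' GPU implementation. As assumptions made purely for illustration, IEEE FP16 (via the standard `struct` format `"e"`) stands in for the low-precision storage format, Python's double-precision floats stand in for the high-precision accumulator, and the names `to_fp16` and `gram_low_in_high_acc` are hypothetical.

```python
import struct
from typing import List


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def gram_low_in_high_acc(rows: List[List[float]]) -> List[List[float]]:
    """Gram matrix G = Y Y^T with FP16-rounded inputs and float64 sums."""
    # Low-precision storage: every input entry is rounded to FP16.
    y = [[to_fp16(v) for v in row] for row in rows]
    m = len(y)
    g = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i, m):
            acc = 0.0  # high-precision accumulator
            for a, b in zip(y[i], y[j]):
                acc += a * b
            g[i][j] = g[j][i] = acc  # the Gram matrix is symmetric
    return g
```

In CA-SGD the rows would be the $sb$ sampled data rows of one outer iteration, so the resulting matrix is $sb\times sb$; on a GPU the same pattern corresponds to a GEMM with low-precision operands and a wider accumulator.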

2606.18424 2026-06-18 stat.OT cs.AI cs.IT math.IT 新提交

A Variational Framework for LLM Generator-Regulator Games

大语言模型生成器-调节器博弈的变分框架

Quanyan Zhu

发表机构 * Department of Electrical and Computer Engineering, Tandon School of Engineering, New York University, Brooklyn, NY, USA(纽约大学坦登工程学院电气与计算机工程系,美国纽约布鲁克林)

AI总结 提出一个变分框架,将语言生成建模为熵正则化吉布斯分布,将调节建模为最优判别器,通过鞍点问题平衡效用、熵、调节一致性和有限长度可检测性,并通过审查过滤和钓鱼防御案例验证。

详情
AI中文摘要

本文发展了一个用于受调节语言生成的变分框架。从自回归令牌采样出发,我们推导了完整消息上的诱导分布,并将其与熵正则化的吉布斯定律联系起来。调节被建模为一个最优判别器,其对偶凸值为f-散度,生成器-调节器交互被表述为一个鞍点问题。该框架适用于内容审核、审查、AI欺骗检测、合规审计、钓鱼防御和操纵控制,其中调节涉及可能消息上的分布而非单个输出。均衡阐明了效用、熵、调节一致性和有限长度可检测性之间的权衡。两个有限词汇案例研究,即审查过滤和钓鱼防御,说明了如何通过效用、熵、散度、接收端分数和检测概率来评估该理论。

英文摘要

This paper develops a variational framework for regulated language generation. Starting from autoregressive token sampling, we derive the induced distribution over complete messages and relate it to an entropy-regularized Gibbs law. Regulation is modeled as an optimal discriminator whose convex-dual value is an f-divergence, and the generator-regulator interaction is formulated as a saddle-point problem. The framework applies to moderation, censorship, AI deception detection, compliance auditing, phishing defense, and manipulation control, where regulation concerns a distribution over possible messages rather than a single output. The equilibrium clarifies the tradeoff among utility, entropy, regulatory alignment, and finite-length detectability. Two finite-vocabulary case studies, censorship filtering and phishing defense, illustrate how the theory can be evaluated through utility, entropy, divergence, receiver-side scores, and detection probability.
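As a rough illustration of an entropy-regularized Gibbs law over a finite message set, the sketch below computes $p(m)\propto\exp(u(m)/\tau)$, which is the maximizer of expected utility plus $\tau$ times Shannon entropy. The uniform reference measure, the symbol $\tau$, and the name `gibbs_distribution` are assumptions for this example; the paper's exact formulation may differ.

```python
import math
from typing import Dict


def gibbs_distribution(utility: Dict[str, float],
                       temperature: float) -> Dict[str, float]:
    """Return p(m) proportional to exp(u(m) / temperature)."""
    if temperature <= 0.0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum utility for numerical stability.
    u_max = max(utility.values())
    weights = {m: math.exp((u - u_max) / temperature)
               for m, u in utility.items()}
    z = sum(weights.values())  # partition function (after the shift)
    return {m: w / z for m, w in weights.items()}
```

Lower temperatures concentrate the distribution on the highest-utility messages, while higher temperatures push it toward uniform, which is the utility-entropy tradeoff the abstract refers to.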

2606.18438 2026-06-18 math.OC cs.LG 新提交

Sequential Hiring of Contingent Workers Through Learning-Based Optimization

基于学习优化的临时工顺序雇佣

Chris Lee, Xiuli Chao, Izak Duenyas

发表机构 * Department of Industrial and Operations Engineering, University of Michigan(工业与运营工程系,密歇根大学) Ross School of Business, University of Michigan(罗斯商学院,密歇根大学)

AI总结 针对临时工场景中工人产能和劳动力供给的不确定性,提出DR-UCB策略,通过学习周期顺序决策替换与雇佣,实现累积利润最大化,并证明其主导阶遗憾对时间范围的依赖与下界匹配。

详情
AI中文摘要

在本文中,我们研究了临时工场景下存在工人产能和劳动力供给不确定性的顺序劳动力管理问题。企业通过维持固定规模的活跃团队并随时间学习工人生产力,以最大化累积利润。我们强调该问题中的两个关键运营摩擦:替换工人成本高昂,且工人可能因先前工作承诺、日程限制或入职流程等原因无法立即雇佣。因此,雇佣决策仅在随机延迟后生效。我们将该问题建模为具有昂贵切换和延迟动作的随机多选(multi-play)赌博机,并开发了一种基于学习的雇佣策略DR-UCB(延迟替换-UCB),该策略通过学习周期顺序做出替换和雇佣决策。在每个周期中,该策略使用实时生产数据确定何时启动劳动力变更以及替换和雇佣哪些工人。我们证明,所提策略的主导阶遗憾在其对时间范围的依赖上与下界匹配。数值实验表明,DR-UCB优于基准策略。

英文摘要

In this paper, we study a sequential workforce management problem in a contingent labor setting with uncertainty in both worker production and labor supply. A firm seeks to maximize cumulative profit by maintaining an active team of fixed size while learning worker productivity over time. We emphasize two critical operational frictions in this problem: replacing workers is costly, and workers may not be available immediately for hiring because of, for example, prior job commitments, scheduling constraints, or onboarding procedures. Thus, hiring decisions take effect only after a random delay. We formulate this problem as a stochastic multi-play bandit with costly switching and delayed actions, and develop a learning-based hiring policy, DR-UCB (DelayedReplacement-UCB), that makes replacement and hiring decisions sequentially through learning cycles. In each cycle, the policy uses real-time production data to determine when to initiate workforce changes and which workers to replace and hire. We show that the leading-order regret of the proposed policy matches its lower bound in its dependence on the time horizon. Our numerical experiments show that DR-UCB outperforms benchmark policies.
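For orientation, the sketch below shows a standard UCB1-style index with top-k selection, the generic building block that multi-play bandit policies typically start from. It is not DR-UCB: the switching costs, random hiring delays, and learning cycles described in the abstract are deliberately left out, and the function names and the bonus term $\sqrt{2\ln t/n}$ are assumptions taken from the textbook UCB1 rule.

```python
import math
from typing import List


def ucb_indices(means: List[float], counts: List[int], t: int) -> List[float]:
    """UCB1 index: empirical mean plus exploration bonus sqrt(2 ln t / n)."""
    if t < 1 or any(n <= 0 for n in counts):
        raise ValueError("need t >= 1 and at least one observation per worker")
    return [m + math.sqrt(2.0 * math.log(t) / n)
            for m, n in zip(means, counts)]


def top_k(indices: List[float], k: int) -> List[int]:
    """Return the positions of the k largest indices, in ascending order."""
    order = sorted(range(len(indices)), key=lambda i: indices[i], reverse=True)
    return sorted(order[:k])
```

With a team of fixed size k, a plain multi-play rule would keep the k workers returned by `top_k`; a policy such as the one in the paper additionally has to decide when a replacement is worth its cost and delay.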

2606.18305 2026-06-18 math.NA cs.LG cs.NA 新提交

Starter-Iterator Neural Operator: A Unified Architecture for High-Fidelity Forward and Inverse PDE Problems

起始迭代神经算子:面向高保真正问题和逆问题的统一架构

Kuilin Qin, Lianfang Wang, Xu Sun, Jiwei Jia, Yu Wang, Yong Wang, Yuping Duan

发表机构 * School of Mathematical Sciences, Beijing Normal University(北京师范大学数学科学学院) School of Mathematics, Jilin University(吉林大学数学学院) Key Laboratory of Digital Technology in Medical Diagnostics of Zhejiang(浙江省数字医疗诊断技术重点实验室) School of Physics, Nankai University(南开大学物理学院)

AI总结 提出起始迭代神经算子(SINO),通过神经网络重解释传统迭代方法的初始化与迭代格式,实现频谱-时空协同建模,在Navier-Stokes方程、声波方程等正逆问题中提升数值精度与泛化能力。

详情
AI中文摘要

算子学习是一个新兴的交叉学科领域,融合了机器学习与科学计算。通过映射无限维函数空间,该方法为高维偏微分方程(PDE)提供了高效的代理建模框架。与传统数值求解器相比,它在计算复杂度和逼近精度之间实现了更优的权衡,在实时预测和参数扫描等多查询任务中展现出显著优势。鉴于正演模拟和反演推理对精度的严格要求,以及现有算子学习方法在处理复杂边界或长期演化时的精度瓶颈,我们提出了起始迭代神经算子(SINO)。我们的框架通过神经网络重新诠释传统迭代方法的初始化策略和迭代格式,建立了一种高效的频谱-时空协同建模方法。具体而言,频域初始化模块捕获全局稳定的低频特征,而时域学习模块专注于优化局部解残差,从而有效克服了传统单域建模方法的内在局限性。在典型动力系统(如Navier-Stokes方程和声波方程)以及实际应用(包括超分辨率成像和天气预报)上的大量实验表明,SINO在数值精度、泛化能力和鲁棒性方面均取得了卓越性能。

英文摘要

Operator learning is an emerging interdisciplinary field that integrates machine learning with scientific computing. By mapping infinite-dimensional function spaces, this approach provides an efficient surrogate modeling framework for high-dimensional partial differential equations (PDEs). Compared to traditional numerical solvers, it achieves a superior trade-off between computational complexity and approximation accuracy, demonstrating significant advantages in many-query tasks such as real-time prediction and parameter sweeps. Given the stringent accuracy requirements of both forward simulation and inverse inference, as well as the precision bottlenecks of existing operator learning methods in handling complex boundaries or long-term evolution, we propose the Starter-Iterator Neural Operator (SINO). Our framework reinterprets the initialization strategies and iterative formats of traditional iterative methods through neural networks, establishing an efficient approach for spectral-spatiotemporal collaborative modeling. Specifically, the frequency-domain initialization module captures globally stable low-frequency features, while the time-domain learning module focuses on optimizing local solution residuals, thereby effectively overcoming the inherent limitations of conventional single-domain modeling approaches. Extensive experiments on typical dynamical systems such as the Navier-Stokes equations and acoustic wave equations, as well as practical applications including super-resolution imaging and weather forecasting, demonstrate that SINO achieves outstanding performance in numerical accuracy, generalization capability, and robustness.

2606.18515 2026-06-18 quant-ph cs.LG stat.ML 新提交

Exponentially many initializations to avoid barren plateaus

指数多个初始化以避免贫瘠高原

Ankit Kulshrestha, Ricard Puig, Diego García-Martín, Lukasz Cincio, Ilya Safro, Zoë Holmes, M. Cerezo

发表机构 * Fujitsu Research of America, Santa Clara, CA 95054, USA(美国富士通研究院,圣克拉拉) University of Delaware, Newark, DE 19716, USA(特拉华大学) Department for Quantum Information and Computation at Kepler (QUICK), Johannes Kepler University, Linz, Austria(约翰内斯·开普勒大学量子信息与计算系) Information Sciences, Los Alamos National Laboratory, Los Alamos, NM 87545, USA(洛斯阿拉莫斯国家实验室信息科学部)

AI总结 提出一阶矩框架诊断初始化能否逃离完全集中的贫瘠高原不动点,发现避免贫瘠高原的初始化策略高度非唯一,存在指数多个不等价族,且不同初始化导致不同极小值。

Comments 18 + 27 pages, 5+4 figures, 1 Table

详情
AI中文摘要

贫瘠高原通常被表述为一种平均情形现象:选定一个拟设,朴素地初始化,集中现象便随之出现。这导致了一种普遍观点,即贫瘠高原的潜在解决办法仅仅是更仔细地初始化参数。在这里,我们表明情况更为微妙。我们引入了一个一阶矩框架,该框架提供了一个简单的算子级诊断,用于判断初始化何时可能逃离完全集中的贫瘠高原不动点,并用于比较不同初始化策略引入的偏差。我们的框架恢复了几种已知的初始化方案,如恒等初始化和高斯初始化,但也表明避免贫瘠高原的方式高度非唯一。实际上,许多平移的、有偏的和非对称的参数分布都可以避免集中,并且这些选择不必等价。事实上,我们的结果表明,可以生成指数多个不等价的初始化策略族。此外,我们的数值实验表明,一阶矩不同的初始化可能到达不同的极小值,这表明通过智能初始化避免贫瘠高原,可能只是把指数集中问题换成了另一个挑战:从众多选项中选出正确的可训练口袋。

英文摘要

Barren plateaus are stated as an average-case phenomenon: pick an ansatz, initialize it naively, and concentration follows. This has led to the common view that a potential cure for barren plateaus is simply to initialize the parameters more carefully. Here we show that the situation is subtler. We introduce a first-moment framework that gives a simple operator-level diagnostic for when an initialization may escape the fully concentrated barren-plateau fixed point, and for comparing the biases induced by different initialization strategies. Our framework recovers several known initialization schemes such as identity and Gaussian initialization, but also shows that barren-plateau avoidance is highly non-unique. Indeed, many shifted, biased, and non-symmetric parameter distributions can avoid concentration, and these choices need not be equivalent. In fact, our results show that one can generate exponentially many families of inequivalent initialization strategies. Then, our numerics indicate that different first-moment-distinct initializations can lead to different attained minima, suggesting that avoiding barren plateaus via smart initializations can trade the exponential concentration problem for the challenge of selecting the right trainable pocket amongst many options.

2606.19270 2026-06-18 eess.IV cs.LG physics.med-ph 新提交

Beyond Algorithms: Conceptual Innovation in Medical Imaging AI

超越算法:医学影像人工智能中的概念创新

Mark A. Anastasio

发表机构 * Mallinckrodt Institute of Radiology and Department of Electrical & Systems Engineering, Washington University in St. Louis(马林克罗德特放射医学研究所和电气与系统工程系,华盛顿大学圣路易斯分校)

AI总结 本文区分算法创新与概念创新,指出当前激励结构过度奖励算法新颖性而忽视概念贡献,通过医学影像AI案例展示概念不足导致的错位目标与有限临床影响,并提出促进概念创新的建议。

详情
AI中文摘要

人工智能推动了医学影像研究的快速发展,产生了日益复杂的算法,并在基准任务上稳步改进。然而,这种以算法为中心的发展轨迹也揭示了一个日益加剧的不平衡:虽然计算方法快速进步,但定义成像任务、评估指标和临床意义的概念基础有时仍未得到充分审视。在这篇观点文章中,我们区分了算法创新(专注于在固定问题定义内改进计算实现和性能)与概念创新(重新定义提出的问题、衡量成功的方式以及方法在临床上的相关性)。我们认为,当前的激励结构、培训路径和发表规范不成比例地奖励算法新颖性,尤其是对早期职业研究者而言,而有时低估了对科学成熟和临床转化至关重要的概念贡献。通过医学影像AI的代表性例子,我们展示了概念基础不足如何导致目标错位、泛化脆弱以及现实世界影响有限。最后,我们为研究者、导师、审稿人和期刊提出了可操作的建议,以更好地识别、支持和整合概念创新与算法进步。

英文摘要

Artificial intelligence has driven rapid progress in medical imaging research, producing increasingly sophisticated algorithms and steady improvements on benchmark tasks. However, this algorithm-centric trajectory has also revealed a growing imbalance: while computational methods advance rapidly, the conceptual foundations that define imaging tasks, evaluation metrics, and clinical meaning sometimes remain underexamined. In this Perspective, we distinguish algorithmic innovation, which focuses on improving computational implementations and performance within a fixed problem definition, from conceptual innovation, which reframes what problems are posed, how success is measured, and why an approach is clinically relevant. We argue that prevailing incentive structures, training pathways, and publication norms disproportionately reward algorithmic novelty, particularly for early-career researchers, while at times undervaluing conceptual contributions that are essential for scientific maturation and clinical translation. Through representative examples from medical imaging AI, we show how insufficient conceptual grounding can lead to misaligned objectives, fragile generalization, and limited real-world impact. We conclude with actionable recommendations for researchers, mentors, reviewers, and journals to better recognize, support, and integrate conceptual innovation alongside algorithmic advances.

2606.19302 2026-06-18 physics.ao-ph cs.LG 新提交

Optimal scenario design for climate emulation

气候模拟的最优情景设计

Christopher B. Womack, Shahine Bouabid, Andrei Sokolov, Popat Salunke, Glenn Flierl, Sebastian D. Eastham, Noelle E. Selin

发表机构 * Department of Aeronautics and Astronautics, Massachusetts Institute of Technology(麻省理工学院航空航天系) Center for Sustainability Science and Strategy, Massachusetts Institute of Technology(麻省理工学院可持续科学与战略中心) Department of Earth, Atmospheric, and Planetary Sciences, Massachusetts Institute of Technology(麻省理工学院地球、大气与行星科学系) Brahmal Vasudevan Institute for Sustainable Aviation, Department of Aeronautics, Imperial College London(伦敦帝国理工学院航空系Brahmal Vasudevan可持续航空研究所) Institute for Data, Systems, and Society, Massachusetts Institute of Technology(麻省理工学院数据、系统与社会研究所)

AI总结 针对气候模拟器泛化能力受限的问题,提出通过可微简单气候模型优化训练数据情景,使小数据集训练的模拟器性能优于标准情景集。

详情
AI中文摘要

随着深度学习在物理系统中的普及,改进泛化性的努力主要集中在设计嵌入物理约束的架构上。然而,对于机器学习替代气候模型(模拟器),我们表明现有情景中用于生成训练数据的低结构多样性限制了预测能力。在此,我们研究是否可以优化训练数据集本身以提高泛化性。我们引入一种方法创建数据集,使模拟器能够泛化到训练数据中未出现的新结构情景。我们使用可微简单气候模型(SCM)计算模拟器损失对训练数据扰动的敏感性,迭代更新训练数据以最大化模拟器技能。对于SCM,以这种方式优化的一个情景训练出的模拟器优于在六个标准ScenarioMIP路径上训练的模拟器。尽管训练数据集更小,但我们实现了更高的预测技能,发现我们的模拟器成功隔离了不同气候强迫因子(如温室气体与气溶胶)的独特物理行为,而无需单强迫运行。然后我们证明,使用SCM优化的情景驱动中等复杂度气候模型时,产生的训练数据集比在ScenarioMIP输出上训练得到更熟练的模拟器。我们的结果表明,在运行全尺度气候模型的计算受限环境中,生成少量动态丰富的情景比扩展传统排放路径集对模拟和表征系统响应具有更大的边际价值。

英文摘要

As deep learning for physical systems continues to grow in popularity, efforts to improve generalizability have primarily focused on designing architectures that embed physical constraints. However, for machine-learning surrogate climate models (emulators), we show that the low structural diversity in existing scenarios commonly used to generate training data places a ceiling on predictive skill. Here, we examine whether training datasets themselves can be optimized to improve generalization. We introduce a method to create datasets that produce emulators capable of generalizing to new, structurally different scenarios absent from the training data. We use a differentiable Simple Climate Model (SCM) to calculate the sensitivity of emulator loss to perturbations in the training data, iteratively updating the training data to maximize emulator skill. For an SCM, training on one scenario optimized in this fashion outperforms an emulator trained on six standard ScenarioMIP pathways. We achieve this higher predictive skill despite training on a smaller dataset, finding that our emulator successfully isolates distinct physical behaviors of different climate forcing agents (e.g., greenhouse gases vs. aerosols) without single-forcing runs. We then demonstrate that scenarios optimized using an SCM, when used to drive an intermediate-complexity climate model, produce a training dataset that yields a more skillful emulator than training on ScenarioMIP outputs. Our results suggest that, in the compute-constrained environment of running full-scale climate models, generating a small number of dynamically rich scenarios provides greater marginal value for emulation and characterizing system responses than expanding the suite of traditional emissions pathways.

2606.19251 2026-06-18 physics.comp-ph cs.LG physics.flu-dyn 新提交

Acceleration of an algebraic multigrid pressure solver using graph neural networks

使用图神经网络加速代数多重网格压力求解器

Eric Chillón, Artur K. Lidtke, Nguyen Anh Khoa Doan, Bernat Font

发表机构 * Faculty of Mechanical Engineering, Delft University of Technology, The Netherlands(荷兰代尔夫特理工大学机械工程学院) Maritime Research Institute Netherlands, The Netherlands(荷兰海事研究院) Department of Aeronautics, Imperial College London, United Kingdom(英国伦敦帝国理工学院航空系)

AI总结 提出一种基于图卷积同构网络的代数多重网格平滑器,通过预测最优多项式系数构造稀疏伪逆算子,减少V-cycle迭代次数,在非结构化网格上实现4%-37%的加速,并泛化至训练时未见的大规模网格。

Comments 23 pages, 11 figures

详情
AI中文摘要

求解压力-泊松方程仍然是非结构化不可压缩流求解器的主要计算瓶颈,这主要是由于传统线性求解器对网格不规则性固有的敏感性。本文引入了一种数据驱动的代数多重网格(AMG)平滑器,该平滑器使用改进的图卷积同构网络(GCIN)。图神经网络预测最优多项式系数,以在不同网格拓扑上构造稀疏伪逆算子。优化系数以减少每次V-cycle迭代后的残差。通过直接从稀疏系数矩阵捕获系统的代数结构,所提出的方法在适应非结构化网格中的局部各向异性的同时,保持了求解器的线性性。我们的框架通过减少达到给定容差所需的V-cycle次数,并在不同基准测试中实现4%到37%的墙钟加速,展示了显著的性能提升。值得注意的是,该模型在比训练时所见大128倍的网格上保持效率,并在未见过的工业相关问题上(如AirfRANS数据集)加速求解器收敛,表现出鲁棒的泛化能力。

英文摘要

Solving the pressure-Poisson equation remains the primary computational bottleneck in incompressible unstructured flow solvers primarily due to the inherent sensitivity of traditional linear solvers to mesh irregularities. This work introduces a data-driven algebraic multigrid (AMG) smoother that uses a modified graph convolutional isomorphism network (GCIN). The graph neural network predicts optimal polynomial coefficients to construct a sparse pseudo-inverse operator across diverse grid topologies. The coefficients are optimized to reduce the residual after each V-cycle iteration. By directly capturing the algebraic structure of the system from the sparse coefficient matrix, the proposed method maintains the solver's linearity while adapting to local anisotropies in unstructured grids. Our framework demonstrates significant performance gains by reducing the number of V-cycles required for a given tolerance and delivering wall-clock speedups from 4% to 37% across diverse benchmarks. Notably, the model exhibits robust generalization by maintaining efficiency on meshes up to 128 times larger than those seen in training, and by accelerating the solver's convergence on unseen industry-relevant problems such as the AirfRANS dataset.
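The idea of a polynomial smoother can be sketched as one sweep $x \leftarrow x + \sum_k c_k A^k r$ with residual $r = b - Ax$, so that $\sum_k c_k A^k$ plays the role of an approximate inverse. In the paper the coefficients are predicted by the graph neural network; in this sketch they are simply passed in. Scalar coefficients shared across all nodes, dense list-of-lists matrices, and the names `matvec` and `polynomial_smooth` are all assumptions made for illustration.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(a: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def polynomial_smooth(a: Matrix, b: Vector, x: Vector,
                      coeffs: List[float]) -> Vector:
    """One sweep x <- x + sum_k coeffs[k] * A^k r, with r = b - A x."""
    r = [bi - yi for bi, yi in zip(b, matvec(a, x))]
    power = r  # holds A^k r, starting with k = 0
    correction = [0.0] * len(x)
    for c in coeffs:
        correction = [ci + c * pi for ci, pi in zip(correction, power)]
        power = matvec(a, power)  # advance to A^(k+1) r
    return [xi + ci for xi, ci in zip(x, correction)]
```

A single coefficient reduces this to a damped Richardson step. Because the update is a fixed polynomial in $A$ applied to the residual, the smoother stays linear in the right-hand side, which matches the linearity property highlighted in the abstract.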

2606.18826 2026-06-18 physics.optics cs.CV eess.IV 新提交

EDoF-NeRF: extended depth-of-field neural radiance fields using a coded aperture camera

EDoF-NeRF: 使用编码孔径相机扩展景深的神经辐射场

Yoshiyuki Shirasaki, Ryoichi Horisaki

发表机构 * Department of Information Physics and Computing, Graduate School of Information Science and Technology, The University of Tokyo(东京大学信息科学与技术研究生院信息物理与计算系)

AI总结 提出一种通过编码孔径相机扩展景深的方法,构建高保真神经辐射场,实现从不同视角图像渲染新视图,并验证其优于传统孔径相机。

详情
AI中文摘要

我们提出了一种扩展景深(DoF)的方法,用于构建高保真神经辐射场(NeRF)——一种基于隐式神经表示、从不同视角捕获的图像数据集渲染逼真新视图的新兴技术。DoF与光量之间的权衡不仅存在于传统相机中,也存在于NeRF中,因为NeRF使用的数据集是由这些相机捕获的。为了解决这个问题,我们在相机光阑处引入编码孔径,在散焦条件下保留空间频率分量。我们开发了一个将编码孔径纳入NeRF的相机模型,允许直接输入编码图像,并能够生成具有扩展景深的新视图。我们通过仿真和实验验证了所提出的方法,称为扩展景深NeRF(EDoF-NeRF),并证明了其相比传统孔径相机的优越性能。

英文摘要

We propose a method for extending the depth-of-field (DoF) to construct high-fidelity neural radiance fields (NeRF) -- an emerging technique for rendering photorealistic novel views from a dataset of images captured at different viewpoints, based on implicit neural representations. The trade-off between DoF and light quantity is inherent not only in conventional cameras but also in NeRF, since the datasets used by NeRF are captured by these cameras. To address this issue, we introduce a coded aperture placed at the camera pupil, preserving spatial frequency components under defocused conditions. We develop a camera model incorporating coded apertures into NeRF, allowing direct input of coded images and enabling the generation of novel views with an extended DoF. We validate the proposed method, termed extended DoF-NeRF (EDoF-NeRF), through simulations and experiments, demonstrating its superior performance compared to conventional aperture cameras.

2606.19133 2026-06-18 physics.optics cond-mat.mtrl-sci cs.AI 新提交

Equivariant Graph Neural Networks Improve Optical Spectra Prediction for Materials Screening

等变图神经网络改进材料筛选中的光谱预测

Kasper Helverskov Petersen, François R J Cornet, Martin Ovesen, Mikkel Jordahn, Kristian S. Thygesen, Mikkel N. Schmidt

发表机构 * Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kongens Lyngby, Denmark(丹麦技术大学应用数学与计算机科学系,Kongens Lyngby) Department of Physics, Technical University of Denmark, Kongens Lyngby, Denmark(丹麦技术大学物理系,Kongens Lyngby)

AI总结 提出使用等变图神经网络GotenNet预测光学光谱,在RPA级数据集上优于现有方法,尤其在0-8 eV和静态介电常数预测上表现突出。

详情
AI中文摘要

光学光谱的可扩展预测是太阳能电池等光电应用高通量材料筛选的关键组成部分。现有的替代模型基于较低理论水平计算的光谱进行训练,或依赖旋转不变标量特征,限制了其几何表达能力。我们探索了使用等变图神经网络进行光学光谱预测,将GotenNet适配于此任务,并在多个数据集上评估,包括最近发布的包含10,533个结构且光谱在随机相位近似(RPA)水平上计算的数据集。所提出的模型优于当前最先进方法,在0-8 eV范围内和静态实部介电常数预测上提升最大,这两者对于薄膜光学尤其重要。

英文摘要

Scalable prediction of optical spectra is a critical component of high-throughput materials screening for optoelectronic applications such as solar cells. Existing surrogate models are trained on spectra computed from lower levels of theory or rely on rotation-invariant scalar features, limiting their geometric expressiveness. We explore the use of equivariant graph neural networks for optical spectra prediction, adapting GotenNet to this task and evaluating it on multiple datasets including a recently published collection of 10,533 structures with spectra computed at the level of the random phase approximation (RPA). The proposed model outperforms the current state of the art, with the largest gains in the 0-8 eV range and on predicting the static real permittivity, both of particular relevance for thin-film optics.

2606.18275 2026-06-18 cs.ET cond-mat.mtrl-sci cs.LG 新提交

A physical adaptive material motor unit neural network: a hygromorph composite material machine

一种物理自适应材料运动单元神经网络:潮致变形复合材料机器

Charles de Kergariou, David Correa, Adam W. Perriman, Helmut Hauser, Fabrizio Scarpa

发表机构 * Bristol Composites Institute, School of Civil, Aerospace and Mechanical Engineering, University of Bristol(布里斯托大学土木、航空航天与机械工程学院布里斯托复合材料研究所) School of Architecture, University of Waterloo(滑铁卢大学建筑学院) Research School of Chemistry and John Curtin School of Medical Research, Australian National University(澳大利亚国立大学化学研究院和约翰·柯廷医学研究院) School of Cellular and Molecular Medicine, University of Bristol(布里斯托大学细胞与分子医学学院) School of Engineering Mathematics and Technology, University of Bristol(布里斯托大学工程数学与技术学院) Bristol Robotics Lab, Bristol, United Kingdom(布里斯托机器人实验室,英国布里斯托)

AI总结 提出一种基于木材和炭黑复合材料的物理自适应运动单元神经网络,通过数据感知反向传播训练,实现动态遮阳控制,并能随数据库扩展增量学习。

Comments 35 pages, 16 figures

详情
AI中文摘要

新型材料科学的进步使得结构能够通过将记忆和学习能力直接嵌入材料来充当智能机器。我们的工作介绍了一种物理自适应材料运动单元神经网络,利用由木材和炭黑基复合材料组成的新一代可控执行器,这些执行器对温度和相对湿度敏感。这些材料执行器被组装成一种类似肌肉收缩触发的运动单元结构,形成一种能够进行动态遮阳控制的智能机器,例如可用于建筑物。该机器由一个神经网络控制,该网络在超过350个在不同环境条件下收集的实验数据点上进行训练。通过建立一种新的数据感知反向传播训练,我们展示了该机器能够预测遮阳响应,并随着数据库的扩展逐步学习预测适当的行为。我们还展示了该机器优化配置以在两种不同条件下实现相似遮阳输出的能力。

英文摘要

Advances in novel materials science enable structures to function as intelligent machines by embedding memory and learning capabilities directly into materials. Our work introduces a physical adaptive material motor unit neural network, leveraging a new generation of controllable actuators composed of wood- and carbon black-based composites, sensitive to temperature and relative humidity. These material actuators are assembled into a motor unit-like structure inspired by muscle contraction trigger, forming an intelligent machine capable of dynamic shading control that can be used, for example, in buildings. The machine is governed by a neural network trained on over 350 experimental data points collected under diverse environmental conditions. By establishing a new data-aware backpropagation training, we show that the machine predicts shading responses and learns to predict appropriate behaviour incrementally as the database expands. We also demonstrate the ability of the machine to optimise configurations to achieve similar shading outputs under two distinct conditions.

2606.19152 2026-06-18 cond-mat.mtrl-sci cs.AI 新提交

AdsMind: A Physics-Grounded Multi-Agent System for Self-Correcting Discovery of Adsorption Configurations on Heterogeneous Catalyst Surfaces

AdsMind: 一种基于物理的多智能体系统,用于异质催化剂表面吸附构型的自校正发现

Zongmin Zhang, Yuyang Lou, Bowen Zhang, Junwu Chen, Ryo Kuroki, Xuan Vu Nguyen, Edvin Fako, Lixue Cheng, Philippe Schwaller

发表机构 * Department of Computer Science Engineering, Hong Kong University of Science Department of Chemistry, Hong Kong University of Science Laboratory of Artificial Chemical Intelligence (LIAC), EPFL, Lausanne, Switzerland Platform Laboratory for Science & Technology, Asahi Kasei Corporation, Tokyo, Japan IAS Center for AI for Scientific Discoveries, Hong Kong University of Science

AI总结 提出AdsMind闭环多智能体框架,利用机器学习力场弛豫反馈实现吸附构型搜索的自主纠错,在基准测试中成功率高达100%和98.8%,且仅需少量弛豫步骤,显著优于启发式枚举和单次方法。

Comments 37 pages, 5 figures

详情
AI中文摘要

识别最低能量的表面-吸附物构型对于模拟异质催化至关重要,然而使用从头计算方法进行穷举探索在计算上是不可行的。机器学习力场(MLFF)加速了结构弛豫,但将广阔构型空间中的搜索留作主要瓶颈,而开环的大语言模型(LLM)智能体缺乏基于物理的反馈机制来纠正错误的初始猜测。我们提出了AdsMind(基于机器智能和弛豫反馈的吸附构型发现),这是一个闭环多智能体框架,通过MLFF弛豫反馈实现自主纠错。在四个LLM后端上,AdsMind实现了持续的高搜索可靠性,在基准AA20和OCD-GMAE62上的成功率分别为100%和98.8%。相对于其单次(1-Shot)消融,它降低了跨后端的能量分散,并且每个案例仅分别使用4.11和4.67次MLFF弛豫——相比启发式枚举基线减少了约14倍。使用VASP/PBE对六个代表性AA20系统进行的密度泛函理论(DFT)验证表明,所报告的开环Adsorb-Agent输出对分子吸附物存在定性的吸附能符号错误,而AdsMind在所有测试案例中均保持正确的符号,且定量一致性更佳。因此,AdsMind同时提供了可靠性、自我反思和可解释性,支持更多基于DFT的自主化学工作流程。

英文摘要

Identifying the lowest-energy surface-adsorbate configuration is critical for modeling heterogeneous catalysis, yet exhaustive exploration with ab initio calculations is computationally prohibitive. Machine-learning force fields (MLFFs) accelerate structural relaxation but leave the search over the vast configurational space a major bottleneck, and open-loop large language model (LLM) agents lack a physics-grounded feedback mechanism to correct erroneous initial guesses. We propose AdsMind (Adsorption configuration discovery with Machine intelligence and relaxation feedback), a closed-loop multi-agent framework that enables autonomous error correction through MLFF relaxation feedback. Across four LLM backends, AdsMind achieves consistently high search reliability, with success rates of 100% and 98.8% on the benchmarks AA20 and OCD-GMAE62. Relative to its single-pass (1-Shot) ablation it reduces cross-backend energy dispersion, and it uses only 4.11 and 4.67 MLFF relaxations per case, respectively -- an approximately 14-fold reduction over heuristic enumeration baselines. Density functional theory (DFT) validation using VASP/PBE on six representative AA20 systems shows that the reported open-loop Adsorb-Agent outputs exhibit qualitative adsorption-energy sign errors for molecular adsorbates, whereas AdsMind preserves the correct sign in all tested cases with closer quantitative agreement. AdsMind thus delivers reliability, self-reflection, and interpretability simultaneously, supporting more DFT-informed autonomous chemistry workflows.

2606.18290 2026-06-18 cond-mat.stat-mech cs.LG eess.SP 新提交

Stochastic Thermodynamics and SDE-based Generative Models

随机热力学与基于SDE的生成模型

Yaowen Zhang

发表机构 * GitHub

AI总结 本文在随机热力学框架下,为基于SDE的生成模型(如扩散模型和薛定谔桥)定义了轨迹层面的功、热和熵产生,并推广了Jarzynski恒等式和类第二定律不等式。

详情
AI中文摘要

基于SDE的生成模型,包括扩散模型和薛定谔桥,在信号处理任务中有着广泛的应用,如语音增强、图像恢复和时间序列生成。本文在随机热力学的背景下为这类模型提出了一个建模框架。本文的主要结果是功、热和熵产生的轨迹层面定义,以及一个推广的Jarzynski恒等式和一个类第二定律不等式。所提出的框架扩展了原始的Jarzynski设置,以适应时间依赖的浴温和非保守驱动力。这种热力学视角可能从非平衡统计力学的角度加深我们对扩散模型和薛定谔桥的理解。

英文摘要

SDE-based generative models, including diffusion models and the Schrödinger bridge, have found broad applications in signal processing tasks such as speech enhancement, image restoration, and time-series generation. This note presents a modeling framework for such models within the context of stochastic thermodynamics. The main results of this note are trajectory-level definitions of work, heat, and entropy production, along with a generalized Jarzynski identity and a second-law-like inequality. The proposed framework extends the original Jarzynski setup to accommodate time-dependent bath temperature and nonconservative driving forces. This thermodynamic perspective may deepen our understanding of diffusion models and the Schrödinger bridge from a nonequilibrium statistical mechanics viewpoint.
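For reference, the original Jarzynski setup that the note generalizes can be written as follows. This is the classical equality for a system driven between two equilibrium states at a fixed bath temperature, not the paper's generalized identity; the symbols $\beta = 1/(k_B T)$, $W$ (work along a trajectory), and $\Delta F$ (free-energy difference) follow the usual textbook convention and are assumptions about notation.

```latex
\[
  \left\langle e^{-\beta W} \right\rangle = e^{-\beta \Delta F},
  \qquad\text{and, by Jensen's inequality,}\qquad
  \langle W \rangle \ge \Delta F .
\]
```

The second relation is the second-law-type statement implied by the equality; the framework in the note extends this picture to time-dependent bath temperature and nonconservative driving forces.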

2606.19329 2026-06-18 astro-ph.IM cs.LG 新提交

The Chandra-Gaia Catalog of Counterparts: Resolving ambiguous Gaia matches to X-ray sources in the Chandra Source Catalog using Machine Learning

V. Samuel Pérez-Díaz, Vinay L. Kashyap, Joshua D. Ingram, David Fouhey, Juan Rafael Martínez-Galarza, Pavlos Protopapas, Jeremy J. Drake, Dong-Woo Kim, Cecilia Garraffo

Affiliations: Center for Astrophysics Harvard & Smithsonian, 60 Garden St, Cambridge MA 02138, USA; Harvard John A. Paulson School of Engineering; Universidad del Rosario, School of Engineering, Science; The NSF AI Institute for Artificial Intelligence; New York University, Courant Institute, 60 5th Avenue, New York NY, USA; Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213; New College of Florida, 5800 Bayshore Road, Sarasota, FL 34243, USA; Astrophysics Laboratory, 3251 Hanover St, Palo Alto, CA 94304, USA

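AI summary: Proposes a machine-learning framework that uses source properties (magnitude, color, distance) to resolve cross-matching ambiguities between the Chandra Source Catalog and Gaia. It finds counterparts for ~113k X-ray sources and finds no counterpart for ~20k sources that separation-based matching would pair, attributing about half of these to chance coincidences.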

Comments Accepted to The Astrophysical Journal. Website: https://www.samuelperezdi.com/chandragaia/

English abstract

We present a framework to cross-match sources from the Chandra Source Catalog (CSC v2.1) with optical sources from Gaia Data Release 3. Unlike purely spatial approaches, we use source properties such as magnitudes, colors, and distances to identify true counterparts, detect chance coincidences, and resolve ambiguities when multiple plausible candidates exist. We define a training set of high-confidence matches using NWAY, a Bayesian cross-matching framework that accounts for positional errors and source densities. We train a gradient-boosted classifier (LightGBM) on a variety of features from both catalogs. Of the ~254k unique X-ray sources, we find counterparts for ~113k sources, of which plausible multiple counterparts are found for ~7k. We find no counterparts for ~20k sources for which separation-based cross-matching does find a match, and attribute half of these to chance coincidences. We validate the pipeline on the Chandra Orion Ultradeep Project (COUP), where the machine-learning matches reproduce 95% of NWAY cross-matches without using any positional information. We release a catalog of the ~113k Chandra-Gaia counterparts, together with ~7k alternative matches and ~20k ambiguous NWAY associations, supporting future population studies of sources detectable by both Chandra and Gaia. We discuss limitations and provide a generalization of the framework that is applicable in other cross-matching scenarios.
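To make the "purely spatial" baseline concrete, here is a minimal sketch of separation-based cross-matching, the approach the abstract contrasts with its property-based classifier. It is not the authors' pipeline: the function names, the 2-arcsecond search radius, and the coordinates are all made up for illustration, and it uses only the Python standard library.

```python
import math

def angular_separation_arcsec(ra1_deg, dec1_deg, ra2_deg, dec2_deg):
    """Great-circle separation between two sky positions, in arcseconds (haversine form)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1_deg, dec1_deg, ra2_deg, dec2_deg))
    d_ra = ra2 - ra1
    d_dec = dec2 - dec1
    a = math.sin(d_dec / 2) ** 2 + math.cos(dec1) * math.cos(dec2) * math.sin(d_ra / 2) ** 2
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a)))) * 3600.0

def candidates_within(xray_pos, optical_sources, radius_arcsec):
    """Return (source_id, separation) for each optical source inside the radius, nearest first."""
    ra_x, dec_x = xray_pos
    hits = []
    for source_id, ra_o, dec_o in optical_sources:
        sep = angular_separation_arcsec(ra_x, dec_x, ra_o, dec_o)
        if sep <= radius_arcsec:
            hits.append((source_id, sep))
    return sorted(hits, key=lambda hit: hit[1])

# Hypothetical optical sources as (id, RA in degrees, Dec in degrees); values are invented.
optical = [
    ("opt-A", 83.8200, -5.3900),
    ("opt-B", 83.8203, -5.3901),
    ("opt-C", 83.9000, -5.3000),
]

# Hypothetical X-ray position and search radius (assumption: 2 arcsec).
matches = candidates_within((83.8201, -5.3900), optical, radius_arcsec=2.0)
for source_id, sep in matches:
    print(f"{source_id}: {sep:.2f} arcsec")
```

With these invented positions, two optical sources fall inside the search radius. Position alone then cannot say which one is the true counterpart, or whether either is a chance coincidence. That is the kind of ambiguity the abstract says source properties such as magnitudes, colors, and distances help resolve.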