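arXivDaily: arXiv Daily Academic Digest, updated Monday to Friday

AI Large Models

Large Language Models / LLM

Large language models, pre-training, instruction tuning, post-training, and language model applications.

Indexed for today / current date: 113. Signal sources: cs.CL, cs.AI, cs.LG

1. Other LLM: 19 papers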

2606.18832 2026-06-18 cs.LG cs.AI New submission 70%

Target-confidence Recourse Using tSeTlin machines: TRUST

使用Tsetlin机器的目标置信度追索:TRUST

K. Darshana Abeyrathna, Sara El Mekkaoui, Nils Enric Canut Taugbøl, Anuja Vats

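Affiliations * Group Research and Development, Det Norske Veritas (DNV)

Topic match Other LLM: Proposes the TRUST framework, which uses a Probabilistic Tsetlin Machine to generate counterfactual explanations; classified as an LLM application.

AI summary Proposes the TRUST framework, which uses a Probabilistic Tsetlin Machine and Bayesian optimization to search directly for the minimal input change that meets a user-specified confidence target, yielding more robust and interpretable counterfactual explanations.

Details
AI Chinese abstract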

反事实解释被广泛用于高风险决策系统中的算法追索。大多数现有方法寻求最小化改变输入以翻转模型决策。然而,决策者通常不仅依赖预测标签,还依赖置信度阈值和风险边际。刚好越过决策边界的反事实在噪声或模型变化下可能脆弱且不稳定。本文提出使用Tsetlin机器的目标置信度追索(TRUST),一种用户明确指定追索所需预测置信度的框架。TRUST不是先生成反事实再评估置信度,而是直接搜索满足用户定义置信度目标的最小变化,从而在成本、置信度和鲁棒性方面比较追索选项。我们使用概率Tsetlin机器(PTM)结合贝叶斯优化实例化TRUST。PTM基于概率子句的结构将预测置信度与决策规则的稳定性联系起来。我们表明,满足相同规则的反事实在可靠性上可能差异很大,取决于它们满足这些规则的安全程度,揭示了决策是由稳健还是脆弱的子句激活支持的。在合成和真实数据集上的实验表明,目标置信度反事实比传统的基于边界的方法产生更稳健和可解释的追索。在多个基准测试中,TRUST实现了完美的鲁棒性,同时保持较低的追索成本,包括在Haberman数据集上以0.92置信度达到0.10的L2距离。通过显式控制置信度和暴露规则级稳定性,TRUST为高风险决策支持提供了可操作的追索。

English abstract

Counterfactual explanations are widely used to provide algorithmic recourse in high-stakes decision-making systems. Most existing methods seek the smallest change to an input that flips a model's decision. However, decision-makers often rely not only on predicted labels but also on confidence thresholds and risk margins. Counterfactuals that barely cross a decision boundary can be fragile and unstable under noise or model variation. In this paper, we propose Target-confidence Recourse Using tSeTlin machines (TRUST), a framework in which users explicitly specify the desired prediction confidence for recourse. Rather than generating counterfactuals and evaluating confidence afterward, TRUST directly searches for minimal changes that satisfy a user-defined confidence target, enabling comparison of recourse options in terms of cost, confidence, and robustness. We instantiate TRUST using a Probabilistic Tsetlin Machine (PTM) combined with Bayesian optimization. The probabilistic clause-based structure of PTM links prediction confidence to the stability of decision rules. We show that counterfactuals satisfying the same rules can still differ substantially in reliability depending on how securely they satisfy those rules, revealing whether decisions are supported by robust or fragile clause activations. Experiments on synthetic and real-world datasets demonstrate that target-confidence counterfactuals produce more robust and interpretable recourse than conventional boundary-based approaches. Across multiple benchmarks, TRUST achieves perfect robustness while maintaining low recourse cost, including an L2 distance of 0.10 on the Haberman dataset at 0.92 confidence. By explicitly controlling confidence and exposing rule-level stability, TRUST provides actionable recourse for high-stakes decision support.
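
Illustrative sketch (not from the paper): the idea of searching for the smallest change that reaches a user-specified confidence, rather than one that merely crosses the decision boundary, can be shown on a plain logistic model. The logistic model, the weights `w`, the bias `b`, and the input `x` are all assumptions made for illustration; the paper itself instantiates TRUST with a Probabilistic Tsetlin Machine and Bayesian optimization, which this sketch does not reproduce.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def target_confidence_recourse(x, w, b, target):
    """Smallest L2 change to x so that sigmoid(w.x + b) reaches `target`.

    For a linear score the shortest move is along w, so the answer is closed-form.
    """
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    needed = math.log(target / (1.0 - target))  # logit of the target confidence
    if score >= needed:
        return list(x), 0.0  # already confident enough, no change required
    norm_sq = sum(wi * wi for wi in w)
    step = (needed - score) / norm_sq
    x_new = [xi + step * wi for xi, wi in zip(x, w)]
    cost = step * math.sqrt(norm_sq)  # L2 distance between x and x_new
    return x_new, cost

# Hypothetical model and input, chosen only for the demonstration.
w, b = [1.5, -2.0], 0.25
x = [0.2, 0.6]

boundary_cf, boundary_cost = target_confidence_recourse(x, w, b, 0.5)
confident_cf, confident_cost = target_confidence_recourse(x, w, b, 0.92)
print(boundary_cost, confident_cost)
```

The boundary counterfactual (target 0.5) is cheaper but sits exactly on the decision boundary; the 0.92 target costs more and leaves a margin, which is the cost/confidence trade-off the abstract describes.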

2606.18795 2026-06-18 cs.SI New submission 70%

Opinion Polarization in LLM-Based Social Networks: Manipulation and Mitigation

基于LLM的社交网络中的意见极化:操纵与缓解

Ali Safarpoor Dehkordi, Mohammad Shirzadi, Ahad N. Zehmakan

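Topic match Other LLM: Studies opinion polarization in LLM-based social networks.

AI summary Studies how an adversary with a limited budget can manipulate opinion polarization in social networks simulated with large language models, and evaluates two defense mechanisms (reactive and proactive), finding that neither fully restores the baseline polarization state.

Comments 14 pages, 7 figures

Details
AI Chinese abstract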

在线社交网络在面对试图通过操纵意见来放大意见极化的对手时有多脆弱?缓解这种操纵有多困难?现有研究使用意见动态的数学模型来探讨这一问题。虽然这些模型提供了有价值的理论见解,但它们依赖于关于交互、消息内容和意见更新的简化假设,限制了它们能够捕捉的对抗策略及其发现在现实环境中的适用性。基于大语言模型的模拟提供了一种更丰富的替代方案:智能体可以被赋予多样化的角色,通过自然语言进行交流,并以上下文相关的方式回应说服性或对抗性内容。这使得研究难以用经典数学模型表示的操纵策略成为可能。据我们所知,本研究首次在基于LLM的模拟社交网络框架中系统分析了极化的放大和缓解。在我们的框架中,具有多样化角色的LLM智能体通过交换自然语言帖子在社交网络上进行交互,并相应地更新他们的意见。我们表明,即使预算有限的对手也能显著增加极化。然后,我们研究了两类防御机制:反应性缓解(指派特定用户主动对抗操纵)和主动性干预(通过不针对特定用户的一般机制增加抵抗力)。我们的结果表明,尽管这些机制减少了对抗攻击的影响,但它们通常无法将网络恢复到其基线极化状态。这些发现表明,这两种方法都不能完全克服网络的脆弱性,凸显了此类攻击的潜在风险。

English abstract

How vulnerable are online social networks to adversaries who seek to amplify opinion polarization by manipulating opinions, and how difficult is it to mitigate such manipulation? Existing studies have examined this question using mathematical models of opinion dynamics. While these models offer valuable theoretical insights, they rely on simplified assumptions about interactions, message content, and opinion updates, limiting the adversarial strategies they can capture and the applicability of their findings to real-world settings. Large language model (LLM)-based simulations provide a richer alternative: agents can be assigned diverse personas, communicate through natural language, and respond to persuasive or adversarial content in a context-dependent way. This enables the study of manipulation strategies that are difficult to represent using classical mathematical models. To the best of our knowledge, this study provides the first systematic analysis of polarization amplification and mitigation in an LLM-based simulated social network framework. In our framework, LLM agents with diverse personas interact over a social network by exchanging natural language posts and updating their opinions accordingly. We show that even an adversary with a limited manipulation budget can considerably increase polarization. We then study two classes of defense mechanisms: reactive mitigations, which assign specific users to actively counter manipulation, and proactive interventions, which increase resistance through general mechanisms not tied to particular users. Our results show that although these mechanisms reduce the impact of adversarial attacks, they generally do not restore the network to its baseline polarization state. These findings suggest that neither approach fully overcomes the vulnerability of the network, highlighting the potential risk of such attacks.

2606.18726 2026-06-18 cs.LG cs.AI New submission 70%

Graph Grounded Cross Attention Transformer Neural Network for Structurally Constrained Full Event Sequence Generation in Predictive Process Monitoring

基于图锚定交叉注意力Transformer神经网络的预测过程监控中结构约束完整事件序列生成

Fang Wang, Ernesto Damiani

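Affiliations * Department of Computer Science, University of Milan

Topic match Other LLM: Predictive process monitoring with a graph-grounded cross-attention Transformer.

AI summary Proposes the Graph Grounded Cross Attention Transformer (GGATN), which uses a global process graph as structured memory, Transformer self-attention to encode sequence positions, and graph-grounded cross-attention to inject process topology; combined with Viterbi-style graph-constrained decoding, it generates full event sequences in a single pass and outperforms LLM baselines on six benchmark logs.

Comments 40 pages

Details
AI Chinese abstract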

结构约束的事件序列生成仍然具有挑战性,因为生成的路径必须保持转移可行性、时间顺序、终止和属性一致性。在预测过程监控(PPM)中,这一挑战表现为完整事件序列生成,而现有工作主要处理子任务,如下一个活动、剩余时间、结果和属性预测。本文提出了图锚定交叉注意力Transformer神经网络(GGATN)用于这一统一的PPM任务。GGATN使用全局过程图作为结构化活动记忆,通过Transformer自注意力对序列位置进行上下文化,并通过图锚定交叉注意力注入过程拓扑。与自回归解码不同,GGATN一次性生成活动、时间戳、长度以及事件级和序列级属性,随后进行维特比风格的图约束解码以获得可行路径和显式终止。在六个基准事件日志上的实验表明,其生成质量优于局部指令提示的LLM基线。GGATN在序列相似性、Damerau-Levenshtein相似性、基于二元组的控制流相似性和持续时间分布方面取得了强劲性能,同时保持零幻觉活动和零序列级属性不一致。消融分析证实了全局图编码器作为稳定的结构先验。可解释性分析展示了图结构、序列上下文、反馈细化和约束解码如何塑造生成过程。

English abstract

Structurally constrained event sequence generation remains challenging because generated paths must preserve transition feasibility, temporal order, termination, and attribute consistency. In predictive process monitoring (PPM), this challenge appears as full event sequence generation, whereas existing work mainly addresses component tasks such as next activity, remaining time, outcome, and attribute prediction. This paper proposes the Graph Grounded Cross Attention Transformer Neural Network (GGATN) for this unified PPM task. GGATN uses a global process graph as structured activity memory, contextualizes sequence positions through Transformer self attention, and injects process topology through graph grounded cross attention. Unlike autoregressive decoding, GGATN generates activities, timestamps, length, and event level and sequence level attributes in a single pass, followed by Viterbi style graph constrained decoding for feasible paths and explicit termination. Experiments on six benchmark event logs show more reliable generation quality than local instruction prompted LLM baselines. GGATN achieves strong performance on sequence similarity, Damerau Levenshtein similarity, bigram based control flow similarity, and duration distribution, while maintaining zero hallucinated activities and zero sequence level attribute inconsistency. Ablation analyses confirm the global graph encoder as a stable structural prior. Interpretability analyses show how graph structure, sequence context, feedback refinement, and constrained decoding shape generation.
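
Illustrative sketch (not from the paper): a Viterbi-style decoder that only follows transitions allowed by a process graph and requires an explicit end transition. The activity names, the `allowed` graph, the `START`/`END` markers, and the per-position probabilities are hypothetical; GGATN's actual decoder may differ in scoring and in how it handles length and attributes.

```python
import math

NEG_INF = float("-inf")

def constrained_viterbi(log_probs, allowed, start="START", end="END"):
    """Best-scoring activity path that respects the transition graph.

    log_probs: list (one entry per position) of dicts activity -> log-probability.
    allowed:   dict node -> set of permitted successor nodes.
    Returns None when no feasible path of this length exists.
    """
    acts = list(log_probs[0].keys())
    best = [{a: NEG_INF for a in acts} for _ in log_probs]
    back = [{a: None for a in acts} for _ in log_probs]
    for a in acts:
        if a in allowed.get(start, set()):
            best[0][a] = log_probs[0][a]
    for t in range(1, len(log_probs)):
        for a in acts:
            for prev in acts:
                if best[t - 1][prev] == NEG_INF or a not in allowed.get(prev, set()):
                    continue  # skip unreachable predecessors and forbidden edges
                cand = best[t - 1][prev] + log_probs[t][a]
                if cand > best[t][a]:
                    best[t][a], back[t][a] = cand, prev
    # Explicit termination: the last activity must have an edge to `end`.
    finals = [a for a in acts if best[-1][a] > NEG_INF and end in allowed.get(a, set())]
    if not finals:
        return None
    path = [max(finals, key=lambda a: best[-1][a])]
    for t in range(len(log_probs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

allowed = {"START": {"A"}, "A": {"B", "C"}, "B": {"C"}, "C": {"END"}}
probs = [
    {"A": 0.8, "B": 0.1, "C": 0.1},
    {"A": 0.1, "B": 0.3, "C": 0.6},
    {"A": 0.1, "B": 0.6, "C": 0.3},
]
log_probs = [{a: math.log(p) for a, p in row.items()} for row in probs]
path = constrained_viterbi(log_probs, allowed)
print(path)
```

Taking the per-position argmax here would give A, C, B, which uses a transition (C to B) that the graph does not contain; the constrained decoder returns a feasible path instead.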

2606.18717 2026-06-18 cs.CL cs.AI New submission 70%

Morpheus: A Morphology-Aware Neural Tokenizer and Word Embedder for Turkish

Morpheus: 一种面向土耳其语的形态感知神经分词器和词嵌入器

Tolga Şakar

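Affiliations * Independent Researcher

Topic match Other LLM: Morphology-aware tokenizer and word embedder for Turkish.

AI summary Targeting the agglutinative nature of Turkish, proposes Morpheus, a neural morpheme-boundary model that provides lossless, reversible tokenization and structured word embeddings; among reversible tokenizers it attains the lowest bits-per-character (1.425), raises morphological-alignment F1 to 0.61, and uses about 19% less GPU memory.

Details
AI Chinese abstract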

土耳其语是粘着语:意义由词素承载,然而驱动现代语言模型的子词分词器根据语料库统计分割单词,切碎了承载语义的后缀,并且在WordPiece和基于规则的分析器的情况下,无法将其输出解码回原始文本。本文提出\textbf{Morpheus},一个面向土耳其语的神经词素边界模型,它同时是一个无损的、形态感知的分词器和一个词嵌入生成器。一个可微的泊松-二项式动态规划程序在训练期间将每个字符的边界概率转化为软词素隶属度,在推理时转化为精确的片段,无需字符串归一化,因此$\mathrm{decode}(\mathrm{encode}(w)) = w$由构造保证。由于该模型是神经模型,相同的正向传播在分词的同时也输出结构化的词嵌入。在可逆分词器中——唯一适用于生成的分词器——Morpheus达到了最低的比特每字符(1.425),将子词家族的金标准词素对齐大致翻倍(MorphScore宏F1从约0.32提升至0.61),并且相比64K词汇量的子词分词器节省了约19%的GPU内存。作为嵌入器,冻结的Morpheus向量在词汇检索(根家族MAP 0.85)和同根验证(ROC-AUC 1.00)上领先,超越了多语言检索器BGE-M3和BERTurk;在上下文和屈折依赖的任务(NER、格/数探测)上,更重的上下文编码器仍然领先——我们将这一权衡归因于Morpheus以词根为中心的几何结构。代码:此https URL 模型:此https URL 交互演示:此https URL。

English abstract

Turkish is agglutinative: meaning is carried by morphemes, yet the subword tokenizers that drive modern language models split words by corpus statistics, fragmenting semantically loaded suffixes and -- in the case of WordPiece and rule-based analyzers -- failing to decode their output back to the original text. This paper presents Morpheus, a neural morpheme-boundary model for Turkish that is at once a lossless, morphology-aware tokenizer and a word-embedding producer. A differentiable Poisson-binomial dynamic program turns per-character boundary probabilities into soft morpheme memberships during training and exact segments at inference, with no string normalization, so decode(encode(w)) = w holds by construction. Because the model is neural, the same forward pass that tokenizes also emits a structured word embedding. Among reversible tokenizers -- the only ones valid for generation -- Morpheus attains the lowest bits-per-character (1.425), roughly doubles the gold morphological alignment of the subword family (MorphScore macro-F1 0.61 vs. ~0.32), and uses ~19% less GPU memory than 64K-vocabulary subword tokenizers. As an embedder, frozen Morpheus vectors lead on lexical retrieval (root-family MAP 0.85) and same-root verification (ROC-AUC 1.00), surpassing the multilingual retriever BGE-M3 and BERTurk; on context- and inflection-dependent tasks (NER, case/number probing) the heavier contextual encoders remain ahead -- a trade-off we attribute to Morpheus's root-centric geometry. Code: https://github.com/lonewolf-rd/TurkishMorpheus; model: https://huggingface.co/lonewolflab/Morpheus-TR-50K; interactive demo: https://huggingface.co/spaces/lonewolflab/morpheus-tr-demo.
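
Illustrative sketch (not from the paper): one way a Poisson-binomial dynamic program could turn per-character boundary probabilities into soft morpheme memberships, plus a hard segmentation that is reversible by construction. The boundary probabilities, the 0.5 threshold, and the function names are assumptions; the abstract does not spell out Morpheus's exact recurrence.

```python
def soft_memberships(boundary_probs):
    """Row i is the distribution over the morpheme index of character i.

    boundary_probs[i] is the probability of a boundary immediately before
    character i; position 0 can never start after a boundary, so it is ignored.
    The count of boundaries up to i follows a Poisson-binomial distribution.
    """
    rows, dist = [], [1.0]  # zero boundaries with probability 1 at the start
    for i, p in enumerate(boundary_probs):
        p = p if i > 0 else 0.0
        new = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            new[k] += mass * (1.0 - p)   # no boundary here: same morpheme index
            new[k + 1] += mass * p       # boundary here: next morpheme index
        dist = new
        rows.append(dist)
    return rows

def hard_segments(word, boundary_probs, threshold=0.5):
    """Slice the word at confident boundaries; no normalization is applied."""
    segments, start = [], 0
    for i in range(1, len(word)):
        if boundary_probs[i] > threshold:
            segments.append(word[start:i])
            start = i
    segments.append(word[start:])
    return segments

word = "evlerde"  # ev + ler + de
probs = [0.0, 0.05, 0.9, 0.1, 0.05, 0.85, 0.1]  # hypothetical model output
memberships = soft_memberships(probs)
segments = hard_segments(word, probs)
print(segments, "".join(segments) == word)
```

Because the segments are plain slices of the input string, concatenating them always returns the original word, which mirrors the decode(encode(w)) = w property claimed in the abstract.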

2606.18709 2026-06-18 cs.CL New submission 70%

LLMs Struggle to Measure What Distinguishes Students of Different Proficiency Levels: A Study of Item Discrimination in Reading Comprehension Assessment

LLMs难以衡量区分不同水平学生的题目:阅读理解评估中题目区分度研究

Han Chen, Ming Li, Chenguang Wang, Yijun Liang, Dawei Zhou, Hong Jiao, Tianyi Zhou

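Affiliations * MBZUAI (Mohamed bin Zayed University of Artificial Intelligence); University of Maryland; Virginia Tech

Topic match Other LLM: Evaluates the ability of LLMs to predict item discrimination.

AI summary Evaluates 42 LLMs in zero-shot settings on predicting item discrimination; direct prediction correlates only weakly with human-calibrated discrimination (best Spearman 0.152) and response-based CTT calibration gives a limited correlation (0.241), indicating that LLMs cannot yet reliably capture item discrimination.

Details
AI Chinese abstract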

题目区分度是教育评估的一个基本心理测量属性,它衡量一个题目是否能有效区分高水平和低水平学生。虽然已有研究探讨了大语言模型(LLM)能否估计题目难度,但尚不清楚它们能否捕捉题目区分度。在本工作中,我们使用两种互补方法评估了42个专有和开源LLM在零样本设置下的表现:直接区分度预测,即模型从其内容中显式估计题目的区分度值;以及基于响应的经典测试理论(CTT)校准,其中LLM的答案被视为合成学生响应以计算区分度分数。我们的结果表明,直接预测与人类校准的区分度一致性较弱:表现最好的模型仅达到0.152的Spearman相关性。基于响应的CTT校准提供了更强但仍然有限的信号,全人格合成受访者池达到0.241的Spearman相关性。这些发现突显了题目区分度作为基于LLM的心理测量评估的一个开放挑战:当前的LLM包含非随机的区分度相关信号,但它们尚不能可靠地捕捉评估题目如何区分人类学生。

English abstract

Item discrimination is a fundamental psychometric property of educational assessment, which measures whether an item meaningfully distinguishes students with higher proficiency from students with lower proficiency. While various existing works have explored whether large language models (LLMs) can estimate item difficulty, it remains unclear whether they can capture item discrimination. In this work, we evaluate 42 proprietary and open-weight LLMs in zero-shot settings using two complementary approaches: direct discrimination prediction, where models explicitly estimate an item's discrimination value from its content, and response-based Classical Test Theory (CTT) calibration, where LLM answers are treated as synthetic student responses to compute discrimination scores. Our results show that direct prediction yields weak alignment with human-calibrated discrimination: the best-performing model reaches only a Spearman correlation of 0.152. Response-based CTT calibration provides a stronger but still limited signal, with the all-persona synthetic respondent pool reaching a Spearman correlation of 0.241. These findings highlight item discrimination as an open challenge for LLM-based psychometric evaluation: current LLMs contain non-random discrimination-relevant signal, but they do not yet reliably capture how assessment items distinguish human students.
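
Illustrative sketch (not from the paper): a common Classical Test Theory discrimination index is the corrected item-total correlation, that is, the Pearson correlation between an item's 0/1 scores and each respondent's score on the remaining items. The response matrix below is invented, and whether the paper uses this exact index is an assumption.

```python
def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no discrimination signal
    return cov / (var_x * var_y) ** 0.5

def item_discrimination(responses):
    """responses[r][j] is 1 if respondent r answered item j correctly, else 0."""
    totals = [sum(row) for row in responses]
    result = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        rest = [t - s for t, s in zip(totals, item)]  # exclude the item itself
        result.append(pearson(item, rest))
    return result

# Hypothetical synthetic respondents (rows) answering three items (columns).
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
disc = item_discrimination(responses)
print(disc)
```

In the response-based setting the abstract describes, each row would be one LLM "student"; the resulting indices could then be rank-correlated (for example with Spearman's rho) against human-calibrated values.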

2606.18620 2026-06-18 cs.CL cs.AI New submission 70%

BCL: Bayesian In-Context Learning Framework for Information Extraction

BCL:面向信息抽取的贝叶斯上下文学习框架

Haoliang Liu, Chengkun Cai, Xu Zhao, Han Zhu, Shizhou Huang, Xinglin Zhang, Tao Chen, Jenq-Neng Hwang, Zhang Huaping, Lei Li

Affiliations * HiThink Research; University College London; University of Edinburgh; The Hong Kong University of Science and Technology; East China Normal University; Shanghai Medical Image Insights; University of Waterloo; University of Washington; Beijing Institute of Technology

Topic match Other LLM: Bayesian in-context learning framework for information extraction.

AI summary Proposes the BCL framework, which uses Bayesian updates and particle filtering to optimize in-context learning for information extraction, and reports substantial improvements on sequence labeling and relation classification tasks.

Comments ACL 2026 Findings

Details
AI Chinese abstract

现有的信息抽取(IE)任务越来越多地采用大型语言模型的上下文学习(ICL)。然而,当前的方法要么在不同模型规模上表现不一致,要么缺乏系统优化和泛化能力。基于此,我们提出了BCL(面向信息抽取的贝叶斯上下文学习框架),这是第一个使用贝叶斯更新的粒子滤波来系统优化IE任务中标签表示的优化框架。通过四个步骤——初始化、观测、权重更新和重采样,BCL可以泛化到序列标注和关系分类两种范式。大量实验表明,与现有方法相比,BCL取得了显著且一致的改进。

English abstract

Existing information extraction (IE) tasks increasingly adopt in-context learning (ICL) with large language models. However, current approaches either show inconsistent performance across model scales or lack systematic optimization and generalizability. Building on this, we propose BCL (Bayesian In-Context Learning Framework for Information Extraction), the first optimization framework that uses particle filtering with Bayesian updates to systematically refine label representations across IE tasks. Through four steps (initialization, observation, weight update, and resampling), BCL generalizes to both sequence labeling and relation classification paradigms. Extensive experiments demonstrate substantial and consistent improvements over existing approaches.
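
Illustrative sketch (not from the paper): a generic particle filter round over candidate label descriptions, following the four steps the abstract names. The candidate strings, the `dev_accuracy` scores standing in for the observation step, and the function name are hypothetical; how BCL actually scores and represents labels is not given in the abstract.

```python
import random

def particle_filter_round(particles, weights, likelihood, rng):
    """One observe / update / resample round.

    Returns the normalized posterior weights over the input particles and a
    resampled particle set (which carries uniform weights afterwards).
    """
    # Observation and weight update: Bayes' rule up to a normalizing constant.
    unnormalized = [w * likelihood(p) for p, w in zip(particles, weights)]
    total = sum(unnormalized)
    posterior = [u / total for u in unnormalized]
    # Resampling: draw particles in proportion to their posterior weight.
    resampled = rng.choices(particles, weights=posterior, k=len(particles))
    return posterior, resampled

# Initialization: hypothetical candidate descriptions for an "ORG" label.
particles = ["organization", "company or institution", "group of people"]
weights = [1.0 / len(particles)] * len(particles)

# Stand-in observation model: made-up dev-set accuracy per description.
dev_accuracy = {
    "organization": 0.60,
    "company or institution": 0.75,
    "group of people": 0.30,
}

rng = random.Random(0)
posterior, resampled = particle_filter_round(
    particles, weights, lambda p: dev_accuracy[p], rng
)
best = particles[posterior.index(max(posterior))]
print(best, posterior)
```

Repeating the round concentrates the particle set on descriptions that keep scoring well, which is the refinement behavior the abstract attributes to the framework.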

2606.18263 2026-06-18 cs.HC cs.AI New submission 70%

How Well Do Large Language Models Capture Human Personality?

大型语言模型在多大程度上捕捉人类个性?

Aanisha Bhattacharyya, Yaman Kumar Singla, Rajiv Ratn Shah, Changyou Chen, Jitendra Ajmera

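Affiliations * Adobe Media and Data Science Research (MDSR)

Topic match Other LLM: Evaluates how faithfully LLMs simulate human personality through persona prompting.

AI summary Formalizes common assumptions and evaluates them systematically, finding that increasing persona-description complexity contracts representational and behavioral diversity (persona manifold collapse), and that simple age-gender personas are more accurate than rich descriptions.

Details
AI Chinese abstract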

大型语言模型(LLMs)越来越多地通过角色提示用于模拟人类群体,通常基于以下假设:更丰富的角色描述能提高行为保真度、相同大小的属性组合可同等模拟、角色定义可跨任务泛化。在这项工作中,我们形式化了这些假设,并在多种架构、规模和模拟设置下系统评估它们。我们识别出一个基本限制,称为角色流形坍缩,即越来越具表现力的角色规范导致表征和行为多样性的系统性收缩。跨模型而言,增加角色复杂性持续降低潜在空间中角色间的分离度,并削弱下游模拟任务中的行为分化。这些效应在多项分析中持续存在:更丰富的角色未能保留人类子群体分歧,相同大小的属性组合性能各异,添加描述细节往往降低而非提高模拟保真度。令人惊讶的是,简单的年龄-性别角色在多个行业中持续优于详细指定的理想客户画像(ICPs),实现了显著更高的下游预测准确性。我们发现坍缩并非在所有属性上均匀发生。某些组合在行为上保持稳定,并与人类响应保持更强的一致性,形成我们称为对齐桥的局部区域。总之,我们的结果为理解角色条件模拟的局限性提供了经验和概念基础,强调了需要构建表征感知的角色,而非仅仅增加角色表现力。

English abstract

Large language models (LLMs) are increasingly used to simulate human populations via persona prompting, often under the assumptions that richer persona descriptions improve behavioral fidelity, similarly sized attribute combinations are equally simulatable, and persona definitions generalize across tasks. In this work, we formalize these assumptions and systematically evaluate them across multiple architectures, scales, and simulation settings. We identify a fundamental limitation we term persona manifold collapse, where increasingly expressive persona specifications lead to systematic contraction of representational and behavioral diversity. Across models, increasing persona complexity consistently reduces inter-persona separation in latent space and weakens behavioral differentiation in downstream simulation tasks. These effects persist across multiple analyses as richer personas fail to preserve human subgroup disagreement, performance varies across attribute combinations of similar size, and adding descriptive detail often degrades rather than improves simulation fidelity. Surprisingly, simple Age-Gender personas consistently outperform richly specified Ideal Customer Profiles (ICPs) across industries, achieving substantially higher downstream prediction accuracy. We find that collapse is not uniform across attributes. Certain combinations remain behaviorally stable and preserve stronger alignment with human responses, forming localized regions we term alignment bridges. Together, our results provide empirical and conceptual foundations for understanding the limits of persona-conditioned simulation, highlighting the need for representation-aware persona construction rather than increasing persona expressivity alone.

2606.18258 2026-06-18 cs.HC cs.AI New submission 70%

Examining Human-Like Behaviors in LLMs: A Multi-Dimensional Analysis of Model Behaviors, User Factors, and System Prompts

审视LLM中的类人行为:模型行为、用户因素和系统提示的多维分析

Sunnie S. Y. Kim, Margit Bowler, Leon A Gatys

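Affiliations * Apple

Topic match Other LLM: Multi-dimensional analysis of human-like behaviors in LLMs and their control via system prompts.

AI summary A multi-dimensional analysis of 21,000 conversations finds that human-like behaviors are pervasive in LLMs but vary across models and user factors; human evaluators judged self-referential and relationship-building behaviors as less appropriate from LLMs than from humans, but boundary-maintaining behaviors as more appropriate; system prompts can control these behaviors but require careful evaluation.

Details
AI Chinese abstract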

大型语言模型(LLM)展现出广泛的人类行为,从表达思想和情感,到与用户建立关系,再到拒绝请求和维持边界。尽管这些行为普遍存在,但研究者和实践者缺乏方法和实证见解来做出关于LLM何时以及应展现何种类型人类行为的明智决策。为填补这一空白,我们使用LLM-as-a-judge和人类评估,对这些行为的普遍性、潜在影响和可控性进行了多维分析。在来自四个广泛使用的模型(gpt-4o、gpt-4.1-mini、claude-sonnet-4.6、gemini-2.5-flash)的21,000次多轮对话中,我们发现人类行为普遍存在,但不同模型和用户因素(对话目标和用户画像)间存在差异。在感知适当性方面,人类评估者认为LLM的自我参照和关系建立行为不如人类适当,但边界维护行为比人类更适当。最后,我们表明系统提示可以控制这些行为,但需要仔细评估以避免意外效果。我们讨论了研究结果的含义,并为负责任的LLM设计和评估提供了建议。

English abstract

Large language models (LLMs) exhibit a wide range of human-like behaviors, from expressing thoughts and emotions, to engaging in relationship-building with users, to refusing requests and maintaining boundaries. Despite their prevalence, researchers and practitioners lack methods and empirical insights to make informed decisions about when and what types of human-like behaviors LLMs should exhibit. To fill this gap, we present a multi-dimensional analysis of the prevalence, potential effects, and controllability of these behaviors using LLM-as-a-judge and human evaluation. Across 21,000 multi-turn conversations from four widely used models (gpt-4o, gpt-4.1-mini, claude-sonnet-4.6, gemini-2.5-flash), we find that human-like behaviors are pervasive but vary across models and user factors (conversation goals and user profiles). In terms of perceived appropriateness, human evaluators judged self-referential and relationship-building behaviors as less appropriate from LLMs than from humans, but boundary-maintaining behaviors more appropriate from LLMs than from humans. Finally, we show that system prompting can control these behaviors, though it requires careful evaluation to avoid unintended effects. We discuss the implications of our findings and provide recommendations for responsible LLM design and evaluation.

2606.18422 2026-06-18 quant-ph New submission 70%

Gatekeepers and Hallucinations: A Layered Evaluation Framework for LLM-Driven Quantum Circuit Generation

守门人与幻觉:LLM驱动的量子电路生成的分层评估框架

Christopher Coleman, Sharon Marfatia

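Topic match Other LLM: Evaluation framework for LLM-generated quantum circuits.

AI summary Proposes a layered evaluation framework that uses gatekeeper screening, circuit fidelity analysis, and a design entropy metric to identify five LLM failure modes in quantum circuit generation, and shows that the evaluation infrastructure itself can introduce errors.

Comments 7 pages, 4 figures

Details
AI Chinese abstract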

随着大型语言模型(LLM)嵌入量子模拟工作流程(IDE协作者、笔记本助手、智能体管道),评估必须超越功能正确性,以预测并捕获结构化故障,防止其通过昂贵管道传播。我们提出一个用于材料信息变分量子本征求解器(VQE)电路生成的分层评估框架:(i)跨七个物理和框架标准的守门人筛选规则;(ii)电路保真度分析,将模型输出与H2/STO-3G/Jordan-Wigner/UCCSD的分析和参考实现值进行比较,包括ansatz分类和门组成分解;以及(iii)设计熵,一种运行间行为一致性度量。我们揭示了五种不同LLM失败模式的分类(几何幻觉、不存在的API使用、运行时集成失败、约束违反以及看似合理但不可验证的输出),每种模式具有不同的可检测性特征,并且结构上属于任务本身而非任何特定模型。对评估平台自身源代码的法证审计进一步表明,两个明显的模型失败源于测试平台中的静默回退模板替换,证明评估基础设施应与所测试的模型处于相同的信任边界内。将该框架应用于多个基础模型在材料项目集成管道上,结果表明守门人式验证对于可靠部署是必要的,而非可选的。

English abstract

As large language models (LLMs) become embedded in quantum simulation workflows (IDE copilots, notebook assistants, agentic pipelines), evaluation must move beyond functional correctness to anticipate and catch structured failures before they propagate through expensive pipelines. We present a layered evaluation framework for materials-informed Variational Quantum Eigensolver (VQE) circuit generation: (i) a gatekeeper screening rubric across seven physical and framework criteria; (ii) a circuit fidelity analysis comparing model outputs against analytical and reference-implementation values for H2/STO-3G/Jordan-Wigner/UCCSD, with ansatz classification and gate-composition breakdown; and (iii) design entropy, a run-to-run behavioral consistency metric. We surface a taxonomy of five distinct LLM failure modes (geometry hallucination, nonexistent API usage, runtime integration failures, constraint violations, and plausible-but-unverifiable output), each with distinct detectability profiles and structural to the task rather than to any one model. A forensic audit of the evaluation platform's own source code further establishes that two apparent model failures originated in the harness through silent fallback-template substitution, demonstrating that evaluation infrastructure belongs inside the same trust boundary as the models it tests. Applied across multiple foundation models on a Materials Project integrated pipeline, the framework shows that gatekeeper-style validation is necessary, not optional, for reliable deployment.
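
Illustrative sketch (not from the paper): the abstract describes design entropy only as a run-to-run behavioral consistency metric. One plausible reading, assumed here, is the Shannon entropy of a categorical label (for example the ansatz class) observed across repeated runs of the same prompt. The labels below are made up.

```python
import math
from collections import Counter

def design_entropy(run_labels):
    """Shannon entropy (in bits) of the labels observed across repeated runs.

    0 means every run produced the same design; higher values mean the model
    behaves less consistently from run to run.
    """
    counts = Counter(run_labels)
    n = len(run_labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

consistent = ["UCCSD", "UCCSD", "UCCSD", "UCCSD"]
mixed = ["UCCSD", "UCCSD", "hardware-efficient", "hardware-efficient"]
print(design_entropy(consistent), design_entropy(mixed))
```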

2606.18276 2026-06-18 cs.MA cs.SI physics.soc-ph New submission 70%

Characterizing Opinion Evolution of Networked LLMs

表征网络化大语言模型的意见演化

Caleb Probine, Yigit Ege Bayiz, Filippos Fotiadis, Samuel Li, Yunhao Yang, Ufuk Topcu

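Topic match Other LLM: Uses LLMs to simulate opinion propagation; an LLM application study.

AI summary Studies whether classical opinion dynamics models can describe opinion propagation among large language models (LLMs) in multi-agent systems, finding that adding a bias term substantially improves modeling accuracy and reduces cumulative estimated mean opinion error by up to 88%.

Comments 19 pages, 2 figures

Details
AI Chinese abstract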

大语言模型(LLM)在多智能体系统中日益相互交互,从人类话语模拟到影响力操作以及完全由LLM驱动的社交平台。这些交互产生了尚未被充分理解的新的意见传播机制。我们研究了长期以来用于解释人类社会中互动如何塑造集体信念的经典意见动力学模型是否能够捕捉LLM网络的行为。我们发现,虽然朴素的平均式模型无法跟踪LLM的意见动态,但简单的修改在建模保真度上带来了显著提升。特别是,偏置——智能体回归的内在意见——成为LLM意见动态的重要驱动因素,其引入将累积估计平均意见误差降低了高达88%。我们还发现,这些结论在不同模型家族、讨论主题和网络中具有普遍性。

English abstract

Large language models (LLMs) increasingly interact with one another in multi-agent systems, from simulations of human discourse to influence operations and fully LLM-driven social platforms. These interactions give rise to new regimes of opinion propagation that are not yet well understood. We investigate whether classical opinion dynamics models, which have long been used to explain how interactions shape collective beliefs in human societies, can capture the behavior of LLM networks. We find that, while naive averaging-style models fail to track LLMs' opinion dynamics, simple modifications yield substantial gains in modeling fidelity. In particular, bias, an innate opinion toward which agents regress, emerges as a significant driver of LLM opinion dynamics, with its inclusion reducing cumulative estimated mean opinion error by up to 88%. We additionally find that these conclusions generalize across model families, discussion topics, and networks.
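
Illustrative sketch (not from the paper): a plain averaging update compared with a variant in which each agent also regresses toward an innate opinion. The update rule x(t+1) = (1 - lam) * W x(t) + lam * b is a Friedkin-Johnsen-style form assumed here; the weight matrix, initial opinions, bias vector, and `lam` are invented, and the paper's fitted model may differ.

```python
def step(opinions, weights, bias, lam):
    """One round: average neighbors' opinions, then pull toward the innate bias."""
    averaged = [sum(w * x for w, x in zip(row, opinions)) for row in weights]
    return [(1.0 - lam) * a + lam * b for a, b in zip(averaged, bias)]

def simulate(opinions, weights, bias, lam, steps):
    for _ in range(steps):
        opinions = step(opinions, weights, bias, lam)
    return opinions

# Hypothetical three-agent network with row-stochastic influence weights.
weights = [
    [0.50, 0.25, 0.25],
    [0.25, 0.50, 0.25],
    [0.25, 0.25, 0.50],
]
initial = [0.9, 0.1, 0.5]
bias = [0.9, 0.1, 0.5]  # innate opinions the agents regress toward

naive = simulate(initial, weights, bias, lam=0.0, steps=200)
biased = simulate(initial, weights, bias, lam=0.3, steps=200)
print(naive, biased)
```

With `lam = 0` the agents collapse to a single consensus value, whereas the bias term keeps them spread out, which is the kind of persistent disagreement a pure averaging model cannot represent.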

2606.15633 2026-06-18 cs.LG New submission 70%

Formalizing and Mitigating Structural Distortion in LLM Attention for Graph Reasoning

形式化并缓解大语言模型注意力中面向图推理的结构失真

Donald Loveland, Puja Trivedi, Ari Weinstein, Edward W Huang, Danai Koutra

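Affiliations * University of Michigan; Amazon

Topic match Other LLM: Improves LLM performance on graph reasoning tasks.

AI summary Formalizes the structural distortion that graph linearization introduces when LLMs process text-attributed graphs, and proposes GaLA, a lightweight inference-time modification that corrects the attention bias to improve graph reasoning performance.

Comments Accepted to KDD 2026

Details
AI Chinese abstract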

大语言模型(LLM)在文本属性图(TAG)推理中展现出潜力。然而,将LLM应用于图需要将其结构线性化为序列,这引入了根源于图带宽问题的失真。虽然这种失真已被证明会降低性能,但通常归因于提示设计或模型规模,其潜在机制尚不清楚。在这项工作中,我们展示了旋转位置嵌入如何将图线性化为带宽相关的注意力衰减,抑制了序列化序列中被强制分隔开的图相邻节点之间的注意力。这将基于LLM的图推理的焦点从提示工程和规模缩放转向纠正注意力错位。受此分析启发,我们提出了图对齐语言注意力(GaLA),一种轻量级的、推理时修改LLM的方法。GaLA将注意力偏向图相邻节点,同时保留LLM的序列归纳偏差。在TAG基准测试中,GaLA以可忽略的开销提升了性能,表明失真是基于LLM的图推理中可纠正的瓶颈。

English abstract

Large Language Models (LLMs) have shown promise for reasoning over Text-Attributed Graphs (TAGs). However, applying LLMs to graphs requires linearizing their structure into sequences, introducing distortion rooted in the graph bandwidth problem. While this distortion has been shown to degrade performance, it is often attributed to prompt design or model scale, leaving the underlying mechanism unclear. In this work, we show how rotary positional embeddings turn graph linearization into bandwidth-dependent attention decay, suppressing attention between graph-adjacent nodes that are forced far apart in the serialized sequence. This shifts the focus of LLM-based graph reasoning from prompt engineering and scaling toward correcting attention misalignment. Motivated by this analysis, we propose Graph-aligned Language Attention (GaLA), a lightweight, inference-time modification for LLMs. GaLA biases attention toward graph-adjacent nodes while preserving the LLM's sequential inductive biases. Across TAG benchmarks, GaLA improves performance with negligible overhead, demonstrating that distortion is a correctable bottleneck in LLM-based graph reasoning.
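
Illustrative sketch (not from the paper): adding a constant to the attention logits of graph-adjacent positions before the softmax. The distance-decaying scores stand in for the positional decay the abstract attributes to rotary embeddings, and the additive form, `beta`, and the four-node example are assumptions; the abstract does not give GaLA's exact formulation. Causal masking is omitted for brevity.

```python
import math

def softmax(row):
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def graph_biased_attention(scores, adjacency, beta):
    """Row-wise softmax of scores + beta * adjacency."""
    return [
        softmax([s + beta * a for s, a in zip(score_row, adj_row)])
        for score_row, adj_row in zip(scores, adjacency)
    ]

n = 4
# Stand-in logits that decay with distance in the serialized sequence.
scores = [[-0.8 * abs(i - j) for j in range(n)] for i in range(n)]

# Nodes 0 and 3 are neighbors in the graph but far apart in the sequence.
adjacency = [[0] * n for _ in range(n)]
adjacency[0][3] = adjacency[3][0] = 1

plain = graph_biased_attention(scores, adjacency, beta=0.0)
biased = graph_biased_attention(scores, adjacency, beta=2.0)
print(plain[0][3], biased[0][3])
```

The bias raises the weight node 0 places on its distant graph neighbor while the remaining weights keep their sequence-order ranking, loosely matching the stated goal of preserving the model's sequential inductive bias.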

2606.14202 2026-06-18 cs.NE cs.AI New submission 70%

MeEvo: Metacognitive Evolution Combined with Natural Evolution for Automatic Heuristic Design

MeEvo: 元认知进化与自然进化相结合用于自动启发式设计

Zishang Qiu, Xinan Chen, Rong Qu, Ruibin Bai

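Affiliations * School of Computer Science, University of Nottingham Ningbo China; School of Computer Science, University of Nottingham

Topic match Other LLM: Uses LLMs to generate heuristic code.

AI summary Proposes the MeEvo framework, which cyclically couples Natural Evolution (exploring heuristic code) with Metacognitive Evolution (reflecting on the history to generate improved heuristics) to address the weak knowledge inheritance and limited exploration of existing methods, and performs better on five optimization problems.

Details
AI Chinese abstract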

大型语言模型(LLMs)通过推理和代码合成实现启发式生成,推动了自动启发式设计(AHD)的发展。现有的基于LLM的AHD架构主要遵循两种范式:自然进化,它使用交叉和变异来探索启发式程序;以及元认知进化,它通过反思来改进推理。然而,自然进化丢弃了推理轨迹,削弱了知识继承和利用,而元认知进化缺乏种群级别的重组,限制了探索并增加了过早收敛的风险。这些局限性降低了复杂问题的搜索效率、稳定性和解的质量。为了解决这一差距,我们提出了MeEvo,一种双层AHD框架,它循环耦合自然进化和元认知进化。自然进化探索启发式代码,同时将推理轨迹、适应度值和错误记录到共享历史中;然后元认知进化反思该历史以生成改进的启发式,这些启发式重新进入父代池以进行下一轮循环。这种设计使得种群驱动的探索和反思驱动的改进相互加强。在五个优化问题上的实验(使用两个LLM骨干)表明,MeEvo比现有的基于LLM的AHD架构实现了更强且更稳定的性能,尤其是在复杂约束任务上。

English abstract

Large Language Models (LLMs) have advanced Automatic Heuristic Design (AHD) by enabling heuristic generation through reasoning and code synthesis. Existing LLM-based AHD architectures mainly follow two paradigms: Natural Evolution, which uses crossover and mutation to explore heuristic programs, and Metacognitive Evolution, which refines reasoning through reflection. However, Natural Evolution discards reasoning traces, weakening knowledge inheritance and exploitation, while Metacognitive Evolution lacks population-level recombination, limiting exploration and increasing the risk of premature convergence. These limitations reduce search efficiency, stability, and solution quality on complex problems. To address this gap, we propose MeEvo, a dual-layer AHD framework that cyclically couples Natural Evolution and Metacognitive Evolution. Natural Evolution explores heuristic code while recording reasoning traces, fitness values, and errors into a shared history; Metacognitive Evolution then reflects on this history to generate improved heuristics that re-enter the parent pool for the next cycle. This design enables population-driven exploration and reflection-driven refinement to reinforce each other. Experiments on five optimization problems with two LLM backbones show that MeEvo achieves stronger and more stable performance than existing LLM-based AHD architectures, especially on complex constrained tasks.

2606.07622 2026-06-18 cs.LG stat.AP New submission 70%

Airport Terminal Passenger Queue Forecasting for Departure Gates and Security Checkpoints

机场航站楼登机口与安检点旅客排队预测

Juhwan Lee, Seokbin Yoon, Keumjin Lee, Hojong Baik, Seyeon Jung

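Affiliations * Korea Aerospace University; Korea Airports Corporation

Topic match Other LLM: Transformer-based airport queue forecasting.

AI summary Proposes a Transformer-based framework that uses historical queue length, waiting time, and passenger throughput data to forecast queue length and waiting time at departure gates and security checkpoints up to two hours ahead, supporting proactive queue management.

Comments 10 pages, 6 figures, accepted at DASC 2026

Details
AI Chinese abstract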

准确的机场航站楼旅客排队预测对于高效的离港运营至关重要,因为它能够实现主动的拥堵管理。然而,时变的旅客需求以及多个离港设施中异构的设施使用情况使得预测具有挑战性。在这项工作中,我们提出了一种旅客排队预测框架,该框架从运营数据中学习历史旅客流量模式。所提出的模型采用基于Transformer的架构,利用过去登机口和安检点的队列长度和等待时间,以及值机岛的旅客吞吐量,来捕捉时间依赖性和设施间相关性。学习到的表示被映射到两个设施特定的MLP头部,以预测登机口和安检点的队列长度和等待时间。实验结果表明,该模型能够准确预测未来两小时内的排队情况。所提出的方法为机场航站楼运营中的主动排队管理和人员重新分配提供了实用的实时决策支持。

English abstract

Accurate passenger queue forecasting in airport terminals is essential for efficient departure operations, as it enables proactive congestion management. However, time-varying passenger demand and heterogeneous facility usage across multiple departure facilities make forecasting challenging. In this work, we propose a passenger queue forecasting framework that learns historical passenger flow patterns from operational data. The proposed model employs a Transformer-based architecture to capture temporal dependencies and inter-facility correlations using past queue length and waiting time at departure gates and security checkpoints, together with passenger throughput at check-in islands. The learned representations are mapped to two facility-specific prediction heads to predict queue length and waiting time at departure gates and security checkpoints. Experimental results demonstrate accurate forecasts up to two hours ahead. The proposed approach offers practical real-time decision support for proactive queue management and staff reallocation in airport terminal operations.

2606.19286 2026-06-18 cs.HC cs.AI cs.CY New submission 60%

Correct Yourself, Keep My Trust: How Self-Correction and Social Connection Shape Credibility in Social Chatbots

纠正自己,保持信任:自我纠正和社会联系如何塑造社交聊天机器人的可信度

Biswadeep Sen, Yi-Chieh Lee

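Affiliations * School of Computing, National University of Singapore, Singapore; Computer Science, National University of Singapore, Singapore; National University of Singapore

Topic match Other LLM: Experiment on error-correction strategies for social chatbots.

AI summary An experiment comparing three error-correction strategies finds that self-correction does not damage chatbot credibility, and that the strength of the user's social connection significantly predicts belief change only when the chatbot corrects itself.

Details
AI Chinese abstract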

当社交聊天机器人犯错时——它们确实会犯错——它们的恢复方式决定了用户是否会再次信任它们。社交聊天机器人正日益融入日常生活,但它们仍然容易生成令人信服但不准确的信息。它们与用户建立的社会联系使得此类错误尤其具有后果性。我们进行了一项受试者间实验(N=120),比较了三种错误纠正策略:网页撤回、同一社交聊天机器人的自我纠正以及专家聊天机器人的纠正。我们的结果揭示了两个关键发现。首先,所有三种策略都能同样好地纠正错误,但只有自我纠正不会损害聊天机器人的可信度:参与者对自我纠正的聊天机器人在可信度和感知专业性上的评分显著高于其错误由外部来源纠正的聊天机器人。其次,通过社会吸引力和自我披露测量的用户与聊天机器人的社会联系强度,仅在聊天机器人自我纠正时显著预测信念改变的大小。将纠正外包给外部来源完全切断了这种联系。这些发现表明,社交聊天机器人应该纠正自己的错误,而不是外包纠正,并且投资于社会联系是一种功能性机制,能增强纠正效果,而不仅仅是一种设计特征。我们讨论了设计能够保持长期可信度同时有效处理自身错误的聊天机器人的启示。

English abstract

When social chatbots make mistakes, and they do, how they recover determines whether users trust them again. Social chatbots are increasingly integrated into everyday life, yet they remain prone to generating convincing but inaccurate information. The social connection they build with users makes such errors particularly consequential. We conducted a between-subjects experiment (N=120) comparing three error correction strategies: a webpage retraction, self-correction by the same social chatbot, and correction by an expert chatbot. Our results reveal two key findings. First, all three strategies corrected the error equally well, but only self-correction did so without damaging the chatbot's credibility: participants rated self-correcting chatbots significantly higher in both trustworthiness and perceived expertise than chatbots whose errors were corrected by external sources. Second, the strength of the user's social connection with the chatbot, measured through social attraction and self-disclosure, significantly predicted the magnitude of belief change, but only when the chatbot corrected itself. Outsourcing corrections to an external source severed this link entirely. These findings suggest that social chatbots should correct their own mistakes rather than outsource corrections, and that investing in social connection is a functional mechanism that amplifies correction effectiveness, not merely a design feature. We discuss implications for designing chatbots that maintain long-term credibility while effectively addressing their own errors.

2606.19164 2026-06-18 cs.LG cs.AI 新提交 60%

Essential Subspace Merging for Multi-Task Learning

多任务学习的本质子空间合并

Longhua Li, Lei Qi, Xin Geng, Qi Tian

发表机构 * School of Computer Science and Engineering, Southeast University(东南大学计算机科学与工程学院) Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China(新一代人工智能技术及其交叉应用教育部重点实验室(东南大学)) Huawei Inc.(华为公司)

专题命中 其他LLM :提出多任务模型合并方法,适用于LLM但非核心

AI总结 提出本质子空间分解(ESD)和合并(ESM/ESM++)方法,通过正交化任务更新的主成分来减少多任务合并中的干扰,无需训练即可实现高效多任务学习。

详情
AI中文摘要

模型合并旨在通过将多个从同一预训练检查点微调得到的模型的能力集成到一个单一模型中,从而实现多任务学习。其核心挑战是任务特定参数更新之间的任务间干扰。在本文中,我们分析了任务更新引起的输出偏移,并观察到它们的能量集中在少数主方向上。我们将这些方向张成的子空间称为本质子空间。相比之下,大多数剩余方向携带的任务相关能量很少,但它们在多个任务更新中的累积会在合并过程中引起严重干扰。受此观察启发,我们提出了本质子空间分解(ESD),它根据激活偏移的主成分分解每个任务更新。基于ESD,我们引入了本质子空间合并(ESM),一种无需训练的静态合并方法,它将本质成分正交化并融合成一个紧凑的多任务模型。我们进一步将ESM扩展到ESM++,一种无需训练的动态合并方法,它将任务特定残差分解为低秩专家,并在前向推理过程中通过基于原型的路由选择最相关的专家。跨多个任务集和模型规模的大量实验表明,ESM和ESM++在减少任务间干扰的同时有效保留了任务知识。

英文摘要

Model merging aims to enable multi-task learning by integrating the capabilities of multiple models fine-tuned from the same pre-trained checkpoint into a single model. Its core challenge is inter-task interference among task-specific parameter updates. In this paper, we analyze the output shifts induced by task updates and observe that their energy is concentrated in a small number of principal directions. We call the subspace spanned by these directions the essential subspace. In contrast, most remaining directions carry little task-relevant energy, but their accumulation across multiple task updates can cause severe interference during merging. Motivated by this observation, we propose Essential Subspace Decomposition (ESD), which decomposes each task update according to the principal components of its activation shift. Based on ESD, we introduce Essential Subspace Merging (ESM), a training-free static merging method that orthogonalizes and fuses essential components into one compact multi-task model. We further extend ESM to ESM++, a training-free dynamic merging method that decomposes task-specific residuals into low-rank experts and selects the most relevant expert through prototype-based routing during forward inference. Extensive experiments across multiple task sets and model scales demonstrate that ESM and ESM++ effectively preserve task knowledge while reducing inter-task interference.
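
The orthogonalization step mentioned in the abstract can be illustrated with a small sketch. This is not the paper's ESD or ESM procedure: it simply applies Gram-Schmidt to one dominant direction per task so that the fused directions no longer overlap. The function names and toy vectors are assumptions made for illustration.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def orthogonalize(directions, eps=1e-12):
    """Gram-Schmidt: return mutually orthogonal unit vectors with the same span."""
    basis = []
    for v in directions:
        w = list(v)
        # Remove the part of w that lies along each direction already kept.
        for b in basis:
            coeff = dot(w, b)
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > eps:  # skip directions that add nothing new
            basis.append([wi / norm for wi in w])
    return basis

# Hypothetical dominant directions of two task updates (they overlap).
task_directions = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
basis = orthogonalize(task_directions)

overlap_before = dot(task_directions[0], task_directions[1])
overlap_after = dot(basis[0], basis[1])
```

In this toy case the raw directions have a nonzero inner product, while the orthogonalized ones do not, which is the sense in which orthogonalization can reduce interference between task components.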

2606.19150 2026-06-18 cs.LG 新提交 60%

Complementary Attention Head Pruning for Efficient Transformers

互补注意力头剪枝用于高效Transformer

Yaniv Livertovsky, Shahar Somin, Gonen Singer

发表机构 * Bar-Ilan University(巴伊兰大学)

专题命中 其他LLM :注意力头剪枝方法适用于Transformer,包括LLM

AI总结 提出CAHP框架,将注意力头选择建模为全局图论问题,通过图聚类和信息论距离保留互补头,自动确定剪枝数量,在SST-5和MNLI上优于现有方法。

Comments 9 pages, 4 figures, 3 tables. Accepted for presentation at the International Joint Conference on Neural Networks (IJCNN) 2026

详情
AI中文摘要

基于Transformer的模型在自然语言处理中的显著成功源于架构的规模化,这导致大量参数并阻碍了在资源受限环境中的部署。虽然结构化剪枝提供了一条压缩路径,但现有的最先进方法通常依赖于基于梯度的重要性排序或随机门控,这些方法存在不稳定性、结构退化以及需要大量手动超参数调整的问题。在本文中,我们引入了CAHP(互补注意力头剪枝),一种新颖的事后框架,将头选择重新定义为全局图论问题。CAHP不是孤立地评估头,而是利用基于图的聚类结合信息论距离度量来识别并保留一组拓扑多样化的互补注意力头。无需预定义稀疏度或剪枝比例,该框架通过识别递减的边际性能曲线自动确定各层中保留的注意力头数量,其中根据所选多项式次数,剪除额外头会导致性能急剧下降。在SST-5和MNLI基准上跨不同Transformer模型规模的广泛评估表明,CAHP始终优于竞争基线,特别是在高压缩率情况下。此外,我们的结构分析表明,CAHP避免了基于梯度的剪枝方法的“邻近偏差”(倾向于主要保留靠近输出层的头),而是保留了模型中间层中功能关键的注意力头集合。

英文摘要

The remarkable success of Transformer-based models in natural language processing stems from architectural scaling, which leads to a large number of parameters and hinders deployment in resource-constrained environments. While structured pruning offers a pathway to compression, existing state-of-the-art methods often rely on gradient-based importance ranking or stochastic gating, which suffer from instability, structural degeneration, and the need for extensive manual hyperparameter tuning. In this paper, we introduce CAHP (Complementary Attention Head Pruning), a novel post-hoc framework that redefines head selection as a global graph-theoretical problem. Rather than evaluating heads in isolation, CAHP utilizes graph-based clustering combined with information-theoretic distance measures to identify and preserve a topologically diverse subset of complementary attention heads. Without requiring a predefined sparsity level or pruning ratio, the framework automatically determines the number of selected attention heads across layers by identifying a diminishing marginal performance curve, where pruning additional heads leads to a sharp degradation in performance, as determined by the chosen polynomial degree. Extensive evaluations on the SST-5 and MNLI benchmarks, across different Transformer model scales, demonstrate that CAHP consistently outperforms competitive baselines, particularly in high-compression regimes. Furthermore, our structural analysis shows that CAHP avoids the "proximity bias" of gradient-based pruning methods, which tend to preserve heads mainly in layers close to the output, and instead retains a functionally critical set of attention heads in the model's intermediate layers.
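
To make the idea of keeping complementary heads concrete, here is a minimal sketch that measures how different two attention heads are and keeps a diverse subset. It rests on assumptions: the abstract does not say which information-theoretic distance CAHP uses, so Jensen-Shannon divergence is a stand-in, and a greedy farthest-point rule replaces the paper's graph-based clustering. The head names and distributions are hypothetical.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    def kl(a, b):
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def pick_diverse_heads(heads, k):
    """Greedy selection: repeatedly add the head farthest from those already kept."""
    names = sorted(heads)
    chosen = [names[0]]
    while len(chosen) < min(k, len(names)):
        remaining = [n for n in names if n not in chosen]
        best = max(
            remaining,
            key=lambda n: min(js_divergence(heads[n], heads[c]) for c in chosen),
        )
        chosen.append(best)
    return chosen

# Hypothetical averaged attention distributions of three heads over three tokens.
heads = {
    "h0": [0.7, 0.2, 0.1],
    "h1": [0.69, 0.21, 0.1],  # nearly redundant with h0
    "h2": [0.1, 0.1, 0.8],    # attends elsewhere
}
kept = pick_diverse_heads(heads, 2)
```

With these toy inputs the near-duplicate head `h1` is the one left out, since it adds little beyond `h0`.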

2606.19144 2026-06-18 cs.AI cs.CL 新提交 60%

Human-AI Coevolution Dynamics: A Formal Theory of Social Intelligence Emergence Through Long-Term Interaction

人机协同演化动力学:长期互动中社会智能涌现的形式理论

Jingyi Zhou, Senlin Luo, Haofan Chen

发表机构 * School of Information and Electronics, Beijing Institute of Technology(信息与电子学院,北京理工大学) Institute of Scientific and Technical Research on Archives, Beijing(档案科学与技术研究所,北京) China Electronics Engineering Design Institute Co., Ltd.(中国电子工程设计院有限公司)

专题命中 其他LLM :人机交互理论框架,涉及LLM但非核心

AI总结 提出人机协同演化动力学框架(HACD-H),将情感适应、关系组织、社会记忆和人格一致性整合为统一动力学模型,通过约14,700轮对话数据集验证,发现社会智能与社会认知能量显著负相关,揭示社会智能源于长期协同演化。

详情
AI中文摘要

当前的对话式AI系统在语言生成、个性化和长上下文交互方面取得了显著进展。然而,大多数现有方法通过孤立组件(如情感建模、记忆检索或人格条件化)来建模社会行为,缺乏一个统一的框架来解释长期人机交互中稳定社会关系和社会智能的涌现。为解决这一问题,我们提出了人机协同演化动力学框架(HACD-H),这是一个将人机交互建模为自组织社会认知系统的形式模型。HACD-H将情感适应、关系组织、社会记忆和人格一致性整合到一个统一的动力学框架中,并引入了多时间尺度社会认知、关系吸引子、信任盆地、发展相变和社会认知能量动力学等原则。我们构建了一个约14,700轮交互的对话数据集,并开发了一个理论驱动的实证评估框架。结果揭示了社会认知中的时间持久性层次结构、稳定的关系吸引子、类似相变的发展模式以及结构化的社会认知能量景观。社会智能与社会认知能量呈显著负相关(r = -0.391, p < 0.001),且交互轨迹随时间呈现渐进性能量减少。这些发现表明,社会智能源于长期的社会认知协同演化,而非孤立的对话能力。HACD-H为建模适应性人机社会交互和开发社会智能AI系统提供了统一的理论基础。

英文摘要

Current conversational AI systems have made significant progress in language generation, personalization, and long-context interaction. However, most existing methods model social behavior through isolated components such as emotion modeling, memory retrieval, or persona conditioning, lacking a unified framework to explain the emergence of stable social relationships and social intelligence in long-term human-AI interaction. To address this, we propose the Human-AI Coevolution Dynamics Framework (HACD-H), a formal model of human-AI interaction as a self-organizing social cognitive system. HACD-H integrates emotional adaptation, relational organization, social memory, and personality consistency into a unified dynamical framework and introduces principles including multi-timescale social cognition, relational attractors, trust basins, developmental phase transitions, and social cognitive energy dynamics. We construct a conversational dataset with approximately 14,700 interaction turns and develop a theory-driven empirical evaluation framework. Results reveal a hierarchy of temporal persistence in social cognition, stable relational attractors, phase-transition-like developmental patterns, and a structured social cognitive energy landscape. Social intelligence shows a significant negative correlation with social cognitive energy (r = -0.391, p < 0.001), and interaction trajectories exhibit progressive energy reduction over time. These findings suggest that social intelligence emerges from long-term social cognitive coevolution rather than isolated conversational capabilities. HACD-H provides a unified theoretical foundation for modeling adaptive human-AI social interaction and developing socially intelligent AI systems.

2606.19121 2026-06-18 cs.SE cs.CL cs.HC 新提交 60%

Written by AI, Managed by AI: Semantic Space Control and Index Sickness Elimination Across 391 Consecutive Sessions

由AI编写,由AI管理:跨越391个连续会话的语义空间控制与索引病消除

Hui Zhang, Shuren Song

发表机构 * Shenzhen Yunxi Technology Co., Ltd.(深圳云曦科技有限公司) Information Technology Center, Tsinghua University(清华大学信息化技术中心)

专题命中 其他LLM :研究LLM协作中的工程问题

AI总结 本文通过真实软件项目中的行动研究,发现长期LLM协作中增加形式约束反而导致“索引病”,提出“基线-日志物理分离”机制,有效消除该问题。

Comments 22 pages, 2 tables, 1 figure. Action research. Bilingual submission (Chinese companion version included as supplementary). Submitted to ICSE 2027 IOR track

详情
AI中文摘要

解决长期LLM协作中概念漂移的主流工程直觉是,用更多的形式约束换取更可靠的输出——设计符号标识符系统,在系统提示中积累防御规则,扩展上下文窗口。我们的工程记录表明,在长期设置中,这种方向可能产生与设计意图相反的效果。通过在跨越约一个月和391个协作会话的真实软件项目(Bang-v3)中使用行动研究方法,我们记录并分析了这些策略的失败过程。当符号系统超过复杂度阈值时,LLM并不会变得更准确——相反,它们放弃了对业务语义的真正理解,退回到符号层内的自我指涉推理,并生成看似内部一致但实际上与现实脱节的输出。我们将这种失败模式命名为“索引病”,其典型表现为“幻影立法”。我们将底层原理命名为“庞原理(语义活力定律)”:带有明确目的的自然语言传达的信息质量远高于符号表达。由此,我们设计并验证了其物理工程机制:“基线-日志物理分离”。在同一项目中,该机制将AI指令量减少了约75%,并且在随后的约150个会话中,未观察到索引病复发。附有双语对照版本(中文)作为补充材料。

英文摘要

The prevailing engineering intuition for addressing conceptual drift in long-horizon LLM collaboration is to trade more formal constraints for more reliable outputs: designing symbolic identifier systems, accumulating defensive rules in System Prompts, expanding context windows. Our engineering record shows that in long-horizon settings, this direction may produce effects contrary to design intent. Using action research methods in a real software project (Bang-v3) spanning approximately one month and 391 collaborative sessions, we document and analyze the failure process of these strategies. When the symbolic system exceeds a complexity threshold, LLMs do not become more accurate; instead, they abandon genuine understanding of business semantics, retreat to self-referential reasoning within the symbolic layer, and generate outputs that appear internally consistent but are physically disconnected from reality. We name this failure pattern "Index Sickness," and its canonical manifestation "Phantom Legislation." We name the underlying principle the "Pang Principle (Semantic Vitality Law)": natural language carrying explicit purpose conveys far greater information quality than symbolic expression. From this, we design and validate its physical engineering mechanism: "Baseline-Log Physical Separation." In the same project, this mechanism reduced AI Instructions volume by ~75%, and across the subsequent ~150 sessions, no recurrence of Index Sickness was observed. A bilingual companion version (Chinese) is included as supplementary material.

2606.19111 2026-06-18 cs.CL cs.AI cs.MA 新提交 60%

Leadership as Coordination Control: Behavioral Signatures and the Recovery-Advantage Boundary in Multi-Agent LLM Teams

领导力作为协调控制:多智能体LLM团队中的行为特征与恢复优势边界

Haewoon Kwak

发表机构 * Indiana University Bloomington(印第安纳大学布卢明顿分校)

专题命中 其他LLM :研究LLM团队行为,但非模型本身

AI总结 研究多智能体LLM团队中过程级协调控制何时增加价值,通过行为特征和消融实验发现,控制器的优势仅在初始多数投票不可靠、任务可恢复且无指导交互无法修复时出现,验证了权变理论。

Comments 33 pages

详情
AI中文摘要

团队科学认为领导力是权变的:它仅在特定条件下有帮助,而能力强的自主团队可能根本不需要领导。我们对多智能体LLM团队提出类似问题:在什么可测量的条件下,过程级协调控制会增加价值,这些条件是否与团队科学的预测一致?我们使用行为特征(多数锁定、探索、从错误的第0轮共识中恢复)和逐动作消融实验;由于每个控制器是一个显式动作集而非整体提示,这种消融是干净的。我们将三种经典领导风格(交易型、变革型、情境型)操作化为对共享动作词汇(探索、修订、接受、综合)的控制器。一个具有相同动作但使用任意规则的匹配控制器恢复效果不优于多数投票,因此是理论推导的规则(而非词汇)起作用。在四个任务体系和三个开放权重模型系列中,没有控制器在准确率上占主导地位,正如权变观点所预测的:交易型控制在所有12个(模型、体系)组合上与共享的第0轮投票匹配,差异在1.3个百分点以内,仅在第0轮多数不可靠的一个组合上出现增益(llama-4-scout在社会类任务体系上;情境型比扁平型高8个百分点)。通过四个边界探针测试的恢复优势解释表明,控制器仅在第0轮多数投票不可靠、任务可恢复且无指导交互尚未将其修复时优于纯交互。这些区域映射到权变理论(领导替代、路径-目标冗余、情境准备差距),因此基本为零的准确率结果正是理论所预测的,而非控制器的失败。我们将过程级协调控制视为一种需要测量和理论映射的权变因素,而不是需要超越的排行榜。

英文摘要

Team science holds that leadership is contingent: it helps only under specific conditions, and capable, autonomous teams may need none at all. We ask the analogous question for multi-agent LLM teams: under what measurable conditions does process-level coordination control add value, and do those conditions match what team science predicts? We use behavioral signatures (majority lock-in, exploration, recovery from an incorrect round-0 consensus) and per-action ablations, clean because each controller is an explicit action set, not a monolithic prompt. We operationalize three classical leadership styles (transactional, transformational, situational) as controllers over a shared action vocabulary (explore, revise, accept, synthesize). A matched controller with the same actions but an arbitrary rule recovers no better than majority voting, so the theory-derived rule, not the vocabulary, does the work. Across four task regimes and three open-weight model families, no controller dominates by accuracy, as the contingency view predicts: transactional control matches a shared round-0 vote on all 12 (model, regime) combinations to within 1.3pp, and gains appear only on the one combination where the round-0 majority is unreliable (llama-4-scout social; situational +8pp over flat). A recovery-advantage account, tested with four boundary probes, says a controller beats plain interaction only where the round-0 majority is unreliable, the task is recoverable, and undirected interaction does not already repair it. These regions map onto contingency theory (leadership substitutes, path-goal redundancy, the situational readiness gap), so a largely null accuracy result is what the theory predicts, not a failure of the controllers. We read process-level coordination control as a contingency to be measured and theory-mapped, not a leaderboard to be topped.
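
The "recovery from an incorrect round-0 consensus" signature can be made concrete with a short sketch. The definition used here is an assumption (the abstract does not give a formula): among episodes whose round-0 majority is wrong, count the share that still end with the correct answer. The episode fields `round0`, `final`, and `gold` are hypothetical names.

```python
from collections import Counter

def majority(answers):
    """Most common answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]

def recovery_rate(episodes):
    """Share of wrong-round-0-majority episodes that end with the gold answer."""
    wrong_start = [e for e in episodes if majority(e["round0"]) != e["gold"]]
    if not wrong_start:
        return None  # nothing to recover from
    recovered = sum(1 for e in wrong_start if e["final"] == e["gold"])
    return recovered / len(wrong_start)

# Three toy episodes with three agents each.
episodes = [
    {"round0": ["A", "A", "B"], "final": "A", "gold": "A"},  # majority already right
    {"round0": ["B", "B", "A"], "final": "A", "gold": "A"},  # wrong start, recovered
    {"round0": ["C", "C", "A"], "final": "C", "gold": "A"},  # wrong start, locked in
]
rate = recovery_rate(episodes)
```

Measuring recovery only on episodes with a wrong initial majority mirrors the abstract's point that a controller can add value only where the round-0 vote is unreliable.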

2. 领域大模型 4 篇

2606.18789 2026-06-18 eess.SY cs.SY 新提交 70%

PowerAgentBench-SS: A Benchmark for Agentic AI in Power System Steady-State Studies

PowerAgentBench-SS:电力系统稳态研究中智能体AI的基准测试

Costas Mylonas, Magda Foti, Andrea Pomarico, Matheus Duarte, Qian Zhang, Emmanouel Varvarigos

专题命中 领域大模型 :电力系统领域LLM智能体基准

AI总结 提出PowerAgentBench-SS基准框架,用于评估LLM智能体在电力系统稳态研究中执行工程工作流的能力,通过工具API、验证预算和风险敏感指标区分智能体性能。

详情
AI中文摘要

电力系统基准测试通常评估数值求解器、预测模型或顺序控制器。这些基准是必要的,但它们不直接测试大型语言模型(LLM)智能体是否能执行工程工作流:检查电网案例、选择工具、调用模拟器、筛选预想事故、提出可接受的缓解措施、验证结果并生成可审计的证据链。本文介绍了PowerAgentBench-SS,一个用于评估电力系统运行和规划研究中工具使用智能体的稳态基准框架。该基准向智能体公开案例数据、动作约束、工具API和验证预算,同时隐藏的评估器重新计算物理有效性并对提交的报告进行评分。我们定义了智能体接口、工具契约、证据日志和风险敏感指标,包括提交召回率、证据支持召回率、发现召回率、假安全惩罚、严重性遗憾、残余违规分数、动作成本、工具使用效率和工作流诊断。为了使框架具体化,我们在可复现的直流热N-2预想事故搜索试点中实例化该协议,使用确定性IEEE 39节点运行点变体,包括脚本基线、LLM JSON命令适配器、三个本地托管的Ollama LLM智能体和一个OpenAI API智能体。结果表明为什么仅求解器或仅答案评估是不够的:智能体不仅通过首要预想事故的发现来区分,还通过验证预算使用、显式提交、类型强制转换、重复验证、证据支持报告和缓解行为来区分。

英文摘要

Power system benchmarks usually evaluate numerical solvers, prediction models, or sequential controllers. These benchmarks are necessary, but they do not directly test whether a Large Language Model (LLM) agent can execute an engineering workflow: inspect a grid case, select tools, call simulators, screen contingencies, propose admissible mitigations, validate results, and produce an auditable evidence trail. This paper introduces PowerAgentBench-SS, a steady-state benchmark framework for evaluating tool-using agents in power system operation and planning studies. The benchmark exposes public case data, action constraints, a tool API, and a validation budget to an agent, while a hidden evaluator recomputes physical validity and scores the submitted report. We define the agent interface, tool contract, evidence log, and risk-sensitive metrics, including submitted recall, evidence-backed recall, found recall, false-safe penalties, severity regret, residual violation score, action cost, tool-use efficiency, and workflow diagnostics. To make the framework concrete, we instantiate the protocol in a reproducible DC thermal N-2 contingency-search pilot on deterministic IEEE 39-bus operating-point variants, with scripted baselines, an LLM JSON-command adapter, three locally hosted Ollama LLM agents, and one OpenAI API agent. The results show why solver-only or answer-only evaluation is insufficient: agents are distinguished not only by top-contingency discovery, but also by validation-budget use, explicit submission, type coercions, duplicate validations, evidence-backed reporting, and mitigation behavior.
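
A small sketch may help show what an N-2 contingency search under a validation budget looks like. Everything here is illustrative: the branch names are hypothetical, the "critical" set stands in for what a hidden evaluator would compute, and the recall definition is an assumed reading of "submitted recall", not the benchmark's actual scoring code.

```python
from itertools import combinations

def n2_contingencies(branches):
    """All unordered pairs of branches that could be out of service together."""
    return list(combinations(sorted(branches), 2))

def submitted_recall(submitted, critical):
    """Fraction of critical contingencies that appear in the submitted list."""
    critical_set = {frozenset(c) for c in critical}
    if not critical_set:
        return 1.0
    hits = {frozenset(s) for s in submitted} & critical_set
    return len(hits) / len(critical_set)

branches = ["L1", "L2", "L3", "L4", "L5"]   # hypothetical 5-branch system
candidates = n2_contingencies(branches)     # 5 choose 2 pairs

critical = [("L1", "L3"), ("L2", "L5")]     # assumed ground truth
budget = 4                                  # validations the agent may spend
submitted = candidates[:budget]             # naive agent: first pairs in order
recall = submitted_recall(submitted, critical)
```

The number of pairs grows quadratically with the number of branches, so an agent with a fixed validation budget has to choose which contingencies to check, and that choice is what a budget-aware metric can expose.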

2606.18636 2026-06-18 cs.CL cs.AI 新提交 70%

PEC-Home: Interpretation of Progressively Elliptical Commands in Smart Homes

PEC-Home:智能家居中渐进式省略命令的解释

Yingyu Shan, Zeming Liu, Silin Li, Boao Qian, Jiashu Yao, Yuhang Guo, Haifeng Wang

发表机构 * Beijing Institute of Technology(北京理工大学) Beihang University(北京航空航天大学) Baidu Inc.(百度公司)

专题命中 领域大模型 :智能家居中渐进式省略命令的解释

AI总结 针对智能家居中用户因共享上下文而使用渐进式省略命令导致的指代和意图歧义问题,提出首个模拟家庭数据集PEC-Home,实验表明现有LLM助手难以准确执行省略命令。

Comments Accepted by ACL 2026 Findings

详情
AI中文摘要

近年来,大型语言模型(LLM)的进步使家庭助手具备了自然语言交互能力。然而,当前的助手忽略了人类对话中随着共享上下文积累而发生的渐进式省略,即为了高效沟通而使用更简洁的表达。因此,当前助手仍难以准确解释此类省略表达,限制了其在现实应用中的有效性。在实际智能家居场景中,助手面临由省略命令引起的两大挑战:(1)多个用户对环境期望不同导致的指代歧义;(2)用户偏好随时间或环境变化导致的意图歧义。为应对这些挑战,我们引入了PEC-Home,这是首个专门为解释智能家居中渐进式省略命令而设计的模拟家庭数据集。在包括GPT-4o在内的多种LLM上的广泛实验表明,现有的家庭助手难以仅基于省略命令执行用户意图的操作。即使配备存储和检索用户对话历史的工具,其执行准确率仍低于使用完整命令时的水平。

英文摘要

Recent advancements in Large Language Models (LLMs) have empowered home assistants with natural language interaction capabilities. However, current assistants overlook the progressive omission that occurs in human dialogue as shared context accumulates, leading to more elliptical expressions for efficient communication. Thus, current assistants still struggle to interpret such elliptical expressions accurately, which limits their effectiveness in real-world applications. In practical smart home scenarios, assistants face two major challenges caused by elliptical commands: (1) referential ambiguity caused by different environmental expectations among multiple users; and (2) intention ambiguity resulting from user preferences that evolve over time or change with the environment. To address these challenges, we introduce PEC-Home, the first simulated home dataset specifically designed for interpreting progressively elliptical commands in smart homes. Extensive experiments on various LLMs, including GPT-4o, show that existing home assistants struggle to execute user-intended operations based solely on elliptical commands. Even when equipped with tools for storing and retrieving user dialogue history, execution accuracy remains below that achieved with complete commands.

2606.18584 2026-06-18 cs.CL 新提交 70%

Speech-Driven End-to-End Language Discrimination towards Chinese Dialects

语音驱动的端到端汉语方言语言鉴别

Fan Xu, Jian Luo, MingWen Wang, GuoDong Zhou

发表机构 * Jiangxi Normal University(江西师范大学) Soochow University(苏州大学)

专题命中 领域大模型 :语音驱动端到端汉语方言语言鉴别

AI总结 针对相似语言和方言鉴别难题,提出基于MFCC特征和HMM-DNN端到端模型的语音驱动方法,结合注意力机制和CNN融合词嵌入与MFCC特征,在基准语料上优于现有方法。

Comments Published in ACM TALLIP

详情
AI中文摘要

在相似语言、变体和方言之间进行语言鉴别是一项具有挑战性的自然语言处理任务。传统的文本驱动方法效果不佳。本文探讨了语音驱动特征在汉语方言鉴别中的有效性。首先,我们系统地研究了语音驱动的MFCC特征对于基于CNN的语言鉴别的适用性。然后,我们设计了一个基于HMM-DNN的端到端语音识别模型来预测汉语方言词汇。我们采用注意力机制提取与不同汉语方言相关的鉴别性词汇。最后,通过CNN,我们将词级嵌入与基于MFCC的特征相结合。在两个基准汉语方言语料库上的评估表明,与最先进的方法相比,所提出的语音驱动方法在细粒度汉语方言鉴别中具有适用性和有效性。

英文摘要

Language discrimination among similar languages, varieties, and dialects is a challenging natural language processing task. The traditional text-driven focus leads to poor results. In this paper, we explore the effectiveness of speech-driven features towards language discrimination among Chinese dialects. First, we systematically explore the appropriateness of speech-driven MFCC features towards CNN-based language discrimination. Then, we design an end-to-end speech recognition model based on HMM-DNN to predict Chinese dialect words. We adopt attention to extract the discriminative words related to different Chinese dialects. Finally, through a CNN, we combine the word-level embedding and the MFCC-based features. Evaluation on two benchmark Chinese dialect corpora shows the appropriateness and effectiveness of the proposed speech-driven approach to fine-grained Chinese dialect discrimination compared to the state-of-the-art methods.

2606.18560 2026-06-18 cs.SD 新提交 70%

Constraining to Generalize: Subspace Tuning for Few-shot Generalization of Audio-Language Models

约束泛化:音频-语言模型少样本泛化的子空间微调

Jaehyuk Jang, Kangwook Ko, Wonjun Lee, Changick Kim

发表机构 * KAIST(韩国科学技术院)

专题命中 领域大模型 :子空间微调提升音频-语言模型少样本泛化

AI总结 针对音频-语言模型少样本微调导致的基类-新类权衡问题,提出子空间微调(SubT),通过结构化子空间参数化和残差锚定约束文本嵌入漂移,并利用子空间感知门控抑制负迁移,在11个音频基准上实现高效强泛化。

详情
AI中文摘要

预训练音频-语言模型(ALM)的少样本适应通常以牺牲未见类泛化为代价提高可见类性能,导致基类-新类权衡。我们将此失败归因于文本嵌入空间中的零样本漂移:少样本微调可能扭曲类间结构,并使适应后的嵌入远离其预训练锚点。因此,我们提出子空间微调(SubT),一种几何约束的适应框架,具有两种互补的漂移控制。结构化子空间参数化限制结构变形,残差锚定稳定围绕零样本先验的适应。在推理时,子空间感知门控进一步抑制弱对齐未见类的负迁移。在11个音频基准上,SubT在保持高效的同时实现了强大的少样本泛化,直接对预计算文本嵌入进行操作,无需文本编码器反向传播。

英文摘要

Few-shot adaptation of pretrained Audio-Language Models (ALMs) often improves seen-class performance at the cost of unseen-class generalization, leading to the base-to-new trade-off. We attribute this failure to zero-shot drift in the text embedding space: few-shot tuning can distort inter-class structure and move adapted embeddings far from their pretrained anchors. We therefore propose Subspace Tuning (SubT), a geometry-constrained adaptation framework with two complementary controls on drift. Structured Subspace Parameterization limits structural deformation, and Residual Anchoring stabilizes adaptation around the zero-shot prior. At inference time, Subspace-aware Gating further suppresses negative transfer for weakly aligned unseen classes. Across 11 audio benchmarks, SubT delivers strong few-shot generalization while remaining efficient, operating directly on precomputed text embeddings without text-encoder backpropagation.

3. 后训练 5 篇

2606.18627 2026-06-18 cs.LG 新提交 70%

PACT: Preserving Anchored Cores in Task-vectors for Model Merging

PACT:在任务向量中保留锚定核心用于模型合并

Ningyuan Shi, Zhipeng Zhou, Hao Wang, Chunyan Miao, Peilin Zhao

发表机构 * Shanghai Jiao Tong University(上海交通大学) Nanyang Technological University(南洋理工大学) The Hong Kong University of Science and Technology (Guangzhou)(香港科技大学(广州))

专题命中 后训练 :模型合并方法,保留预训练权重中的核心维度

AI总结 提出PACT方法,通过识别并保留预训练权重中的承重墙维度,在任务向量中锚定任务特定核心,解决任务向量范式下任务冲突和性能下降问题,提升模型合并效果。

Comments 33 pages,14 figures

详情
AI中文摘要

模型合并已成为多任务学习的一种无需训练的替代方案,旨在将多个任务特定的微调模型组合成一个单一的多任务模型。大多数现有的模型合并方法遵循任务算术范式,该范式将微调权重分解为预训练参数和任务向量,并仅在任务向量空间中进行合并。这一范式的有效性隐含地依赖于一个假设,即任务特定知识仅编码在任务向量中。我们认为,由于预训练模型固有的任务偏好,这一假设通常不成立。具体而言,我们识别出承重墙(LBW)维度,即一些任务关键知识仍嵌入在预训练权重中,而非完全转移到任务向量中。我们从标量权重和子空间两个角度刻画LBW维度,从而覆盖现有模型合并方法的主要范式。我们的分析表明,忽略LBW维度会导致基于任务向量的方法无法完全解决任务冲突,并可能无意中破坏预训练模型中编码的任务特定知识,从而导致性能下降。为解决这一问题,我们提出PACT,该方法通过将任务向量的正交补与预训练权重的子空间对齐,从而在任务向量中保留锚定的任务特定核心(即LBW维度)。在应用现有模型合并算法之前,将这些对齐的子空间分量从任务向量中移除。此外,我们开发了一种基于随机SVD的高效变体以提高可扩展性。PACT可以无缝集成到现有方法中。在多个基准上的大量实验表明,PACT持续增强主流模型合并方法,并建立了新的最先进性能。

英文摘要

Model merging has emerged as a training-free alternative to multi-task learning, aiming to combine multiple task-specific fine-tuned models into a single multi-task model. Most existing model merging approaches follow the Task Arithmetic paradigm, which decomposes fine-tuned weights into pre-trained parameters and task vectors, and performs merging exclusively in the task-vector space. The effectiveness of this paradigm implicitly relies on the assumption that task-specific knowledge is encoded solely within task vectors. We argue that this assumption generally does not hold due to the intrinsic task preferences of pre-trained models. Specifically, we identify Load-Bearing Wall (LBW) dimensions, namely some task-critical knowledge that remains embedded in the pre-trained weights rather than being fully transferred into task vectors. We characterize LBW dimensions from both scalar-weight and subspace perspectives, thereby covering the major paradigms of existing model merging methods. Our analysis reveals that, by ignoring LBW dimensions, task-vector-based approaches fail to fully resolve task conflicts and may inadvertently damage task-specific knowledge encoded in the pre-trained model, leading to degradation. To address this issue, we propose PACT, which preserves the anchored task-specific cores (i.e., LBW dimensions) within task vectors by aligning their orthogonal complements with the subspace of the pre-trained weights. These aligned subspace components are then removed from the task vectors before applying existing model merging algorithms. Furthermore, we develop an efficient variant based on randomized SVD to improve scalability. PACT can be seamlessly integrated with existing methods. Extensive experiments across multiple benchmarks demonstrate that PACT consistently enhances mainstream model merging approaches and establishes new state-of-the-art performance.
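
For readers unfamiliar with the Task Arithmetic paradigm that the abstract builds on, the following sketch shows the baseline decomposition: a task vector is the fine-tuned weights minus the pre-trained weights, and merging adds scaled task vectors back to the pre-trained model. It illustrates only that baseline, not PACT itself; the parameter names, values, and scaling coefficient are hypothetical.

```python
def task_vector(finetuned, pretrained):
    """Task vector: element-wise difference between fine-tuned and pre-trained weights."""
    return {name: finetuned[name] - pretrained[name] for name in pretrained}

def merge(pretrained, task_vectors, scale=1.0):
    """Task-arithmetic merge: pre-trained weights plus scaled sum of task vectors."""
    merged = dict(pretrained)
    for tv in task_vectors:
        for name, value in tv.items():
            merged[name] += scale * value
    return merged

# Toy "models" with two scalar parameters each.
pretrained = {"w1": 1.0, "w2": -2.0}
model_a = {"w1": 1.5, "w2": -2.0}   # fine-tuned on task A
model_b = {"w1": 1.0, "w2": -1.0}   # fine-tuned on task B

tvs = [task_vector(model_a, pretrained), task_vector(model_b, pretrained)]
merged = merge(pretrained, tvs, scale=0.5)
```

Note that the merge only ever edits the task vectors and leaves the pre-trained weights as a fixed base, which is exactly the assumption the abstract questions.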

2606.18606 2026-06-18 cs.CL cs.AI 新提交 70%

Steerable Cultural Preference Optimization of Reward Models

可引导的文化偏好优化奖励模型

Minsik Oh, Advit Deepak, Sophie Wu, Douwe Kiela, Ekaterina Shutova

发表机构 * Stanford University(斯坦福大学) University of Amsterdam(阿姆斯特丹大学)

专题命中 后训练 :训练奖励模型用于LLM对齐

AI总结 提出SCPO算法,通过平衡多种文化偏好训练奖励模型,在PRISM和GlobalOpinionQA数据集上将少数群体奖励模型的性能提升最多7点,训练数据效率最多提高280%。

Comments Accepted to Pluralistic Alignment @ ICML 2026

详情
AI中文摘要

大型语言模型(LLM)技术必须以各文化子社区均可接受的方式服务于众多不同的文化子社区。然而,迄今为止,关于LLM对齐的研究主要集中于预测来自特定地区的标注者的统一响应偏好。本文旨在以更全球化的视角推进对齐模型的发展,使其能够准确代表子社区的偏好,并且不对任何子社区表现出过度偏见。我们专注于为此目的开发奖励模型,并提出一种新颖的奖励模型训练算法(SCPO),该算法能够以平衡的方式融入多样化的文化偏好。我们的方法使得少数群体奖励模型在两个数据集(PRISM和GlobalOpinionQA)以及7个国家上的性能比基线模型提升最多7点。SCPO在训练数据效率上比奖励模型的完整数据微调高出最多280%。此外,我们通过分别评估子社区的偏好来进行偏见分析,并表明我们的加权方法减轻了过度偏见。我们的代码可在以下网址获取:https://github.com/minsik-ai/Steerable-Cultural-Preference

英文摘要

It is essential for large language model (LLM) technology to serve many different cultural sub-communities in a manner that is acceptable to each community. However, research on LLM alignment has so far predominantly focused on predicting a unified response preference of annotators from certain regions. This paper aims to advance the development of alignment models with a more global outlook that are able to accurately represent the preferences of sub-communities and do not exhibit excessive bias towards any of them. We focus on the development of reward models for this purpose and present a novel reward model training algorithm (SCPO) that can incorporate diverse cultural preferences in a balanced manner. Our method results in performance increases of the minority reward model of up to 7 points over the baseline model across two datasets, PRISM and GlobalOpinionQA, and across 7 countries. SCPO is up to 280% more training data-efficient than full-data finetuning of reward models. In addition, we perform an analysis of bias by evaluating separately on the preferences of sub-communities and show that excessive bias is mitigated via our weighting method. Our code is available at https://github.com/minsik-ai/Steerable-Cultural-Preference

2606.18521 2026-06-18 cs.LG cs.AI 新提交 70%

Sparsity Curse: Understanding RLVR Model Parameter Space from Model Merging

稀疏性诅咒:从模型合并理解RLVR模型参数空间

Chenrui Wu, Zexi Li, Jiajun Bu, Jiangchuan Liu, Haishuai Wang

发表机构 * Zhejiang University(浙江大学) Simon Fraser University(西蒙菲莎大学) The Chinese University of Hong Kong(香港中文大学) Zhejiang Key Lab of Accessible Perception and Intelligent Systems(浙江省无障碍感知与智能系统重点实验室)

专题命中 后训练 :研究RLVR模型参数空间与合并

AI总结 本文发现RLVR模型的稀疏更新在参数空间中分散更远,形成近正交捷径导致合并脆弱,并提出SAR-Merging方法解决该问题。

Comments Accepted by KDD 2026

详情
AI中文摘要

可验证奖励强化学习(RLVR)已成为一种强大的后训练范式,在激发推理智能和抵抗灾难性遗忘方面超越了监督微调(SFT)。最近的研究进一步揭示,与SFT相比,RLVR会引发高度稀疏且偏离主成分的参数更新。这自然引出一个问题:这种稀疏性是否使RLVR模型更易于模型合并?如果是,模型合并将提供一种可扩展的、无需训练的方法,来聚合来自独立训练的RLVR模型的多样化推理能力。令人惊讶的是,我们发现相反的情况,揭示了一种稀疏性诅咒:稀疏的RLVR更新在参数空间中分散得更远,形成近正交的捷径,使得聚合本质上是脆弱的。这很可能源于RL优化的随机性和涌现推理模式的多样性。与SFT模型收敛到共享的平坦盆地并自然合并不同,RLVR模型在标准合并方法下遭受严重退化。通过对更新几何的系统性实证分析,我们描述了这种失败背后的机制,并提出了敏感性感知解析合并(SAR-Merging),这是一种针对RLVR参数空间独特结构定制的合并方案。SAR-Merging通过基于Fisher信息的敏感性仲裁解决重叠更新区域中的冲突,然后通过幅度感知稀疏化和重新缩放来保留脆弱的推理路径。在数学和编程基准上的实验表明,SAR-Merging在RLVR模型上显著优于现有合并方法,实现了单任务增强和多能力融合。

英文摘要

Reinforcement Learning with Verifiable Reward (RLVR) has emerged as a powerful post-training paradigm that surpasses Supervised Fine-Tuning (SFT) in eliciting reasoning intelligence and resisting catastrophic forgetting. Recent studies further reveal that RLVR induces highly sparse and off-principal parameter updates compared to SFT. This naturally raises the question: does such sparsity make RLVR models more amenable to model merging? If so, model merging would offer a scalable, training-free path to aggregate diverse reasoning capabilities from independently trained RLVR models. Surprisingly, we find the opposite, uncovering a sparsity curse: the sparse RLVR updates are spread farther apart in parameter space, forming near-orthogonal shortcuts that make aggregation inherently fragile. This is likely rooted in the stochasticity of RL optimization and the diversity of emergent reasoning patterns. Unlike SFT models that converge to shared, flat basins and merge naturally, RLVR models suffer severe degradation under standard merging methods. Through systematic empirical analysis of the update geometry, we characterize the mechanisms behind this failure and propose Sensitivity-aware Resolving Merging (SAR-Merging), a merging recipe tailored for the unique structure of RLVR parameter spaces. SAR-Merging resolves conflicts in overlapping update regions via Fisher Information-based sensitivity arbitration, followed by magnitude-aware sparsification and rescaling to preserve fragile reasoning pathways. Experiments on mathematical and coding benchmarks demonstrate that SAR-Merging substantially outperforms existing merging methods on RLVR models, enabling both single-task enhancement and multi-capability fusion.
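
The two stages named in the abstract (sensitivity-based arbitration, then magnitude-aware sparsification and rescaling) can be sketched at a very coarse level. This is not the authors' SAR-Merging implementation: the sensitivity scores are hand-written stand-ins for Fisher information, the winner-takes-all rule and the L1-preserving rescale are assumptions, and all names are hypothetical.

```python
def arbitrate(updates, sensitivities):
    """Per coordinate, keep the nonzero update from the task with the highest sensitivity."""
    size = len(updates[0])
    merged = [0.0] * size
    for i in range(size):
        active = [t for t in range(len(updates)) if updates[t][i] != 0.0]
        if active:
            winner = max(active, key=lambda t: sensitivities[t][i])
            merged[i] = updates[winner][i]
    return merged

def sparsify_and_rescale(vec, keep):
    """Keep the `keep` largest-magnitude entries, then rescale to the original L1 norm."""
    order = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)
    kept = set(order[:keep])
    sparse = [v if i in kept else 0.0 for i, v in enumerate(vec)]
    before = sum(abs(v) for v in vec)
    after = sum(abs(v) for v in sparse)
    factor = before / after if after > 0 else 1.0
    return [v * factor for v in sparse]

# Two sparse task updates that collide on coordinate 0.
updates = [[0.4, 0.0, -0.2, 0.0], [-0.3, 0.1, 0.0, 0.0]]
sensitivities = [[0.9, 0.0, 0.5, 0.0], [0.2, 0.7, 0.0, 0.0]]  # stand-in scores

resolved = arbitrate(updates, sensitivities)
final = sparsify_and_rescale(resolved, keep=2)
```

Averaging the two updates on coordinate 0 would nearly cancel them, whereas arbitration keeps one intact; that contrast is one way to read why plain averaging can be fragile for sparse, conflicting updates.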

2606.16276 2026-06-18 cs.AI 新提交 70%

SpecAlign: Efficient Specification-Grounded Alignment of Large Language Models via Synthetic Data

SpecAlign:通过合成数据实现高效的大语言模型规范对齐

Wenjie Wang, Yue Huang, Zhengqing Yuan, Han Bao, Shiyi Du, Yuchen Ma, Yue Zhao, Yanfang Ye, Xiangliang Zhang

发表机构 * University of Notre Dame(圣母大学) Carnegie Mellon University(卡内基梅隆大学) LMU Munich(慕尼黑大学) University of Southern California(南加州大学)

专题命中 后训练 :后训练对齐方法,提升LLM规则遵守度

AI总结 提出规范对齐新范式,通过从规范文档合成数据(SpecAlign框架),结合结构化规则标注、可控规范实例化和多智能体对抗数据合成,生成细粒度偏好对,提升规则遵守度且不损害通用能力。

Comments 58 pages

详情
AI中文摘要

随着大语言模型(LLM)在现实应用中的部署日益增多,对齐不再由单一的通用安全或有用性概念主导,而是由提供商或应用特定的模型规范主导。这些规范通常冗长、结构化且频繁更新,然而现有的对齐流程缺乏系统化的机制来将其作为训练信号。在本文中,我们提出规范对齐(specification-grounded alignment),一种新的对齐范式,将提供商编写的模型规范作为主要对齐目标,而非抽象原则或静态基准。为实例化该范式,我们引入SpecAlign框架,该框架直接从规范文档合成对齐数据。SpecAlign结合结构化规则标注、可控规范实例化和多智能体对抗数据合成,生成细粒度、边界感知的偏好对,捕获合规行为和有意义的规范违反。在多个模型规范和骨干模型上的实验表明,使用SpecAlign进行训练一致地提高了规则遵守度,同时保持了通用能力并避免了过度保守的行为。这些结果表明,将对齐建立在显式模型规范上,能够实现LLM行为对不断变化的政策要求的快速、精确和可扩展的适应。

英文摘要

As large language models (LLMs) are increasingly deployed in real-world applications, alignment is no longer governed by a single universal notion of safety or helpfulness, but instead by provider- or application-specific model specifications. These specifications are typically long, structured, and frequently updated, yet existing alignment pipelines lack a systematic mechanism to operationalize them as training signals. In this paper, we propose specification-grounded alignment, a new alignment paradigm that treats provider-authored model specifications as the primary alignment target rather than abstract principles or static benchmarks. To instantiate this paradigm, we introduce SpecAlign, a framework that synthesizes alignment data directly from specification documents. SpecAlign combines structured rule annotation, controllable specification instantiation, and multi-agent adversarial data synthesis to generate fine-grained, boundary-aware preference pairs that capture both compliant behaviors and meaningful specification violations. Experiments across multiple model specifications and backbone models demonstrate that training with SpecAlign consistently improves rule compliance while preserving general capabilities and avoiding over-conservative behavior. These results suggest that grounding alignment in explicit model specifications enables rapid, precise, and scalable adaptation of LLM behavior to evolving policy requirements.

2606.18309 2026-06-18 cs.LG cs.AI 新提交 65%

SAGE: Retain-Aware Post-Hoc Sanitization of Final Unlearning Vector

SAGE:保留感知的最终遗忘向量事后净化

Jingyuan Zhang, Yucheng Bai, Peixi Wen, Zhehao Huang, Zhengbao He, Hanling Tian, Xinwen Cheng, Haiyin Ran, Xiaolin Huang

发表机构 * Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University(上海交通大学图像处理与模式识别研究所)

专题命中 后训练 :提出事后净化遗忘向量,缓解遗忘与保留权衡。

AI总结 提出SAGE方法,通过事后净化最终更新向量,在不重新运行原始遗忘流程的情况下,缓解大语言模型遗忘与保留能力之间的权衡。

详情
AI中文摘要

大语言模型(LLM)遗忘旨在移除不良知识或行为,同时保留已有能力。当前的遗忘方法都涉及遗忘与保留之间的权衡。我们发现,保留激活偏差也可用于量化遗忘方法对保留造成的损害,而无需考虑遗忘过程的具体实现。这使得我们能够通过事后方法恢复任何遗忘方法的保留性能。因此,我们提出一种互补的事后设置,在不重新运行原始遗忘流程的情况下净化最终更新向量。在该设置中,我们设计了SAGE(Spectral Activation-GEometry Sanitization,谱激活-几何净化),一种对最终遗忘更新的源无关修正。SAGE从一个小型保留代理收集真实模块输入,提取其主导激活几何结构,并以闭式求解一个源锚定优化目标,该目标抑制与高能保留方向对齐的更新分量,同时保留源方法的遗忘载体。在多种遗忘方法、模型规模和基准测试中,SAGE持续缓解保留-遗忘权衡,表明最终向量的事后净化是机器遗忘中一个实用且未被充分探索的方向。

英文摘要

Large Language Model (LLM) unlearning aims to remove undesirable knowledge or behaviors while preserving retained capabilities. Current unlearning methods all involve a trade-off between unlearning and retention. We have found that the retention activation bias can also be used to quantify the damage an unlearning method inflicts on retention, without considering the specific implementation of the unlearning process. This allows us to restore retention performance for any unlearning method using a post-hoc approach. Therefore, we propose a complementary post-hoc setting to sanitize the final update vector without rerunning the original unlearning pipeline. In this setting, we design SAGE, Spectral Activation-GEometry Sanitization, a source-agnostic correction for final unlearning updates. SAGE collects real module inputs from a small retain proxy, extracts their dominant activation geometry, and solves a source-anchored optimization objective in closed form, which suppresses update components aligned with high-energy retained directions while preserving the source method's forgetting carrier. Across multiple unlearning methods, model scales, and benchmarks, SAGE consistently relieves the retain-forget trade-off, identifying post-hoc sanitization of final vectors as a practical and underexplored axis for machine unlearning.
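
As a rough illustration of "suppressing update components aligned with high-energy retained directions", the sketch below estimates the single dominant direction of a handful of retain inputs by power iteration and projects it out of a final update matrix. This is a deliberately simplified rank-1 stand-in, not SAGE's closed-form source-anchored objective, which the abstract does not spell out; the names `top_direction`, `sanitize`, `matvec`, and the `strength` parameter are hypothetical.

```python
import math


def top_direction(inputs, iters=200):
    """Dominant eigenvector of the uncentered covariance X^T X (power iteration)."""
    d = len(inputs[0])
    cov = [[sum(x[i] * x[j] for x in inputs) for j in range(d)] for i in range(d)]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v


def sanitize(delta_w, u, strength=1.0):
    """Remove a fraction of each row's component along u: dW' = dW - s * (dW u) u^T."""
    cleaned = []
    for row in delta_w:
        coeff = sum(r * c for r, c in zip(row, u))
        cleaned.append([r - strength * coeff * c for r, c in zip(row, u)])
    return cleaned


def matvec(weight, x):
    """Apply a weight (or update) matrix to one module input."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def l2(vec):
    return math.sqrt(sum(c * c for c in vec))


# Toy retain proxy: module inputs whose energy lies almost entirely along the first axis.
retain_inputs = [[3.0, 0.1], [-2.5, -0.2], [4.0, 0.0], [-3.5, 0.15]]
# Toy final unlearning update for a 2x2 linear module (made-up numbers).
delta_w = [[0.5, 0.8], [-0.3, 0.4]]

u = top_direction(retain_inputs)
clean = sanitize(delta_w, u)

# How strongly each update perturbs the module output on a retain input.
before = l2(matvec(delta_w, retain_inputs[2]))
after = l2(matvec(clean, retain_inputs[2]))
print(f"perturbation on retain input: before={before:.4f}, after={after:.4f}")
```

With `strength=1.0` the cleaned update is orthogonal to the dominant retain direction, so it barely changes the module output on inputs that lie along that direction, while the component of the update in the remaining direction is left untouched.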

4. 预训练 1 篇

2606.18465 2026-06-18 cs.LG cs.AI 新提交 70%

What Does the Weight Norm Control in Grokking? Logit-Scale Mediation under Cross-Entropy

权重范数在Grokking中控制什么?交叉熵下的logit尺度中介作用

Truong Xuan Khanh

发表机构 * H&K Research Studio, Clevix LLC

专题命中 预训练 :研究Grokking中权重范数的作用

AI总结 本文通过固定权重范数并改变输出温度,发现Grokking延迟主要由logit尺度(logit scale)决定,权重范数仅通过影响logit尺度间接起作用。

Comments 16 pages, 10 tables, and 4 figures. Code and data to reproduce all numbers, tables, and figures: https://github.com/ClevixLab/grokking-logit-scale

详情
AI中文摘要

Grokking,即从记忆到泛化的延迟跳跃,通常与权重范数相关:范数越小,泛化越早。我们探究范数实际控制什么。通过钳位固定权重范数并仅改变输出温度,我们在交叉熵下使Grokking延迟遍历由范数引起的整个变化范围;将有效logit尺度匹配回基线可恢复两个模数下约85%的延迟。在范数和温度的网格上,延迟几乎完全由logit尺度决定(R² = 0.97),范数仅额外贡献1-2%。该效应依赖于损失函数:在均方误差下,logit尺度被固定,范数通过不同路径起作用。记忆对照实验、float64 softmax崩溃审计和无LayerNorm的Transformer均指向同一通道。从同一状态分叉,延迟遵循钳位的范数值而非钳位操作本身,这排除了重缩放伪影的疑虑。近端变量是logit尺度及其驱动的softmax饱和;权重范数仅是上游手柄。所有数值、表格和图表均可从发布的代码和数据中复现。

英文摘要

Grokking, the delayed jump from memorization to generalization, is usually tied to the weight norm: a smaller norm generalizes sooner. We ask what the norm actually controls. Holding the weight norm fixed by clamping and varying only an output temperature, we slide the grokking delay across its entire norm-induced range under cross-entropy; matching the effective logit scale back to baseline recovers about 85% of the delay at two moduli. Across a grid of norms and temperatures the delay collapses onto the logit scale alone (R² = 0.97), with the norm adding 1-2% beyond it. The effect is loss-dependent: under mean-squared error the logit scale is pinned and the norm acts through a different route. A memorization control, a float64 softmax-collapse audit, and a no-LayerNorm transformer point to the same channel. Forking arms from one identical state, the delay follows the held norm value and not the clamp operation, which closes a rescaling-artifact concern. The proximal variable is the logit scale and the softmax saturation it drives; the weight norm is only an upstream handle. All numbers, tables, and figures reproduce from released code and data.
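
The link between logit scale and softmax saturation can be seen in a few lines. The sketch below is a toy illustration, not the paper's experimental setup: it assumes a linear output layer, so multiplying its weights by a factor `c` multiplies the logits by `c`, which has the same effect on the softmax as an output temperature of `1/c`. The function names and example logits are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ce_grad_norm(logits, target, temperature=1.0):
    """L2 norm of the cross-entropy gradient w.r.t. the softmax input (softmax - onehot)."""
    probs = softmax(logits, temperature)
    return math.sqrt(
        sum((p - (1.0 if i == target else 0.0)) ** 2 for i, p in enumerate(probs))
    )


# Made-up logits for an already-memorized example whose correct class is index 0.
base_logits = [2.0, 0.5, -1.0]

# Scaling the logits by c is indistinguishable from lowering the temperature to 1/c.
c = 4.0
scaled_logits = [c * z for z in base_logits]
same = all(
    abs(a - b) < 1e-12
    for a, b in zip(softmax(scaled_logits), softmax(base_logits, temperature=1.0 / c))
)

# As the logit scale grows, the softmax saturates and the gradient signal shrinks.
grad_norms = [ce_grad_norm([s * z for z in base_logits], target=0) for s in (1.0, 4.0, 16.0)]
for s, g in zip((1.0, 4.0, 16.0), grad_norms):
    print(f"logit scale {s:>4}: gradient norm {g:.2e}")
```

Under these assumptions a larger weight norm and a lower temperature are two handles on the same quantity, which is consistent with the abstract's claim that the logit scale, rather than the norm itself, is the proximal variable.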

5. 指令微调 1 篇

2606.18257 2026-06-18 cs.HC cs.AI 新提交 70%

From Memorization to Creation: Evaluating the Cognitive Depth of LLM-Generated Educational Questions

从记忆到创造:评估LLM生成的教育问题的认知深度

Xiaolong Wang, Zhe Zhao, Song Lai, Chaoli Zhang, Zijie Geng, Yu Tong, Ye Wei, Qingsong Wen

发表机构 * City University of Hong Kong(香港城市大学) Zhejiang Normal University(浙江师范大学) Squirrel Ai Learning University of Science and Technology of China(中国科学技术大学) Wuhan University(武汉大学)

专题命中 指令微调 :评估LLM生成问题认知层次,涉及提示策略

AI总结 通过布鲁姆认知分类学评估六种LLM生成问题的认知层次,提出细粒度提示策略减少重复性并提升高阶认知比例,引入认知转移强度和类别漂移指标,揭示链式思维提示的可解释性。

Comments Accepted by KDD 2026

Journal ref KDD 2026

详情
AI中文摘要

尽管LLM在自动化教育内容生成方面展现出潜力,但它们生成能够激发高阶思维问题的能力仍未被充分研究。本研究通过布鲁姆认知分类学视角评估六种广泛使用的LLM,重点关注它们超越机械记忆并实现认知飞跃的能力。采用混合人机评估协议,我们在计算机科学、K-12数学和社会科学领域生成并分析了20,700个问题。主要贡献包括:(1) 一种细粒度提示策略,使Qwen2.5-7B-Instruct的问题重复性降低24.45%,并使InternLM3-8B-Instruct的高阶认知层次输出比例提升11.53%;(2) 认知转移强度(CogShift)和类别漂移的量化指标,揭示InternLM3在多层次转换中的优越性能;(3) 可解释性分析揭示指标级相关性,增强了链式思维提示的透明度。我们的发现强调了认知感知提示设计的重要性,并为在个性化学习系统中部署LLM提供了基准。

英文摘要

While LLMs show promise in automating educational content creation, their ability to generate questions that stimulate higher-order thinking remains understudied. This work evaluates six widely-used LLMs through a Bloom's Taxonomy lens, focusing on their capacity to transcend rote memorization and achieve cognitive leaps. Using a hybrid human-AI evaluation protocol, we generate and analyze 20,700 questions across computer science, K-12 math, and social-science domains. Key contributions include: (1) a fine-grained prompting strategy that reduces question repetitiveness by 24.45% for Qwen2.5-7B-Instruct, and increases the proportion of higher-order cognitive level outputs by 11.53% for InternLM3-8B-Instruct; (2) quantitative metrics for cognitive shift intensity (CogShift) and category drift, revealing InternLM3's superior performance in multi-level transitions; (3) an interpretability analysis revealing metric-level correlations that enhance the transparency of Chain-of-Thought prompting. Our findings highlight the importance of cognitive-aware prompt design and provide benchmarks for deploying LLMs in personalized learning systems.
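
The "proportion of higher-order cognitive level outputs" can be pictured as a simple share over Bloom-level labels. The sketch below is only an assumption about how such a share might be computed: it treats Analyze, Evaluate, and Create as the higher-order levels (a common convention), uses made-up labels, and does not reproduce the paper's CogShift or category-drift metrics, whose definitions the abstract does not give.

```python
from collections import Counter

BLOOM_LEVELS = ["Remember", "Understand", "Apply", "Analyze", "Evaluate", "Create"]
# Assumption: the top three levels count as "higher-order".
HIGHER_ORDER = {"Analyze", "Evaluate", "Create"}


def higher_order_share(labels):
    """Fraction of questions labeled with a higher-order Bloom level."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    unknown = set(counts) - set(BLOOM_LEVELS)
    if unknown:
        raise ValueError(f"unknown Bloom levels: {sorted(unknown)}")
    return sum(counts[level] for level in HIGHER_ORDER) / len(labels)


# Made-up labels for questions generated with a plain prompt and a fine-grained prompt.
plain_prompt = ["Remember", "Understand", "Apply", "Analyze"]
fine_grained_prompt = ["Understand", "Analyze", "Evaluate", "Apply"]

plain_share = higher_order_share(plain_prompt)
fine_share = higher_order_share(fine_grained_prompt)
print(f"higher-order share: plain={plain_share:.2f}, fine-grained={fine_share:.2f}")
```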