arXivDaily: arXiv daily academic digest, updated Monday to Friday
2606.06936 2026-06-08 cs.HC New submission

Personality Anchoring for Social Simulation: Linking Personality, Social Behavior, and Interaction Success with LLM Agents

Vahid Sadiri Javadi, Aksa Aksa, Fryderyk Róg, Lucie Flek, Johanne R. Trippas

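AI summary: Proposes personality anchoring, which uses movie characters to build multi-LLM social simulations; finds a monotonic relationship between dyadic Agreeableness composition and shared goal achievement, with homogeneous-agreeable pairs succeeding 10 times as often as homogeneous-disagreeable pairs.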

Abstract

Social interactions are shaped by the interplay of dispositional traits and situational context, yet systematically investigating how personality configurations between individuals jointly influence social behavior across diverse social contexts remains methodologically challenging. We address this gap by introducing a simulation pipeline adapted from the CHARISMA framework, which employs well-known movie characters and public figures as psychologically grounded agents for multi-LLM social simulation using a method we term personality anchoring. We present a large-scale empirical study examining how dyadic Agreeableness composition influences social interaction outcomes across 1,010 simulated conversations. Our results reveal a monotonic relationship between dyadic Agreeableness composition and shared goal achievement, with Homogeneous-Agreeable pairs achieving success 10 times the rate of Homogeneous-Disagreeable pairs (62% vs. 6%). Behavioral mediation analysis reveals that Agreeableness shapes goal achievement partially through cooperative strategy selection, though it continues to predict outcomes within the same dominant strategy, indicating pathways beyond observable conversational behavior. Robustness analyses confirm high consistency of results across repeated simulations (ICC = 0.89) and stable personality expression across diverse scenarios, validating personality anchoring as a viable operationalization strategy.
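
The abstract reports run-to-run consistency as ICC = 0.89 but does not say which intraclass correlation variant was used. A minimal sketch, assuming the one-way random-effects form ICC(1) and a table with one row per simulated target (for example a dyad in a scenario) and one column per repeated run; both the variant and the layout are assumptions, and `icc1` is a hypothetical helper name:

```python
def icc1(ratings):
    """One-way random-effects ICC(1) for an n x k table.

    ratings[i][r] is the outcome of target i in repeated run r; every row
    must have the same number of runs k >= 2 (assumed layout).
    """
    n = len(ratings)
    k = len(ratings[0])
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    # Between-target mean square: how much targets differ from each other
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    # Within-target mean square: how much repeated runs of one target disagree
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Perfectly reproducible runs give ICC = 1; runs that disagree more than
# targets differ push the value toward (or below) zero.
print(icc1([[1, 1], [2, 2], [3, 3]]))
```

The headline comparison itself is plain arithmetic: 62% / 6% ≈ 10.3, which the abstract rounds to "10 times".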

2606.06932 2026-06-08 eess.SY cs.SY New submission

Forecast and Model Predictive Control of Distributed Energy Resource Aggregators for Net-Demand Balancing

Obai Bahwal, Oliver Kosut, Lalitha Sankar

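AI summary: Proposes a method combining forecasting with model predictive control that treats distributed energy resource aggregators as virtual batteries and tracks net-demand patterns with rolling-horizon MPC; analyzes the effects of the forecasting horizon, the MPC update rate, and the choice of forecasting model.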

Abstract

With rapidly growing demand for energy, even the incorporation of bulk renewable energy sources is not entirely sufficient to meet demand, and it also adds supply uncertainty. Distributed Energy Resource Aggregators (DERAs) have the potential to address this uncertainty via aggregation and control of decentralized distributed energy sources, thereby acting like virtual power plants. We present a new approach that combines forecasting and model-predictive control to assign DERAs to follow net-demand patterns, while accounting for the dynamics of the aggregate energy sources and their capacity limits. Each DERA is represented as a flexible "virtual battery" with constraints on state-of-charge and power limits. The dispatch problem is set up as a long-term model predictive control task that aims to minimize deviations from desired charge levels, output ramping, and net-load tracking errors. To keep operations efficient in real time, we implement a rolling-horizon MPC, which regularly updates decisions using the latest marginal-demand forecasts. For forecasting, we present two models: linear regression and a long short-term memory (LSTM) neural network. Using high-resolution CAISO net-demand data and five typical DERA types, our simulations demonstrate how well our approach tracks marginal demand; in particular, we highlight the trade-offs between forecasting horizon and MPC update rate, as well as the dependence on the choice of load forecasting model. Our results also indicate a slight edge for LSTM models over linear regression for the desired time shifts and horizon choices.
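
The abstract describes each DERA as a "virtual battery" with state-of-charge and power limits, dispatched by a rolling-horizon MPC that penalizes net-load tracking error, output ramping, and deviation from a desired charge level. Below is a toy sketch of that receding-horizon loop for a single virtual battery. The cost weights, the discrete power levels (which stand in for the power limit), the mid-capacity charge target, and the oracle forecast (standing in for the paper's linear-regression or LSTM forecasters) are all assumptions; brute-force enumeration replaces the optimization solver a real implementation would use, and the function names are hypothetical.

```python
import itertools


def mpc_first_action(soc, p_prev, forecast, levels, e_max, dt=1.0,
                     w_ramp=0.1, w_soc=0.01):
    """Enumerate every power plan over the horizon (len(forecast) >= 1) and
    return the first action of the cheapest feasible plan.

    Assumed convention: p > 0 discharges the virtual battery to the grid.
    """
    soc_ref = e_max / 2  # hypothetical desired charge level
    best_cost, best_plan = float("inf"), None
    for plan in itertools.product(levels, repeat=len(forecast)):
        s, prev, cost = soc, p_prev, 0.0
        for p, target in zip(plan, forecast):
            s -= p * dt
            if not 0.0 <= s <= e_max:  # state-of-charge limits
                break
            cost += ((p - target) ** 2              # net-demand tracking error
                     + w_ramp * (p - prev) ** 2     # output ramping
                     + w_soc * (s - soc_ref) ** 2)  # deviation from desired charge
            prev = p
        else:  # runs only if the plan never violated a constraint
            if cost < best_cost:
                best_cost, best_plan = cost, plan
    return best_plan[0] if best_plan is not None else 0.0


def rolling_horizon(net_demand, horizon, levels, e_max, soc0, dt=1.0):
    """Re-solve at every step, apply only the first action, then shift the window."""
    soc, p_prev, outputs, socs = soc0, 0.0, [], []
    for t in range(len(net_demand)):
        # Oracle forecast: the true future demand, padded with the last value
        forecast = [net_demand[min(t + h, len(net_demand) - 1)] for h in range(horizon)]
        p = mpc_first_action(soc, p_prev, forecast, levels, e_max, dt)
        soc -= p * dt
        outputs.append(p)
        socs.append(soc)
        p_prev = p
    return outputs, socs


outputs, socs = rolling_horizon([1] * 6, horizon=3, levels=[-1, 0, 1], e_max=4.0, soc0=2.0)
print(outputs, socs)
```

Because infeasible plans are discarded, the state of charge never leaves [0, e_max], even under sustained demand. Longer horizons or finer power levels make enumeration explode, which is one reason the horizon and update rate trade off against compute in practice.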

2606.06917 2026-06-08 cs.ET New submission

Belief-Aware Scheduling for Predictive Wildfire Hazard Mapping under Sparse-Window Telemetry

Xun Shao, Kohsuke Yamakawa, Cheah Wai Shiang

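AI summary: Addresses limited telemetry from edge nodes by pairing a structured belief with a scheduler; in a physics-calibrated synthetic environment, a lightweight cross-region attention encoder outperforms the FAIR baseline by about 28% on the default landscape.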

Abstract

An edge node monitoring a wildfire observes more than a duty-limited or windowed downlink can carry. The receiver must predict the H-step-ahead hazard map from whatever the link delivers. We argue that the operative design problem is not which neural architecture to use but how to derive a structured belief sufficient for the receiver's prediction task and maintain it through a scheduler that anticipates future transmission opportunities. We formalize this as a partially observed sequential allocation problem with three coupled per-region action axes (sensing, representation, transmission), and derive each component of the structured belief from the input requirements of the H-step forward operator. Identifying these mechanisms requires independent control over the window period P, per-window capacity C, predictive horizon H, and fuel composition, which are not separable in real-landscape data; we therefore evaluate in a physics-calibrated synthetic environment. Three empirical observations support the principle: the gap between a non-myopic activity-paced reference and uniform pacing is unimodal in window-period sparsity, peaking at intermediate spacing; when the structured belief is ablated, the dominant operative component flips between a default landscape (temporal staleness) and a structured landscape (static-risk prior), while the per-cell intensity belief is redundant in both; and a 40k-parameter lightweight cross-region attention encoder exceeds the FAIR activity-paced reference by ~28% on the default landscape and ~11% on the structured landscape. A deeper Transformer encoder does not improve over the lightweight encoder in mean predictive loss and exhibits higher training-seed variance. Within this task class and regime, a modest architectural inductive bias suffices when the belief and the scheduling problem are correctly posed.

2606.06914 2026-06-08 cs.CR New submission

DPAgent-in-the-Middle: Agentic Defense and Repair Against AI-Groomed Deceptive Patterns

Zewei Shi, Ruoxi Sun, Haoyang Li, Seong Oun Hwang, Feng Liu, Minhui Xue, Xingliang Yuan

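AI summary: Targeting privacy deceptive patterns in web interfaces, proposes the DPAgent framework, in which four specialized agents combine latent space purification with defensive prompting to proactively detect and repair deceptive interfaces; it detects 90.98% of groomed samples and repairs 77% of detected deceptive interfaces.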

Abstract

Privacy deceptive patterns in web interfaces systematically manipulate users into disclosing personal data, yet existing defenses are fragmented, static, and increasingly vulnerable to manipulation by large language models. Moreover, data voids, areas of information scarcity within the web ecosystem, create fertile ground for adversaries to inject misleading content that can be scraped and learned by AI systems, thereby amplifying both deceptive design and model misbehavior. In this paper, we formalize a new threat model, AI grooming, where attackers exploit data voids to seed benign-looking but malicious samples that corrupt model reasoning and normalize deceptive practices. To address this threat in privacy deceptive patterns, we present DPAgent, an agentic and reasoning-aware framework that orchestrates four specialized agents to mitigate AI grooming. Its proactive defense combines latent space purification with defensive prompting and operates directly in live web environments to explore, detect, and repair privacy deceptive user interfaces before they reach end users. Extensive evaluations show that DPAgent detects 90.98% of groomed samples, achieves state-of-the-art privacy deceptive pattern detection with a micro F1 of 0.816, explores over 80% of pattern types while visiting only about 10% of the pages required by baselines, and successfully repairs 77% of detected deceptive interfaces. A large-scale study of 485 websites in the wild reveals that up to 98% contain at least one privacy deceptive pattern, over 90% of which can be mitigated by DPAgent. User studies further confirm that DPAgent effectively reduces privacy risks while preserving the browsing experience. Our results demonstrate the promise of agent-in-the-middle defenses for securing the web UI supply chain against deceptive design and emerging AI threats rooted in data void exploitation.
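
Detection quality above is reported as a micro F1 of 0.816. Micro-averaging pools true positives, false positives, and false negatives over all pattern types before computing F1, so frequent pattern types weigh more than rare ones. A minimal sketch, assuming each page or interface is annotated with a set of pattern labels; the label names and the `micro_f1` helper are hypothetical:

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 over per-item sets of labels.

    gold[i] and pred[i] are the sets of pattern types present / predicted
    for item i. Counts are pooled across items and labels before scoring.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # predicted and present
        fp += len(p - g)  # predicted but absent
        fn += len(g - p)  # present but missed
    denom = 2 * tp + fp + fn
    # 2TP / (2TP + FP + FN) equals the harmonic mean of precision and recall
    return 2 * tp / denom if denom else 0.0


gold = [{"preselected_consent", "hidden_opt_out"}, {"nagging"}]
pred = [{"preselected_consent"}, {"nagging", "hidden_opt_out"}]
print(micro_f1(gold, pred))
```

Here two labels are found, one is spurious, and one is missed, so precision and recall are both 2/3 and so is micro F1.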

2606.06912 2026-06-08 cs.SE New submission

From Custom Logic to APIs: Understanding and Recommending API Replacement Refactorings

Bridget Nyirongo, Yanjie Jiang, Yuxia Zhang, Hui Liu

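AI summary: Characterizes API replacement refactoring patterns through an empirical study and proposes AKIRA, a hybrid framework combining pattern-deterministic heuristics with a refactoring-aware knowledge base, which achieves 90% recall and 88% precision in recommending API replacement refactorings.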

Abstract

Software refactoring is essential for maintaining code quality. However, API replacement refactoring, which replaces custom logic with API calls, remains underexplored. Existing refactoring tools provide limited support for detecting such opportunities because they rely on predefined templates and have difficulty capturing complex, multi-statement semantic equivalents. To address this limitation, we conduct the first empirical study of API replacement refactorings by mining 166,299 commits across six open-source Java projects and manually analyzing a curated subset of 1,800 commits, from which we identify 366 validated instances to characterize their scope, categories, and recurring patterns. Based on these insights, we propose AKIRA (Adaptive Knowledge Discovery and Retrieval), a hybrid framework that integrates pattern-deterministic heuristics with a refactoring-aware knowledge base to assess the practical feasibility of recommending API replacement refactorings. Our evaluation shows that AKIRA achieves 90% recall and 88% precision on a manually curated dataset. Furthermore, on the external RETIWA dataset, AKIRA significantly improves the state of the art by increasing recall from 21% to 81% and precision from 40% to 78%. These results demonstrate the effectiveness of combining static pattern matching with semantic reasoning to support the automation of recommending complex API replacement refactorings.
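
To make the notion concrete, here is a small illustration of an API replacement refactoring: hand-written logic replaced by an equivalent library call. The study mines Java projects, where a comparable rewrite might target `String.join`; this Python analogue and its function names are hypothetical and are not drawn from the paper's dataset.

```python
def format_tags_before(tags):
    # Custom logic: manually build a comma-separated string
    result = ""
    for i, tag in enumerate(tags):
        if i > 0:
            result += ", "
        result += tag
    return result


def format_tags_after(tags):
    # After the API replacement refactoring: same behavior via str.join
    return ", ".join(tags)


print(format_tags_after(["io", "net", "fs"]))
```

The rewrite is only valid because both versions agree on every input, including the empty list; establishing that kind of equivalence for multi-statement logic is what the abstract says template-based tools struggle with.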

2606.06910 2026-06-08 cs.DC math-ph math.MP New submission

Communication Strategy Selection for Multi-GPU 3D FDTD with Convolutional Perfectly Matched Boundary Layers

Victory C. Obieke

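AI summary: For multi-GPU 3D FDTD computation with CPML boundary conditions, studies the speedup of direct GPU-to-GPU peer exchange over host-staged exchange and evaluates the effect of enlarged ghost regions.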

Abstract

In this paper, we describe a communication-strategy study for multi-GPU three-dimensional finite-difference time-domain computation with convolutional perfectly matched layer boundary conditions using CUDA. The metrics used to determine the most effective implementation include runtime, throughput in millions of output points per second, strong-scaling efficiency, CPML overhead, host-staged versus direct GPU-to-GPU exchange speedup, and enlarged-ghost speedup. On a single NVIDIA Quadro RTX 6000 GPU, the CPML implementation sustains 2,889–3,290 million output points per second with less than 1% boundary-layer overhead, providing the single-GPU baseline for the multi-GPU study. The results show that direct GPU-to-GPU peer exchange is the dominant optimization with a 2.46–2.76× speedup over host-staged exchange, while enlarged ghost regions give only modest benefits because the reduced communication frequency is partly offset by redundant computation and additional memory traffic. On NVIDIA Quadro RTX 8000 GPUs, the implementation gives up to a 1.51× speedup on two GPUs for the tested strong-scaling cases, while four GPUs enable larger grids that approach or exceed single-GPU memory capacity.
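
Two of the reported metrics are simple ratios. A sketch, assuming "output points" means grid cells updated per time step (the abstract does not define it) and the usual definition of strong-scaling efficiency as speedup divided by GPU count; the helper names are hypothetical:

```python
def mpoints_per_second(nx, ny, nz, steps, seconds):
    """Throughput in millions of grid-point updates per second (assumed definition)."""
    return nx * ny * nz * steps / seconds / 1e6


def strong_scaling_efficiency(speedup, n_gpus):
    """Fraction of ideal linear speedup reached on n_gpus for a fixed problem size."""
    return speedup / n_gpus


# A 100^3 grid advanced 1000 steps in one second is 1000 million points per second.
print(mpoints_per_second(100, 100, 100, 1000, 1.0))
# The reported 1.51x speedup on two GPUs corresponds to about 75.5% efficiency.
print(f"{strong_scaling_efficiency(1.51, 2):.1%}")
```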

2606.06894 2026-06-08 cs.CR New submission

FDM: A Framework for Decision-making to build ML-based Malware detection systems

Tadiwa Vhito, Jakapan Suaboot, Warodom Werapun, Norrathep Rattanavipanon

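AI summary: Proposes the FDM framework, which uses a multi-criteria function, the Weighted Configuration Compatibility Score (WCCS), to map five operational parameters to ranked recommendations across nine configuration dimensions; experiments confirm that the optimal ML configuration depends on the deployment context.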

Comments: 18 pages, 5 figures, 14 tables

Abstract

Selecting appropriate machine learning (ML) configurations for malware detection is a complex, multi-criteria problem. Model choice, feature engineering, and update mechanisms must jointly satisfy operational constraints that vary across deployment contexts. This paper proposes the Framework for Decision-making (FDM) to build ML-based malware detection systems. The FDM formalises this selection process using the Weighted Configuration Compatibility Score (WCCS), a multi-criteria scoring function mapping five operational parameters (platform constraint, resource budget, response latency, update frequency, and detection sensitivity) to ranked recommendations across nine configuration dimensions. To validate the framework, four experiments were conducted on three datasets (a private Windows API dataset, the public Malimg image benchmark, and an Android static API dataset). Key results include: (i) XGBoost achieved the best accuracy-to-resource ratio in binary classification (97.46 % test accuracy, <70 MB RAM), outperforming LSTM/BiLSTM, which consumed up to 2.8 GB; (ii) in multi-class classification, classical models (XGBoost 79.03 %) outperformed recurrent deep models (BiLSTM 72.27 %), reversing the binary ranking; (iii) class-incremental learning with EfficientNetB0 maintained 99.13 % accuracy with only 0.65 pp degradation across 11 incremental steps; (iv) transfer learning reduced training time by 2.14 times on average for image-based malware data without significant accuracy cost; and (v) autoencoder pre-processing yielded a 14-fold training speedup at a cost of only 0.86 pp accuracy. These findings confirm that the optimal ML configuration is context-dependent, validating the FDM's core premise and demonstrating its practical utility for cybersecurity practitioners.
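
The abstract does not give the WCCS formula. As a hypothetical sketch of the kind of weighted multi-criteria ranking it describes, assume each candidate configuration has a compatibility score in [0, 1] for each of the five operational parameters and the deployment context supplies the weights. The weighted-average form, the candidate names, and every number below are invented for illustration only.

```python
PARAMETERS = ("platform", "resources", "latency", "update_frequency", "sensitivity")


def weighted_score(compat, weights):
    """Weighted average of per-parameter compatibility scores (hypothetical WCCS stand-in)."""
    total_weight = sum(weights[p] for p in PARAMETERS)
    return sum(weights[p] * compat[p] for p in PARAMETERS) / total_weight


def rank_configurations(candidates, weights):
    """Return candidate names sorted from most to least compatible."""
    return sorted(candidates,
                  key=lambda name: weighted_score(candidates[name], weights),
                  reverse=True)


candidates = {  # invented compatibility scores
    "config_a": {"platform": 0.9, "resources": 0.9, "latency": 0.8,
                 "update_frequency": 0.7, "sensitivity": 0.7},
    "config_b": {"platform": 0.5, "resources": 0.2, "latency": 0.4,
                 "update_frequency": 0.5, "sensitivity": 0.8},
}
# Two invented deployment contexts that weight the parameters differently
constrained_device = {"platform": 1, "resources": 3, "latency": 1,
                      "update_frequency": 1, "sensitivity": 1}
sensitivity_first = {"platform": 0.1, "resources": 0.1, "latency": 0.1,
                     "update_frequency": 0.1, "sensitivity": 10}
print(rank_configurations(candidates, constrained_device))
print(rank_configurations(candidates, sensitivity_first))
```

The same two candidates swap places when only the weights change, which mirrors the paper's premise that the best configuration is context-dependent.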

2606.06882 2026-06-08 cs.GT cs.CE New submission

Learning to Strategically Acquire Resources in Competition

Safwan Hossain, Mirah Shi, Andrew Bennett, Neil Andrew Chriss, Michael Kearns, Anderson Schneider, Yuriy Nevmyvaka

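AI summary: Studies multiple agents competing over time to acquire a divisible resource, proposes a game-theoretic model, analyzes the Bayesian Nash equilibrium under different information assumptions, and gives convergence conditions for learning dynamics.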

Abstract

We consider multiple agents competing to acquire some costly divisible resource (e.g., shares of a financial asset or compute resources) over time. Leveraging a standard model for price dynamics, we propose a novel game-theoretic model for this problem that generalizes settings studied in diverse literatures. Our analysis considers different assumptions on the information available to agents. Under partial information with a common prior (which subsumes complete information as a special case), we establish the existence, uniqueness, and efficient computability of the Bayesian Nash equilibrium (BNE), and we bound the price of anarchy. Next, and more generally, we consider agents without a common prior who learn to act optimally from realistic market feedback in repeated interactions. We provide sufficient conditions under which agents running simultaneous learning dynamics achieve last-iterate convergence to the BNE. For all settings, we provide simulations based on real financial data to illustrate our theoretical results and offer new insights into strategic behavior in the context of trading and resource acquisition.

2606.06880 2026-06-08 cs.IR New submission

Towards Retrieving Interaction Spaces for Agentic Search

Shengyao Zhuang, Yuansheng Ni, Hengxin Fun, Jimmy Lin, Xueguang Ma

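AI summary: Proposes RISE, which uses BM25 to construct a bounded interaction space and preprocesses documents to support shell-style navigation; on BrowseComp-Plus it matches the pure-shell baseline at 78% accuracy for roughly one quarter of the per-query cost.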

Abstract

Retrieval for search agents is still inherited from non-agentic information retrieval: a retriever ranks the corpus and the agent reads a small set of returned documents. Recent direct corpus interaction (DCI) work shows that agents can instead interact with the raw corpus through shell tools such as grep and file reads. But unbounded interaction does not scale: every broad shell command is a scan over the whole corpus, and latency degrades sharply as the corpus grows. We argue that the role of retrieval for agentic search is not just to select documents that fit in the LLM context window, but to construct an interaction space: a bounded subset of the corpus the agent can explore with associated tools. Two design consequences follow. The space needs a boundary supplied by retrieval, and the objects within it should be processed for interaction. As a proof of concept, we propose RISE (Retrieving Interaction SpacE): we use BM25 to construct the interaction space; meanwhile, its documents are processed during indexing for shell-style navigation. On BrowseComp-Plus, RISE matches the pure-shell DCI baseline at 78% accuracy with gpt-5.4-mini at roughly one quarter of the per-query cost. At 1M documents, RISE-BM25 reaches 81% on gpt-5.4-mini, whereas DCI on gpt-5.4-nano degrades to 60% with 33 of 100 wall-clock failures.
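
RISE uses BM25 to bound the space the agent explores. A self-contained sketch of that idea: score documents with Okapi BM25, keep the top k as the interaction space, and run a grep-like search only inside it. The parameters k1 = 1.2 and b = 0.75 are common defaults rather than values from the paper, the paper's indexing-time processing for shell-style navigation is not modeled, and all names here are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"\w+", text.lower())


class BM25:
    """Okapi BM25 over an in-memory list of documents."""

    def __init__(self, docs, k1=1.2, b=0.75):
        self.tokens = [tokenize(d) for d in docs]
        self.k1, self.b = k1, b
        self.n = len(self.tokens)
        self.avgdl = sum(len(t) for t in self.tokens) / self.n
        self.tf = [Counter(t) for t in self.tokens]
        self.df = Counter(term for t in self.tokens for term in set(t))

    def idf(self, term):
        df = self.df.get(term, 0)
        return math.log(1 + (self.n - df + 0.5) / (df + 0.5))

    def score(self, query, i):
        # Length normalization: longer-than-average documents are penalized
        norm = self.k1 * (1 - self.b + self.b * len(self.tokens[i]) / self.avgdl)
        total = 0.0
        for term in tokenize(query):
            f = self.tf[i].get(term, 0)
            total += self.idf(term) * f * (self.k1 + 1) / (f + norm)
        return total

    def top_k(self, query, k):
        ranked = sorted(range(self.n), key=lambda i: self.score(query, i), reverse=True)
        return ranked[:k]


def grep_in_space(docs, space, pattern):
    """Shell-style grep restricted to the documents in the interaction space."""
    rx = re.compile(pattern)
    return [(i, line) for i in space for line in docs[i].splitlines() if rx.search(line)]


docs = [
    "the wildfire spread fast\nsmoke everywhere",
    "bitcoin block propagation",
    "wildfire hazard map\nfuel moisture",
]
space = BM25(docs).top_k("wildfire hazard", k=2)  # bounded interaction space
print(space)                                      # [2, 0]
print(grep_in_space(docs, space, r"fuel"))        # [(2, 'fuel moisture')]
```

Document 1 still contains "bitcoin", but a grep for it inside this space returns nothing: every tool call now scans k documents instead of the whole corpus, which is the scaling argument the abstract makes.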

2606.06860 2026-06-08 cs.CR New submission

On the Incentive Compatibility of Block Propagation in Bitcoin

Fumichika Maeda, Akira Sakurai, Taishi Nakai, Kazuyuki Shudo

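AI summary: Studies Bitcoin miners' individual incentives to propagate blocks; derives reward expressions under different tie-breaking rules from a blockchain network model, shows how propagation delay, hashrate distribution, and the tie-breaking rule jointly determine mining rewards, and analyzes the trade-off between propagation incentives and fairness.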

Abstract

Bitcoin is permissionless and does not rely on any central administrator, which gives it strong censorship resistance. At the same time, it is important to incentivize miners to behave in ways that align with the interests of the system as a whole. This paper asks whether miners are individually incentivized to propagate blocks, one of the most fundamental processes in Bitcoin. Miners collectively maintain the blockchain by generating blocks and disseminating them across the network. If miners have an incentive not to propagate some blocks, this would indicate a fundamental flaw in Bitcoin's incentive design. Although prior work has studied how propagation delays affect forks and mining rewards, it has not fully characterized miners' incentives to improve block propagation under different tie-breaking rules. To address this gap, we derive analytical reward expressions for each tie-breaking rule based on a blockchain network model that captures the effect of forks on mining fairness. These expressions explicitly characterize how block propagation delays, hashrate distribution, and tie-breaking rules jointly determine mining rewards. We then use them to analyze miners' incentives to improve block propagation. Our results show, for example, that miners have no mining-reward incentive to relay blocks generated by other miners. By contrast, under the first-seen rule, every non-majority miner is incentivized to receive other miners' blocks more quickly and to propagate its own blocks more quickly. Finally, we compare tie-breaking rules and identify a trade-off between propagation incentives and mining fairness. In particular, the first-seen rule provides the strongest incentives to reduce propagation delays, but it also worsens mining fairness the most.

2606.06851 2026-06-08 cs.CY cs.HC New submission

Toward a Metaphysics of Learning Analytics: Ontological Positioning of Data, Inference, and Normativity

Kensuke Takii

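AI summary: By tracing the definitions and principles of learning analytics, answers the ontological question "What is learning analytics?" from within the field; clarifies what kind of existence its data has, identifies eight agents, including learners, as ontological prerequisites, and points out an ontological tension between "norm-embedded LA" and the first principle.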

Comments: 25 pages, 1 figure

Abstract

The Learning Analytics (LA) community has undergone rapid development over the 15 years since the first LAK conference was held. However, while epistemological and ethical debates regarding the philosophical foundations of LA have been vigorous, metaphysical discussions have been sparse, signifying a lack of effort to derive the identity of LA from its internal principles. In this paper, we attempt to establish a metaphysics of LA by addressing the ontological question "What is LA?" We do so by tracing back to LA's own definitions and principles to derive an answer from within LA itself. Specifically, we address what kind of existence the data LA operates on constitutes, identify eight agents including learners as ontological prerequisites, and clarify, via the is/ought problem, that LA does not derive norms from data. In particular, this system reveals that a class of LA practices, here termed "norm-embedded LA", conflates LA's purpose with its operations, creating an ontological tension with the first principle. We also discuss connections with related fields and the limitations of this system. The metaphysics outlined here is not imposed from outside LA, but surfaces what LA itself has always implicitly presupposed.

2606.06843 2026-06-08 cs.SE New submission

Empirical Study on the Characteristics and Evolution of AI-usage in GitHub Repositories: Evidence from Code Comments

Abdullah Al Mujahid, Preetha Chatterjee, Mia Mohammad Imran

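AI summary: Analyzing 35,361 GitHub code comments that mention AI, together with subsequent commits, finds that developers mainly use LLMs to implement code, followed by frequent refactoring, integration, and bug fixing, and that AI references shift over time from code generation toward knowledge support and code enhancement.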

Comments: Preprint version

Abstract

Developers increasingly use AI tools such as ChatGPT, Copilot, and Claude in everyday software workflows, but prior studies often evaluate LLM outputs in isolation rather than examining how developers adapt them in real projects. We analyze 35,361 GitHub code comments that explicitly reference AI use and their associated code blocks. We first open-code 500 unique comments and code blocks to derive a taxonomy of AI-assisted development activities, then annotate the full dataset using two LLM-based classifiers and aggregate predictions with Dawid-Skene expectation-maximization. We also analyze 12,996 subsequent commit messages to study how AI-assisted code evolves after introduction, and examine temporal trends from December 2022 to March 2026. Our results show that developers primarily use LLMs for code implementation, followed by code enhancement, debugging, documentation, and testing. Subsequent commits frequently involve refactoring and cleanup, feature integration and extension, and bug fixing, indicating sustained human oversight in adapting AI-assisted code. Over time, AI-referencing comments shift from direct code generation toward knowledge and conceptual support and code enhancement. These findings suggest that AI tools are becoming embedded not only as code-generation aids, but also as collaborative support mechanisms whose outputs are refined, extended, and corrected by developers over time.
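
The study aggregates two LLM-based classifiers with Dawid-Skene expectation-maximization. A compact sketch of that classic algorithm: each annotator (here, each classifier) gets a confusion matrix, the true label of each item is latent, and EM alternates between estimating class priors and confusion matrices (M-step) and posterior label probabilities (E-step). The soft-majority-vote initialisation, the smoothing constant, and the fixed iteration count are assumptions of this sketch, not details from the paper.

```python
def dawid_skene(labels, n_classes, n_iter=50, smoothing=0.01):
    """Aggregate noisy labels with Dawid-Skene EM.

    labels[i][j] is the class index that annotator j assigned to item i;
    in this sketch every annotator labels every item.
    Returns (hard_labels, posteriors).
    """
    n_items, n_annotators = len(labels), len(labels[0])

    # Initialise item posteriors with a smoothed (soft) majority vote.
    post = []
    for row in labels:
        counts = [smoothing] * n_classes
        for lab in row:
            counts[lab] += 1.0
        total = sum(counts)
        post.append([c / total for c in counts])

    for _ in range(n_iter):
        # M-step: class priors and one confusion matrix per annotator,
        # conf[j][k][l] = P(annotator j outputs l | true class k).
        prior = [sum(p[k] for p in post) / n_items for k in range(n_classes)]
        conf = []
        for j in range(n_annotators):
            counts = [[smoothing] * n_classes for _ in range(n_classes)]
            for i, row in enumerate(labels):
                for k in range(n_classes):
                    counts[k][row[j]] += post[i][k]
            conf.append([[c / sum(r) for c in r] for r in counts])

        # E-step: posterior over each item's true class given all annotations.
        post = []
        for row in labels:
            scores = list(prior)
            for j, lab in enumerate(row):
                for k in range(n_classes):
                    scores[k] *= conf[j][k][lab]
            total = sum(scores)
            post.append([s / total for s in scores])

    hard = [max(range(n_classes), key=p.__getitem__) for p in post]
    return hard, post


labels = [[0, 0], [1, 1], [2, 2], [0, 0], [1, 1]]
hard, post = dawid_skene(labels, n_classes=3)
print(hard)  # [0, 1, 2, 0, 1]
```

Unlike a plain majority vote, the learned confusion matrices let a more reliable classifier outweigh a less reliable one when they disagree.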

2606.06826 2026-06-08 cs.SE New submission

SkelDPO: A Skeleton-Guided Direct Preference Optimization Framework for Efficient Code Generation

Yu Yu, Chen Lyu

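AI summary: Proposes SkelDPO, a framework that uses skeleton-guided preference optimization to jointly optimize semantic correctness and execution efficiency in code generation, improving Pass@1, Beyond@1, and Effi@1 by 3–6%, 3–7%, and 2–5%, respectively, over the state-of-the-art method.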

Abstract

With the remarkable progress of Code Large Language Models (Code LLMs) in achieving semantic correctness, execution efficiency has become an increasingly important dimension for evaluating their practical utility. However, existing approaches typically treat full programs as a single optimization target during training, without explicitly modeling the structural factors that influence efficiency. As a result, although these models can generate semantically correct code, they fail to learn, at a fine-grained level, the underlying skeleton features that lead to efficient implementations. To address this limitation, we propose SkelDPO (Skeleton-Guided Direct Preference Optimization), a skeleton-guided preference optimization framework that systematically enhances the efficiency of code generation. SkelDPO first identifies efficient and inefficient implementations from the code dataset and, through comparative analysis, locates their efficiency-prone and inefficiency-prone points, forming alignment signals between efficiency and inefficiency skeletons. During training, a joint code and skeleton preference loss is introduced, enabling the model to learn semantic correctness while reinforcing its understanding of efficiency-critical components in code. Results show that SkelDPO consistently surpasses existing methods: compared with the SOTA method that relies solely on efficient and inefficient code preference optimization, it improves Pass@1, Beyond@1, and Effi@1 by 3–6%, 3–7%, and 2–5%, respectively, with greater improvements on complex tasks. Overall, SkelDPO provides a new perspective on skeleton-level efficiency alignment, breaking the limitation of conventional preference optimization that relies solely on correctness or efficiency pairs. All datasets and source code are publicly available at: https://github.com/icpcSkelDPO/SkelDPO.
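
SkelDPO builds on Direct Preference Optimization. For one preference pair, the standard DPO loss is -log sigmoid(beta * [(log pi(y_w) - log pi_ref(y_w)) - (log pi(y_l) - log pi_ref(y_l))]), where y_w is the preferred output (here, the efficient implementation or skeleton), y_l the rejected one, pi the policy being trained, and pi_ref a frozen reference model. The sketch below implements that loss from sequence log-probabilities and combines a code-level pair with a skeleton-level pair. The abstract says only that the loss is joint, so the weighted-sum form, the weight `lam`, and beta = 0.1 are assumptions.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(logp_chosen, ref_logp_chosen, logp_rejected, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair: -log sigmoid(beta * margin)."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return softplus(-beta * margin)  # -log(sigmoid(z)) == softplus(-z)


def joint_loss(code_pair, skeleton_pair, lam=0.5, beta=0.1):
    """Hypothetical joint objective: code-level DPO plus weighted skeleton-level DPO.

    Each pair is (logp_chosen, ref_logp_chosen, logp_rejected, ref_logp_rejected).
    """
    return dpo_loss(*code_pair, beta=beta) + lam * dpo_loss(*skeleton_pair, beta=beta)


# When the policy equals the reference, the margin is 0 and the loss is log 2.
print(dpo_loss(-10.0, -10.0, -12.0, -12.0))
```

The loss falls below log 2 only when the policy raises the efficient output's likelihood, relative to the reference, more than the inefficient one's.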

2606.06821 2026-06-08 cs.SE New submission

Chiseling Out Efficiency: Structured Skeleton Supervision for Efficient Code Generation

Yu Yu, Zhihong Sun, Jia Li, Yao Wan, Chuanyi Li, Hongyu Zhang, Ruyun Wang, Tao Huang, Zhi Jin, Ge Li, Chen Lyu

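AI summary: Proposes the EffiSkel framework, which extracts and learns efficiency skeletons to improve both the efficiency and the correctness of generated code; experiments show significant gains across multiple programming languages and benchmarks.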

Abstract

Large Language Models (LLMs) are capable of generating syntactically correct and functionally complete programs, greatly streamlining software development. However, recent studies reveal that these programs typically execute substantially slower than human-optimized counterparts. Existing approaches to bridging this efficiency gap typically either iteratively optimize code after generation or fine-tune models on corpora of efficient code. Yet these methods expose the model to efficiency signals only by mimicking complete, optimized solutions, without explicitly encoding the structural code patterns essential for achieving high runtime performance. Addressing this gap presents two core challenges: (1) extracting and representing latent, efficiency-oriented structural patterns embedded within complex syntax and control flows, and (2) effectively learning these patterns without destabilizing the semantic training of LLMs. To tackle these challenges, we propose EffiSkel, an efficiency skeleton-guided framework that explicitly extracts and learns efficiency skeletons (abstract, reusable structural patterns underpinning efficient code) by leveraging three complementary strategies. These skeletons are integrated into a multi-task learning regime that jointly optimizes code generation and skeleton prediction. Experiments across multiple programming languages and benchmarks demonstrate that EffiSkel significantly enhances both functional correctness and efficiency. On Mercury with DeepSeek-Coder (7B), for example, it achieves an Efficiency Ratio (ER) that is +11.11% higher than EffiCoder and +3.71% higher than CodeDPO, and an Average Speedup (AS) that is +0.36 higher than EffiCoder and +0.22 higher than CodeDPO. These results highlight the effectiveness of explicitly modeling efficiency skeletons in improving the runtime performance of code generated by LLMs.

2606.06811 2026-06-08 cs.PF q-bio.GN New submission

Dependencies and Dataflow in Seed-Filter-Extend Pipelines

Shiv Sundram

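AI summary: Addresses the serial dependencies of seed-filter-extend pipelines in genome alignment and the irregularity of local alignment by synthesizing four approaches, including LASTZ, to optimize the global pipeline across regions and accelerate end-to-end alignment.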

Abstract

Comparing genomes is critical for discovering mutations, tracking evolutionary lineages, and advancing cross-species genomics. Fundamentally, this reduces to an O(n^2) string-matching dynamic programming (DP) problem, a challenge that has driven decades of performance research. However, executing a strict O(n^2) DP algorithm is computationally intractable for genomes spanning millions to billions of base pairs. Consequently, modern aligners rely on global heuristics to identify thousands of candidate similarity regions between species. Unfortunately, these methods are burdened by complex serial dependencies. Once candidate regions are identified, the pipeline executes localized DP alignments, which introduce their own non-trivial heuristics and irregular data dependencies. While parallelizing dense, two-dimensional DP is a well-studied problem, accelerating this end-to-end pipeline is significantly more challenging. Parallelizing across candidate regions and offloading irregular, heuristic-laden local alignments to modern hardware (such as GPUs) remains a major hurdle. In this work, we tackle these serial bottlenecks by optimizing the global pipeline across regions. We draw on four papers (LASTZ, SegAlign, Darwin-WGA, and SNAP) and synthesize their findings into optimizations, which we either prototype or implement directly in LASTZ.
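To make the three stages concrete, the Python sketch below runs a toy seed-filter-extend pass. It is not LASTZ's algorithm: the exact k-mer seeds (LASTZ uses spaced seeds), the BLAST-style two-hit diagonal filter, the right-only ungapped X-drop extension, and all scoring parameters are illustrative assumptions, and the gapped local DP that real pipelines run on surviving candidates is omitted.

```python
from collections import defaultdict


def find_seeds(target: str, query: str, k: int):
    """Seed: every exact k-mer match, as (target_pos, query_pos)."""
    index = defaultdict(list)
    for i in range(len(target) - k + 1):
        index[target[i:i + k]].append(i)
    return [(i, j) for j in range(len(query) - k + 1)
            for i in index.get(query[j:j + k], [])]


def two_hit_filter(hits, window: int):
    """Filter: keep a hit only if an earlier hit on the same diagonal (i - j)
    lies within `window` query positions (a BLAST-style two-hit rule)."""
    last_on_diag = {}
    kept = []
    for i, j in sorted(hits, key=lambda h: (h[0] - h[1], h[1])):
        d = i - j
        if d in last_on_diag and j - last_on_diag[d] <= window:
            kept.append((i, j))
        last_on_diag[d] = j
    return kept


def xdrop_extend(target, query, i, j, k, match=1, mismatch=-1, xdrop=3):
    """Extend: ungapped X-drop extension to the right of the seed at (i, j).
    Returns (best_score, best_length), both measured from the seed start."""
    score = best = k * match
    best_len = n = k
    while i + n < len(target) and j + n < len(query):
        score += match if target[i + n] == query[j + n] else mismatch
        n += 1
        if score > best:
            best, best_len = score, n
        elif best - score > xdrop:
            break  # score fell too far below the best seen: stop extending
    return best, best_len


def seed_filter_extend(target, query, k=4, window=8):
    """Run the three stages and return (target_pos, query_pos, score, length)."""
    return [(i, j, *xdrop_extend(target, query, i, j, k))
            for i, j in two_hit_filter(find_seeds(target, query, k), window)]
```

A real pipeline would also merge overlapping hits on the same diagonal before extension, which this sketch does not do.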

2606.06767 2026-06-08 cs.CR cs.SE New submission

The Custody Envelope Threshold: Authority-Scaled Admission of External Artifacts in Institutional Infrastructure

Amadeus Brandes

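AI summary: Proposes the Custody Envelope Threshold model, which decides an institution's admission strategy for external artifacts from how closed the artifact's identity, ingress path, and revocation capacity are relative to the execution authority it is given, and applies the model to multiple artifact types.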

Comments 32 pages. Preregistered framework and protocol paper; empirical pilot is a separate planned study. OSF preregistration and replication package: https://doi.org/10.17605/OSF.IO/E57FJ

Abstract

Modern infrastructure depends on externally maintained artifacts such as package-registry dependencies, CI/CD actions, container images, Terraform providers and modules, developer extensions, model artifacts, and AI tool servers. These artifacts are easy to fetch but difficult for institutions to admit, govern, and revoke. This paper proposes the Custody Envelope Threshold, an authority-scaled model of artifact admission. It argues that direct institutional admission is defensible only when object identity, ingress path, and revocation capacity are sufficiently closed relative to the execution authority delegated to the artifact. When this threshold is not met, institutions tend to proxy, policy-mediate, vendor-mediate, internalize, quarantine, or reject the artifact. The framework is operationalized as a four-condition ordinal instrument and connected to reference-monitor reasoning, least privilege, and transaction cost economics. It is applied to package dependencies, GitHub Actions, container images, Terraform providers and modules, developer extensions, and open model artifacts, with Model Context Protocol (MCP) servers treated as held-out evidence. The paper also specifies a validation design, deterministic prediction function, and OSF replication package for testing whether high-scrutiny institutions converge toward stronger custody closure for high-authority artifacts.

2606.06752 2026-06-08 cs.SE New submission

Pomona: Continuous Code Quality Improvement via Small, Automated Changes at Bloomberg

David Williams, Angelos Evripiotis, Serkan Kirbas, Harry Morgan, Sergey Magidovich, Peter Wainwright, Federica Sarro

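AI summary: Proposes Pomona, a lightweight agentic tool that uses Scanning and Repair skills to automatically find and fix code issues, producing small PRs of about 10 lines of diff. In a one-month team deployment and a questionnaire of 10 senior engineers, 15 of 17 PRs were merged with a median time-to-close under 2 hours, and 8 of 10 engineers were willing to adopt it.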

Comments 6 pages, 2 figures, 1 table, under review

Abstract

In this short experience paper, we present Pomona, a lightweight agentic tool that utilises agent skills for continuous automated code quality improvement. Inspired by the philosophy of Kaizen(TM), Pomona automates a cycle of discovery and incremental repair: a Scanning skill identifies improvement tasks (e.g., linting violations, technical debt markers, and test gaps) and prioritises them in a structured backlog, while a Repair skill generates tiny pull requests (PRs) targeting ~10 lines of diff. This human-in-the-loop design enables frequent, low-risk improvements that reduce technical debt while maintaining engineer trust and productivity. We evaluated Pomona through a one-month deployment in a team and a questionnaire distributed to 10 senior engineers. Our preliminary results are promising: 15 of 17 generated PRs were successfully merged with a median time-to-close of under 2 hours. Furthermore, 8 of the 10 surveyed engineers expressed a desire to adopt Pomona, praising its small diff sizes and its focus on improving code quality. We conclude by discussing actionable insights for researchers and practitioners on strategies for effective agentic deployment in industry.

2606.06751 2026-06-08 cs.DC New submission

StageFrontier: Synchronization-Aware Stage Accounting for Distributed ML Training

Boram Yoon, Wei Chen, Ville Kallioniemi

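AI summary: Addresses how synchronization masks the root cause of slowdowns in distributed training by proposing StageFrontier, a lightweight always-on signal that uses vectors of coarse stage durations to give exact accounting of exposed time and locate the slow rank and stage, with under 0.2% overhead.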

Comments 21 pages

Abstract

When a distributed training job slows down, the hard part is knowing where to look. Synchronization hides the cause: a stall on one rank shows up as a wait on the others, so a data delay on a single rank can surface as backward time across the group. The cheap dashboards that run all the time -- per-stage averages and maxima -- misread this, double-counting the same exposed delay or burying the slow rank in an average, while full profilers see it clearly but are far too heavy to leave on. StageFrontier is an always-on signal that closes this gap. Each rank reports only a short ordered vector of coarse stage durations -- data, forward, backward, and so on -- timed with CPU wall-clock, with no synchronized clocks and no kernel tracing. At each stage boundary, StageFrontier takes the cumulative time of whichever rank is furthest along; the increments of this frontier form an exact, additive accounting of the step's exposed time and point to the stage and rank where group-visible delay first appears, telling an operator where to aim a heavy profiler, not which fix to make. The accounting is exact, but the coarse signal alone cannot tell whether a leading stage truly caused the slowdown or merely ran alongside it; StageFrontier labels the windows where that distinction needs more evidence instead of guessing. A PyTorch implementation adds under 0.2% throughput overhead through 128 ranks on Gloo and NCCL, places injected faults among its top two suspects on all 50 rows of a hidden-rank DDP test, and recovers the same top-stage routing as PyTorch Profiler, HTA, and Nsight Systems once their traces are reduced to the same coarse stages -- from a 0.11 MB summary instead of a 15.81 GB trace.
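A minimal sketch of the frontier accounting as the abstract describes it, assuming every rank's per-stage wall-clock durations have already been gathered into lists with a common stage order (collecting them across ranks, and labeling ambiguous windows, are not shown; the function name is hypothetical). Because the frontier increments telescope, they always sum to the largest per-rank step total, which is what makes the accounting exact and additive.

```python
from itertools import accumulate


def stage_frontier(durations):
    """durations[r][s] = wall-clock seconds rank r spent in stage s.

    Returns (increments, leaders): increments[s] is how much the frontier
    (the largest cumulative time over ranks) advanced during stage s, and
    leaders[s] is the rank that sets the frontier at the end of stage s.
    Ties go to the lowest rank index.
    """
    cumulative = [list(accumulate(row)) for row in durations]
    increments, leaders = [], []
    previous = 0.0
    for s in range(len(durations[0])):
        # The frontier is the cumulative time of whichever rank is furthest along.
        leader = max(range(len(cumulative)), key=lambda r: cumulative[r][s])
        frontier = cumulative[leader][s]
        increments.append(frontier - previous)
        leaders.append(leader)
        previous = frontier
    return increments, leaders


# Toy input: stages (data, forward, backward) for two ranks.
increments, leaders = stage_frontier([[1.0, 2.0, 5.0], [4.0, 2.0, 1.0]])
print(increments, leaders)  # [4.0, 2.0, 2.0] [1, 1, 0]
```

In this toy input, rank 1 sets the frontier during data loading, so the first 4 s of exposed time are attributed to its data stage even though rank 0 finishes the step last.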

2606.06726 2026-06-08 cs.NI New submission

Natural Language Access Control (NLAC): From Help Desk Requests to Structured Policies

Jonas Wessner, Tobias Meuser, Janek Schoffit, Dennis Eisermann, Johannes Deger, Björn Scheuermann, Frank Kargl

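AI summary: Proposes NLAC, an LLM-based architecture for natural-language access control that extracts compact subgraphs via embedding similarity, raising policy translation accuracy to up to 98.7% while lowering inference cost.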

Abstract

Configuring network access control policies in large, complex networks is error-prone and requires significant expert effort. LLMs offer a promising interface for expressing such policies in natural language, but their capability for translating user requests into access policies, and the system architectures best suited to leverage LLMs, remain underexplored. We present an architecture for natural-language access control (NLAC) that uses LLMs to translate user requests into access policies, and introduce NLACBench, a benchmark for evaluating LLM-based intent translation systems in large-scale networks. Our evaluation across multiple state-of-the-art models shows that top-performing LLMs achieve up to 96.9% accuracy in small-network settings, but performance degrades substantially (below 20% for some models) as network size increases. To address this limitation, we identify relevant network components via embedding similarity and construct compact subgraphs that are passed to the LLM. This approach enables scaling to larger networks with up to 98.7% accuracy, while simultaneously reducing inference time, hardware requirements, and operating costs to a constant resource budget. Finally, a case study indicates that top-performing models exhibit largely complementary error patterns, suggesting that intent translation accuracy may be further improved through multi-LLM architectures.
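The abstract does not specify how the subgraph is built. The sketch below assumes precomputed embedding vectors (toy 2-D vectors stand in for a real embedding model), ranks network components by cosine similarity to the request, keeps the top k, adds their one-hop neighbors, and returns the induced subgraph. The function and node names are hypothetical. Under these assumptions the context passed to the LLM grows with k and node degree rather than with the size of the whole network.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def compact_subgraph(request_vec, node_vecs, edges, k=2):
    """Keep the k nodes most similar to the request plus their one-hop
    neighbors, and return the induced subgraph as (node set, edge list)."""
    ranked = sorted(node_vecs, key=lambda n: cosine(request_vec, node_vecs[n]),
                    reverse=True)
    top = set(ranked[:k])
    selected = set(top)
    for u, v in edges:
        if u in top or v in top:
            selected.update((u, v))
    sub_edges = [(u, v) for u, v in edges if u in selected and v in selected]
    return selected, sub_edges
```

For example, with a request vector `[1.0, 0.0]` and nodes `hr-db` at `[1.0, 0.0]` and `hr-laptop` at `[0.9, 0.1]`, both are selected, and any node linked to either (say a `backup` server) is pulled in as a neighbor, while unrelated components stay out of the prompt.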

2606.06705 2026-06-08 eess.SY cs.SY math.ST stat.ME stat.TH New submission

Estimating Evolving Functions with Dynamic Gaussian Processes

J. S. van Hulst, W. P. M. H. Heemels, D. J. Antunes

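AI summary: Proposes the Dynamic Gaussian Process framework, which models evolving functions with integro-difference equations, extends Gaussian process regression to time-varying functions, and uses separable kernel structure to reduce the problem to a finite-dimensional Kalman filter, supporting vector-valued states and higher-order PDEs.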

Comments This manuscript is a preprint submitted to a SIAM journal

Abstract

This paper develops the Dynamic Gaussian Process (DGP), a framework for estimating functions governed by integro-difference equations (IDEs). IDEs model continuous functions that evolve with discrete-time dynamics and arise naturally from time-discretization of linear partial differential equations (PDEs). The DGP extends Gaussian process regression to time-varying functions and extends Kalman filtering to infinite-dimensional states. The DGP posterior remains a Gaussian process with closed-form mean and covariance updates, and separable kernel structure reduces the problem to a finite-dimensional Kalman filter on basis function coefficients. This paper extends the DGP to vector-valued states, enabling the treatment of higher-order PDEs, and provides a stability and approximation error analysis for the basis function approximation. The functional L2 estimation error decomposes exactly into in-subspace and out-of-subspace contributions, and all approximation errors vanish as the number of basis functions grows. The framework is demonstrated on the heat equation and on the wave equation, the latter with a vector-valued state. Code is available at https://github.com/JvHulst/Dynamic_Gaussian_Processes.
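To illustrate the reduction to a finite-dimensional Kalman filter on basis coefficients, the sketch below tracks a 1-D heat-equation solution on [0, 1] with zero boundary values, projected onto sine basis functions. In that basis an exact time step of the heat equation multiplies coefficient k by exp(-alpha (k pi)^2 dt), so the transition matrix is diagonal. The sine basis, sensor layout, noise levels, and function name are assumptions for this example; they are not the paper's kernel construction or code.

```python
import numpy as np


def heat_dgp_filter(n_basis=3, n_steps=20, alpha=0.01, dt=1.0, noise_std=0.01, seed=0):
    """Kalman-filter the sine-series coefficients of a 1-D heat-equation solution."""
    rng = np.random.default_rng(seed)
    k = np.arange(1, n_basis + 1)
    # Exact heat-equation step for sine coefficients: a diagonal transition matrix.
    A = np.diag(np.exp(-alpha * (k * np.pi) ** 2 * dt))
    Q = 1e-6 * np.eye(n_basis)                  # small process noise (assumed)
    sensors = np.linspace(0.05, 0.95, 10)       # measurement locations (assumed)
    H = np.sin(np.pi * np.outer(sensors, k))    # H[i, j] = sin(k_j * pi * x_i)
    R = noise_std ** 2 * np.eye(len(sensors))

    truth = 0.5 ** (k - 1)                      # true initial coefficients 1, 0.5, 0.25, ...
    mean, cov = np.zeros(n_basis), np.eye(n_basis)  # Gaussian prior on the coefficients
    for _ in range(n_steps):
        truth = A @ truth
        y = H @ truth + noise_std * rng.standard_normal(len(sensors))
        # Predict: push mean and covariance through the linear dynamics.
        mean = A @ mean
        cov = A @ cov @ A.T + Q
        # Update: Kalman gain K = P H^T S^{-1}, computed with a linear solve.
        S = H @ cov @ H.T + R
        K = np.linalg.solve(S, H @ cov).T
        mean = mean + K @ (y - H @ mean)
        cov = (np.eye(n_basis) - K @ H) @ cov
    return mean, truth, cov
```

The filter state holds n_basis coefficients rather than one value per spatial location; the estimated function at any x is recovered as the sine series with the filtered coefficients.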

2606.06702 2026-06-08 cs.HC New submission

Adversarial Co-Thinking: Calibration and Triangulation Across Multiple GenAI Tools in HCI Writing

Pia Tukkinen

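AI summary: Examines a collaborative writing mode in which GenAI tools (Claude, ChatGPT, Gemini) are fully embedded in drafting an academic paper, and proposes "adversarial co-thinking": calibrating the tools with past peer reviews and setting their outputs against one another to be tested rather than deferred to, arguing that the key skill is evaluative rather than generative.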

Abstract

This paper examines what happens when GenAI tools are fully embedded in the drafting of an academic paper rather than confined to late-stage polishing. To investigate how an intensive multi-tool GenAI workflow differs from conventional academic writing, I drafted this paper from the first sentence in parallel with three GenAI tools (Claude, ChatGPT, and Gemini), comparing their outputs against my own intended contribution. Across this process, a recurring pattern took shape that I call adversarial co-thinking: using past peer reviews to calibrate the tools, then setting their outputs against one another to be tested rather than deferred to. I argue that surfacing genuine critique from tools that default to praise is a central practical challenge of working with these tools, and that the skill at stake is evaluative rather than generative. Adversarial co-thinking is a high-skill epistemic practice: it can amplify expertise where it exists, but it can also mask its absence. I further argue that current disclosure frameworks are poorly equipped to capture this shift. The paper offers four propositions for workshop discussion concerning autonomy, supervision, equity of access, and disclosure.

2606.06700 2026-06-08 cs.GT cs.CR econ.TH New submission

The Economics of Proof-of-Useful-Work

Rafael Pass

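AI summary: Analyzes the economics of proof-of-useful-work (PoUW) blockchains with a competitive-equilibrium model, finding that PoUW does not lower the cost of attacks and that, under certain conditions, block rewards subsidize inference computation and increase socially useful output.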

Abstract

Proof-of-work (PoW) blockchains rely on computational expenditure to secure a ledger supporting a native cryptocurrency. In existing systems such as Bitcoin, this expenditure is intentionally useless: the computation secures consensus but produces no external economic output. An emerging alternative -- proof of useful work (PoUW) -- enables the same computation to simultaneously secure the blockchain and generate economically valuable output. However, PoUW is often criticized on economic grounds: if the work is useful, attackers might be "paid to attack," potentially weakening security. We develop a competitive-equilibrium model of a PoUW blockchain in which compute can be allocated across pure mining, pure useful work -- instantiated as machine-learning inference -- or "duplex" work that produces both with computational overheads. We provide a complete closed-form characterization of equilibrium allocations and prices as a function of the duplex overheads and a single economic parameter -- the token-inference ratio -- measuring token adoption relative to the inference market. This characterization reveals three regimes: "Bitconia," in which the economy reduces to classical PoW; "Fortessia," in which duplex replaces mining, increasing security while useful output remains unchanged; and "Duplexia," in which token rewards subsidize inference, lowering prices and expanding inference supply. Contrary to the common strawman argument, PoUW does not make attacks economically cheap: once equilibrium prices are taken into account, the economic cost of a majority attack remains tied to the block reward. Moreover, in Duplexia, block rewards act as rebates on inference prices, generating additional socially useful computation that would not arise without the blockchain -- an expansion monotonically increasing in token adoption and technological efficiency.

2606.06697 2026-06-08 cs.CR cs.OS New submission

AgileOS: A GPU Operating System Layer for Protected CUDA Services

Zhuoping Yang, Yiyu Shi, Alex Jones, Peipei Zhou

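AI summary: Addresses the lack of OS-level protection for GPU services by proposing AgileOS, which isolates and protects services through CUDA virtualization at the library boundary, GPU memory management, and PTX injection.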

Abstract

Modern GPU applications increasingly interact with storage systems, network devices, vendor libraries, and GPU-resident services rather than executing only isolated compute kernels. This shift creates a need for operating-system-like protection around GPU services, where service metadata, device queues, memory-mapped I/O regions, and library-internal state should not be directly exposed to untrusted application kernels. However, today's CUDA programming model, by default, still gives each application direct ownership of its CUDA context, device pointers, runtime handles, module loading path, and kernel launches, leaving protected GPU services to build their own ad hoc interfaces and isolation mechanisms. This paper presents the initial design and prototype scope of AgileOS, a GPU operating-system layer for protected CUDA services. AgileOS virtualizes CUDA at the library boundary: applications link against client-side CUDA Runtime, Driver, and selected library shims, while a trusted runtime worker owns the real CUDA context and mediates supported operations. To protect service state and module interfaces, AgileOS also defines a GPU memory-management model that separates user allocations from protected module/MMIO ranges, using pointer validation and memory access guards via PTX injection. AgileOS is modular and flexible, supporting a range of protected services and existing libraries such as cuFFT and PyTorch. The prototype includes client-side interceptors, worker-side CUDA handlers, virtualized CUDA object tables, protected AgileOS modules, a GPU memory manager implementing this separation, selected trusted library adapters, and the PTX-level kernel memory guard.
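The abstract describes guards, injected at the PTX level, that validate pointers against user allocations and protected module/MMIO ranges. The check itself is plain interval logic. The Python sketch below models it on the host side with a hypothetical `AccessGuard` class and made-up address ranges; it does not show PTX injection or AgileOS's actual data structures, and it assumes user allocations do not overlap each other.

```python
import bisect


class AccessGuard:
    """Allow an access only if [addr, addr + size) lies inside a single user
    allocation and touches no protected range. Ranges are half-open (start, end)."""

    def __init__(self, user_ranges, protected_ranges):
        self.user = sorted(user_ranges)
        self.starts = [start for start, _ in self.user]
        self.protected = sorted(protected_ranges)

    def check_access(self, addr, size):
        end = addr + size
        # Find the last user allocation that starts at or before addr.
        idx = bisect.bisect_right(self.starts, addr) - 1
        if idx < 0 or end > self.user[idx][1]:
            return False  # not inside any user allocation
        # Reject any overlap with a protected module/MMIO range.
        return all(end <= p_start or addr >= p_end for p_start, p_end in self.protected)


guard = AccessGuard(user_ranges=[(0x1000, 0x2000), (0x4000, 0x5000)],
                    protected_ranges=[(0x8000, 0x9000)])
```

With these ranges, a 16-byte access at `0x1000` passes, while one at `0x1ff8` is rejected because it runs past the end of its allocation.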

2606.06681 2026-06-08 cs.DS New submission

Online Span Minimization for Flexible Uniform Jobs

Mozhengfu Liu, Samir Khuller, Xueyan Tang

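AI summary: Motivated by energy-efficient scheduling in cloud computing, studies online span minimization; for uniform-length jobs it gives a randomized competitive upper bound of 1.443 and a lower bound of 1.366, and shows that allowing restarts achieves an optimal competitive ratio equal to the golden ratio, 1.618.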

Comments This paper will appear in ACM SPAA 2026 conference

Abstract

Motivated by the critical need for energy-efficient scheduling in cloud computing, this paper investigates Span Minimization, a fundamental variant of the well-studied BusyTime problem. In the general BusyTime problem, $n$ jobs characterized by release times, deadlines, and processing times must be partitioned into bundles of capacity $B$, where the objective is to minimize the total active duration of the virtual machines. Span minimization addresses the specific case of unbounded capacity ($B = \infty$), a problem that serves as a vital precursor for achieving high-performance approximation guarantees in more complex scheduling environments. While previous research established a deterministic $2$-approximation for interval jobs and a $3$-approximation for the general BusyTime problem, the online landscape of span minimization remains less explored. In this paper, we focus on the online version of span minimization. We demonstrate that randomization can be leveraged to break the known deterministic competitive barrier of $2$. For uniform-length jobs, we derive a randomized competitive upper bound of $\frac{1}{\ln{2}}\approx 1.443$ and a lower bound of $\frac{\sqrt{3}+1}{2}\approx 1.366$. Furthermore, we show that by introducing the ability to restart jobs, we can achieve an optimal competitive ratio equal to the golden ratio ($\phi \approx 1.618$). Our results provide new insights into the power of randomization and flexibility in online energy-aware scheduling.
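With unbounded capacity, every job fits in one bundle, so the cost of a schedule is the total length of the union of the jobs' execution intervals. The Python sketch below computes that span and shows how flexible windows let an aligned placement beat an eager one; the job windows in the note are made up for illustration. It also evaluates the constants quoted in the abstract.

```python
import math


def span(intervals):
    """Total length of the union of half-open [start, end) intervals."""
    total = 0.0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start  # close the previous merged block
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)       # overlap: extend the current block
    if cur_end is not None:
        total += cur_end - cur_start
    return total


# Constants from the abstract (uniform-length jobs).
UPPER_RANDOMIZED = 1 / math.log(2)            # ~1.443, randomized upper bound
LOWER_RANDOMIZED = (math.sqrt(3) + 1) / 2     # ~1.366, randomized lower bound
GOLDEN_RATIO = (1 + math.sqrt(5)) / 2         # ~1.618, optimal ratio with restarts
```

For example, take two jobs of length 2 with windows [0, 5] and [3, 8]. Starting each at its release time gives intervals [0, 2) and [3, 5), a span of 4; delaying the first job to [3, 5) makes them overlap and halves the span to 2. An online algorithm must make such delay decisions without knowing future arrivals, which is where the competitive ratios above come in.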

2606.06665 2026-06-08 cs.SI cs.MA New submission

Comparing Sentiment Contagion in AI-Agent and Human Social Networks: Evidence from MOLTBOOK

Elyes Ben chaabane, Savindu Herath, Yash Raj Shrestha

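AI summary: Studies sentiment spread in an AI-agent social network, finding that negative posts attract more replies but that replies usually turn neutral and sentiment does not keep spreading, suggesting that AI networks may dampen emotional extremes.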

Comments 8 pages without appendix

Abstract

AI agents are beginning to interact not only with people, but also with one another. We investigate what happens to sentiment in such an AI-only social network: does negativity spread, or do replies calm it down? We study MOLTBOOK, a social network made up of autonomous language-model agents, using almost 2.9 million posts and 1.5 million comments. Negative posts receive many more replies than neutral or positive posts, so negativity still attracts attention. However, replies to negative content usually do not stay negative. They most often become neutral, and there is little evidence that negative sentiment spreads across days. The main pattern is therefore not a cycle of negativity, but negative attention followed by neutralisation. These findings suggest that AI-agent networks may behave differently from human social networks: they may dampen emotional extremes, while still depending strongly on how interactions are organised.
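One simple way to quantify "negative attention followed by neutralisation" is a table of reply-sentiment frequencies conditioned on the sentiment of the post being replied to. The sketch below assumes sentiment labels have already been assigned (the abstract does not say which classifier was used), and the function name is hypothetical; the pairs in the usage note are made up purely to show the computation and are not results from the paper.

```python
from collections import Counter, defaultdict

LABELS = ("negative", "neutral", "positive")


def reply_transition_table(pairs):
    """pairs: iterable of (post_label, reply_label).

    Returns {post_label: {reply_label: P(reply_label | post_label)}}.
    """
    counts = defaultdict(Counter)
    for post, reply in pairs:
        counts[post][reply] += 1
    table = {}
    for post, row in counts.items():
        total = sum(row.values())
        table[post] = {label: row[label] / total for label in LABELS}
    return table
```

On toy pairs where two of three replies to negative posts are neutral, `table["negative"]["neutral"]` is 2/3; a large negative-to-neutral entry alongside a small negative-to-negative entry is the neutralisation pattern the abstract describes.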

2606.06662 2026-06-08 cs.SE New submission

AutoPipelineAI: Context-Aware CI/CD Pipeline Generation from Natural Language

Youssef Mohamed Aboelfotoh, Mohamed Ahmed Hemdan, Mohammad El-Ramly, Khlood Hassan, Mahmoud Saleh Saad, Ahmed Mohamed Tolba, Seif Gamal Abdelmonem

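AI summary: Proposes AutoPipelineAI, a system that uses large language models to generate CI/CD pipeline configurations from natural-language descriptions, integrating repository-aware analysis, automated validation, and a feedback mechanism to reduce the complexity of DevOps configuration.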

Comments 7 pages, 1 figure, 6 tables, 16 references, IMSA Conference 11-12 July 2026, International Conference on Intelligent Methods, Systems, and Applications 2026 #70415

Abstract

Modern software development relies on CI/CD pipelines to automate testing, building, and deployment operations. Configuring DevOps pipelines is challenging and time-consuming, as developers must understand platform-specific syntax and manually create configuration files. This complexity can lead to configuration errors and reduced productivity, especially for developers with limited DevOps experience. This paper introduces the AutoPipelineAI system, which generates CI/CD pipeline configurations using natural language descriptions. The proposed solution uses large language models (LLMs) to translate developer intent, analyze repository structures, and create specific pipeline scripts for environments like GitHub Actions and GitLab CI/CD. It integrates repository-aware analysis, automated validation systems, and a feedback mechanism that confirms the accuracy and usability of the created pipelines. We present the system architecture, its implementation, and an assessment framework designed to measure generation precision, configuration validity, and reduction in setup effort compared to manual pipeline creation. AutoPipelineAI illustrates how LLMs can simplify the complexity of DevOps configuration and enhance developer access to continuous delivery methods. Evaluation results provide early evidence that repository-aware, natural-language-driven CI/CD generation is a viable and promising paradigm for reducing the complexity of DevOps configuration and enabling more accessible software delivery automation.
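The abstract mentions automated validation of generated pipelines but not how it works. As a minimal, hypothetical illustration, the Python check below inspects a GitHub Actions workflow that has already been parsed into a dictionary and flags a few structural problems: a missing `on` trigger, a missing or empty `jobs` map, and jobs lacking `runs-on` or `steps`. It skips reusable-workflow jobs (which use `uses` instead of `steps`) and is no substitute for GitHub's own schema validation. One practical pitfall it handles: YAML 1.1 loaders such as PyYAML parse the bare key `on` as the boolean `True`, so the check accepts either key.

```python
def validate_workflow(workflow: dict) -> list:
    """Return a list of structural problems in a parsed GitHub Actions workflow."""
    problems = []
    # PyYAML (YAML 1.1) turns the bare key `on` into True, so accept both spellings.
    if "on" not in workflow and True not in workflow:
        problems.append("missing trigger: 'on'")
    jobs = workflow.get("jobs")
    if not isinstance(jobs, dict) or not jobs:
        problems.append("missing or empty 'jobs' mapping")
        return problems
    for name, job in jobs.items():
        if not isinstance(job, dict):
            problems.append(f"job '{name}' is not a mapping")
            continue
        if "uses" in job:
            continue  # reusable-workflow call: runs-on/steps are not required
        if "runs-on" not in job:
            problems.append(f"job '{name}' has no 'runs-on'")
        if not job.get("steps"):
            problems.append(f"job '{name}' has no 'steps'")
    return problems
```

In a generate-validate-repair loop, a non-empty problem list could be fed back to the LLM as the correction prompt; whether AutoPipelineAI's feedback mechanism works this way is not stated in the abstract.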

2606.06650 2026-06-08 cs.HC New submission

LinkNav: Surfacing Interconnected Information in Scientific Articles

Sebastian Joseph, Jennifer Healey, Junyi Jessy Li, Ani Nenkova

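AI summary: Proposes LinkNav, which uses a language model to generate questions that arise while reading and searches the rest of the document for answers, creating explicit links between non-adjacent passages to improve the experience of reading academic papers.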

Comments 10 pages, 3 figures, ACL 2026 (Demo Track)

Abstract

We present LinkNav, an enhanced experience for reading academic papers which makes explicit connections between related but non-adjacent passages. To create the experience, we instruct a language model to generate questions that may arise while reading a passage and then search for answer passages elsewhere in the document, forming intra-document connections when answers are found. We confirm that these building blocks work well to power the experience, with an answer detection pipeline that works with high precision, resulting in a reasonable number of connections being made for a document. On a dataset of academic papers, we find that connected passages are on average ten segments away from each other, making explicit connections that a reader may have otherwise missed.
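The pipeline has two pluggable parts, a question generator and an answer detector, both of which are language-model calls in the paper. The sketch below wires them together with stand-in callables (hypothetical; a real system would prompt a model) and computes the average distance, in segments, between linked passages, the statistic the abstract reports. The `min_gap` parameter that enforces "non-adjacent" links is an assumption.

```python
from typing import Callable, List, Tuple


def build_links(
    passages: List[str],
    generate_questions: Callable[[str], List[str]],
    answers: Callable[[str, str], bool],
    min_gap: int = 2,
) -> List[Tuple[int, int, str]]:
    """Link passage i to passage j when j answers a question raised while reading i.
    Pairs closer than `min_gap` segments are skipped (non-adjacent links only)."""
    links = []
    for i, passage in enumerate(passages):
        for question in generate_questions(passage):
            for j, candidate in enumerate(passages):
                if abs(i - j) >= min_gap and answers(question, candidate):
                    links.append((i, j, question))
    return links


def mean_link_distance(links: List[Tuple[int, int, str]]) -> float:
    """Average distance, in segments, between linked passages."""
    return sum(abs(i - j) for i, j, _ in links) / len(links) if links else 0.0
```

Because every question is checked against every other passage, the detector's precision matters more than its recall: each false positive becomes a visible but misleading link for the reader.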

2606.06642 2026-06-08 eess.SY cs.SY New submission

MPC for nonlinear systems: a comparative review of discretization methods

Guido Sanchez, Marina Murillo, Lucas Genzelis, Nestor Deniz, Leonardo Giovanini

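AI summary: Compares three methods for discretizing continuous-time nonlinear equations in model predictive control (direct multiple shooting, direct collocation, and successive linearization) and evaluates their performance on two test cases.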

Journal ref 2017 XVII Workshop on Information Processing and Control (RPIC), Mar del Plata, Argentina, 2017, pp. 1-6

Abstract

This work provides a comparative review of three different numerical methods generally used to discretize continuous-time non-linear equations appearing in model predictive control problems: direct multiple shooting, direct collocation and successive linearizations. An overview of the characteristics of each method is given and the performance of each method is evaluated through the simulation of two test cases.
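Of the three methods, direct multiple shooting is the easiest to show in a few lines. The horizon is split into intervals, each interval is integrated from its own state node, and continuity is imposed through defect constraints x_{k+1} - F(x_k, u_k) = 0 that the NLP solver drives to zero. The Python sketch below builds those defects with a fixed-step RK4 integrator for a damped pendulum; the model, step size, and RK4 choice are assumptions, and the NLP solver, cost, and initial-state constraint are omitted.

```python
import numpy as np


def pendulum(x, u, damping=0.1, g_over_l=9.81):
    """Continuous-time dynamics: x = [angle, angular velocity], u = torque."""
    theta, omega = x
    return np.array([omega, -g_over_l * np.sin(theta) - damping * omega + u])


def rk4_step(f, x, u, h):
    """One RK4 step of length h with the control held constant over the interval."""
    k1 = f(x, u)
    k2 = f(x + h / 2 * k1, u)
    k3 = f(x + h / 2 * k2, u)
    k4 = f(x + h * k3, u)
    return x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


def shooting_defects(nodes, controls, h, f=pendulum):
    """Stacked continuity constraints x_{k+1} - F(x_k, u_k) for k = 0..N-1."""
    return np.concatenate([
        nodes[k + 1] - rk4_step(f, nodes[k], controls[k], h)
        for k in range(len(controls))
    ])
```

If the nodes come from a forward simulation with the same controls, all defects vanish; in the optimizer, however, the nodes are free decision variables, which is what distinguishes multiple shooting from single shooting.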

2606.06633 2026-06-08 cs.GT New submission

Competing Auctions in Intermediated Markets

Bruno Mazorra, Minghao Pan, Christoph Schlegel

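AI summary: Studies a seller choosing among parallel auction mechanisms in intermediated markets, finding that sealed-bid second-price intermediary auctions fully unravel into the sealed first-price principal auction, while open-bidding intermediaries unravel only partially.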

Abstract

We analyze competing auctions in intermediated markets, where a seller selects among parallel mechanisms for the sale of a single good, most prominently the relay-and-protocol architecture of proposer-builder separation in Ethereum. When the intermediary can enforce single-homing on its bidders, sealed-bid second-price intermediary auctions fully unravel into the sealed first-price principal auction; open bidding-format intermediaries unravel only partially, collapsing into first-price in equilibrium under symmetric latency and sorting fast bidders to the intermediary under asymmetric latency. Any last-look advantage is removed through the availability of a credible sealed bidding channel. These results extend to multi-plexing environments (no enforcement by the intermediary). While the unraveling result indicates that the availability of a sealed first-price bidding channel pushes the overall market to the same auction structure, the very assumption of the credibility of such channel is problematic, as the seller may have an incentive to leak information: a first-price auction is leakage-resistant in the presence of a single "fast" bidder but not against two or more. However, if the seller can credibly commit to not leak bids, it is optimal for them to do so. A main motivation is the forthcoming Glamsterdam update of Ethereum: our analysis suggests that the availability of an in-protocol (first-price) bidding channel severely limits the design space for out-of-protocol auctions by relays and other intermediaries.

2606.06625 2026-06-08 cs.GT math.ST stat.TH New submission

N-Player Binary Games with Unidirectional Dependencies: Cycle Robustness and Induced Indifference

Jose Maria Sanchez-Saez, Nana Odishelidze, Francisco Criado-Aldeanueva

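AI summary: Gives a closed-form characterization of Nash equilibria in N-player binary games with unidirectional dependencies, focusing on directed cycle graphical games; under a Robust Incentive Structure, equilibria are found in O(N) time, and the roles of the Parity Condition and induced indifference are identified.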

Journal ref Communications in Nonlinear Science and Numerical Simulation, Volume 161, Part 2, 2026, 110151, ISSN 1007-5704

Abstract

The present study provides a closed-form characterisation of Nash equilibria in N-player binary games with unidirectional dependencies. While general network games are PPAD-complete, prior work has established that trees or paths admit polynomial-time solutions via dynamic programming. We provide a deterministic characterisation for the subclass of directed cycle graphical games, demonstrating that non-zero boundary incentives linearize the topology into a feed-forward propagation. Under this Robust Incentive Structure, resolution is achieved in O(N) time: strict dominance guarantees a unique equilibrium; in its absence, pure strategy equilibria are governed by the Parity Condition, while a unique fully mixed equilibrium is guaranteed via induced payoff indifference. For non-robust regimes, we deliver branching rules. The transition-matrix formulation allows the size of the search tree to be evaluated in advance. This transparency enables the inverse design of target equilibria in circular networks, making explicit the mechanics that remain opaque in numerical solvers.
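A minimal sketch of why cycles become tractable once every player has a strict best response to its single predecessor: fix player 0's action, propagate best responses around the cycle in O(N), and keep the choice only if it is consistent when the cycle closes. The encoding (each best response given as a pair: response to 0, response to 1) and the function name are assumptions for illustration, and the sketch covers only pure equilibria under strict incentives, not the paper's mixed-equilibrium or branching results. With no constant (dominant) map, the composition around the cycle is either the identity or a negation, depending on whether an even or odd number of players invert their predecessor's action, which yields two pure equilibria or none. This parity effect is in the spirit of, but not necessarily identical to, the paper's Parity Condition.

```python
def pure_equilibria(best_response):
    """best_response[i] = (action of player i if its predecessor plays 0,
                           action of player i if its predecessor plays 1).

    Player i's predecessor is player i - 1; player 0's predecessor is player N - 1.
    Returns every pure Nash equilibrium as a list of 0/1 actions.
    """
    n = len(best_response)
    equilibria = []
    for a0 in (0, 1):
        actions = [a0]
        for i in range(1, n):
            actions.append(best_response[i][actions[i - 1]])
        # Close the cycle: player 0 must be best-responding to player N - 1.
        if best_response[0][actions[-1]] == a0:
            equilibria.append(actions)
    return equilibria


COPY, FLIP, ALWAYS_1 = (0, 1), (1, 0), (1, 1)
```

For instance, `pure_equilibria([COPY, COPY, FLIP])` has one inverting player (odd parity) and returns no pure equilibrium, whereas a single dominant player such as `ALWAYS_1` pins down exactly one equilibrium.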