arXivDaily: arXiv Daily Academic Digest, updated Monday through Friday
2606.05818 2026-06-05 math.HO cs.AI math.AG math.CO math.RT

Benchmarks in Leipzig

Andrei Balakin, Miklós Bóna, Marie-Charlotte Brandenburg, Clara Briand, Veronica Calvo Cortes, Shelby Cox, Jesus A. De Loera, Danai Deligeorgaki, Hannah Friedman, Tim Gehrunger, Chiara Giardino, Stephen Griffeth, Baran Hashemi, Elena Hoster, Alexander Ivanov, Nupur Jain, Aryaman Jal, Leonie Kayser, Joris Koefler, Kevin Kühn, Mario Kummer, Felix Lotter, René Marczinzik, Victor S. Miller, Alejandro Morales, Greta Panova, Gianni Petrella, Nathan Pflueger, Lakshmi Ramesh, Nikolas Rieke, Carlos Rodriguez, Andrea Rosana, Flavio Salizzoni, Otto T. P. Schmidt, Sven Ulf Schmitz, Lina Maria Simbaqueba Marin, Luca Sodomaco, Christian Stump, Bernd Sturmfels, Alexander Taveira Blomenhofer, Simon Telen, Philipp Tuchel, Emil Verkama, Carl Felix Waller, Julian Weigert, Annette Werner, Nathan Williams, Claudius Zibrowius

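Affiliations University of California, Berkeley

AI summary Between April and May 2026, 49 mathematicians compiled a dataset of 100 research-level mathematics questions and evaluated the mathematical reasoning of large language models in multiple stages; only 2 questions remained unsolved at the end.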

Comments 8 pages including 8 benchmark statistics tables + 20 pages appendix containing the 100 Leipzig Benchmark questions

Abstract

Between April 1 and May 15, 2026, a group of 49 mathematicians compiled a dataset of research-level mathematics questions with known answers. Most of the work was done during the 3-day workshop *Benchmarks in Leipzig* with 35 participants at the Max Planck Institute for Mathematics in the Sciences in Leipzig, Germany. We present the resulting collection of 100 questions. We evaluated these questions in three stages: a single attempt by five state-of-the-art LLMs, followed by a 20-runs-per-model evaluation with three of these models, and finally a 3-run attempt with two heavy-thinking models. After Stage 1, 41 questions remained completely unsolved; after Stage 2, this count dropped to 16; and we concluded Stage 3 with only 2 unsolved questions. This demonstrates that the mathematical reasoning capabilities of LLMs are becoming impressive.

2606.05779 2026-06-05 cs.CR cs.AI stat.ML

TinyML-Driven Cybersecurity for Autonomous Spacecraft: Latency-Accuracy Analysis for SPARTA RF and Cyber Threat Detection

Van Le, Trevor Tran, Tan Le

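Affiliations Virginia Tech; Hampton University

AI summary For autonomous spacecraft, analyzes under the SPARTA attack model the latency-accuracy trade-offs of TinyML-compatible classical models (Random Forest, Logistic Regression, SVM, MLP) in detecting several cyber-RF threats, finding that Logistic Regression achieves microsecond-level inference with only about 1% lower accuracy than Random Forest, making it a suitable baseline for onboard autonomy.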

Comments Twenty Fifth International Conference on Security & Management (SAM'26)

Abstract

Autonomous spacecraft require rapid, lightweight, and reliable onboard detection of cyber-RF threats. Using the SPARTA attack model, we analyze the latency-accuracy trade-offs of TinyML-compatible classical models -- Random Forest, Logistic Regression, SVM, and MLP -- for detecting uplink jamming, Fake-NR spoofing, payload manipulation, ground-segment compromise, and unauthorized command injection. We present a physics-informed theoretical analysis of each model's computational complexity, VC dimension, Lipschitz continuity, and latency scaling, supported by empirical measurements on adversarial RF spectrograms generated via BandErasure, FakeNR, and NoiseBurst corruption modes. Results show that Logistic Regression achieves microsecond-level inference with only a 1% accuracy drop relative to Random Forest, making it an effective TinyML baseline for onboard autonomy. The study also identifies opportunities for advancing spacecraft cybersecurity through richer feature encoders and multi-timescale learning architectures, building on recent progress in edge intelligence and trustworthy AI.

2606.05776 2026-06-05 cs.CR cs.AI cs.LG

An Improved CNN-LSTM Based Intrusion Detection System for IoT Networks

Mohammad Tariq Ikhlas, Pohanyar Khowaja Khil, Malik Muhammad Mueed Aslam, Muhammad Khuram Shahzad

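Affiliations University of Engineering and Technology, Lahore

AI summary Proposes an improved CNN-LSTM intrusion detection model combining multi-class classification, dataset integration, and temporal feature learning, reaching approximately 97% accuracy on IoT networks.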

Comments 8 pages, 8 figures

Abstract

With the rapid proliferation of IoT devices, security concerns have dramatically escalated and intrusion detection systems have become critical for protecting networked environments. This paper presents an improved CNN-LSTM based intrusion detection model that combines multi-class classification, dataset integration, and temporal feature learning to enhance detection performance in IoT networks. Using network traffic data, the proposed approach is evaluated on intrusion detection tasks and achieves an accuracy of approximately 97%. Experimental results demonstrate that the model effectively detects multiple attack categories while maintaining stable training and validation performance. The integration of convolutional and recurrent neural network components enables the framework to capture both spatial and temporal characteristics of network traffic, improving overall intrusion detection capability in IoT environments.

2606.05770 2026-06-05 cs.SE cs.AI

Human Oversight and Overload: Two Hidden and Costly Burdens of AI-Assisted Software Engineering

Vahid Garousi

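Affiliations Queen’s University Belfast; Azerbaijan Technical University

AI summary Drawing on practitioner opinions, the paper describes two hidden burdens of AI-assisted software engineering, continuous human oversight of AI-generated artifacts and cognitive overload, and discusses how teams can handle them.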

Abstract

AI is changing how software engineers work, but it often comes with hidden burdens and costs. In this paper, we characterize two such often-overlooked burdens: (1) the constant need for human oversight and inspection of AI-generated artifacts; and (2) the growing cognitive overload on software engineers from receiving large amounts of suggestions from AI tools. The need for human oversight is not optional: engineers must review, validate, and sometimes rework what AI produces. At the same time, the flood of AI suggestions, prompts, and possible solutions can leave developers mentally stretched. By blending evidence from recent opinions from practitioners, we highlight these often-overlooked challenges and open a conversation about how teams can handle them in day-to-day AI-assisted software engineering.

2606.05748 2026-06-05 cs.MM cs.AI cs.CL

UNIVID: Unified Vision-Language Model for Video Moderation

Kejuan Yang, Yizhuo Zhang, Mingyuan Du, Yue Zhang, Dixin Zheng, Kaili Zhao, Yang Xiao, Hanzhong Liang, Kenan Xiao

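Affiliations Bytedance

AI summary Proposes UNIVID, a unified vision-language model that generates interpretable, policy-aware captions for end-to-end video moderation, reducing violation leakage by 42.7% and the overkill rate by 37.0% in relative terms.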

Comments 7 pages, 3 figures. Accepted to ACL 2026 Industry Track

Abstract

Global-scale video moderation faces a dual challenge: the need for fine-grained multi-modal reasoning and the demand for interpretable outputs to support downstream enforcement. Traditional moderation systems often rely on fragmented black-box classifiers that are difficult to maintain and lack transparency. In this paper, we present UNIVID, a UNIfied VIsion-language model for video moDeration. Unlike standard classification models, UNIVID generates policy-aware captions that serve as an interpretable intermediate representation, enabling human-verifiable decisions and multi-task reusability. While existing open-source and commercial VLMs often suffer from safety-guardrail refusals and lack fine-grained policy alignment, we develop a specialized training data recipe that combines expert human-refined labels with synthetic data to align the model with our safety guidelines. By integrating UNIVID as the core captioner, we design a novel end-to-end video moderation system that reduces violation leakage by 42.7% and overkill rate by 37.0% relatively. Meanwhile, by replacing over 1,000 policy-specific models with a single UNIVID backbone, we recycled extensive computation resources while reducing engineering maintenance overhead. To our knowledge, this is one of the first reports of a high-efficiency captioning VLM successfully supporting industrial-scale moderation and cross-functional business.

2606.05743 2026-06-05 cs.CR cs.CL

Membrane: A Self-Evolving Contrastive Safety Memory for LLM Agent Defense

Minseok Choi, Seungbin Yang, Dongjin Kim, Subin Kim, Jungmin Son, Yunseung Lee, Jaegul Choo, Youngjun Kwak

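Affiliations KAIST AI; Financial Tech Lab, KakaoBank Corp

AI summary Proposes Membrane, a self-evolving guardrail built on Contrastive Safety Memory (CSM) that distills harmful interactions and their benign counterparts into contrastive cells to defend against evolving jailbreaks, achieving high F1 and low benign refusal rates without retraining.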

Abstract

Despite advances in safety alignment, large language models remain vulnerable to continuously evolving jailbreaks. Existing fine-tuned safety classifiers cannot adapt to these evolving attacks, while adaptive memory-based guardrails tend to over-refuse benign queries that resemble stored attacks. We propose Membrane, a self-evolving guardrail built on Contrastive Safety Memory (CSM): each cell pairs the conditions for blocking a harmful query with those for permitting a superficially similar benign request. Without retraining, Membrane evolves CSM by distilling each harmful interaction and its benign counterpart into a contrastive cell indexed by the underlying attack strategy, so that one cell generalizes across topical variants of the same mechanism. At inference, retrieved cells serve as grounding context for precise safety decisions. Across model-level safety on HarmBench and agent-level safety on AgentHarm, Membrane achieves the highest F1 on all six jailbreak attacks. Notably, benign refusal on AgentHarm stays at 7-14%, well below the 28-85% range of prior guards. Memory cells also retain 87-88% F1 under cross-attack transfer and remain stable under memory poisoning.

2606.05729 2026-06-05 cs.IT cs.LG math.IT

Automated Proving of Shannon-Type Entropy Inequalities via Fine-Tuned Language Models and Guided Tree Search

Shing Yin Wong, Shaocheng Liu, Linqi Song, Amin Gohari, Cheuk Ting Li

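Affiliations University of California, Berkeley

AI summary Fine-tunes small language models and combines them with guided beam search to automate proofs of Shannon-type entropy inequalities, reaching an 85% proof success rate on a test set with 10 to 15 variables.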

Abstract

Proving Shannon-type entropy inequalities is a fundamental task in information theory that often requires constructing non-trivial linear combinations of known constraints, which is a combinatorial search problem that scales poorly with the number of random variables. We investigate whether small-scale large language models (0.6B--1.7B parameters), fine-tuned on atomic proof steps and combined with guided beam search, can automate this process. On a held-out test set of 60 inequalities spanning n=10 to 15 variables, our 0.6B fine-tuned model achieves an 85% proof success rate with tree search. GPT-5.5 solves 1.7% of samples under zero-shot prompting, while Psitip solves 33.3% of samples. A systematic ablation study across training context length (4096 vs. 8192 tokens) and data distribution (n=9-skewed vs. not skewed) reveals that a 4096-token, non-skewed training distribution yields the best performance, with extended context and skewed data providing no marginal benefit. We further identify two dominant failure modes -- format failures and step quality degradation -- and verify that the beam-scoring heuristic is essential via a controlled ablation (random scoring reduces success from 83% to 23%).
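
To make "linear combinations of known constraints" concrete, here is a small worked instance. It is a standard textbook example chosen for illustration and is not taken from the paper's test set. The claim $H(X,Y,Z) \le H(X) + H(Y) + H(Z)$ follows by adding three non-negative elemental quantities, each with coefficient 1:

```latex
\begin{align*}
I(X;Y)          &= H(X) + H(Y) - H(X,Y) \;\ge\; 0, \\
I(X;Z)          &= H(X) + H(Z) - H(X,Z) \;\ge\; 0, \\
I(Y;Z \mid X)   &= H(X,Y) + H(X,Z) - H(X,Y,Z) - H(X) \;\ge\; 0, \\[4pt]
I(X;Y) + I(X;Z) + I(Y;Z \mid X)
                &= H(X) + H(Y) + H(Z) - H(X,Y,Z) \;\ge\; 0 .
\end{align*}
```

The joint-entropy terms $H(X,Y)$ and $H(X,Z)$ cancel in the sum. Finding which elemental inequalities to combine, and with which coefficients, is the search problem that grows quickly with the number of variables.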

2606.05725 2026-06-05 cs.CR cs.CL

An Embarrassingly Simple Detector for Model Extraction Attacks in Large Language Model API Traffic

Shuze Liu, Qianwen Guo, Yushun Dong

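Affiliations Santa Clara University; Florida State University

AI summary Proposes a simple detector based on maximum mean discrepancy (MMD) that embeds queries into a semantic space and compares their distribution against historical benign traffic to detect model extraction attacks on LLM APIs.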

Comments Preprint. Code available at https://github.com/LabRAI/mmd-llm-mea-detection

Abstract

Large language models (LLMs) are increasingly deployed through hosted APIs, making model extraction a practical threat to model ownership and service security. However, individual extraction queries often resemble benign requests, and existing evaluations often focus on single-query anomaly scoring or pure benign-versus-attacker user settings. We formulate model extraction monitoring as benign-calibrated traffic-window distribution testing and show that an embarrassingly simple detector is effective: embed incoming queries into a semantic space and test whether their aggregate distribution deviates from historical benign traffic. We instantiate the detector with maximum mean discrepancy (MMD), using only benign-vs-benign comparisons to set the decision threshold. We evaluate on fourteen attacker-normal query pairs from four extraction scenarios and compare with adapted PRADA, SEAT, CAP, DATE, and marginal Mahalanobis baselines. Across three random seeds, MMD achieves 0.3% benign FPR, 100.0% pure-attacker TPR, 90.5% average TPR over attacker fractions, and 95.1% balanced accuracy. These results show that benign-calibrated distribution testing is a strong empirical baseline for model extraction detection in both user-level and mixed multi-user LLM API traffic. Code is released at: https://github.com/LabRAI/mmd-llm-mea-detection.
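
The detection idea described in this abstract can be sketched in a few lines. The following is a minimal illustration only, not the authors' released code: it assumes queries have already been embedded as numeric vectors, uses a Gaussian (RBF) kernel with the biased squared-MMD estimator, and sets the threshold as a quantile of benign-vs-benign window scores. The function names, the kernel choice, the `gamma` value, and the quantile rule are all assumptions made for this example.

```python
import math
import random


def rbf(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mmd2(xs, ys, gamma=1.0):
    """Biased estimate of the squared MMD between samples xs and ys."""
    kxx = sum(rbf(a, b, gamma) for a in xs for b in xs) / (len(xs) ** 2)
    kyy = sum(rbf(a, b, gamma) for a in ys for b in ys) / (len(ys) ** 2)
    kxy = sum(rbf(a, b, gamma) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2.0 * kxy


def calibrate_threshold(benign, window, trials, quantile, rng, gamma=1.0):
    """Pick a threshold using only benign-vs-benign window comparisons."""
    scores = []
    for _ in range(trials):
        sample = rng.sample(benign, 2 * window)  # two disjoint benign windows
        scores.append(mmd2(sample[:window], sample[window:], gamma))
    scores.sort()
    return scores[min(len(scores) - 1, int(quantile * len(scores)))]


def is_suspicious(reference, window_embeddings, threshold, gamma=1.0):
    """Flag a traffic window whose distribution departs from the reference."""
    return mmd2(reference, window_embeddings, gamma) > threshold


# Toy data standing in for real query embeddings (hypothetical 2-D vectors).
rng = random.Random(0)
benign = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
attack = [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(20)]
threshold = calibrate_threshold(benign, window=20, trials=30,
                                quantile=0.95, rng=rng, gamma=0.5)
print(is_suspicious(benign[:20], attack, threshold, gamma=0.5))
```

Because the threshold comes from benign traffic alone, no attack samples are needed at calibration time, which matches the "benign-calibrated" framing in the abstract. A production system would likely use real sentence embeddings and a tuned kernel bandwidth.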

2606.05720 2026-06-05 cs.SE cs.AI

Microskill Architecture: A Modular Skill-Driven Framework for AI-Native Code Generation

Mohammad Zare, Omid Abdolrahmani

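Affiliations Artificial Intelligence Laboratory at AriooBarzan; Engineering Team, Shiraz, Iran

AI summary Proposes MicroSkill Architecture, which encapsulates knowledge in atomic skill capsules and dynamically selects the relevant ones to address context-window management in AI code generation, substantially reducing token consumption, raising compilation success rates, and eliminating architectural violations.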

Abstract

Large language models and AI coding agents have reshaped software development, but the path to fully AI-native systems faces structural challenges. Chief among them is managing context windows without losing accuracy or efficiency. When developers inject full project documentation and code into a model's memory, the model loses mid-sequence information, token costs spiral, and architecture drifts. This paper presents MicroSkill Architecture: a modular design paradigm inspired by microservices, applied to knowledge encapsulation instead of service decomposition. Instead of feeding an agent the entire codebase, the architecture partitions knowledge into atomic, sharply scoped skill capsules, and a dynamic router selects only semantically relevant capsules for the task. We formally model context allocation as constrained optimization over semantic relevance subject to a token budget. An empirical case study of an enterprise content management system with fifteen complex features shows that MicroSkill cuts token consumption by over 90%, nearly doubles first-try compilation success rates, eliminates architectural violations entirely, and enables autonomous extraction and registration of seven new skill capsules via a self-learning mechanism. These findings suggest MicroSkill Architecture offers a scalable foundation for building AI-native development systems that are more efficient, more reliable, and capable of evolving over time.
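
One plausible reading of "constrained optimization over semantic relevance subject to a token budget" is a 0/1 knapsack: choose the subset of capsules with the highest total relevance whose token costs fit the budget. The sketch below solves that problem exactly by dynamic programming. It is an illustration under that assumption, not the paper's formulation; the `Capsule` type, the relevance scores, and the capsule names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capsule:
    name: str
    tokens: int       # cost of placing this capsule in the prompt
    relevance: float  # hypothetical semantic-relevance score for the task


def select_capsules(capsules, budget):
    """0/1 knapsack: maximize total relevance with total tokens <= budget.

    Returns (best_score, tuple_of_selected_capsule_names).
    """
    # best[b] holds the best (score, names) achievable with at most b tokens.
    best = [(0.0, ())] * (budget + 1)
    for cap in capsules:
        # Iterate budgets downward so each capsule is used at most once.
        for b in range(budget, cap.tokens - 1, -1):
            score, chosen = best[b - cap.tokens]
            candidate = score + cap.relevance
            if candidate > best[b][0]:
                best[b] = (candidate, chosen + (cap.name,))
    return best[budget]


capsules = [
    Capsule("auth", 400, 0.9),
    Capsule("routing", 300, 0.6),
    Capsule("billing", 500, 0.2),
    Capsule("db-schema", 350, 0.7),
]
score, chosen = select_capsules(capsules, budget=800)
print(chosen, round(score, 2))
```

With an 800-token budget, the solver picks the pair of capsules with the best combined relevance that still fits, rather than simply taking the most relevant capsules in order. A real router would compute relevance from embeddings and might prefer a cheaper greedy heuristic for large capsule libraries.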

2606.05714 2026-06-05 cs.CR cs.LG

Hybrid CNN-LSTM Framework for Intelligent Cyber Attack Detection and Prevention in U.S. Critical Digital Infrastructure: A Comparative Machine Learning Evaluation on CSE-CIC-IDS2018

Md. Iqbal Hossan, Md. Serajul Kabir Chowdhury Rubel, Md. Arifur Rahman, B. M. Taslimul Haque

Affiliations Department of Computer Science, Maharishi International University; Department of Information Studies, Trine University; Department of Business Information Systems, Central Michigan University

AI summary Proposes a hybrid deep learning framework combining CNN and LSTM for cyber attack detection and prevention on the CSE-CIC-IDS2018 dataset, comparing several machine learning models to achieve high-accuracy intrusion detection and automated prevention.

Comments 25 pages, 9 figures, CSE CIC IDS2018 dataset, Hybrid CNN LSTM, cyber attack detection

Journal ref
Journal of Ai ML DL, 1(1), 2025
Abstract

Digital infrastructure is growing at a rapid pace in the United States, and as a result, the exposure of critical sectors, including healthcare, finance, transportation, energy, and government systems, to advanced cyber threats is growing. Traditional cybersecurity approaches, including signature-based intrusion detection systems, have become less effective against today's cyber attacks, as they are unable to detect unknown and changing attacks in real time. To overcome these constraints, this research proposes a smart cyber-defense system that uses Artificial Intelligence (AI) and Machine Learning (ML) algorithms to detect and prevent cyber attacks on U.S. digital infrastructure. This study uses the CSE-CIC-IDS2018 dataset, a realistic network traffic dataset covering various cyber attack scenarios, including Distributed Denial of Service (DDoS), brute force attacks, botnets, infiltration attacks, and web-based attacks. Several machine learning and deep learning models, such as Random Forest, XGBoost, Convolutional Neural Networks (CNN), and Long Short-Term Memory (LSTM) networks, are implemented and evaluated for identifying malicious network behavior and improving intrusion detection accuracy. The proposed framework combines data preprocessing, feature engineering, real-time traffic monitoring, intelligent threat classification, and automated prevention mechanisms to build cybersecurity resilience.

2606.05713 2026-06-05 cs.MM cs.SD eess.AS

Beyond Generative Decoding: Discriminative Hidden-State Readout from a Native Omni-Modal LLM for Multimodal Sentiment Analysis

Bin Wen, Tien-Ping Tan

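Affiliations School of Computer Sciences, Universiti Sains Malaysia, Penang, Malaysia

AI summary Generative readout in multimodal sentiment analysis ties continuous regression to discrete autoregressive decoding, costing accuracy and efficiency. The paper proposes a discriminative readout built on the Thinker module of the native omni-modal LLM Qwen2.5-Omni-7B, mapping final-layer hidden states directly through a lightweight regression head and reaching state-of-the-art performance on a single consumer GPU.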

Comments 18 pages, 4 figures, 6 tables

Abstract

Multimodal sentiment analysis (MSA) infers human affect from language, acoustic, and visual signals. Recent methods increasingly adapt large multimodal models (LMMs) via generative readout: prompting the model to emit a sentiment score as a text string. While convenient, this ties continuous regression to discrete autoregressive decoding, incurring unmeasured costs. We revisit this readout mechanism and propose a discriminative formulation built on the Thinker module of a native omni-modal LLM (Qwen2.5-Omni-7B). Instead of text decoding, we map the final-layer hidden state of the last non-padding token to a continuous score via a lightweight regression head in a single forward pass. Using 4-bit quantization and low-rank adaptation (QLoRA), the entire 7B pipeline -- including video and audio processing -- trains on a single consumer GPU (RTX 5090, 32 GB) with 10-21 GB peak memory and 1.14% trainable parameters. Through a controlled comparison fixing the backbone, data, and LoRA configuration, we isolate the impact of the readout. On CMU-MOSI and CMU-MOSEI, our discriminative readout reaches state-of-the-art accuracy without task-specific feature engineering (MOSI: MAE 0.551, Corr 0.888; MOSEI: MAE 0.506, Corr 0.790) and exhibits strong multi-seed stability. In contrast, the generative readout -- even after equivalent supervised training -- more than doubles the mean absolute error, yields unparsable or out-of-range outputs (2.8% zero-shot), and suffers from higher latency. Modality ablations reveal a text-dominant regime on CMU-MOSI. Our findings indicate that how an LMM is read out is as consequential as how it is trained, demonstrating that a discriminative readout offers a more accurate, efficient, and reliable alternative for continuous MSA.
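
The readout step described above (take the final-layer hidden state of the last non-padding token, then apply a lightweight regression head) can be illustrated without any deep learning framework. The sketch below uses plain Python lists in place of tensors and a single linear layer as the head; the function names, the toy shapes, and the weights are assumptions for illustration and do not reflect the paper's actual head architecture.

```python
def last_token_index(mask):
    """Index of the last non-padding position (1 = real token, 0 = padding)."""
    for i in range(len(mask) - 1, -1, -1):
        if mask[i] == 1:
            return i
    raise ValueError("sequence has no non-padding tokens")


def regression_readout(hidden_states, attention_mask, weight, bias):
    """Map each sequence's last real-token hidden state to one scalar score.

    hidden_states:  [batch][seq_len][hidden_dim] final-layer activations
    attention_mask: [batch][seq_len] with 1 for tokens and 0 for padding
    weight, bias:   parameters of a single linear head (hypothetical)
    """
    scores = []
    for states, mask in zip(hidden_states, attention_mask):
        h = states[last_token_index(mask)]
        scores.append(sum(w * x for w, x in zip(weight, h)) + bias)
    return scores


# Two toy sequences; the first is right-padded by one position.
hidden = [
    [[1, 0], [0, 2], [9, 9]],   # the [9, 9] vector sits on a padding slot
    [[1, 1], [2, 2], [3, 1]],
]
mask = [
    [1, 1, 0],
    [1, 1, 1],
]
print(regression_readout(hidden, mask, weight=[0.5, -1.0], bias=0.25))
```

Scanning the mask from the end handles both right-padded and left-padded batches, and the score is produced in a single pass with no text decoding or parsing, which is the property the abstract contrasts with generative readout.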

2606.05710 2026-06-05 cs.CR cs.AI

Explainable AI-Driven Cyber Risk Analytics and Model Reliability Assessment for Intelligent Governance of U.S. Critical Infrastructure: An XGBoost and SHAP-Based Intrusion Detection Framework

B. M. Taslimul Haque, Md. Arifur Rahman, Md. Serajul Kabir Chowdhury Rubel, Md. Iqbal Hossan

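Affiliations Department of Business Information Systems, Central Michigan University; Department of Information Studies, Trine University; Department of Computer Science, Maharishi International University

AI summary For cyber threats facing U.S. critical infrastructure, proposes an intrusion detection and cyber risk prediction framework combining machine learning classifiers such as XGBoost and Random Forest with explainable AI (XAI) techniques, validating model performance and reliability on the CICIDS2017 dataset.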

Comments 20 pages, 8 figures, empirical research article, CICIDS2017 dataset, XGBoost, Random Forest, Decision Tree, Logistic Regression, SHAP explainability analysis, cyber risk analytics, intrusion detection, critical infrastructure cybersecurity, model reliability assessment

Journal ref
Applied IT & Engineering, 2(1), 1-20, 2024
Abstract

The increasing penetration of intelligent digital technologies into the critical infrastructure sector in the United States has greatly increased exposure to advanced cyber adversaries and operational vulnerabilities. AI-powered governance and automated decision-making systems are becoming a key part of the operation of critical infrastructure systems, including energy, healthcare, transportation, financial services, and communication infrastructure, in order to improve efficiency and strategic management. The growing cyber threat environment, including Distributed Denial of Service (DDoS) attacks, botnets, ransomware, and Advanced Persistent Threats (APTs), poses significant challenges to infrastructure resilience, cybersecurity reliability, and governance trustworthiness. In a changing attack landscape and dynamic network environment, traditional cybersecurity mechanisms often fall short of meeting evolving needs and protecting critical systems. This study will develop a resilient cyber risk analytics and model reliability assessment framework to support intelligent governance and decision support for cyber risk exposure in the U.S. critical infrastructure environment. This study uses the CICIDS2017 dataset to develop and test machine-learning-based intrusion detection models and cyber risk prediction models. Classifiers such as XGBoost, Random Forest, and Decision Tree are used to detect malicious activities on the network and determine the level of cyber risk. Furthermore, Explainable Artificial Intelligence (XAI) techniques are integrated to enhance transparency, interpretability, and trust in cybersecurity decision-making processes. The proposed framework demonstrates the reliability and resilience of the model through performance measures such as accuracy, precision, recall, F1 score, ROC-AUC, and false positive rate.

2606.05701 2026-06-05 cs.CR cs.AI

Cognitive Threat Intelligence and Explainable Federated Security Analytics for distributed Infrastructure Systems

Md. Arifur Rahman, B. M. Taslimul Haque, Md. Iqbal Hossan, Md. Serajul Kabir Chowdhury Rubel

Affiliations Dept. of Information Studies, Trine University; Dept. of Business Information Systems, Central Michigan University; Dept. of CS, Maharishi International University

AI summary Proposes a framework integrating federated learning, explainable AI, and cognitive cybersecurity analytics for collaborative, privacy-preserving threat detection in distributed infrastructure systems.

Comments 22 pages, 10 figures, 1 conceptual framework diagram, 1 methodology workflow diagram, empirical study using NSL-KDD and CIC-IDS2017 datasets, Federated Learning, Explainable AI (SHAP, LIME), cybersecurity and intrusion detection framework

Journal ref
International Journal of Research and Technology (IJRT), Volume 13, Issue 01, January-March 2025, pp. 132-151
Abstract

The increasing adoption of distributed infrastructure systems, cloud computing, Internet of Things (IoT) technologies, and edge-based architectures has significantly expanded the cybersecurity attack surface and introduced increasingly sophisticated cyber threats. Conventional centralized intrusion detection approaches often face challenges related to scalability, data privacy, communication overhead, and limited transparency in artificial intelligence-driven decision-making processes. To address these limitations, this study proposes a Cognitive Threat Intelligence and Explainable Federated Security Analytics framework for distributed infrastructure systems. The proposed framework integrates Federated Learning (FL), Explainable Artificial Intelligence (XAI), and cognitive cybersecurity analytics to enable collaborative and privacy-preserving cyber threat detection across distributed network environments. Instead of transmitting sensitive raw network traffic data to centralized servers, local security models are independently trained at distributed nodes, where only encrypted model parameters and updates are shared through a federated aggregation mechanism. This decentralized learning architecture improves privacy protection while reducing communication dependency and centralized security risks. To enhance intelligent threat analysis, the framework incorporates machine learning and deep learning algorithms including Random Forest, XGBoost, Autoencoder, Convolutional Neural Networks (CNN), and Long Short-Term Memory (LSTM) networks. In addition, explainable AI techniques such as SHAP and LIME are integrated to provide transparent and understandable explanations of threat-detection decisions, increasing trust and actionability for security analysts. Experimental evaluation on several benchmark network intrusion datasets, including CICIDS2017, UNSW-NB15, and CSE-CIC-IDS2018, shows that the proposed framework outperforms conventional centralized and existing federated learning approaches in detection accuracy, precision, recall, and F1 score, while ensuring data privacy, communication efficiency, and model interpretability.

2606.05680 2026-06-05 cs.PL cs.AR cs.LG

CASS-RTL: Correctness-Aware Subspace Steering for RTL Generation with LLMs

Mohammad Akyash, Nowfel Mashnoor, Kimia Azar, Hadi Kamali

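Affiliations Department of Electrical and Computer Engineering (ECE), University of Central Florida, Orlando, FL 32816, USA

AI summary Proposes CASS-RTL, a framework that identifies attention heads in LLMs associated with RTL correctness and builds a low-dimensional subspace for lightweight intervention, improving the functional accuracy of generated RTL code without additional supervision or retraining.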

Comments Accepted to the IEEE International Conference on LLM-Aided Design (LAD '26)

Abstract

Recent advances in large language models (LLMs) have enabled the automatic synthesis (generation) of register-transfer level (RTL) code from natural language instructions, offering a promising pathway to accelerate chip design. Unlike typical natural language (and software coding) tasks, LLM-based RTL code generation demands strict cycle accuracy with concurrency, where minor logical errors can render a circuit unusable or insecure. While prior work has explored hallucination mitigation via external verification, self-evaluation prompts, retrieval-augmented prompting, domain specific fine-tuning, agentic solutions, and reasoning, these approaches largely overlook the attention-oriented internal mechanisms of LLMs that may inherently correlate with RTL correctness. This work proposes CASS-RTL, a first-of-its-kind framework for discovering and leveraging LLMs' correctness-aware components to guide RTL generation toward functionally accurate outputs. We (i) identify attention heads whose activation patterns consistently differentiate correct from incorrect RTL; (ii) construct a low-dimensional subspace capturing correctness-relevant signals; and (iii) design a lightweight, geometry-aware intervention that steers the model at inference time. CASS-RTL is fully model-agnostic, requires no additional supervision or retraining, and readily integrates into existing models. Empirically, we evaluate CASS-RTL on multiple models and observe 10%-20% improvement in pass@1/5/10 accuracy on VerilogEval and 5% improvement on CVDP, demonstrating the effectiveness of our method in enhancing reliability without sacrificing model efficiency or requiring a large labeled dataset for fine-tuning.

2606.05679 2026-06-05 cs.DB cs.AI

Data Flow Control: Data Safety Policies for AI Agents

数据流控制:AI 智能体的数据安全策略

Charlie Summers, Eugene Wu

发表机构 * Columbia University(哥伦比亚大学)

AI总结 提出数据流控制框架,通过声明式策略和可移植查询重写层 Passant,在 DBMS 中强制执行元组级数据安全策略,实现接近零开销。

Comments 15 pages, 12 figures

详情
AI中文摘要

智能体越来越多地代表用户生成 SQL、编排管道和自动化数据分析。虽然最近的工作提高了查询的正确性,但正确性不等于安全性。一个查询可能在语义上有效,却违反了管理数据如何组合和发布的监管、隐私或业务约束。我们认为,强制执行此类约束本质上是一个数据基础设施问题。本文介绍了数据流控制(DFC),一个在 DBMS 查询中声明式指定并保证对元组级数据流实施策略的框架。一个关键挑战是定义一种优化器无关但可大规模高效执行的策略语言。我们将数据安全形式化为关于溯源单项的聚合谓词,并提出了 Passant,一个可移植的查询重写层,无需物化溯源即可强制执行 DFC 策略。在五个 DBMS 引擎——DuckDB、Umbra、PostgreSQL、DataFusion 和 SQLServer 上,Passant 实现了约 0% 的开销,并且性能优于替代方案数个数量级。因此,数据流控制是将数据安全从提示和事后检查转移到数据基础设施的第一步。数据流控制开源可用:https://github.com/dataflowcontrol/data-flow-control。

英文摘要

Agents increasingly generate SQL, orchestrate pipelines, and automate data analysis on behalf of users. While recent work improves query correctness, correctness is not safety. A query may be semantically valid yet violate regulatory, privacy, or business constraints that govern how data may be combined and released. We argue that enforcing such constraints is fundamentally a data infrastructure problem. This paper introduces Data Flow Control (DFC), a framework to declaratively specify and guarantee policy enforcement over tuple-level data flows within a DBMS query. A key challenge is defining a policy language that is optimizer-invariant yet efficient to enforce at scale. We formalize data safety as aggregate predicates over provenance monomials and present Passant, a portable query rewriting layer that enforces DFC policies without materializing provenance. Across five DBMS engines -- DuckDB, Umbra, PostgreSQL, DataFusion, and SQLServer -- Passant achieves ~0% overhead and outperforms alternatives by orders of magnitude. As a result, Data Flow Control is the first step towards moving data safety from prompts and post-hoc checks into the data infrastructure. Data Flow Control is available open source at https://github.com/dataflowcontrol/data-flow-control.

2606.05658 2026-06-05 cs.IR cs.AI

Agent-Orchestrated Adaptive RAG: A Comparative Study on Structured and Multi-Hop Retrieval

Agent编排的自适应RAG:结构化与多跳检索的比较研究

Anuj Maharjan, Devinder Kaur, Richard Molyet

发表机构 * University of California, Berkeley(加州大学伯克利分校) University of Washington(华盛顿大学) University of California, Los Angeles(加州大学洛杉矶分校)

AI总结 提出Agent编排的自适应RAG框架,通过动态查询分解、迭代检索和自反思评估,在结构化领域(DevOps)和多跳推理基准(MuSiQue)上对比发现,查询分解在结构化领域提升性能但降低多跳排名精度,反思机制提高引用准确性但增加延迟,表明Agent增强需根据查询和领域特性选择性应用。

详情
AI中文摘要

检索增强生成(RAG)通过将响应基于外部知识来增强大型语言模型(LLM),但传统流水线依赖于静态的单步检索,这限制了复杂查询的性能。本文提出了一种Agent编排的自适应RAG框架,引入了动态查询分解、迭代检索和有界自反思评估循环。我们在两个互补的数据集上评估该系统:一个特定领域的DevOps知识库和多跳推理基准MuSiQue。使用包括总体得分、引用准确性、平均倒数排名和主题覆盖度在内的指标,我们发现查询分解在结构化领域(DevOps上总体得分+0.04,MRR+0.17)带来一致的增益,但在多跳基准上降低了排名精度,而反思机制以显著的延迟成本提高了引用准确性。这些对比结果表明,Agent增强并非普遍有益,必须根据查询和领域特性选择性应用。我们的发现支持自适应、成本感知的编排,而非统一激进的推理流水线。

英文摘要

Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by grounding their responses in external knowledge, but conventional pipelines rely on static, single-step retrieval that limits performance on complex queries. This paper presents an Agent-Orchestrated Adaptive RAG framework that introduces dynamic query decomposition, iterative retrieval, and a bounded self-reflective evaluation loop. We evaluate the system across two complementary datasets: a domain-specific DevOps knowledge base and the multi-hop reasoning benchmark MuSiQue. Using metrics that include overall score, citation accuracy, mean reciprocal rank, and topic coverage, we find that query decomposition yields consistent gains in the structured domain (overall score +0.04, MRR +0.17 on DevOps) but degrades ranking precision on the multi-hop benchmark, while the reflection mechanism improves citation accuracy at a substantial latency cost. These contrasting results show that agentic enhancements are not universally beneficial and must be applied selectively according to query and domain characteristics. Our findings argue for adaptive, cost-aware orchestration rather than uniformly aggressive reasoning pipelines.
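Mean reciprocal rank (MRR), one of the metrics named above, averages 1/rank of the first relevant document over all queries, so a gain of +0.17 means relevant documents move noticeably closer to the top of the ranking. The sketch below shows the textbook definition only; the function name and toy data are hypothetical, and the paper's exact evaluation protocol is not given in the abstract.

```python
from typing import Sequence, Set


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[str]],
    relevant: Sequence[Set[str]],
) -> float:
    """Average of 1/rank of the first relevant document per query.

    A query whose ranking contains no relevant document contributes 0.
    """
    if len(rankings) != len(relevant):
        raise ValueError("one relevance set is needed per ranking")
    if not rankings:
        raise ValueError("at least one query is required")
    total = 0.0
    for ranked_docs, relevant_docs in zip(rankings, relevant):
        for position, doc_id in enumerate(ranked_docs, start=1):
            if doc_id in relevant_docs:
                total += 1.0 / position
                break  # only the first relevant hit counts
    return total / len(rankings)


if __name__ == "__main__":
    # Toy data: first relevant hit at rank 2, rank 1, and never.
    rankings = [["d3", "d1", "d7"], ["d2", "d9"], ["d4", "d5"]]
    relevant = [{"d1"}, {"d2"}, {"d8"}]
    print(mean_reciprocal_rank(rankings, relevant))
```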

2606.05650 2026-06-05 cs.MM cs.CV cs.GR cs.NI

GS-NFS: Bandwidth-adaptive Streaming of Dynamic Gaussian Splats and Point Clouds

GS-NFS: 动态高斯溅射和点云的带宽自适应流传输

Rajrup Ghosh, Haodong Wang, Haoran Hong, Eduardo Pavez, Amartya Chaudhuri, Weiwu Pang, Harsha V. Madhyastha, Antonio Ortega, Ramesh Govindan

发表机构 * University of Southern California(南加州大学)

AI总结 提出GS-NFS方法,通过GPU并行加速动态3DGS帧的编解码,实现全帧率运行,速度比现有技术快1-2个数量级,同时保持竞争性的压缩性能和渲染质量。

详情
AI中文摘要

动态3D高斯溅射(3DGS)作为一种3D视频流技术具有很大前景,因为它能够以高保真度表示复杂的3D场景。在该方法中,3D视频的每一帧将环境表示为一组高斯体,每个高斯体具有位置以及其他属性,如尺度、旋转、不透明度和颜色。帧捕捉了精细细节,允许从任意视角观看,但数据量比2D视频帧大一个数量级或更多。最近的一系列工作探索了如何压缩动态3DGS帧,但这些方法通常较慢,部分原因是它们的压缩技术不适合高效加速。GS-NFS在GPU上加速动态3DGS的压缩和解压缩,达到能够以全帧率编码和解码的程度。它通过开发基于GPU的新型并行化方法,对现有的高斯位置和属性编码算法进行并行化来实现这一点。因此,它在编码和解码一帧时比现有技术快1-2个数量级,同时提供具有竞争力的压缩性能和渲染质量。

英文摘要

Dynamic 3D Gaussian Splatting (3DGS) holds great promise as a 3D video streaming technology since it can represent complex 3D scenes with high fidelity. In this approach, every frame in a 3D video represents the environment as a collection of Gaussians with position and other attributes such as scale, rotation, opacity, and color. Frames capture fine details and permit views from arbitrary perspectives, but are an order of magnitude or more larger than 2D video frames. A line of recent work has explored how to compress dynamic 3DGS frames, but these approaches are often slow, in part because their compression techniques are not amenable to efficient acceleration. GS-NFS accelerates dynamic 3DGS compression and decompression on a GPU, to the point where it can encode and decode at full frame rate. It achieves this by developing novel GPU-based parallelizations of existing algorithms for encoding both positions and attributes of Gaussians. As a result, it is 1-2 orders of magnitude faster than the state-of-the-art in encoding and decoding a frame, while offering competitive compression performance and rendering quality.

2606.05649 2026-06-05 stat.CO cs.LG

Diff2SP: Diffusion Models for Correlated Scenario Generation in Stochastic Programming

Diff2SP:随机规划中相关场景生成的扩散模型

Haixiang Sun, Andrew Liu

发表机构 * Purdue University(普渡大学)

AI总结 提出Diff2SP扩散生成框架,将下游优化目标嵌入场景生成过程,通过理论证明和经验验证实现统计一致性与决策感知的平衡。

详情
AI中文摘要

场景生成是随机规划(SP)中的关键组成部分,直接影响不确定性下决策的质量。现有方法主要依赖于基于采样的技术或使用神经网络的监督学习。基于采样的方法通常难以捕捉复杂依赖关系和罕见但可能的事件,而监督学习需要固定的输入-输出对进行训练,且生成不受预定义模式或规则限制的多样化现实场景的能力有限。为了解决这些局限性,我们引入了Diff2SP,一种基于扩散的生成框架,将下游优化目标直接融入场景生成中。与将场景生成和决策制定视为独立步骤的传统方法不同,Diff2SP将随机优化嵌入训练过程,从而生成既统计一致又具有决策感知的场景。为了正式证明这种优化感知设计的合理性,我们建立了将分布精度与决策质量联系起来的遗憾界,并建立了样本复杂度保证,显示出比传统生成模型(如GAN)更快的收敛速度。在合成数据集和电力系统数据集上的实证结果验证了这些理论见解,表明Diff2SP在统计保真度和下游优化结果上均有一致提升。

英文摘要

Scenario generation is a critical component in stochastic programming (SP), as it directly influences the quality of decision-making under uncertainty. Existing approaches predominantly rely on either sampling-based techniques or supervised learning using neural networks. Sampling-based techniques often struggle to capture complex dependencies and rare but plausible events, while supervised learning requires fixed input-output pairs for training and is limited in its ability to generate a wide variety of realistic scenarios that are not restricted by predefined patterns or rules. To address these limitations, we introduce Diff2SP, a diffusion-based generative framework that incorporates downstream optimization objectives directly into scenario generation. Unlike conventional methods that treat scenario generation and decision-making as separate steps, Diff2SP embeds stochastic optimization into the training process, enabling the generation of scenarios that are both statistically coherent and decision-aware. To formally justify this optimization-aware design, we establish regret bounds that link distributional accuracy to decision quality, along with sample complexity guarantees showing faster convergence than traditional generative models such as GANs. Empirical results on both synthetic and power-system datasets validate these theoretical insights, demonstrating that Diff2SP consistently improves both statistical fidelity and downstream optimization outcomes.

2606.05646 2026-06-05 cs.SE cs.AI

Enhancing Software Engineering Through Closed-Loop Memory Optimization

通过闭环内存优化增强软件工程

Xuehang Guo, Zora Zhiruo Wang, Qingyun Wang, Graham Neubig, Xingyao Wang

发表机构 * William & Mary(威廉玛丽学院) Carnegie Mellon University(卡内基梅隆大学) OpenHands University(OpenHands大学) University of Illinois Urbana-Champaign(伊利诺伊大学厄巴纳-香槟分校)

AI总结 提出闭环内存优化框架,通过验证下游影响来定义内存效用,作为评估基准和优化信号,显著提升软件工程代理的成功率和效率。

详情
AI中文摘要

大型语言模型(LLMs)使得强大的软件工程(SE)代理能够导航复杂的代码库并解决现实世界的问题。然而,这些代理本质上仍然是 episodic 的:它们无法跨任务保留、改进和重用经验,反复从头构建上下文并重复类似的错误。即使有内存支持,它们也无法弥补缺乏原则性、任务无关的内存效用的缺陷,这使得难以严格评估或跨代理和设置进行泛化。为了解决这些限制,我们引入了一个用于 SE 代理内存增强的闭环框架。该框架将内存效用建立在经验证的下游影响上,将效用确立为任务无关的评估基准和无注释的优化信号。通过在单 episode 和跨 episode 内存增强上的互补评估,结果表明该框架在不同设置下一致地改进了 SE 代理,在成功率上实现了高达 5.25% 的绝对增益,在解决效率上实现了 4.63% 的绝对增益,同时将计算成本大幅降低至少 9.79%。我们的项目页面:https://xhguo7.github.io/MemOp/。

英文摘要

Large language models (LLMs) have enabled powerful software engineering (SE) agents capable of navigating complex codebases and resolving real-world issues. However, these agents remain fundamentally episodic: they fail to retain, refine, and reuse experiences across tasks, repeatedly reconstructing context from scratch and reproducing similar mistakes. Even with memory support, they offer no remedy for the absence of a principled, task-agnostic memory utility, making them difficult to evaluate rigorously or generalize across agents and settings. To tackle these limitations, we introduce a closed-loop framework for memory augmentation in SE agents. The framework grounds memory utility in validated downstream impact, establishing utility as both a task-agnostic evaluation benchmark and an annotation-free optimization signal. Through complementary evaluation on single-episode and cross-episode memory augmentation, results demonstrate that the framework consistently improves SE agents across settings, achieving absolute gains of up to 5.25% in success rate and 4.63% in resolve efficiency, while substantially reducing computational cost by at least 9.79%. Our project page: https://xhguo7.github.io/MemOp/.

2606.05618 2026-06-05 nlin.CD cs.LG math.DS

Uncovering Extreme Event Mechanisms for Prediction and Control with Sensitivity-Balanced Projections

利用敏感度平衡投影揭示极端事件机制以进行预测与控制

Nicholas Zolman, Sajeda Mokbel, Samuel E. Otto, Steven L. Brunton

发表机构 * Department of Mechanical Engineering, University of Washington(华盛顿大学机械工程系) AI Institute in Dynamic Systems, University of Washington(华盛顿大学动态系统人工智能研究所) Sibley School of Mechanical and Aerospace Engineering, Cornell University(康奈尔大学Sibley机械与航空航天工程学院)

AI总结 提出基于协方差平衡降维(CoBRAS)的可解释方法,通过自动微分替代伴随计算,识别敏感度平衡投影以揭示极端事件机制,并用于数据驱动预测和事件抑制控制。

Comments 12 pages, 6 figures (main text). Additional 14 pages of references and Supplementary Information

详情
AI中文摘要

极端事件——如地震和日冕物质抛射——在许多混沌动力系统中很常见,但由于驱动它们的微妙不稳定性机制,很难表征和预测。在这项工作中,我们开发了一种可解释的技术,揭示极端事件背后的潜在机制,并利用它们构建数据驱动的预测和直观的事件抑制控制器。特别是,我们利用伴随快照的协方差平衡降维(CoBRAS)方法来识别线性斜投影,这些投影最好地捕获感兴趣量的敏感度并重建原始状态。重要的是,我们绕过了繁琐的伴随计算的需要,而是通过现代自动可微数值框架使用反向传播。为了适应空间局部事件,我们还引入了一种新的CoBRAS变体,以获得局部敏感度平衡投影。我们展示了这种方法在一系列具有挑战性的系统中表征极端事件的效用,包括二维Kolmogorov流中湍流能量耗散的爆发、耦合FitzHugh-Nagumo振荡器网络中的自发同步,以及由修正非线性薛定谔方程产生的海洋怪波的局部形成。对于每个例子,我们展示了我们的简单预测模型准确预测极端事件,并且潜在机制可用于设计控制律以防止这些事件。最后,我们证明了通过直接从数据学习动力学的神经网络代理模型,我们可以将这种方法扩展到实验系统和那些并非原生用自动可微编程语言编写的系统。

英文摘要

Extreme events -- such as earthquakes and coronal mass ejections -- are common in many chaotic dynamical systems, yet are difficult to characterize and predict due to the subtle instability mechanisms that drive them. In this work, we develop an interpretable technique that reveals the underlying mechanisms behind extreme events and uses them to build data-driven forecasts and intuitive event suppression controllers. In particular, we utilize the covariance balancing reduction using adjoint snapshots (CoBRAS) method to identify linear oblique projections that best capture the sensitivity of a quantity of interest and reconstruct the original state. Importantly, we bypass the need for cumbersome adjoint calculations, instead using backpropagation via modern automatically differentiable numerical frameworks. To accommodate spatially localized events, we also introduce a new variant of CoBRAS to obtain local sensitivity-balanced projections. We demonstrate the utility of this approach to characterize extreme events across a diverse set of challenging systems, including turbulent bursts of energy dissipation in the 2D Kolmogorov Flow, spontaneous synchronization in networks of coupled FitzHugh-Nagumo oscillators, and the localized formation of ocean rogue waves from a modified nonlinear Schrödinger equation. For each example, we show that our simple forecast models accurately predict extreme events and that the underlying mechanisms may be used to design control laws to prevent these events. Finally, we demonstrate that by learning a neural network surrogate model of the dynamics directly from data, we may extend this approach to experimental systems and systems that are not natively written in an automatically differentiable programming language.

2606.05609 2026-06-05 cs.CR cs.AI cs.LG

SlotGCG: Exploiting the Positional Vulnerability in LLMs for Jailbreak Attacks

SlotGCG:利用LLMs中的位置脆弱性进行越狱攻击

Seungwon Jeong, Jiwoo Jeong, Hyeonjin Kim, Yunseok Lee, Woojin Lee

发表机构 * Dongguk University-Seoul(东国大学-首尔)

AI总结 本文提出SlotGCG方法,通过量化提示中不同插入位置(槽)的脆弱性得分(VSS),选择最脆弱的位置插入对抗性令牌,从而显著提升基于优化的越狱攻击成功率。

详情
Journal ref
International Conference on Learning Representations (ICLR), 2026
AI中文摘要

随着大型语言模型(LLMs)的广泛部署,通过越狱攻击识别其脆弱性变得日益关键。基于优化的攻击方法如贪婪坐标梯度(GCG)专注于将对抗性令牌插入到提示的末尾。然而,GCG将对抗性令牌限制在固定的插入点(通常是提示后缀),未探索在其他位置插入令牌的效果。在本文中,我们实证研究了提示中可插入令牌的候选位置(称为槽)。我们发现越狱的脆弱性与槽的选择高度相关。基于这些发现,我们引入了脆弱性槽得分(VSS)来量化越狱的位置脆弱性。随后,我们提出SlotGCG,该方法使用VSS评估所有槽,选择最脆弱的槽进行插入,并在这些槽上运行针对性的优化攻击。我们的方法提供了一种与攻击无关的位置搜索机制,可插入任何基于优化的攻击,仅增加200毫秒的预处理时间。在多个模型上的实验表明,SlotGCG显著优于现有方法。具体而言,与基于GCG的攻击相比,它实现了14%更高的攻击成功率(ASR),收敛更快,并且对防御方法表现出更强的鲁棒性,ASR比基线方法高42%。我们的实现可在https://github.com/youai058/SlotGCG获取。

英文摘要

As large language models (LLMs) are widely deployed, identifying their vulnerability through jailbreak attacks becomes increasingly critical. Optimization-based attacks like Greedy Coordinate Gradient (GCG) have focused on inserting adversarial tokens at the end of prompts. However, GCG restricts adversarial tokens to a fixed insertion point (typically the prompt suffix), leaving the effect of inserting tokens at other positions unexplored. In this paper, we empirically investigate slots, i.e., candidate positions within a prompt where tokens can be inserted. We find that vulnerability to jailbreaking is highly related to the selection of the slots. Based on these findings, we introduce the Vulnerable Slot Score (VSS) to quantify the positional vulnerability to jailbreaking. We then propose SlotGCG, which evaluates all slots with VSS, selects the most vulnerable slots for insertion, and runs a targeted optimization attack at those slots. Our approach provides a position-search mechanism that is attack-agnostic and can be plugged into any optimization-based attack, adding only 200ms of preprocessing time. Experiments across multiple models demonstrate that SlotGCG significantly outperforms existing methods. Specifically, it achieves a 14% higher Attack Success Rate (ASR) than GCG-based attacks, converges faster, and shows superior robustness against defense methods with 42% higher ASR than baseline approaches. Our implementation is available at https://github.com/youai058/SlotGCG.

2606.05584 2026-06-05 cs.CR cs.AI

Dimensionality Reduction for Cyberattack Classification: A Comparative Evaluation of PCA and Linear Predictive Coding

网络攻击分类的降维:PCA与线性预测编码的比较评估

Nelly Elsayed, Zag ElSayed, Navid Asadizanjani

发表机构 * University of California, Los Angeles(加州大学洛杉矶分校)

AI总结 本文通过比较主成分分析(PCA)和线性预测编码(LPC)两种降维方法,研究网络攻击分类中的特征压缩技术,实验表明PCA在激进压缩下仍能保持分类性能,LPC则略有性能下降,但两者均能在最小影响分类准确率的情况下大幅降低特征维度。

Comments Accepted at IEEE MWSCAS 2026

详情
AI中文摘要

高维特征表示被广泛用于基于机器学习的网络攻击检测系统。然而,它们增加了计算复杂度,并可能阻碍在资源受限环境中的部署。在本文中,我们通过比较两种降维方法:主成分分析(PCA)和线性预测编码(LPC),研究用于网络攻击分类的特征压缩技术。生成具有不同维度的压缩特征表示,并在多个分类模型上进行评估。实验分析表明,即使在激进压缩下,PCA也能保持分类性能。另一方面,LPC提供了具有竞争力的预测表示,但性能下降略大。结果表明,可以在对分类准确率影响最小的情况下实现特征维度的显著降低,突显了轻量级特征压缩在高效网络安全分析中的潜力。

英文摘要

High-dimensional feature representations are widely used in machine learning-based cyberattack detection systems. However, they increase computational complexity and may hinder deployment in resource-constrained environments. In this paper, we investigate feature compression techniques for cyberattack classification by comparing two dimensionality reduction approaches: Principal Component Analysis (PCA) and Linear Predictive Coding (LPC). Compressed feature representations with varying dimensionalities are generated and evaluated across several classification models. Experimental analysis demonstrates that PCA preserves classification performance even under aggressive compression. On the other hand, LPC provides competitive predictive representations with slightly larger performance degradation. The results show that substantial reductions in feature dimensionality can be achieved with minimal impact on classification accuracy, highlighting the potential of lightweight feature compression for efficient cybersecurity analytics.

2606.05581 2026-06-05 cs.GR cs.CV cs.LG

Monte Carlo Steklov Operators for Large-Scale Geometry Processing in the Wild

蒙特卡洛Steklov算子用于大规模野外几何处理

Arman Maesumi, Tanish Makadia, Aruna Anderson, Oras Phongpanangam, Justin Solomon, Daniel Ritchie

发表机构 * Brown University(布朗大学) Loyola Marymount University(洛约拉玛丽蒙特大学) Massachusetts Institute of Technology(麻省理工学院)

AI总结 提出一种蒙特卡洛方法估计Dirichlet-to-Neumann算子及其Steklov特征模态,实现鲁棒且高效的体积算子计算,并应用于大规模3D对比表示学习。

Comments 21 pages

详情
AI中文摘要

内在方法填充了网格几何处理的默认工具箱。内在算子,特别是拉普拉斯算子,是对等距不变性有要求的方法的基础,因此已用于许多形状分析、学习和编辑算法。然而,内在方法的前提假设在处理野外几何时变得脆弱,因为(i)网格质量无法保证,(ii)许多网格由多个连通分量建模。在这种情况下,体积构造定义更清晰,因为可以放宽对表面拓扑的限制。本文提出了一种蒙特卡洛方法,用于估计Dirichlet-to-Neumann (DtN)算子——一种边界到边界的体积算子——及其相关的Steklov特征模态。我们基于蒙特卡洛几何处理的最新发展,将该边界算子本身作为估计对象。通过体积随机过程定义的DtN算子被推广到外部域,通过周围环境空间耦合断开的分量。我们表明,我们的方法在计算Steklov谱时比现有的边界元方法快几个数量级,同时对低质量三角剖分、高分辨率网格和多分量几何保持鲁棒。为了展示这种可扩展性,我们计算了来自未策划的Objaverse数据集的约450,000个形状的内外Steklov特征谱。我们将这些算子集成到Steklov-CLIP中,这是一种基于网格的神经网络,使用体积谱算子进行大规模对比3D表示学习。得到的网络学习到语义上有意义的全局和密集形状表示,说明几何上有原则的体积算子可以在现代3D数据集规模上变得实用。

英文摘要

Intrinsic methods fill the default toolbox for geometry processing on meshes. Intrinsic operators, in particular the Laplacian, underlie methods that require invariance to isometry and have hence been employed in many algorithms for shape analysis, learning, and editing. However, intrinsic methods are predicated on assumptions that quickly become brittle when working with in-the-wild geometry, where (i) mesh quality is not guaranteed, and (ii) many meshes are modeled with multiple connected components. In such settings, volumetric constructions are better-defined, since restrictions on surface topology can be relaxed. This paper presents a Monte Carlo method for estimating the Dirichlet-to-Neumann (DtN) operator -- a boundary-to-boundary volumetric operator -- and its associated Steklov eigenmodes. We build on recent developments in Monte Carlo geometry processing by casting this boundary operator itself as the subject of estimation. The DtN operator, defined through a volumetric stochastic process, is then generalized to the exterior domain, where it couples disconnected components through the surrounding ambient space. We show that our method is orders of magnitude faster than existing boundary-element approaches for computing Steklov spectra while remaining robust to poor triangulations, high-resolution meshes, and multi-component geometry. To demonstrate this scalability, we compute interior and exterior Steklov eigenspectra for approximately 450,000 shapes from the uncurated Objaverse dataset. We incorporate these operators into Steklov-CLIP, a mesh-based neural network that uses volumetric spectral operators for large-scale contrastive 3D representation learning. The resulting network learns semantically meaningful global and dense shape representations, illustrating that geometrically-principled volumetric operators can be made practical at the scale of modern 3D datasets.

2606.05572 2026-06-05 cs.ET cs.HC cs.RO physics.app-ph

Wave Focusing in Metamaterials: Tactile Displays Beyond the Diffraction Limit

超材料中的波聚焦:超越衍射极限的触觉显示器

Gregory Reardon, Max Linnander, Dustin Goetz, Neeli Tummala, Yon Visell

发表机构 * Media Arts and Technology Program(媒体艺术与技术项目) Department of Mechanical Engineering(机械工程系) Department of Electrical and Computer Engineering(电气与计算机工程系) University of California, Santa Barbara(加州大学圣芭芭拉分校)

AI总结 本文利用局部共振超材料板中的慢波分支实现机械波聚焦,突破衍射极限,生成高分辨率虚拟触觉像素,并将像素面积缩小十倍。

详情
AI中文摘要

我们解决了工程化分布式触觉显示器的挑战,该显示器能够在表面上任意位置再现多个局部化、可独立寻址的振动——代表虚拟触觉像素。我们的技术基于使用稀疏的致动器阵列在弯曲板中聚焦机械波。在触觉频率下,波衍射阻止了在多指触摸交互相关空间尺度上形成局部化虚拟触觉像素。我们通过在板上增加机械共振器晶格,形成局部共振超材料板,克服了这一限制。板的动态模式与共振器模式之间的耦合改变了控制波传播的色散关系,引入了一个慢波分支,使得能够超越未修改板所施加的衍射极限进行聚焦。我们使用数值模拟来设计超材料系统的色散关系,以实现触觉频率下的高分辨率聚焦。然后,我们制造了一个超材料触觉显示器,并实验证明虚拟像素比在没有共振器的相同板上生成的像素更加局部化,导致虚拟像素面积缩小十倍。在行为实验中,我们展示了该系统能够传递感知上局部化的单点和多点触觉反馈以及移动触觉源,同时保持对多个显示位置的时间波形的独立控制。这里报告的方法可以使用少量致动自由度实现高分辨率触觉显示器,适用于广泛应用。

英文摘要

We address the challenge of engineering distributed haptic displays capable of reproducing multiple localized, independently addressable vibrations -- representing virtual tactile pixels -- at arbitrary locations on a surface. Our technique is based on the focusing of mechanical waves in a flexural plate using a sparse set of actuators. At tactile frequencies, wave diffraction prevents the formation of localized virtual tactile pixels at spatial scales relevant for multi-digit touch interactions. We overcome this limitation by augmenting the plate with a lattice of mechanical resonators, forming a locally resonant metamaterial plate. Coupling between the plate's dynamic modes and those of the resonators alters the dispersion relation governing wave transmission, introducing a slow-wave branch that enables focusing beyond the diffraction limit imposed by the unmodified plate. We use numerical simulations to engineer the dispersion relation of the metamaterial system for high-resolution focusing at tactile frequencies. We then fabricate a metamaterial tactile display and experimentally demonstrate virtual pixels that are far more localized than those generated on an otherwise identical plate without resonators, resulting in a tenfold reduction in virtual-pixel area. In behavioral experiments, we show that this system can deliver perceptually localized single- and multi-point tactile feedback and moving tactile sources while maintaining independent control over temporal waveforms at multiple display locations. The methods reported here can enable high-resolution haptic displays for widespread applications using a small number of actuated degrees of freedom.

2606.05568 2026-06-05 cs.IR cs.CL

ColBERTSaR: Sparsified ColBERT Index via Product Quantization

ColBERTSaR: 通过乘积量化实现稀疏化的 ColBERT 索引

Eugene Yang, Andrew Yates, Dawn Lawrie, James Mayfield, Saron Samuel, Rohan Jha

发表机构 * Johns Hopkins University(约翰霍普金斯大学)

AI总结 提出通过乘积量化将 ColBERT 索引转化为真正的倒排索引,显著减小索引大小(比 PLAID 小 50-70%)同时保持检索效果。

Comments 6 pages, 1 figure, accepted at SIGIR 2026 as a short paper

详情
AI中文摘要

虽然 ColBERT 是一种有效的神经检索架构,但它需要庞大的索引结构来支持基于近似 token 嵌入的候选集检索、收集和解压文档 token 嵌入以及应用 MaxSim 操作。PLAID 和类似 ColBERT 实现中的索引所需磁盘存储量是原始文本的五到十倍,这限制了它们的可扩展性。此外,先前的工作已经确定,收集和解压阶段是查询时的主要低效环节。通过阈值和分数近似来限制必须收集的文档 token 数量并不能消除整个索引支持即席查询的需求。在这项工作中,我们提出了一种嵌入量化方法,将 ColBERT 索引转变为真正的倒排索引。我们从理论上证明,除了评分机制外,带有嵌入量化的 ColBERT 等价于学习型稀疏检索。实验表明,我们的索引比 1 比特 PLAID 索引小 50-70%,同时保持检索效果。

英文摘要

While ColBERT is an effective neural retrieval architecture, it requires a heavy index structure to support candidate set retrieval based on approximated token embeddings, gathering and decompressing document token embeddings, and applying the MaxSim operation. Indexes in PLAID and similar ColBERT implementations require five to ten times the disk storage of the original raw text, which limits their scalability. Furthermore, prior work has identified that the gathering and decompression stages are the primary inefficiencies at query time. Limiting the number of document tokens that must be gathered by thresholding and score approximation does not eliminate the need for the entire index to support ad hoc queries. In this work, we propose an embedding quantization approach that turns a ColBERT index into a true inverted index. We show that, theoretically, ColBERT with embedding quantization is equivalent to learned-sparse retrieval except for the scoring mechanism. Empirically, we demonstrate that our index is 50-70% smaller than a one-bit PLAID index while retaining retrieval effectiveness.
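The MaxSim operation mentioned above is ColBERT's late-interaction score: each query token embedding is matched to its most similar document token embedding, and those maxima are summed. The sketch below illustrates only that scoring step in plain Python, assuming dot-product similarity over already-computed embeddings; it is not the ColBERTSaR quantization or indexing method, and the names `dot` and `maxsim_score` are hypothetical.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_tokens: Sequence[Vector], doc_tokens: Sequence[Vector]) -> float:
    """Sum over query tokens of the best-matching document token similarity."""
    if not doc_tokens:
        return 0.0  # assumption: an empty document scores zero
    return sum(
        max(dot(q, d) for d in doc_tokens)
        for q in query_tokens
    )


if __name__ == "__main__":
    # Toy 2-dimensional embeddings, invented for illustration.
    query = [[1.0, 0.0], [0.0, 1.0]]
    document = [[1.0, 0.0], [0.5, 0.5]]
    print(maxsim_score(query, document))
```

Because every query token needs the embeddings of all candidate document tokens, this step is what forces implementations to gather and decompress stored token embeddings at query time.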

2606.05548 2026-06-05 cs.SE cs.AI

ADK Arena: Evaluating Agent Development Kits via LLM-as-a-Developer

ADK Arena: 通过LLM即开发者评估智能体开发工具包

Jintao Huang, Xiaomin Li, Gaurav Mittal, Yu Hu

发表机构 * The Ohio State University(俄亥俄州立大学) Microsoft(微软)

AI总结 提出LLM-as-a-Developer方法,通过自动化流水线ADK Arena评估51个Python ADK框架,发现框架间生成成本差异达5.6倍,但无单一框架占优,且文档、源码和参数知识可相互替代。

Comments Work in Progress

详情
AI中文摘要

智能体开发工具包(ADK)作为构建LLM驱动自主智能体的SDK级框架的快速普及,已经超越了关于框架选择如何影响智能体性能的任何实证理解。我们提出“LLM即开发者”(LLM-as-a-Developer)方法,用LLM编码智能体替代人类开发者,该智能体从文档中学习每个框架的API,编写智能体代码,并通过验证-反馈循环迭代修复直到测试通过。通过保持开发者不变而仅改变框架,生成工作成为API可用性的定量代理,生成的智能体提供了框架有效性的受控度量。我们在 ADK Arena 中实现这一点,这是一个完全自动化的流水线,具有每个框架的Docker隔离、三级验证流水线以及针对SWE-bench、τ²-bench、Terminal-Bench和MCP-Atlas的基准适配器。评估所有51个流行的Python ADK框架(204个智能体-基准对),我们发现:(1)生成在57%的运行中成功,其成本在框架间变化5.6倍(每个智能体0.6美元至3.4美元),这是API复杂性的定量代理,尽管成本本身不能预测成功;(2)没有单一框架占主导:最佳单基准ADK智能体解决了高达80%的任务,甚至能以一小部分成本击败通用前沿编码智能体,但中位数框架仅解决32%;(3)在信息源消融实验中,真正的框架使用率保持在狭窄的28-40%范围内(原始源码访问时最高,无参考材料时仍为33%),表明文档、源代码和参数知识在很大程度上是可替代的,而不是任何一个成为硬瓶颈。

英文摘要

The rapid proliferation of Agent Development Kits (ADKs), SDK-level frameworks for building LLM-powered autonomous agents, has outpaced any empirical understanding of how framework choice affects agent performance. We propose LLM-as-a-Developer, a methodology that replaces human developers with an LLM coding agent that learns each framework's API from documentation, writes agent code, and iteratively repairs it through a validate-and-feedback loop until tests pass. By holding the developer constant and varying only the framework, generation effort becomes a quantitative proxy for API usability and the resulting agents provide a controlled measure of framework effectiveness. We implement this in ADK Arena, a fully automated pipeline with per-framework Docker isolation, a three-level validation pipeline, and benchmark adapters for SWE-bench, τ²-bench, Terminal-Bench, and MCP-Atlas. Evaluating all 51 popular Python ADK frameworks (204 agent-benchmark pairs), we find that: (1) generation succeeds for 57% of runs, and its cost varies 5.6× across frameworks ($0.6 to $3.4 per agent), a quantitative proxy for API complexity, though cost alone does not predict success; (2) no single framework dominates: the best single-benchmark ADK agents resolve up to 80% of tasks and can even beat general-purpose frontier coding agents at a fraction of the cost, yet the median framework resolves only 32%; (3) across information-source ablations, genuine framework usage stays within a narrow 28-40% band (highest with raw source access and still 33% with no reference material at all), indicating that documentation, source code, and parametric knowledge are largely substitutable rather than any one being a hard bottleneck.

2606.05509 2026-06-05 cs.HC cs.AI

The Role of Instructional Guidance in Generative AI-Assisted Learning: Empirical Evidence from Construction Engineering Education

教学指导在生成式AI辅助学习中的作用:来自建筑工程教育的实证证据

Xiaoyu Hou, Bo Xiao, Hexu Liu, Shane Mueller

发表机构 * Dept. of Civil, Environmental, and Geospatial Engineering, Michigan Technological Univ.(土木、环境与地理空间工程系,密歇根技术大学) Dept. of Civil and Construction Engineering, Western Michigan Univ.(土木与建设工程系,西部密歇根大学) Dept. of Psychology and Human Factors, Michigan Technological Univ.(心理学与人因工程系,密歇根技术大学)

AI总结 本研究通过引入基于生成学习理论的五步提示框架,在建筑工程教育中对比无提示AI辅助、有提示AI辅助和幻灯片学习三种条件,发现提示框架显著提升了需要解释和推理的任务表现(开放式评分提高约2-3分,p<0.01),表明AI辅助学习的有效性取决于交互结构。

详情
AI中文摘要

生成式人工智能(AI)越来越多地被用于支持自主学习,然而学生与此类系统的交互往往缺乏结构性,限制了对更深层次认知过程的参与。本研究探讨了教学指导如何塑造建筑工程教育中学生与AI的交互。引入了一个基于生成学习理论(GLT)的五步提示框架,以指导学习者在复习活动中的交互。一项对照实验比较了三种学习条件:基于幻灯片的学习、无提示的AI辅助学习和有提示的AI辅助学习。学习表现通过多项选择和开放式任务进行评估,用户体验通过用户体验问卷(UEQ)测量。表现差异集中在需要解释和推理的任务上。有提示条件在开放式任务上得分更高,在18分量表上提高了约2或3分(p < 0.01),而多项选择表现无显著差异。无提示条件与基于幻灯片的学习相当。这些发现表明,AI辅助学习的有效性取决于交互如何结构化。所提出的框架为将学习科学原理整合到建筑工程教育的生成式AI系统中提供了基础。

英文摘要

Generative artificial intelligence (AI) is increasingly used to support self-directed learning, yet student interaction with such systems often remains unstructured, limiting engagement in deeper cognitive processes. This study examines how instructional guidance shapes student-AI interaction in construction education. A five-step prompting framework grounded in Generative Learning Theory (GLT) is introduced to guide learner interaction during review activities. A controlled experiment compares three learning conditions: slide-based learning, unprompted AI-supported learning, and prompted AI-supported learning. Learning performance is assessed using multiple-choice and open-ended tasks, and user experience is measured using the User Experience Questionnaire (UEQ). Performance differences are concentrated on tasks requiring explanation and reasoning. The prompted condition achieves higher open-ended scores, with an improvement of approximately 2 to 3 points on an 18-point scale (p < 0.01), while no significant differences are observed in multiple-choice performance. The unprompted condition remains comparable to slide-based learning. These findings indicate that the effectiveness of AI-supported learning depends on how interaction is structured. The proposed framework provides a basis for integrating learning science principles into generative AI systems for construction education.

2606.05488 2026-06-05 stat.ML cs.LG stat.ME

Sparse Functional Singular Value Decomposition for Biclustering and Triclustering Longitudinal Data

纵向数据的稀疏函数奇异值分解用于双聚类和三聚类

Yue Zhao, Thierry Chekouo, Sandra Safo

发表机构 * Division of Biostatistics and Health Data Science University of Minnesota(生物统计学与健康数据科学系明尼苏达大学)

AI总结 提出Tri-SfSVD框架,通过稀疏惩罚同时进行连续轨迹估计与对象、特征和时间选择,实现纵向数据中的双聚类和三聚类,优于现有方法。

详情
AI中文摘要

识别复杂疾病(如炎症性肠病,IBD)的亚型通常需要捕捉纵向组学数据中的潜在模式。然而,这些数据通常是高维、稀疏采样且时间上不规则观测的,对传统的(双)聚类和函数数据分析方法构成了重大挑战。我们提出Tri-SfSVD,一个统一的稀疏函数奇异值分解框架,用于发现纵向数据中的双聚类和三聚类。与现有的依赖于临时插值或强制限制性形状同质性假设的函数双聚类方法不同,Tri-SfSVD在单个优化框架中集成了连续轨迹估计与同时的对象、特征和时间选择。通过在对象、变量和时间子区域上施加稀疏惩罚,所提出的方法直接对观测数据操作,以发现对象级、对象-特征级和对象-特征-时间级的局部结构。大量模拟表明,Tri-SfSVD在高维设置下优于现有方法。应用于IBD多组学数据,该方法识别了三个双聚类,将样本聚类与不同的IBD相关临床特征以及特定细菌类群相关的微生物通路组联系起来,提供了可解释的对象-通路关联以表征疾病异质性。应用于多通道脑电图数据,该方法识别了三个三聚类,将样本聚类与不同的酒精相关表型以及局部脑活动模式联系起来,包括同一空间区域内由时间子区域分隔的亚组差异。

英文摘要

Identifying subtypes of complex conditions, such as Inflammatory Bowel Disease (IBD), often requires capturing latent patterns in longitudinal omics data. However, these data are typically high-dimensional, sparsely sampled, and irregularly observed over time, posing substantial challenges for conventional (bi)clustering and functional data analysis methods. We propose Tri-SfSVD, a unified sparse functional Singular Value Decomposition framework for discovering biclusters and triclusters in longitudinal data. Unlike existing functional biclustering methods that rely on ad hoc imputation or enforce restrictive shape-homogeneity assumptions, Tri-SfSVD integrates continuous trajectory estimation with simultaneous subject, feature, and temporal selection within a single optimization framework. By imposing sparse penalties across subjects, variables, and temporal subregions, the proposed method works directly on observed data to uncover localized structures at the subject, subject-feature, and subject-feature-time levels. Extensive simulations demonstrate that Tri-SfSVD outperforms existing approaches in high-dimensional settings. Applied to IBD multi-omics data, the method identified three biclusters linking sample clusters with distinct IBD-related clinical characteristics to microbial pathway groups associated with specific bacterial taxa, providing interpretable subject-pathway associations for characterizing disease heterogeneity. Applied to multi-channel EEG data, the method identified three triclusters linking sample clusters with distinct alcohol-related phenotypes to localized brain activity patterns, including subgroup differences separated by temporal subregions within the same spatial region.

2606.05474 2026-06-05 q-bio.BM cs.LG

AlloGen: Conformation-Selective Binder Generation with Differential State Scoring

Hanqun Cao, Zachary Quinn, Aastha Pal, Sumi Kimura, Jingjie Zhang, Pheng Ann Heng, Pranam Chatterjee

Affiliations * Department of Computer Science and Engineering; The Chinese University of Hong Kong; Department of Bioengineering; University of Pennsylvania; Department of Computer and Information Science

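AI summary Proposes the AlloGen framework, which couples backbone generation with state selectivity through a learnable conformational-selectivity scorer $Q_θ$ to design binders that are selective for different conformational states of a protein.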

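Details
AI abstract (translated from the Chinese)

Protein binder design has mainly optimized affinity while neglecting conformational selectivity: for allosteric targets such as kinases, nuclear receptors, and GPCRs, a binder that engages both the active and the inactive state provides no functional specificity, however tightly it binds. We propose AlloGen, a modular framework that decouples backbone generation from a learned state-selectivity scorer $Q_θ$. $Q_θ$ is an SE(3)-invariant interface graph transformer trained with a two-phase curriculum that first learns interface geometry and then imposes conformational discrimination. Because $Q_θ$ is fully differentiable and generator-agnostic, it can be integrated with any backbone generator as a passive reranker or as an active gradient-based guide, without retraining. On a diverse protein benchmark spanning multiple families and conformational mechanisms, AlloGen consistently identifies binders that preferentially recognize the desired structural state while rejecting alternative conformations. Experimental validation on calmodulin further shows that these computational selectivity signals translate into physical molecules, yielding de novo designed peptides that bind the desired holo conformation with no detectable binding to the apo state. Together, these results establish conformational selectivity as a learnable property and provide a general framework for state-selective protein binder design.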

English abstract

Protein binder design has largely optimized for affinity alone, leaving conformational selectivity unaddressed: for allosteric targets such as kinases, nuclear receptors, and GPCRs, a binder that engages both active and inactive states provides no functional specificity regardless of how tightly it binds. We introduce AlloGen, a modular framework that decouples backbone generation from a learned state-selectivity scorer $Q_θ$, an SE(3)-invariant interface graph transformer trained via a two-phase curriculum that first learns interface geometry before imposing conformational discrimination. Because $Q_θ$ is fully differentiable and generator-agnostic, it integrates with any backbone generator as a passive reranker or an active gradient-based guide without retraining. Across a diverse benchmark of proteins spanning multiple families and conformational mechanisms, AlloGen consistently identifies binders that preferentially recognize desired structural states while rejecting alternative conformations. Experimental validation on calmodulin further demonstrates that these computational selectivity signals translate to physical molecules, yielding de novo peptides that bind the desired holo conformation while exhibiting no detectable binding to the apo state. Together, these results establish conformational selectivity as a learnable property and provide a general framework for state-selective protein binder design.
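
The "passive reranker" use of a state-selectivity scorer can be pictured with a small sketch. The abstract does not define the score, so the code below assumes the simplest possible form: score each candidate against the desired and the undesired state and rank by the difference. `rerank_by_selectivity`, `toy_score`, the peptide names, and every number are hypothetical stand-ins, not AlloGen's $Q_θ$.

```python
def rerank_by_selectivity(candidates, score_fn, on_state, off_state):
    """Sort candidates by score(on_state) - score(off_state), best first."""
    scored = []
    for binder in candidates:
        on = score_fn(binder, on_state)    # predicted engagement of the desired state
        off = score_fn(binder, off_state)  # predicted engagement of the state to avoid
        scored.append((binder, on - off))
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical precomputed scores standing in for a learned scorer.
toy_scores = {
    ("pep_A", "holo"): 0.9, ("pep_A", "apo"): 0.8,
    ("pep_B", "holo"): 0.7, ("pep_B", "apo"): 0.1,
    ("pep_C", "holo"): 0.4, ("pep_C", "apo"): 0.5,
}

def toy_score(binder, state):
    return toy_scores[(binder, state)]

ranking = rerank_by_selectivity(["pep_A", "pep_B", "pep_C"], toy_score, "holo", "apo")
for binder, margin in ranking:
    print(f"{binder}: {margin:+.2f}")
```

Here `pep_A` has the highest holo score but ranks below `pep_B`, because it scores almost as well on the apo state. That is the distinction between affinity and selectivity that the abstract draws.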

2606.05443 2026-06-05 cs.DL cs.CL

MIRAI: Prediction and Generation of High-Impact Academic Research

Alex Li, Joseph Jacobson

Affiliations * MIT Media Lab

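AI summary Proposes MIRAI, a deep learning framework that predicts a paper's 5-year PageRank and citation count from its title, abstract, and publication date, and builds a research ideation pipeline on top of it to generate high-impact research ideas.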

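Details
AI abstract (translated from the Chinese)

The rapid pace of scientific publishing has made identifying and synthesizing high-impact work an increasingly urgent challenge. We propose MIRAI (Multi-year Inference of Research trends and Academic Impact), a deep learning framework that predicts a paper's impact using only its title, abstract, and publication date. We train MIRAI on the arXiv academic graph to predict 5-year PageRank and citation counts; for papers published in 2021, it achieves a Spearman's $ρ$ of 0.4686 on PageRank prediction and 0.6192 on citation prediction. We propose a MIRAI-based research ideation pipeline that produces research ideas oriented toward high impact. An unbiased LLM judge rated these ideas as more impactful than a baseline without MIRAI at a 4:3 ratio. We make the 5-year citation prediction model publicly available at https://predict-paper-impact.vercel.app.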

English abstract

The rapid pace of scientific publishing has made the identification and synthesis of high-impact work an increasingly urgent challenge. We introduce MIRAI (Multi-year Inference of Research trends and Academic Impact), a deep learning framework that predicts a paper's impact using only its title, abstract, and publication date. We train MIRAI on the arXiv academic graph to predict 5-year PageRank and citation counts, achieving a Spearman's $ρ$ of 0.4686 on PageRank prediction and 0.6192 on citation prediction for papers published in 2021. We propose a research ideation pipeline built on top of MIRAI that produces research ideas oriented towards high impact. These ideas were judged as more impactful than a baseline without MIRAI by an unbiased LLM judge at a 4:3 ratio. We make the 5-year citation prediction model publicly available at https://predict-paper-impact.vercel.app.
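
For readers unfamiliar with the reported metric, the sketch below shows how Spearman's $ρ$ compares predicted and observed impact: both lists are converted to ranks (ties share the mean rank) and the Pearson correlation of the ranks is taken. The helper names and the four toy papers are assumptions for illustration and have no connection to MIRAI's data or code.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical predicted impact scores and observed 5-year citation counts.
predicted = [0.10, 0.40, 0.35, 0.80]
observed = [3, 10, 12, 50]
rho = spearman_rho(predicted, observed)
print(round(rho, 4))  # 0.8 for this toy data
```

Because only ranks matter, the metric rewards ordering papers correctly and is insensitive to the heavy-tailed scale of citation counts.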