Code LLMs / AI Programming - arXivDaily Topic

2606.19149 2026-06-19 cs.CR cs.LG New submission 85%

OpenAnt: LLM-Powered Vulnerability Discovery Through Code Decomposition, Adversarial Verification, and Dynamic Testing

Nahum Korda, Gadi Evron

Topic match: program repair (LLM-driven vulnerability discovery).

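AI summary: Proposes OpenAnt, a system that combines static analysis with LLM reasoning in a three-stage pipeline (code decomposition, adversarial verification, dynamic testing) to find previously unknown vulnerabilities while lowering the false-positive rate.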

Abstract

Automated vulnerability discovery in large codebases remains challenging: traditional static analysis produces high false-positive rates, while dynamic approaches such as fuzzing require substantial infrastructure and often target narrow classes of bugs. Recent advances in large language models (LLMs) enable semantic reasoning about program behavior, but applying LLMs to repository-scale security analysis introduces challenges related to context management, cost, and verification. We present OpenAnt, an open-source vulnerability discovery system that integrates static program analysis with LLM-based reasoning in a multi-stage pipeline. OpenAnt introduces three key techniques. First, codebases are decomposed into self-contained analysis units filtered by reachability from external entry points, reducing the analysis surface by up to 97% while preserving attack-relevant code. Second, candidate vulnerabilities undergo adversarial verification through constrained attacker simulation, where the model evaluates exploitability under realistic attacker capabilities. Third, findings are validated through dynamic verification, in which exploit environments are generated automatically, executed in sandboxed containers, and discarded after use. Evaluation on widely used open-source projects including OpenSSL, WordPress, and Flowise shows that this architecture can identify previously unknown vulnerabilities while maintaining manageable analysis cost and substantially reducing false positives. Our results suggest that closed-loop vulnerability discovery pipelines, combining semantic reasoning with exploit validation, provide a practical path toward scalable automated security analysis. OpenAnt is released as open source under the Apache 2.0 license at https://github.com/knostic/OpenAnt.
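
The reachability filtering described in the abstract can be illustrated with a minimal sketch. This is not OpenAnt's implementation: the call graph, the unit names, and the `reachable_units` helper are all hypothetical, and a real system would derive the graph from static analysis rather than a hand-written dictionary.

```python
from collections import deque

def reachable_units(call_graph, entry_points):
    """Return every unit reachable from any external entry point (BFS)."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        unit = queue.popleft()
        for callee in call_graph.get(unit, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen

# Hypothetical toy call graph: unit name -> units it calls.
call_graph = {
    "http_handler": ["parse_request", "render"],
    "parse_request": ["decode_header"],
    "render": [],
    "decode_header": [],
    "internal_test_helper": ["decode_header"],
    "dead_code": [],
}
entry_points = ["http_handler"]  # assumed externally reachable

kept = reachable_units(call_graph, entry_points)
dropped = set(call_graph) - kept
```

In this toy graph, `internal_test_helper` and `dead_code` are dropped because nothing reachable from the entry point calls them, so only the remaining units would be passed on to the more expensive LLM stages.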

2506.16136 2026-06-19 cs.SE 85%

Seeing is Fixing: Cross-Modal Reasoning with Multimodal LLMs for Visual Software Issue Fixing

Kai Huang, Jian Zhang, Xiaofei Xie, Chunyang Chen

Topic match: program repair (multimodal LLMs fixing visual software issues).

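AI summary: The paper proposes GUIRepair, which resolves visual software issues through multimodal reasoning, combining image-to-code and code-to-image components to improve fault comprehension and patch validation.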

Journal ref: 2025 40th IEEE/ACM International Conference on Automated Software Engineering (ASE)

DOI: 10.1109/ASE63991.2025.00100

Abstract

Large language model (LLM)-based automated program repair (APR) techniques have shown promising results in resolving real-world GitHub issue tasks. Existing APR systems are primarily evaluated in unimodal settings (e.g., SWE-bench). However, these autonomous systems struggle to resolve multimodal problem scenarios (e.g., SWE-bench M) due to limitations in interpreting and leveraging visual information. In multimodal scenarios, LLMs need to rely on visual information in the graphical user interface (GUI) to understand bugs and generate fixes. To bridge this gap, we propose GUIRepair, a cross-modal reasoning approach for resolving multimodal issue scenarios by understanding and capturing visual information. Specifically, GUIRepair integrates two key components, Image2Code and Code2Image, to enhance fault comprehension and patch validation. Image2Code extracts relevant project documents based on the issue report, then applies this domain knowledge to generate the reproduced code responsible for the visual symptoms, effectively translating GUI images into executable context for better fault comprehension. Code2Image replays the visual issue scenario using the reproduced code and captures GUI renderings of the patched program to assess whether the fix visually resolves the issue, providing feedback for patch validation. We evaluate GUIRepair on SWE-bench M, and the approach demonstrates significant effectiveness. When utilizing GPT-4o as the base model, GUIRepair solves 157 instances, outperforming the best open-source baseline by 26 instances. Furthermore, when using o4-mini as the base model, GUIRepair can achieve even better results and solve 175 instances, outperforming the top commercial system by 22 instances. This emphasizes the success of our new perspective on incorporating cross-modal reasoning by understanding and capturing visual information to resolve multimodal issues.
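
The patch-validation feedback that the abstract attributes to Code2Image can be pictured as a simple selection loop. The sketch below is an illustration only, not GUIRepair's code: `select_patch`, `render`, and `resolves_issue` are hypothetical names, and the toy stand-ins replace what would in practice be a real GUI renderer and a multimodal model acting as judge.

```python
from typing import Callable, Iterable, Optional

def select_patch(
    patches: Iterable[str],
    render: Callable[[str], bytes],
    resolves_issue: Callable[[bytes], bool],
) -> Optional[str]:
    """Return the first candidate patch whose rendering passes the visual check."""
    for patch in patches:
        screenshot = render(patch)       # replay the scenario with the patch applied
        if resolves_issue(screenshot):   # visual feedback on the patched program
            return patch
    return None

# Toy stand-ins (assumptions): renderings are fixed byte strings per patch.
fake_renderings = {"patch_a": b"button-overlaps-text", "patch_b": b"layout-ok"}
chosen = select_patch(
    ["patch_a", "patch_b"],
    render=lambda p: fake_renderings[p],
    resolves_issue=lambda img: img == b"layout-ok",
)
```

Passing the renderer and the judge as callables keeps the loop independent of any particular browser automation tool or model API.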

2606.18941 2026-06-19 cs.PL cs.CL New submission 70%

ESBMC-GraphPLC: Formal Verification of Graphical PLCopen XML Ladder Diagram Programs Using SMT-Based Model Checking

Pierre Dantas, Lucas Cordeiro, Waldir Junior

Affiliations: Computer Science, The University of Manchester; Electrical Engineering, Federal University of Amazonas (UFAM)

Topic match: program repair (formal verification of PLC programs).

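AI summary: ESBMC-PLC cannot handle graphical PLCopen XML Ladder Diagrams. The paper proposes a DFS-based graphical LD resolver that converts the connection graph into Boolean contact conjunctions and applies a three-tier I/O inference scheme, producing a complete GOTO IR and verifying 3 graphical LD programs.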

Comments: 18 pages

Abstract

PLCopen XML defines two encoding formats for IEC 61131-3 Ladder Diagram programs: a textual encoding using <rung> elements, and a graphical encoding that represents rung logic as a directed graph of localId/refLocalId connections. ESBMC-PLC supported the textual format but parsed graphical exports from CONTROLLINO, Beremiz, and OpenPLC Editor into an empty GOTO intermediate representation, causing vacuous verification success. This paper presents ESBMC-GraphPLC, which closes this gap with a DFS-based graphical LD resolver. The resolver traverses the connection graph from leftPowerRail to each coil, extracts rung paths as Boolean contact conjunctions, and applies a three-tier I/O inference scheme. Ordering coils by rightPowerRail connectionPointIn sequence ensures SET coils process before RESET coils, matching IEC scan-cycle semantics. The graphical-to-IR conversion leaves the ESBMC backend unchanged. Validation on 3 graphical LD programs from CONTROLLINO/OpenPLC Editor shows all produce full GOTO IR with nondeterministic inputs and rung logic, versus the empty IR previously. All 3 verify SAFE at k=2 under 70ms. The 11 textual LD benchmarks are fully preserved, with no regression. Two Beremiz examples with no LD content or unsupported timer semantics are reported as discovered limitations. Artifact at Zenodo (DantasCordeiro2026graphical, doi:10.5281/zenodo.20699856).
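
The DFS traversal from leftPowerRail to each coil can be sketched on a toy connection graph. This is a simplified illustration, not the ESBMC-GraphPLC resolver: the `ELEMENTS` table, its field names, and the `rung_paths` and `to_expression` helpers are hypothetical, XML parsing is skipped, and the three-tier I/O inference and coil ordering from the abstract are left out.

```python
from collections import defaultdict

# Hypothetical, simplified element table keyed by localId.
# "refs" holds the refLocalId values of the upstream elements.
ELEMENTS = {
    1: {"kind": "leftPowerRail", "refs": []},
    2: {"kind": "contact", "var": "start", "negated": False, "refs": [1]},
    3: {"kind": "contact", "var": "motor", "negated": False, "refs": [1]},
    4: {"kind": "contact", "var": "stop", "negated": True, "refs": [2, 3]},
    5: {"kind": "coil", "var": "motor", "refs": [4]},
}

def rung_paths(elements):
    """Map each coil variable to the contact conjunctions that energize it."""
    # Invert the refLocalId links so the graph can be walked rail -> coil.
    successors = defaultdict(list)
    for local_id, elem in elements.items():
        for ref in elem["refs"]:
            successors[ref].append(local_id)

    paths = defaultdict(list)

    def dfs(local_id, literals, on_path):
        elem = elements[local_id]
        if elem["kind"] == "coil":
            paths[elem["var"]].append(list(literals))
            return
        if elem["kind"] == "contact":
            literal = ("!" if elem["negated"] else "") + elem["var"]
            literals = literals + [literal]
        for nxt in successors[local_id]:
            if nxt not in on_path:  # guard against malformed cyclic graphs
                dfs(nxt, literals, on_path | {nxt})

    for local_id, elem in elements.items():
        if elem["kind"] == "leftPowerRail":
            dfs(local_id, [], {local_id})
    return dict(paths)

def to_expression(conjunctions):
    """Join path conjunctions with OR; an empty path means always energized."""
    return " || ".join(
        "(" + (" && ".join(c) or "true") + ")" for c in conjunctions
    )

paths = rung_paths(ELEMENTS)
expression = to_expression(paths["motor"])
```

Each rail-to-coil path becomes one conjunction of contacts, and parallel branches become alternatives, so the toy start/stop latch above reduces to a single Boolean expression for the `motor` coil.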
