Code LLMs / AI Programming

2604.11556 2026-06-19 cs.SE cs.AI Version update Topic 90

FM-Agent: Scaling Formal Methods to Large Systems via LLM-Based Hoare-Style Reasoning

Haoran Ding, Zhaoguo Wang, Haibo Chen

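Topic match: Code generation. An LLM automatically generates function specifications to enable formal reasoning.

AI summary: Proposes the FM-Agent framework, which uses an LLM to automatically generate function-level specifications and thereby enables compositional reasoning over large systems. On systems comprising 143k lines of code, it found 522 new bugs within 2 days.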

2602.00510 2026-06-19 cs.AI cs.LG cs.SE Version update Topic 85

PCBSchemaGen: Reward-Guided LLM Code Synthesis for Printed Circuit Boards (PCB) Schematic Design with Structured Verification

Huanghaohe Zou, Peng Han, Emad Nazerian, Mafu Zhang, Zhicheng Guo, Alex Q. Huang

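Topic match: Code generation. LLM code synthesis that generates PCB schematics.

AI summary: Proposes the PCBSchemaGen framework, in which a structured verifier guides a frozen LLM to generate repairable PCB schematics, achieving high accuracy in a domain that lacks unit tests.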

2606.01338 2026-06-19 cs.CL Version update Topic 80

Benchmarking Local LLMs for Natural-Language-to-SQL Querying in Biopharmaceutical Manufacturing: An Empirical Benchmark on Consumer-Grade Hardware

Sagar Bhetwal, Rajan Bastakoti, Nirajan Acharya, Gaurav Kumar Gupta, Ambika Baniya Bhandari

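Topic match: Code generation. Evaluates the NL2SQL performance of local LLMs in biopharmaceutical manufacturing.

AI summary: The study evaluates four locally deployed open-source large language models on natural-language-to-SQL generation over a biopharmaceutical manufacturing database. It finds that code-tuned general-purpose models outperform domain-specific models, but that current performance still requires human oversight.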

2606.05017 2026-06-19 cs.AR cs.MS Version update Topic 60

GoldenFloat: A Phi-Derived Static-Split Floating-Point Family from GF4 to GF256 with a Lucas-Exact Integer Identity

Dmitrii Vasilev

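Topic match: Code generation. Proposes an RTL generator for the GoldenFloat floating-point family.

AI summary: Proposes GoldenFloat, a static-split floating-point family generated by a single closed-form rule, and presents three concrete implementations: a multi-width RTL generator, a Lucas-exact accumulator path, and an FPGA codec.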

Comments 20 pages, single-file LaTeX, ASCII source. v2: peer-anchor updates. Adds Sarnoff P3109 (arXiv:2606.04028), AMD MXFP4 silicon (arXiv:2605.09825), NVIDIA GB10 NVFP4 measurement, companion catalog (arXiv:2606.09686), MixFP4 (arXiv:2605.31035). FL-002 expanded: (c1) GF256 bias, (c2) count drift, (g) static-split vs micro-mixing. TTSKY26a regeneration timeline added. No mathematical claims revised

2511.18288 2026-06-19 cs.SE Version update Topic 90

Can Large Language Models Reason About Complex Execution Paths? An Empirical Study on Python

Wenhan Wang, Kaibo Liu, Zeyu Sun, An Ran Chen, Ge Li, Gang Huang, Lei Ma

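Topic match: Code evaluation. An empirical study of LLMs' ability to reason about Python execution paths.

AI summary: An empirical study of whether large language models can feasibly reason about Python execution paths. It constructs test-case generation and defect classification tasks and finds that LLMs can improve path coverage, but that models with stronger reasoning do not necessarily outperform weaker ones.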

Comments Accepted by ACM Transactions on Software Engineering and Methodology (TOSEM)

2512.00560 2026-06-19 cs.SE Version update Topic 80

SAGE: Semantic-Aware Gray-Box Game Regression Testing with Large Language Models

Jinyu Cai, Jialong Li, Nianyu Li, Zhenyu Mao, Mingyue Zhang, Kenji Tei

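Topic match: Software agents. Uses LLM-guided reinforcement learning to automatically generate game test suites.

AI summary: Proposes the SAGE framework, which uses LLM-guided reinforcement learning to automatically generate test suites, prunes them through semantic multi-objective optimization, and prioritizes tests based on semantic analysis of update logs, achieving efficient regression testing in Overcooked Plus and Minecraft.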

Comments This paper has been accepted by Automated Software Engineering journal

2601.22978 2026-06-19 cs.CR cs.PL Version update Topic 60

Triosecuris: Formally Verified Protection Against Speculative Control-Flow Hijacking

Jonathan Baumann, Yonghyun Kim, Yan Farba, Catalin Hritcu, Julay Leatherman-Brooks

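Topic match: Program repair. Formally verified defense against speculative control-flow hijacking.

AI summary: Proposes Triosecuris, which combines CET-style hardware-assisted control-flow integrity with compiler-inserted speculative load hardening and comes with a formal proof of relative security: under speculative execution, any program leaks no more information than the source program does without speculation.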

Comments To appear at CSF'26; extended version with appendices. W.r.t. first revision: extended with concrete protection against Spectre RSB and renamed to Triosecuris

Journal ref 39th IEEE Computer Security Foundations Symposium (CSF) (2026) 544-559

1. Code generation (4 papers)

FM-Agent: Scaling Formal Methods to Large Systems via LLM-Based Hoare-Style Reasoning

PCBSchemaGen: Reward-Guided LLM Code Synthesis for Printed Circuit Boards (PCB) Schematic Design with Structured Verification

Benchmarking Local LLMs for Natural-Language-to-SQL Querying in Biopharmaceutical Manufacturing: An Empirical Benchmark on Consumer-Grade Hardware

GoldenFloat: A Phi-Derived Static-Split Floating-Point Family from GF4 to GF256 with a Lucas-Exact Integer Identity

2. Code evaluation (1 paper)

Can Large Language Models Reason About Complex Execution Paths? An Empirical Study on Python

3. Software agents (1 paper)

SAGE: Semantic-Aware Gray-Box Game Regression Testing with Large Language Models

4. Program repair (1 paper)

Triosecuris: Formally Verified Protection Against Speculative Control-Flow Hijacking