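arXivDaily: arXiv daily academic digest, updated Monday through Friday

AI Large Models

Code LLMs / AI Programming

Code generation, software engineering agents, program repair, test generation, and developer tools.

Collected today/current date: 36 Signal sources: cs.SE, cs.CL, cs.AI, cs.LG, cs.PL

1. Code Generation (2 papers)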

2606.20072 2026-06-19 cs.CL New submission Topic 70

Source-Grounded Data Generation for Text-to-JSON Learning

Sunghee Ahn, Guijin Son, Youngjae Yu

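Affiliations Seoul National University

Topic match Code Generation: text-to-JSON data generation

AI summary Proposes STAGE, a pipeline that uses spreadsheets as source data, generates reports and JSON schemas with LLMs, and validates ground-truth values against the spreadsheet. Training on its data raises Qwen3-4B exact match on STAGE-Eval from 31.37% to 74.27%.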

Comments Preprint

English abstract

From financial filings to clinical records, legacy industries rely heavily on long, unstructured documents to store high-value information. Reliably extracting this information into structured, machine-readable representations is a key prerequisite to making the contents accessible to automated systems. JSON is a natural target for such structured extraction, yet constructing reliable and scalable text-to-JSON training data remains challenging. To address this gap, we propose STAGE (Spreadsheet-grounded Text-to-JSON Artifact GEneration), a source-grounded data generation pipeline that constructs reports and JSON schema by using LLMs for scalable synthesis while validating ground-truth values against the underlying spreadsheet. Evaluations on STAGE-Eval, our source-grounded benchmark with an 851-example test set, show that STAGE produces stronger training data than existing approaches. This improves Qwen3-4B exact match from 31.37% to 74.27% and value accuracy from 45.46% to 90.69%.
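
The two reported metrics can be illustrated with a small sketch. The abstract does not define them, so the definitions below are assumptions: exact match counts a prediction only when the parsed JSON equals the reference, and value accuracy is the fraction of reference leaf values reproduced at the same path. The function names are hypothetical and are not taken from STAGE.

```python
import json
from typing import Any, Dict, Tuple

_MISSING = object()  # sentinel so a missing path never equals a gold null


def flatten(obj: Any, path: Tuple = ()) -> Dict[Tuple, Any]:
    """Map every leaf value of a JSON-like object to its path."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)
    else:
        return {path: obj}
    leaves: Dict[Tuple, Any] = {}
    for key, value in items:
        leaves.update(flatten(value, path + (key,)))
    return leaves


def exact_match(pred_text: str, gold: Any) -> bool:
    """Assumed definition: the parsed prediction equals the reference."""
    try:
        return json.loads(pred_text) == gold
    except json.JSONDecodeError:
        return False


def value_accuracy(pred_text: str, gold: Any) -> float:
    """Assumed definition: share of gold leaves matched at the same path."""
    try:
        pred_leaves = flatten(json.loads(pred_text))
    except json.JSONDecodeError:
        return 0.0
    gold_leaves = flatten(gold)
    if not gold_leaves:
        return 1.0 if not pred_leaves else 0.0
    hits = sum(
        1 for path, value in gold_leaves.items()
        if pred_leaves.get(path, _MISSING) == value
    )
    return hits / len(gold_leaves)


# Illustrative data only; not drawn from STAGE-Eval.
gold = {"company": "Acme", "revenue": 120.5, "quarters": [1, 2]}
pred = '{"company": "Acme", "revenue": 125.0, "quarters": [1, 2]}'
print(exact_match(pred, gold), value_accuracy(pred, gold))
```

In this example one of four leaf values is wrong, so the prediction fails exact match while still scoring partial value accuracy, which is why the two numbers in the abstract can differ widely.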

2606.19419 2026-06-19 cs.RO cs.AI New submission Topic 65

Playful Agentic Robot Learning

Junyi Zhang, Jiaxin Ge, Hanjun Yoo, Letian Fu, Zihan Yang, Yaowei Liu, Raj Saravanan, Shaofeng Yin, Justin Yu, Dantong Niu, Zirui Wang, Roei Herzig, Ken Goldberg, Yutong Bai, David M. Chan, Ion Stoica, Angjoo Kanazawa, Jiahui Lei, Haiwen Feng, Trevor Darrell

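Affiliations University of California, Berkeley; Impossible Research

Topic match Code Generation: robot coding agents generate executable code policies.

AI summary Proposes RATs, a framework in which robots learn reusable skills through self-directed play, improving held-out downstream tasks by 20.6 and 17.0 percentage points over CaP-Agent0 on LIBERO-PRO and MolmoSpaces, respectively.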

Comments Project page: https://playful-rats.github.io/

English abstract

Current agentic robot systems can write executable Code-as-Policy programs, observe feedback, and revise behavior across multiple attempts, but they remain largely task-driven: reusable skills are acquired only after explicit instructions. We study Playful Agentic Robot Learning, where an embodied coding agent uses self-directed play as a continual skill-learning stage before downstream tasks arrive. We introduce RATs, Robotics Agent Teams designed for play-time skill acquisition. During play, RATs proposes novel yet learnable exploratory tasks, plans and executes robot-code policies, verifies intermediate progress, diagnoses failures, retries with dense, step-level feedback, and distills successful executions into a persistent code skill library. At test time, the agent reuses relevant skills from this frozen library to help solve new tasks. Experiments in LIBERO-PRO and MolmoSpaces show that play-learned skills improve held-out downstream tasks over no-play and random-play baselines, with 20.6 and 17.0 percentage-point gains over CaP-Agent0 on LIBERO-PRO and MolmoSpaces, respectively. Moreover, the learned skills can be plugged into other inference-time Code-as-Policy agents by simply retrieving them into the context, improving RoboSuite and real-world transfer by 8.9 and 8.8 points, respectively, without finetuning the underlying model.
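
The idea of a persistent skill library that is frozen at test time and retrieved into the agent's context can be sketched as follows. This is a minimal illustration under stated assumptions, not the RATs implementation: the class names, the keyword-overlap scoring, and the robot calls inside the stored code strings are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Skill:
    name: str
    description: str
    code: str  # stored as text; never executed in this sketch


@dataclass
class SkillLibrary:
    skills: List[Skill] = field(default_factory=list)
    frozen: bool = False

    def add(self, skill: Skill) -> None:
        """Store a skill distilled from a successful play episode."""
        if self.frozen:
            raise RuntimeError("library is frozen; no skills are added at test time")
        self.skills.append(skill)

    def freeze(self) -> None:
        self.frozen = True

    def retrieve(self, task: str, k: int = 2) -> List[Skill]:
        """Rank skills by word overlap with the task (an assumed heuristic)."""
        task_words = set(task.lower().split())
        scored = []
        for skill in self.skills:
            overlap = len(task_words & set(skill.description.lower().split()))
            if overlap > 0:
                scored.append((overlap, skill))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [skill for _, skill in scored[:k]]


def build_context(task: str, skills: List[Skill]) -> str:
    """Place retrieved skill code ahead of the task in the prompt text."""
    parts = [f"# skill: {s.name}\n{s.code}" for s in skills]
    parts.append(f"# task: {task}")
    return "\n\n".join(parts)


library = SkillLibrary()
library.add(Skill("open_drawer", "open a drawer by pulling its handle",
                  "def open_drawer(robot): ...  # hypothetical robot API"))
library.add(Skill("pick_place", "pick up an object and place it in a target location",
                  "def pick_place(robot, obj, target): ...  # hypothetical robot API"))
library.freeze()

task = "place the mug in the drawer"
print(build_context(task, library.retrieve(task)))
```

Because retrieval only adds text to the context, a library like this can be attached to another Code-as-Policy agent without changing model weights, which matches the transfer setting the abstract describes.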

2. Code Evaluation (3 papers)

2606.19654 2026-06-19 cs.CR cs.SE New submission Topic 70

PUFFERDOS: Efficient and Effective Attack String Generation for Regular Expression Denial of Service Vulnerabilities

Shangzhi Xu, Ziqi Ding, Xiao Cheng, Yuekang Li, Nan Sun, Benjamin Turnbull, Shuangxiang Kan, Siqi Ma

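Topic match Code Evaluation: generates regular expression denial-of-service attack strings, involving program analysis

AI summary Proposes PUFFERDOS, which defines three vulnerable patterns and combines a synthesis technique with ReDoS-specific compositional concolic execution to generate attack strings that fit realistic length budgets and are validated at the program level.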

Comments Accepted by S&P'26

English abstract

ReDoS attacks constitute a critical class of resource-exhaustion vulnerabilities. In such attacks, adversaries exploit the pathological worst-case execution behavior of regular expression (regex) engines to induce highly asymmetric computational workloads, ultimately exhausting system resources and degrading service availability. To protect systems against ReDoS attacks, numerous detection techniques have been proposed that simulate the attack process by generating attack strings to proactively exploit ReDoS vulnerabilities at the early development stage and facilitate remediation. Existing techniques broadly fall into two classes: static analyses that search for pathological regex structures, and dynamic exploration methods that synthesize candidate attack strings. However, the generated attack strings are often impractical for real-world exploitation because they usually assume unrealistic input-length budgets and do not validate the effectiveness and efficiency of the attack at the program level. Therefore, many generated strings fail to trigger vulnerable regexes when applied to real-world programs, further limiting the practical utility. To address these shortcomings, we introduce an effective and efficient attack string generator, PUFFERDOS, designed to synthesize attack inputs that are both feasible within realistic length budgets and validated at the program level, enabling effective exploitation of ReDoS vulnerabilities in real-world programs. Specifically, we first define three vulnerable patterns based on our observation and formal verification. According to the patterns, PUFFERDOS conducts a synthesis technique to generate attack strings, and then refines and validates the strings with ReDoS-specific compositional concolic execution to guarantee real-world exploitability.
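
For readers unfamiliar with the vulnerability class, the asymmetric workload described above can be reproduced with a textbook pattern. The sketch below does not implement PUFFERDOS or any of its three vulnerable patterns; the regex `^(a+)+$` and the attack string shape are standard illustrations chosen here as assumptions. It shows how a short input makes Python's backtracking `re` engine do exponentially more work.

```python
import re
import time

# Nested quantifiers: the engine can split a run of "a" characters between
# the inner and outer "+" in exponentially many ways before giving up.
VULNERABLE = re.compile(r"^(a+)+$")


def attack_string(n: int) -> str:
    """n matching characters followed by one character that forces failure."""
    return "a" * n + "!"


def match_seconds(n: int) -> float:
    """Time a single failing match attempt against the attack string."""
    start = time.perf_counter()
    VULNERABLE.match(attack_string(n))
    return time.perf_counter() - start


if __name__ == "__main__":
    for n in (8, 14, 20):
        print(f"length {n + 1:2d}: {match_seconds(n):.6f} s")
```

The input grows by only a few characters per step while the matching time grows roughly geometrically, which is why realistic length budgets matter: an attack string has to cause damage while staying short enough to pass input-size limits in the target program.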

2606.20129 2026-06-19 cs.SE New submission Topic 60

Learning Critical Testing Literacy Through Puzzles: an Experience Report

Niels Doorn, Bart Th. Knaack, Tanja E. J. Vos, Beatriz Marín

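Topic match Code Evaluation: learning software testing literacy through puzzles.

AI summary Reports experience from 13 workshops that used puzzles to teach critical testing literacy (CTL). Finds that the intervention is the full sequence of solving, debriefing, and reflecting rather than the puzzles alone, and presents an open-source web application with built-in analytics.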

English abstract

In this paper, we report our experiences and takeaways from workshops using puzzles to learn CTL. Background: Software testing is important yet difficult to teach. We introduced a BoK of puzzle-based learning activities to teach CTL, based on a model of critical tester's cognition, leading to the pedagogical framework P4TEST. We conducted thirteen workshops with students, testers, teachers, and primary school pupils to assess puzzle-based teaching of critical testing literacy. Experience: Across eleven workshops, we used a semi-structured approach, varying puzzles, materials, and timing. In two additional workshops, we introduced workbooks and think-aloud sessions to gather more data on the learning experience. Observations: Participants consistently perceived themselves as experimenting while solving puzzles. Students tended to converge on solutions, while professionals continued exploring. Emotions were visible in behaviour but hard to surface through written reflection alone. Think-aloud sessions revealed immediate reasoning; written reflections elicited more meta-cognitive reflection. The theme Sensemaking / reflection-in-action captured how participants framed problems, navigated dead ends, and shifted strategies. Reflections: Puzzles are not the intervention: the entire sequence of solving, debriefing, and reflecting is. Designing that sequence more deliberately is the work ahead. We also developed an open-source web application with built-in analytics to customise workshops.

2606.20370 2026-06-19 astro-ph.IM astro-ph.GA New submission Topic 60

ELMA: ELlipse-based bar MAjor axis estimator

Bruna R. Bragança de Lima, Andressa Wille, Rafael S. de Souza, Ana L. Chies-Santos

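Topic match Code Evaluation: Python package for automated estimation of galaxy bar length

AI summary Presents ELMA, a Python package that automatically estimates galaxy bar length through iterative elliptical-isophote fitting, demonstrated on JWST/NIRCam images of barred galaxies in the GOODS-South field.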

Comments 4 pages, 1 figure, published in RNAAS

Journal ref Research Notes of the AAS, Volume 10, Number 6, 2026

English abstract

Galactic bars are key non-axisymmetric structures in disk galaxies, driving angular-momentum redistribution and contributing to secular evolution, central mass build-up, and the formation of nuclear structures. Robust and homogeneous measurements of bar length, however, remain challenging, particularly for large imaging surveys, where manual estimates are time-consuming and sensitive to methodological choices. We introduce elma, a standalone, pip-installable Python package for automated bar-length estimation in galaxies already identified as candidate barred systems. The method operates directly on two-dimensional imaging data, using iterative elliptical-isophote fitting to trace the radial ellipticity profile and identify a projected bar-length estimate from the semi-major axis associated with the local maximum in ellipticity. Using the image WCS information and a user-supplied redshift, elma converts angular measurement into a projected physical length. We demonstrate the package on JWST/NIRCam imaging of barred galaxies in the GOODS-South field. The code is released under the MIT license at a repository in Github.
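
The last two steps of the method, picking the semi-major axis at the ellipticity peak and converting it to a projected physical length, can be sketched in plain Python. This is not the elma API. It assumes the ellipticity profile has already been produced by isophote fitting, picks the highest interior local maximum, takes the pixel scale as a number rather than reading it from the WCS, and uses a flat LambdaCDM cosmology with H0 = 70 km/s/Mpc and Omega_m = 0.3. All function names and default values are hypothetical.

```python
import math
from typing import Sequence

C_KM_S = 299_792.458  # speed of light in km/s


def ellipticity_peak_sma(sma: Sequence[float], ellipticity: Sequence[float]) -> float:
    """Semi-major axis at the highest interior local maximum of ellipticity."""
    peaks = [
        i for i in range(1, len(sma) - 1)
        if ellipticity[i - 1] < ellipticity[i] >= ellipticity[i + 1]
    ]
    if not peaks:
        raise ValueError("no local maximum in the ellipticity profile")
    best = max(peaks, key=lambda i: ellipticity[i])
    return sma[best]


def angular_diameter_distance_mpc(z: float, h0: float = 70.0,
                                  omega_m: float = 0.3, steps: int = 1000) -> float:
    """Flat LambdaCDM angular diameter distance, integrated with Simpson's rule."""
    def inv_e(zp: float) -> float:
        return 1.0 / math.sqrt(omega_m * (1.0 + zp) ** 3 + (1.0 - omega_m))

    h = z / steps  # steps must be even for Simpson's rule
    total = inv_e(0.0) + inv_e(z)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * inv_e(i * h)
    comoving_mpc = (C_KM_S / h0) * total * h / 3.0
    return comoving_mpc / (1.0 + z)


def bar_length_kpc(sma_pixels: float, pixel_scale_arcsec: float, z: float) -> float:
    """Small-angle conversion from pixels to projected kiloparsecs."""
    theta_rad = math.radians(sma_pixels * pixel_scale_arcsec / 3600.0)
    return theta_rad * angular_diameter_distance_mpc(z) * 1000.0


# Illustrative profile and pixel scale; not measurements from the paper.
sma_px = [5.0, 10.0, 15.0, 20.0, 25.0]
ellip = [0.10, 0.30, 0.50, 0.40, 0.20]
bar_px = ellipticity_peak_sma(sma_px, ellip)
print(bar_px, round(bar_length_kpc(bar_px, 0.03, 1.0), 2))
```

The result is a projected length: it is not corrected for the inclination of the disk or the orientation of the bar, consistent with the "projected" wording in the abstract.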

3. Program Repair (1 paper)

2606.18941 2026-06-19 cs.PL cs.CL New submission Topic 70

ESBMC-GraphPLC: Formal Verification of Graphical PLCopen XML Ladder Diagram Programs Using SMT-Based Model Checking

Pierre Dantas, Lucas Cordeiro, Waldir Junior

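Affiliations Computer Science, The University of Manchester; Electrical Engineering, Federal University of Amazonas (UFAM)

Topic match Program Repair: formal verification of PLC programs, classified under program repair

AI summary Addresses ESBMC-PLC's inability to handle graphical PLCopen XML Ladder Diagrams with a DFS-based graphical LD resolver that converts the connection graph into Boolean contact conjunctions and applies a three-tier I/O inference scheme. It produces full GOTO IR for 3 graphical LD programs, all of which verify SAFE.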

Comments 18 pages

English abstract

PLCopen XML defines two encoding formats for IEC 61131-3 Ladder Diagram programs: a textual encoding using <rung> elements, and a graphical encoding that represents rung logic as a directed graph of localId/refLocalId connections. ESBMC-PLC supported the textual format but parsed graphical exports from CONTROLLINO, Beremiz, and OpenPLC Editor into an empty GOTO intermediate representation, causing vacuous verification success. This paper presents ESBMC-GraphPLC, which closes this gap with a DFS-based graphical LD resolver. The resolver traverses the connection graph from leftPowerRail to each coil, extracts rung paths as Boolean contact conjunctions, and applies a three-tier I/O inference scheme. Ordering coils by rightPowerRail connectionPointIn sequence ensures SET coils process before RESET coils, matching IEC scan-cycle semantics. The graphical-to-IR conversion leaves the ESBMC backend unchanged. Validation on 3 graphical LD programs from CONTROLLINO/OpenPLC Editor shows all produce full GOTO IR with nondeterministic inputs and rung logic, versus the empty IR previously. All 3 verify SAFE at k=2 under 70ms. The 11 textual LD benchmarks are fully preserved, with no regression. Two Beremiz examples with no LD content or unsupported timer semantics are reported as discovered limitations. Artifact at Zenodo (DantasCordeiro2026graphical, doi:10.5281/zenodo.20699856).
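
The traversal from leftPowerRail to each coil can be sketched on a toy graph. This is a simplified illustration, not the ESBMC-GraphPLC resolver: the dictionary layout stands in for parsed localId/refLocalId elements and is hypothetical, parallel branches are combined with OR as an assumption, and the three-tier I/O inference and SET/RESET coil ordering are left out.

```python
from typing import Dict, List

# Hypothetical stand-in for parsed graphical LD elements. Each "refs" list
# holds the refLocalId values of the elements feeding this one.
ELEMENTS: Dict[int, dict] = {
    1: {"type": "leftPowerRail", "refs": []},
    2: {"type": "contact", "var": "start", "negated": False, "refs": [1]},
    3: {"type": "contact", "var": "motor", "negated": False, "refs": [1]},
    4: {"type": "contact", "var": "stop", "negated": True, "refs": [2, 3]},
    5: {"type": "coil", "var": "motor", "refs": [4]},
}


def successors(elements: Dict[int, dict]) -> Dict[int, List[int]]:
    """Invert the refLocalId links so the graph can be walked left to right."""
    forward: Dict[int, List[int]] = {local_id: [] for local_id in elements}
    for local_id, element in elements.items():
        for ref in element["refs"]:
            forward[ref].append(local_id)
    return forward


def resolve_rungs(elements: Dict[int, dict]) -> Dict[str, List[List[str]]]:
    """DFS from each leftPowerRail; every path to a coil is one conjunction."""
    forward = successors(elements)
    rungs: Dict[str, List[List[str]]] = {}

    def dfs(local_id: int, literals: List[str], visited: frozenset) -> None:
        element = elements[local_id]
        if element["type"] == "contact":
            literal = ("!" if element["negated"] else "") + element["var"]
            literals = literals + [literal]
        elif element["type"] == "coil":
            rungs.setdefault(element["var"], []).append(literals)
            return
        for nxt in forward[local_id]:
            if nxt not in visited:  # guard against malformed cyclic exports
                dfs(nxt, literals, visited | {nxt})

    for local_id, element in elements.items():
        if element["type"] == "leftPowerRail":
            dfs(local_id, [], frozenset({local_id}))
    return rungs


def to_expression(paths: List[List[str]]) -> str:
    """Join each path with AND and parallel paths with OR."""
    return " || ".join("(" + (" && ".join(p) or "true") + ")" for p in paths)


for coil, paths in resolve_rungs(ELEMENTS).items():
    print(f"{coil} := {to_expression(paths)}")
```

A resolver that skipped this step would emit no assignments for the coil, which is the empty-IR situation the abstract describes: with nothing to check, a model checker reports success vacuously.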