AI Agent - arXivDaily Topic

2508.04086 2026-06-18 cs.CL Version update 95%

ToolGrad: Efficient Tool-use Dataset Generation with Textual "Gradients"

Zhongyi Zhou, Kohei Uehara, Haoyu Zhang, Jingtao Zhou, Lin Gu, Ruofei Du, Zheng Xu, Tatsuya Harada

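Affiliations: Google; The University of Tokyo; RIKEN AIP; Tohoku University

Topic match: tool calling. Proposes the ToolGrad framework for generating tool-use datasets.

AI summary: Proposes the ToolGrad framework, which uses an iterative process guided by textual "gradients" to first build valid tool-use chains and then synthesize user queries. This yields low-cost data generation with a high success rate, and models trained on the resulting data outperform the baselines.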

Comments ACL 2026 Findings. Source code: https://github.com/zhongyi-zhou/toolgrad

2605.29676 2026-06-18 cs.AI cs.CL Version update 85%

Notation Matters: A Benchmark Study of Token-Optimized Formats in Agentic AI Systems

Lorenz Kutschka, Bernhard Geiger

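Affiliations: Know Center Research GmbH; Graz University of Technology; Graz Center for Machine Learning

Topic match: tool calling. Token-optimized formats in agentic systems, aimed at improving tool-calling efficiency.

AI summary: This study evaluates two token-optimized formats, TOON and TRON, on four agentic benchmarks. TRON reduces tokens by up to 27% while keeping accuracy within 14 percentage points of the JSON baseline. TOON reduces tokens by up to 18% but suffers from cascading multi-turn parsing failures and collapsed parallel tool-call output.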

Comments 16 pages, 6 figures, 4 tables

Abstract

Large language models in Agentic AI systems consume tool schemas and execution results and emit tool invocations as structured data. The default language for that exchange, JSON, was designed for application-to-application interchange rather than token efficiency, so its structural elements impose substantial token overhead. Recent work proposes token-optimized alternatives such as TOON (Token-Oriented Object Notation) and TRON (Token Reduced Object Notation) as more compact replacements, but these formats have been evaluated only on isolated comprehension or generation tasks. Whether their token reductions hold inside end-to-end agentic loops therefore remains an open question. We evaluate TOON and TRON on four agentic benchmarks (BFCL, MCPToolBenchPP, MCP-Universe, StableToolBench) and five open-weight LLMs, decoupling input compression from output compression to measure comprehension and generation independently. TRON reduces tokens by up to 27% with accuracy within 14pp of the JSON baseline. TOON achieves up to 18% reduction at a similar 9pp accuracy cost, but additionally cascades on multi-turn parsing failures and collapses parallel tool-call output for most models. The code is available at: https://github.com/lkutschka/notation-matters
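To make the idea of structural overhead concrete, here is a minimal sketch that compares a JSON serialization of a small tool result with a compact alternative. It is not the TOON or TRON specification: `compact_encode` is a hypothetical header-plus-rows encoding invented for illustration, the sample records are made up, and character counts stand in for token counts, which in practice depend on each model's tokenizer.

```python
import json


def compact_encode(records):
    """Hypothetical encoding: write the keys once, then one row per record.

    Assumes every record has the same keys and no value contains a comma
    or newline; a real format would need quoting or escaping rules.
    """
    keys = list(records[0].keys())
    lines = [",".join(keys)]
    for rec in records:
        lines.append(",".join(str(rec[k]) for k in keys))
    return "\n".join(lines)


def reduction(baseline, candidate):
    """Fractional size reduction of candidate relative to baseline."""
    return 1 - len(candidate) / len(baseline)


# Made-up tool result: a list of uniform records, where JSON repeats
# every key, quote, and brace for each entry.
records = [
    {"city": "Graz", "temp_c": 21},
    {"city": "Tokyo", "temp_c": 27},
    {"city": "Sendai", "temp_c": 24},
]

as_json = json.dumps(records)
as_compact = compact_encode(records)

print(f"JSON:    {len(as_json)} chars")
print(f"Compact: {len(as_compact)} chars")
print(f"Reduction (character proxy): {reduction(as_json, as_compact):.0%}")
```

The saving here comes entirely from not repeating keys and delimiters per record, so it grows with the number of uniform records and shrinks for deeply nested or irregular data. The abstract's point is that such savings on paper still need to be checked inside a full agentic loop, where the model must also read and emit the format correctly.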
