Knowing What to Solve Before How: Preplan Empowered LLM Mathematical Reasoning
Shaojie Wang, Liang Zhang
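AI summary: Proposes the PPC framework, which introduces an explicit problem-understanding stage (the preplan) to bridge the paradigm gap between "how to solve" and "what to solve" in existing plan-based reasoning methods, and achieves the best results on multiple mathematical reasoning benchmarks.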
Current plan-based reasoning methods improve large language models (LLMs) by inserting a planning stage before execution, giving rise to the question $\rightarrow$ plan $\rightarrow$ cot paradigm. While effective, this paradigm has an inherent gap: both the planning and execution stages decide how to solve a problem, while the prior question of what to solve (recognizing the problem type, the applicable tools, and the foreseeable pitfalls) remains entirely implicit. To bridge this gap, we propose PPC (Preplan-Plan-CoT), a framework that introduces an explicit problem-understanding stage, the preplan, yielding a new question $\rightarrow$ preplan $\rightarrow$ plan $\rightarrow$ cot paradigm. Realizing this paradigm requires safeguarding the conceptual integrity of the preplan at both ends. On the data side, we design a three-stage synthesis pipeline with a spoiler-score detector that filters out leakage and spoiler failures to build clean preplan supervision. On the training side, a composite GRPO reward enforces that the generated plan genuinely follows from the preplan. Experiments across four backbones and five mathematical reasoning benchmarks show that PPC achieves the best results on 39 of 40 metrics, improving maj@16 and pass@16 by +2.23 and +3.06 over the strongest baseline without introducing additional inference token overhead.
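For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how maj@k and pass@k are commonly computed for a single problem from k sampled final answers. The function names, the exact string matching, and the first-seen tie-breaking are assumptions made for illustration; the abstract does not specify the evaluation protocol, and the paper may use a different answer-equivalence check or an estimator-based pass@k.

```python
from collections import Counter


def pass_at_k(answers, reference):
    """Return 1.0 if any of the k sampled answers matches the reference."""
    return float(any(a == reference for a in answers))


def maj_at_k(answers, reference):
    """Return 1.0 if the most frequent sampled answer matches the reference.

    Assumes `answers` is non-empty. Ties are broken in favor of the answer
    that appears first, which is an illustrative choice only.
    """
    top_answer, _ = Counter(answers).most_common(1)[0]
    return float(top_answer == reference)


# Hypothetical final answers extracted from k = 4 samples of one problem.
samples = ["12", "15", "12", "7"]
print(pass_at_k(samples, "12"), maj_at_k(samples, "12"))
print(pass_at_k(samples, "7"), maj_at_k(samples, "7"))
```

Benchmark-level scores are then the mean of these per-problem values, so pass@k is always at least as large as maj@k on the same samples.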