Outer-Momentum Restarting in High-Dimensional Two-Phase Optimization
Kristi Topollai, Allan Ma, Tolga Dimlioglu, Sui Jiet Tay, Anna Choromanska
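AI summary: This paper studies periodically restarting the outer momentum in distributed optimization to control outer-memory effects. Theoretical analysis, toy experiments, and language-model pretraining show that restarts widen the stable range of outer learning rates and momentum values.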
Details
Communication-efficient distributed optimizers such as DiLoCo reduce synchronization costs by letting workers perform many local updates before aggregating their progress with an outer momentum optimizer. Recent theory suggests that the outer optimizer acts on an effective spectrum induced by the inner optimization loop, and that the choice of outer momentum controls how progress from local updates is accumulated across communication rounds. We study periodic restarting of the outer momentum as a simple complementary mechanism for controlling this outer memory. In a linearized squared-loss model where prediction-space residuals evolve under the empirical NTK, we derive a mode-wise restart contraction showing that resets exploit phase cancellation by discarding stale momentum while preserving inner-loop progress. Toy experiments verify the predicted contraction behavior, and language-model pretraining experiments show that periodic restarts widen the stable range of outer learning rates and momentum values across communication periods.
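The mechanism described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single worker, a quadratic loss that decouples into independent scalar modes with curvatures `lams`, plain gradient descent as the inner optimizer, and heavy-ball momentum as the outer optimizer. The names `inner_steps`, `run_outer_loop`, and `restart_every` are hypothetical and chosen for illustration only. Under these assumptions, `steps` inner steps at learning rate `eta` shrink the residual of a mode with curvature `lam` by the factor `(1 - eta * lam) ** steps`, and the outer optimizer sees the difference between the old and new residual as its pseudo-gradient.

```python
def inner_steps(residual, lam, eta, steps):
    """Run `steps` gradient-descent steps on the scalar loss 0.5 * lam * r**2."""
    for _ in range(steps):
        residual = residual - eta * lam * residual
    return residual


def run_outer_loop(lams, alpha, mu, rounds, inner=10, eta=0.1, restart_every=None):
    """Toy two-phase loop: inner gradient descent, outer heavy-ball momentum.

    lams          -- per-mode curvatures (assumed diagonal quadratic)
    alpha, mu     -- outer learning rate and outer momentum coefficient
    restart_every -- if set, zero the outer momentum buffer every that many rounds
    Returns the residual norm after each communication round.
    """
    residual = [1.0] * len(lams)
    momentum = [0.0] * len(lams)
    history = []
    for t in range(rounds):
        # Periodic restart: discard stale outer momentum, keep the current iterate.
        if restart_every and t > 0 and t % restart_every == 0:
            momentum = [0.0] * len(lams)
        for i, lam in enumerate(lams):
            local = inner_steps(residual[i], lam, eta, inner)
            pseudo_grad = residual[i] - local  # progress made by the inner loop
            momentum[i] = mu * momentum[i] + pseudo_grad
            residual[i] -= alpha * momentum[i]
        history.append(sum(r * r for r in residual) ** 0.5)
    return history


if __name__ == "__main__":
    modes = [0.1, 1.0, 5.0]
    plain = run_outer_loop(modes, alpha=0.7, mu=0.9, rounds=40)
    restarted = run_outer_loop(modes, alpha=0.7, mu=0.9, rounds=40, restart_every=8)
    print("final residual norm, no restart:", plain[-1])
    print("final residual norm, restart every 8 rounds:", restarted[-1])
```

Two sanity checks follow directly from the update rule: with `mu=0` and `alpha=1` the outer step simply adopts the inner-loop result, and with `restart_every=1` the momentum buffer is cleared every round, so the trajectory matches the `mu=0` case. Sweeping `alpha`, `mu`, and `restart_every` in this toy is one way to explore how restarts affect stability, though it says nothing about the language-model results reported in the paper.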