Force-Aware Neural Tangent Kernels for Scalable and Robust Active Learning of MLIPs
Eszter Varga-Umbrich, Zachary Weller-Davies, Paul Duckworth, Jules Tilly, Olivier Peltre, Shikha Surana
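AI summary: This paper proposes a linearly scalable active learning framework that, combined with force-aware neural tangent kernels, improves the robustness and efficiency of active learning for MLIPs over large candidate pools, and validates its performance on multiple datasets.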
Comments: 10 main pages, 34 pages in total
Active learning for machine-learning interatomic potentials (MLIPs) must address several challenges to be practical: scaling to large candidate pools, leveraging energy-force supervision, and maintaining robustness when candidate pools are biased relative to the target distribution. In this work, we jointly address these challenges. We first introduce a linearly scaling acquisition framework based on chunked feature-space posterior-variance shortlisting. By avoiding materialisation of the candidate and training-set kernels, this approach enables screening of ~200k structures within hours and applies broadly to acquisition strategies that score candidates based on molecular similarity metrics. We then extend the Neural Tangent Kernel (NTK) to a force-aware setting via mixed parameter-coordinate derivatives, yielding a force NTK and a joint energy-force NTK that provide natural similarity metrics for vector-field prediction. We demonstrate the effectiveness of the joint energy-force NTK on the OC20 dataset, where force-aware acquisition is crucial: it achieves the lowest energy and force MAE and RMSE across all distribution splits. Across the T1x, PMechDB, and RGD benchmarks, our force NTK methods remain competitive with established baselines while being significantly more efficient than committee-based approaches. In a controlled candidate-pool shift case study on T1x, acquisition based on pretrained MLIP embeddings and NTKs remains robust, whereas committee-based methods exhibit higher variance. Overall, these results show that a single pretrained MLIP can enable scalable, force-aware, and distribution-robust active learning for foundation-model fine-tuning.
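To make the idea of chunked feature-space posterior-variance scoring concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes each structure is represented by a fixed-length feature vector (for example a pretrained MLIP embedding or a projected NTK gradient feature), that similarity is the linear kernel on those features, and that candidates are scored once rather than re-scored after each selection. The names `posterior_variance_chunked`, `shortlist`, `noise_var`, and `chunk_size` are hypothetical.

```python
import numpy as np


def posterior_variance_chunked(train_feats, cand_feats, noise_var=1e-2, chunk_size=4096):
    """Score candidates by Gaussian-process posterior variance under a linear
    kernel k(x, x') = phi(x) . phi(x'), computed entirely in feature space.

    By the Woodbury identity the variance equals
        noise_var * phi^T (Phi^T Phi + noise_var * I)^{-1} phi,
    so only a d x d matrix is formed, never an n_train x n_cand kernel.
    (Illustrative assumption: the paper's exact scoring rule may differ.)
    """
    d = train_feats.shape[1]
    # d x d regularised Gram matrix in feature space, built once.
    precision = train_feats.T @ train_feats + noise_var * np.eye(d)
    chol = np.linalg.cholesky(precision)  # precision = chol @ chol.T

    n_cand = cand_feats.shape[0]
    scores = np.empty(n_cand)
    # Stream over the candidate pool in fixed-size chunks to bound memory.
    for start in range(0, n_cand, chunk_size):
        chunk = cand_feats[start:start + chunk_size]
        # Solve chol @ z = phi for every candidate in the chunk;
        # then ||z||^2 = phi^T precision^{-1} phi.
        z = np.linalg.solve(chol, chunk.T)
        scores[start:start + chunk_size] = noise_var * np.sum(z * z, axis=0)
    return scores


def shortlist(scores, k):
    """Return indices of the k candidates with the largest posterior variance."""
    return np.argsort(-scores)[:k]
```

Under these assumptions the cost is one d x d factorisation plus work proportional to the number of candidates, which is where the linear scaling in pool size comes from. A call such as `shortlist(posterior_variance_chunked(train_feats, cand_feats), k=100)` would return the 100 highest-variance candidates for further selection.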