Abstract
Machine-learning drug-discovery pipelines increasingly rely on generative models that propose molecules far from the data used to train downstream synthesizability filters. Existing filters (SAScore, SCScore, RAscore, DeepSA) are purely statistical and degrade in exactly this out-of-distribution (OOD) regime. We ask whether cheap, closed-form physical priors, used as auxiliary supervision on a graph neural network (GNN), improve OOD generalization. We add two auxiliary losses to a GINE backbone: a topological complexity regression supervised by the Bertz index, and a strain-energy soft penalty supervised by MMFF94 force-field energy. On a 65,177-molecule corpus (HIV, Tox21, COCONUT) labeled by SAScore thresholds, we reproduce a strong in-distribution baseline, then evaluate a 4-way ablation (baseline / +complexity / +strain / +both) on a single-source OOD split (train on drug-like HIV+Tox21, test on COCONUT natural products), repeated over 5 seeds with paired bootstrap confidence intervals. All three physics-aware variants give a small but statistically significant OOD improvement over the baseline (mean OOD AUC 0.9774): +complexity ΔAUC = +0.0060 (95% CI [+0.0023, +0.0102]), +strain ΔAUC = +0.0032 ([+0.0008, +0.0052]), +both ΔAUC = +0.0066 ([+0.0038, +0.0093]); every interval excludes zero, and the combination has the largest point estimate. The variants are indistinguishable in-distribution, so the effect is visible only under OOD evaluation. We are explicit that the effects are modest, and we report a cautionary methodological finding: a single-seed version of this experiment produced a qualitatively different (non-monotone) story that did not survive multi-seed evaluation.
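The paired bootstrap comparison mentioned above can be illustrated with a minimal sketch. The abstract does not say what is resampled, so this version assumes the simplest reading: the per-seed AUC differences (variant minus baseline, paired by seed) are resampled with replacement and a percentile interval is taken over the resampled means. The function name `paired_bootstrap_ci`, its parameters, and the AUC values are hypothetical and chosen only for illustration; they are not the paper's code or results.

```python
import random
from statistics import mean


def paired_bootstrap_ci(baseline, variant, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean paired difference (variant - baseline).

    Assumption: entry i of both lists comes from the same random seed, so the
    differences are paired and are resampled as whole units.
    """
    if len(baseline) != len(variant):
        raise ValueError("paired samples must have equal length")
    diffs = [v - b for b, v in zip(baseline, variant)]
    rng = random.Random(seed)  # fixed RNG seed so the interval is reproducible
    n = len(diffs)
    # Resample the paired differences with replacement and record each mean.
    boot_means = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_resamples))
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return mean(diffs), boot_means[lo_idx], boot_means[hi_idx]


# Hypothetical per-seed OOD AUCs (illustrative values, NOT results from the paper).
baseline_auc = [0.976, 0.978, 0.977, 0.979, 0.975]
variant_auc = [0.982, 0.983, 0.981, 0.986, 0.983]

delta, ci_low, ci_high = paired_bootstrap_ci(baseline_auc, variant_auc)
print(f"delta = {delta:+.4f}, 95% CI [{ci_low:+.4f}, {ci_high:+.4f}]")
```

An interval that excludes zero under this procedure corresponds to the abstract's notion of a statistically significant improvement. With only five seeds the resampled means take few distinct values, so such an interval is coarse; resampling test molecules within each seed would be an alternative design.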