arXivDaily: daily arXiv digest, updated Monday to Friday

1. Graph Learning and Structured Data (1 paper)

2606.12651 2026-06-13 cs.LG q-bio.QM New submission

Physics-Aware Auxiliary Losses Improve Out-of-Distribution Generalization of a GNN Synthesizability Filter


Riya Bisht, Dhruv Agarwal

Affiliation: University of California, Berkeley

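AI summary: Adding two auxiliary losses to a GNN, a topological complexity regression based on the Bertz index and a soft strain-energy penalty based on MMFF94 force-field energy, yields a small but statistically significant improvement in the synthesizability filter's out-of-distribution AUC (up to +0.0066).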
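As a rough illustration of how the two auxiliary terms could be combined with the main classification loss, here is a minimal NumPy sketch. It assumes a model with three output heads (a synthesizability logit, a predicted complexity, and a predicted strain), Bertz-index and MMFF94 strain targets that are precomputed and normalized, and hypothetical weights `lambda_complexity` and `lambda_strain`. The exact form of the "soft penalty" is not given here, so the Huber (smooth-L1) loss below is an assumed stand-in, not the authors' definition.

```python
import numpy as np

def bce_with_logits(logits, labels):
    # Numerically stable binary cross-entropy computed on raw logits.
    return np.mean(np.maximum(logits, 0) - logits * labels + np.log1p(np.exp(-np.abs(logits))))

def mse(pred, target):
    # Plain regression loss, used here for the Bertz-index complexity head.
    return np.mean((pred - target) ** 2)

def huber(pred, target, delta=1.0):
    # Assumed "soft" penalty: quadratic for small errors, linear for large ones.
    err = np.abs(pred - target)
    quad = np.minimum(err, delta)
    return np.mean(0.5 * quad ** 2 + delta * (err - quad))

def total_loss(logits, labels, complexity_pred, bertz_target, strain_pred, strain_target,
               lambda_complexity=0.1, lambda_strain=0.1):
    # Hypothetical weighting: main synthesizability loss plus two weighted auxiliary terms.
    main = bce_with_logits(logits, labels)
    aux_complexity = mse(complexity_pred, bertz_target)
    aux_strain = huber(strain_pred, strain_target)
    return main + lambda_complexity * aux_complexity + lambda_strain * aux_strain
```

Setting both weights to zero gives the baseline arm of the 4-way ablation, setting only `lambda_strain` to zero gives +complexity, setting only `lambda_complexity` to zero gives +strain, and keeping both nonzero gives +both.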

Abstract

Machine-learning drug-discovery pipelines increasingly rely on generative models that propose molecules far from the data used to train downstream synthesizability filters. Existing filters (SAScore, SCScore, RAscore, DeepSA) are purely statistical and degrade in exactly this out-of-distribution (OOD) regime. We ask whether cheap, closed-form physical priors, used as auxiliary supervision on a graph neural network (GNN), improve OOD generalization. We add two auxiliary losses to a GINE backbone: a topological complexity regression supervised by the Bertz index, and a strain-energy soft penalty supervised by MMFF94 force-field energy. On a 65,177-molecule corpus (HIV, Tox21, COCONUT) labeled by SAScore thresholds we reproduce a strong in-distribution baseline, then evaluate a 4-way ablation (baseline / +complexity / +strain / +both) on a single-source OOD split (train on drug-like HIV+Tox21, test on COCONUT natural products), repeated over 5 seeds with paired bootstrap confidence intervals. All three physics-aware variants give a small but statistically significant OOD improvement over the baseline (mean OOD AUC 0.9774): +complexity Delta = +0.0060 (95% CI [+0.0023, +0.0102]), +strain Delta = +0.0032 ([+0.0008, +0.0052]), +both Delta = +0.0066 ([+0.0038, +0.0093]); every interval excludes zero, and the combination is best. The variants are indistinguishable in-distribution, so the effect is visible only under OOD evaluation. We are explicit that the effects are modest, and we report a cautionary methodological finding: a single-seed version of this experiment produced a qualitatively different (non-monotone) story that did not survive multi-seed evaluation.
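The paired bootstrap behind confidence intervals like these can be sketched as follows: both models are scored on the same resampled test molecules, so the per-resample AUC difference cancels variation the two models share. This is a minimal NumPy sketch; the abstract does not say how the 5 seeds are aggregated, so the code assumes one score vector per model (for example, scores averaged over seeds). The function names and defaults such as `n_boot=1000` are hypothetical.

```python
import numpy as np

def auc(labels, scores):
    """ROC AUC via the Mann-Whitney U statistic, with average ranks for tied scores."""
    labels = np.asarray(labels).astype(bool)
    scores = np.asarray(scores, dtype=float)
    _, inverse, counts = np.unique(scores, return_inverse=True, return_counts=True)
    ends = np.cumsum(counts)
    avg_rank = ends - (counts - 1) / 2.0  # 1-based mean rank of each tie group
    ranks = avg_rank[inverse]
    n_pos = labels.sum()
    n_neg = labels.size - n_pos
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

def paired_bootstrap_delta(labels, scores_base, scores_variant, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate and percentile CI for AUC(variant) - AUC(base)."""
    labels = np.asarray(labels).astype(bool)
    scores_base = np.asarray(scores_base, dtype=float)
    scores_variant = np.asarray(scores_variant, dtype=float)
    rng = np.random.default_rng(seed)
    n = labels.size
    point = auc(labels, scores_variant) - auc(labels, scores_base)
    deltas = []
    while len(deltas) < n_boot:
        idx = rng.integers(0, n, size=n)  # same indices for both models: the "paired" part
        y = labels[idx]
        if y.all() or not y.any():  # AUC is undefined unless both classes are present
            continue
        deltas.append(auc(y, scores_variant[idx]) - auc(y, scores_base[idx]))
    lo, hi = np.quantile(deltas, [alpha / 2, 1 - alpha / 2])
    return point, (lo, hi)
```

A variant counts as a significant improvement under this procedure when the whole interval `(lo, hi)` lies above zero, which is the criterion the abstract applies to all three physics-aware variants.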

2. Other / General Machine Learning (1 paper)

2412.13012 2026-06-13 cs.LG cond-mat.mtrl-sci cond-mat.str-el Updated version

Deep Learning Based Superconductivity: Prediction and Experimental Tests


Daniel Kaplan, Adam Zhang, Joanna Blawat, Rongying Jin, Robert J. Cava, Viktor Oudovenko, Gabriel Kotliar, Anirvan M. Sengupta, Weiwei Xie

Affiliations: Department of Physics and Astronomy; Rutgers University; Department of Chemistry; Michigan State University; University of South Carolina; Princeton University; Center for Computational Quantum Physics; Flatiron Institute; Center for Computational Mathematics

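AI summary: The paper proposes a deep-learning approach to predicting superconducting materials and tests it experimentally, discovering a new ternary compound, Mo₂₀Re₆Si₄, that is superconducting below 5.4 K; it also discusses the limitations of AI-based prediction and directions for future research.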

Comments: 14 pages + 2 appendices + references. EPJ submission

Journal ref: Eur. Phys. J. Plus (2025) 140:58

Abstract

The discovery of novel superconducting materials is a longstanding challenge in materials science, with a wealth of potential for applications in energy, transportation, and computing. Recent advances in artificial intelligence (AI) have made it possible to expedite the search for new materials by efficiently utilizing vast materials databases. In this study, we developed an approach based on deep learning (DL) to predict new superconducting materials. We have synthesized a compound derived from our DL network and confirmed its superconducting properties in agreement with our prediction. Our approach is also compared to previous work based on random forests (RFs). In particular, RFs require knowledge of the chemical properties of the compound, while our neural net inputs depend solely on the chemical composition. With the help of hints from our network, we discover a new ternary compound $\textrm{Mo}_{20} \textrm{Re}_{6} \textrm{Si}_{4}$, which becomes superconducting below 5.4 K. We further discuss the existing limitations and challenges associated with using AI for prediction, along with potential future research directions.
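The abstract contrasts random forests that need chemical-property descriptors with a network whose input depends only on chemical composition. One common composition-only encoding is a vector of element fractions; the sketch below builds one for Mo₂₀Re₆Si₄. Both the encoding and the `ELEMENTS` vocabulary are illustrative assumptions, not the authors' featurization.

```python
import re
from collections import defaultdict

# Hypothetical element vocabulary; a real model would cover far more of the periodic table.
ELEMENTS = ["Mo", "Re", "Si", "Nb"]

def parse_formula(formula):
    """Count atoms in a flat formula such as "Mo20Re6Si4" (no parentheses or hydrates)."""
    counts = defaultdict(float)
    for element, amount in re.findall(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?", formula):
        counts[element] += float(amount) if amount else 1.0  # a missing count means 1
    return dict(counts)

def composition_vector(formula, elements=ELEMENTS):
    """Element fractions in a fixed order: the only information a composition-based input carries."""
    counts = parse_formula(formula)
    unknown = set(counts) - set(elements)
    if unknown:
        raise ValueError(f"elements outside the vocabulary: {sorted(unknown)}")
    total = sum(counts.values())
    return [counts.get(el, 0.0) / total for el in elements]

vec = composition_vector("Mo20Re6Si4")
print([round(x, 4) for x in vec])  # [0.6667, 0.2, 0.1333, 0.0]
```

Because the vector is normalized to sum to one, formulas that differ only by an overall multiple (for example Mo10Re3Si2) map to the same input, which is a known property of fraction-based encodings.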