Abstract
The design of new crystalline materials, or simply crystals, with desired properties relies on the ability to estimate the properties of crystals based on their structure. To advance the ability of machine learning (ML) to enable property estimation, we address two key limitations. First, creating labeled data for training entails time-consuming laboratory experiments and physical simulations, yielding a shortage of such data. To reduce the need for labeled training data, we propose a pre-training framework that adopts a mutually exclusive mask strategy, enabling models to discern underlying patterns. Second, crystal structures obey physical principles. To exploit the principle of periodic invariance, we propose multi-graph attention (MGA) and crystal knowledge-enhanced (CKE) modules. The MGA module considers different types of multi-graph edges to capture complex structural patterns. The CKE module incorporates periodic attribute learning and atom-type contrastive learning, explicitly introducing crystal knowledge to enhance crystal representation learning. We integrate these modules in a CRystal knOwledge-enhanced Pre-training (CROP) framework. Experiments on eight different datasets show that CROP achieves promising estimation performance and can outperform strong baselines.
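The principle of periodic invariance named in the abstract can be illustrated with a minimal sketch. This is not the CROP implementation, and the abstract does not describe how the MGA or CKE modules are built. The sketch only shows the underlying physical fact: moving any atom by an integer combination of lattice vectors leaves the crystal unchanged, so a structure descriptor should not change either. The function names, the toy cubic cell, and the distance-based fingerprint are all hypothetical choices made for illustration.

```python
import itertools
import math


def min_image_distance(frac_a, frac_b, lattice):
    """Shortest distance between two atoms over nearby periodic images.

    frac_a, frac_b: fractional coordinates (three floats each).
    lattice: three lattice vectors as rows of a 3x3 nested list.
    """
    # Subtracting the rounded difference cancels any integer lattice shift.
    delta = [(b - a) - round(b - a) for a, b in zip(frac_a, frac_b)]
    best = math.inf
    # Scan the 27 nearest images; sufficient for cells that are not strongly skewed.
    for shift in itertools.product((-1, 0, 1), repeat=3):
        frac = [d + s for d, s in zip(delta, shift)]
        cart = [sum(frac[i] * lattice[i][k] for i in range(3)) for k in range(3)]
        best = min(best, math.sqrt(sum(c * c for c in cart)))
    return best


def distance_fingerprint(frac_coords, lattice):
    """Sorted pairwise minimum-image distances, rounded so they can be compared."""
    pairs = itertools.combinations(frac_coords, 2)
    return sorted(round(min_image_distance(a, b, lattice), 6) for a, b in pairs)


# Hypothetical toy structure: cubic cell with a 4.0 edge and three atoms.
lattice = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
atoms = [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5], [0.25, 0.0, 0.9]]

# Move each atom by a different integer number of lattice vectors.
shifts = [(1, -2, 0), (0, 3, -1), (-2, 0, 5)]
shifted = [[c + s for c, s in zip(atom, shift)] for atom, shift in zip(atoms, shifts)]

# The fingerprint is the same for both descriptions of the crystal.
same = distance_fingerprint(atoms, lattice) == distance_fingerprint(shifted, lattice)
```

A descriptor built from raw Cartesian coordinates would differ between `atoms` and `shifted`, even though both describe the same crystal. That mismatch is what a periodic-invariant representation is meant to avoid.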
Original language | English |
---|---|
Title of host publication | Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track - European Conference, ECML PKDD 2024, Proceedings |
Editors | Albert Bifet, Tomas Krilavičius, Ioanna Miliou, Slawomir Nowaczyk |
Number of pages | 16 |
Place of Publication | Berlin, Heidelberg |
Publisher | Springer |
Publication date | 2024 |
Pages | 231-246 |
ISBN (Print) | 9783031703805 |
ISBN (Electronic) | 978-3-031-70380-5 |
Publication status | Published - 2024 |
Event | European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases - Vilnius, Lithuania. Duration: 9 Sept 2024 → 13 Sept 2024. https://ecmlpkdd.org/2024/ |
Conference
Conference | European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases |
---|---|
Country/Territory | Lithuania |
City | Vilnius |
Period | 09/09/2024 → 13/09/2024 |
Internet address | https://ecmlpkdd.org/2024/ |
Series | Joint European Conference on Machine Learning and Knowledge Discovery in Databases |
Keywords
- Crystal property
- Knowledge-enhanced
- Pre-training