In our work on a NASICON stability predictor, we used SIS+MLR (sure independence screening followed by multiple linear regression) to identify the best feature for predicting NASICON stabilities from a set of millions of candidate features. This repo hosts the data and code needed to compute this best feature and to reproduce the machine-learning parts of our work.
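SIS+MLR can be pictured as two stages. First, screening keeps the candidate features most correlated with the target. Second, a linear regression is fitted on every pair of surviving features, and the best pair is kept. The following is a minimal NumPy sketch of that idea, not the code used in this work. The function names `sis_screen` and `best_2d_mlr` are hypothetical, and the candidate feature matrix is assumed to be precomputed. Screening by absolute Pearson correlation and ranking pairs by residual sum of squares are also our assumptions.

```python
import itertools

import numpy as np


def sis_screen(X, y, k):
    """Keep the k columns of X with the largest |Pearson correlation| with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Small epsilon guards against division by zero for constant columns.
    corr = (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    )
    return np.argsort(-np.abs(corr))[:k]


def best_2d_mlr(X, y, candidates):
    """Fit y ~ c0 + c1*x_i + c2*x_j for every pair of candidate columns and
    return the pair with the lowest residual sum of squares (RSS)."""
    best_pair, best_rss = None, np.inf
    for i, j in itertools.combinations(candidates, 2):
        A = np.column_stack([np.ones(len(y)), X[:, i], X[:, j]])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        rss = float(((A @ coef - y) ** 2).sum())
        if rss < best_rss:
            best_pair, best_rss = (int(i), int(j)), rss
    return best_pair, best_rss


# Toy data: 50 candidate features, target built from columns 3 and 17.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = 2.0 * X[:, 3] - 1.5 * X[:, 17] + 0.01 * rng.normal(size=200)
candidates = sis_screen(X, y, k=10)
pair, rss = best_2d_mlr(X, y, candidates)
print(sorted(pair), rss)
```

On this toy data, the planted pair (3, 17) should come out on top. Scanning all pairs of k screened features takes C(k, 2) fits. For k = 2,000, that is 2,000 × 1,999 / 2 = 1,999,000, which equals the number of SIS+MLR models mentioned for model selection in ./RawData. A 2,000-feature screened subset is our inference, however, and is not stated in this README.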
To display the stabilities of NASICON compounds, we created an interactive compatibility map in Stability.html (Fig. 2a).
The raw data (i.e., Ehull values) behind this interactive map can be found in ./RawData/test_2D.csv and ./RawData/train_2D.csv.
All raw data and processed datasets for machine learning are stored in the ./RawData folder.
KeyFeatureNames.txt: names of the basic features used in the SIS procedure (see details in the SI; a link to the SI on the publisher's website will be added when available).
exp_comps.json: compositions of the experimentally synthesized materials in Fig. 5b.
train.csv / test.csv: values of Ehull and all basic features for the 80/20 train/test split used for model selection. The optimal 2D SIS features are selected from 1,999,000 SIS+MLR models based on these data.
train_2D.csv / test_2D.csv: values of Ehull and the optimal 2D SIS features for the train/test data in train.csv / test.csv.
train_X_fold[1-5].dat / test_X_fold[1-5].dat: values of the 2D SIS features of the train/test data for the five-fold cross-validation used to evaluate the final model.
train_Y_fold[1-5].dat / test_Y_fold[1-5].dat: 0/1 encodings of synthesizability of the train/test data for the five-fold cross-validation used to evaluate the final model.
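To illustrate how the fold files fit together, the sketch below loads each train/test pair and scores a linear classifier on the held-out fold. It assumes the `.dat` files are whitespace-delimited numeric text with no header, one row per compound, readable by `numpy.loadtxt`. It uses a plain gradient-descent logistic regression as a stand-in linear classifier. `load_fold`, `fit_logistic` and `cross_validate` are hypothetical helpers, not the contents of Run_five_fold_CV.ipynb.

```python
from pathlib import Path

import numpy as np


def load_fold(data_dir, k):
    """Load fold k (1-based). Assumes whitespace-delimited numeric text files
    named like train_X_fold1.dat, one row per compound."""
    d = Path(data_dir)
    X_tr = np.loadtxt(d / f"train_X_fold{k}.dat", ndmin=2)
    X_te = np.loadtxt(d / f"test_X_fold{k}.dat", ndmin=2)
    # Labels are assumed to be one 0/1 value per line.
    y_tr = np.loadtxt(d / f"train_Y_fold{k}.dat").astype(int).ravel()
    y_te = np.loadtxt(d / f"test_Y_fold{k}.dat").astype(int).ravel()
    return X_tr, y_tr, X_te, y_te


def fit_logistic(X, y, lr=0.1, steps=2000):
    """Stand-in linear classifier: logistic regression by gradient descent
    on standardized features. Returns a predict function giving 0/1 labels."""
    mu, sd = X.mean(axis=0), X.std(axis=0) + 1e-12
    Z = (X - mu) / sd
    w, b = np.zeros(Z.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))  # predicted P(label = 1)
        w -= lr * Z.T @ (p - y) / len(y)
        b -= lr * float((p - y).mean())
    return lambda X_new: ((((X_new - mu) / sd) @ w + b) > 0).astype(int)


def cross_validate(data_dir, n_folds=5):
    """Return the held-out accuracy of the linear classifier on each fold."""
    accuracies = []
    for k in range(1, n_folds + 1):
        X_tr, y_tr, X_te, y_te = load_fold(data_dir, k)
        predict = fit_logistic(X_tr, y_tr)
        accuracies.append(float((predict(X_te) == y_te).mean()))
    return accuracies
```

If the file format assumption holds, `cross_validate("./RawData")` returns five per-fold accuracies. The notebook may use a different linear classifier and metric.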
The scripts for reproducing the machine-learning model, metrics and all figures in our manuscript can be found in the ./Script folder.
Before running the Jupyter notebooks, make sure all dependencies are installed:
pip install -r requirements.txt
Run_preprocess_feature_transformation.ipynb: preprocesses the data by transforming basic features into 2D SIS features.
Run_Ranked_SVM.ipynb: trains a ranked SVM model to predict the ranking of synthesizability (Ehull values).
Run_five_fold_CV.ipynb: runs five-fold cross-validation to evaluate how well a linear decision boundary separates synthetically accessible from non-accessible NASICON compositions.
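A ranked SVM is commonly trained by reducing ranking to binary classification on pairwise feature differences. A minimal NumPy sketch of that reduction follows. It assumes a linear model trained by subgradient descent on the hinge loss, and it is not the notebook's implementation. `pairwise_transform` and `fit_rank_svm` are hypothetical names, and the sketch is oriented so that a higher score means a lower (more stable) Ehull.

```python
import itertools

import numpy as np


def pairwise_transform(X, ehull):
    """Build difference vectors x_i - x_j for every pair with distinct Ehull.
    The pair label is +1 if compound i is more stable (lower Ehull), else -1."""
    diffs, signs = [], []
    for i, j in itertools.combinations(range(len(ehull)), 2):
        if ehull[i] == ehull[j]:
            continue  # ties carry no ranking information
        diffs.append(X[i] - X[j])
        signs.append(1.0 if ehull[i] < ehull[j] else -1.0)
    return np.array(diffs), np.array(signs)


def fit_rank_svm(X, ehull, C=1.0, lr=0.01, epochs=500):
    """Linear ranking SVM: full-batch subgradient descent on
    0.5 * ||w||^2 + C * mean(max(0, 1 - s * (w . d))) over all pairs."""
    D, s = pairwise_transform(X, ehull)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        active = s * (D @ w) < 1  # pairs inside the margin contribute to the hinge
        grad = w - C * (s[active, None] * D[active]).sum(axis=0) / len(s)
        w -= lr * grad
    return w


# Toy data: Ehull is a hidden linear function of two descriptors.
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 2))
ehull = X @ np.array([1.0, -2.0])
w = fit_rank_svm(X, ehull)
order = np.argsort(-(X @ w))  # indices from predicted most to least stable
print(order[:5])
```

Sorting compounds by `X @ w` in descending order gives the predicted stability ranking. The values of `C`, the learning rate and the epoch count are illustrative, not settings from this work.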
All high-resolution figures related to the machine-learning model in our manuscript can be found in the Figures folder.
If you find this repo useful in your own projects, please consider citing our paper:
@article{ouyang2021synthetic,
title={Synthetic accessibility and stability rules of NASICONs},
author={Ouyang, Bin and Wang, Jingyang and He, Tanjin and Bartel, Christopher J and Huo, Haoyan and Wang, Yan and Lacivita, Valentina and Kim, Haegyeom and Ceder, Gerbrand},
journal={arXiv preprint arXiv:2102.03627},
year={2021}
}