Phase-prediction-of-HEAs

Accompanying dataset and code for the article "Phase prediction and experimental realisation of a new high entropy alloy using machine learning".

Authors: Swati Singh, Nirmal Kumar Katiyar, Saurav Goel and Shrikrishna N. Joshi

Affiliations: (1) Department of Mechanical Engineering, Indian Institute of Technology Guwahati, Guwahati, 781039, India. (2) School of Engineering, London South Bank University, 103 Borough Road, London, SE1 0AA, UK

Content: file "Phase_data.excel" is the database file. It contains 1200 HEAs, each labelled with one of four phases: FCC (face-centered cubic) solid solution, BCC (body-centered cubic) solid solution, FCC+BCC solid solution, or MIP (mixture of intermetallic phases). The data is compiled from:

https://www.sciencedirect.com/science/article/pii/S1359645416306759

https://www.sciencedirect.com/science/article/pii/S2352340921006302

https://zenodo.org/record/5155150#.Y2n9aHZBw2w

Model parameters. Refer to the original file for details.

"y" - (Independent feature) Column name in database that will be used as target for phase prediction (keep the column 'Phase' as phases of HEAs are being predicted). Multiclass classification of four different phases (FCC, BCC, FCC+BCC, and MIP) is performed.

"X" - (Dependent feature) Column names in database that will be included as dependent variables in the model (these can be changed based on the elements present in the study), feature added here must be present as a column in "Phase_data.excel" file.

Five well-known machine learning algorithms, namely "K-nearest neighbour (KNN)", "Support vector machine (SVM)", "Decision Tree Classifier (DTC)", "Random Forest Classifier (RFC)", and "XGBoost (XGB)", were employed in their vanilla form (base models) for multiclass classification of the different phases of HEAs.
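One way to set up such base models is sketched below, assuming scikit-learn and, optionally, the xgboost package. The helper name `make_base_models`, the dictionary keys and the fixed `random_state` are illustrative choices, not taken from the repository; all other hyperparameters are left at the library defaults.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def make_base_models(random_state=0):
    """Return the classifiers with library-default hyperparameters.

    random_state is fixed only to make runs repeatable (an assumption).
    """
    models = {
        "KNN": KNeighborsClassifier(),
        "SVM": SVC(),
        "DTC": DecisionTreeClassifier(random_state=random_state),
        "RFC": RandomForestClassifier(random_state=random_state),
    }
    try:
        # xgboost is a separate package; skip it if it is not installed.
        # Recent versions expect integer-encoded class labels.
        from xgboost import XGBClassifier

        models["XGB"] = XGBClassifier(random_state=random_state)
    except ImportError:
        pass
    return models


models = make_base_models()
print(sorted(models))
```

Keeping the models in a dictionary makes it easy to loop over them and apply the same evaluation to each.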

"accuracy" - Training and test accuracy were evaluated.

"roc_auc_score" - roc (Receiver operating characteristic) curve was plotted to analyse the model's performance for each phase. auc (Area under the curve) measures the area under roc curve, and summarizes classifier’s performance.

"classification_report" - It provides precision, recall, f1-score, accuracy, macro average and weighted average accuracy. Evaluated and calculated for each algorithm.

"plot_confusion_matrix" - It provides 4*4 matrix of correctly classified vs incorrectly classified phases for classification of four distinct phases.

"cross_val_score" - 10-fold cross-validation mean and standard deviation score were calculated to avoid overfitting.