Apply different machine learning classification methods and select the best model.
python3 Classification.py training_dataset.tsv test_dataset.tsv
The 1st, 2nd, 3rd, 4th, and 6th columns are normalized (scaled to the range [0, 1]) because their values are considerably larger than those of the other columns. Without normalization, these columns would have a greater impact on the predictions than the others. Because the data is class-imbalanced, random oversampling is applied for the methods that benefit from it. Note: GridSearchCV uses stratified cross-validation for classifiers when `cv` is left at its default or given as an integer, so the folds do not need to be stratified manually.
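The two preprocessing steps above can be sketched with the standard library alone. This is only an illustration of what min-max scaling and random oversampling do: the helper names (`fit_min_max`, `apply_min_max`, `random_oversample`) and the toy data are hypothetical and are not taken from Classification.py, which may rely on library routines instead.

```python
import random
from collections import Counter

# Zero-based indices of the 1st, 2nd, 3rd, 4th, and 6th columns.
SCALED_COLUMNS = [0, 1, 2, 3, 5]


def fit_min_max(rows, columns):
    """Return {column: (min, max)} computed from the training rows."""
    return {c: (min(r[c] for r in rows), max(r[c] for r in rows)) for c in columns}


def apply_min_max(rows, bounds):
    """Scale the chosen columns to [0, 1] using previously fitted bounds."""
    scaled = []
    for row in rows:
        new_row = list(row)
        for c, (lo, hi) in bounds.items():
            span = hi - lo
            # A constant column has no spread; map it to 0.0.
            new_row[c] = (row[c] - lo) / span if span else 0.0
        scaled.append(new_row)
    return scaled


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data (hypothetical): six feature columns, imbalanced labels.
X_train = [
    [1200.0, 300.0, 50.0, 9000.0, 0.2, 700.0],
    [1500.0, 100.0, 80.0, 4000.0, 0.7, 100.0],
    [900.0, 500.0, 20.0, 6500.0, 0.5, 400.0],
    [1100.0, 200.0, 65.0, 5000.0, 0.1, 250.0],
]
y_train = [0, 0, 0, 1]

bounds = fit_min_max(X_train, SCALED_COLUMNS)
X_scaled = apply_min_max(X_train, bounds)
X_balanced, y_balanced = random_oversample(X_scaled, y_train)
```

In this sketch the bounds are fitted on the training rows only, so the same `bounds` can be reused for the test set. If oversampling is done before cross-validation, duplicated rows can land in both the training and validation folds, so resampling inside each training fold is usually the safer choice.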
Multi-Layer Perceptron, Support Vector Machine, and Random Forest models, which differ substantially from one another, were considered initially. Because Random Forest performed much better than the other two, Gradient Boosting, another tree-based ensemble method, was also evaluated.
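A comparison of this kind could look roughly like the following scikit-learn sketch. The synthetic dataset, the parameter grids, the `balanced_accuracy` scoring choice, and the variable names are all assumptions made for illustration; they are not the settings used in Classification.py.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in for the training data (assumption).
X, y = make_classification(
    n_samples=200,
    n_features=6,
    n_informative=4,
    n_redundant=0,
    weights=[0.8, 0.2],
    random_state=0,
)

# Hypothetical, deliberately small grids so the sketch runs quickly.
candidates = {
    "mlp": (MLPClassifier(max_iter=500, random_state=0),
            {"hidden_layer_sizes": [(16,), (32,)]}),
    "svm": (SVC(), {"C": [0.1, 1.0, 10.0]}),
    "random_forest": (RandomForestClassifier(random_state=0),
                      {"n_estimators": [50, 100]}),
    "gradient_boosting": (GradientBoostingClassifier(random_state=0),
                          {"n_estimators": [50, 100]}),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # With a classifier and an integer cv, GridSearchCV uses stratified folds.
    search = GridSearchCV(estimator, grid, cv=3, scoring="balanced_accuracy")
    search.fit(X, y)
    results[name] = (search.best_score_, search.best_params_)

# Pick the model with the highest cross-validated score.
best_name = max(results, key=lambda n: results[n][0])
print(best_name, results[best_name])
```

The scores from such a run depend on the data and grids, so the ranking on synthetic data says nothing about which model wins on the real dataset.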