Welcome! Shared using https://gitfront.io/, this is a private school project that isn't publicly listed on my GitHub profile. This is a recommended measure taken to prevent plagiarism, while still enabling me to share outside of my university -- hello, resume viewer! :)
Meet my wildfire classifier, which uses hourly weather statistics to predict whether a wildfire is likely to form under given conditions.
Regional focus was on North America.
Hourly weather statistics collected from Weather Underground.
Historical wildfires sourced from:
Data partitioned into an 80-20 train-test split.
Data features standardized using scikit-learn's StandardScaler.
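For the curious, here is a minimal sketch of the split-then-scale step. The data is synthetic, the three feature columns (temperature, humidity, wind speed) are hypothetical stand-ins, and the stratified split and fixed random seed are my assumptions for illustration rather than a record of the project's exact settings.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the hourly weather features (hypothetical columns:
# temperature, humidity, wind speed) and binary wildfire labels.
rng = np.random.default_rng(0)
X = rng.normal(loc=[25.0, 40.0, 10.0], scale=[8.0, 15.0, 5.0], size=(200, 3))
y = rng.integers(0, 2, size=200)

# 80-20 train-test split. Stratifying on y (an assumption here) keeps the
# class ratio similar in both partitions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Fit the scaler on the training partition only, then apply the same
# transform to the test partition so no test statistics leak into training.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

Fitting the scaler on the training partition alone is the usual way to keep the test set unseen; the scaled training features end up with zero mean and unit variance per column, while the test features are only approximately so.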
Relatively small dataset meant preventative measures were needed to address overfitting.
Resampling procedures: Bootstrapping, 5-fold cross-validation.
1000 bootstrap samples were used to train and evaluate Logistic Regression & K-Nearest Neighbours (KNN) models.
10 bootstrap samples were used to train and evaluate the Support Vector Machine (SVM) model.
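The resampling idea above can be sketched roughly as follows. This is an illustration on synthetic data, not the project's code: scoring each model on its out-of-bag rows is one common way to evaluate a bootstrap sample and is an assumption here, as are the variable names and the reduced count of 20 resamples (chosen only to keep the demo fast).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_score
from sklearn.utils import resample

# Synthetic binary classification data standing in for the weather features.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=300) > 0).astype(int)

n_bootstraps = 20  # demo value only; far fewer than the counts quoted above
indices = np.arange(len(X))
scores = []
for i in range(n_bootstraps):
    # Draw row indices with replacement to form one bootstrap sample.
    boot_idx = resample(indices, replace=True, n_samples=len(indices), random_state=i)
    # Rows never drawn are "out of bag" and serve as the evaluation set.
    oob_idx = np.setdiff1d(indices, boot_idx)
    model = LogisticRegression(max_iter=1000)
    model.fit(X[boot_idx], y[boot_idx])
    scores.append(f1_score(y[oob_idx], model.predict(X[oob_idx])))

bootstrap_f1 = float(np.mean(scores))

# 5-fold cross-validation on the same data, scored with F1.
cv_scores = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=5, scoring="f1"
)
```

Averaging over many resamples gives a steadier estimate than a single split, which matters most when the dataset is small.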
One-hot encoding was used for categorical variables (e.g. location, season).
Binary label encoding was used for the output (0 for "no", 1 for "yes").
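As a small illustration of both encodings, here is a sketch using pandas. The rows, column names, and category values are made up for the example, and `pd.get_dummies` is just one convenient way to one-hot encode; it is not necessarily the call used in the project.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from the project's dataset.
df = pd.DataFrame({
    "temperature_c": [31.0, 18.5, 27.2, 9.0],
    "location": ["California", "Alberta", "California", "Oregon"],
    "season": ["summer", "spring", "fall", "winter"],
    "wildfire": ["yes", "no", "yes", "no"],
})

# One-hot encode the categorical inputs: each category becomes its own 0/1 column.
features = pd.get_dummies(
    df.drop(columns="wildfire"), columns=["location", "season"], dtype=int
)

# Map the output labels to 0 ("no") and 1 ("yes").
target = df["wildfire"].map({"no": 0, "yes": 1})
```

One-hot columns avoid implying an order between categories such as seasons, which a single integer code would do.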
Success was evaluated by computing F1 scores from confusion matrices for each model:
83% F1 score for Logistic Regression
77% F1 score for Support Vector Machine (SVM)
74% F1 score for unscaled K-Nearest Neighbours (KNN)
72% F1 score for scaled K-Nearest Neighbours (KNN)
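For readers unfamiliar with the metric, this is a minimal sketch of how an F1 score falls out of confusion-matrix counts. The labels and predictions below are invented for the example and do not reproduce any of the scores listed above; the function name is hypothetical.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Compute F1 from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Invented labels (1 = wildfire, 0 = no wildfire) and model predictions.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]

# Tally the confusion-matrix cells by comparing each pair.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

f1 = f1_from_counts(tp, fp, fn)
print(f"TP={tp} FP={fp} FN={fn} TN={tn} F1={f1:.2f}")
```

F1 ignores true negatives, which makes it a reasonable choice when "no wildfire" hours heavily outnumber "wildfire" hours.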