# The FireFighter: A Wildfire Classification Model

_Welcome! Shared using https://gitfront.io/, this is a private school project that isn't publicly listed on my GitHub profile. This is a recommended measure taken to prevent plagiarism, while still enabling me to share outside of my university -- hello, resume viewer! :)_

Meet my wildfire classifier: it uses hourly weather stats to predict whether a wildfire is likely to form under given conditions.

# Table of Contents

1. [Data Sourcing & Preprocessing](#data-sourcing--preprocessing)
2. [Parameter Tuning](#parameter-tuning)
3. [Results](#results)

## Data Sourcing & Preprocessing

The regional focus was North America. Hourly weather statistics were collected from [Weather Underground](https://www.wunderground.com/). Historical wildfires were sourced from:

1. [Canadian National Fire Database](https://cwfis.cfs.nrcan.gc.ca/ha/nfdb)
2. [Wikipedia's "List of wildfires" (North America)](https://en.wikipedia.org/wiki/List_of_wildfires#North_America)

The data was partitioned into an 80-20 train-test split, and features were standardized using [scikit-learn's StandardScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html) (see the preprocessing sketch at the end of this README).

## Parameter Tuning

The relatively small dataset meant preventive measures were needed to address overfitting.

__Resampling procedures:__ bootstrapping and 5-fold cross-validation (see the resampling sketch at the end of this README).

- 1000 bootstrap samples were used to train and evaluate the Logistic Regression & K-Nearest Neighbours (KNN) models.
- 10 bootstrap samples were used to train and evaluate the SVM (Support Vector Machine) model, bounded by computational constraints.

One-hot encoding was used for categorical variables (e.g. location, season), and the binary output was encoded as 0 ("no") and 1 ("yes").

## Results

Success was evaluated by computing F1 scores from confusion matrices for each model (see the evaluation sketch at the end of this README):

- 83% F1 score for Logistic Regression
  - P = 0.5 (default)
  - Precision: 86%
  - Recall: 81%
  - Accuracy: 84%
- 77% F1 score for Support Vector Machine (SVM)
  - RBF kernel
  - λ = 5 (regularization)
  - γ = 0.001
  - Precision: 84%
  - Recall: 71%
  - Accuracy: 79%
- 74% F1 score for __unscaled__ K-Nearest Neighbours (KNN)
  - k = 1 (optimal MSE)
  - Precision: 81%
  - Recall: 68%
  - Accuracy: 75%
- 72% F1 score for __scaled__ K-Nearest Neighbours (KNN)
  - k = 1 (optimal MSE)
  - Precision: 70%
  - Recall: 74%
  - Accuracy: 70%
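
The sketches below illustrate the pipeline described above. They are minimal, hedged examples rather than the project's actual code: the file name, column names, stratified split, and random seed in this first sketch (the 80-20 split and StandardScaler standardization) are placeholders and assumptions.

```python
# Minimal sketch: 80-20 train-test split and feature standardization.
# File and column names are placeholders, not the project's actual schema;
# the stratified split and fixed seed are added assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("hourly_weather_with_labels.csv")   # hypothetical merged dataset

numeric_features = ["temperature", "humidity", "wind_speed", "pressure"]  # placeholders
X = df[numeric_features]
y = df["wildfire"]                                    # 1 = wildfire, 0 = no wildfire

# 80-20 train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Standardize features: fit on the training set only, then transform both splits
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

Fitting the scaler on the training split only keeps test-set statistics out of the standardization, which matters for the scaled vs. unscaled KNN comparison reported in Results.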
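
A minimal sketch of the encoding described in Parameter Tuning: one-hot encoding for categorical inputs and a 0/1 encoding of the output. The example values are illustrative only.

```python
# Minimal sketch: one-hot encode categorical features, encode the label as 0/1.
# The rows below are illustrative, not real project data.
import pandas as pd

raw = pd.DataFrame({
    "location": ["British Columbia", "Alberta", "California"],
    "season":   ["summer", "spring", "summer"],
    "wildfire": ["yes", "no", "yes"],
})

# One-hot encode categorical inputs such as location and season
X_categorical = pd.get_dummies(raw[["location", "season"]])

# Encode the output as 0 ("no") and 1 ("yes")
y = raw["wildfire"].map({"no": 0, "yes": 1})
```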
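
A minimal sketch of the resampling procedures, assuming the `X_train_scaled`, `X_test_scaled`, `y_train`, and `y_test` variables from the preprocessing sketch above. The bootstrap helper and its seed are assumptions about how the resampling could be wired up, not the project's actual code.

```python
# Minimal sketch: bootstrap evaluation plus 5-fold cross-validation.
# Assumes X_train_scaled / X_test_scaled / y_train / y_test from the sketch above.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_score
from sklearn.utils import resample

def bootstrap_f1(model, X_train, y_train, X_test, y_test, n_boot=1000, seed=0):
    """Refit the model on each bootstrap resample and score it on the test set."""
    rng = np.random.RandomState(seed)
    scores = []
    for _ in range(n_boot):
        X_bs, y_bs = resample(X_train, y_train, random_state=rng)  # sample with replacement
        model.fit(X_bs, y_bs)
        scores.append(f1_score(y_test, model.predict(X_test)))
    return np.mean(scores), np.std(scores)

# 1000 resamples for Logistic Regression and KNN; only 10 were feasible for the SVM
mean_f1, std_f1 = bootstrap_f1(
    LogisticRegression(), X_train_scaled, y_train, X_test_scaled, y_test, n_boot=1000
)

# 5-fold cross-validation on the training split, e.g. for hyperparameter tuning
cv_f1 = cross_val_score(LogisticRegression(), X_train_scaled, y_train, cv=5, scoring="f1")
```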
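
A minimal sketch of the evaluation in Results: each metric is derived from the model's confusion matrix on the test split. The model configurations are assumptions based on the reported hyperparameters; in particular, mapping the report's λ = 5 onto scikit-learn's `C` parameter as `C = 1/λ` is an assumption, not the project's actual setting.

```python
# Minimal sketch: F1, precision, recall, and accuracy from a confusion matrix.
# Assumes the scaled splits from the preprocessing sketch above.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

models = {
    "Logistic Regression": LogisticRegression(),            # default 0.5 threshold
    "SVM (RBF)": SVC(kernel="rbf", C=1 / 5, gamma=0.001),   # C = 1/λ is an assumption
    "KNN (scaled), k=1": KNeighborsClassifier(n_neighbors=1),
}

for name, model in models.items():
    model.fit(X_train_scaled, y_train)
    tn, fp, fn, tp = confusion_matrix(y_test, model.predict(X_test_scaled)).ravel()
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    print(f"{name}: F1={f1:.2f}, precision={precision:.2f}, "
          f"recall={recall:.2f}, accuracy={accuracy:.2f}")
```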