# The FireFighter: A Wildfire Classification Model

_Welcome! Shared using https://gitfront.io/, this is a private school project that isn't publicly listed on my GitHub profile. This is a recommended measure taken to prevent plagiarism, while still enabling me to share outside of my university -- hello, resume viewer! :)_

Meet my wildfire classifier: it uses hourly weather stats to predict whether a wildfire is likely to form under given conditions.

# Table of Contents

1. [Data Sourcing & Preprocessing](#data-sourcing--preprocessing)
2. [Parameter Tuning](#parameter-tuning)
3. [Results](#results)

## Data Sourcing & Preprocessing

The regional focus was North America. Hourly weather statistics were collected from [Weather Underground](https://www.wunderground.com/). Historical wildfires were sourced from:

1. [Canadian National Fire Database](https://cwfis.cfs.nrcan.gc.ca/ha/nfdb)
2. [Wikipedia's "List of wildfires" (North America)](https://en.wikipedia.org/wiki/List_of_wildfires#North_America)

The data was partitioned into an 80-20 train-test split, and features were standardized using [scikit-learn's StandardScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html) (see the preprocessing sketch at the end of this README).

## Parameter Tuning

The relatively small dataset meant preventive measures were needed to address overfitting.

__Resampling procedures:__ bootstrapping and 5-fold cross-validation (see the resampling sketch at the end of this README).

- 1000 bootstrap samples were used to train and evaluate the Logistic Regression & K-Nearest Neighbours (KNN) models.
- 10 bootstrap samples were used to train and evaluate the SVM (Support Vector Machine) model, bounded by computational constraints.

One-hot encoding was used for categorical variables (e.g. location, season), and the binary output was encoded as 0 ("no") and 1 ("yes").

## Results

Success was evaluated by computing F1 scores from confusion matrices for each model (see the evaluation sketch at the end of this README):

- 83% F1 score for Logistic Regression
  - P = 0.5 (default)
  - Precision: 86%
  - Recall: 81%
  - Accuracy: 84%
- 77% F1 score for Support Vector Machine (SVM)
  - RBF kernel
  - λ = 5 (regularization)
  - γ = 0.001
  - Precision: 84%
  - Recall: 71%
  - Accuracy: 79%
- 74% F1 score for __unscaled__ K-Nearest Neighbours (KNN)
  - k = 1 (optimal MSE)
  - Precision: 81%
  - Recall: 68%
  - Accuracy: 75%
- 72% F1 score for __scaled__ K-Nearest Neighbours (KNN)
  - k = 1 (optimal MSE)
  - Precision: 70%
  - Recall: 74%
  - Accuracy: 70%
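
The sketches below illustrate the pipeline described above. They are minimal, hedged examples rather than the project's actual code: the file name, column names, stratified split, and random seed in this first sketch (the 80-20 split and StandardScaler standardization) are placeholders and assumptions.

```python
# Minimal sketch: 80-20 train-test split and feature standardization.
# File and column names are placeholders, not the project's actual schema;
# the stratified split and fixed seed are added assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("hourly_weather_with_labels.csv")   # hypothetical merged dataset

numeric_features = ["temperature", "humidity", "wind_speed", "pressure"]  # placeholders
X = df[numeric_features]
y = df["wildfire"]                                    # 1 = wildfire, 0 = no wildfire

# 80-20 train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Standardize features: fit on the training set only, then transform both splits
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

Fitting the scaler on the training split only keeps test-set statistics out of the standardization, which matters for the scaled vs. unscaled KNN comparison reported in Results.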
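
A minimal sketch of the encoding described in Parameter Tuning: one-hot encoding for categorical inputs and a 0/1 encoding of the output. The example values are illustrative only.

```python
# Minimal sketch: one-hot encode categorical features, encode the label as 0/1.
# The rows below are illustrative, not real project data.
import pandas as pd

raw = pd.DataFrame({
    "location": ["British Columbia", "Alberta", "California"],
    "season":   ["summer", "spring", "summer"],
    "wildfire": ["yes", "no", "yes"],
})

# One-hot encode categorical inputs such as location and season
X_categorical = pd.get_dummies(raw[["location", "season"]])

# Encode the output as 0 ("no") and 1 ("yes")
y = raw["wildfire"].map({"no": 0, "yes": 1})
```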
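
A minimal sketch of the resampling procedures, assuming the `X_train_scaled`, `X_test_scaled`, `y_train`, and `y_test` variables from the preprocessing sketch above. The bootstrap helper and its seed are assumptions about how the resampling could be wired up, not the project's actual code.

```python
# Minimal sketch: bootstrap evaluation plus 5-fold cross-validation.
# Assumes X_train_scaled / X_test_scaled / y_train / y_test from the sketch above.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_score
from sklearn.utils import resample

def bootstrap_f1(model, X_train, y_train, X_test, y_test, n_boot=1000, seed=0):
    """Refit the model on each bootstrap resample and score it on the test set."""
    rng = np.random.RandomState(seed)
    scores = []
    for _ in range(n_boot):
        X_bs, y_bs = resample(X_train, y_train, random_state=rng)  # sample with replacement
        model.fit(X_bs, y_bs)
        scores.append(f1_score(y_test, model.predict(X_test)))
    return np.mean(scores), np.std(scores)

# 1000 resamples for Logistic Regression and KNN; only 10 were feasible for the SVM
mean_f1, std_f1 = bootstrap_f1(
    LogisticRegression(), X_train_scaled, y_train, X_test_scaled, y_test, n_boot=1000
)

# 5-fold cross-validation on the training split, e.g. for hyperparameter tuning
cv_f1 = cross_val_score(LogisticRegression(), X_train_scaled, y_train, cv=5, scoring="f1")
```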
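
A minimal sketch of the evaluation in Results: each metric is derived from the model's confusion matrix on the test split. The model configurations are assumptions based on the reported hyperparameters; in particular, mapping the report's λ = 5 onto scikit-learn's `C` parameter as `C = 1/λ` is an assumption, not the project's actual setting.

```python
# Minimal sketch: F1, precision, recall, and accuracy from a confusion matrix.
# Assumes the scaled splits from the preprocessing sketch above.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

models = {
    "Logistic Regression": LogisticRegression(),            # default 0.5 threshold
    "SVM (RBF)": SVC(kernel="rbf", C=1 / 5, gamma=0.001),   # C = 1/λ is an assumption
    "KNN (scaled), k=1": KNeighborsClassifier(n_neighbors=1),
}

for name, model in models.items():
    model.fit(X_train_scaled, y_train)
    tn, fp, fn, tp = confusion_matrix(y_test, model.predict(X_test_scaled)).ravel()
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    print(f"{name}: F1={f1:.2f}, precision={precision:.2f}, "
          f"recall={recall:.2f}, accuracy={accuracy:.2f}")
```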