
The FireFighter: A Wildfire Classification Model

Welcome! This is a private school project that isn't publicly listed on my GitHub profile. I've shared it using https://gitfront.io/, a recommended measure to prevent plagiarism that still lets me share the project outside my university (hello, resume viewer!) :)

Meet my wildfire classifier, which uses hourly weather statistics to predict whether a wildfire is likely to form under given conditions.

Table of Contents

  1. Data Sourcing & Preprocessing
  2. Parameter Tuning
  3. Results

Data Sourcing & Preprocessing

The regional focus was North America.

Hourly weather statistics were collected from Weather Underground.

Historical wildfires sourced from:

  1. Canadian National Fire Database
  2. Wikipedia's "List of wildfires" (North America)

The data were partitioned into an 80-20 train-test split.

Features were standardized using scikit-learn's StandardScaler.
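A minimal sketch of the split-then-standardize step with scikit-learn. The data below is a hypothetical random stand-in for the weather features, and two details are assumptions rather than something this README states: `stratify=y` (to keep the class balance similar in both splits) and fitting the scaler on the training split only, which is the usual way to avoid leaking test-set statistics.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data: 200 hourly weather rows with 4 numeric features
# and a 0/1 wildfire label (not the project's real dataset).
rng = np.random.default_rng(0)
X = rng.normal(loc=20.0, scale=5.0, size=(200, 4))
y = rng.integers(0, 2, size=200)

# 80-20 train-test split; stratify (an assumption) keeps class ratios similar
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Learn mean/std from the training split only, then apply the same
# transform to the test split.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

After `fit_transform`, each training column has mean 0 and standard deviation 1; the test columns will be close to, but not exactly, those values.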

Parameter Tuning

Because the dataset is relatively small, measures were needed to guard against overfitting.

Resampling procedures: bootstrapping and 5-fold cross-validation.

  • 1,000 bootstrap samples were used to train and evaluate the Logistic Regression and K-Nearest Neighbours (KNN) models.

  • Only 10 bootstrap samples were used to train and evaluate the Support Vector Machine (SVM) model.

    • Limited by computational constraints.
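The two resampling procedures above can be sketched as follows. This is an illustration, not the project's code: the helper `bootstrap_f1` is hypothetical, the data is synthetic, and evaluating each bootstrap fit on its out-of-bag rows is an assumption about how "train and evaluate" was done. The example uses 100 resamples to run quickly; the README's 1,000 (or 10 for the SVM) would just change `n_boot`.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_score


def bootstrap_f1(model, X, y, n_boot=1000, seed=0):
    """Hypothetical helper: fit on bootstrap resamples, score F1 out-of-bag."""
    rng = np.random.default_rng(seed)
    n = len(y)
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)       # draw n rows with replacement
        oob = np.setdiff1d(np.arange(n), idx)  # rows not drawn this round
        if oob.size == 0 or len(np.unique(y[idx])) < 2:
            continue                           # skip degenerate resamples
        fitted = clone(model).fit(X[idx], y[idx])
        scores.append(f1_score(y[oob], fitted.predict(X[oob])))
    return float(np.mean(scores)), float(np.std(scores))


# Synthetic stand-in for the (scaled) training features and 0/1 labels
rng = np.random.default_rng(1)
X = rng.normal(size=(150, 4))
y = (X[:, 0] + 0.5 * rng.normal(size=150) > 0).astype(int)

# Bootstrap estimate of F1 (mean and spread across resamples)
mean_f1, sd_f1 = bootstrap_f1(LogisticRegression(), X, y, n_boot=100)

# 5-fold cross-validation on the same data, scored by F1
cv_scores = cross_val_score(LogisticRegression(), X, y, cv=5, scoring="f1")
```

The same helper accepts any scikit-learn classifier, such as `KNeighborsClassifier` or `SVC`; the per-resample refit is why the SVM runs were the expensive ones.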

One-hot encoding was used for categorical variables (e.g. location, season).

The output label was encoded as 0 ("no") and 1 ("yes").
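A small pandas sketch of both encodings. The column names and rows here are hypothetical placeholders; the README only says that categorical inputs such as location and season were one-hot encoded and that the output was mapped to 0/1.

```python
import pandas as pd

# Hypothetical rows; column names and values are illustrative only
df = pd.DataFrame({
    "temperature": [31.0, 18.5, 25.2, 12.0],
    "humidity": [15, 60, 30, 80],
    "location": ["British Columbia", "Alberta", "British Columbia", "California"],
    "season": ["summer", "spring", "summer", "winter"],
    "wildfire": ["yes", "no", "yes", "no"],
})

# One-hot encode the categorical inputs: one 0/1 column per category value
X = pd.get_dummies(
    df.drop(columns="wildfire"), columns=["location", "season"], dtype=int
)

# Encode the output label as 0 ("no") and 1 ("yes")
y = df["wildfire"].map({"no": 0, "yes": 1})
```

`get_dummies` keeps the numeric columns unchanged and adds columns such as `location_Alberta` and `season_summer`, so each row has exactly one 1 per original categorical variable.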

Results

Each model was evaluated by computing its F1 score from its confusion matrix:

  • 83% F1 score for Logistic Regression

    • Decision threshold P = 0.5 (default)
    • Precision: 86%
    • Recall: 81%
    • Accuracy: 84%
  • 77% F1 score for Support Vector Machine (SVM)

    • RBF kernel
    • λ = 5 (regularization)
    • γ = 0.001
    • Precision: 84%
    • Recall: 71%
    • Accuracy: 79%
  • 74% F1 score for unscaled K-Nearest Neighbours (KNN)

    • k = 1 (lowest MSE)
    • Precision: 81%
    • Recall: 68%
    • Accuracy: 75%
  • 72% F1 score for scaled K-Nearest Neighbours (KNN)

    • k = 1 (lowest MSE)
    • Precision: 70%
    • Recall: 74%
    • Accuracy: 70%
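For reference, the metrics above relate to the confusion matrix as precision = TP / (TP + FP), recall = TP / (TP + FN), accuracy = (TP + TN) / total, and F1 = 2PR / (P + R). The sketch below computes them from a toy set of hypothetical predictions (not the project's data) and then recomputes each model's F1 from the precision and recall reported above as a consistency check.

```python
from sklearn.metrics import confusion_matrix, f1_score

# Toy, hypothetical labels and predictions (1 = wildfire, 0 = no wildfire)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]

# For binary labels, ravel() returns counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)
recall = tp / (tp + fn)
accuracy = (tp + tn) / (tp + tn + fp + fn)
f1 = 2 * precision * recall / (precision + recall)

# Recompute F1 from the precision/recall reported in this README
reported = {
    "Logistic Regression": (0.86, 0.81),
    "SVM": (0.84, 0.71),
    "KNN (unscaled)": (0.81, 0.68),
    "KNN (scaled)": (0.70, 0.74),
}
f1_from_reported = {
    name: round(2 * p * r / (p + r), 2) for name, (p, r) in reported.items()
}
```

The recomputed values (0.83, 0.77, 0.74, 0.72) match the reported F1 scores, and the toy example's hand-computed F1 agrees with scikit-learn's `f1_score`.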