Title: Machine Learning Model Ensemble Fitting and Prediction Workflow for Biodiversity Analysis
Author: Vinicius Marcilio-Silva
Contact: viniciuschms@gmail.com
Date: October 2024
----------------------------------------------------------------------------
Description: This script provides a generalized workflow for fitting an ensemble of
machine learning models and predicting biodiversity metrics from environmental
predictors. It was developed as supplemental material for the paper "Synergies and trade-offs between biodiversity conservation, human well-being, and agricultural production: lessons from the Atlantic Forest in Santa Catarina, Brazil" and is based on code by
Thiago Sanna F. Silva (tsfsilva@rc.unesp.br), available at https://datadryad.org/stash/dataset/doi:10.5061/dryad.6m905qfzp
Explanation of Key Steps:
- Data Preprocessing: Prepares predictors by centering and scaling.
- Data Splitting: Divides data into training and testing subsets for model validation.
- Model Formula Creation: Dynamically creates a formula for each response variable.
- Visualization: Generates scatter plots to check the relationship between predictors and response.
- Cross-Validation: Sets up a repeated cross-validation scheme.
- Model List Setup: Specifies the ensemble of models to be trained.
- Model Training: Uses caretList (from the caretEnsemble package) to fit every model in the list under the same resampling scheme.
- Model Summaries: Extracts and saves a summary of model results.
- Stacked Ensemble Model: Trains a stacked model that combines the predictions of the individual models.
- Test Set Prediction: Predicts outcomes for the test set and calculates the root mean squared error (RMSE).
- Variable Importance: Determines the importance of each variable in the ensemble model.
- Future Projections: Applies the ensemble model to new data if future predictions are needed.
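The preprocessing, splitting, formula, and RMSE steps listed above can be sketched in base R. This is a minimal illustration, not the script's actual code: the toy data, the column names (temp, rain, richness), the 80/20 split, and the plain lm() stand-in model are all assumptions made for the example. Scaling with training-set statistics only is one common choice to avoid information leaking from the test set; the original script may order these steps differently.

```r
# Hypothetical toy data: two environmental predictors and one biodiversity metric.
set.seed(42)
n <- 100
dat <- data.frame(
  temp = rnorm(n, mean = 20, sd = 3),
  rain = rnorm(n, mean = 1500, sd = 200)
)
dat$richness <- 10 + 0.8 * dat$temp + 0.01 * dat$rain + rnorm(n)

predictors <- c("temp", "rain")   # assumed predictor names
responses  <- c("richness")       # assumed response names; add more metrics here

# Data splitting: 80% of rows for training, the rest for testing.
train_idx <- sample(seq_len(n), size = floor(0.8 * n))
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Preprocessing: center and scale predictors using training statistics only.
mu  <- colMeans(train[predictors])
sdv <- apply(train[predictors], 2, sd)
train[predictors] <- scale(train[predictors], center = mu, scale = sdv)
test[predictors]  <- scale(test[predictors],  center = mu, scale = sdv)

rmse_by_response <- numeric(0)
for (resp in responses) {
  # Model formula creation: build "resp ~ temp + rain" dynamically.
  form <- reformulate(predictors, response = resp)

  # Stand-in model (plain linear regression) in place of the ensemble.
  fit  <- lm(form, data = train)
  pred <- predict(fit, newdata = test)

  # Test set prediction: root mean squared error on held-out rows.
  rmse_by_response[resp] <- sqrt(mean((test[[resp]] - pred)^2))
}
print(rmse_by_response)
```

Adding further names to `responses` repeats the same steps for each biodiversity metric, which is the sense in which the formula is created dynamically.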
This structure provides a reproducible workflow for analyzing multiple biodiversity metrics using ensemble modeling.
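The cross-validation, model list, training, stacking, and prediction steps might look roughly like the sketch below. It assumes the caret, caretEnsemble, and rpart packages are installed; the toy data, the two methods ("lm" and "rpart"), the fold and repeat counts, and the "glm" meta-model are illustrative assumptions rather than the settings used in the paper. The return type of predict() on a stacked model differs between caretEnsemble versions, so the sketch handles both a plain vector and a data-frame-like result.

```r
library(caret)
library(caretEnsemble)

# Hypothetical toy data standing in for scaled predictors and one response.
set.seed(1)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 2 * dat$x1 - dat$x2 + rnorm(n, sd = 0.5)

# Data splitting with caret's stratified partition helper.
idx   <- createDataPartition(dat$y, p = 0.8, list = FALSE)
train <- dat[idx, ]
test  <- dat[-idx, ]

# Cross-validation: repeated k-fold, keeping out-of-fold predictions for stacking.
ctrl <- trainControl(
  method = "repeatedcv",
  number = 5,          # assumed number of folds
  repeats = 2,         # assumed number of repeats
  savePredictions = "final"
)

# Model list setup and training: each method is fitted with the same control object.
models <- caretList(
  y ~ x1 + x2,
  data = train,
  trControl = ctrl,
  methodList = c("lm", "rpart")   # assumed, deliberately small model list
)

# Stacked ensemble: a meta-model fitted on the base models' predictions.
stack <- caretStack(models, method = "glm")

# Test set prediction and RMSE.
pred <- predict(stack, newdata = test)
if (is.data.frame(pred)) pred <- pred[[1]]   # newer versions return a table
rmse <- sqrt(mean((test$y - pred)^2))
print(rmse)
```

For future projections, the same `predict(stack, newdata = ...)` call would be applied to a data frame of new predictor values with identical column names and the same centering and scaling as the training data.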