Machine-Learning-Model-Ensemble-for-Biodiversity-and-Agricultural-Revenue
README.md

Title: Machine Learning Model Ensemble Fitting and Prediction Workflow for Biodiversity Analysis

Author: Vinicius Marcilio-Silva Contact: viniciuschms@gmail.com Date: October 2024

----------------------------------------------------------------------------

Description: This script provides a generalized workflow for fitting models and predicting biodiversity metrics based on environmental predictors. Developed as supplemental material for the paper "Synergies and trade-offs between biodiversity conservation, human well-being, and agricultural production: lessons from the Atlantic Forest in Santa Catarina, Brazil" and based on code from Thiago Sanna F. Silva (tsfsilva@rc.unesp.br) available at https://datadryad.org/stash/dataset/doi:10.5061/dryad.6m905qfzp

Explanation of Key Steps:

  1. Data Preprocessing: Prepares predictors by centering and scaling.
  2. Data Splitting: Divides data into training and testing subsets for model validation.
  3. Model Formula Creation: Dynamically creates a formula for each response variable.
  4. Visualization: Generates scatter plots to check the relationship between predictors and response.
  5. Cross-Validation: Sets up a repeated cross-validation scheme.
  6. Model List Setup: Specifies the ensemble of models to be trained.
  7. Model Training: Uses caretList to fit multiple models simultaneously.
  8. Model Summaries: Extracts and saves a summary of model results.
  9. Stacked Ensemble Model: Trains a stacked model to aggregate predictions.
  10. Test Set Prediction: Predicts outcomes for the test set and calculates RMSE.
  11. Variable Importance: Determines the importance of each variable in the ensemble model.
  12. Future Projections: Applies the ensemble model to new data if future predictions are needed. This structure provides a reproducible workflow for analyzing multiple biodiversity metrics using ensemble modeling.