Title: Machine Learning Model Ensemble Fitting and Prediction Workflow for Biodiversity Analysis

Author: Vinicius Marcilio-Silva
Contact: viniciuschms@gmail.com
Date: October 2024

----------------------------------------------------------------------------

Description: This script provides a generalized workflow for fitting ensemble models that predict biodiversity metrics from environmental predictors. It was developed as supplemental material for the paper "Synergies and trade-offs between biodiversity conservation, human well-being, and agricultural production: lessons from the Atlantic Forest in Santa Catarina, Brazil" and is based on code by Thiago Sanna F. Silva (tsfsilva@rc.unesp.br), available at https://datadryad.org/stash/dataset/doi:10.5061/dryad.6m905qfzp

Explanation of Key Steps:

Data Preprocessing: Prepares predictors by centering and scaling.
Data Splitting: Divides data into training and testing subsets for model validation.
Model Formula Creation: Dynamically creates a formula for each response variable.
Visualization: Generates scatter plots to check the relationship between each predictor and the response.
Cross-Validation: Sets up a repeated cross-validation scheme.
Model List Setup: Specifies the ensemble of models to be trained.
Model Training: Uses caretList (from the caretEnsemble package) to fit all models in the list in a single call with a shared cross-validation setup.
Model Summaries: Extracts and saves a summary of model results.
Stacked Ensemble Model: Trains a stacked model that combines the individual models' predictions.
Test Set Prediction: Predicts outcomes for the test set and calculates the root-mean-square error (RMSE).
Variable Importance: Estimates the importance of each predictor in the ensemble model.
Future Projections: Applies the fitted ensemble model to new predictor data when future projections are needed.

This structure provides a reproducible workflow for analyzing multiple biodiversity metrics with ensemble modeling.
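The preprocessing and splitting steps can be illustrated with a short sketch. The original workflow is written in R, where caretList belongs to the caretEnsemble package. The Python below uses only the standard library, and its function names (split_train_test, fit_center_scale, apply_center_scale) are hypothetical.

The sketch assumes predictors arrive as rows of floats. It estimates the centering and scaling parameters on the training subset only, so that no test-set information leaks into preprocessing. Whether the original script does the same is not stated here.

```python
import random
import statistics


def split_train_test(rows, test_fraction=0.25, seed=42):
    """Randomly partition rows into (train, test); the fixed seed keeps it reproducible."""
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    n_test = round(len(rows) * test_fraction)
    test = [rows[i] for i in idx[:n_test]]
    train = [rows[i] for i in idx[n_test:]]
    return train, test


def fit_center_scale(rows):
    """Learn per-column means and standard deviations (needs at least two rows)."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    # A constant column has zero spread; fall back to 1.0 so it is only centered.
    sds = [statistics.stdev(c) or 1.0 for c in columns]
    return means, sds


def apply_center_scale(rows, means, sds):
    """Center and scale each row with parameters learned elsewhere."""
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


# Usage: learn the parameters on the training subset, then reuse them on the test subset.
rows = [[float(i), 2.0 * i + 1.0] for i in range(20)]
train, test = split_train_test(rows)
means, sds = fit_center_scale(train)
train_scaled = apply_center_scale(train, means, sds)
test_scaled = apply_center_scale(test, means, sds)
```

Reusing the training means and standard deviations on the test rows means the test set is transformed exactly as new data would be at prediction time.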
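Repeated cross-validation and the RMSE metric can be sketched in a few lines. This is an illustrative Python version, not the script's R code. The names repeated_kfold, rmse and cv_rmse are hypothetical, and the defaults (10 folds, 3 repeats) are placeholders rather than the values used in the paper.

Each repeat reshuffles the rows before cutting them into k folds, so every row is held out exactly once per repeat.

```python
import math
import random
import statistics


def repeated_kfold(n, k=10, repeats=3, seed=123):
    """Yield (train_idx, holdout_idx) pairs: k folds per repeat, reshuffled each repeat."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        for f in range(k):
            holdout = idx[f::k]  # every k-th shuffled index forms fold f
            held = set(holdout)
            yield [i for i in idx if i not in held], holdout


def rmse(predicted, observed):
    """Root-mean-square error between equal-length sequences."""
    if not observed or len(predicted) != len(observed):
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))


def cv_rmse(fit, xs, ys, k=10, repeats=3, seed=123):
    """Mean held-out RMSE of `fit` across all folds of all repeats.

    `fit(train_xs, train_ys)` must return a function mapping one x to a prediction.
    """
    scores = []
    for train, holdout in repeated_kfold(len(xs), k, repeats, seed):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(rmse([model(xs[i]) for i in holdout], [ys[i] for i in holdout]))
    return statistics.fmean(scores)
```

The same rmse helper applies unchanged to the held-out test set once the final model has been fitted on the full training subset.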
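A stacked ensemble feeds the base models' out-of-fold predictions into a second-level (meta) model. The sketch below shows that idea with two hypothetical toy base learners: a training-mean predictor and a one-predictor least-squares line. The meta-learner is a no-intercept linear model solved through the normal equations.

The code is Python for illustration only. The R workflow's actual base models and meta-model are not specified here. Both base learners are run with the same seed and therefore the same folds, mirroring the common practice of sharing resampling folds across the models being stacked.

```python
import random


def fit_mean(xs, ys):
    """Toy base learner: always predicts the training mean of y."""
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Toy base learner: least-squares line y = a + b*x (needs two distinct x values)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x


def out_of_fold(fit, xs, ys, k=5, seed=7):
    """Predict each row with a model trained on the other folds only."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    preds = [0.0] * len(xs)
    for f in range(k):
        held = set(idx[f::k])
        model = fit([xs[i] for i in idx if i not in held],
                    [ys[i] for i in idx if i not in held])
        for i in held:
            preds[i] = model(xs[i])
    return preds


def fit_stack(xs, ys, k=5, seed=7):
    """Stack the two base learners with a no-intercept linear meta-learner."""
    # Same seed, so both base learners see identical folds.
    p1 = out_of_fold(fit_mean, xs, ys, k, seed)
    p2 = out_of_fold(fit_line, xs, ys, k, seed)
    # Normal equations for y ~ w1*p1 + w2*p2, solved by Cramer's rule.
    s11 = sum(a * a for a in p1)
    s22 = sum(b * b for b in p2)
    s12 = sum(a * b for a, b in zip(p1, p2))
    t1 = sum(a * y for a, y in zip(p1, ys))
    t2 = sum(b * y for b, y in zip(p2, ys))
    det = s11 * s22 - s12 * s12
    w1 = (t1 * s22 - s12 * t2) / det
    w2 = (s11 * t2 - s12 * t1) / det
    # Refit the base learners on all training rows for the final ensemble.
    m1, m2 = fit_mean(xs, ys), fit_line(xs, ys)

    def predict(x):
        return w1 * m1(x) + w2 * m2(x)

    return (w1, w2), predict
```

The meta-learner only sees predictions for rows the base models were not trained on. Its weights therefore reflect out-of-sample performance rather than training fit.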
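The variable-importance step is shown here with permutation importance, a model-agnostic technique chosen for illustration. The source does not say how the ensemble's importance scores are computed, so this may differ from what the R script reports.

The idea is to shuffle one predictor column at a time and record how much the RMSE worsens. The Python function name and its defaults are hypothetical.

```python
import math
import random


def rmse(pred, obs):
    """Root-mean-square error between equal-length sequences."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in RMSE when each predictor column is randomly shuffled.

    predict: callable mapping one row (a list of floats) to a prediction.
    X: list of rows (lists); y: observed responses.
    """
    rng = random.Random(seed)
    baseline = rmse([predict(row) for row in X], y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Replace column j with its shuffled copy, leaving other columns intact.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(rmse([predict(r) for r in X_perm], y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances
```

A predictor the model ignores scores exactly zero, because shuffling it leaves the predictions unchanged.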