# The Role of Stem Cell Dynamics in Epigenetic Aging

DNA methylation changes are reliable biomarkers of aging, but the mechanisms driving these changes remain poorly understood. Here we present SCARLET (Stem Cells and Age-ReLated Epigenetic Trajectories), a parsimonious mathematical model that explains how methylation changes arise and propagate through hematopoietic stem cell divisions. Using a large human cohort, we demonstrate that seemingly distinct temporal patterns of age-related methylation changes can be explained by a single general mechanistic model of stem cell dynamics. We show that SCARLET captures known drivers of biological aging, with individuals with accelerated epigenetic aging showing significantly reduced ratios of stem cell pool size to symmetric division rate (N/s). Applying SCARLET to methylation data from 11 mammalian species reveals that N/s scales with maximum lifespan, suggesting that evolutionary adjustments to stem cell dynamics, rather than epigenetic maintenance efficiency, drive the previously observed relationship between methylation rates and lifespan. Our findings provide a quantitative framework for understanding epigenetic aging and suggest that stem cell dynamics may be a key driver of aging across mammals.

## Overview

This repository implements a mechanistic model of DNA methylation dynamics based on stem cell division processes (SCARLET). The model captures how methylation patterns change with age across different mammalian species and human cohorts, using PyMC for Bayesian inference. This code is an accompaniment to our paper "The Role of Stem Cell Dynamics in Epigenetic Aging".

## Mathematical Model

The model describes methylation level Z(t) at age t as a function of:

- **N**: Number of stem cells
- **s**: Division rate per stem cell per year
- **Pm (PM->U)**: Probability (per stem cell division) of a CpG site changing from methylated to unmethylated
- **Pu (PU->M)**: Probability (per stem cell division) of a CpG site changing from unmethylated to methylated
- **n (η)**: Theoretical equilibrium methylation level (Pu/(Pm + Pu))
- **w (ω)**: Combined methylation/demethylation probability (Pm + Pu)
- **p**: Initial methylation level at t=0

The mean methylation evolves as:

```
Z(t) = η + exp(-2stω)(p - η)
```

See `src/general_imports.py` for complete mathematical derivations including variance terms.

## Project layout

The main project scripts are split into 3 categories:

### 1) Preprocessing scripts (prefix: "preprocessing")

These scripts preprocess the AnnData objects (see the Data Format section below) to prepare them for analysis. Generally speaking, this means adding or calculating key variables for either the CpGs (e.g. mean methylation of a site) or the organism itself (e.g. maximum lifespan).

### 2) Running scripts (prefix: "run")

These scripts run the various models.

### 3) Analysis scripts (prefix: "analysis")

These scripts analyse the model runs. Generally speaking, they are the final scripts used to make the figures.

### Other files and folders

General package imports and re-used functions are stored within **src/general_imports.py**. Exports (e.g. model outputs, figures) are saved in **exports**. Data (e.g. the methylation AnnData objects) are stored within **data**.

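To make the mean trajectory from the Mathematical Model section concrete, here is a minimal NumPy sketch. It is not the implementation in `src/general_imports.py`: the function name `mean_methylation`, its argument names, and the example parameter values are assumptions chosen purely for illustration.

```python
import numpy as np


def mean_methylation(t, s, p_m, p_u, p0):
    """Mean methylation Z(t) = eta + exp(-2*s*t*omega) * (p0 - eta).

    t   : age in years (scalar or array)
    s   : division rate per stem cell per year
    p_m : per-division probability of methylated -> unmethylated
    p_u : per-division probability of unmethylated -> methylated
    p0  : initial methylation level at t = 0
    """
    omega = p_m + p_u   # combined switching probability (w / omega)
    eta = p_u / omega   # equilibrium methylation level (n / eta)
    t = np.asarray(t, dtype=float)
    return eta + np.exp(-2.0 * s * t * omega) * (p0 - eta)


# Hypothetical parameter values, for illustration only.
ages = np.array([0.0, 20.0, 40.0, 80.0])
z = mean_methylation(ages, s=1.0, p_m=1e-3, p_u=4e-3, p0=0.1)
```

At t = 0 the function returns the initial level p, and for large t it approaches η, which matches the interpretation of η as the equilibrium methylation level.
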
See below for the full repository structure:

```
├── data/                          # Data files
│   └── example_anndata.h5ad       # Example methylation data
├── env/                           # Environment configuration
│   └── prolif_clock.yml           # Conda environment specification
├── exports/                       # Output directory
│   ├── figures/                   # Generated plots
│   └── model_outputs/             # Model results and fits
├── notebooks/                     # Analysis workflows
│   ├── 0_data_preprocessing/      # Data preparation scripts
│   ├── 1_model_runs/              # Model fitting scripts
│   └── 2_post_run_analyses/       # Post-processing and visualization
└── src/                           # Source code
    └── general_imports.py         # Core functions and model definitions
```

## Installation

To install and activate the conda environment (to run all code using CPUs), run:

```
conda env create -f env/prol_env.yml
conda activate prol_env
```

Running the code on GPUs requires a more involved setup because of package compatibility issues (e.g. with CUDA), and the details depend on the system and the GPU software available. The packages are the same as in the CPU setup, with the addition of "jax". Any code written for GPUs should in theory also run on CPUs, albeit much more slowly.

### Key Dependencies

- **PyMC 5.5.0** - Probabilistic programming
- **PyTensor** - Backend for automatic differentiation
- **NumPyro/JAX** - Alternative MCMC sampling
- **AnnData** - Methylation data storage
- **ArviZ** - Bayesian model diagnostics
- **Pandas, NumPy** - Data manipulation
- **Matplotlib, Seaborn, Plotly** - Visualization

## Description of main scripts:

### 0. Data Preprocessing

**`preprocessing_human_data.py`**
Preprocesses GenScot methylation data. Calculates CpG-level statistics including Spearman correlations, variance metrics, and regression coefficients. Adds computed statistics to AnnData object.

**`preprocessing_mammal_data.py`**
Preprocesses mammalian comparative methylation data across multiple species. Calculates CpG-level statistics and prepares data for cross-species modeling.

### 1. Model Runs

**`run_humans_fixed_n_s.py`**
Runs conditional SCARLET model on human data with fixed N (stem cells) and s (division rate) parameters. Relevant figures: Fig. 2a, Fig. 3a.

**`run_humans_cohorts_unconditional.py`**
Fits unconditional models allowing cohort-specific parameters. Relevant figures: Fig. 2c, Supp. Fig. 2a.

**`run_humans_trajectory_cats_fixed_n_s.py`**
Runs conditional SCARLET model on different categories of CpGs (by trajectory patterns). Includes comparisons with linear and null models. Relevant figures: Fig. 2b, Supp. Figs. 1a-c.

**`run_humans_sensitivity_n_sites.py`**
Sensitivity analysis varying the number of CpG sites used in model fitting to assess robustness. Relevant figures: Supp. Fig. 2c.

**`run_humans_sensitivity_sample_size.py`**
Sensitivity analysis varying sample sizes to evaluate model stability and parameter estimation accuracy. Relevant figures: Supp. Fig. 2c.

**`run_humans_sensitivity_timespans.py`**
Sensitivity analysis examining model performance across different age ranges. Relevant figures: Supp. Figs. 3a-b.

**`run_mammals_separate_models.py`**
Fits independent SCARLET models for each mammalian species to obtain species-specific parameter estimates. Relevant figures: Fig. 3b, Fig. 3c, Supp. Fig. 3c.

**`run_mammals_joint_models.py`**
Fits hierarchical SCARLET model with all mammals in a single joint model, sharing information across species. Relevant figures: Fig. 3d, Supp. Figs. 3d-i.

**`run_mouse_dog_fixed_n_s.py`**
Runs SCARLET model on mouse and dog data with fixed N and s parameters. Relevant figures: Fig. 3a.

### 2. Post-Run Analyses

**`analysis_humans.py`**
Comprehensive analysis of human GenScot data results. Generates heatmaps of log likelihoods across N and s, plots parameter distributions by group, analyzes site fits across CpG categories, and creates summary statistics tables. Relevant figures: Fig. 2a, Fig. 2b, Fig. 2c, Supp. Figs. 1a-c, Supp. Fig. 2a, Supp. Table 1.

**`analysis_scaling.py`**
Cross-species scaling analysis. Plots N/s ratios vs. lifespan, examines methylation/demethylation probabilities across species, compares joint vs. separate models, and generates example site fits. Relevant figures: Fig. 3b, Fig. 3c, Fig. 3d, Supp. Figs. 3c-i.

**`analysis_sensitivity.py`**
Analyzes and visualizes results from all sensitivity analyses (sample size, time spans, number of sites). Evaluates model robustness and parameter stability. Relevant figures: Supp. Figs. 2b-c, 3a-b.

**`analysis_mouse_human_heatmap_lineplot.py`**
Generates comparative visualizations between mouse and human methylation patterns, including heatmaps and trajectory line plots. Relevant figures: Fig. 3a.

## Data Format

**AnnData Structure:**

```python
AnnData object
    .X      # Methylation beta values (n_cpgs × n_samples)
    .obs    # CpG metadata (r², mean, variance, etc.)
    .var    # Sample metadata (age, cohort, species, etc.)
```

## Contact

Please contact Sam Crofts (sam.crofts@ed.ac.uk) for further details.