CSC8114 Group 9 — Adaptive Joint Compression and Synchronisation in Federated Split Learning for IoT Rainfall Prediction

A research prototype and experimental platform for Federated Split Learning (FSL) with adaptive communication control, applied to real-time IoT rainfall prediction.

Overview

In real-world IoT sensing networks, distributed weather stations continuously generate large volumes of environmental data. Traditional centralised training requires raw data to be transmitted to a central server — introducing privacy risks, high bandwidth costs, and single points of failure. Federated Split Learning (FSL) addresses this by splitting the neural network between edge clients and a central server: clients compute only the early layers locally and transmit compact intermediate activations instead of raw data.

However, FSL introduces its own communication bottleneck: per-step activation/gradient round trips and periodic model synchronisation can dominate training time under constrained or unstable IoT networks. This project addresses that bottleneck by adapting activation compression and synchronisation frequency at runtime.

Figure: Raspberry Pi federated split learning deployment

What We Built

A fully functional, end-to-end FSL training platform built on a gRPC microservice architecture with the following core components:

Split LSTM Model — A 2-layer LSTM encoder runs on each client to extract temporal weather features from a 48-hour sliding window (5 meteorological variables), producing a fixed-size 64-dim smashed activation. The server hosts a dual-head prediction module (rain/no-rain classifier + rainfall amount regressor) and uses focal loss plus positive-sample rebalancing to handle rainfall class imbalance.
gRPC Communication Layer — Four well-defined RPCs (Register, Forward, Synchronize, NotifyCompletion) orchestrate the full distributed training lifecycle, including client registration, split forward/backward passes, barrier-based FedAvg aggregation (with configurable quorum and timeout), and graceful shutdown.
Adaptive Communication Scheduler — A latency-aware, rule-based scheduler running on the server side that jointly controls two optimisation dimensions at runtime:
- Spatial — Activation/gradient compression mode (float32 → float16 → int8), reducing per-step payload from 256 B to 68 B.
- Temporal — Synchronisation interval ρ, dynamically moving from 1 to 3 in the evaluated adaptive regimes while retaining a configured safety range of 1–20.
Decisions are driven by EMA-smoothed client-reported latency, enabling the system to dynamically adapt to heterogeneous and non-stationary network conditions without manual tuning.
Reproducible Experiment Matrix — 17 simulation scenarios × 3 random seeds, systematically decoupling the effects of latency profiles (none / low / high / mixed), compression modes, synchronisation intervals, and adaptive vs. static policies. A second four-scenario Raspberry Pi matrix validates the same ideas on physical edge devices over a real wide-area link.
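
The focal loss mentioned for the Split LSTM server head is not spelled out in this README. As a rough illustration only, the sketch below implements the standard binary focal loss in plain Python. The function name `focal_loss` and the defaults `alpha=0.25` and `gamma=2.0` are assumptions borrowed from common practice, not values read from `config.yaml`, and the positive-sample rebalancing step is not shown.

```python
import math


def focal_loss(logit, target, alpha=0.25, gamma=2.0):
    """Standard binary focal loss for one sample (illustrative sketch).

    logit:  raw classifier output before the sigmoid.
    target: 1 for rain, 0 for no rain.
    alpha, gamma: hypothetical defaults, not taken from the repository.
    """
    p = 1.0 / (1.0 + math.exp(-logit))          # sigmoid probability of rain
    p_t = p if target == 1 else 1.0 - p         # probability of the true class
    alpha_t = alpha if target == 1 else 1.0 - alpha
    p_t = min(max(p_t, 1e-12), 1.0)             # guard against log(0)
    # (1 - p_t) ** gamma down-weights samples that are already classified well
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With `gamma=0` and `alpha=0.5` this reduces to half of the ordinary binary cross-entropy, which is a convenient sanity check when tuning the focal parameters.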

Key Results

Results are measured across the updated paper experiments. Simulation metrics use 17 scenarios × 3 seeds; Raspberry Pi metrics use 4 scenarios × 3 seeds.

Finding	Detail
Simulation AUPRC stability	AUPRC remains stable across all 17 scenarios (0.6381–0.6484), showing no measurable degradation from int8 compression or ρ=3 synchronisation.
INT8 compression	Per-step activation payload falls from 256 B to 68 B while AUPRC stays within seed-level variance of the float32 baseline.
ρ = 3 synchronisation	Synchronisation traffic drops by about 53% in simulation, from roughly 6.7 MB to 3.19 MB, while AUPRC remains stable.
Joint adaptive simulation	H16 reaches the lowest simulation communication cost: 0.55 MB total activation payload and 3.19 MB sync traffic.
Raspberry Pi deployment	P4 adaptive reduces activation payload by 87% and synchronisation traffic by 54% vs. P1, while its AUPRC stays within 0.011 of every other Pi scenario.
Runtime stability on real hardware	P4 reduces runtime jitter from ±688 s (P1) to ±10 s on Raspberry Pi.

The central paper result is that FSL communication cost can be reduced by an order of magnitude without sacrificing predictive performance when activation compression and synchronisation frequency are controlled jointly rather than independently.

Repository Structure

csc8114/
├── code/                        # FSL system implementation
│   ├── config.yaml              # Runtime config — model, training, data, comms
│   ├── matrix.yaml              # Experiment matrix — 17 simulation scenarios & seeds
│   ├── Makefile                 # Build / run / plot automation
│   ├── proto/fsl.proto          # gRPC protocol (4 RPCs)
│   ├── src/
│   │   ├── models/              # ClientLSTM (encoder) + ServerHead (predictor)
│   │   ├── nodes/               # Server and client entry points
│   │   ├── client/              # Training loop, forward step, sync, checkpointing
│   │   ├── server/              # Forward service, FedAvg, adaptive scheduler
│   │   ├── shared/              # Compression, serialisation, config, runtime
│   │   └── data/                # Data download, evaluation, plotting
│   ├── dataset/
│   │   └── processed/           # 11 station files (one per federated client)
│   ├── bestweights/             # Model checkpoints (per session)
│   └── results/                 # Training logs, evaluation reports, plots
│
└── paper/                       # IEEE conference paper (IEEEtran)
    ├── csc8114 assessment1.tex  
    ├── csc8114.tex              # Main paper
    ├── refs.bib                 # Bibliography
    └── diagrams/                # Architecture & sequence diagrams (Mermaid + PNG)

Quick Start

Prerequisites

Python 3.11+ with uv
(Optional) Docker & Docker Compose for containerised runs
(Optional) TeX Live for paper compilation

Environment Setup

cd code/
uv sync

Data Download

make download-data

Downloads Open-Meteo/ERA5 historical weather data (2015-01-01 to 2026-03-31) for 11 stations:

11 federated client locations → dataset/processed/ (one station file per client)
Chronological split: train before 2024-01-01, validation during 2024, test from 2025-01-01 to 2026-03-31
Input: previous 48 hours of temperature, humidity, pressure, wind speed, and rain
Target: cumulative rainfall over the next 24 hours; rain label is positive when future 24-hour rainfall is at least 0.5 mm
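
The windowing, labelling, and split rules above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the repository's data loader: the names `assign_split` and `make_samples` are hypothetical, the hourly series are plain Python lists, and which timestamp of a sample decides its split is not stated in the README, so `assign_split` simply takes a calendar day.

```python
from datetime import date

WINDOW_HOURS = 48        # input history length stated in the README
HORIZON_HOURS = 24       # forecast horizon stated in the README
RAIN_THRESHOLD_MM = 0.5  # positive-label threshold stated in the README


def assign_split(day):
    """Map a calendar day to train / validation / test by the README boundaries."""
    if day < date(2024, 1, 1):
        return "train"
    if day < date(2025, 1, 1):
        return "validation"
    return "test"


def make_samples(features, rain_mm):
    """Build (window, amount, label) tuples from hourly series.

    features: list of per-hour rows (5 values each in the real dataset).
    rain_mm:  hourly rainfall in millimetres, same length as features.
    """
    samples = []
    last_start = len(features) - WINDOW_HOURS - HORIZON_HOURS
    for start in range(last_start + 1):
        end = start + WINDOW_HOURS
        window = features[start:end]                     # previous 48 hours
        amount = sum(rain_mm[end:end + HORIZON_HOURS])   # next 24 hours
        label = 1 if amount >= RAIN_THRESHOLD_MM else 0
        samples.append((window, amount, label))
    return samples
```

A series of 80 hourly rows yields 80 − 48 − 24 + 1 = 9 samples, since each sample needs a full history window and a full forecast horizon.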

Training (Native — Recommended for development)

make native-clean
make native-run                                         # uses config.yaml defaults
make native-run NUM_CLIENTS=3                           # quick local smoke test
make native-run CLIENT_DEVICE=mps                       # use Apple Silicon GPU

Training (Docker — single machine)

make docker-run NUM_CLIENTS=11   # server + 11 clients in containers
make docker-clean                # teardown

Training (Distributed — VPS server + 11 Raspberry Pis)

Prerequisites: Tailscale overlay network set up, ansible/inventory.ini populated, Docker image pushed to Docker Hub.

# First time or after code changes: build and push a new image
make dist-build IMAGE_TAG=sha-abc123

# Sync configs, restart server on VPS, deploy clients to all Pis
make dist-start IMAGE_TAG=sha-abc123

# Follow live server logs
make dist-logs

# Fetch results back to the local machine when done
make dist-fetch

For a full experiment matrix over the cluster:

make matrix BACKEND=dist

Each scenario's merged config is automatically pushed to VPS and all Pis before that scenario starts — clients always run with the correct overrides applied.

To wipe everything and start fresh:

make dist-restart

Evaluation & Plotting

# Re-evaluate all 17 scenarios
bash run_eval_all.sh

# Or evaluate a single scenario
uv run python src/data/run_evaluation.py \
    --session 2026-05-03_00-20-00 --scenario N01 \
    --eval-max-samples 0 --device mps

# Aggregate all eval reports into one CSV
uv run python src/data/build_matrix_summary.py

# Regenerate paper figures (output → results/graphics/)
uv run python src/data/plot_compression_auprc.py      # Fig 2
uv run python src/data/plot_efficiency_accuracy.py    # Fig 3
uv run python src/data/plot_cross_platform.py         # Fig 4
uv run python src/data/plot_rho_convergence.py        # Fig 5
uv run python src/data/plot_scheduler_timeline.py     # Fig A
uv run python src/data/plot_monthly_performance.py    # Fig B

Experiment Matrix

Scenarios and seeds are defined in matrix.yaml. Each scenario overrides config.yaml for that run only — processes never see the original base config during a matrix run.

make matrix-dry-run                        # preview scenario plan
make matrix                                # run all 17 scenarios × 3 seeds (native)
make matrix ONLY=L09,H15 MAX_RUNS=1        # run selected scenarios only

Run make help for the full command list.

System Architecture

┌─────────────────────────────────────────────────────────┐
│                      gRPC Server                        │
│  ┌────────────┐  ┌───────────┐  ┌────────────────────┐  │
│  │ ServerHead │  │  FedAvg   │  │ Adaptive Scheduler │  │
│  │ (MLP+Dual  │  │Coordinator│  │ (Compression + ρ)  │  │
│  │  Head)     │  │           │  │                    │  │
│  └────────────┘  └───────────┘  └────────────────────┘  │
└──────────┬──────────────┬──────────────┬────────────────┘
           │ Forward/     │ Synchronize  │ Register/
           │ Backward     │ (FedAvg)     │ Complete
           │              │              │
┌──────────▼──────────────▼──────────────▼────────────────┐
│              gRPC Clients (×11)                         │
│  ┌──────────┐  ┌──────────────┐  ┌──────────────────┐   │
│  │ClientLSTM│  │Training Loop │  │  Compression &   │   │
│  │(2L LSTM) │  │(local steps) │  │  Latency Report  │   │
│  └──────────┘  └──────────────┘  └──────────────────┘   │
└─────────────────────────────────────────────────────────┘

Model (Split LSTM)

Component	Location	Description
ClientLSTM	`src/models/split_lstm.py`	2-layer LSTM encoder; input (batch, 48, 5) → smashed activation (batch, 64)
ServerHead	`src/models/split_lstm.py`	MLP backbone → dual head: rain classifier (logit) + rain regressor (amount)

gRPC Protocol (`proto/fsl.proto`)

RPC	Direction	Purpose
`Register`	Client → Server	Register client, receive assigned ID and session
`Forward`	Client ↔ Server	Send compressed activation, receive gradient + scheduler directives
`Synchronize`	Client ↔ Server	FedAvg weight aggregation (barrier-based with quorum + timeout)
`NotifyCompletion`	Client → Server	Signal training complete; triggers shutdown when all clients finish
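
The aggregation behind `Synchronize` can be pictured with a small FedAvg sketch. It is an illustration under assumptions, not the code in `src/server/`: the names `fedavg` and `aggregate_if_quorum` are hypothetical, weights are plain lists rather than tensors, weighting by sample count follows the textbook FedAvg rule (the README does not say how clients are weighted), and the timeout half of the barrier is left out.

```python
import math


def fedavg(updates):
    """Sample-weighted average of client weights.

    updates: list of (num_samples, {param_name: [float, ...]}) pairs.
    """
    total = sum(n for n, _ in updates)
    first_weights = updates[0][1]
    averaged = {}
    for name, reference in first_weights.items():
        averaged[name] = [
            sum(n * weights[name][i] for n, weights in updates) / total
            for i in range(len(reference))
        ]
    return averaged


def aggregate_if_quorum(updates, expected_clients, quorum_fraction):
    """Aggregate only when enough clients have reached the barrier.

    Returns None while the quorum is not met (hypothetical convention).
    """
    needed = math.ceil(quorum_fraction * expected_clients)
    if len(updates) < needed:
        return None
    return fedavg(updates)
```

For example, with 11 expected clients and a quorum fraction of 0.8, at least 9 client updates must be present before averaging proceeds.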

Adaptive Scheduler (`src/server/scheduler.py`)

Uses per-client EMA-smoothed latency to choose compression and synchronisation interval:

Latency (EMA)	Compression Mode	Severity	Resulting ρ with base=1, step=1
< 4 ms	float32	0	1
≥ 4 and < 10 ms	float16	1	2
≥ 10 ms	int8	2	3
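
The rule table above can be read as a small decision function. The sketch below is one plausible reading rather than the contents of `src/server/scheduler.py`: the 4 ms and 10 ms thresholds and the 1–20 range for ρ come from this README, while the class and function names, the `ema_alpha=0.2` default, and the formula ρ = base + step × severity (inferred from the table's last column) are assumptions.

```python
from dataclasses import dataclass

MODES = ("float32", "float16", "int8")  # indexed by severity 0, 1, 2


@dataclass
class SchedulerConfig:
    float16_threshold_ms: float = 4.0   # from the table above
    int8_threshold_ms: float = 10.0     # from the table above
    ema_alpha: float = 0.2              # assumed value, not from the README
    rho_base: int = 1
    rho_step: int = 1
    rho_min: int = 1                    # safety range 1-20 stated in the README
    rho_max: int = 20


def update_ema(previous_ms, sample_ms, alpha):
    """Exponential moving average; the first sample seeds the average."""
    if previous_ms is None:
        return sample_ms
    return alpha * sample_ms + (1.0 - alpha) * previous_ms


def decide(ema_ms, cfg):
    """Return (compression_mode, rho) for one client's smoothed latency."""
    if ema_ms < cfg.float16_threshold_ms:
        severity = 0
    elif ema_ms < cfg.int8_threshold_ms:
        severity = 1
    else:
        severity = 2
    rho = cfg.rho_base + cfg.rho_step * severity
    rho = max(cfg.rho_min, min(cfg.rho_max, rho))  # clamp to the safety range
    return MODES[severity], rho
```

Under this reading, a smoothed latency of about 22 ms (similar to the measured Raspberry Pi RTT) gives `("int8", 3)`, matching the int8 + ρ=3 regime reported for the Pi deployment.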

Compression Modes (`src/shared/compression.py`)

Mode	Encoding	64-dim Payload
float32	Raw numpy	256 B
float16	Half-precision	128 B
int8	Scale (4B) + quantised	68 B
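
The payload sizes in the table follow directly from the element widths: 64 × 4 B, 64 × 2 B, and 4 B + 64 × 1 B. The sketch below reproduces them with the standard-library `struct` module. It is an illustration under assumptions, not the code in `src/shared/compression.py` (which the table suggests is NumPy-based): the little-endian layout, the scale-first ordering, the symmetric per-tensor quantisation, and the function names are all hypothetical.

```python
import struct


def encode(values, mode):
    """Serialise an activation vector in one of the three modes."""
    n = len(values)
    if mode == "float32":
        return struct.pack(f"<{n}f", *values)   # 4 bytes per element
    if mode == "float16":
        return struct.pack(f"<{n}e", *values)   # 2 bytes per element
    if mode == "int8":
        peak = max(abs(v) for v in values)
        scale = peak / 127.0 if peak > 0 else 1.0
        quantised = [max(-127, min(127, round(v / scale))) for v in values]
        # 4-byte float32 scale followed by one signed byte per element
        return struct.pack("<f", scale) + struct.pack(f"<{n}b", *quantised)
    raise ValueError(f"unknown compression mode: {mode}")


def decode_int8(payload):
    """Recover approximate floats from the int8 layout used in encode()."""
    (scale,) = struct.unpack_from("<f", payload, 0)
    n = len(payload) - 4
    quantised = struct.unpack_from(f"<{n}b", payload, 4)
    return [q * scale for q in quantised]
```

For a 64-element vector this gives payloads of 256, 128, and 68 bytes, and the int8 round trip differs from the input by at most about half a quantisation step.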

Configuration

Two config files, two responsibilities:

config.yaml — runtime config; what a single direct run actually uses:

Section	Description
`model.*`	LSTM hidden size, layers, sequence length, horizon
`training.*`	Learning rate, rounds, local steps, loss weights, focal loss params
`compression.*`	Default mode (`float32`, `float16`, or `int8`)
`federated.*`	Number of clients, base ρ
`scheduler.*`	Latency thresholds, ρ bounds, EMA alpha
`profiler.*`	Synthetic latency generator (base, offsets, jitter, burst)
`data.*`	Dataset paths, train/val/test split boundaries

matrix.yaml — experiment matrix; only read by make matrix:

Section	Description
`experiment_matrix.seeds`	Random seeds for repeated runs
`experiment_matrix.runner.*`	Backend (native/docker/dist), devices, timeout
`experiment_matrix.scenarios`	17 scenario definitions with per-scenario overrides

Each scenario in matrix.yaml deep-merges its overrides onto config.yaml to produce a fully resolved per-run config. Training processes only ever see this merged result — they never read matrix.yaml directly.
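
A deep merge of this kind can be sketched as follows. The function name `deep_merge` and the keys used in the usage note (`compression.mode`, `federated.rho`, `federated.num_clients`) are hypothetical stand-ins for whatever `config.yaml` actually contains; the real merge logic in the repository may differ, for instance in how it treats lists.

```python
import copy


def deep_merge(base, overrides):
    """Return a new dict where overrides win, merging nested dicts key by key.

    Neither input is modified, so the base config stays reusable across
    scenarios.
    """
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)  # recurse into sections
        else:
            merged[key] = copy.deepcopy(value)            # scalar or new section
    return merged
```

For instance, merging `{"compression": {"mode": "int8"}}` onto a base that sets both `compression.mode` and `federated.rho` changes only the compression mode and leaves every other key at its base value.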

Experiment Matrix (N01–M17)

Defined in matrix.yaml. 17 scenarios total, run with 3 seeds each. Results land in results/<session>/<scenario>/ and bestweights/<session>/<scenario>/.

ID	Latency Profile	Compression	ρ	Scheduler	AUPRC	F1	Payload/step
N01	No latency	float32	1	Off	0.6396 ± 0.0032	0.6023	256.0 B
N02	No latency	float16	1	Off	0.6384 ± 0.0106	0.6052	128.0 B
N03	No latency	int8	1	Off	0.6395 ± 0.0078	0.5688	68.0 B
N04	No latency	float32	3	Off	0.6473 ± 0.0001	0.6075	256.0 B
L05	Low (~8 ms)	float32	1	Off	0.6409 ± 0.0070	0.5293	256.0 B
L06	Low (~8 ms)	float16	1	Off	0.6431 ± 0.0083	0.5831	128.0 B
L07	Low (~8 ms)	int8	1	Off	0.6422 ± 0.0061	0.6129	68.0 B
L08	Low (~8 ms)	float32	3	Off	0.6479 ± 0.0004	0.6185	256.0 B
L09	Low (~8 ms)	dynamic	1	Adaptive	0.6415 ± 0.0060	0.6358	92.5 B
L10	Low (~8 ms)	dynamic	dynamic	Adaptive + ρ	0.6474 ± 0.0004	0.5815	92.5 B
H11	High (~50 ms)	float32	1	Off	0.6393 ± 0.0057	0.6085	256.0 B
H12	High (~50 ms)	float16	1	Off	0.6421 ± 0.0053	0.6308	128.0 B
H13	High (~50 ms)	int8	1	Off	0.6407 ± 0.0099	0.5560	68.0 B
H14	High (~50 ms)	float32	3	Off	0.6484 ± 0.0004	0.6265	256.0 B
H15	High (~50 ms)	dynamic	1	Adaptive	0.6381 ± 0.0072	0.5640	68.0 B
H16	High (~50 ms)	dynamic	dynamic	Adaptive + ρ	0.6483 ± 0.0003	0.6222	68.0 B
M17	Mixed (per-client)	dynamic	dynamic	Adaptive + ρ	0.6476 ± 0.0007	0.6368	128.2 B

Raspberry Pi Deployment Matrix

The paper also validates four representative strategies on 11 Raspberry Pi 4B clients connected to a cloud server over a real wide-area link. The profiler is disabled; measured RTT is approximately 21–24 ms, which places the adaptive scheduler in the int8 + ρ=3 regime.

ID	Strategy	Compression	ρ	AUPRC	Total Payload	Sync Traffic	Runtime
P1	Baseline	float32	1	0.6381 ± 0.0052	47.36 ± 10.26 MB	75.77 ± 16.42 MB	3280.1 ± 688.0 s
P2	Compression only	float16	1	0.6434 ± 0.0053	23.93 ± 5.91 MB	76.57 ± 18.92 MB	3318.4 ± 814.3 s
P3	Sparse synchronisation	float32	3	0.6479 ± 0.0008	22.83 ± 0.00 MB	35.06 ± 0.00 MB	3331.4 ± 25.7 s
P4	Joint adaptive	dynamic	dynamic	0.6482 ± 0.0014	6.07 ± 0.00 MB	35.06 ± 0.00 MB	3316.5 ± 10.3 s
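
The headline reductions quoted for P4 can be rechecked from the mean values in the table above. The snippet is only a worked calculation; the helper name `reduction_percent` is introduced here for illustration.

```python
def reduction_percent(baseline, value):
    """Percentage by which value is lower than baseline."""
    return 100.0 * (baseline - value) / baseline


# Mean values copied from the P1 and P4 rows of the table above (MB).
p1 = {"payload_mb": 47.36, "sync_mb": 75.77}
p4 = {"payload_mb": 6.07, "sync_mb": 35.06}

payload_cut = reduction_percent(p1["payload_mb"], p4["payload_mb"])
sync_cut = reduction_percent(p1["sync_mb"], p4["sync_mb"])

print(f"activation payload reduction: {payload_cut:.1f}%")
print(f"sync traffic reduction:       {sync_cut:.1f}%")
```

Rounded to whole percentages, these give the 87% and 54% figures reported under Key Results.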

Ablation Study Results

Results averaged across 3 random seeds (42, 52, 62). Values shown as mean ± std. Communication cost metrics (total_payload_mb, sync_sent_mb) are summed across seeds to reflect aggregate system cost.

Scenario	AUPRC	ROC-AUC	F1	total_payload_mb (sum)	sync_sent_mb (sum)	throughput_sps	runtime_s	mem_peak_mb	cpu_percent	latency_ms
P1: float32, rho=1 (Baseline)	0.6542 ± 0.0062	0.7150 ± 0.0063	0.6287 ± 0.0428	2576.37	239.08	7.70 ± 1.63	16054.9 ± 3553.6	678.0 ± 52.2	19.2 ± 0.3%	27.09 ± 0.28
P2: float16, rho=1	0.6552 ± 0.0034	0.7165 ± 0.0030	0.5897 ± 0.0475	1360.11	252.43	7.29 ± 1.64	16968.6 ± 3500.9	688.1 ± 47.8	19.2 ± 0.5%	27.13 ± 1.12
P3: float32, rho=3	0.6612 ± 0.0004	0.7228 ± 0.0013	0.6289 ± 0.0490	1132.57	105.19	13.41 ± 0.22	8941.8 ± 149.2	535.7 ± 11.2	23.0 ± 0.2%	27.59 ± 0.38
P4: Adaptive	0.6608 ± 0.0029	0.7247 ± 0.0056	0.6404 ± 0.0191	823.73	100.01	13.30 ± 0.55	9019.8 ± 381.1	536.5 ± 6.7	22.9 ± 0.2%	26.72 ± 1.16

Output Structure

Each training session produces:

bestweights/<session>/<scenario>/
  ├── best_client_<i>_round_<r>_model_<ts>.pth   # best checkpoint per client
  └── periodic/
      ├── client_<i>_round_<r>.pth               # per-round paired checkpoints
      └── server_round_<r>.pth

results/<session>/
  ├── <scenario>/
  │   ├── training_log_client<i>_<ts>.csv         # step-level: loss, latency, payload
  │   ├── training_log_client<i>_meta.json        # best checkpoint path + config
  │   └── server_log_<session>.csv                # per-round aggregation
  ├── <scenario>_eval_report.csv/.json             # test metrics (source of truth)
  ├── graphics/                                   # paper figures (pdf + png)
  │   ├── fig2_compression_auprc.pdf/.png
  │   ├── fig3_efficiency_accuracy.pdf/.png
  │   ├── fig4_cross_platform.pdf/.png
  │   ├── fig5_rho_convergence.pdf/.png
  │   ├── figA_scheduler_timeline.pdf/.png
  │   └── figB_monthly_performance.pdf/.png
  └── matrix_summary.csv                          # one row per scenario, all key metrics

Paper

The IEEE conference paper is in paper/. To compile:

cd paper/
pdflatex csc8114.tex
bibtex csc8114
pdflatex csc8114.tex
pdflatex csc8114.tex

The sequence runs pdflatex three times in total (once before bibtex and twice after) so that all cross-references and citations resolve.