A research prototype / experimental platform Federated Split Learning system with adaptive communication control for real-time IoT rainfall prediction.
In real-world IoT sensing networks, distributed weather stations continuously generate large volumes of environmental data. Traditional centralised training requires raw data to be transmitted to a central server — introducing privacy risks, high bandwidth costs, and single points of failure. Federated Split Learning (FSL) addresses this by splitting the neural network between edge clients and a central server: clients compute only the early layers locally and transmit compact intermediate activations instead of raw data.
However, FSL introduces its own communication bottleneck — per-step activation/gradient round trips and periodic model synchronisation can dominate training time under constrained or unstable IoT networks. This project tackles this problem head-on.
A fully functional, end-to-end FSL training platform built on a gRPC microservice architecture with the following core components:
Split LSTM Model — A 2-layer LSTM encoder runs on each client to extract temporal weather features from a 48-hour sliding window (5 meteorological variables), producing a fixed-size 64-dim smashed activation. The server hosts a dual-head prediction module (rain/no-rain classifier + rainfall amount regressor) and uses focal loss plus positive-sample rebalancing to handle rainfall class imbalance.
gRPC Communication Layer — Four well-defined RPCs (Register, Forward, Synchronize, NotifyCompletion) orchestrate the full distributed training lifecycle, including client registration, split forward/backward passes, barrier-based FedAvg aggregation (with configurable quorum and timeout), and graceful shutdown.
Adaptive Communication Scheduler — A latency-aware, rule-based scheduler running on the server side that jointly controls two optimisation dimensions at runtime:
Decisions are driven by EMA-smoothed client-reported latency, enabling the system to dynamically adapt to heterogeneous and non-stationary network conditions without manual tuning.
Reproducible Experiment Matrix — 17 simulation scenarios × 3 random seeds, systematically decoupling the effects of latency profiles (none / low / high / mixed), compression modes, synchronisation intervals, and adaptive vs. static policies. A second four-scenario Raspberry Pi matrix validates the same ideas on physical edge devices over a real wide-area link.
Measured across the updated paper experiments. Simulation metrics use 17 scenarios × 3 seeds; Raspberry Pi metrics use 4 scenarios × 3 seeds.
| Finding | Detail |
|---|---|
| Simulation AUPRC stability | AUPRC remains stable across all 17 scenarios (0.6381–0.6484), showing no measurable degradation from int8 compression or ρ=3 synchronisation. |
| INT8 compression | Per-step activation payload falls from 256 B to 68 B while AUPRC stays within seed-level variance of the float32 baseline. |
| ρ = 3 synchronisation | Synchronisation traffic drops by about 53% in simulation, from roughly 6.7 MB to 3.19 MB, while AUPRC remains stable. |
| Joint adaptive simulation | H16 reaches the lowest simulation communication cost: 0.55 MB total activation payload and 3.19 MB sync traffic. |
| Raspberry Pi deployment | P4 adaptive reduces activation payload by 87% and synchronisation traffic by 54% vs. P1, while keeping AUPRC within 0.011 of all Pi scenarios. |
| Runtime stability on real hardware | P4 reduces runtime jitter from ±688 s (P1) to ±10 s on Raspberry Pi. |
The central paper result is that FSL communication cost can be reduced by an order of magnitude without sacrificing predictive performance when activation compression and synchronisation frequency are controlled jointly rather than independently.
csc8114/
├── code/ # FSL system implementation
│ ├── config.yaml # Runtime config — model, training, data, comms
│ ├── matrix.yaml # Experiment matrix — 17 simulation scenarios & seeds
│ ├── Makefile # Build / run / plot automation
│ ├── proto/fsl.proto # gRPC protocol (4 RPCs)
│ ├── src/
│ │ ├── models/ # ClientLSTM (encoder) + ServerHead (predictor)
│ │ ├── nodes/ # Server and client entry points
│ │ ├── client/ # Training loop, forward step, sync, checkpointing
│ │ ├── server/ # Forward service, FedAvg, adaptive scheduler
│ │ ├── shared/ # Compression, serialisation, config, runtime
│ │ └── data/ # Data download, evaluation, plotting
│ ├── dataset/
│ │ └── processed/ # 11 station files (one per federated client)
│ ├── bestweights/ # Model checkpoints (per session)
│ └── results/ # Training logs, evaluation reports, plots
│
└── paper/ # IEEE conference paper (IEEEtran)
├── csc8114 assessment1.tex
├── csc8114.tex # Main paper
├── refs.bib # Bibliography
└── diagrams/ # Architecture & sequence diagrams (Mermaid + PNG)
cd code/
uv sync
make download-data
Downloads Open-Meteo/ERA5 historical weather data (2015-01-01 to 2026-03-31) for 11 stations:
dataset/processed/ (one station file per client)make native-clean
make native-run # uses config.yaml defaults
make native-run NUM_CLIENTS=3 # quick local smoke test
make native-run CLIENT_DEVICE=mps # use Apple Silicon GPU
make docker-run NUM_CLIENTS=11 # server + 11 clients in containers
make docker-clean # teardown
Prerequisites: Tailscale overlay network set up, ansible/inventory.ini populated, Docker image pushed to Docker Hub.
# First time or after code changes: build and push a new image
make dist-build IMAGE_TAG=sha-abc123
# Sync configs, restart server on VPS, deploy clients to all Pis
make dist-start IMAGE_TAG=sha-abc123
# Follow live server logs
make dist-logs
# Fetch results back to Mac when done
make dist-fetch
For a full experiment matrix over the cluster:
make matrix BACKEND=dist
Each scenario's merged config is automatically pushed to VPS and all Pis before that scenario starts — clients always run with the correct overrides applied.
To wipe everything and start fresh:
make dist-restart
# Re-evaluate all 17 scenarios
bash run_eval_all.sh
# Or evaluate a single scenario
uv run python src/data/run_evaluation.py \
--session 2026-05-03_00-20-00 --scenario N01 \
--eval-max-samples 0 --device mps
# Aggregate all eval reports into one CSV
uv run python src/data/build_matrix_summary.py
# Regenerate paper figures (output → results/graphics/)
uv run python src/data/plot_compression_auprc.py # Fig 2
uv run python src/data/plot_efficiency_accuracy.py # Fig 3
uv run python src/data/plot_cross_platform.py # Fig 4
uv run python src/data/plot_rho_convergence.py # Fig 5
uv run python src/data/plot_scheduler_timeline.py # Fig A
uv run python src/data/plot_monthly_performance.py # Fig B
Scenarios and seeds are defined in matrix.yaml. Each scenario overrides config.yaml for that run only — processes never see the original base config during a matrix run.
make matrix-dry-run # preview scenario plan
make matrix # run all 17 scenarios × 3 seeds (native)
make matrix ONLY=L09,H15 MAX_RUNS=1 # run selected scenarios only
Run make help for the full command list.
┌─────────────────────────────────────────────────────────┐
│ gRPC Server │
│ ┌────────────┐ ┌───────────┐ ┌────────────────────┐ │
│ │ ServerHead │ │ FedAvg │ │ Adaptive Scheduler │ │
│ │ (MLP+Dual │ │ Coordinator│ │ (Compression + ρ) │ │
│ │ Head) │ │ │ │ │ │
│ └────────────┘ └───────────┘ └────────────────────┘ │
└──────────┬──────────────┬──────────────┬────────────────┘
│ Forward/ │ Synchronize │ Register/
│ Backward │ (FedAvg) │ Complete
│ │ │
┌──────────▼──────────────▼──────────────▼────────────────┐
│ gRPC Clients (×11) │
│ ┌──────────┐ ┌──────────────┐ ┌──────────────────┐ │
│ │ClientLSTM│ │Training Loop │ │ Compression & │ │
│ │(2L LSTM) │ │(local steps) │ │ Latency Report │ │
│ └──────────┘ └──────────────┘ └──────────────────┘ │
└─────────────────────────────────────────────────────────┘
| Component | Location | Description |
|---|---|---|
| ClientLSTM | src/models/split_lstm.py |
2-layer LSTM encoder; input (batch, 48, 5) → smashed activation (batch, 64) |
| ServerHead | src/models/split_lstm.py |
MLP backbone → dual head: rain classifier (logit) + rain regressor (amount) |
proto/fsl.proto)| RPC | Direction | Purpose |
|---|---|---|
Register |
Client → Server | Register client, receive assigned ID and session |
Forward |
Client ↔ Server | Send compressed activation, receive gradient + scheduler directives |
Synchronize |
Client ↔ Server | FedAvg weight aggregation (barrier-based with quorum + timeout) |
NotifyCompletion |
Client → Server | Signal training complete; triggers shutdown when all clients finish |
src/server/scheduler.py)Uses per-client EMA-smoothed latency to choose compression and synchronisation interval:
| Latency (EMA) | Compression Mode | Severity | Resulting ρ with base=1, step=1 |
|---|---|---|---|
| < 4 ms | float32 | 0 | 1 |
| 4–10 ms | float16 | 1 | 2 |
| ≥ 10 ms | int8 | 2 | 3 |
src/shared/compression.py)| Mode | Encoding | 64-dim Payload |
|---|---|---|
| float32 | Raw numpy | 256 B |
| float16 | Half-precision | 128 B |
| int8 | Scale (4B) + quantised | 68 B |
Two config files, two responsibilities:
config.yaml — runtime config; what a single direct run actually uses:
| Section | Description |
|---|---|
model.* |
LSTM hidden size, layers, sequence length, horizon |
training.* |
Learning rate, rounds, local steps, loss weights, focal loss params |
compression.* |
Default mode (float32, float16, or int8) |
federated.* |
Number of clients, base ρ |
scheduler.* |
Latency thresholds, ρ bounds, EMA alpha |
profiler.* |
Synthetic latency generator (base, offsets, jitter, burst) |
data.* |
Dataset paths, train/val/test split boundaries |
matrix.yaml — experiment matrix; only read by make matrix:
| Section | Description |
|---|---|
experiment_matrix.seeds |
Random seeds for repeated runs |
experiment_matrix.runner.* |
Backend (native/docker), devices, timeout |
experiment_matrix.scenarios |
17 scenario definitions with per-scenario overrides |
Each scenario in matrix.yaml deep-merges its overrides onto config.yaml to produce a fully resolved per-run config. Training processes only ever see this merged result — they never read matrix.yaml directly.
Defined in matrix.yaml. 17 scenarios total, run with 3 seeds each. Results land in results/<session>/<scenario>/ and bestweights/<session>/<scenario>/.
| ID | Latency Profile | Compression | ρ | Scheduler | AUPRC | F1 | Payload/step |
|---|---|---|---|---|---|---|---|
| N01 | No latency | float32 | 1 | Off | 0.6396 ± 0.0032 | 0.6023 | 256.0 B |
| N02 | No latency | float16 | 1 | Off | 0.6384 ± 0.0106 | 0.6052 | 128.0 B |
| N03 | No latency | int8 | 1 | Off | 0.6395 ± 0.0078 | 0.5688 | 68.0 B |
| N04 | No latency | float32 | 3 | Off | 0.6473 ± 0.0001 | 0.6075 | 256.0 B |
| L05 | Low (~8 ms) | float32 | 1 | Off | 0.6409 ± 0.0070 | 0.5293 | 256.0 B |
| L06 | Low (~8 ms) | float16 | 1 | Off | 0.6431 ± 0.0083 | 0.5831 | 128.0 B |
| L07 | Low (~8 ms) | int8 | 1 | Off | 0.6422 ± 0.0061 | 0.6129 | 68.0 B |
| L08 | Low (~8 ms) | float32 | 3 | Off | 0.6479 ± 0.0004 | 0.6185 | 256.0 B |
| L09 | Low (~8 ms) | dynamic | 1 | Adaptive | 0.6415 ± 0.0060 | 0.6358 | 92.5 B |
| L10 | Low (~8 ms) | dynamic | dynamic | Adaptive + ρ | 0.6474 ± 0.0004 | 0.5815 | 92.5 B |
| H11 | High (~50 ms) | float32 | 1 | Off | 0.6393 ± 0.0057 | 0.6085 | 256.0 B |
| H12 | High (~50 ms) | float16 | 1 | Off | 0.6421 ± 0.0053 | 0.6308 | 128.0 B |
| H13 | High (~50 ms) | int8 | 1 | Off | 0.6407 ± 0.0099 | 0.5560 | 68.0 B |
| H14 | High (~50 ms) | float32 | 3 | Off | 0.6484 ± 0.0004 | 0.6265 | 256.0 B |
| H15 | High (~50 ms) | dynamic | 1 | Adaptive | 0.6381 ± 0.0072 | 0.5640 | 68.0 B |
| H16 | High (~50 ms) | dynamic | dynamic | Adaptive + ρ | 0.6483 ± 0.0003 | 0.6222 | 68.0 B |
| M17 | Mixed (per-client) | dynamic | dynamic | Adaptive + ρ | 0.6476 ± 0.0007 | 0.6368 | 128.2 B |
The paper also validates four representative strategies on 11 Raspberry Pi 4B clients connected to a cloud server over a real wide-area link. The profiler is disabled; measured RTT is approximately 21–24 ms, which places the adaptive scheduler in the int8 + ρ=3 regime.
| ID | Strategy | Compression | ρ | AUPRC | Total Payload | Sync Traffic | Runtime |
|---|---|---|---|---|---|---|---|
| P1 | Baseline | float32 | 1 | 0.6381 ± 0.0052 | 47.36 ± 10.26 MB | 75.77 ± 16.42 MB | 3280.1 ± 688.0 s |
| P2 | Compression only | float16 | 1 | 0.6434 ± 0.0053 | 23.93 ± 5.91 MB | 76.57 ± 18.92 MB | 3318.4 ± 814.3 s |
| P3 | Sparse synchronisation | float32 | 3 | 0.6479 ± 0.0008 | 22.83 ± 0.00 MB | 35.06 ± 0.00 MB | 3331.4 ± 25.7 s |
| P4 | Joint adaptive | dynamic | dynamic | 0.6482 ± 0.0014 | 6.07 ± 0.00 MB | 35.06 ± 0.00 MB | 3316.5 ± 10.3 s |
Results averaged across 3 random seeds (42, 52, 62). Values shown as mean ± std.
Communication cost metrics (total_payload_mb, sync_sent_mb) are summed across seeds to reflect aggregate system cost.
| Scenario | AUPRC | ROC-AUC | F1 | total_payload_mb (sum) | sync_sent_mb (sum) | throughput_sps | runtime_s | mem_peak_mb | cpu_percent | latency_ms |
|---|---|---|---|---|---|---|---|---|---|---|
| P1: float32, rho=1 (Baseline) | 0.6542 ± 0.0062 | 0.7150 ± 0.0063 | 0.6287 ± 0.0428 | 2576.37 | 239.08 | 7.70 ± 1.63 | 16054.9 ± 3553.6 | 678.0 ± 52.2 | 19.2 ± 0.3% | 27.09 ± 0.28 |
| P2: float16, rho=1 | 0.6552 ± 0.0034 | 0.7165 ± 0.0030 | 0.5897 ± 0.0475 | 1360.11 | 252.43 | 7.29 ± 1.64 | 16968.6 ± 3500.9 | 688.1 ± 47.8 | 19.2 ± 0.5% | 27.13 ± 1.12 |
| P3: float32, rho=3 | 0.6612 ± 0.0004 | 0.7228 ± 0.0013 | 0.6289 ± 0.0490 | 1132.57 | 105.19 | 13.41 ± 0.22 | 8941.8 ± 149.2 | 535.7 ± 11.2 | 23.0 ± 0.2% | 27.59 ± 0.38 |
| P4: Adaptive | 0.6608 ± 0.0029 | 0.7247 ± 0.0056 | 0.6404 ± 0.0191 | 823.73 | 100.01 | 13.30 ± 0.55 | 9019.8 ± 381.1 | 536.5 ± 6.7 | 22.9 ± 0.2% | 26.72 ± 1.16 |
Each training session produces:
bestweights/<session>/<scenario>/
├── best_client_<i>_round_<r>_model_<ts>.pth # best checkpoint per client
└── periodic/
├── client_<i>_round_<r>.pth # per-round paired checkpoints
└── server_round_<r>.pth
results/<session>/
├── <scenario>/
│ ├── training_log_client<i>_<ts>.csv # step-level: loss, latency, payload
│ ├── training_log_client<i>_meta.json # best checkpoint path + config
│ └── server_log_<session>.csv # per-round aggregation
├── <scenario>_eval_report.csv/.json # test metrics (source of truth)
├── graphics/ # paper figures (pdf + png)
│ ├── fig2_compression_auprc.pdf/.png
│ ├── fig3_efficiency_accuracy.pdf/.png
│ ├── fig4_cross_platform.pdf/.png
│ ├── fig5_rho_convergence.pdf/.png
│ ├── figA_scheduler_timeline.pdf/.png
│ └── figB_monthly_performance.pdf/.png
└── matrix_summary.csv # one row per scenario, all key metrics
The IEEE conference paper is in paper/. To compile:
cd paper/
pdflatex csc8114.tex
bibtex csc8114
pdflatex csc8114.tex
pdflatex csc8114.tex
Run
pdflatexthree times to resolve all cross-references and citations.