# Large-scale experiments and ENN baselines

This subtree contains the GPU-ready code used for the large-scale QM9 evaluation of the ★_G framework against equivariant neural network baselines:

* **Full QM9** (~134k molecules) for HOMO-LUMO gap and ZPVE
* **Molecular tensor prediction**: dipole vector µ (rank-1) and polarizability α (isotropic scalar and rank-2 tensor)
* **ENN baselines**: SchNet (invariant), e3nn-based SE(3)-equivariant model, MACE (current SOTA on QM9)
* **Matched protocols**: identical train/val/test splits, identical seeds, identical evaluation metrics across all methods

The MATLAB code in the rest of the repository remains the reference implementation. The PyTorch code here is a faithful re-implementation that scales to GPU and integrates with the `e3nn` / `mace-torch` baselines.

## Layout

```
large_scale/
├── starg_torch/                 PyTorch ★_G algebra
│   ├── algebra.py               Group + cached F_G + irrep tables (cyclic, dihedral, octahedral, products)
│   ├── product.py               Batched ★_G product (torch.fft for cyclic, einsum for general)
│   ├── svd.py                   Batched ★_G-SVD on GPU
│   ├── features.py              torch port of Algorithm 2 (extractStarGFeatures)
│   ├── neural.py                Neural ★_G (torch.nn.Module)
│   └── octahedral.py            24-element octahedral group + 5 irreps
├── data/
│   ├── qm9.py                   Full QM9 loader (PyG-compatible)
│   └── featurizers.py           molecule → tensor mapping for each group
├── targets/
│   ├── scalar.py                HOMO-LUMO gap, ZPVE, etc.
│   ├── vector.py                Dipole vector µ (rank-1 target)
│   └── tensor.py                Polarizability α (rank-2 target)
├── train_starg.py               unified ★_G entry point (ridge | neural)
├── train_baseline_mlp.py        Standard / Invariant / Augmented MLP
├── train_baseline_schnet.py     SchNet (invariant ENN baseline)
├── train_baseline_e3nn.py       e3nn-based equivariant baseline
├── train_baseline_mace.py       MACE (current SOTA)
├── eval_collect.py              merges per-method JSON results into a table
└── bsub/                        IBM CCC LSF submission files
    ├── submit_starg_ridge.bsub  array job 1..18 (target × seed)
    ├── submit_starg_neural.bsub array job 1..18
    ├── submit_mlp.bsub          array job 1..54 (mode × target × seed)
    ├── submit_schnet.bsub       array job 1..12 (scalars × seed)
    ├── submit_e3nn.bsub         array job 1..18
    ├── submit_mace.bsub         array job 1..18
    └── submit_all.sh            bsubs every .bsub above
```

## Reproducing the revised experiments on IBM CCC

### Setup (one-time)

The CCC compute nodes do not have conda; we use `module load` + user-level `pip install --user`. Each `.bsub` script does this on first run, so there is no manual env step beyond pushing the code:

```bash
# On the CCC login node, just push the code (see "Sending the code to the cluster" below)
# and submit. Dependencies install on the compute node from requirements.txt.
```

### Launch all jobs

```bash
cd ~/starg/python/large_scale
bash bsub/submit_all.sh
```

This issues six `bsub` calls, one per array job, totaling 138 slots (see `bsub/submit_all.sh` for the per-method counts).

Single-method launches:

```bash
bsub < bsub/submit_starg_ridge.bsub   # ★_G-SVD + Ridge only
bsub < bsub/submit_mace.bsub          # MACE only
```

To re-run a single array index (e.g. seed 1, target gap of MACE, which is array index 4):

```bash
bsub -J "starg_mace[4]" < bsub/submit_mace.bsub
```

Each array slot runs one (method, target, seed) combination and writes a JSON result file under `results///seed.json`. After all jobs finish, run `python eval_collect.py` to assemble the revised result tables.
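As a rough illustration of the collection step, the sketch below walks a results directory, groups the per-seed JSON files by their two parent directory names, and reports the mean and standard deviation of one metric. It is not the repository's `eval_collect.py`: the function name `collect_results`, the two-level directory grouping, and the `test_mae` key are assumptions made for this example.

```python
import json
import statistics
from collections import defaultdict
from pathlib import Path


def collect_results(root, metric="test_mae"):
    """Aggregate per-seed JSON files under `root` into mean/std per group.

    Assumptions (hypothetical, for illustration only): each file sits two
    directory levels below `root`, and stores a number under `metric`.
    """
    root = Path(root)
    groups = defaultdict(list)
    for path in sorted(root.rglob("*.json")):
        rel = path.relative_to(root)
        if len(rel.parts) != 3:
            continue  # skip files that do not match the assumed layout
        with path.open() as fh:
            record = json.load(fh)
        if metric in record:
            # Group key = the two directory names above the file.
            groups[rel.parts[:2]].append(float(record[metric]))

    table = {}
    for key, values in groups.items():
        # A single seed has no spread; report 0.0 instead of raising.
        std = statistics.stdev(values) if len(values) > 1 else 0.0
        table[key] = {
            "mean": statistics.fmean(values),
            "std": std,
            "n": len(values),
        }
    return table
```

Calling `collect_results("results")` from `python/large_scale/` would return a dictionary keyed by the two directory names, which can then be printed or written out as a table in whatever format is needed.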
## Methods covered

| Method | Target: HOMO-LUMO | Target: µ vector | Target: α scalar | Target: α tensor |
|---|---|---|---|---|
| ★_G-SVD + Ridge | yes | yes (per-component) | yes | yes (full tensor) |
| Neural ★_G | yes | yes | yes | yes |
| Standard MLP | yes | yes | yes | yes |
| Invariant MLP | yes | yes | yes | yes |
| Augmented MLP | yes | yes | yes | yes |
| SchNet | yes | n/a (invariant) | yes | n/a |
| e3nn (SE(3)-equiv) | yes | yes | yes | yes |
| MACE | yes | yes | yes | yes |

The ENN baselines are pulled at pinned versions (see `requirements.txt`) to ensure exact reproducibility.

## Running on a remote GPU cluster

The `bsub/` subdirectory contains LSF submission scripts that assume the repo lives at `$HOME/starg/` on the remote host and that QM9 `.xyz` files sit under `$QM9_DIR` (defaulting to `$HOME/data/qm9/dsgdb9nsd/`; override per-launch with `QM9_DIR=/path/to/your/qm9 bash bsub/submit_all.sh`). The scripts were written for IBM LSF (`bsub`) but the per-method invocations are plain Python, so they port to SLURM (`sbatch`) or local execution by replacing the `#BSUB` directives.

### Sending the code to the cluster

```bash
ccc=@
ssh "$ccc" 'mkdir -p ~/starg'
rsync -avz --exclude '.git' --exclude '__pycache__' \
      --exclude 'logs/' --exclude 'results/' \
      ./ "$ccc":~/starg/
```

`scp -r -C -p` works equivalently if `rsync` isn't available.

### Submit jobs

```bash
ssh "$ccc" 'cd ~/starg/python/large_scale && bash bsub/submit_all.sh'
ssh "$ccc" 'bjobs -l | head -40'   # watch
```

### Pull results back

```bash
rsync -avz "$ccc":~/starg/python/large_scale/results/ \
      ./python/large_scale/results/
```

### Optional: stash the host in `~/.ssh/config`

```
Host ccc
    HostName
    User
    ServerAliveInterval 60
    ControlMaster auto
    ControlPath ~/.ssh/cm-%r@%h:%p
    ControlPersist 10m
```

Once configured the commands above shorten to `ssh ccc ...`.

### Note on QM9 data location

The bsub files default to `QM9_DIR=$HOME/data/qm9/dsgdb9nsd`.
If your QM9 files live elsewhere, override at submit time without editing the .bsub files:

```bash
ssh ccc 'cd ~/starg/python/large_scale && QM9_DIR=/mnt/myshare/qm9/dsgdb9nsd bash bsub/submit_all.sh'
```

## Hardware sizing

A single A100 (40 GB) handles every ★_G method in <1 hour for full QM9. MACE on QM9 takes ~6 hours per seed on a single H100. The submission scripts request 1×H100, 16 cores, and 64 GB of host memory by default.
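
To turn these figures into a rough GPU-hour budget, the sketch below multiplies the array sizes listed under Layout by the runtimes quoted above. It assumes each array slot is one run at the quoted time (treating "<1 hour" as a 1-hour upper bound and "~6 hours per seed" as 6 hours per slot). That is one reading of the text rather than a measured figure, and the names `ARRAY_SLOTS`, `HOURS_PER_SLOT`, and `gpu_hour_budget` are made up for this example. Methods without a quoted runtime are left out of the hour totals.

```python
# Array sizes as listed in the Layout section (slots per submission file).
ARRAY_SLOTS = {
    "starg_ridge": 18,
    "starg_neural": 18,
    "mlp": 54,
    "schnet": 12,
    "e3nn": 18,
    "mace": 18,
}

# Assumed hours per slot, read off the sizing paragraph (not measured here).
HOURS_PER_SLOT = {
    "starg_ridge": 1.0,   # "<1 hour", used as an upper bound
    "starg_neural": 1.0,  # "<1 hour", used as an upper bound
    "mace": 6.0,          # "~6 hours per seed", taken as hours per slot
}


def gpu_hour_budget(slots, hours_per_slot):
    """Return {method: slots * hours} for methods with an assumed runtime."""
    return {m: slots[m] * h for m, h in hours_per_slot.items() if m in slots}


total_slots = sum(ARRAY_SLOTS.values())
budget = gpu_hour_budget(ARRAY_SLOTS, HOURS_PER_SLOT)

print(f"total array slots: {total_slots}")
for method, hours in budget.items():
    print(f"{method:<14} about {hours:.0f} GPU-hours (under the stated assumptions)")
```

Because array slots can run concurrently when enough GPUs are free, wall-clock time may be far lower than the summed GPU-hours.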