This subtree contains the GPU-ready code used for the large-scale QM9 evaluation of the ★_G framework against equivariant neural network (ENN) baselines.
The MATLAB code in the rest of the repository remains the reference
implementation. The PyTorch code here is a faithful re-implementation that
scales to GPU and integrates with the e3nn / mace-torch baselines.
```text
large_scale/
├── starg_torch/                  PyTorch ★_G algebra
│   ├── algebra.py                Group + cached F_G + irrep tables (cyclic, dihedral, octahedral, products)
│   ├── product.py                Batched ★_G product (torch.fft for cyclic, einsum for general)
│   ├── svd.py                    Batched ★_G-SVD on GPU
│   ├── features.py               torch port of Algorithm 2 (extractStarGFeatures)
│   ├── neural.py                 Neural ★_G (torch.nn.Module)
│   └── octahedral.py             24-element octahedral group + 5 irreps
├── data/
│   ├── qm9.py                    Full QM9 loader (PyG-compatible)
│   └── featurizers.py            molecule → tensor mapping for each group
├── targets/
│   ├── scalar.py                 HOMO-LUMO gap, ZPVE, etc.
│   ├── vector.py                 Dipole vector µ (rank-1 target)
│   └── tensor.py                 Polarizability α (rank-2 target)
├── train_starg.py                unified ★_G entry point (ridge | neural)
├── train_baseline_mlp.py         Standard / Invariant / Augmented MLP
├── train_baseline_schnet.py      SchNet (invariant ENN baseline)
├── train_baseline_e3nn.py        e3nn-based equivariant baseline
├── train_baseline_mace.py        MACE (current SOTA)
├── eval_collect.py               merges per-method JSON results into a table
└── bsub/                         IBM CCC LSF submission files
    ├── submit_starg_ridge.bsub   array job 1..18 (target × seed)
    ├── submit_starg_neural.bsub  array job 1..18
    ├── submit_mlp.bsub           array job 1..54 (mode × target × seed)
    ├── submit_schnet.bsub        array job 1..12 (scalars × seed)
    ├── submit_e3nn.bsub          array job 1..18
    ├── submit_mace.bsub          array job 1..18
    └── submit_all.sh             bsubs every .bsub above
```
The CCC compute nodes do not have conda, so we use module load plus pip install --user instead. Each .bsub script performs this install on its first run, so beyond pushing the code there is no manual environment step:

```bash
# After pushing the code (see "Sending files" below), just submit from the
# CCC login node. Dependencies install on the compute node from requirements.txt.
cd ~/starg/python/large_scale
bash bsub/submit_all.sh
```

This issues six bsub calls, one per array job, for a total of 138 array elements (18 + 18 + 54 + 12 + 18 + 18; see bsub/submit_all.sh for the per-method counts).
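
The conda-free, first-run install that each .bsub script performs could look roughly like the sketch below, meant to be sourced near the top of a script. The stamp file, the STAMP and REQUIREMENTS variables, and the python/3.10 module name are hypothetical placeholders, not values taken from the real scripts in bsub/.

```bash
#!/usr/bin/env bash
# Sketch of a first-run dependency step for a .bsub script (no conda).
# HYPOTHETICAL: the stamp-file path and the module name are placeholders.

STAMP="${STAMP:-$HOME/.starg_deps_installed}"

install_deps() {
  # Skip the install if a previous job already completed it.
  if [ -f "$STAMP" ]; then
    echo "dependencies already installed ($STAMP)"
    return 0
  fi
  # `module` comes from Environment Modules/Lmod and only exists on the
  # cluster; skip it when running elsewhere.
  if command -v module >/dev/null 2>&1; then
    module load python/3.10   # hypothetical module name
  fi
  # User-level install: packages land under ~/.local, no root needed.
  python3 -m pip install --user -r "${REQUIREMENTS:-requirements.txt}"
  touch "$STAMP"
}
```

A script would call install_deps once before launching its training command. Because all array elements share $HOME, several first runs could start installing at the same time; wrapping the call in flock would serialize them.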
Single-method launches:

```bash
bsub < bsub/submit_starg_ridge.bsub   # ★_G-SVD + Ridge only
bsub < bsub/submit_mace.bsub          # MACE only
```

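submit_all.sh amounts to the single-method launch above, repeated for every .bsub file. The sketch below shows such a loop plus a helper that sums the declared array sizes, as a check on the 138-element total. It assumes each script names its array on a line of the form #BSUB -J "name[1-N]" (standard LSF array syntax, not confirmed for these scripts); the function names and the DRY_RUN switch are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of submit_all.sh-style behaviour, not the actual script.
# ASSUMPTION: each .bsub declares its array as  #BSUB -J "name[1-N]".
set -euo pipefail

# Sum the array sizes declared in every <dir>/*.bsub file.
count_array_elements() {
  local dir="$1" total=0 f range lo hi
  for f in "$dir"/*.bsub; do
    # e.g. '#BSUB -J "starg_mace[1-18]"'  ->  '1-18'
    range=$(grep -m1 -oE '^#BSUB -J "?[^"[]*\[[0-9]+-[0-9]+\]' "$f" \
              | grep -oE '[0-9]+-[0-9]+' || true)
    [ -n "$range" ] || continue
    lo=${range%-*}
    hi=${range#*-}
    total=$(( total + hi - lo + 1 ))
  done
  echo "$total"
}

# Submit every script once; DRY_RUN=1 (hypothetical switch) only prints.
submit_all() {
  local dir="$1" f
  for f in "$dir"/*.bsub; do
    if [ -n "${DRY_RUN:-}" ]; then
      echo "bsub < $f"
    else
      bsub < "$f"
    fi
  done
}
```

On the login node, count_array_elements bsub should print 138 if the scripts follow the assumed -J format, and DRY_RUN=1 submit_all bsub previews the six bsub calls without submitting anything.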
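To re-run a single array index (e.g. MACE on the gap target with seed 1, which is array index 4):

```bash
bsub -J "starg_mace[4]" < bsub/submit_mace.bsub
```

Options given on the bsub command line take precedence over the script's #BSUB lines, so this submits only element 4.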
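
Inside each array element, LSF exposes the 1-based index as LSB_JOBINDEX. One way to turn it into a (target, seed) pair is sketched below. The ordering (seeds varying fastest, numbered from 1, three seeds per target) is an assumption, chosen only because it is consistent with the example above if gap is the second target; the real mapping is whatever the .bsub scripts define. decode_index and N_SEEDS are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch: map an LSF array index to (target position, seed).
# ASSUMPTION: seeds vary fastest and run 1..N_SEEDS; the real .bsub
# scripts may order targets and seeds differently.
set -euo pipefail

N_SEEDS="${N_SEEDS:-3}"   # hypothetical: 3 seeds per target

# Prints "<target_pos> <seed>", with target_pos counted from 0.
decode_index() {
  local idx="$1"
  local zero=$(( idx - 1 ))            # LSB_JOBINDEX starts at 1
  echo "$(( zero / N_SEEDS )) $(( zero % N_SEEDS + 1 ))"
}

# LSF sets LSB_JOBINDEX inside array jobs; default to 1 for local runs.
read -r TARGET_POS SEED <<< "$(decode_index "${LSB_JOBINDEX:-1}")"
echo "array index ${LSB_JOBINDEX:-1} -> target #${TARGET_POS}, seed ${SEED}"
```

Under this layout, index 4 decodes to target position 1 (the second target) with seed 1, and index 18 to the sixth target with seed 3.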
The full submission launches one job per (method, target, seed) combination. Each job writes a JSON result file to results/<method>/<target>/seed<k>.json. After all jobs finish, run python eval_collect.py to assemble the revised result tables.
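
Before running eval_collect.py, a quick completeness check can catch array elements that failed silently. The sketch below counts files that match the results/<method>/<target>/seed<k>.json layout; the expected total of 138 assumes the default job counts, and count_results and check_results are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Sketch: make sure every array element wrote its JSON before collecting.
# Layout per this README: results/<method>/<target>/seed<k>.json.
set -euo pipefail

# Count result files exactly three levels below the results root.
count_results() {
  local root="$1"
  [ -d "$root" ] || { echo 0; return 0; }
  find "$root" -mindepth 3 -maxdepth 3 -type f -name 'seed*.json' | wc -l
}

# Succeed only if at least <expected> result files are present.
check_results() {
  local root="$1" expected="$2" n
  n=$(( $(count_results "$root") ))   # arithmetic strips wc padding
  if [ "$n" -lt "$expected" ]; then
    echo "only $n of $expected result files under $root" >&2
    return 1
  fi
  echo "found $n result files (expected $expected)"
}
```

Usage: check_results results 138 && python eval_collect.py.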

Target coverage by method:

| Method | HOMO-LUMO gap | Dipole µ (vector) | Polarizability α (scalar) | Polarizability α (tensor) |
|---|---|---|---|---|
| ★_G-SVD + Ridge | yes | yes (per-component) | yes | yes (full tensor) |
| Neural ★_G | yes | yes | yes | yes |
| Standard MLP | yes | yes | yes | yes |
| Invariant MLP | yes | yes | yes | yes |
| Augmented MLP | yes | yes | yes | yes |
| SchNet | yes | n/a (invariant) | yes | n/a |
| e3nn (SE(3)-equiv) | yes | yes | yes | yes |
| MACE | yes | yes | yes | yes |

The ENN baselines are installed at pinned versions (see requirements.txt) so that runs can be reproduced.
The bsub/ subdirectory contains LSF submission scripts that assume the
repo lives at $HOME/starg/ on the remote host and that QM9 .xyz
files sit under $QM9_DIR (defaulting to
$HOME/data/qm9/dsgdb9nsd/; override per-launch with
QM9_DIR=/path/to/your/qm9 bash bsub/submit_all.sh). The scripts
were written for IBM LSF (bsub), but the per-method invocations are
plain Python, so they port to SLURM (sbatch) or to local execution by
replacing the #BSUB directives.
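
As an illustration of that port, a SLURM header roughly matching the default resources stated at the end of this README (one GPU, 16 CPU cores, 64 GB of host memory) might read as follows. The directive values, the logs/ output pattern and the array_index helper are assumptions; some clusters also require a GPU type in --gres. In the output pattern, %x, %A and %a are SLURM's placeholders for job name, array master job ID and array task index.

```bash
#!/usr/bin/env bash
# HYPOTHETICAL SLURM equivalent of one LSF array script (MACE shown).
#SBATCH --job-name=starg_mace
#SBATCH --array=1-18
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=16
#SBATCH --mem=64G
#SBATCH --output=logs/%x_%A_%a.out

# Array index from whichever scheduler started the job:
# LSF sets LSB_JOBINDEX, SLURM sets SLURM_ARRAY_TASK_ID; default 1 locally.
array_index() {
  echo "${LSB_JOBINDEX:-${SLURM_ARRAY_TASK_ID:-1}}"
}
```

SLURM does not create the logs/ directory for you, so create it before running sbatch.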
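
Sending files: from the repository root on your local machine, create the target directory and push the code:

```bash
ccc="<user>@<your-cluster-host>"   # placeholder: fill in your login
ssh "$ccc" 'mkdir -p ~/starg'
rsync -avz --exclude '.git' --exclude '__pycache__' \
      --exclude 'logs/' --exclude 'results/' \
      ./ "$ccc":~/starg/
```

If rsync isn't available, scp -r -C -p also works, but scp has no --exclude option, so .git, logs/ and results/ are copied as well.

Submit, monitor, and fetch the results:

```bash
ssh "$ccc" 'cd ~/starg/python/large_scale && bash bsub/submit_all.sh'
ssh "$ccc" 'bjobs -l | head -40'   # watch
rsync -avz "$ccc":~/starg/python/large_scale/results/ \
      ./python/large_scale/results/
```
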
To shorten these commands, add an entry like this to ~/.ssh/config:

```text
Host ccc
    HostName <your-cluster-host>
    User <your-username>
    ServerAliveInterval 60
    ControlMaster auto
    ControlPath ~/.ssh/cm-%r@%h:%p
    ControlPersist 10m
```

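Once this is configured, the commands above shorten to ssh ccc ... (or simply set ccc=ccc). The ControlMaster and ControlPersist lines let successive ssh and rsync calls share one authenticated connection, which stays open for 10 minutes after the last session closes.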
The bsub files default to QM9_DIR=$HOME/data/qm9/dsgdb9nsd. If
your QM9 files live elsewhere, override at submit
time without editing the .bsub files:

```bash
ssh ccc 'cd ~/starg/python/large_scale && QM9_DIR=/mnt/myshare/qm9/dsgdb9nsd bash bsub/submit_all.sh'
```

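A quick check on the login node can catch a wrong QM9_DIR before 138 jobs fail on it. The sketch below uses the same default-with-override pattern and counts the .xyz files; it assumes the standard QM9 release, which unpacks to 133,885 files named dsgdb9nsd_000001.xyz through dsgdb9nsd_133885.xyz. check_qm9_dir is a hypothetical helper, not part of the repository.

```bash
#!/usr/bin/env bash
# Sketch: default-with-override for QM9_DIR plus a file-count sanity check.
# ASSUMPTION: the standard QM9 release (133,885 dsgdb9nsd_*.xyz files).

QM9_DIR="${QM9_DIR:-$HOME/data/qm9/dsgdb9nsd}"

check_qm9_dir() {
  local dir="$1" expected="${2:-133885}" n
  if [ ! -d "$dir" ]; then
    echo "QM9_DIR not found: $dir" >&2
    return 1
  fi
  # Only look at the top level, where the release puts its .xyz files.
  n=$(find "$dir" -maxdepth 1 -type f -name 'dsgdb9nsd_*.xyz' | wc -l)
  n=$(( n ))   # strip any padding wc adds
  if [ "$n" -ne "$expected" ]; then
    echo "expected $expected .xyz files in $dir, found $n" >&2
    return 1
  fi
  echo "QM9_DIR ok: $dir ($n files)"
}
```

Usage: check_qm9_dir "$QM9_DIR" && bash bsub/submit_all.sh.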
A single A100 (40 GB) runs every ★_G method on full QM9 in under an hour. MACE on QM9 takes about 6 hours per seed on a single H100. By default, the submission scripts request one H100 plus 16 CPU cores and 64 GB of host memory.