# Large-scale experiments and ENN baselines

This subtree contains the GPU-ready code used for the large-scale
QM9 evaluation of the ★_G framework against equivariant neural network
baselines:

* **Full QM9** (~134k molecules) for HOMO-LUMO gap and ZPVE
* **Molecular tensor prediction**: dipole vector µ (rank-1), isotropic
  polarizability α (scalar), and full polarizability tensor α (rank-2)
* **ENN baselines**: SchNet (invariant), e3nn-based SE(3)-equivariant model,
  MACE (current SOTA on QM9)
* **Matched protocols**: identical train/val/test splits, identical seeds,
  identical evaluation metrics across all methods
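
The matched-protocol requirement can be made concrete with a split function that depends only on the dataset size and a seed. The following is a minimal sketch that assumes index-based splits with placeholder 80/10/10 fractions. `make_split` is a hypothetical name, not a function from `data/qm9.py`:

```python
import random


def make_split(n_molecules, seed, frac_train=0.8, frac_val=0.1):
    """Return disjoint (train, val, test) index lists, reproducible from `seed`.

    Hypothetical helper: the 80/10/10 fractions are placeholders, not the
    split sizes used in this repository.
    """
    indices = list(range(n_molecules))
    # A private Random instance keeps the split independent of global RNG state,
    # so model initialization or data augmentation cannot shift it.
    random.Random(seed).shuffle(indices)
    n_train = int(frac_train * n_molecules)
    n_val = int(frac_val * n_molecules)
    train = sorted(indices[:n_train])
    val = sorted(indices[n_train:n_train + n_val])
    test = sorted(indices[n_train + n_val:])
    return train, val, test


if __name__ == "__main__":
    # Every method calls the same function with the same seed, so all of them
    # train, validate, and test on the same molecules.
    train, val, test = make_split(1000, seed=0)
    print(len(train), len(val), len(test))
```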

The MATLAB code in the rest of the repository remains the reference
implementation. The PyTorch code here is a faithful re-implementation that
scales to GPU and integrates with the `e3nn` / `mace-torch` baselines.
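
The layout below lists `starg_torch/product.py` as using `torch.fft` for cyclic groups. Suppose that, for the cyclic group C_n, the ★_G product of two third-order tensors is a circular convolution along the group axis, which has the same structure as the tensor t-product. Under that assumption, the FFT route follows from the convolution theorem: transform the group axis, multiply matrix slices frequency by frequency, and transform back.

The pure-Python sketch below does not use torch, and all names in it are hypothetical. It checks that the direct definition and the Fourier route agree:

```python
import cmath
import math
import random

# Hypothetical layout: a third-order tensor is a nested list T[row][col][g],
# where g indexes the elements 0..n-1 of the cyclic group C_n.


def star_direct(A, B):
    """Group convolution over C_n: C[i][l][g] = sum_j sum_h A[i][j][h] * B[j][l][(g - h) % n]."""
    m, p, q, n = len(A), len(B), len(B[0]), len(A[0][0])
    return [[[sum(A[i][j][h] * B[j][l][(g - h) % n]
                  for j in range(p) for h in range(n))
              for g in range(n)]
             for l in range(q)]
            for i in range(m)]


def dft(xs, inverse=False):
    """Naive discrete Fourier transform; the inverse is scaled by 1/n."""
    n = len(xs)
    sign = 1 if inverse else -1
    out = [sum(xs[t] * cmath.exp(sign * 2j * math.pi * t * k / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def star_fft(A, B):
    """Same product via the convolution theorem: DFT, per-frequency matmul, inverse DFT."""
    m, p, q, n = len(A), len(B), len(B[0]), len(A[0][0])
    A_hat = [[dft(A[i][j]) for j in range(p)] for i in range(m)]
    B_hat = [[dft(B[j][l]) for l in range(q)] for j in range(p)]
    C = []
    for i in range(m):
        row = []
        for l in range(q):
            # Entry (i, l) of an independent matrix product at each frequency k.
            C_hat = [sum(A_hat[i][j][k] * B_hat[j][l][k] for j in range(p))
                     for k in range(n)]
            # The product of real inputs is real up to rounding error.
            row.append([v.real for v in dft(C_hat, inverse=True)])
        C.append(row)
    return C


def random_tensor(rng, rows, cols, n):
    return [[[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(cols)]
            for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    A, B = random_tensor(rng, 2, 3, 5), random_tensor(rng, 3, 4, 5)
    err = max(abs(x - y)
              for row_d, row_f in zip(star_direct(A, B), star_fft(A, B))
              for col_d, col_f in zip(row_d, row_f)
              for x, y in zip(col_d, col_f))
    print(f"max |direct - fft| = {err:.2e}")
```

The Fourier route turns one convolution over n group elements into n independent small matrix products. That structure batches naturally on a GPU.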

## Layout

```
large_scale/
├── starg_torch/                 PyTorch ★_G algebra
│   ├── algebra.py               Group + cached F_G + irrep tables (cyclic, dihedral, octahedral, products)
│   ├── product.py               Batched ★_G product (torch.fft for cyclic, einsum for general)
│   ├── svd.py                   Batched ★_G-SVD on GPU
│   ├── features.py              torch port of Algorithm 2 (extractStarGFeatures)
│   ├── neural.py                Neural ★_G (torch.nn.Module)
│   └── octahedral.py            24-element octahedral group + 5 irreps
├── data/
│   ├── qm9.py                   Full QM9 loader (PyG-compatible)
│   └── featurizers.py           molecule → tensor mapping for each group
├── targets/
│   ├── scalar.py                HOMO-LUMO gap, ZPVE, etc.
│   ├── vector.py                Dipole vector µ (rank-1 target)
│   └── tensor.py                Polarizability α (rank-2 target)
├── train_starg.py               unified ★_G entry point (ridge | neural)
├── train_baseline_mlp.py        Standard / Invariant / Augmented MLP
├── train_baseline_schnet.py     SchNet (invariant ENN baseline)
├── train_baseline_e3nn.py       e3nn-based equivariant baseline
├── train_baseline_mace.py       MACE (current SOTA)
├── eval_collect.py              merges per-method JSON results into a table
└── bsub/                        IBM CCC LSF submission files
    ├── submit_starg_ridge.bsub      array job 1..18  (target × seed)
    ├── submit_starg_neural.bsub     array job 1..18
    ├── submit_mlp.bsub              array job 1..54  (mode × target × seed)
    ├── submit_schnet.bsub           array job 1..12  (scalars × seed)
    ├── submit_e3nn.bsub             array job 1..18
    ├── submit_mace.bsub             array job 1..18
    └── submit_all.sh                bsubs every .bsub above
```
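
The array ranges in the tree are products of the swept factors. For example, 18 could be six targets × three seeds, 54 three MLP modes × six targets × three seeds, and 12 four scalar targets × three seeds. The sketch below shows one way to decode a 1-based LSF array index into a single combination. It assumes that factorization and row-major ordering. The target names and their order are hypothetical; the `.bsub` files define the real mapping.

```python
import itertools

# Hypothetical factor lists: six targets x three seeds reproduces the 1..18
# ranges and, with three MLP modes, the 1..54 range. The real names and their
# order are defined in the .bsub files.
TARGETS = ["gap", "zpve", "mu", "mu_vec", "alpha", "alpha_tensor"]
SEEDS = [0, 1, 2]
MLP_MODES = ["standard", "invariant", "augmented"]


def decode(job_index, *factors):
    """Map a 1-based array index (e.g. int(os.environ["LSB_JOBINDEX"])) to one combination.

    itertools.product enumerates in row-major order: the last factor varies fastest.
    """
    combos = list(itertools.product(*factors))
    if not 1 <= job_index <= len(combos):
        raise ValueError(f"index {job_index} is outside 1..{len(combos)}")
    return combos[job_index - 1]


if __name__ == "__main__":
    print(len(list(itertools.product(TARGETS, SEEDS))))             # 18
    print(len(list(itertools.product(MLP_MODES, TARGETS, SEEDS))))  # 54
    print(decode(1, TARGETS, SEEDS))   # ('gap', 0)
    print(decode(18, TARGETS, SEEDS))  # ('alpha_tensor', 2)
```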

## Reproducing the revised experiments on IBM CCC

### Setup (one-time)

The CCC compute nodes do not provide conda, so each .bsub script loads its
toolchain with `module load` and installs Python dependencies with
`pip install --user` on its first run. The only manual step is pushing the code:

```bash
# On the CCC login node, just push the code (see "Sending the code to the cluster" below)
# and submit. Dependencies install on the compute node from requirements.txt.
```

### Launch all jobs

```bash
cd ~/starg/python/large_scale
bash bsub/submit_all.sh
```

This issues six `bsub` calls, one per array job, for a total of 138 array
slots (see `bsub/submit_all.sh` for the per-method counts).

To launch a single method:

```bash
bsub < bsub/submit_starg_ridge.bsub      # ★_G-SVD + Ridge only
bsub < bsub/submit_mace.bsub             # MACE only
```
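
The 138-slot total quoted above is the sum of the array ranges in the Layout tree. The Python sketch below performs the same one-`bsub`-per-file fan-out, using sizes copied from the tree. `submit_all` is a hypothetical stand-in for the shell script. By default it does a dry run and only prints the commands:

```python
import subprocess
from pathlib import Path

# Array sizes copied from the Layout tree.
ARRAY_SIZES = {
    "submit_starg_ridge.bsub": 18,
    "submit_starg_neural.bsub": 18,
    "submit_mlp.bsub": 54,
    "submit_schnet.bsub": 12,
    "submit_e3nn.bsub": 18,
    "submit_mace.bsub": 18,
}


def submit_all(bsub_dir="bsub", dry_run=True):
    """Submit every .bsub file once (``bsub < file``); return the total slot count."""
    for name in ARRAY_SIZES:
        path = Path(bsub_dir) / name
        if dry_run:
            print(f"bsub < {path}")
        else:
            # bsub reads the job script, including its #BSUB directives, from stdin.
            with path.open() as script:
                subprocess.run(["bsub"], stdin=script, check=True)
    return sum(ARRAY_SIZES.values())


if __name__ == "__main__":
    print("total slots:", submit_all())
```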

To re-run a single array index (e.g. the MACE job for the HOMO-LUMO gap
target with seed 1, which is array index 4):

```bash
bsub -J "starg_mace[4]" < bsub/submit_mace.bsub
```

Each array task runs one (method, target, seed) combination; the MLP array
also sweeps the three MLP variants. Each task writes a JSON result file under
`results/<method>/<target>/seed<k>.json`. After all jobs finish, run
`python eval_collect.py` to assemble the revised result tables.
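
The following is a minimal sketch of the aggregation step. It assumes each `seed<k>.json` holds a flat dict of metrics with a hypothetical `test_mae` key; the actual `eval_collect.py` may use different keys and output formats. The sketch walks `results/<method>/<target>/` and reports the mean and sample standard deviation across seeds:

```python
import json
import statistics
from pathlib import Path


def collect(results_dir="results", metric="test_mae"):
    """Return {(method, target): (mean, std, n_seeds)} over seed<k>.json files.

    `metric` is a hypothetical key; adjust it to whatever the training scripts write.
    """
    table = {}
    for seed_file in sorted(Path(results_dir).glob("*/*/seed*.json")):
        method, target = seed_file.parts[-3], seed_file.parts[-2]
        with seed_file.open() as fh:
            value = json.load(fh)[metric]
        table.setdefault((method, target), []).append(value)
    summary = {}
    for key, values in table.items():
        # The sample standard deviation needs at least two seeds; report 0.0 for one run.
        std = statistics.stdev(values) if len(values) > 1 else 0.0
        summary[key] = (statistics.mean(values), std, len(values))
    return summary


if __name__ == "__main__":
    for (method, target), (mean, std, n) in sorted(collect().items()):
        print(f"{method:>12} {target:>14}  {mean:.4f} ± {std:.4f}  (n={n})")
```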

## Methods covered

| Method | HOMO-LUMO gap | µ (vector) | α (isotropic scalar) | α (rank-2 tensor) |
|---|---|---|---|---|
| ★_G-SVD + Ridge | yes | yes (per-component) | yes | yes (full tensor) |
| Neural ★_G | yes | yes | yes | yes |
| Standard MLP | yes | yes | yes | yes |
| Invariant MLP | yes | yes | yes | yes |
| Augmented MLP | yes | yes | yes | yes |
| SchNet | yes | n/a (invariant) | yes | n/a (invariant) |
| e3nn (SE(3)-equiv) | yes | yes | yes | yes |
| MACE | yes | yes | yes | yes |
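
The two α columns are related by a standard identity. The isotropic polarizability is one third of the trace of the rank-2 tensor, α_iso = tr(α)/3, and the trace is unchanged by a rotation α ↦ RαRᵀ. The individual tensor components do change under rotation, which is why an invariant model can target the scalar but not the full tensor. A minimal pure-Python check follows; the helper names are illustrative:

```python
import math


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(X):
    return [list(row) for row in zip(*X)]


def isotropic(alpha):
    """Isotropic polarizability: one third of the trace of the 3x3 tensor."""
    return (alpha[0][0] + alpha[1][1] + alpha[2][2]) / 3.0


def rotate(alpha, R):
    """A rank-2 tensor transforms as R alpha R^T under a rotation R."""
    return matmul(matmul(R, alpha), transpose(R))


def rot_z(theta):
    """Rotation by theta radians about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


if __name__ == "__main__":
    alpha = [[10.0, 1.0, 0.5], [1.0, 12.0, 0.0], [0.5, 0.0, 14.0]]
    rotated = rotate(alpha, rot_z(0.7))
    # The components change, but the isotropic value agrees up to rounding.
    print(isotropic(alpha), isotropic(rotated))
```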

The ENN baselines are installed at pinned versions (see `requirements.txt`)
so that every run uses the same baseline code.

## Running on a remote GPU cluster

The `bsub/` subdirectory contains LSF submission scripts that assume the
repo lives at `$HOME/starg/` on the remote host and that the QM9 `.xyz`
files sit under `$QM9_DIR`. `QM9_DIR` defaults to
`$HOME/data/qm9/dsgdb9nsd/` and can be overridden per launch with
`QM9_DIR=/path/to/your/qm9 bash bsub/submit_all.sh`. The scripts
were written for IBM LSF (`bsub`), but the per-method invocations are
plain Python. To port them to SLURM (`sbatch`), translate the `#BSUB`
directives into `#SBATCH` directives; to run locally, drop the directives.
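
When porting away from LSF, the main per-task change is where the array index comes from. LSF exposes it as `LSB_JOBINDEX`, and SLURM exposes it as `SLURM_ARRAY_TASK_ID`; submitting with `sbatch --array=1-18` keeps the indices 1-based. The sketch below accepts either variable and falls back to an explicit value for local runs. The function name and fallback behavior are assumptions, not code from this repository:

```python
import os


def array_index(default=None, environ=None):
    """Return the array-task index from LSF or SLURM, else `default`."""
    env = os.environ if environ is None else environ
    lsf = env.get("LSB_JOBINDEX")
    if lsf and int(lsf) > 0:  # LSF sets LSB_JOBINDEX to 0 for non-array jobs
        return int(lsf)
    slurm = env.get("SLURM_ARRAY_TASK_ID")
    if slurm is not None:
        return int(slurm)  # use --array=1-N so indices match the LSF numbering
    if default is None:
        raise RuntimeError("no array index found; pass one explicitly for local runs")
    return default


if __name__ == "__main__":
    print(array_index(environ={"LSB_JOBINDEX": "4"}))
    print(array_index(environ={"SLURM_ARRAY_TASK_ID": "7"}))
    print(array_index(default=1, environ={}))
```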

### Sending the code to the cluster

```bash
ccc=<user>@<your-cluster-host>
ssh "$ccc" 'mkdir -p ~/starg'
rsync -avz --exclude '.git' --exclude '__pycache__' \
    --exclude 'logs/' --exclude 'results/' \
    ./ "$ccc":~/starg/
```

If `rsync` isn't available, `scp -r -C -p` also works. However, `scp` has no
`--exclude` option, so it also copies everything the `rsync` command above skips.

### Submit jobs

```bash
ssh "$ccc" 'cd ~/starg/python/large_scale && bash bsub/submit_all.sh'
ssh "$ccc" 'bjobs -l | head -40'    # watch
```

### Pull results back

```bash
rsync -avz "$ccc":~/starg/python/large_scale/results/ \
    ./python/large_scale/results/
```

### Optional: stash the host in `~/.ssh/config`

```
Host ccc
    HostName            <your-cluster-host>
    User                <your-username>
    ServerAliveInterval 60
    ControlMaster       auto
    ControlPath         ~/.ssh/cm-%r@%h:%p
    ControlPersist      10m
```

Once this is configured, `ccc` works as a host alias: set `ccc=ccc` for the
commands above, or type `ssh ccc ...` directly.

### Note on QM9 data location

The bsub files default to `QM9_DIR=$HOME/data/qm9/dsgdb9nsd`. If
your QM9 files live elsewhere, override at submit
time without editing the .bsub files:

```bash
ssh ccc 'cd ~/starg/python/large_scale && QM9_DIR=/mnt/myshare/qm9/dsgdb9nsd bash bsub/submit_all.sh'
```
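
On the Python side, the same override pattern can be written as an environment lookup with a default. The sketch below assumes the loader reads `QM9_DIR` directly; the function names are hypothetical. It also assumes the directory holds one `.xyz` file per molecule, as in the `dsgdb9nsd` distribution:

```python
import os
from pathlib import Path

# Same default as the .bsub files.
DEFAULT_QM9_DIR = "~/data/qm9/dsgdb9nsd"


def resolve_qm9_dir(environ=None):
    """Return $QM9_DIR if it is set and non-empty, else the default, with ~ expanded."""
    env = os.environ if environ is None else environ
    return Path(env.get("QM9_DIR") or DEFAULT_QM9_DIR).expanduser()


def count_xyz(qm9_dir):
    """Count .xyz files directly under qm9_dir; 0 if the directory is missing."""
    qm9_dir = Path(qm9_dir)
    if not qm9_dir.is_dir():
        return 0
    return sum(1 for _ in qm9_dir.glob("*.xyz"))


if __name__ == "__main__":
    qm9_dir = resolve_qm9_dir()
    print(f"{qm9_dir}: {count_xyz(qm9_dir)} .xyz files")
```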

## Hardware sizing

A single A100 (40 GB) runs every ★_G method on full QM9 in under an hour.
MACE on QM9 takes about 6 hours per seed on a single H100. By default, the
submission scripts request one H100 GPU, 16 CPU cores, and 64 GB of host memory.