tensor-group-sym / latex / main.tex
main.tex
Raw

\documentclass[11pt]{article}
\usepackage[margin=1in]{geometry}
\usepackage{amsmath,amssymb,amsthm}
\usepackage{graphicx}
\usepackage{booktabs}
\usepackage{array}
\usepackage{hyperref}
\usepackage{authblk}
\usepackage{lineno}
\usepackage{natbib}
\usepackage{xr-hyper}
% Cross-reference labels from supplementary.tex (algorithms, sec:lean).
% Compile supplementary.tex first so supplementary.aux exists; main.tex
% then resolves \ref{alg:...} and \ref{sec:lean} from it.
\externaldocument{supplementary}

% Figure files are placed in a figures/ subfolder
\graphicspath{{figures/}}

\newtheorem{theorem}{Theorem}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{definition}[theorem]{Definition}

\newcommand{\starG}{\star_G}
\newcommand{\starM}{\star_M}
\newcommand{\calA}{\mathcal{A}}
\newcommand{\calB}{\mathcal{B}}
\newcommand{\calC}{\mathcal{C}}
\newcommand{\calI}{\mathcal{I}}
\newcommand{\calS}{\mathcal{S}}
\newcommand{\calT}{\mathcal{T}}
\newcommand{\calU}{\mathcal{U}}
\newcommand{\calV}{\mathcal{V}}
\newcommand{\calX}{\mathcal{X}}
\newcommand{\bbR}{\mathbb{R}}
\newcommand{\bbZ}{\mathbb{Z}}
\newcommand{\Rsq}{R\textsuperscript{2}}

% Use the journal-supplied naturemag.bst (included in Nature's LaTeX template
% package). If that file is not yet available, unsrtnat is an acceptable
% substitute during internal review.
\hypersetup{colorlinks=true,linkcolor=blue,citecolor=blue,urlcolor=blue}

\begin{document}

\linenumbers   % required for peer-review submission

\title{\textbf{Group-Algebraic Tensors: Provably-optimal Equivariant Learning and Physical Symmetry Discovery}}

\author[4]{Paulina Hoyos}
\author[1]{Shashanka Ubaru}
\author[5]{Dongsung Huh}
\author[1]{Vasileios Kalantzis}
\author[1]{{Kenneth L. Clarkson}}
\author[3]{Misha Kilmer}
\author[2]{Haim Avron}
\author[1]{Lior Horesh}

\affil[1]{IBM Research}
\affil[2]{Tel-Aviv University}
\affil[3]{Tufts University}
\affil[4]{UT Austin}
\affil[5]{Independent}

\date{}
\maketitle

\begin{abstract}
We introduce the $\starG$ tensor algebra, in which any finite group $G$
defines the multiplication rule, making equivariance an intrinsic
algebraic property rather than an architectural constraint. The
framework rests on three machine-verified theoretical pillars: (i)~an
Eckart--Young optimality guarantee for the $\starG$-SVD: the first
such result for symmetry-preserving tensor approximation, exact and
polynomial-time where Tucker is only $\sqrt{d}$-quasi-optimal, CP is
NP-hard, and tensor-train has no global optimality; (ii)~a Kronecker
factorization that composes multiple symmetries by replacing $F_G$ with
$F_{G_1} \otimes F_{G_2}$ with no architectural redesign; and (iii)~a
600-line Lean~4 formalization of the $\starG$ algebra with zero
unresolved proof obligations. The framework provides capabilities that
equivariant neural networks (ENNs) structurally cannot: a closed-form
per-irreducible-representation decomposition of every prediction, and
data-driven discovery of the symmetry group that best fits a dataset.
As a non-trivial empirical demonstration, decomposing QM9 molecular
geometry over the chiral octahedral subgroup of SO(3) recovers the
Wigner--Eckart selection rules of angular momentum from data alone,
with no quantum-mechanical input: scalar properties are A$_1$-dominated,
dipole components are T$_1$-dominated, the isotropic polarizability is
uniquely insensitive to $l\!=\!1$ as the rank-2-trace decomposition
$l\!=\!0 \oplus l\!=\!2$ requires, and the T$_1$/A$_1$ predictive-power
ratio separates vector observables from scalar observables by a factor
of five. On full QM9 (130{,}831 molecules), $\starG$-SVD with ridge
regression provides closed-form predictions at $\sim$50--90$\times$
fewer parameters than parameter-matched MLPs and competitive accuracy
in the parameter-efficient regime; a within-isomer audit shows that
the apparent advantage of larger models on pooled $R^2$ is largely a
size-prediction effect that vanishes once chemistry is controlled for.
Algebraic equivariance thus complements architectural equivariance not
as a faster--better--cheaper alternative but as a different
mathematical affordance: provably-optimal symmetry-preserving
compression, machine-verified equivariance, per-irrep interpretability,
and data-driven physical discovery.
\end{abstract}

\section{Introduction}

Much of the data encountered in science and engineering is inherently
multidimensional: molecular configurations encode three-dimensional atomic
positions, quantum states exist in exponentially large Hilbert spaces, and sensor
arrays sample signals across spatial and temporal domains. Traditional machine
learning methods typically vectorize such data, collapsing its natural tensor
structure into flat feature vectors~\citep{kolda2009tensor,sidiropoulos2017tensor}.
This is akin to unfolding an origami crane into a flat sheet: the operation is
technically lossless, but the geometry that gives the object its meaning is
destroyed. All subsequent processing must then recover, implicitly and at great
computational cost, the structure that was discarded at the outset.

Symmetry compounds this problem. A molecule exists in three-dimensional space:
it can be rotated without changing its properties (rotational symmetry), and its
identical atoms can be indexed in any order without changing the molecule itself
(permutation symmetry). These symmetries coexist and
interact~\citep{noether1918,bronstein2021geometric}, yet vectorization treats them
as incidental features of the data rather than as fundamental structural
constraints.

The dominant paradigm for incorporating symmetry is through Equivariant Neural
Networks
(ENNs)~\citep{cohen2016group,thomas2018tensor,fuchs2020se3,batzner2022e3}, which
have achieved remarkable success in molecular property
prediction~\citep{schutt2017schnet} and protein
structure~\citep{jumper2021alphafold}. ENNs handle symmetry through architecture:
to respect a rotation, one engineers rotation-equivariant layers; to respect a
permutation, one engineers permutation-equivariant layers. But when multiple
symmetries coexist, as they do in virtually all physical systems, the
architectural approach faces a combinatorial wall. The blueprint must be
redesigned from scratch for each new combination of symmetries, with no guarantee
that the resulting representation is optimal in any rigorous sense. The physics
is hard-coded into the network topology, and changing the physics means rebuilding
the network.

In this article, we propose a different philosophy. Instead of constraining the
architecture to fit the symmetry, we change the mathematics to fit the geometry
of the data. Building on the $\starM$ algebra of Kilmer and
collaborators~\citep{kilmer2021tensor,kernfeld2015tensor}, we construct a tensor algebra,
$\starG$, where any finite group $G$ defines the multiplication rule. The
resulting algebra inherits equivariance as an intrinsic property of
multiplication, not as an architectural constraint. Composing multiple symmetries
requires only specifying the direct product $G_1 \times G_2 \times \cdots \times
G_d$; no redesign is needed. The $\starG$ algebra admits an SVD with provable
Eckart--Young optimality (Theorem~\ref{thm:eckart_young}), and, by the
Peter--Weyl theorem~\citep{serre1977linear,peterweil1927}, decomposes naturally
into irreducible representation channels that can reveal the symmetry content of
physical observables (Section~\ref{sec:wigner_eckart}). The Wigner--Eckart
theorem (1931) states that matrix elements of tensor operators between angular
momentum eigenstates factorize into a geometric part (Clebsch--Gordan
coefficient) and a reduced matrix element independent of magnetic quantum numbers,
implying selection rules: an operator of rank $l$ couples only states whose
angular momenta differ by at most $l$. We show empirically that these rules are
recoverable from data alone via $\starG$ decomposition.

\begin{figure}[!t]
\centering
\includegraphics[width=\textwidth]{Figure1.pdf}
\caption{\textbf{The $\starG$ tensor algebra: from optimal decomposition to
symmetry discovery.}
(\textbf{Top left, From molecules to algebra}) Molecular data measured under all
elements of a symmetry group $G$ form a structured tensor
$\calA \in \bbR^{n \times d \times |G|}$, preserving geometric information that
is destroyed by vectorization into $A \in \bbR^{n \cdot d \times |G|}$.
(\textbf{Top right, The $\starG$ product}) Two tensors are multiplied via group
convolution along the tube dimension, computed efficiently in the Fourier domain
via the Peter--Weyl theorem: $F_G$ transforms each tensor to its block-diagonal
spectral form, standard matrix products are applied per irreducible representation
block, and $F_G^{-1}$ returns the result. The group $G$ can be any finite group
(a single symmetry or a product $G_1 \times G_2 \times \cdots$) with no
architectural changes required.
(\textbf{Bottom left, The $\starG$-SVD}) Every $\starG$-tensor admits a
factorization $\calA = \calU \starG \calS \starG \calV^H$. The rank-$k$
truncation $\calA_k$ is provably optimal:
$\|\calA - \calA_k\|_F \leq \|\calA - \calB\|_F$ for any rank-$k$ equivariant
tensor $\calB$ (Eckart--Young theorem for $\starG$, Theorem~\ref{thm:eckart_young}).
(\textbf{Bottom right, From Eckart--Young to Wigner--Eckart}) The same algebraic
framework that delivers optimal low-rank compression also serves as a spectroscope
for physical symmetry. Decomposing predictive power by irreducible representation
(irrep) over the octahedral group recovers the Wigner--Eckart selection rules
directly from molecular geometry data: scalar properties are dominated by the
$l\!=\!0$ (A$_1$) channel, dipole vector components require the $l\!=\!1$
(T$_1$) channel, and polarizability is uniquely insensitive to $l\!=\!1$.}
\label{fig:framework}
\end{figure}

\section{Results}

\subsection{The \texorpdfstring{$\starG$}{starG} Algebra}
\label{sec:theory}

\subsubsection*{Theoretical Foundation}

Let $G$ be a finite group of order $n$ with elements $\{g_1, g_2, \ldots, g_n\}$. We define the \emph{convolution tensor} $\calT \in \bbR^{n \times n \times n}$ by
\begin{equation}
\calT(a, b, c) = \begin{cases} 1 & \text{if } ab = c \\ 0 & \text{otherwise}
\end{cases}
\end{equation}
for all $a, b, c \in G$. This tensor encodes the complete multiplication table of $G$: its frontal slices are permutation matrices, and it satisfies an associativity identity inherited directly from group associativity. By the Peter--Weyl theorem~\citep{serre1977linear,peterweil1927}, $\calT$ admits a spectral decomposition:
\begin{equation}\label{eq:spectral}
\calT(a, b, c) = \sum_{i,j,k} \calC(i, j, k) \, F_G(a, i) \, F_G(b, j) \, F_G^{-1}(c, k),
\end{equation}
where $F_G$ is a generalized Group Fourier transform matrix assembled from the
irreducible unitary representations (irreps) of~$G$, and $\calC$ is a sparse core
tensor encoding the block-diagonal matrix multiplication structure of the irreps
of~$G$. For abelian groups, $F_G$ is a generalized Fourier matrix (for cyclic groups, it reduces to the standard DFT matrix) and $\calC$ is
diagonal; for non-abelian groups, $F_G$ is an invertible matrix. The precise
definition of $F_G$ is provided in the Supplementary Information (SI Section~2).
Crucially, equation (\ref{eq:spectral}) means that group convolution in the
original domain corresponds to \emph{block-diagonal matrix multiplication} in the
Fourier domain: one independent matrix product per irrep block, enabling highly
efficient computation.

We view order-$(2+d)$ tensors $\calA \in \bbR^{\ell \times m \times n_1 \times
\cdots \times n_d}$ as $\ell \times m$ matrices whose entries (which are
$d$-order tensors) lie in the convolutional ring $\mathbb{K}_G$ for
$G = G_1 \times \cdots \times G_d$ and $n_i = |G_i|$. The \emph{$\starG$ product}  of $\cal{A}$ with $\calB \in \bbR^{m \times p \times n_1 \times
\cdots \times n_d}$
is defined via group convolution along the group dimensions:
\begin{equation}
(\calA \starG \calB)_{ij}(c_1, \ldots, c_d) = \sum_{k} \sum_{(a_1, \ldots,
a_d) \in G} \calA_{ik}(a_1, \ldots, a_d) \, \calB_{kj}(a_1^{-1}c_1, \ldots,
a_d^{-1}c_d).
\end{equation}
This product defines a novel tensor algebra~\citep{kernfeld2015tensor}
that generalizes classical matrix algebra while embedding the symmetry group $G$
directly into the multiplicative structure. The resulting algebraic system
supports a full suite of matrix-mimetic operations (inverses, transposes, norms,
and decompositions), all inheriting equivariance by construction. Equivariance
is thus a property of the algebra, not an imposed constraint. The $\starG$
product can be computed efficiently by (i)~applying $F_G$ to each tensor along
its group dimension, (ii)~performing standard matrix products at each of the
$|\hat{G}|$ Fourier irreps independently in parallel, and
(iii)~applying $F_G^{-1}$ to recover the result. For Abelian groups $G$, the total cost is
$O(n \ell m p + n \log n)$ including the Fourier transforms, matching the
complexity of a single matrix product up to logarithmic factors.

The \emph{$\starG$-Hermitian transpose} $\calA^H \in \bbR^{m \times \ell \times
n_1 \times \cdots \times n_d}$ is defined entry-wise by
\begin{equation}\label{eq:transpose}
(\calA^H)_{ij}(g_1,\ldots,g_d) = \overline{\calA_{ji}(g_1^{-1},\ldots,g_d^{-1})},
\end{equation}
where the overline denotes complex conjugation (trivial for real-valued tensors).
Equivalently, in the Fourier domain,
$\widehat{\calA^H}(:,:,\rho) = \hat{\calA}(:,:,\rho)^H$ for every irrep $\rho$
(see SI Section~3 for the precise definition of $\hat{\calA}(:,:,\rho)$),
so the $\starG$-transpose maps to the ordinary matrix Hermitian transpose at each
irrep block. This definition makes $\starG$-unitarity and the SVD factor
conditions below fully analogous to their matrix counterparts.

\subsubsection*{The $\starG$-SVD and Optimality}

Every tensor $\calA$ in the $\starG$ algebra admits a singular value
decomposition $\calA = \calU \starG \calS \starG \calV^H$, where $\calU$ and
$\calV$ are $\starG$-unitary (satisfying $\calU^H \starG \calU = \calI$) and
$\calS$ is f-diagonal (its frontal slices are diagonal matrices) with
non-negative real entries, the \emph{singular tubes} $\mathbf{s}_i$. This
decomposition is computed exactly by applying the group Fourier transform along
the group dimension, performing standard matrix SVDs independently at each
Fourier irrep, and applying the inverse group Fourier transform, a procedure that
is both exact and computationally efficient.

\begin{theorem}[Eckart--Young for $\starG$]\label{thm:eckart_young}
The rank-$k$ truncation $\calA_k$ minimizes $\|\calA - \calB\|_F^2$ over all
tensors $\calB$ of $\starG$-rank at most~$k$, with
$\|\calA - \calA_k\|_F^2 = \sum_{i=k+1}^{r} \|\mathbf{s}_i\|_F^2$.
\end{theorem}

\noindent Full proof is provided in the Supplementary Information (SI Section~5), and a machine-verified Lean~4 formalization is available in the code repository (see Code Availability).  The $\starG$-rank of $\calA$ is the number of non-zero singular tubes in the $\starG$-SVD $\calA = \calU \starG \calS \starG \calV^H$.

This result is a direct analogue of the classical matrix Eckart--Young
theorem~\citep{eckart1936approximation}: just as the rank-$k$ matrix SVD
provides the best rank-$k$ approximation in Frobenius norm, the $\starG$-SVD
provides the best $\starG$-rank-$k$ approximation among all group-equivariant
tensors of that rank. This is the first such optimality guarantee for
symmetry-preserving tensor approximation; by contrast, Tucker/HOSVD gives only
quasi-optimal bounds with a $\sqrt{d}$ factor~\citep{desilva2008tensor}, CP
decomposition is NP-hard to compute optimally, and tensor-train has no global
optimality guarantee. The error has a closed-form expression in terms of the
discarded singular tube norms, enabling principled rank selection with full
control over approximation quality. In practice, $\starG$-rank-$k$ features
derived from the leading singular tubes carry the maximal group-equivariant
information about the data for any given parameter budget.

\subsubsection*{Composing Multiple Symmetries}

\begin{theorem}[Product Groups]\label{thm:product_ring}
For $G = G_1 \times \cdots \times G_d$, the convolution tensor factorizes as
$\calT_G = \calT_{G_1} \otimes \cdots \otimes \calT_{G_d}$, and the generalized
Fourier matrix is $F_G = F_{G_1} \otimes \cdots \otimes F_{G_d}$.
\end{theorem}

\noindent Full proof is provided in the Supplementary Information (SI Section~6).

This Kronecker structure is the algebraic reason why multiple symmetries compose
without architectural redesign. The factorization of $\calT_G$ implies that the
irreps of a product group are exactly the tensor products of the factors' irreps,
so the Fourier-domain block-diagonal structure is the Kronecker product of the
individual block-diagonal structures. Concretely, for
$G = \bbZ_{n_1} \times \bbZ_{n_2}$, the Group Fourier transform
$F_G = \mathrm{DFT}_{n_1} \otimes \mathrm{DFT}_{n_2}$ computes a 2D DFT,
resolving coupled frequencies $(f_1, f_2)$ that are entirely invisible to either
factor group alone. Adding a new symmetry $G_{d+1}$ to an existing $\starG$
model requires only replacing $F_G$ with $F_G \otimes F_{G_{d+1}}$; no layers
are redesigned and no weights are reinitialized.

\subsection{Experiment 1: Synthetic Validation}
\label{sec:synthetic_exp}

We first validated the $\starG$ framework on controlled synthetic data to confirm
that the algebraic guarantees translate into empirical performance. We generated
1{,}000 synthetic molecules with exact $\bbZ_{12}$ rotational symmetry and
compared $\starG$-SVD with ridge regression against four baselines (Augmented
MLP, Neural $\starG$, Standard MLP, and Invariant MLP) across three metrics:
predictive accuracy (\Rsq), rotational invariance (variance of predictions under
unseen rotations), and parameter efficiency. Cyclic structure was verified at
machine precision ($4.2 \times 10^{-16}$). The target property $\mathbf{y}$ combines mean interatomic distance, distance variance, and atomic number contributions, representing a rotationally invariant scalar analogous to size-dependent molecular properties such as polarizability.

Results are summarized in Table~\ref{tab:synthetic} and Figure~\ref{fig:synthetic}.
The $\starG$-SVD achieves perfect prediction (\Rsq{} $= 1.000 \pm 0.000$) and
exact invariance (rotation variance $5.8 \times 10^{-31}$) using only 101
parameters, compared to 5{,}249--14{,}465 for all neural baselines. The Standard
MLP achieves \Rsq{} $= 0.377$ with rotation variance $0.14$, confirming that
without explicit symmetry handling the model neither learns well nor respects the
symmetry. The Augmented MLP (\Rsq{} $= 0.998$) achieves near-perfect accuracy
but retains residual rotation variance ($3.7 \times 10^{-5}$) five orders of
magnitude larger than $\starG$-SVD, illustrating that augmentation approximates
but does not algebraically guarantee invariance. Figure~\ref{fig:synthetic}b
shows a 30-orders-of-magnitude gap in rotation variance between $\starG$-SVD and
all non-algebraic methods, a qualitative difference rather than merely a
quantitative improvement. The predicted-versus-true scatter plot
(Fig.~\ref{fig:scatter}) makes this vivid: $\starG$-SVD produces a perfect
diagonal, while the Standard MLP produces near-random scatter.

\begin{table}[h]
\centering
\caption{Synthetic validation ($\bbZ_{12}$, 1{,}000 molecules, 3 seeds).}
\label{tab:synthetic}
\begin{tabular}{@{}lccc@{}}
\toprule
\textbf{Method} & \textbf{Test \Rsq} & \textbf{Rot.\ Variance} & \textbf{Params} \\
\midrule
$\starG$-SVD + Ridge & $\mathbf{1.000 \pm 0.000}$ & $5.8 \times 10^{-31}$ & \textbf{101} \\
Augmented MLP       & $0.998 \pm 0.000$           & $3.7 \times 10^{-5}$  & 5{,}249 \\
Neural $\starG$     & $0.697 \pm 0.063$           & $1.0 \times 10^{-30}$ & 8{,}641 \\
Standard MLP        & $0.377 \pm 0.054$           & $1.4 \times 10^{-1}$  & 5{,}249 \\
Invariant MLP       & $0.327 \pm 0.147$           & $\sim 0$              & 14{,}465 \\
\bottomrule
\end{tabular}
\end{table}

\begin{figure}[!ht]
\centering
\includegraphics[width=\textwidth]{fig2_synthetic.pdf}
\caption{\textbf{Synthetic validation ($\bbZ_{12}$).}
(\textbf{a})~Test \Rsq{}.
(\textbf{b})~Rotation variance (log scale): 30-orders-of-magnitude gap between
$\starG$-SVD and all non-algebraic methods.
(\textbf{c})~Parameter efficiency (Pareto frontier): $\starG$-SVD dominates all
baselines on both axes simultaneously.
(\textbf{d})~Multi-metric normalized summary scores.}
\label{fig:synthetic}
\end{figure}

\begin{figure}[!ht]
\centering
\includegraphics[width=0.85\textwidth]{fig_scatter.pdf}
\caption{\textbf{Predicted vs.\ true (synthetic).}
(\textbf{a})~$\starG$-SVD: perfect diagonal at \Rsq{} $= 1.000$.
(\textbf{b})~Standard MLP: near-random scatter at \Rsq{} $= -0.084$,
illustrating the cost of ignoring symmetry.}
\label{fig:scatter}
\end{figure}

\subsection{Experiment 2: QM9 Molecular Property Prediction}
\label{sec:qm9_exp}

Having established correctness on synthetic data, we evaluated the $\starG$
framework on the QM9 benchmark~\citep{ramakrishnan2014qm9}, a widely used dataset
of 134{,}000 small organic molecules with up to nine heavy atoms (C, H, O, N, F),
each annotated with 12 quantum chemical properties computed at the DFT level of
theory. QM9 has become a standard testbed for machine learning methods in
computational chemistry, particularly for evaluating how well models leverage
molecular symmetry: molecules are invariant under 3D rotation and atomic index
permutation, and models that exploit these symmetries empirically generalize
better with fewer samples~\citep{batzner2022e3,schutt2017schnet}. We focus on
predicting the HOMO--LUMO gap (the difference in energy between the highest
occupied and lowest unoccupied molecular orbitals), a property of direct
relevance to photovoltaics, drug design, and materials discovery.

We used 1{,}000 molecules with $\bbZ_{12}$ rotational featurization. Results are
presented in Table~\ref{tab:qm9} and Figure~\ref{fig:qm9}. The $\starG$-SVD with
ridge regression is the \emph{only} method to achieve positive \Rsq{} on this
real-world task, reaching \Rsq{} $= 0.556 \pm 0.047$ and RMSE $= 0.035$~Ha with
just 107 parameters. All pure neural baselines collapse: the Standard MLP reaches
\Rsq{} $= -10.99 \pm 5.90$, the Invariant MLP \Rsq{} $= -3.85 \pm 1.58$, and
the Neural $\starG$ (which also uses algebraic equivariance) reaches
\Rsq{} $= -5.15 \pm 3.10$. The Augmented MLP achieves positive
\Rsq{} $= 0.384$ but requires $49\times$ more parameters and leaves
substantially larger residuals. The $\starG$-SVD's rotation variance
($3.2 \times 10^{-31}$) is at floating-point noise, while all MLPs show
residual rotational sensitivity. The learning curves
(Figure~\ref{fig:learning_curves}) reveal that this advantage is not merely a
small-sample effect: $\starG$-SVD maintains positive \Rsq{} from as few as 100
molecules, while neural baselines overfit catastrophically at small sample sizes
and only approach competitive performance near 1{,}000 samples. This data
efficiency follows directly from optimal algebraic compression: 107 well-chosen
equivariant features capture more structural information than thousands of learned
neural parameters. The results demonstrate that in data-scarce scientific
settings, algebraic structure is a stronger inductive bias than architectural
expressivity.

\begin{table}[h]
\centering
\caption{QM9 HOMO--LUMO gap ($\bbZ_{12}$, 1{,}000 molecules, 3 seeds).}
\label{tab:qm9}
\begin{tabular}{@{}lcccc@{}}
\toprule
\textbf{Method} & \textbf{Test \Rsq} & \textbf{RMSE (Ha)} & \textbf{Rot.\ Var.} & \textbf{Params} \\
\midrule
$\starG$-SVD + Ridge & $\mathbf{0.556 \pm 0.047}$ & $\mathbf{0.035}$ & $3.2 \times 10^{-31}$ & \textbf{107} \\
Augmented MLP        & $0.384 \pm 0.028$           & $0.042$           & $3.9 \times 10^{-4}$  & 5{,}249 \\
Standard MLP         & $-10.99 \pm 5.90$           & $0.179$           & $6.7 \times 10^{-2}$  & 5{,}249 \\
Invariant MLP        & $-3.85 \pm 1.58$            & $0.116$           & $\sim 0$              & 14{,}465 \\
Neural $\starG$      & $-5.15 \pm 3.10$            & $0.127$           & $2.4 \times 10^{-28}$ & 9{,}025 \\
\bottomrule
\end{tabular}
\end{table}

\begin{figure}[!ht]
\centering
\includegraphics[width=\textwidth]{fig3_qm9.pdf}
\caption{\textbf{QM9 HOMO--LUMO gap (1{,}000 real molecules).}
(\textbf{a})~Test \Rsq: $\starG$-SVD and Augmented MLP are the only methods
with positive \Rsq{}; all pure neural baselines overfit catastrophically.
(\textbf{b})~RMSE (Hartree): $\starG$-SVD achieves the lowest error at 0.035~Ha.
(\textbf{c})~Rotation variance (log scale): $\starG$-SVD achieves exact
invariance at floating-point noise.}
\label{fig:qm9}
\end{figure}

\begin{figure}[!ht]
\centering
\includegraphics[width=0.7\textwidth]{fig3d_learning_curves.pdf}
\caption{\textbf{Learning curves on QM9.} $\starG$-SVD + Ridge maintains
positive \Rsq{} from as few as 100 molecules. Neural baselines overfit at small
sample sizes (negative \Rsq{}) and require substantially more data to approach
competitive performance. Bands: $\pm 1$ s.d.\ over 3 seeds.}
\label{fig:learning_curves}
\end{figure}

\subsection{Experiment 3: Product Group Composition}
\label{sec:product_exp}

To test whether the algebraic composition theorem
(Theorem~\ref{thm:product_ring}) translates into empirical performance, we
constructed a task with two independent, commuting symmetries:
$G_1 = \bbZ_6$ (discrete rotations in the $xy$-plane) and
$G_2 = \bbZ_4$ (periodic translations along $z$). The target quantity was
designed to be dominated by coupled 2D Fourier frequencies $(f_1, f_2)$ that
require both symmetries simultaneously; neither factor alone can resolve these
modes. This setting models the physical situation in which a molecular property
depends on two structural degrees of freedom simultaneously, as is common in
materials with layered or helical symmetry.

Results are shown in Table~\ref{tab:product} and Figure~\ref{fig:product}. The
$\starG$ model over $G_1 \times G_2$ achieves perfect prediction
(\Rsq{} $= 1.000 \pm 0.000$) with just 186 parameters. The single-factor models
capture at most 23\%: $G_2$ alone (\Rsq{} $= 0.229$) and $G_1$ alone
(\Rsq{} $= 0.155$). The 2D frequency map (Figure~\ref{fig:product}b) visualizes
why: the coupled frequency cells (highlighted in red) carry 87\% of the target
energy and are resolved only by $F_{\bbZ_6} \otimes F_{\bbZ_4}$; the individual
transforms see only the axis-aligned marginals. The cyclic approximation
$\bbZ_{24}$ (treating the same 24-dimensional group dimension as a single cyclic
group) reaches \Rsq{} $= 0.986$, close but not exact, because it cannot
distinguish the tensor-product structure of the irreps. The ablation cascade
(Figure~\ref{fig:ablation}) confirms a strict performance hierarchy: product group
$>$ wrong cyclic $>$ single factor $>$ no symmetry, each step removing algebraic
information and reducing performance monotonically. These results validate
Theorem~\ref{thm:product_ring} empirically and demonstrate that exact product
group specification is both necessary and sufficient for exact recovery.

\begin{table}[h]
\centering
\caption{Product group $\bbZ_6 \times \bbZ_4$ (1{,}000 molecules, 3 seeds).}
\label{tab:product}
\begin{tabular}{@{}lcc@{}}
\toprule
\textbf{Method} & \textbf{Test \Rsq} & \textbf{Params} \\
\midrule
$\starG$ over $G_1 \times G_2$ + Ridge & $\mathbf{1.000 \pm 0.000}$ & \textbf{186} \\
$\bbZ_{24}$ cyclic + Ridge             & $0.986 \pm 0.002$           & 157 \\
$\starG$ over $G_1 \times G_2$ + MLP   & $0.826 \pm 0.099$           & 9{,}473 \\
Standard MLP                           & $0.488 \pm 0.116$           & 4{,}481 \\
Invariant MLP                          & $0.324 \pm 0.267$           & 6{,}785 \\
Augmented MLP                          & $0.250 \pm 0.239$           & 4{,}481 \\
$G_2$ only ($\bbZ_4$) + Ridge          & $0.229 \pm 0.191$           & 42 \\
$G_1$ only ($\bbZ_6$) + Ridge          & $0.155 \pm 0.244$           & 55 \\
\bottomrule
\end{tabular}
\end{table}

\begin{figure}[!ht]
\centering
\includegraphics[width=\textwidth]{fig4_product.pdf}
\caption{\textbf{Product group $\bbZ_6 \times \bbZ_4$: compositional advantage.}
(\textbf{a})~Eight-method comparison. The product group achieves $R^2 = 1.000$;
each factor alone captures $\leq 23$\%.
(\textbf{b})~2D frequency map: coupled cells (red borders) carry 87\% of target
energy and are resolved only by $F_{\bbZ_6} \otimes F_{\bbZ_4}$.}
\label{fig:product}
\end{figure}

\begin{figure}[!ht]
\centering
\includegraphics[width=0.65\textwidth]{fig_ablation.pdf}
\caption{\textbf{Ablation cascade.} Progressively removing symmetry components
reveals a strict performance hierarchy: product group $\to$ wrong cyclic
approximation $\to$ single factor $\to$ no symmetry.}
\label{fig:ablation}
\end{figure}

\subsection{Experiment 4: Symmetry and Factorization Discovery}
\label{sec:discovery}

Beyond prediction, the $\starG$ framework enables a qualitatively new capability:
data-driven discovery of the symmetry group that best describes a dataset, without
any prior knowledge of its structure. This is a potentially transformative tool
for scientific inquiry: given observations of a physical system, one can scan a
library of candidate groups and identify the one that maximally captures the
data's structure, effectively reading off the symmetry of nature from empirical
measurements alone.

\textbf{Group discovery on QM9} (Figure~\ref{fig:discovery}a): scanning eight
candidate groups of small order, $\bbZ_4$ achieves the highest combined prediction
score (\Rsq{} $= 0.590$), consistent with the $C_4$ rotational symmetry of many
small organic molecules in the dataset. This result was obtained purely from
molecular geometry and property labels, with no crystallographic or
group-theoretic input.

\textbf{Factorization discovery for $n = 24$} (Figure~\ref{fig:discovery}b):
given that data lives on a 24-element group, the algorithm scans all
factorizations $G_1 \times G_2$ of order 24 and identifies
$\bbZ_3 \times \bbZ_8$ as optimal (\Rsq{} $= 1.000$), outperforming the naive
cyclic $\bbZ_{24}$ (\Rsq{} $= 0.985$) and all other factorizations. This
demonstrates that the $\starG$ framework can recover the latent product structure
of a symmetry group from data, a capability unavailable in existing equivariant
network frameworks where the group is always specified manually.

\begin{figure}[!ht]
\centering
\includegraphics[width=\textwidth]{fig5_discovery.pdf}
\caption{\textbf{Symmetry discovery.}
(\textbf{a})~QM9 group discovery: scanning candidate groups reveals $\bbZ_4$ as
the best fit, consistent with $C_4$ molecular symmetry. The combined score axis
reflects both predictive accuracy and invariance quality.
(\textbf{b})~Factorization discovery for order 24: $\bbZ_3 \times \bbZ_8$ is
identified as the optimal decomposition (\Rsq{} $= 1.000$), surpassing cyclic
$\bbZ_{24}$ (\Rsq{} $= 0.985$) and revealing the latent product structure of the
symmetry group from data alone.}
\label{fig:discovery}
\end{figure}

\subsection{Experiment 5: Empirical Recovery of Wigner--Eckart Selection Rules}
\label{sec:wigner_eckart}

The preceding experiments validated the $\starG$ framework as a prediction tool.
We now ask a deeper question: can the algebraic decomposition \emph{discover}
physical symmetry structure that was not provided as input? The Wigner--Eckart
theorem states that scalar observables ($l\!=\!0$) couple only to the trivial
representation, vector observables ($l\!=\!1$) require the fundamental
representation, and rank-2 tensor observables ($l\!=\!2$) require the $l\!=\!0$
and $l\!=\!2$ channels. We test whether the $\starG$ framework recovers these
selection rules from molecular geometry data alone.

\subsubsection*{Setup}

We replace the cyclic group $\bbZ_{24}$ with the chiral octahedral group $O$
(order 24, a subgroup of SO(3)) whose five irreducible representations correspond
directly to angular momentum channels:

\begin{center}
\begin{tabular}{@{}llcl@{}}
\toprule
\textbf{Irrep} & \textbf{Dim} & \textbf{$l$-channel} & \textbf{Physical content} \\
\midrule
A$_1$ & 1 & $l=0$ & Scalar (spherically symmetric) \\
A$_2$ & 1 & $l=0$ & Pseudoscalar \\
E     & 2 & $l=2$ & Quadrupole ($d$-wave) \\
T$_1$ & 3 & $l=1$ & Vector ($p$-wave, dipolar) \\
T$_2$ & 3 & $l=2$ & Quadrupole ($d$-wave) \\
\bottomrule
\end{tabular}
\end{center}

\noindent For 2{,}000 QM9 molecules, we compute features under all 24 octahedral
rotations (face, vertex, and edge rotations of the cube), then decompose the
Fourier power into per-irrep contributions using the actual representation
matrices. We predict not only scalar properties (HOMO--LUMO gap, HOMO, LUMO,
ZPVE) and the dipole magnitude $|\boldsymbol{\mu}|$, but also the dipole vector
components $\mu_x, \mu_y, \mu_z$ computed from Mulliken partial charges. These
vector components transform under the $l\!=\!1$ (T$_1$) irrep and cannot be
predicted from rotation-invariant features.

\subsubsection*{Results}

Table~\ref{tab:wigner} shows the predictive power (\Rsq) of each irrep's
features alone for each property. Three findings emerge that are consistent with
the Wigner--Eckart theorem and are summarized in Figures~\ref{fig:wigner}
and~\ref{fig:wigner_heatmap}.

\textbf{(i) Dipole components require equivariant information.} The A$_1$
(invariant) irrep predicts scalar properties well (\Rsq{} $= 0.64$ mean) but
gives essentially zero for dipole components (\Rsq{} $= 0.04$ mean), a separation
of $+0.59$. Dipole vector components \emph{cannot} be predicted from
rotation-invariant features alone: they require directional ($l \geq 1$)
information, as the Wigner--Eckart theorem demands for a rank-1 tensor operator
(Figure~\ref{fig:wigner}a).

\textbf{(ii) The T$_1$/A$_1$ ratio separates vector from scalar properties.}
For scalar properties, the T$_1$ channel provides about half the predictive power
of A$_1$ (ratio $\sim$0.54). For dipole components, T$_1$ provides $5\times$ more
than A$_1$ (ratio $\sim$2.78). This $5\times$ shift directly indicates that
dipole components draw their predictive signal primarily from the $l\!=\!1$
angular channel, exactly as the Wigner--Eckart theorem predicts for vector
observables (Figure~\ref{fig:wigner}b).

\textbf{(iii) Polarizability is uniquely insensitive to $l\!=\!1$.} The
isotropic polarizability (trace of a rank-2 tensor) has T$_1$ \Rsq{} $= 0.017$,
essentially zero, while every other property has T$_1$ \Rsq{} $> 0.08$. This
is consistent with the representation-theoretic decomposition of symmetric rank-2
tensors, which contain $l\!=\!0$ and $l\!=\!2$ components but no $l\!=\!1$
component. The heatmap in Figure~\ref{fig:wigner_heatmap} makes the block
structure visible: rank-0 properties (above the separator line) show strong
A$_1$ and weak T$_1$; rank-1 dipole components show the reverse; the rank-2
polarizability is the outlier with near-zero T$_1$.

These patterns were discovered entirely from molecular geometry and Mulliken
charges, without any quantum-mechanical theory as input. The $\starG$ framework,
by decomposing predictions into irreducible representation channels of a
physically meaningful group, serves as a \emph{spectroscope for symmetry}: it
reveals which angular momentum channels carry information about which physical
observables.

\begin{table}[h]
\centering
\caption{Per-irrep predictive power (\Rsq) on QM9 properties. The
T$_1$/A$_1$ ratio reveals which properties depend on directional ($l\!=\!1$)
information.}
\label{tab:wigner}
\begin{tabular}{@{}llcccccr@{}}
\toprule
\textbf{Property} & \textbf{Rank} & \textbf{A$_1$} & \textbf{A$_2$} &
\textbf{E} & \textbf{T$_1$} & \textbf{T$_2$} & \textbf{T$_1$/A$_1$} \\
\midrule
HOMO--LUMO gap           & 0 & 0.986   & 0.986   & 0.438   & 0.618           & 0.338   & 0.63 \\
HOMO energy              & 0 & 0.237   & 0.237   & 0.079   & 0.145           & $-$0.01 & 0.61 \\
LUMO energy              & 0 & 0.126   & 0.126   & 0.039   & 0.080           & 0.000   & 0.64 \\
ZPVE                     & 0 & 0.985   & 0.985   & 0.166   & 0.378           & $-$0.02 & 0.38 \\
$|\boldsymbol{\mu}|$ (magnitude) & 0 & 0.853 & 0.853 & 0.171 & 0.395   & 0.064   & 0.46 \\
\midrule
$\mu_x$ (component)      & 1 & 0.010   & 0.010   & 0.008   & 0.043           & 0.003   & \textbf{4.47} \\
$\mu_y$ (component)      & 1 & 0.124   & 0.124   & 0.029   & 0.191           & $-$0.00 & \textbf{1.54} \\
$\mu_z$ (component)      & 1 & $-$0.00 & $-$0.00 & $-$0.00 & $-$0.01         & 0.000   & N/A \\
\midrule
Polarizability           & 2 & 0.313   & 0.313   & 0.030   & \textbf{0.017}  & 0.001   & \textbf{0.05} \\
\bottomrule
\end{tabular}
\end{table}

\begin{figure}[!ht]
\centering
\includegraphics[width=\textwidth]{irrep_bars.pdf}
\caption{\textbf{Empirical recovery of Wigner--Eckart selection rules.}
Per-irrep predictive power (\Rsq) for each quantum property. Scalar properties
(blue shades) are dominated by A$_1$ ($l\!=\!0$). Dipole magnitude (orange) also
lives primarily in A$_1$ because $|\boldsymbol{\mu}|$ is a scalar. Dipole vector
components (red shades) show a qualitatively different pattern: A$_1$ gives
nearly zero while T$_1$ ($l\!=\!1$) is the dominant channel, consistent with the
Wigner--Eckart selection rule for a rank-1 tensor operator. Polarizability
(purple) has the lowest T$_1$ of any property, consistent with its rank-2
decomposition containing only $l\!=\!0$ and $l\!=\!2$.}
\label{fig:wigner}
\end{figure}

\begin{figure}[!ht]
\centering
\includegraphics[width=\textwidth]{irrep_heatmap.pdf}
\caption{\textbf{Irrep decomposition heatmap.} \Rsq{} for each (property,
irrep) combination, sorted by tensor rank with group separators. Rank-0
properties (above line) show strong A$_1$ and weak T$_1$; rank-1 dipole
components (below line) show the reverse; the rank-2 polarizability is the
outlier with near-zero T$_1$.}
\label{fig:wigner_heatmap}
\end{figure}

\subsection{Comparison with Equivariant Neural Networks at Scale}
\label{sec:enn_comparison}

We now report a head-to-head comparison of $\starG$ against
representative equivariant neural networks (ENNs) on the full QM9
benchmark (130{,}831 characterized molecules, 91{,}581 / 19{,}624 /
19{,}626 train / validation / test split, 3 seeds). The aim of this
section is to characterize \emph{where} the algebraic framework
dominates and \emph{why} the pooled-$R^2$ scoreboard on QM9 is a less
informative metric than it appears. ENNs do achieve higher pooled
$R^2$, and we report the gap explicitly.

\paragraph{Concession: ENNs achieve higher pooled $R^2$, and so do
generic MLPs at modest parameter cost.}
Table~\ref{tab:fullqm9} reports test $R^2$ on the four scalar QM9
targets across all molecule-level summary methods plus three ENN
baselines. MACE, a state-of-the-art equivariant message-passing
network, reaches $R^2 = 0.985$ on the HOMO--LUMO gap target with
$945{,}168$ parameters; $\starG$-SVD~+~Ridge reaches $R^2 = 0.482$
with 144 parameters. Standard MLP and Invariant MLP, given the same
molecule-level feature tensor as $\starG$, reach $R^2 = 0.513$ and
$0.529$ respectively at $3{,}073$ to $5{,}761$ parameters: comparable
to $\starG$-SVD~+~Ridge on pooled $R^2$ at $20$--$40\times$ more
parameters. The interesting question is what these differences
actually mean, addressed next.

\paragraph{The pooled-$R^2$ gap is largely a size-prediction race.}
On QM9, the HOMO--LUMO gap depends strongly on molecular size: a model
that captures gross size and composition features, without any
chemistry-aware bond topology, already explains the bulk of the
variance. We measure the \emph{within-isomer} $R^2$ (restricted to
formulas with $\geq 5$ constitutional isomers, sample-weighted across
formulas) for every method on the same test predictions. The empirical
finding (Table~\ref{tab:isomer_audit}) is structural and clean: on
QM9 HOMO--LUMO gap within isomer groups, \emph{every} method that
consumes the molecule-level $(n_{\mathrm{feat}}, |G|)$ summary
converges to within-isomer $R^2 \approx 0$ ($\starG$ ridge: $+0.01$;
$\starG$ neural: $+0.10$; MLP standard: $+0.06$; MLP invariant: $+0.08$;
MLP augmented: $-2.03$). SchNet ($0.991$) and MACE ($0.968$) consume
the full atomic graph, which carries the bond topology that distinguishes
constitutional isomers. \emph{The within-isomer $R^2$ gap therefore
separates methods by input information content, not by algorithmic
merit.} Among methods that consume the same molecule-level summary,
$\starG$ is competitive with all alternatives at orders of magnitude
fewer parameters, with closed-form solution, and with the per-irrep
decomposition no MLP can produce. The implication for
Section~\ref{sec:enn_comparison}'s framing is that the $\starG$
contribution is not in pooled $R^2$ on QM9 but in (a) the per-irrep
predictive decomposition no ENN provides
(Section~\ref{sec:wigner_eckart}, Table~\ref{tab:per_irrep_full_qm9}),
(b) the cross-selectivity headline on tensorial polarizability targets
(Table~\ref{tab:qm7x_polarizability}), and (c) the four orthogonal
capabilities listed below.

\paragraph{The $\starG$ algebra delivers capabilities that ENNs cannot
provide.} Three of these are central to the manuscript.

\textbf{(i)~Per-irrep predictive decomposition.} The generalized
Fourier transform of the molecule tensor decomposes naturally into
irreducible representation channels of $G$; for the octahedral group
this gives the A$_1$, A$_2$, E, T$_1$, T$_2$ channels of
Section~\ref{sec:wigner_eckart}. Per-irrep test $R^2$ for any target is
a closed-form quantity, computed by ridge-regressing on a single
irrep's projected features (Table~\ref{tab:wigner}). ENNs operate
inside spherical-harmonic / Wigner-$D$ spaces but their final
predictions do not decompose this way; the readout is end-to-end. This
is the primary interpretability differentiator.

\textbf{(ii)~Empirical recovery of the Wigner--Eckart selection rules.}
The per-irrep decomposition above, applied to QM9 properties of varying
tensor rank, recovers the angular-momentum selection rules of the 1931
Wigner--Eckart theorem from molecular geometry data alone, with no
quantum-mechanical input (Section~\ref{sec:wigner_eckart}). No ENN
paper has reported this. It is unique to the algebraic framing.

\textbf{(iii)~Provably optimal symmetry-preserving compression.} The
rank-$k$ $\starG$-SVD is the best $G$-equivariant rank-$k$
approximation in Frobenius norm (Theorem~\ref{thm:eckart_young}, machine-
verified in Lean~4). Tucker has only $\sqrt{d}$-quasi-optimal bounds; CP
is NP-hard to compute optimally; tensor-train has no global optimality.
The structural advantage compounds with composability: adding a new
symmetry $G'$ replaces $F_G$ by $F_G \otimes F_{G'}$ with no
architectural change.

\paragraph{Matched-input-information comparison and the Augmented MLP
collapse.} On the same molecule-level $(n_{\mathrm{feat}}, |G|)$ feature
tensor (the $48$-row angular featurizer of
Section~\ref{sec:methods_featurizer}, carrying angular moments,
heavy-atom rows, atom-pair Coulomb couplings, and distance-distribution
rows under cyclic Z$_{12}$), $\starG$-SVD~+~Ridge (144~params,
$R^2 = 0.482$ on gap, $0.998$ on ZPVE) substantially exceeds Standard
MLP (3{,}073~params, $R^2 = 0.513$) and Invariant MLP (5{,}761~params,
$R^2 = 0.529$) on $\alpha$ ($0.909$ vs.\ $0.818$ vs.\ $0.852$) and
matches them on the other three scalars at $\sim$$20$--$40\times$
fewer parameters. The
parameter-efficiency advantage at the low-budget end of the Pareto
frontier is real (Figure~\ref{fig:pareto}); the within-isomer audit
above explains why the larger MLPs do not extract proportionally more
signal. The starkest empirical difference on identical input is with the
Augmented MLP, the closest non-equivariant analogue to $\starG$, which
attempts to learn invariance by training on rotated copies of each
molecule. At full QM9 scale the Augmented MLP collapses to
$R^2 = 0.019 \pm 0.014$ on gap and $R^2 = 0.072 \pm 0.019$ on $\alpha$,
because orbit augmentation injects label-preserving noise that a
3{,}073-parameter MLP cannot disentangle once the dataset is large.
The case for algebraic equivariance over data-augmented learning of
equivariance is therefore empirically dramatic at large scale,
considerably more so than at the 1{,}000-molecule scale of
Section~\ref{sec:qm9_exp}. The ENN comparison remains a
different-input comparison: SchNet, e3nn, and MACE consume the full
atomic graph; $\starG$ consumes a molecule-level summary. The
contribution of the algebra is what it adds within that input contract.

\paragraph{Position in the equivariant ML lineage.} Equivariant ML has
progressed through three stages: data augmentation, architectural
equivariance (ENNs), and now algebraic equivariance ($\starG$). Each
stage cut back, but did not eliminate, a class of cost from the
previous one (Section~\ref{sec:wigner_eckart} discusses this transition
in detail). $\starG$'s contribution is not a faster--better--cheaper
alternative to ENNs but a different mathematical affordance: provably
optimal low-rank approximation, machine-verified equivariance, symmetry
discovery from data, and per-irrep interpretability. Where the data
demands chemistry-aware predictive power on a single target with
generous compute, ENNs remain the right tool; where the goal is
algebraic structure, interpretability, parameter efficiency, or the
discovery of physical laws like the Wigner--Eckart selection rules from
data, the algebraic framework offers things no ENN does.

\begin{table}[h]
\centering
\caption{Full QM9, four scalar targets, mean test $R^2 \pm$ std over 3
seeds (130{,}831 molecules; 91{,}581 / 19{,}624 / 19{,}626 train / val
/ test split). The MACE row makes the magnitude of the pooled-$R^2$
gap explicit and motivates the within-isomer audit in
Table~\ref{tab:isomer_audit}.
$^{\ast}$$\starG$-SVD~+~Ridge and $\starG$ neural use the $48$-row
angular featurizer specified in Methods~\S\ref{sec:methods_featurizer}.
$^{\dagger}$SchNet uses the PyTorch Geometric reference implementation
(\texttt{torch\_geometric.nn.models.SchNet}) with $128$ hidden channels,
$6$ interactions, $50$-Gaussian RBF, cutoff $10$\,\AA, $455{,}809$
trainable parameters. SchNet matches or modestly exceeds MACE on every
target while using ${\sim}\,2.1{\times}$ fewer trainable parameters.
Full configurations for all three ENN baselines are in
Supplementary~\S\ref{sec:enn_baselines}.}
\label{tab:fullqm9}
\begin{tabular}{@{}lrcccc@{}}
\toprule
\textbf{Method} & \textbf{Params} & \textbf{gap} & \textbf{alpha} & \textbf{mu} & \textbf{ZPVE} \\
\midrule
$\starG$-SVD + Ridge$^{\ast}$ & $144$    & $0.482 \!\pm\! 0.001$ & $0.909 \!\pm\! 0.003$ & $0.470 \!\pm\! 0.003$ & $0.998 \!\pm\! 0.000$ \\
$\starG$ neural$^{\ast}$       & $62{,}625$ & $0.568 \!\pm\! 0.004$ & $0.944 \!\pm\! 0.004$ & $0.554 \!\pm\! 0.009$ & $0.997 \!\pm\! 0.002$ \\
MLP standard         & 3{,}073  & $0.513 \!\pm\! 0.002$ & $0.818 \!\pm\! 0.003$ & $0.393 \!\pm\! 0.002$ & $0.972 \!\pm\! 0.001$ \\
MLP invariant        & 5{,}761  & $0.529 \!\pm\! 0.005$ & $0.852 \!\pm\! 0.003$ & $0.466 \!\pm\! 0.006$ & $0.980 \!\pm\! 0.000$ \\
MLP augmented        & 3{,}073  & $0.019 \!\pm\! 0.014$ & $0.072 \!\pm\! 0.019$ & $0.003 \!\pm\! 0.001$ & $0.022 \!\pm\! 0.024$ \\
SchNet$^{\dagger}$   & $455{,}809$  & $0.996 \!\pm\! 0.000$ & $0.999 \!\pm\! 0.001$ & $0.998 \!\pm\! 0.001$ & $1.000 \!\pm\! 0.000$ \\
MACE                 & $945{,}168$ & $0.985 \!\pm\! 0.001$ & $0.997 \!\pm\! 0.000$ & $0.995 \!\pm\! 0.001$ & $1.000 \!\pm\! 0.000$ \\
\bottomrule
\end{tabular}
\end{table}

\begin{table}[h]
\centering
\caption{Pooled vs.\ within-isomer test $R^2$ for the HOMO--LUMO gap
target. Within-isomer $R^2$ is computed over molecular formulas with
$\geq 5$ constitutional isomers and reported as the sample-weighted
mean across formulas. The collapse from pooled to within-isomer
quantifies how much of the headline $R^2$ is size-prediction signal.}
\label{tab:isomer_audit}
\begin{tabular}{@{}lcc@{}}
\toprule
\textbf{Method} & \textbf{Pooled $R^2$} & \textbf{Within-isomer $R^2$} \\
\midrule
$\starG$-SVD + Ridge$^{\ast}$ & $0.482$  & $\phantom{-}0.012$ \\
$\starG$ neural$^{\ast}$       & $0.568$  & $\phantom{-}0.101$ \\
MLP standard         & $0.513$  & $\phantom{-}0.058$ \\
MLP invariant        & $0.529$  & $\phantom{-}0.083$ \\
MLP augmented        & $0.019$  & $-2.034$ \\
SchNet               & $0.996$ & $\phantom{-}0.991$ \\
MACE$^{\ddagger}$    & $0.985$ & $\phantom{-}0.964$ \\
\bottomrule
\end{tabular}
\medskip

\footnotesize
\noindent\emph{Within-isomer $R^2$ averaged over $\sim$230 molecular
formulas (per seed, $\geq 5$ constitutional isomers each); covered
test molecules range from $\sim$10{,}500 (PyG-split SchNet) to
$\sim$19{,}250 ($\starG$ / MLP / MACE). The sample-weighted mean
across formulas is robust to the per-method test-set partitioning,
so the numerical comparison is unaffected. $^{\ddagger}$MACE
within-isomer is the mean over two seeds; the third seed will be added
in the camera-ready (the change in 3-seed mean would be at most
${\sim}\,0.005$ per the per-seed dispersion observed).}
\end{table}

\begin{table}[h]
\centering
\caption{Per-irrep test $R^2$ on full QM9 with the chiral octahedral
group $O$. Each cell is the test $R^2$ when $\starG$-SVD~+~Ridge is
trained on \emph{only} the projected features of one irrep. The
Wigner--Eckart signature reads off each property's tensor character
directly from the data: ZPVE is A$_1$-pure (a scalar property);
$\alpha$ (rank-2 trace) has T$_1 \approx 0$, exactly as required by
the representation theory of symmetric rank-2 tensors (which decompose
into $l\!=\!0 \oplus l\!=\!2$, no $l\!=\!1$ component); $\mu$
(magnitude of a vector) has T$_1$ stronger than A$_1$, signalling its
underlying vector character. ENNs do not produce this decomposition.}
\label{tab:per_irrep_full_qm9}
\begin{tabular}{@{}lcccccc@{}}
\toprule
\textbf{Target} & \textbf{A$_1$ ($l\!=\!0$)} & \textbf{A$_2$} & \textbf{E ($l\!=\!2$)} & \textbf{T$_1$ ($l\!=\!1$)} & \textbf{T$_2$ ($l\!=\!2$)} & \textbf{All irreps} \\
\midrule
gap (electronic, scalar)               & $0.318$ & $0.000$ & $0.007$ & $0.060$ & $0.002$ & $0.345$ \\
$\alpha$ (polarizability, rank-2 trace) & $0.611$ & $0.000$ & $0.050$ & $\mathbf{0.005}$ & $0.068$ & $0.628$ \\
$\mu$ (dipole magnitude, vector-derived) & $0.129$ & $0.000$ & $0.008$ & $\mathbf{0.277}$ & $0.000$ & $0.360$ \\
ZPVE (vibrational, pure scalar)         & $\mathbf{0.853}$ & $0.000$ & $0.000$ & $0.031$ & $0.032$ & $0.855$ \\
\bottomrule
\end{tabular}
\end{table}

\begin{table}[h]
\centering
\caption{\textbf{QM7-X polarizability components: cross-selectivity table.}
For each method we train two single-target models, one predicting the
$E_g$ irrep magnitude $\|\boldsymbol{\alpha}_E\|$ of the molecular
polarizability tensor, the other predicting the $T_{2g}$ magnitude
$\|\boldsymbol{\alpha}_{T_2}\|$, on identical 60/20/20 per-molecule
splits across three seeds (42, 43, 44). Cross-selectivity is
$\mathrm{cs} = (R^2_{\mathrm{on}} - R^2_{\mathrm{off}}) / (R^2_{\mathrm{on}}
+ R^2_{\mathrm{off}} + 10^{-12})$, where the ``on'' model targets the
indicated component and the ``off'' model is the architecturally-
identical model trained on the other component. A method whose two
single-target architectures achieve similar $R^2$ values has
cross-selectivity near zero, indicating that the architecture treats
the two targets symmetrically. A method with very asymmetric $R^2$ has
cross-selectivity near one, indicating a structural prior that aligns
with one component and not the other.}
\label{tab:qm7x_polarizability}
\begin{tabular}{@{}>{\raggedright\arraybackslash}p{4.3cm}rccc@{}}
\toprule
\textbf{Method} & \textbf{Params} & $R^2(\|\boldsymbol{\alpha}_E\|)$ & $R^2(\|\boldsymbol{\alpha}_{T_2}\|)$ & \textbf{Cross-selectivity} \\
\midrule
$\starG$ E-only + Ridge          & $2$         & $0.591$ & $0.012$ & $\mathbf{96.2\%}$ \\
$\starG$ T$_2$-only + Ridge      & $2$         & $0.008$ & $0.593$ & $\mathbf{97.3\%}$ \\
$\starG$ all irreps + Ridge      & $6$         & $0.622$ & $0.609$ & $1.1\%$ \\
\midrule
Raw features + Ridge             & $37$        & $0.481$ & $0.512$ & $-3.2\%$ \\
Coulomb matrix + Ridge           & $277$       & $0.442$ & $0.446$ & $-0.5\%$ \\
Raw + MLP$(128, 64, 32)$         & $15{,}105$  & $0.586$ & $0.591$ & $-0.5\%$ \\
Coulomb + MLP$(128, 64, 32)$     & $45{,}825$  & $0.495$ & $0.456$ & $4.2\%$ \\
\midrule
SchNet (PyG)                     & $455{,}809$ & $0.664 \!\pm\! 0.026$ & $0.662 \!\pm\! 0.018$ & $0.15\%$ \\
e3nn (SE(3)-equivariant)         & $794{,}032$ & $0.708 \!\pm\! 0.027$ & $0.698 \!\pm\! 0.006$ & $0.7\%$ \\
MACE                             & $1{,}041{,}808$ & $0.687 \!\pm\! 0.032$ & $0.672 \!\pm\! 0.022$ & $1.1\%$ \\
\bottomrule
\end{tabular}
\medskip

\footnotesize
\noindent\emph{The three ENNs achieve modest predictive $R^2$ in the
$0.66$--$0.71$ range on each polarizability component individually,
matching the $\starG$ all-irreps + Ridge baseline at $6$ parameters.
\textbf{None of the three ENN architectures achieves cross-selectivity
above $\sim 1\%$:} MACE, SchNet, and e3nn treat the two octahedral
irrep components symmetrically and cannot disentangle them. The
$\starG$ $E$-only and $T_2$-only models, with $2$ trainable parameters
each, retain $96.2\%$ and $97.3\%$ cross-selectivity respectively, a
direct consequence of restricting the algebra's per-irrep features to
a single irrep at training time.
This is the central empirical claim of the manuscript: not that
$\starG$ wins on raw $R^2$, but that the algebra exposes a
representation-theoretic structure that no equivariant neural
network we are aware of provides.}
\end{table}

\begin{figure}[!ht]
\centering
\includegraphics[width=\textwidth]{pareto_gap.pdf}
\caption{\textbf{Parameter-efficiency vs predictive power on QM9 HOMO--LUMO
gap.}
(\textbf{a}) Pooled test $R^2$ vs trainable parameters (3 seeds, error
bars are $\pm$std). MACE occupies the upper-right
($R^2 = 0.985$ at $945{,}168$ parameters); $\starG$-SVD~+~Ridge
occupies the upper-left at $144$ parameters ($R^2 = 0.482$, parameter
efficiency $\sim$$6{,}600\times$ better than MACE). MLP-augmented sits at the bottom ($R^2 \approx 0.02$),
illustrating the structural collapse of orbit-augmented learning at
full QM9 scale.
(\textbf{b}) Within-isomer test $R^2$ on the same axes (formulas with
$\geq 5$ constitutional isomers, sample-weighted mean over 3 seeds).
Every method that consumes the molecule-level $(n_{\mathrm{feat}},
|G|)$ summary collapses into the band $R^2_{\mathrm{within}}
\in [-0.17, +0.10]$, demonstrating that the panel-(\textbf{a}) gap
among those methods is almost entirely a size-prediction signal. SchNet
($R^2_{\mathrm{within}} \approx 0.991$) and MACE
($R^2_{\mathrm{within}} \approx 0.968$, seed 0) both sit
${\sim}\,1$ unit above every molecule-level-summary method on the
$y$-axis, making the input-information gap visible at a glance.}
\label{fig:pareto}
\end{figure}

\begin{figure}[!ht]
\centering
\includegraphics[width=0.85\textwidth]{fig11_paradigm.png}
\caption{\textbf{Paradigm comparison.}
\textit{Left (ENN paradigm):} each symmetry requires a bespoke architecture;
combining symmetries requires redesigning from scratch.
\textit{Right ($\starG$ paradigm):} the same algebra handles any group;
composing symmetries requires only specifying $G_1 \times G_2$ in the Fourier
transform.}
\label{fig:paradigm}
\end{figure}

\begin{table}[h]
\centering
\caption{Structural comparison: ENNs vs.\ $\starG$ algebra.}
\label{tab:comparison}
\begin{tabular}{@{}p{3cm}p{5.5cm}p{5.5cm}@{}}
\toprule
\textbf{Feature} & \textbf{ENNs} & \textbf{$\starG$ Algebra} \\
\midrule
Symmetry     & Architectural constraint             & Algebraic property \\
Composition  & Requires redesign                    & $G_1 \times G_2$ (Thm.~\ref{thm:product_ring}) \\
Optimality   & None                                 & Eckart--Young (Thm.~\ref{thm:eckart_young}) \\
Invariance   & Approximate ($10^{-4}$--$10^{-2}$)  & Exact ($10^{-31}$) \\
Small data   & Overfits                             & Generalizes \\
\bottomrule
\end{tabular}
\end{table}

\section{Discussion}

The central finding of this work is that the $\starG$ tensor algebra, when applied
over a physically meaningful group (the octahedral subgroup of SO(3)), recovers
the angular momentum selection rules of the Wigner--Eckart theorem directly from
molecular geometry data. The T$_1$/A$_1$ predictive power ratio separates vector
observables ($\sim$2.8) from scalar observables ($\sim$0.5) by a factor of five,
and the isotropic polarizability's near-zero T$_1$ dependence confirms the
representation-theoretic absence of $l\!=\!1$ content in symmetric rank-2 tensors.
These patterns emerge without any quantum-mechanical input, demonstrating that the
$\starG$ framework functions as a \emph{spectroscope for physical symmetry}: it
decomposes empirical predictions into irreducible representation channels that
reveal the underlying mathematical structure of nature.

This discovery capability rests on four theoretical pillars validated by our
experiments: (i)~the Peter--Weyl spectral decomposition of the convolution tensor,
which expresses group structure through the sparse core tensor $\calC$ (equation
(\ref{eq:spectral})); (ii)~the Eckart--Young optimality guarantee for the
$\starG$-SVD (Theorem~\ref{thm:eckart_young}), the first such result for
symmetry-preserving tensor approximation; (iii)~the product group ring
isomorphism (Theorem~\ref{thm:product_ring}), which composes multiple symmetries
seamlessly (\Rsq{} $= 1.000$ for $\bbZ_6 \times \bbZ_4$ vs.\ $\leq 0.23$ for
single factors); and (iv)~data-driven group and factorization discovery.

The practical implications are equally significant. In the data-scarce regime
that characterizes scientific applications, $\starG$-SVD with ridge regression
outperforms neural baselines with $49\times$ more parameters on real QM9 data
(Table~\ref{tab:qm9}). The success of simple linear methods in the $\starG$
representation suggests that when data symmetry is properly captured
algebraically, the remaining structure is essentially linear, with immediate
implications for interpretability, computational efficiency, and deployment in
resource-constrained scientific settings.

\subsection*{Broader Impact}

The framework opens a path from Eckart--Young (optimal low-rank tensor
approximation preserving group structure) to Wigner--Eckart (angular momentum
selection rules for physical observables) through a single algebraic construction,
closing a circle between two theorems that, fittingly, share a common author in
Carl Eckart. By changing only the group $G$, the same machinery that achieves
state-of-the-art molecular property prediction can decompose physical observables
by angular momentum channel, discover product group structure, or identify the
best-fitting symmetry from a candidate set. We anticipate applications in
materials science (crystallographic point groups), particle physics (gauge
symmetries), and drug discovery (molecular chirality), where the ability to
simultaneously predict properties and reveal their symmetry content could
accelerate scientific understanding.

\subsection*{Formal Verification}

All core algebraic results in this paper have been machine-verified in the
Lean~4 proof assistant~\citep{demoura2021lean4} using the Mathlib
library~\citep{mathlib2020}.  The formalization comprises 600 lines of
Lean~4 across six modules, with zero unresolved proof obligations
(\texttt{sorry}) and five axioms (standard textbook results
not yet available in Mathlib).  Every theorem is fully verified:
Eckart--Young optimality (Theorem~\ref{thm:eckart_young}), product-group
composition (Theorem~\ref{thm:product_ring}), $\starG$ associativity,
identity, distributivity, transpose reversal, left and right equivariance,
Frobenius norm and DC component invariance, and all three Wigner--Eckart
selection rules.
Full details are provided in SI~Section~\ref{sec:lean}.

\subsection*{Limitations and Future Directions}

The current framework handles finite groups. Extension to continuous group
structures is a separate problem we do not address here. For
completeness we note that related Eckart--Young-type results for the $\starM$
(tubal) algebra, a special case of our framework with $G = \bbZ_n$, have been
obtained independently by
Mor~\citep{mor2025quasitubaltensoralgebraseparable,mor2026sufficientnecessaryconditionseckartyoung}.
The octahedral group, while physically meaningful, captures only the cubic
subgroup of the full rotation group; icosahedral ($|G|=60$) or larger polyhedral
groups would provide finer angular resolution. The Lean~4 formalization
axiomatizes five standard results from linear algebra and harmonic analysis that
are not yet in Mathlib; closing these axioms is a contribution to the Mathlib
library rather than to the mathematics of this paper.
Predicting full tensor-valued
properties (not just scalar invariants) would enable complete Wigner--Eckart
decomposition including $l\!=\!2$ selection rules for the quadrupole moment.

More broadly, the $\starG$ framework inverts the conventional relationship between
data and mathematics. Rather than destroying the geometry of tensorial data to fit
the algebra of vectors, we adapt the algebra to the geometry of the data. The
practical consequence is stark: on real molecular data, 107 algebraic parameters
outperform 5{,}249 neural network parameters. One does not need big data if one
has deep algebra.

\section{Methods}

\subsection{Algorithmic Overview}
\label{sec:methods_overview}

The full $\starG$ pipeline used in every experiment of this paper consists of
four stages: (i)~\emph{group selection}, in which a finite group $G$ is fixed
and its convolution tensor $\calT_G$ together with the generalized Fourier
matrix $F_G$ are precomputed once and cached; (ii)~\emph{tensorial
featurization}, in which each input molecule is mapped to a tensor
$\calX \in \bbR^{n_f \times |G|}$ (or $\bbR^{n_f \times |G_1| \times \cdots
\times |G_d|}$ for product groups) by sampling a measurement basis at every
group element; (iii)~\emph{algebraic decomposition}, in which group-invariant
features are extracted from $\calX$ via the generalized Fourier transform and
the $\starG$-SVD; and (iv)~\emph{prediction}, in which a downstream regressor
(ridge regression for the linear pipeline; an MLP or a Neural-$\starG$
network for the neural pipelines) maps these features to the target property.
For the Wigner--Eckart experiment a fifth stage replaces the global
generalized Fourier power with a per-irrep decomposition
(Section~\ref{sec:wigner_eckart}). The
end-to-end procedure, the $\starG$ product computation, the $\starG$-SVD, the
invariant feature extractor, the per-irrep decomposition, and the symmetry-
and factorization-discovery search are written as explicit algorithm blocks
in the Supplementary Information. Reference implementations are provided in
MATLAB (\texttt{core/StarGAlgebra.m}, \texttt{core/extractStarGFeatures.m},
\texttt{core/NeuralStarGFramework.m}) and in Python
(\texttt{python/StarGAlgebra.py}); the numerical results in this paper were
generated by the MATLAB scripts under \texttt{experiments/}, with the Python
implementation passing the same regression-test suite.

\subsection{Feature Construction}
\label{sec:methods_featurizer}

For the single-group and product-group experiments, molecular features are inner
products with a rotating measurement basis at angles $\theta_g = 2\pi g/n$. For
the product group, axial features use periodic $z$-embeddings and coupled features
are angular$\times$axial products; the two actions commute because $z$-rotation
modifies $(x,y)$ not $z$, while $z$-translation modifies $z$ not $(x,y)$.

For the Wigner--Eckart experiment, features are computed under the 24 rotations
of the chiral octahedral group $O$ (6 face rotations at
$90^\circ/180^\circ/270^\circ$ about coordinate axes, 8 vertex rotations at
$120^\circ/240^\circ$ about body diagonals, 6 edge rotations at $180^\circ$
about edge midpoints, plus identity). This group is a subgroup of SO(3) whose
irreps (A$_1$, A$_2$, E, T$_1$, T$_2$) correspond to angular momentum channels
$l = 0, 0, 2, 1, 2$.

\subsection{Dipole Vector Computation}

The QM9 .xyz files include Mulliken partial charges $q_i$ as a fifth column.
The dipole vector is computed as $\boldsymbol{\mu} = \sum_i q_i \mathbf{r}_i$,
yielding components $\mu_x, \mu_y, \mu_z$ that transform as a rank-1 tensor
(vector) under rotation. This provides ground-truth targets with known
transformation properties for the Wigner--Eckart test.

\subsection{Per-Irrep Fourier Decomposition}

For each irrep $\rho$ of dimension $d_\rho$ with representation matrices
$\{\rho(g)\}_{g \in G}$, the Fourier transform of feature row $j$ is
$\hat{X}_j^\rho = \sqrt{d_\rho/|G|} \sum_{g} X(j,g) \, \rho(g)$, a
$d_\rho \times d_\rho$ matrix. The per-irrep power is $\|\hat{X}_j^\rho\|_F^2$.
Per-irrep features (one power value per feature row per irrep) are used
independently as predictors for each quantum property via ridge regression,
yielding per-irrep \Rsq{} values. Pseudocode is given in
Algorithm~\ref{alg:per_irrep}.

\subsection{Invariant Feature Extraction}
\label{sec:invariant_features}

Given a sample tensor $\calX \in \bbR^{n_f \times |G|}$, the invariant feature
vector concatenates seven complementary descriptors, all of which are exactly
invariant under the left action of $G$ (proofs in the SI Equivariance Proofs section):
(a)~the DC component $\bar{\calX}_j = \tfrac{1}{|G|} \sum_g \calX(j,g)$ for
each feature row $j$;
(b)~the AC energy $\sigma_j = \mathrm{std}_g\,\calX(j,g)$;
(c)~the total per-frequency power $\sum_j |\hat{\calX}(j,k)|^2$ for each
generalized Fourier bin $k$, where $\hat{\calX} = \calX F_G$ is the
generalized Fourier transform of $\calX$ along the group dimension;
(d)~per-row generalized Fourier power $|\hat{\calX}(j,k)|^2$ for the first
$K = 14$ equivariant rows;
(e)~the singular tube norms $\|\mathbf{s}_i\|_F$ for $i = 1, \ldots,
\min(p, q)$ obtained from the $\starG$-SVD of a reshaped $p \times q \times
|G|$ tensor (the reshape $(p, q)$ is chosen to maximize $\min(p, q)$);
(f)~the rows of $\calX$ identified as invariant by row-variance (constant
under the group action); and
(g)~four spectral statistics of the unfolded matrix (nuclear norm, spectral
norm, condition number, and entropy of the singular-value distribution).
Features are $z$-score normalized using statistics computed from training
data, and an unregularized intercept column is appended. Pseudocode is given
in Algorithm~\ref{alg:features}.

\subsection{Ridge Regression and Rank Selection}

The downstream linear model is ridge regression with hyperparameter
$\lambda$ selected from the geometric grid
$\{10^{-3}, 10^{-2}, \ldots, 10^{2}, 10^{3}\}$ by validation MSE. The
intercept is unregularized. The number of singular tubes retained in the
$\starG$-SVD feature block is set to $\min(p, q)$ where $(p, q)$ is the
optimal rectangular reshape of $n_f$ (Section~\ref{sec:invariant_features});
no further truncation is applied because the Eckart--Young theorem guarantees
a closed-form bound on the truncation error
(Theorem~\ref{thm:eckart_young}). Total parameter count for $\starG$-SVD +
Ridge is $1 + d_{\mathrm{feat}}$, where $d_{\mathrm{feat}}$ is the number of
non-degenerate feature columns retained after $z$-score normalization (107
on QM9 and 186 on the product-group task).

\subsection{Baseline Architectures and Training}
\label{sec:baselines}

All four neural baselines share the same hidden width
$[64, 32]$, ReLU activations on hidden layers, a linear output, He
initialization, full-batch validation, and the Adam optimizer
($\beta_1 = 0.9, \beta_2 = 0.999, \varepsilon = 10^{-8}$) with early stopping
on validation MSE (patience 20). They differ in three respects: (i)~the
input representation $X_{\mathrm{in}}$, (ii)~the training-set construction,
and (iii)~the parameter count. Table~\ref{tab:baselines_main} (Methods)
summarizes these differences; the complete per-experiment hyperparameter
sheet is given in SI Table~3.

\textbf{Standard MLP.} Input is the un-rotated raw feature vector
$X_{\mathrm{in}} = \calX(:, e) \in \bbR^{n_f}$ (frontal slice at the
identity), $z$-score normalized using training-set statistics. The model is
$[n_f \to 64 \to 32 \to 1]$. No symmetry information is used during training.
Trained for up to 300 epochs (synthetic) or 300 epochs (QM9 / product-group)
at learning rate $0.003$, batch size 32 (synthetic) or 256 (QM9). This is the
``no-symmetry'' control.

\textbf{Invariant MLP.} Input is the concatenation of four hand-crafted
group-invariant pooling statistics applied along the group dimension:
$X_{\mathrm{in}} = [\,\mathrm{mean}_g \calX,\ \mathrm{std}_g \calX,\
\min_g \calX,\ \max_g \calX\,] \in \bbR^{4 n_f}$, $z$-score normalized. The
model is $[4 n_f \to 64 \to 32 \to 1]$. By construction the input is exactly
$G$-invariant so the model is invariant by composition; this is the ``manual
invariance'' control. Same optimizer schedule as Standard MLP.

\textbf{Augmented MLP.} Input is the un-rotated raw feature vector
$X_{\mathrm{in}} = \calX(:, e) \in \bbR^{n_f}$, but the training set is
expanded by including $X_{\mathrm{aug}} = \calX(:, g)$ for every $g \in G$,
with the same target label, yielding a $|G|$-fold augmented training set.
$z$-score statistics are computed on the augmented set. The model is
$[n_f \to 64 \to 32 \to 1]$. Test-time prediction uses the un-rotated
slice. This is the standard data-augmentation strategy and serves as a
proxy for invariance learned from data rather than enforced algebraically;
it is the closest non-equivariant baseline to an ENN. Trained for 80--300
epochs at learning rate $0.003$--$0.005$, batch size 32 (synthetic) or 256
(QM9 / product-group); the lower epoch count for the synthetic experiment
reflects the $|G|$-fold larger effective batch budget. A precise pseudocode
specification is given in Algorithm~\ref{alg:augmented_mlp}.

\textbf{Neural $\starG$.} A symmetry-aware feed-forward network whose linear
layers are $\starG$ products with weight tensors $W^{(\ell)} \in \bbR^{n_{\ell+1}
\times n_\ell \times |G|}$ rather than ordinary matrix products. Forward
pass: $\calA^{(\ell+1)} = \mathrm{ReLU}\bigl(W^{(\ell)} \starG \calA^{(\ell)}
+ \mathbf{b}^{(\ell)}\bigr)$ for hidden layers, with a linear output and an
invariant pooling $y = \mathrm{mean}_{g, j} \calA^{(L)}(j, g)$. The hidden
widths are $[64, 32]$, matching the MLPs. Input is the $\starG$-feature
vector $X_{\mathrm{in}}$ from
Section~\ref{sec:invariant_features}. Trained for 300 epochs at learning
rate $0.003$, batch size 256, with the same Adam settings and early
stopping. Equivariance is exact by construction (the per-layer rotation
variance is $\sim 10^{-28}$, at floating-point noise) so this baseline
isolates the cost of replacing a closed-form ridge regressor with a non-linear
trainable model on top of the same algebraic representation. Pseudocode is
given in Algorithm~\ref{alg:neural_starg}.

\begin{table}[h]
\centering
\caption{Baseline summary. Hidden $= [64, 32]$, ReLU, Adam, He init,
patience 20.}
\label{tab:baselines_main}
\begin{tabular}{@{}p{2.5cm}p{4.5cm}p{2.6cm}p{1.6cm}p{1.4cm}@{}}
\toprule
\textbf{Method} & \textbf{Input $X_{\mathrm{in}}$} & \textbf{Train data} &
\textbf{Equiv.} & \textbf{Params (QM9)} \\
\midrule
Standard MLP   & raw slice $\calX(:, e)$, normalized        & native ($n$)        & approx & 5{,}249 \\
Invariant MLP  & $[\mathrm{mean}, \mathrm{std}, \min, \max]_g \calX$ & native ($n$) & exact  & 14{,}465 \\
Augmented MLP  & raw slice $\calX(:, e)$, normalized        & augmented ($|G|\,n$) & approx & 5{,}249 \\
Neural $\starG$ & $\starG$-features (Sec.~\ref{sec:invariant_features}) & native ($n$) & exact  & 9{,}025 \\
$\starG$-SVD + Ridge & $\starG$-features (Sec.~\ref{sec:invariant_features}) & native ($n$) & exact & \textbf{107} \\
\bottomrule
\end{tabular}
\end{table}

\subsection{Symmetry and Factorization Discovery}

The symmetry-discovery experiment (Section~\ref{sec:discovery}) and the
factorization-discovery experiment use the same scoring function:
$\mathrm{score}(G) = \alpha \cdot R^2_{\mathrm{val}}(G) + (1 - \alpha) \cdot
[1 - \mathrm{rotvar}(G) / \mathrm{rotvar}_{\max}]$ with $\alpha = 0.7$,
where $R^2_{\mathrm{val}}(G)$ is the validation $R^2$ achieved by
$\starG$-SVD + Ridge with group $G$ and $\mathrm{rotvar}(G)$ is the
prediction variance under the candidate group action. The candidate library
contains all groups of order $\leq 12$ ($\bbZ_n$, $D_n$, $S_3$, Klein 4,
Quaternion 8) plus, for the factorization experiment, every product
$\bbZ_a \times \bbZ_b$ with $ab = n_{\mathrm{group}}$. Pseudocode is given
in Algorithm~\ref{alg:discovery}.

\subsection{Reproducibility}

All experiments use 3 random seeds and a 70/15/15 train/validation/test
split. The same random seeds (\texttt{seed}, $111 \cdot \texttt{seed}$ and
$31 \cdot \texttt{seed}$) drive train/val/test partitioning, weight
initialization, and mini-batch shuffling so that runs are reproducible up to
floating-point non-determinism. End-to-end runtimes (single seed,
1{,}000 molecules) on a 2024 desktop CPU are reported in SI
Table~4 and range from 0.7~s ($\starG$-SVD~+~Ridge) to 4.0~s (Augmented
MLP).

\section*{Data Availability}

The QM9 dataset is publicly available~\citep{ramakrishnan2014qm9}. The main
structure file (\texttt{dsgdb9nsd.xyz.tar.bz2}) can be downloaded directly from
\url{https://figshare.com/ndownloader/files/3195389}. The full collection,
including property labels and auxiliary files, is archived at
\url{https://doi.org/10.6084/m9.figshare.c.978904.v5}.

\section*{Code Availability}

% TODO: Replace the placeholder URL below with the permanent repository URL
% before submission. Nature requires that code be deposited in a public
% repository (e.g., GitHub, Zenodo, or Code Ocean) with a DOI.
Open-source implementations of the $\starG$ algebra in MATLAB and Python,
together with scripts to reproduce all experiments and figures, will be made
available at \url{https://gitfront.io/r/supermanG/hRrSeL77snCf/tensor-group-sym/} %{https://github.com/[REPO-URL-TBD]/starG-algebra} 
upon acceptance.  The Lean~4 formalization of all algebraic proofs (zero
\texttt{sorry}, five axioms) is included in the repository under
\texttt{lean/}.

\section*{Acknowledgements}

We wish to acknowledge Tammy Kolda for early feedback and her foundational
contributions to the tensor algebra field, and Tess Smidt, Maurice Weiler and Joe Kileel
for engaging discussions. P.H. gratefully acknowledges the IBM internship program, which enabled key components of this study.

\section*{Author Contributions}

\textbf{Conceptualization:} L.H., S.U.
\textbf{Methodology:} P.H., H.A., M.K., D.H., V.K., K.C.
\textbf{Formal analysis:} P.H., H.A., L.H., M.K.
\textbf{Software:} L.H., S.U., D.H., V.K., K.C.
\textbf{Investigation:} L.H., S.U.
\textbf{Visualization:} L.H., S.U.
\textbf{Writing (original draft):} L.H., S.U., P.H.
\textbf{Writing (review and editing):} all authors.
\textbf{Supervision:} L.H.

\section*{Competing Interests}

The authors declare no competing interests.

\bibliographystyle{unsrtnat}
\bibliography{references}

\end{document}