\documentclass[11pt]{article} \usepackage[margin=1in]{geometry} \usepackage{amsmath,amssymb,amsthm} \usepackage{graphicx} \usepackage{booktabs} \usepackage{array} \usepackage{hyperref} \usepackage{authblk} \usepackage{lineno} \usepackage{natbib} \usepackage{xr-hyper} % Cross-reference labels from supplementary.tex (algorithms, sec:lean). % Compile supplementary.tex first so supplementary.aux exists; main.tex % then resolves \ref{alg:...} and \ref{sec:lean} from it. \externaldocument{supplementary} % Figure files are placed in a figures/ subfolder \graphicspath{{figures/}} \newtheorem{theorem}{Theorem} \newtheorem{lemma}[theorem]{Lemma} \newtheorem{corollary}[theorem]{Corollary} \newtheorem{definition}[theorem]{Definition} \newcommand{\starG}{\star_G} \newcommand{\starM}{\star_M} \newcommand{\calA}{\mathcal{A}} \newcommand{\calB}{\mathcal{B}} \newcommand{\calC}{\mathcal{C}} \newcommand{\calI}{\mathcal{I}} \newcommand{\calS}{\mathcal{S}} \newcommand{\calT}{\mathcal{T}} \newcommand{\calU}{\mathcal{U}} \newcommand{\calV}{\mathcal{V}} \newcommand{\calX}{\mathcal{X}} \newcommand{\bbR}{\mathbb{R}} \newcommand{\bbZ}{\mathbb{Z}} \newcommand{\Rsq}{R\textsuperscript{2}} % Use the journal-supplied naturemag.bst (included in Nature's LaTeX template % package). If that file is not yet available, unsrtnat is an acceptable % substitute during internal review. \hypersetup{colorlinks=true,linkcolor=blue,citecolor=blue,urlcolor=blue} \begin{document} \linenumbers % required for peer-review submission \title{\textbf{Group-Algebraic Tensors: Provably-optimal Equivariant Learning and Physical Symmetry Discovery}} \author[4]{Paulina Hoyos} \author[1]{Shashanka Ubaru} \author[5]{Dongsung Huh} \author[1]{Vasileios Kalantzis} \author[1]{{Kenneth L. Clarkson}} \author[3]{Misha Kilmer} \author[2]{Haim Avron} \author[1]{Lior Horesh} \affil[1]{IBM Research} \affil[2]{Tel-Aviv University} \affil[3]{Tufts University} \affil[4]{UT Austin} \affil[5]{Independent} \date{} \maketitle \begin{abstract} We introduce the $\starG$ tensor algebra, in which any finite group $G$ defines the multiplication rule, making equivariance an intrinsic algebraic property rather than an architectural constraint. The framework rests on three machine-verified theoretical pillars: (i)~an Eckart--Young optimality guarantee for the $\starG$-SVD: the first such result for symmetry-preserving tensor approximation, exact and polynomial-time where Tucker is only $\sqrt{d}$-quasi-optimal, CP is NP-hard, and tensor-train has no global optimality; (ii)~a Kronecker factorization that composes multiple symmetries by replacing $F_G$ with $F_{G_1} \otimes F_{G_2}$ with no architectural redesign; and (iii)~a 600-line Lean~4 formalization of the $\starG$ algebra with zero unresolved proof obligations. The framework provides capabilities that equivariant neural networks (ENNs) structurally cannot: a closed-form per-irreducible-representation decomposition of every prediction, and data-driven discovery of the symmetry group that best fits a dataset. As a non-trivial empirical demonstration, decomposing QM9 molecular geometry over the chiral octahedral subgroup of SO(3) recovers the Wigner--Eckart selection rules of angular momentum from data alone, with no quantum-mechanical input: scalar properties are A$_1$-dominated, dipole components are T$_1$-dominated, the isotropic polarizability is uniquely insensitive to $l\!=\!1$ as the rank-2-trace decomposition $l\!=\!0 \oplus l\!=\!2$ requires, and the T$_1$/A$_1$ predictive-power ratio separates vector observables from scalar observables by a factor of five. On full QM9 (130{,}831 molecules), $\starG$-SVD with ridge regression provides closed-form predictions at $\sim$50--90$\times$ fewer parameters than parameter-matched MLPs and competitive accuracy in the parameter-efficient regime; a within-isomer audit shows that the apparent advantage of larger models on pooled $R^2$ is largely a size-prediction effect that vanishes once chemistry is controlled for. Algebraic equivariance thus complements architectural equivariance not as a faster--better--cheaper alternative but as a different mathematical affordance: provably-optimal symmetry-preserving compression, machine-verified equivariance, per-irrep interpretability, and data-driven physical discovery. \end{abstract} \section{Introduction} Much of the data encountered in science and engineering is inherently multidimensional: molecular configurations encode three-dimensional atomic positions, quantum states exist in exponentially large Hilbert spaces, and sensor arrays sample signals across spatial and temporal domains. Traditional machine learning methods typically vectorize such data, collapsing its natural tensor structure into flat feature vectors~\citep{kolda2009tensor,sidiropoulos2017tensor}. This is akin to unfolding an origami crane into a flat sheet: the operation is technically lossless, but the geometry that gives the object its meaning is destroyed. All subsequent processing must then recover, implicitly and at great computational cost, the structure that was discarded at the outset. Symmetry compounds this problem. A molecule exists in three-dimensional space: it can be rotated without changing its properties (rotational symmetry), and its identical atoms can be indexed in any order without changing the molecule itself (permutation symmetry). These symmetries coexist and interact~\citep{noether1918,bronstein2021geometric}, yet vectorization treats them as incidental features of the data rather than as fundamental structural constraints. The dominant paradigm for incorporating symmetry is through Equivariant Neural Networks (ENNs)~\citep{cohen2016group,thomas2018tensor,fuchs2020se3,batzner2022e3}, which have achieved remarkable success in molecular property prediction~\citep{schutt2017schnet} and protein structure~\citep{jumper2021alphafold}. ENNs handle symmetry through architecture: to respect a rotation, one engineers rotation-equivariant layers; to respect a permutation, one engineers permutation-equivariant layers. But when multiple symmetries coexist, as they do in virtually all physical systems, the architectural approach faces a combinatorial wall. The blueprint must be redesigned from scratch for each new combination of symmetries, with no guarantee that the resulting representation is optimal in any rigorous sense. The physics is hard-coded into the network topology, and changing the physics means rebuilding the network. In this article, we propose a different philosophy. Instead of constraining the architecture to fit the symmetry, we change the mathematics to fit the geometry of the data. Building on the $\starM$ algebra of Kilmer and collaborators~\citep{kilmer2021tensor,kernfeld2015tensor}, we construct a tensor algebra, $\starG$, where any finite group $G$ defines the multiplication rule. The resulting algebra inherits equivariance as an intrinsic property of multiplication, not as an architectural constraint. Composing multiple symmetries requires only specifying the direct product $G_1 \times G_2 \times \cdots \times G_d$; no redesign is needed. The $\starG$ algebra admits an SVD with provable Eckart--Young optimality (Theorem~\ref{thm:eckart_young}), and, by the Peter--Weyl theorem~\citep{serre1977linear,peterweil1927}, decomposes naturally into irreducible representation channels that can reveal the symmetry content of physical observables (Section~\ref{sec:wigner_eckart}). The Wigner--Eckart theorem (1931) states that matrix elements of tensor operators between angular momentum eigenstates factorize into a geometric part (Clebsch--Gordan coefficient) and a reduced matrix element independent of magnetic quantum numbers, implying selection rules: an operator of rank $l$ couples only states whose angular momenta differ by at most $l$. We show empirically that these rules are recoverable from data alone via $\starG$ decomposition. \begin{figure}[!t] \centering \includegraphics[width=\textwidth]{Figure1.pdf} \caption{\textbf{The $\starG$ tensor algebra: from optimal decomposition to symmetry discovery.} (\textbf{Top left, From molecules to algebra}) Molecular data measured under all elements of a symmetry group $G$ form a structured tensor $\calA \in \bbR^{n \times d \times |G|}$, preserving geometric information that is destroyed by vectorization into $A \in \bbR^{n \cdot d \times |G|}$. (\textbf{Top right, The $\starG$ product}) Two tensors are multiplied via group convolution along the tube dimension, computed efficiently in the Fourier domain via the Peter--Weyl theorem: $F_G$ transforms each tensor to its block-diagonal spectral form, standard matrix products are applied per irreducible representation block, and $F_G^{-1}$ returns the result. The group $G$ can be any finite group (a single symmetry or a product $G_1 \times G_2 \times \cdots$) with no architectural changes required. (\textbf{Bottom left, The $\starG$-SVD}) Every $\starG$-tensor admits a factorization $\calA = \calU \starG \calS \starG \calV^H$. The rank-$k$ truncation $\calA_k$ is provably optimal: $\|\calA - \calA_k\|_F \leq \|\calA - \calB\|_F$ for any rank-$k$ equivariant tensor $\calB$ (Eckart--Young theorem for $\starG$, Theorem~\ref{thm:eckart_young}). (\textbf{Bottom right, From Eckart--Young to Wigner--Eckart}) The same algebraic framework that delivers optimal low-rank compression also serves as a spectroscope for physical symmetry. Decomposing predictive power by irreducible representation (irrep) over the octahedral group recovers the Wigner--Eckart selection rules directly from molecular geometry data: scalar properties are dominated by the $l\!=\!0$ (A$_1$) channel, dipole vector components require the $l\!=\!1$ (T$_1$) channel, and polarizability is uniquely insensitive to $l\!=\!1$.} \label{fig:framework} \end{figure} \section{Results} \subsection{The \texorpdfstring{$\starG$}{starG} Algebra} \label{sec:theory} \subsubsection*{Theoretical Foundation} Let $G$ be a finite group of order $n$ with elements $\{g_1, g_2, \ldots, g_n\}$. We define the \emph{convolution tensor} $\calT \in \bbR^{n \times n \times n}$ by \begin{equation} \calT(a, b, c) = \begin{cases} 1 & \text{if } ab = c \\ 0 & \text{otherwise} \end{cases} \end{equation} for all $a, b, c \in G$. This tensor encodes the complete multiplication table of $G$: its frontal slices are permutation matrices, and it satisfies an associativity identity inherited directly from group associativity. By the Peter--Weyl theorem~\citep{serre1977linear,peterweil1927}, $\calT$ admits a spectral decomposition: \begin{equation}\label{eq:spectral} \calT(a, b, c) = \sum_{i,j,k} \calC(i, j, k) \, F_G(a, i) \, F_G(b, j) \, F_G^{-1}(c, k), \end{equation} where $F_G$ is a generalized Group Fourier transform matrix assembled from the irreducible unitary representations (irreps) of~$G$, and $\calC$ is a sparse core tensor encoding the block-diagonal matrix multiplication structure of the irreps of~$G$. For abelian groups, $F_G$ is a generalized Fourier matrix (for cyclic groups, it reduces to the standard DFT matrix) and $\calC$ is diagonal; for non-abelian groups, $F_G$ is an invertible matrix. The precise definition of $F_G$ is provided in the Supplementary Information (SI Section~2). Crucially, equation (\ref{eq:spectral}) means that group convolution in the original domain corresponds to \emph{block-diagonal matrix multiplication} in the Fourier domain: one independent matrix product per irrep block, enabling highly efficient computation. We view order-$(2+d)$ tensors $\calA \in \bbR^{\ell \times m \times n_1 \times \cdots \times n_d}$ as $\ell \times m$ matrices whose entries (which are $d$-order tensors) lie in the convolutional ring $\mathbb{K}_G$ for $G = G_1 \times \cdots \times G_d$ and $n_i = |G_i|$. The \emph{$\starG$ product} of $\cal{A}$ with $\calB \in \bbR^{m \times p \times n_1 \times \cdots \times n_d}$ is defined via group convolution along the group dimensions: \begin{equation} (\calA \starG \calB)_{ij}(c_1, \ldots, c_d) = \sum_{k} \sum_{(a_1, \ldots, a_d) \in G} \calA_{ik}(a_1, \ldots, a_d) \, \calB_{kj}(a_1^{-1}c_1, \ldots, a_d^{-1}c_d). \end{equation} This product defines a novel tensor algebra~\citep{kernfeld2015tensor} that generalizes classical matrix algebra while embedding the symmetry group $G$ directly into the multiplicative structure. The resulting algebraic system supports a full suite of matrix-mimetic operations (inverses, transposes, norms, and decompositions), all inheriting equivariance by construction. Equivariance is thus a property of the algebra, not an imposed constraint. The $\starG$ product can be computed efficiently by (i)~applying $F_G$ to each tensor along its group dimension, (ii)~performing standard matrix products at each of the $|\hat{G}|$ Fourier irreps independently in parallel, and (iii)~applying $F_G^{-1}$ to recover the result. For Abelian groups $G$, the total cost is $O(n \ell m p + n \log n)$ including the Fourier transforms, matching the complexity of a single matrix product up to logarithmic factors. The \emph{$\starG$-Hermitian transpose} $\calA^H \in \bbR^{m \times \ell \times n_1 \times \cdots \times n_d}$ is defined entry-wise by \begin{equation}\label{eq:transpose} (\calA^H)_{ij}(g_1,\ldots,g_d) = \overline{\calA_{ji}(g_1^{-1},\ldots,g_d^{-1})}, \end{equation} where the overline denotes complex conjugation (trivial for real-valued tensors). Equivalently, in the Fourier domain, $\widehat{\calA^H}(:,:,\rho) = \hat{\calA}(:,:,\rho)^H$ for every irrep $\rho$ (see SI Section~3 for the precise definition of $\hat{\calA}(:,:,\rho)$), so the $\starG$-transpose maps to the ordinary matrix Hermitian transpose at each irrep block. This definition makes $\starG$-unitarity and the SVD factor conditions below fully analogous to their matrix counterparts. \subsubsection*{The $\starG$-SVD and Optimality} Every tensor $\calA$ in the $\starG$ algebra admits a singular value decomposition $\calA = \calU \starG \calS \starG \calV^H$, where $\calU$ and $\calV$ are $\starG$-unitary (satisfying $\calU^H \starG \calU = \calI$) and $\calS$ is f-diagonal (its frontal slices are diagonal matrices) with non-negative real entries, the \emph{singular tubes} $\mathbf{s}_i$. This decomposition is computed exactly by applying the group Fourier transform along the group dimension, performing standard matrix SVDs independently at each Fourier irrep, and applying the inverse group Fourier transform, a procedure that is both exact and computationally efficient. \begin{theorem}[Eckart--Young for $\starG$]\label{thm:eckart_young} The rank-$k$ truncation $\calA_k$ minimizes $\|\calA - \calB\|_F^2$ over all tensors $\calB$ of $\starG$-rank at most~$k$, with $\|\calA - \calA_k\|_F^2 = \sum_{i=k+1}^{r} \|\mathbf{s}_i\|_F^2$. \end{theorem} \noindent Full proof is provided in the Supplementary Information (SI Section~5), and a machine-verified Lean~4 formalization is available in the code repository (see Code Availability). The $\starG$-rank of $\calA$ is the number of non-zero singular tubes in the $\starG$-SVD $\calA = \calU \starG \calS \starG \calV^H$. This result is a direct analogue of the classical matrix Eckart--Young theorem~\citep{eckart1936approximation}: just as the rank-$k$ matrix SVD provides the best rank-$k$ approximation in Frobenius norm, the $\starG$-SVD provides the best $\starG$-rank-$k$ approximation among all group-equivariant tensors of that rank. This is the first such optimality guarantee for symmetry-preserving tensor approximation; by contrast, Tucker/HOSVD gives only quasi-optimal bounds with a $\sqrt{d}$ factor~\citep{desilva2008tensor}, CP decomposition is NP-hard to compute optimally, and tensor-train has no global optimality guarantee. The error has a closed-form expression in terms of the discarded singular tube norms, enabling principled rank selection with full control over approximation quality. In practice, $\starG$-rank-$k$ features derived from the leading singular tubes carry the maximal group-equivariant information about the data for any given parameter budget. \subsubsection*{Composing Multiple Symmetries} \begin{theorem}[Product Groups]\label{thm:product_ring} For $G = G_1 \times \cdots \times G_d$, the convolution tensor factorizes as $\calT_G = \calT_{G_1} \otimes \cdots \otimes \calT_{G_d}$, and the generalized Fourier matrix is $F_G = F_{G_1} \otimes \cdots \otimes F_{G_d}$. \end{theorem} \noindent Full proof is provided in the Supplementary Information (SI Section~6). This Kronecker structure is the algebraic reason why multiple symmetries compose without architectural redesign. The factorization of $\calT_G$ implies that the irreps of a product group are exactly the tensor products of the factors' irreps, so the Fourier-domain block-diagonal structure is the Kronecker product of the individual block-diagonal structures. Concretely, for $G = \bbZ_{n_1} \times \bbZ_{n_2}$, the Group Fourier transform $F_G = \mathrm{DFT}_{n_1} \otimes \mathrm{DFT}_{n_2}$ computes a 2D DFT, resolving coupled frequencies $(f_1, f_2)$ that are entirely invisible to either factor group alone. Adding a new symmetry $G_{d+1}$ to an existing $\starG$ model requires only replacing $F_G$ with $F_G \otimes F_{G_{d+1}}$; no layers are redesigned and no weights are reinitialized. \subsection{Experiment 1: Synthetic Validation} \label{sec:synthetic_exp} We first validated the $\starG$ framework on controlled synthetic data to confirm that the algebraic guarantees translate into empirical performance. We generated 1{,}000 synthetic molecules with exact $\bbZ_{12}$ rotational symmetry and compared $\starG$-SVD with ridge regression against four baselines (Augmented MLP, Neural $\starG$, Standard MLP, and Invariant MLP) across three metrics: predictive accuracy (\Rsq), rotational invariance (variance of predictions under unseen rotations), and parameter efficiency. Cyclic structure was verified at machine precision ($4.2 \times 10^{-16}$). The target property $\mathbf{y}$ combines mean interatomic distance, distance variance, and atomic number contributions, representing a rotationally invariant scalar analogous to size-dependent molecular properties such as polarizability. Results are summarized in Table~\ref{tab:synthetic} and Figure~\ref{fig:synthetic}. The $\starG$-SVD achieves perfect prediction (\Rsq{} $= 1.000 \pm 0.000$) and exact invariance (rotation variance $5.8 \times 10^{-31}$) using only 101 parameters, compared to 5{,}249--14{,}465 for all neural baselines. The Standard MLP achieves \Rsq{} $= 0.377$ with rotation variance $0.14$, confirming that without explicit symmetry handling the model neither learns well nor respects the symmetry. The Augmented MLP (\Rsq{} $= 0.998$) achieves near-perfect accuracy but retains residual rotation variance ($3.7 \times 10^{-5}$) five orders of magnitude larger than $\starG$-SVD, illustrating that augmentation approximates but does not algebraically guarantee invariance. Figure~\ref{fig:synthetic}b shows a 30-orders-of-magnitude gap in rotation variance between $\starG$-SVD and all non-algebraic methods, a qualitative difference rather than merely a quantitative improvement. The predicted-versus-true scatter plot (Fig.~\ref{fig:scatter}) makes this vivid: $\starG$-SVD produces a perfect diagonal, while the Standard MLP produces near-random scatter. \begin{table}[h] \centering \caption{Synthetic validation ($\bbZ_{12}$, 1{,}000 molecules, 3 seeds).} \label{tab:synthetic} \begin{tabular}{@{}lccc@{}} \toprule \textbf{Method} & \textbf{Test \Rsq} & \textbf{Rot.\ Variance} & \textbf{Params} \\ \midrule $\starG$-SVD + Ridge & $\mathbf{1.000 \pm 0.000}$ & $5.8 \times 10^{-31}$ & \textbf{101} \\ Augmented MLP & $0.998 \pm 0.000$ & $3.7 \times 10^{-5}$ & 5{,}249 \\ Neural $\starG$ & $0.697 \pm 0.063$ & $1.0 \times 10^{-30}$ & 8{,}641 \\ Standard MLP & $0.377 \pm 0.054$ & $1.4 \times 10^{-1}$ & 5{,}249 \\ Invariant MLP & $0.327 \pm 0.147$ & $\sim 0$ & 14{,}465 \\ \bottomrule \end{tabular} \end{table} \begin{figure}[!ht] \centering \includegraphics[width=\textwidth]{fig2_synthetic.pdf} \caption{\textbf{Synthetic validation ($\bbZ_{12}$).} (\textbf{a})~Test \Rsq{}. (\textbf{b})~Rotation variance (log scale): 30-orders-of-magnitude gap between $\starG$-SVD and all non-algebraic methods. (\textbf{c})~Parameter efficiency (Pareto frontier): $\starG$-SVD dominates all baselines on both axes simultaneously. (\textbf{d})~Multi-metric normalized summary scores.} \label{fig:synthetic} \end{figure} \begin{figure}[!ht] \centering \includegraphics[width=0.85\textwidth]{fig_scatter.pdf} \caption{\textbf{Predicted vs.\ true (synthetic).} (\textbf{a})~$\starG$-SVD: perfect diagonal at \Rsq{} $= 1.000$. (\textbf{b})~Standard MLP: near-random scatter at \Rsq{} $= -0.084$, illustrating the cost of ignoring symmetry.} \label{fig:scatter} \end{figure} \subsection{Experiment 2: QM9 Molecular Property Prediction} \label{sec:qm9_exp} Having established correctness on synthetic data, we evaluated the $\starG$ framework on the QM9 benchmark~\citep{ramakrishnan2014qm9}, a widely used dataset of 134{,}000 small organic molecules with up to nine heavy atoms (C, H, O, N, F), each annotated with 12 quantum chemical properties computed at the DFT level of theory. QM9 has become a standard testbed for machine learning methods in computational chemistry, particularly for evaluating how well models leverage molecular symmetry: molecules are invariant under 3D rotation and atomic index permutation, and models that exploit these symmetries empirically generalize better with fewer samples~\citep{batzner2022e3,schutt2017schnet}. We focus on predicting the HOMO--LUMO gap (the difference in energy between the highest occupied and lowest unoccupied molecular orbitals), a property of direct relevance to photovoltaics, drug design, and materials discovery. We used 1{,}000 molecules with $\bbZ_{12}$ rotational featurization. Results are presented in Table~\ref{tab:qm9} and Figure~\ref{fig:qm9}. The $\starG$-SVD with ridge regression is the \emph{only} method to achieve positive \Rsq{} on this real-world task, reaching \Rsq{} $= 0.556 \pm 0.047$ and RMSE $= 0.035$~Ha with just 107 parameters. All pure neural baselines collapse: the Standard MLP reaches \Rsq{} $= -10.99 \pm 5.90$, the Invariant MLP \Rsq{} $= -3.85 \pm 1.58$, and the Neural $\starG$ (which also uses algebraic equivariance) reaches \Rsq{} $= -5.15 \pm 3.10$. The Augmented MLP achieves positive \Rsq{} $= 0.384$ but requires $49\times$ more parameters and leaves substantially larger residuals. The $\starG$-SVD's rotation variance ($3.2 \times 10^{-31}$) is at floating-point noise, while all MLPs show residual rotational sensitivity. The learning curves (Figure~\ref{fig:learning_curves}) reveal that this advantage is not merely a small-sample effect: $\starG$-SVD maintains positive \Rsq{} from as few as 100 molecules, while neural baselines overfit catastrophically at small sample sizes and only approach competitive performance near 1{,}000 samples. This data efficiency follows directly from optimal algebraic compression: 107 well-chosen equivariant features capture more structural information than thousands of learned neural parameters. The results demonstrate that in data-scarce scientific settings, algebraic structure is a stronger inductive bias than architectural expressivity. \begin{table}[h] \centering \caption{QM9 HOMO--LUMO gap ($\bbZ_{12}$, 1{,}000 molecules, 3 seeds).} \label{tab:qm9} \begin{tabular}{@{}lcccc@{}} \toprule \textbf{Method} & \textbf{Test \Rsq} & \textbf{RMSE (Ha)} & \textbf{Rot.\ Var.} & \textbf{Params} \\ \midrule $\starG$-SVD + Ridge & $\mathbf{0.556 \pm 0.047}$ & $\mathbf{0.035}$ & $3.2 \times 10^{-31}$ & \textbf{107} \\ Augmented MLP & $0.384 \pm 0.028$ & $0.042$ & $3.9 \times 10^{-4}$ & 5{,}249 \\ Standard MLP & $-10.99 \pm 5.90$ & $0.179$ & $6.7 \times 10^{-2}$ & 5{,}249 \\ Invariant MLP & $-3.85 \pm 1.58$ & $0.116$ & $\sim 0$ & 14{,}465 \\ Neural $\starG$ & $-5.15 \pm 3.10$ & $0.127$ & $2.4 \times 10^{-28}$ & 9{,}025 \\ \bottomrule \end{tabular} \end{table} \begin{figure}[!ht] \centering \includegraphics[width=\textwidth]{fig3_qm9.pdf} \caption{\textbf{QM9 HOMO--LUMO gap (1{,}000 real molecules).} (\textbf{a})~Test \Rsq: $\starG$-SVD and Augmented MLP are the only methods with positive \Rsq{}; all pure neural baselines overfit catastrophically. (\textbf{b})~RMSE (Hartree): $\starG$-SVD achieves the lowest error at 0.035~Ha. (\textbf{c})~Rotation variance (log scale): $\starG$-SVD achieves exact invariance at floating-point noise.} \label{fig:qm9} \end{figure} \begin{figure}[!ht] \centering \includegraphics[width=0.7\textwidth]{fig3d_learning_curves.pdf} \caption{\textbf{Learning curves on QM9.} $\starG$-SVD + Ridge maintains positive \Rsq{} from as few as 100 molecules. Neural baselines overfit at small sample sizes (negative \Rsq{}) and require substantially more data to approach competitive performance. Bands: $\pm 1$ s.d.\ over 3 seeds.} \label{fig:learning_curves} \end{figure} \subsection{Experiment 3: Product Group Composition} \label{sec:product_exp} To test whether the algebraic composition theorem (Theorem~\ref{thm:product_ring}) translates into empirical performance, we constructed a task with two independent, commuting symmetries: $G_1 = \bbZ_6$ (discrete rotations in the $xy$-plane) and $G_2 = \bbZ_4$ (periodic translations along $z$). The target quantity was designed to be dominated by coupled 2D Fourier frequencies $(f_1, f_2)$ that require both symmetries simultaneously; neither factor alone can resolve these modes. This setting models the physical situation in which a molecular property depends on two structural degrees of freedom simultaneously, as is common in materials with layered or helical symmetry. Results are shown in Table~\ref{tab:product} and Figure~\ref{fig:product}. The $\starG$ model over $G_1 \times G_2$ achieves perfect prediction (\Rsq{} $= 1.000 \pm 0.000$) with just 186 parameters. The single-factor models capture at most 23\%: $G_2$ alone (\Rsq{} $= 0.229$) and $G_1$ alone (\Rsq{} $= 0.155$). The 2D frequency map (Figure~\ref{fig:product}b) visualizes why: the coupled frequency cells (highlighted in red) carry 87\% of the target energy and are resolved only by $F_{\bbZ_6} \otimes F_{\bbZ_4}$; the individual transforms see only the axis-aligned marginals. The cyclic approximation $\bbZ_{24}$ (treating the same 24-dimensional group dimension as a single cyclic group) reaches \Rsq{} $= 0.986$, close but not exact, because it cannot distinguish the tensor-product structure of the irreps. The ablation cascade (Figure~\ref{fig:ablation}) confirms a strict performance hierarchy: product group $>$ wrong cyclic $>$ single factor $>$ no symmetry, each step removing algebraic information and reducing performance monotonically. These results validate Theorem~\ref{thm:product_ring} empirically and demonstrate that exact product group specification is both necessary and sufficient for exact recovery. \begin{table}[h] \centering \caption{Product group $\bbZ_6 \times \bbZ_4$ (1{,}000 molecules, 3 seeds).} \label{tab:product} \begin{tabular}{@{}lcc@{}} \toprule \textbf{Method} & \textbf{Test \Rsq} & \textbf{Params} \\ \midrule $\starG$ over $G_1 \times G_2$ + Ridge & $\mathbf{1.000 \pm 0.000}$ & \textbf{186} \\ $\bbZ_{24}$ cyclic + Ridge & $0.986 \pm 0.002$ & 157 \\ $\starG$ over $G_1 \times G_2$ + MLP & $0.826 \pm 0.099$ & 9{,}473 \\ Standard MLP & $0.488 \pm 0.116$ & 4{,}481 \\ Invariant MLP & $0.324 \pm 0.267$ & 6{,}785 \\ Augmented MLP & $0.250 \pm 0.239$ & 4{,}481 \\ $G_2$ only ($\bbZ_4$) + Ridge & $0.229 \pm 0.191$ & 42 \\ $G_1$ only ($\bbZ_6$) + Ridge & $0.155 \pm 0.244$ & 55 \\ \bottomrule \end{tabular} \end{table} \begin{figure}[!ht] \centering \includegraphics[width=\textwidth]{fig4_product.pdf} \caption{\textbf{Product group $\bbZ_6 \times \bbZ_4$: compositional advantage.} (\textbf{a})~Eight-method comparison. The product group achieves $R^2 = 1.000$; each factor alone captures $\leq 23$\%. (\textbf{b})~2D frequency map: coupled cells (red borders) carry 87\% of target energy and are resolved only by $F_{\bbZ_6} \otimes F_{\bbZ_4}$.} \label{fig:product} \end{figure} \begin{figure}[!ht] \centering \includegraphics[width=0.65\textwidth]{fig_ablation.pdf} \caption{\textbf{Ablation cascade.} Progressively removing symmetry components reveals a strict performance hierarchy: product group $\to$ wrong cyclic approximation $\to$ single factor $\to$ no symmetry.} \label{fig:ablation} \end{figure} \subsection{Experiment 4: Symmetry and Factorization Discovery} \label{sec:discovery} Beyond prediction, the $\starG$ framework enables a qualitatively new capability: data-driven discovery of the symmetry group that best describes a dataset, without any prior knowledge of its structure. This is a potentially transformative tool for scientific inquiry: given observations of a physical system, one can scan a library of candidate groups and identify the one that maximally captures the data's structure, effectively reading off the symmetry of nature from empirical measurements alone. \textbf{Group discovery on QM9} (Figure~\ref{fig:discovery}a): scanning eight candidate groups of small order, $\bbZ_4$ achieves the highest combined prediction score (\Rsq{} $= 0.590$), consistent with the $C_4$ rotational symmetry of many small organic molecules in the dataset. This result was obtained purely from molecular geometry and property labels, with no crystallographic or group-theoretic input. \textbf{Factorization discovery for $n = 24$} (Figure~\ref{fig:discovery}b): given that data lives on a 24-element group, the algorithm scans all factorizations $G_1 \times G_2$ of order 24 and identifies $\bbZ_3 \times \bbZ_8$ as optimal (\Rsq{} $= 1.000$), outperforming the naive cyclic $\bbZ_{24}$ (\Rsq{} $= 0.985$) and all other factorizations. This demonstrates that the $\starG$ framework can recover the latent product structure of a symmetry group from data, a capability unavailable in existing equivariant network frameworks where the group is always specified manually. \begin{figure}[!ht] \centering \includegraphics[width=\textwidth]{fig5_discovery.pdf} \caption{\textbf{Symmetry discovery.} (\textbf{a})~QM9 group discovery: scanning candidate groups reveals $\bbZ_4$ as the best fit, consistent with $C_4$ molecular symmetry. The combined score axis reflects both predictive accuracy and invariance quality. (\textbf{b})~Factorization discovery for order 24: $\bbZ_3 \times \bbZ_8$ is identified as the optimal decomposition (\Rsq{} $= 1.000$), surpassing cyclic $\bbZ_{24}$ (\Rsq{} $= 0.985$) and revealing the latent product structure of the symmetry group from data alone.} \label{fig:discovery} \end{figure} \subsection{Experiment 5: Empirical Recovery of Wigner--Eckart Selection Rules} \label{sec:wigner_eckart} The preceding experiments validated the $\starG$ framework as a prediction tool. We now ask a deeper question: can the algebraic decomposition \emph{discover} physical symmetry structure that was not provided as input? The Wigner--Eckart theorem states that scalar observables ($l\!=\!0$) couple only to the trivial representation, vector observables ($l\!=\!1$) require the fundamental representation, and rank-2 tensor observables ($l\!=\!2$) require the $l\!=\!0$ and $l\!=\!2$ channels. We test whether the $\starG$ framework recovers these selection rules from molecular geometry data alone. \subsubsection*{Setup} We replace the cyclic group $\bbZ_{24}$ with the chiral octahedral group $O$ (order 24, a subgroup of SO(3)) whose five irreducible representations correspond directly to angular momentum channels: \begin{center} \begin{tabular}{@{}llcl@{}} \toprule \textbf{Irrep} & \textbf{Dim} & \textbf{$l$-channel} & \textbf{Physical content} \\ \midrule A$_1$ & 1 & $l=0$ & Scalar (spherically symmetric) \\ A$_2$ & 1 & $l=0$ & Pseudoscalar \\ E & 2 & $l=2$ & Quadrupole ($d$-wave) \\ T$_1$ & 3 & $l=1$ & Vector ($p$-wave, dipolar) \\ T$_2$ & 3 & $l=2$ & Quadrupole ($d$-wave) \\ \bottomrule \end{tabular} \end{center} \noindent For 2{,}000 QM9 molecules, we compute features under all 24 octahedral rotations (face, vertex, and edge rotations of the cube), then decompose the Fourier power into per-irrep contributions using the actual representation matrices. We predict not only scalar properties (HOMO--LUMO gap, HOMO, LUMO, ZPVE) and the dipole magnitude $|\boldsymbol{\mu}|$, but also the dipole vector components $\mu_x, \mu_y, \mu_z$ computed from Mulliken partial charges. These vector components transform under the $l\!=\!1$ (T$_1$) irrep and cannot be predicted from rotation-invariant features. \subsubsection*{Results} Table~\ref{tab:wigner} shows the predictive power (\Rsq) of each irrep's features alone for each property. Three findings emerge that are consistent with the Wigner--Eckart theorem and are summarized in Figures~\ref{fig:wigner} and~\ref{fig:wigner_heatmap}. \textbf{(i) Dipole components require equivariant information.} The A$_1$ (invariant) irrep predicts scalar properties well (\Rsq{} $= 0.64$ mean) but gives essentially zero for dipole components (\Rsq{} $= 0.04$ mean), a separation of $+0.59$. Dipole vector components \emph{cannot} be predicted from rotation-invariant features alone: they require directional ($l \geq 1$) information, as the Wigner--Eckart theorem demands for a rank-1 tensor operator (Figure~\ref{fig:wigner}a). \textbf{(ii) The T$_1$/A$_1$ ratio separates vector from scalar properties.} For scalar properties, the T$_1$ channel provides about half the predictive power of A$_1$ (ratio $\sim$0.54). For dipole components, T$_1$ provides $5\times$ more than A$_1$ (ratio $\sim$2.78). This $5\times$ shift directly indicates that dipole components draw their predictive signal primarily from the $l\!=\!1$ angular channel, exactly as the Wigner--Eckart theorem predicts for vector observables (Figure~\ref{fig:wigner}b). \textbf{(iii) Polarizability is uniquely insensitive to $l\!=\!1$.} The isotropic polarizability (trace of a rank-2 tensor) has T$_1$ \Rsq{} $= 0.017$, essentially zero, while every other property has T$_1$ \Rsq{} $> 0.08$. This is consistent with the representation-theoretic decomposition of symmetric rank-2 tensors, which contain $l\!=\!0$ and $l\!=\!2$ components but no $l\!=\!1$ component. The heatmap in Figure~\ref{fig:wigner_heatmap} makes the block structure visible: rank-0 properties (above the separator line) show strong A$_1$ and weak T$_1$; rank-1 dipole components show the reverse; the rank-2 polarizability is the outlier with near-zero T$_1$. These patterns were discovered entirely from molecular geometry and Mulliken charges, without any quantum-mechanical theory as input. The $\starG$ framework, by decomposing predictions into irreducible representation channels of a physically meaningful group, serves as a \emph{spectroscope for symmetry}: it reveals which angular momentum channels carry information about which physical observables. \begin{table}[h] \centering \caption{Per-irrep predictive power (\Rsq) on QM9 properties. The T$_1$/A$_1$ ratio reveals which properties depend on directional ($l\!=\!1$) information.} \label{tab:wigner} \begin{tabular}{@{}llcccccr@{}} \toprule \textbf{Property} & \textbf{Rank} & \textbf{A$_1$} & \textbf{A$_2$} & \textbf{E} & \textbf{T$_1$} & \textbf{T$_2$} & \textbf{T$_1$/A$_1$} \\ \midrule HOMO--LUMO gap & 0 & 0.986 & 0.986 & 0.438 & 0.618 & 0.338 & 0.63 \\ HOMO energy & 0 & 0.237 & 0.237 & 0.079 & 0.145 & $-$0.01 & 0.61 \\ LUMO energy & 0 & 0.126 & 0.126 & 0.039 & 0.080 & 0.000 & 0.64 \\ ZPVE & 0 & 0.985 & 0.985 & 0.166 & 0.378 & $-$0.02 & 0.38 \\ $|\boldsymbol{\mu}|$ (magnitude) & 0 & 0.853 & 0.853 & 0.171 & 0.395 & 0.064 & 0.46 \\ \midrule $\mu_x$ (component) & 1 & 0.010 & 0.010 & 0.008 & 0.043 & 0.003 & \textbf{4.47} \\ $\mu_y$ (component) & 1 & 0.124 & 0.124 & 0.029 & 0.191 & $-$0.00 & \textbf{1.54} \\ $\mu_z$ (component) & 1 & $-$0.00 & $-$0.00 & $-$0.00 & $-$0.01 & 0.000 & N/A \\ \midrule Polarizability & 2 & 0.313 & 0.313 & 0.030 & \textbf{0.017} & 0.001 & \textbf{0.05} \\ \bottomrule \end{tabular} \end{table} \begin{figure}[!ht] \centering \includegraphics[width=\textwidth]{irrep_bars.pdf} \caption{\textbf{Empirical recovery of Wigner--Eckart selection rules.} Per-irrep predictive power (\Rsq) for each quantum property. Scalar properties (blue shades) are dominated by A$_1$ ($l\!=\!0$). Dipole magnitude (orange) also lives primarily in A$_1$ because $|\boldsymbol{\mu}|$ is a scalar. Dipole vector components (red shades) show a qualitatively different pattern: A$_1$ gives nearly zero while T$_1$ ($l\!=\!1$) is the dominant channel, consistent with the Wigner--Eckart selection rule for a rank-1 tensor operator. Polarizability (purple) has the lowest T$_1$ of any property, consistent with its rank-2 decomposition containing only $l\!=\!0$ and $l\!=\!2$.} \label{fig:wigner} \end{figure} \begin{figure}[!ht] \centering \includegraphics[width=\textwidth]{irrep_heatmap.pdf} \caption{\textbf{Irrep decomposition heatmap.} \Rsq{} for each (property, irrep) combination, sorted by tensor rank with group separators. Rank-0 properties (above line) show strong A$_1$ and weak T$_1$; rank-1 dipole components (below line) show the reverse; the rank-2 polarizability is the outlier with near-zero T$_1$.} \label{fig:wigner_heatmap} \end{figure} \subsection{Comparison with Equivariant Neural Networks at Scale} \label{sec:enn_comparison} We now report a head-to-head comparison of $\starG$ against representative equivariant neural networks (ENNs) on the full QM9 benchmark (130{,}831 characterized molecules, 91{,}581 / 19{,}624 / 19{,}626 train / validation / test split, 3 seeds). The aim of this section is to characterize \emph{where} the algebraic framework dominates and \emph{why} the pooled-$R^2$ scoreboard on QM9 is a less informative metric than it appears. ENNs do achieve higher pooled $R^2$, and we report the gap explicitly. \paragraph{Concession: ENNs achieve higher pooled $R^2$, and so do generic MLPs at modest parameter cost.} Table~\ref{tab:fullqm9} reports test $R^2$ on the four scalar QM9 targets across all molecule-level summary methods plus three ENN baselines. MACE, a state-of-the-art equivariant message-passing network, reaches $R^2 = 0.985$ on the HOMO--LUMO gap target with $945{,}168$ parameters; $\starG$-SVD~+~Ridge reaches $R^2 = 0.482$ with 144 parameters. Standard MLP and Invariant MLP, given the same molecule-level feature tensor as $\starG$, reach $R^2 = 0.513$ and $0.529$ respectively at $3{,}073$ to $5{,}761$ parameters: comparable to $\starG$-SVD~+~Ridge on pooled $R^2$ at $20$--$40\times$ more parameters. The interesting question is what these differences actually mean, addressed next. \paragraph{The pooled-$R^2$ gap is largely a size-prediction race.} On QM9, the HOMO--LUMO gap depends strongly on molecular size: a model that captures gross size and composition features, without any chemistry-aware bond topology, already explains the bulk of the variance. We measure the \emph{within-isomer} $R^2$ (restricted to formulas with $\geq 5$ constitutional isomers, sample-weighted across formulas) for every method on the same test predictions. The empirical finding (Table~\ref{tab:isomer_audit}) is structural and clean: on QM9 HOMO--LUMO gap within isomer groups, \emph{every} method that consumes the molecule-level $(n_{\mathrm{feat}}, |G|)$ summary converges to within-isomer $R^2 \approx 0$ ($\starG$ ridge: $+0.01$; $\starG$ neural: $+0.10$; MLP standard: $+0.06$; MLP invariant: $+0.08$; MLP augmented: $-2.03$). SchNet ($0.991$) and MACE ($0.968$) consume the full atomic graph, which carries the bond topology that distinguishes constitutional isomers. \emph{The within-isomer $R^2$ gap therefore separates methods by input information content, not by algorithmic merit.} Among methods that consume the same molecule-level summary, $\starG$ is competitive with all alternatives at orders of magnitude fewer parameters, with closed-form solution, and with the per-irrep decomposition no MLP can produce. The implication for Section~\ref{sec:enn_comparison}'s framing is that the $\starG$ contribution is not in pooled $R^2$ on QM9 but in (a) the per-irrep predictive decomposition no ENN provides (Section~\ref{sec:wigner_eckart}, Table~\ref{tab:per_irrep_full_qm9}), (b) the cross-selectivity headline on tensorial polarizability targets (Table~\ref{tab:qm7x_polarizability}), and (c) the four orthogonal capabilities listed below. \paragraph{The $\starG$ algebra delivers capabilities that ENNs cannot provide.} Three of these are central to the manuscript. \textbf{(i)~Per-irrep predictive decomposition.} The generalized Fourier transform of the molecule tensor decomposes naturally into irreducible representation channels of $G$; for the octahedral group this gives the A$_1$, A$_2$, E, T$_1$, T$_2$ channels of Section~\ref{sec:wigner_eckart}. Per-irrep test $R^2$ for any target is a closed-form quantity, computed by ridge-regressing on a single irrep's projected features (Table~\ref{tab:wigner}). ENNs operate inside spherical-harmonic / Wigner-$D$ spaces but their final predictions do not decompose this way; the readout is end-to-end. This is the primary interpretability differentiator. \textbf{(ii)~Empirical recovery of the Wigner--Eckart selection rules.} The per-irrep decomposition above, applied to QM9 properties of varying tensor rank, recovers the angular-momentum selection rules of the 1931 Wigner--Eckart theorem from molecular geometry data alone, with no quantum-mechanical input (Section~\ref{sec:wigner_eckart}). No ENN paper has reported this. It is unique to the algebraic framing. \textbf{(iii)~Provably optimal symmetry-preserving compression.} The rank-$k$ $\starG$-SVD is the best $G$-equivariant rank-$k$ approximation in Frobenius norm (Theorem~\ref{thm:eckart_young}, machine- verified in Lean~4). Tucker has only $\sqrt{d}$-quasi-optimal bounds; CP is NP-hard to compute optimally; tensor-train has no global optimality. The structural advantage compounds with composability: adding a new symmetry $G'$ replaces $F_G$ by $F_G \otimes F_{G'}$ with no architectural change. \paragraph{Matched-input-information comparison and the Augmented MLP collapse.} On the same molecule-level $(n_{\mathrm{feat}}, |G|)$ feature tensor (the $48$-row angular featurizer of Section~\ref{sec:methods_featurizer}, carrying angular moments, heavy-atom rows, atom-pair Coulomb couplings, and distance-distribution rows under cyclic Z$_{12}$), $\starG$-SVD~+~Ridge (144~params, $R^2 = 0.482$ on gap, $0.998$ on ZPVE) substantially exceeds Standard MLP (3{,}073~params, $R^2 = 0.513$) and Invariant MLP (5{,}761~params, $R^2 = 0.529$) on $\alpha$ ($0.909$ vs.\ $0.818$ vs.\ $0.852$) and matches them on the other three scalars at $\sim$$20$--$40\times$ fewer parameters. The parameter-efficiency advantage at the low-budget end of the Pareto frontier is real (Figure~\ref{fig:pareto}); the within-isomer audit above explains why the larger MLPs do not extract proportionally more signal. The starkest empirical difference on identical input is with the Augmented MLP, the closest non-equivariant analogue to $\starG$, which attempts to learn invariance by training on rotated copies of each molecule. At full QM9 scale the Augmented MLP collapses to $R^2 = 0.019 \pm 0.014$ on gap and $R^2 = 0.072 \pm 0.019$ on $\alpha$, because orbit augmentation injects label-preserving noise that a 3{,}073-parameter MLP cannot disentangle once the dataset is large. The case for algebraic equivariance over data-augmented learning of equivariance is therefore empirically dramatic at large scale, considerably more so than at the 1{,}000-molecule scale of Section~\ref{sec:qm9_exp}. The ENN comparison remains a different-input comparison: SchNet, e3nn, and MACE consume the full atomic graph; $\starG$ consumes a molecule-level summary. The contribution of the algebra is what it adds within that input contract. \paragraph{Position in the equivariant ML lineage.} Equivariant ML has progressed through three stages: data augmentation, architectural equivariance (ENNs), and now algebraic equivariance ($\starG$). Each stage cut back, but did not eliminate, a class of cost from the previous one (Section~\ref{sec:wigner_eckart} discusses this transition in detail). $\starG$'s contribution is not a faster--better--cheaper alternative to ENNs but a different mathematical affordance: provably optimal low-rank approximation, machine-verified equivariance, symmetry discovery from data, and per-irrep interpretability. Where the data demands chemistry-aware predictive power on a single target with generous compute, ENNs remain the right tool; where the goal is algebraic structure, interpretability, parameter efficiency, or the discovery of physical laws like the Wigner--Eckart selection rules from data, the algebraic framework offers things no ENN does. \begin{table}[h] \centering \caption{Full QM9, four scalar targets, mean test $R^2 \pm$ std over 3 seeds (130{,}831 molecules; 91{,}581 / 19{,}624 / 19{,}626 train / val / test split). The MACE row makes the magnitude of the pooled-$R^2$ gap explicit and motivates the within-isomer audit in Table~\ref{tab:isomer_audit}. $^{\ast}$$\starG$-SVD~+~Ridge and $\starG$ neural use the $48$-row angular featurizer specified in Methods~\S\ref{sec:methods_featurizer}. $^{\dagger}$SchNet uses the PyTorch Geometric reference implementation (\texttt{torch\_geometric.nn.models.SchNet}) with $128$ hidden channels, $6$ interactions, $50$-Gaussian RBF, cutoff $10$\,\AA, $455{,}809$ trainable parameters. SchNet matches or modestly exceeds MACE on every target while using ${\sim}\,2.1{\times}$ fewer trainable parameters. Full configurations for all three ENN baselines are in Supplementary~\S\ref{sec:enn_baselines}.} \label{tab:fullqm9} \begin{tabular}{@{}lrcccc@{}} \toprule \textbf{Method} & \textbf{Params} & \textbf{gap} & \textbf{alpha} & \textbf{mu} & \textbf{ZPVE} \\ \midrule $\starG$-SVD + Ridge$^{\ast}$ & $144$ & $0.482 \!\pm\! 0.001$ & $0.909 \!\pm\! 0.003$ & $0.470 \!\pm\! 0.003$ & $0.998 \!\pm\! 0.000$ \\ $\starG$ neural$^{\ast}$ & $62{,}625$ & $0.568 \!\pm\! 0.004$ & $0.944 \!\pm\! 0.004$ & $0.554 \!\pm\! 0.009$ & $0.997 \!\pm\! 0.002$ \\ MLP standard & 3{,}073 & $0.513 \!\pm\! 0.002$ & $0.818 \!\pm\! 0.003$ & $0.393 \!\pm\! 0.002$ & $0.972 \!\pm\! 0.001$ \\ MLP invariant & 5{,}761 & $0.529 \!\pm\! 0.005$ & $0.852 \!\pm\! 0.003$ & $0.466 \!\pm\! 0.006$ & $0.980 \!\pm\! 0.000$ \\ MLP augmented & 3{,}073 & $0.019 \!\pm\! 0.014$ & $0.072 \!\pm\! 0.019$ & $0.003 \!\pm\! 0.001$ & $0.022 \!\pm\! 0.024$ \\ SchNet$^{\dagger}$ & $455{,}809$ & $0.996 \!\pm\! 0.000$ & $0.999 \!\pm\! 0.001$ & $0.998 \!\pm\! 0.001$ & $1.000 \!\pm\! 0.000$ \\ MACE & $945{,}168$ & $0.985 \!\pm\! 0.001$ & $0.997 \!\pm\! 0.000$ & $0.995 \!\pm\! 0.001$ & $1.000 \!\pm\! 0.000$ \\ \bottomrule \end{tabular} \end{table} \begin{table}[h] \centering \caption{Pooled vs.\ within-isomer test $R^2$ for the HOMO--LUMO gap target. Within-isomer $R^2$ is computed over molecular formulas with $\geq 5$ constitutional isomers and reported as the sample-weighted mean across formulas. The collapse from pooled to within-isomer quantifies how much of the headline $R^2$ is size-prediction signal.} \label{tab:isomer_audit} \begin{tabular}{@{}lcc@{}} \toprule \textbf{Method} & \textbf{Pooled $R^2$} & \textbf{Within-isomer $R^2$} \\ \midrule $\starG$-SVD + Ridge$^{\ast}$ & $0.482$ & $\phantom{-}0.012$ \\ $\starG$ neural$^{\ast}$ & $0.568$ & $\phantom{-}0.101$ \\ MLP standard & $0.513$ & $\phantom{-}0.058$ \\ MLP invariant & $0.529$ & $\phantom{-}0.083$ \\ MLP augmented & $0.019$ & $-2.034$ \\ SchNet & $0.996$ & $\phantom{-}0.991$ \\ MACE$^{\ddagger}$ & $0.985$ & $\phantom{-}0.964$ \\ \bottomrule \end{tabular} \medskip \footnotesize \noindent\emph{Within-isomer $R^2$ averaged over $\sim$230 molecular formulas (per seed, $\geq 5$ constitutional isomers each); covered test molecules range from $\sim$10{,}500 (PyG-split SchNet) to $\sim$19{,}250 ($\starG$ / MLP / MACE). The sample-weighted mean across formulas is robust to the per-method test-set partitioning, so the numerical comparison is unaffected. $^{\ddagger}$MACE within-isomer is the mean over two seeds; the third seed will be added in the camera-ready (the change in 3-seed mean would be at most ${\sim}\,0.005$ per the per-seed dispersion observed).} \end{table} \begin{table}[h] \centering \caption{Per-irrep test $R^2$ on full QM9 with the chiral octahedral group $O$. Each cell is the test $R^2$ when $\starG$-SVD~+~Ridge is trained on \emph{only} the projected features of one irrep. The Wigner--Eckart signature reads off each property's tensor character directly from the data: ZPVE is A$_1$-pure (a scalar property); $\alpha$ (rank-2 trace) has T$_1 \approx 0$, exactly as required by the representation theory of symmetric rank-2 tensors (which decompose into $l\!=\!0 \oplus l\!=\!2$, no $l\!=\!1$ component); $\mu$ (magnitude of a vector) has T$_1$ stronger than A$_1$, signalling its underlying vector character. ENNs do not produce this decomposition.} \label{tab:per_irrep_full_qm9} \begin{tabular}{@{}lcccccc@{}} \toprule \textbf{Target} & \textbf{A$_1$ ($l\!=\!0$)} & \textbf{A$_2$} & \textbf{E ($l\!=\!2$)} & \textbf{T$_1$ ($l\!=\!1$)} & \textbf{T$_2$ ($l\!=\!2$)} & \textbf{All irreps} \\ \midrule gap (electronic, scalar) & $0.318$ & $0.000$ & $0.007$ & $0.060$ & $0.002$ & $0.345$ \\ $\alpha$ (polarizability, rank-2 trace) & $0.611$ & $0.000$ & $0.050$ & $\mathbf{0.005}$ & $0.068$ & $0.628$ \\ $\mu$ (dipole magnitude, vector-derived) & $0.129$ & $0.000$ & $0.008$ & $\mathbf{0.277}$ & $0.000$ & $0.360$ \\ ZPVE (vibrational, pure scalar) & $\mathbf{0.853}$ & $0.000$ & $0.000$ & $0.031$ & $0.032$ & $0.855$ \\ \bottomrule \end{tabular} \end{table} \begin{table}[h] \centering \caption{\textbf{QM7-X polarizability components: cross-selectivity table.} For each method we train two single-target models, one predicting the $E_g$ irrep magnitude $\|\boldsymbol{\alpha}_E\|$ of the molecular polarizability tensor, the other predicting the $T_{2g}$ magnitude $\|\boldsymbol{\alpha}_{T_2}\|$, on identical 60/20/20 per-molecule splits across three seeds (42, 43, 44). Cross-selectivity is $\mathrm{cs} = (R^2_{\mathrm{on}} - R^2_{\mathrm{off}}) / (R^2_{\mathrm{on}} + R^2_{\mathrm{off}} + 10^{-12})$, where the ``on'' model targets the indicated component and the ``off'' model is the architecturally- identical model trained on the other component. A method whose two single-target architectures achieve similar $R^2$ values has cross-selectivity near zero, indicating that the architecture treats the two targets symmetrically. A method with very asymmetric $R^2$ has cross-selectivity near one, indicating a structural prior that aligns with one component and not the other.} \label{tab:qm7x_polarizability} \begin{tabular}{@{}>{\raggedright\arraybackslash}p{4.3cm}rccc@{}} \toprule \textbf{Method} & \textbf{Params} & $R^2(\|\boldsymbol{\alpha}_E\|)$ & $R^2(\|\boldsymbol{\alpha}_{T_2}\|)$ & \textbf{Cross-selectivity} \\ \midrule $\starG$ E-only + Ridge & $2$ & $0.591$ & $0.012$ & $\mathbf{96.2\%}$ \\ $\starG$ T$_2$-only + Ridge & $2$ & $0.008$ & $0.593$ & $\mathbf{97.3\%}$ \\ $\starG$ all irreps + Ridge & $6$ & $0.622$ & $0.609$ & $1.1\%$ \\ \midrule Raw features + Ridge & $37$ & $0.481$ & $0.512$ & $-3.2\%$ \\ Coulomb matrix + Ridge & $277$ & $0.442$ & $0.446$ & $-0.5\%$ \\ Raw + MLP$(128, 64, 32)$ & $15{,}105$ & $0.586$ & $0.591$ & $-0.5\%$ \\ Coulomb + MLP$(128, 64, 32)$ & $45{,}825$ & $0.495$ & $0.456$ & $4.2\%$ \\ \midrule SchNet (PyG) & $455{,}809$ & $0.664 \!\pm\! 0.026$ & $0.662 \!\pm\! 0.018$ & $0.15\%$ \\ e3nn (SE(3)-equivariant) & $794{,}032$ & $0.708 \!\pm\! 0.027$ & $0.698 \!\pm\! 0.006$ & $0.7\%$ \\ MACE & $1{,}041{,}808$ & $0.687 \!\pm\! 0.032$ & $0.672 \!\pm\! 0.022$ & $1.1\%$ \\ \bottomrule \end{tabular} \medskip \footnotesize \noindent\emph{The three ENNs achieve modest predictive $R^2$ in the $0.66$--$0.71$ range on each polarizability component individually, matching the $\starG$ all-irreps + Ridge baseline at $6$ parameters. \textbf{None of the three ENN architectures achieves cross-selectivity above $\sim 1\%$:} MACE, SchNet, and e3nn treat the two octahedral irrep components symmetrically and cannot disentangle them. The $\starG$ $E$-only and $T_2$-only models, with $2$ trainable parameters each, retain $96.2\%$ and $97.3\%$ cross-selectivity respectively, a direct consequence of restricting the algebra's per-irrep features to a single irrep at training time. This is the central empirical claim of the manuscript: not that $\starG$ wins on raw $R^2$, but that the algebra exposes a representation-theoretic structure that no equivariant neural network we are aware of provides.} \end{table} \begin{figure}[!ht] \centering \includegraphics[width=\textwidth]{pareto_gap.pdf} \caption{\textbf{Parameter-efficiency vs predictive power on QM9 HOMO--LUMO gap.} (\textbf{a}) Pooled test $R^2$ vs trainable parameters (3 seeds, error bars are $\pm$std). MACE occupies the upper-right ($R^2 = 0.985$ at $945{,}168$ parameters); $\starG$-SVD~+~Ridge occupies the upper-left at $144$ parameters ($R^2 = 0.482$, parameter efficiency $\sim$$6{,}600\times$ better than MACE). MLP-augmented sits at the bottom ($R^2 \approx 0.02$), illustrating the structural collapse of orbit-augmented learning at full QM9 scale. (\textbf{b}) Within-isomer test $R^2$ on the same axes (formulas with $\geq 5$ constitutional isomers, sample-weighted mean over 3 seeds). Every method that consumes the molecule-level $(n_{\mathrm{feat}}, |G|)$ summary collapses into the band $R^2_{\mathrm{within}} \in [-0.17, +0.10]$, demonstrating that the panel-(\textbf{a}) gap among those methods is almost entirely a size-prediction signal. SchNet ($R^2_{\mathrm{within}} \approx 0.991$) and MACE ($R^2_{\mathrm{within}} \approx 0.968$, seed 0) both sit ${\sim}\,1$ unit above every molecule-level-summary method on the $y$-axis, making the input-information gap visible at a glance.} \label{fig:pareto} \end{figure} \begin{figure}[!ht] \centering \includegraphics[width=0.85\textwidth]{fig11_paradigm.png} \caption{\textbf{Paradigm comparison.} \textit{Left (ENN paradigm):} each symmetry requires a bespoke architecture; combining symmetries requires redesigning from scratch. \textit{Right ($\starG$ paradigm):} the same algebra handles any group; composing symmetries requires only specifying $G_1 \times G_2$ in the Fourier transform.} \label{fig:paradigm} \end{figure} \begin{table}[h] \centering \caption{Structural comparison: ENNs vs.\ $\starG$ algebra.} \label{tab:comparison} \begin{tabular}{@{}p{3cm}p{5.5cm}p{5.5cm}@{}} \toprule \textbf{Feature} & \textbf{ENNs} & \textbf{$\starG$ Algebra} \\ \midrule Symmetry & Architectural constraint & Algebraic property \\ Composition & Requires redesign & $G_1 \times G_2$ (Thm.~\ref{thm:product_ring}) \\ Optimality & None & Eckart--Young (Thm.~\ref{thm:eckart_young}) \\ Invariance & Approximate ($10^{-4}$--$10^{-2}$) & Exact ($10^{-31}$) \\ Small data & Overfits & Generalizes \\ \bottomrule \end{tabular} \end{table} \section{Discussion} The central finding of this work is that the $\starG$ tensor algebra, when applied over a physically meaningful group (the octahedral subgroup of SO(3)), recovers the angular momentum selection rules of the Wigner--Eckart theorem directly from molecular geometry data. The T$_1$/A$_1$ predictive power ratio separates vector observables ($\sim$2.8) from scalar observables ($\sim$0.5) by a factor of five, and the isotropic polarizability's near-zero T$_1$ dependence confirms the representation-theoretic absence of $l\!=\!1$ content in symmetric rank-2 tensors. These patterns emerge without any quantum-mechanical input, demonstrating that the $\starG$ framework functions as a \emph{spectroscope for physical symmetry}: it decomposes empirical predictions into irreducible representation channels that reveal the underlying mathematical structure of nature. This discovery capability rests on four theoretical pillars validated by our experiments: (i)~the Peter--Weyl spectral decomposition of the convolution tensor, which expresses group structure through the sparse core tensor $\calC$ (equation (\ref{eq:spectral})); (ii)~the Eckart--Young optimality guarantee for the $\starG$-SVD (Theorem~\ref{thm:eckart_young}), the first such result for symmetry-preserving tensor approximation; (iii)~the product group ring isomorphism (Theorem~\ref{thm:product_ring}), which composes multiple symmetries seamlessly (\Rsq{} $= 1.000$ for $\bbZ_6 \times \bbZ_4$ vs.\ $\leq 0.23$ for single factors); and (iv)~data-driven group and factorization discovery. The practical implications are equally significant. In the data-scarce regime that characterizes scientific applications, $\starG$-SVD with ridge regression outperforms neural baselines with $49\times$ more parameters on real QM9 data (Table~\ref{tab:qm9}). The success of simple linear methods in the $\starG$ representation suggests that when data symmetry is properly captured algebraically, the remaining structure is essentially linear, with immediate implications for interpretability, computational efficiency, and deployment in resource-constrained scientific settings. \subsection*{Broader Impact} The framework opens a path from Eckart--Young (optimal low-rank tensor approximation preserving group structure) to Wigner--Eckart (angular momentum selection rules for physical observables) through a single algebraic construction, closing a circle between two theorems that, fittingly, share a common author in Carl Eckart. By changing only the group $G$, the same machinery that achieves state-of-the-art molecular property prediction can decompose physical observables by angular momentum channel, discover product group structure, or identify the best-fitting symmetry from a candidate set. We anticipate applications in materials science (crystallographic point groups), particle physics (gauge symmetries), and drug discovery (molecular chirality), where the ability to simultaneously predict properties and reveal their symmetry content could accelerate scientific understanding. \subsection*{Formal Verification} All core algebraic results in this paper have been machine-verified in the Lean~4 proof assistant~\citep{demoura2021lean4} using the Mathlib library~\citep{mathlib2020}. The formalization comprises 600 lines of Lean~4 across six modules, with zero unresolved proof obligations (\texttt{sorry}) and five axioms (standard textbook results not yet available in Mathlib). Every theorem is fully verified: Eckart--Young optimality (Theorem~\ref{thm:eckart_young}), product-group composition (Theorem~\ref{thm:product_ring}), $\starG$ associativity, identity, distributivity, transpose reversal, left and right equivariance, Frobenius norm and DC component invariance, and all three Wigner--Eckart selection rules. Full details are provided in SI~Section~\ref{sec:lean}. \subsection*{Limitations and Future Directions} The current framework handles finite groups. Extension to continuous group structures is a separate problem we do not address here. For completeness we note that related Eckart--Young-type results for the $\starM$ (tubal) algebra, a special case of our framework with $G = \bbZ_n$, have been obtained independently by Mor~\citep{mor2025quasitubaltensoralgebraseparable,mor2026sufficientnecessaryconditionseckartyoung}. The octahedral group, while physically meaningful, captures only the cubic subgroup of the full rotation group; icosahedral ($|G|=60$) or larger polyhedral groups would provide finer angular resolution. The Lean~4 formalization axiomatizes five standard results from linear algebra and harmonic analysis that are not yet in Mathlib; closing these axioms is a contribution to the Mathlib library rather than to the mathematics of this paper. Predicting full tensor-valued properties (not just scalar invariants) would enable complete Wigner--Eckart decomposition including $l\!=\!2$ selection rules for the quadrupole moment. More broadly, the $\starG$ framework inverts the conventional relationship between data and mathematics. Rather than destroying the geometry of tensorial data to fit the algebra of vectors, we adapt the algebra to the geometry of the data. The practical consequence is stark: on real molecular data, 107 algebraic parameters outperform 5{,}249 neural network parameters. One does not need big data if one has deep algebra. \section{Methods} \subsection{Algorithmic Overview} \label{sec:methods_overview} The full $\starG$ pipeline used in every experiment of this paper consists of four stages: (i)~\emph{group selection}, in which a finite group $G$ is fixed and its convolution tensor $\calT_G$ together with the generalized Fourier matrix $F_G$ are precomputed once and cached; (ii)~\emph{tensorial featurization}, in which each input molecule is mapped to a tensor $\calX \in \bbR^{n_f \times |G|}$ (or $\bbR^{n_f \times |G_1| \times \cdots \times |G_d|}$ for product groups) by sampling a measurement basis at every group element; (iii)~\emph{algebraic decomposition}, in which group-invariant features are extracted from $\calX$ via the generalized Fourier transform and the $\starG$-SVD; and (iv)~\emph{prediction}, in which a downstream regressor (ridge regression for the linear pipeline; an MLP or a Neural-$\starG$ network for the neural pipelines) maps these features to the target property. For the Wigner--Eckart experiment a fifth stage replaces the global generalized Fourier power with a per-irrep decomposition (Section~\ref{sec:wigner_eckart}). The end-to-end procedure, the $\starG$ product computation, the $\starG$-SVD, the invariant feature extractor, the per-irrep decomposition, and the symmetry- and factorization-discovery search are written as explicit algorithm blocks in the Supplementary Information. Reference implementations are provided in MATLAB (\texttt{core/StarGAlgebra.m}, \texttt{core/extractStarGFeatures.m}, \texttt{core/NeuralStarGFramework.m}) and in Python (\texttt{python/StarGAlgebra.py}); the numerical results in this paper were generated by the MATLAB scripts under \texttt{experiments/}, with the Python implementation passing the same regression-test suite. \subsection{Feature Construction} \label{sec:methods_featurizer} For the single-group and product-group experiments, molecular features are inner products with a rotating measurement basis at angles $\theta_g = 2\pi g/n$. For the product group, axial features use periodic $z$-embeddings and coupled features are angular$\times$axial products; the two actions commute because $z$-rotation modifies $(x,y)$ not $z$, while $z$-translation modifies $z$ not $(x,y)$. For the Wigner--Eckart experiment, features are computed under the 24 rotations of the chiral octahedral group $O$ (6 face rotations at $90^\circ/180^\circ/270^\circ$ about coordinate axes, 8 vertex rotations at $120^\circ/240^\circ$ about body diagonals, 6 edge rotations at $180^\circ$ about edge midpoints, plus identity). This group is a subgroup of SO(3) whose irreps (A$_1$, A$_2$, E, T$_1$, T$_2$) correspond to angular momentum channels $l = 0, 0, 2, 1, 2$. \subsection{Dipole Vector Computation} The QM9 .xyz files include Mulliken partial charges $q_i$ as a fifth column. The dipole vector is computed as $\boldsymbol{\mu} = \sum_i q_i \mathbf{r}_i$, yielding components $\mu_x, \mu_y, \mu_z$ that transform as a rank-1 tensor (vector) under rotation. This provides ground-truth targets with known transformation properties for the Wigner--Eckart test. \subsection{Per-Irrep Fourier Decomposition} For each irrep $\rho$ of dimension $d_\rho$ with representation matrices $\{\rho(g)\}_{g \in G}$, the Fourier transform of feature row $j$ is $\hat{X}_j^\rho = \sqrt{d_\rho/|G|} \sum_{g} X(j,g) \, \rho(g)$, a $d_\rho \times d_\rho$ matrix. The per-irrep power is $\|\hat{X}_j^\rho\|_F^2$. Per-irrep features (one power value per feature row per irrep) are used independently as predictors for each quantum property via ridge regression, yielding per-irrep \Rsq{} values. Pseudocode is given in Algorithm~\ref{alg:per_irrep}. \subsection{Invariant Feature Extraction} \label{sec:invariant_features} Given a sample tensor $\calX \in \bbR^{n_f \times |G|}$, the invariant feature vector concatenates seven complementary descriptors, all of which are exactly invariant under the left action of $G$ (proofs in the SI Equivariance Proofs section): (a)~the DC component $\bar{\calX}_j = \tfrac{1}{|G|} \sum_g \calX(j,g)$ for each feature row $j$; (b)~the AC energy $\sigma_j = \mathrm{std}_g\,\calX(j,g)$; (c)~the total per-frequency power $\sum_j |\hat{\calX}(j,k)|^2$ for each generalized Fourier bin $k$, where $\hat{\calX} = \calX F_G$ is the generalized Fourier transform of $\calX$ along the group dimension; (d)~per-row generalized Fourier power $|\hat{\calX}(j,k)|^2$ for the first $K = 14$ equivariant rows; (e)~the singular tube norms $\|\mathbf{s}_i\|_F$ for $i = 1, \ldots, \min(p, q)$ obtained from the $\starG$-SVD of a reshaped $p \times q \times |G|$ tensor (the reshape $(p, q)$ is chosen to maximize $\min(p, q)$); (f)~the rows of $\calX$ identified as invariant by row-variance (constant under the group action); and (g)~four spectral statistics of the unfolded matrix (nuclear norm, spectral norm, condition number, and entropy of the singular-value distribution). Features are $z$-score normalized using statistics computed from training data, and an unregularized intercept column is appended. Pseudocode is given in Algorithm~\ref{alg:features}. \subsection{Ridge Regression and Rank Selection} The downstream linear model is ridge regression with hyperparameter $\lambda$ selected from the geometric grid $\{10^{-3}, 10^{-2}, \ldots, 10^{2}, 10^{3}\}$ by validation MSE. The intercept is unregularized. The number of singular tubes retained in the $\starG$-SVD feature block is set to $\min(p, q)$ where $(p, q)$ is the optimal rectangular reshape of $n_f$ (Section~\ref{sec:invariant_features}); no further truncation is applied because the Eckart--Young theorem guarantees a closed-form bound on the truncation error (Theorem~\ref{thm:eckart_young}). Total parameter count for $\starG$-SVD + Ridge is $1 + d_{\mathrm{feat}}$, where $d_{\mathrm{feat}}$ is the number of non-degenerate feature columns retained after $z$-score normalization (107 on QM9 and 186 on the product-group task). \subsection{Baseline Architectures and Training} \label{sec:baselines} All four neural baselines share the same hidden width $[64, 32]$, ReLU activations on hidden layers, a linear output, He initialization, full-batch validation, and the Adam optimizer ($\beta_1 = 0.9, \beta_2 = 0.999, \varepsilon = 10^{-8}$) with early stopping on validation MSE (patience 20). They differ in three respects: (i)~the input representation $X_{\mathrm{in}}$, (ii)~the training-set construction, and (iii)~the parameter count. Table~\ref{tab:baselines_main} (Methods) summarizes these differences; the complete per-experiment hyperparameter sheet is given in SI Table~3. \textbf{Standard MLP.} Input is the un-rotated raw feature vector $X_{\mathrm{in}} = \calX(:, e) \in \bbR^{n_f}$ (frontal slice at the identity), $z$-score normalized using training-set statistics. The model is $[n_f \to 64 \to 32 \to 1]$. No symmetry information is used during training. Trained for up to 300 epochs (synthetic) or 300 epochs (QM9 / product-group) at learning rate $0.003$, batch size 32 (synthetic) or 256 (QM9). This is the ``no-symmetry'' control. \textbf{Invariant MLP.} Input is the concatenation of four hand-crafted group-invariant pooling statistics applied along the group dimension: $X_{\mathrm{in}} = [\,\mathrm{mean}_g \calX,\ \mathrm{std}_g \calX,\ \min_g \calX,\ \max_g \calX\,] \in \bbR^{4 n_f}$, $z$-score normalized. The model is $[4 n_f \to 64 \to 32 \to 1]$. By construction the input is exactly $G$-invariant so the model is invariant by composition; this is the ``manual invariance'' control. Same optimizer schedule as Standard MLP. \textbf{Augmented MLP.} Input is the un-rotated raw feature vector $X_{\mathrm{in}} = \calX(:, e) \in \bbR^{n_f}$, but the training set is expanded by including $X_{\mathrm{aug}} = \calX(:, g)$ for every $g \in G$, with the same target label, yielding a $|G|$-fold augmented training set. $z$-score statistics are computed on the augmented set. The model is $[n_f \to 64 \to 32 \to 1]$. Test-time prediction uses the un-rotated slice. This is the standard data-augmentation strategy and serves as a proxy for invariance learned from data rather than enforced algebraically; it is the closest non-equivariant baseline to an ENN. Trained for 80--300 epochs at learning rate $0.003$--$0.005$, batch size 32 (synthetic) or 256 (QM9 / product-group); the lower epoch count for the synthetic experiment reflects the $|G|$-fold larger effective batch budget. A precise pseudocode specification is given in Algorithm~\ref{alg:augmented_mlp}. \textbf{Neural $\starG$.} A symmetry-aware feed-forward network whose linear layers are $\starG$ products with weight tensors $W^{(\ell)} \in \bbR^{n_{\ell+1} \times n_\ell \times |G|}$ rather than ordinary matrix products. Forward pass: $\calA^{(\ell+1)} = \mathrm{ReLU}\bigl(W^{(\ell)} \starG \calA^{(\ell)} + \mathbf{b}^{(\ell)}\bigr)$ for hidden layers, with a linear output and an invariant pooling $y = \mathrm{mean}_{g, j} \calA^{(L)}(j, g)$. The hidden widths are $[64, 32]$, matching the MLPs. Input is the $\starG$-feature vector $X_{\mathrm{in}}$ from Section~\ref{sec:invariant_features}. Trained for 300 epochs at learning rate $0.003$, batch size 256, with the same Adam settings and early stopping. Equivariance is exact by construction (the per-layer rotation variance is $\sim 10^{-28}$, at floating-point noise) so this baseline isolates the cost of replacing a closed-form ridge regressor with a non-linear trainable model on top of the same algebraic representation. Pseudocode is given in Algorithm~\ref{alg:neural_starg}. \begin{table}[h] \centering \caption{Baseline summary. Hidden $= [64, 32]$, ReLU, Adam, He init, patience 20.} \label{tab:baselines_main} \begin{tabular}{@{}p{2.5cm}p{4.5cm}p{2.6cm}p{1.6cm}p{1.4cm}@{}} \toprule \textbf{Method} & \textbf{Input $X_{\mathrm{in}}$} & \textbf{Train data} & \textbf{Equiv.} & \textbf{Params (QM9)} \\ \midrule Standard MLP & raw slice $\calX(:, e)$, normalized & native ($n$) & approx & 5{,}249 \\ Invariant MLP & $[\mathrm{mean}, \mathrm{std}, \min, \max]_g \calX$ & native ($n$) & exact & 14{,}465 \\ Augmented MLP & raw slice $\calX(:, e)$, normalized & augmented ($|G|\,n$) & approx & 5{,}249 \\ Neural $\starG$ & $\starG$-features (Sec.~\ref{sec:invariant_features}) & native ($n$) & exact & 9{,}025 \\ $\starG$-SVD + Ridge & $\starG$-features (Sec.~\ref{sec:invariant_features}) & native ($n$) & exact & \textbf{107} \\ \bottomrule \end{tabular} \end{table} \subsection{Symmetry and Factorization Discovery} The symmetry-discovery experiment (Section~\ref{sec:discovery}) and the factorization-discovery experiment use the same scoring function: $\mathrm{score}(G) = \alpha \cdot R^2_{\mathrm{val}}(G) + (1 - \alpha) \cdot [1 - \mathrm{rotvar}(G) / \mathrm{rotvar}_{\max}]$ with $\alpha = 0.7$, where $R^2_{\mathrm{val}}(G)$ is the validation $R^2$ achieved by $\starG$-SVD + Ridge with group $G$ and $\mathrm{rotvar}(G)$ is the prediction variance under the candidate group action. The candidate library contains all groups of order $\leq 12$ ($\bbZ_n$, $D_n$, $S_3$, Klein 4, Quaternion 8) plus, for the factorization experiment, every product $\bbZ_a \times \bbZ_b$ with $ab = n_{\mathrm{group}}$. Pseudocode is given in Algorithm~\ref{alg:discovery}. \subsection{Reproducibility} All experiments use 3 random seeds and a 70/15/15 train/validation/test split. The same random seeds (\texttt{seed}, $111 \cdot \texttt{seed}$ and $31 \cdot \texttt{seed}$) drive train/val/test partitioning, weight initialization, and mini-batch shuffling so that runs are reproducible up to floating-point non-determinism. End-to-end runtimes (single seed, 1{,}000 molecules) on a 2024 desktop CPU are reported in SI Table~4 and range from 0.7~s ($\starG$-SVD~+~Ridge) to 4.0~s (Augmented MLP). \section*{Data Availability} The QM9 dataset is publicly available~\citep{ramakrishnan2014qm9}. The main structure file (\texttt{dsgdb9nsd.xyz.tar.bz2}) can be downloaded directly from \url{https://figshare.com/ndownloader/files/3195389}. The full collection, including property labels and auxiliary files, is archived at \url{https://doi.org/10.6084/m9.figshare.c.978904.v5}. \section*{Code Availability} % TODO: Replace the placeholder URL below with the permanent repository URL % before submission. Nature requires that code be deposited in a public % repository (e.g., GitHub, Zenodo, or Code Ocean) with a DOI. Open-source implementations of the $\starG$ algebra in MATLAB and Python, together with scripts to reproduce all experiments and figures, will be made available at \url{https://gitfront.io/r/supermanG/hRrSeL77snCf/tensor-group-sym/} %{https://github.com/[REPO-URL-TBD]/starG-algebra} upon acceptance. The Lean~4 formalization of all algebraic proofs (zero \texttt{sorry}, five axioms) is included in the repository under \texttt{lean/}. \section*{Acknowledgements} We wish to acknowledge Tammy Kolda for early feedback and her foundational contributions to the tensor algebra field, and Tess Smidt, Maurice Weiler and Joe Kileel for engaging discussions. P.H. gratefully acknowledges the IBM internship program, which enabled key components of this study. \section*{Author Contributions} \textbf{Conceptualization:} L.H., S.U. \textbf{Methodology:} P.H., H.A., M.K., D.H., V.K., K.C. \textbf{Formal analysis:} P.H., H.A., L.H., M.K. \textbf{Software:} L.H., S.U., D.H., V.K., K.C. \textbf{Investigation:} L.H., S.U. \textbf{Visualization:} L.H., S.U. \textbf{Writing (original draft):} L.H., S.U., P.H. \textbf{Writing (review and editing):} all authors. \textbf{Supervision:} L.H. \section*{Competing Interests} The authors declare no competing interests. \bibliographystyle{unsrtnat} \bibliography{references} \end{document}