Since a matrix can be seen as a linear map, the product of two matrices can be seen as the composition of two linear maps: (AB)x = A(Bx) for every vector x.
One cool thing about linear functions is that we can pre-calculate this product just once to obtain a new matrix, and from then on apply that single matrix instead of doing both multiplications separately each time.
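As a quick NumPy sketch (the sizes and random data are arbitrary assumptions for illustration), precomputing the product matrix once gives the same result as applying the two maps one after the other:

```python
import numpy as np

n = 1000
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
x = rng.standard_normal(n)

# Precompute the composed map once: one O(n^3) matrix-matrix product.
C = A @ B

# Applying the two maps in sequence: two O(n^2) matrix-vector products...
y_two_steps = A @ (B @ x)
# ...matches applying the precomputed map: one O(n^2) product per vector.
y_one_step = C @ x

assert np.allclose(y_two_steps, y_one_step)
```

The precomputation costs one matrix-matrix product, which pays off as soon as the composed map has to be applied to many vectors.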
No 2x2 examples please. I'm talking about large matrices that would be used in supercomputers.

TODO: application.
TODO: speedup over the algorithm for general matrices.
www.studentclustercompetition.us/ comments:
The HPCG benchmark uses a preconditioned conjugate gradient (PCG) algorithm to measure the performance of HPC platforms with respect to frequently observed but challenging patterns of computing, communication, and memory access. While HPL provides an optimistic performance target for applications, HPCG can be considered as a lower bound on performance. Many of the top 500 supercomputers also provide their HPCG performance as a reference.
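To make the algorithm class concrete, here is a toy Jacobi-preconditioned conjugate gradient in NumPy. This is only a didactic sketch of the method HPCG exercises (the matrix, preconditioner, and sizes are illustrative assumptions), not the benchmark itself, which runs a distributed sparse problem:

```python
import numpy as np

def pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A with Jacobi PCG."""
    n = len(b)
    M_inv = 1.0 / np.diag(A)   # Jacobi preconditioner: inverse of the diagonal
    x = np.zeros(n)
    r = b - A @ x              # residual
    z = M_inv * r              # preconditioned residual
    p = z.copy()               # search direction
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# Usage on a random symmetric positive definite system.
rng = np.random.default_rng(1)
Q = rng.standard_normal((100, 100))
A = Q @ Q.T + 100 * np.eye(100)  # well-conditioned SPD matrix
b = rng.standard_normal(100)
assert np.allclose(A @ pcg(A, b), b, atol=1e-6)
```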
The term GEMM (general matrix-matrix multiplication) comes from BLAS, and has pretty much stuck.
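For illustration, here is what a GEMM call looks like through SciPy's BLAS bindings. The general operation is C = alpha * A @ B + beta * C in a single fused call; the matrix shapes below are arbitrary assumptions:

```python
import numpy as np
from scipy.linalg.blas import dgemm  # double-precision GEMM

rng = np.random.default_rng(2)
A = rng.standard_normal((512, 256))
B = rng.standard_normal((256, 384))
C = rng.standard_normal((512, 384))

# One fused call computes alpha * A @ B + beta * C.
out = dgemm(alpha=2.0, a=A, b=B, beta=3.0, c=C)
assert np.allclose(out, 2.0 * A @ B + 3.0 * C)
```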
DeepMind likes coming up with new, improved algorithms for these more specific cases, e.g. in 2025 it was announced that AlphaEvolve had found a novel algorithm for 4x4 complex-valued matrices that uses 48 multiplications.
A "commutative matrix multiplication algorithm" is a matrix multiplication algorithm that requires the ring to be commutative. Such algorithms are inferior because you cannot use them to create more efficient algorithms for general matrix matrix multiplication by decomposing the bigger matrix into smaller ones.
For example, the Strassen algorithm is based on a reduction to non-commutative 2x2 matrix multiplication, optimized to be done in 7 multiplications rather than the 8 of the naive algorithm.
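Here is a minimal recursive Strassen sketch in NumPy, assuming square matrices whose dimension is a power of two, with an arbitrary cutoff below which it falls back to the classical algorithm. Note that the seven product formulas never swap the order of the factors, which is exactly what allows the recursion on blocks:

```python
import numpy as np

def strassen(A, B, cutoff=64):
    n = A.shape[0]
    if n <= cutoff:
        return A @ B  # fall back to the classical algorithm
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # 7 recursive multiplications instead of the naive 8.
    M1 = strassen(A11 + A22, B11 + B22, cutoff)
    M2 = strassen(A21 + A22, B11, cutoff)
    M3 = strassen(A11, B12 - B22, cutoff)
    M4 = strassen(A22, B21 - B11, cutoff)
    M5 = strassen(A11 + A12, B22, cutoff)
    M6 = strassen(A21 - A11, B11 + B12, cutoff)
    M7 = strassen(A12 - A22, B21 + B22, cutoff)
    C11 = M1 + M4 - M5 + M7
    C12 = M3 + M5
    C21 = M2 + M4
    C22 = M1 - M2 + M3 + M6
    return np.block([[C11, C12], [C21, C22]])

rng = np.random.default_rng(4)
A = rng.standard_normal((256, 256))
B = rng.standard_normal((256, 256))
assert np.allclose(strassen(A, B), A @ B)
```

A real implementation would set the cutoff empirically and call an optimized GEMM below it, since the classical algorithm has better constants at small sizes.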
For 3x3 matrix multiplication, the best known non-commutative algorithms as of 2025 still use 23 multiplications (e.g. Laderman's 1976 algorithm), and beating the Strassen algorithm by recursing on 3x3 blocks would require a non-commutative algorithm with at most 21 multiplications: m multiplications give a recursive algorithm of complexity O(N^(log_3 m)), and log_3 21 ≈ 2.771 < log_2 7 ≈ 2.807 < log_3 22 ≈ 2.814.
