Number of multiplications per token in a GPT model

ID: number-of-multiplications-per-token-in-a-gpt-model

The following estimates the number of multiplications per token for a "classic" GPT-2-style model, covering both attention and the MLP.
Summing the per-head attention multiplications and the MLP multiplications for each of the L layers, the total is:
L * (
  h * (
    2 * d_model * d_head +
    n_ctx * d_head +
    d_model * d_model +
    n_ctx * d_model
  ) +
  2 * d_model * d_ff
)
This is implemented in llm_count_mults.py.
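
That script isn't reproduced here, but a minimal sketch of the same arithmetic might look like the following. The hyperparameter values (GPT-2 small: L=12, h=12, d_model=768, d_head=64, d_ff=3072, n_ctx=1024) and the per-term comments are my own illustrative reading, not taken from llm_count_mults.py:

def mults_per_token(L, h, d_model, d_head, d_ff, n_ctx):
    """Estimate multiplications per token using the formula above."""
    per_head = (
        2 * d_model * d_head  # two d_model -> d_head projections (e.g. query and key)
        + n_ctx * d_head      # attention scores against n_ctx positions
        + d_model * d_model   # the per-head d_model x d_model term
        + n_ctx * d_model     # the per-head n_ctx x d_model term
    )
    per_layer = h * per_head + 2 * d_model * d_ff  # all heads plus two MLP matmuls
    return L * per_layer

# Example with GPT-2 small's hyperparameters (assumed here for illustration).
print(mults_per_token(L=12, h=12, d_model=768, d_head=64,
                      d_ff=3072, n_ctx=1024))

With those numbers the formula works out to 278,396,928 multiplications per token, roughly 2.8 * 10^8.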
