GPT-3 (source code)

= GPT-3
{c}
{title2=175 B parameters}
{title2=2020-06}

* Vocabulary size (V): 50,257
* Hidden size (d_model): 12,288
* Context length	2048
* Q V size: (d_head): 128
* Attention heads	(h): 96
* FFN inner size (d_ff)	4 × 12,288 = 49,152
* Layers (L): 96