GPT-3
= GPT-3
{c}
{title2=175 B parameters}
{title2=2020-06}
* Vocabulary size (V): 50,257
* Hidden size (d_model): 12,288
* Context length 2048
* Q V size: (d_head): 128
* Attention heads (h): 96
* FFN inner size (d_ff) 4 × 12,288 = 49,152
* Layers (L): 96