= GPT-2
{c}
{title2=124 M parameters}
{title2=2019-11-05}
* Vocabulary size (V): 50,257
* Hidden size (d_model): 768
* Context length (n_ctx): 1024
* Q, K, V size per head (d_head): 64
* Attention heads (h): 12
* FFN inner size (d_ff): 3072
* Layers (L): 12
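
As a sanity check, the hyperparameters above can be tied back to the 124 M parameter figure. The following is a rough sketch and not code from any official source. It assumes the usual GPT-2 layout: learned position embeddings, biases on every linear layer, two layer norms per block plus one final layer norm, and an output projection that shares its weights with the token embedding. The names `count_params`, `D_HEAD` and `TOTAL` are made up for this illustration.

```python
# Hyperparameters copied from the list above.
V = 50257       # vocabulary size
D_MODEL = 768   # hidden size
N_CTX = 1024    # context length
N_HEAD = 12     # attention heads
D_FF = 3072     # FFN inner size
N_LAYER = 12    # layers

# Per-head Q/K/V size follows from the hidden size and the head count.
D_HEAD = D_MODEL // N_HEAD


def count_params(v=V, d=D_MODEL, n_ctx=N_CTX, d_ff=D_FF, n_layer=N_LAYER):
    """Count trainable parameters under the assumptions stated in the text."""
    # Token embedding plus learned position embedding.
    # Assumption: the output head reuses the token embedding, so it adds nothing.
    embeddings = v * d + n_ctx * d
    # One layer norm has a scale and a shift vector.
    layer_norm = 2 * d
    # Fused Q, K, V projection (d -> 3d) and attention output projection (d -> d),
    # each with a bias.
    attention = (d * 3 * d + 3 * d) + (d * d + d)
    # Two-layer feed-forward network (d -> d_ff -> d), each layer with a bias.
    ffn = (d * d_ff + d_ff) + (d_ff * d + d)
    block = layer_norm + attention + layer_norm + ffn
    # All blocks, plus the final layer norm after the last block.
    return embeddings + n_layer * block + layer_norm


TOTAL = count_params()
print(D_HEAD)  # 64
print(TOTAL)   # 124439808
```

Under these assumptions the count comes to 124,439,808, which matches the "124 M parameters" label, and d_head = d_model / h = 768 / 12 = 64 matches the per-head size listed above.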