GPT-2 large 2025-08-08
GPT-2 medium 2025-08-08
Llama 3.1 2025-08-08
Llama 2 7B 2025-08-08
GPT-2 variant 2025-08-08
GPT-2 implementation in PyTorch 2025-08-08
GPT-2 implementation 2025-08-08
Llama 2 2025-08-08
llama-cli inference batching 2025-08-08
GPT-4 2025-08-08
GPT-3 2025-08-08
GPT-2 2025-08-08
GPT-1 2025-08-08
The following estimates the number of attention multiplications for a "classic" GPT-2-style model.
For each layer (L):
- for each attention head (h):
- K = d_model * d_head (takes embedding of one token and converts to vector of length d_head)
- Q = d_model * d_head (same)
- K Q dot product for attention pattern: n_ctx * d_head (n_ctx dot products of vectors of size d_head: the new token's Q against every K. The older Qs against the new K are zeroed out by causality.)
- new value vector for new token: d_model * d_model
- new updates: n_ctx * d_model (multiply each of the n_ctx value vectors by its scalar from the new token's attention weights)
- fully connected: d_model * d_ff + d_ff * d_model (converts the embedding to the hidden layer size and then back)
So the total sum is:

L * (
    h * (
        2 * d_model * d_head +
        n_ctx * d_head +
        d_model * d_model +
        n_ctx * d_model
    ) +
    2 * d_model * d_ff
)

This is coded at: llm_count_mults.py.
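A minimal sketch of that count in Python, for concreteness (the function and parameter names here are made up for illustration, not necessarily those of llm_count_mults.py):

def count_mults(L, h, d_model, d_head, d_ff, n_ctx):
    # Multiplications to generate one new token, following the formula above.
    per_head = (
        2 * d_model * d_head   # K and Q projections of the new token
        + n_ctx * d_head       # new token's attention scores against every position
        + d_model * d_model    # new token's value vector
        + n_ctx * d_model      # weighted sum of value vectors
    )
    return L * (h * per_head + 2 * d_model * d_ff)

# Example with GPT-2 small-like parameters:
# 12 layers, 12 heads, d_model=768, d_head=64, d_ff=4*768, full 1024-token context.
print(count_mults(L=12, h=12, d_model=768, d_head=64, d_ff=4 * 768, n_ctx=1024))

Under these assumptions this prints 278,396,928, i.e. roughly 2.8e8 multiplications per generated token.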
rpi-pico-w/upython/thermistor_fan_control.py 2025-08-08
This example attempts to keep the temperature at a fixed point by turning on a fan when a thermistor gets too hot.
If you are not in a place that is too hot, you can easily test it by holding the thermistor with your finger to turn the fan on.
In Ciro's ASCII art circuit diagram notation:
            +----------FAN-----------+
            |                        |
            |                        |
RPI_PICO_W__gnd__gpio26Adc__3.3V@36__gpio2
            |    |          |
            |    |          |
            |    |          |
            |    +-THERMISTOR
            |    |
            |    |
            R_10-+
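A minimal MicroPython sketch of the control loop under the wiring above, assuming an NTC thermistor (so the ADC reading rises as it heats up) and an arbitrary threshold value; the actual thermistor_fan_control.py may differ, e.g. by converting readings to degrees:

import time
from machine import ADC, Pin

adc = ADC(26)          # thermistor/R_10 divider midpoint on GPIO26 (ADC0)
fan = Pin(2, Pin.OUT)  # fan switched from GPIO2

THRESHOLD = 30000      # arbitrary raw ADC value, tune for your thermistor

while True:
    # With an NTC thermistor between 3.3V and the ADC pin, its resistance
    # drops as it heats, so the divider voltage and the ADC reading rise.
    if adc.read_u16() > THRESHOLD:
        fan.value(1)
    else:
        fan.value(0)
    time.sleep(0.5)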