GPT-2 large 2025-08-08
GPT-2 medium 2025-08-08
Llama 3.1 2025-08-08
Llama 2 7B 2025-08-08
GPT-2 variant 2025-08-08
GPT-2 implementation in PyTorch 2025-08-08
GPT-2 implementation 2025-08-08
Llama 2 2025-08-08
llama-cli inference batching 2025-08-08
GPT-4 2025-08-08
GPT-3 2025-08-08
GPT-2 2025-08-08
GPT-1 2025-08-08
The following estimates the number of attention multiplications for a "classic" GPT-2-style model.
For each layer (L):
- for each attention head (h):
- K = d_model * d_head (takes embedding of one token and converts to vector of length d_head)
- Q = d_model * d_head (same)
- K Q dot product for attention pattern: n_ctx * d_head (n_ctx dot products of vectors of size d_head: the new token's Q against every K. The older Qs against the new K are zeroed out by causality.)
- new value vector for new token: d_model * d_model
- new updates: n_ctx * d_model (multiply each of the n_ctx value vectors by its scalar from the new token's attention weights)
- fully connected: d_model * d_ff + d_ff * d_model (converts the embedding to the hidden layer size and then back)
So the total sum is:

L * (
    h * (
        2 * d_model * d_head +
        n_ctx * d_head +
        d_model * d_model +
        n_ctx * d_model
    ) +
    2 * d_model * d_ff
)

This is coded at: llm_count_mults.py.
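A minimal sketch of that count in Python, for concreteness (the function and parameter names here are made up for illustration, not necessarily those of llm_count_mults.py):

def count_mults(L, h, d_model, d_head, d_ff, n_ctx):
    # Multiplications to generate one new token, following the formula above.
    per_head = (
        2 * d_model * d_head   # K and Q projections of the new token
        + n_ctx * d_head       # new token's attention scores against every position
        + d_model * d_model    # new token's value vector
        + n_ctx * d_model      # weighted sum of value vectors
    )
    return L * (h * per_head + 2 * d_model * d_ff)

# Example with GPT-2 small-like parameters:
# 12 layers, 12 heads, d_model=768, d_head=64, d_ff=4*768, full 1024-token context.
print(count_mults(L=12, h=12, d_model=768, d_head=64, d_ff=4 * 768, n_ctx=1024))

Under these assumptions this prints 278,396,928, i.e. roughly 2.8e8 multiplications per generated token.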
rpi-pico-w/upython/thermistor_fan_control.py 2025-08-08
This example attempts to keep the temperature at a fixed point by turning on a fan when a thermistor gets too hot.
If you are not in a place that is too hot, you can easily test it by holding the thermistor with your finger to turn the fan on.
In Ciro's ASCII art circuit diagram notation:
            +----------FAN-----------+
            |                        |
            |                        |
RPI_PICO_W__gnd__gpio26Adc__3.3V@36__gpio2
            |    |          |
            |    |          |
            |    |          |
            |    +-THERMISTOR
            |    |
            |    |
            R_10-+
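A minimal MicroPython sketch of the control loop under the wiring above, assuming an NTC thermistor (so the ADC reading rises as it heats up) and an arbitrary threshold value; the actual thermistor_fan_control.py may differ, e.g. by converting readings to degrees:

import time
from machine import ADC, Pin

adc = ADC(26)          # thermistor/R_10 divider midpoint on GPIO26 (ADC0)
fan = Pin(2, Pin.OUT)  # fan switched from GPIO2

THRESHOLD = 30000      # arbitrary raw ADC value, tune for your thermistor

while True:
    # With an NTC thermistor between 3.3V and the ADC pin, its resistance
    # drops as it heats, so the divider voltage and the ADC reading rise.
    if adc.read_u16() > THRESHOLD:
        fan.value(1)
    else:
        fan.value(0)
    time.sleep(0.5)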