LLM inference batching
ID: llm-inference-batching
Batching can be used to overcome the fact that single-prompt inference is typically heavily memory bound, see also: Section "Theoretical peak performance of GPT inference". The model weights have to be streamed from memory for each forward pass regardless of how many prompts are in flight, so processing several prompts per pass amortizes that memory traffic, increases GPU compute utilization, and brings compute and memory bandwidth closer into balance.
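As a rough illustration (not tied to any particular serving stack), the PyTorch sketch below shows how batching amortizes the weight read of a single linear layer over several prompts; the layer size and batch size are made-up example values.

```python
import torch

# Illustrative dimensions only; real models have many such layers.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32
d_model, batch = 4096, 8

W = torch.randn(d_model, d_model, device=device, dtype=dtype)  # one weight matrix

# Unbatched decoding: each prompt contributes a (1, d_model) activation,
# so the full weight matrix is streamed from memory once per prompt.
xs = [torch.randn(1, d_model, device=device, dtype=dtype) for _ in range(batch)]
ys_unbatched = [x @ W for x in xs]

# Batched decoding: the prompts share a single (batch, d_model) matmul,
# so the same weight read is reused across all of them.
X = torch.cat(xs, dim=0)
Y_batched = X @ W

print(Y_batched.shape)  # (batch, d_model); same math, far fewer weight reads
```

The arithmetic cost grows with the batch size while the weight traffic stays roughly constant, which is why batching moves a memory-bound decode step toward being compute bound.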