LLM inference optimization
= <LLM> <inference> optimization
{c}
This section discusses techniques for serving <LLMs> with lower latency or higher throughput; one canonical example is sketched below.
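As a concrete illustration of one such technique, the toy sketch below shows single-head attention decoding with and without a KV cache, a standard optimization also covered in the NVIDIA article cited in the bibliography. Without the cache, every decode step re-projects keys and values for the entire prefix; with it, only the newest token is projected. All names, shapes, and weights here are illustrative assumptions, not code from any particular library:

```python
# Minimal sketch (hypothetical shapes/names) of why KV caching lowers per-token
# latency: the uncached path recomputes K/V for the whole prefix at every step,
# while the cached path projects only the newest token.
import numpy as np

d = 16  # head dimension (illustrative)
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

def attend(q, K, V):
    """Scaled dot-product attention for a single query vector."""
    scores = K @ q / np.sqrt(d)            # (seq_len,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V                      # (d,)

def step_no_cache(prefix):
    """Recompute K/V for the whole prefix: work grows with len(prefix) each step."""
    K = prefix @ Wk.T
    V = prefix @ Wv.T
    q = prefix[-1] @ Wq.T
    return attend(q, K, V)

class KVCache:
    """Keep K/V rows from earlier steps so each new token needs O(1) projections."""
    def __init__(self):
        self.K = np.empty((0, d))
        self.V = np.empty((0, d))

    def step(self, x):
        self.K = np.vstack([self.K, x @ Wk.T])
        self.V = np.vstack([self.V, x @ Wv.T])
        return attend(x @ Wq.T, self.K, self.V)

# Both paths produce identical outputs; only the amount of recomputation differs.
tokens = rng.standard_normal((8, d))
cache = KVCache()
for t in range(1, len(tokens) + 1):
    assert np.allclose(cache.step(tokens[t - 1]), step_no_cache(tokens[:t]))
```

The assertion makes the trade-off explicit: caching changes cost, not results, which is why it reduces latency without affecting model outputs (at the price of memory that grows with sequence length).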
Bibliography:
* https://developer.nvidia.com/blog/mastering-llm-techniques-inference-optimization/