Throughput

Also known as: requests per second, tokens per second

The number of queries or tokens a system processes per unit time, usually reported as requests per second or tokens per second. Quantization and smaller models typically improve throughput per GPU.
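As a rough illustration of the arithmetic, throughput is a count of work units divided by the elapsed time, optionally normalized by the number of GPUs. The sketch below is a minimal example, not a benchmark harness: the function names `throughput` and `per_gpu` and all of the numbers are hypothetical, chosen only to show the calculation.

```python
def throughput(units_processed: int, elapsed_seconds: float) -> float:
    """Return work units (requests or tokens) processed per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return units_processed / elapsed_seconds


def per_gpu(total_throughput: float, num_gpus: int) -> float:
    """Normalize a throughput figure by the number of GPUs serving it."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    return total_throughput / num_gpus


# Hypothetical measurement: 1,200 requests producing 480,000 tokens
# over a 60-second window on 4 GPUs.
requests_per_second = throughput(1_200, 60.0)
tokens_per_second = throughput(480_000, 60.0)
tokens_per_second_per_gpu = per_gpu(tokens_per_second, 4)

print(f"{requests_per_second:.1f} requests/s")
print(f"{tokens_per_second:.1f} tokens/s")
print(f"{tokens_per_second_per_gpu:.1f} tokens/s per GPU")
```

Reporting both figures can be useful because requests per second depends on how long each response is, while tokens per second does not.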

See also

SLM infrastructure

