Throughput
Also known as: requests per second, tokens per second
How many queries or tokens a system processes per unit time; quantization and smaller models typically improve throughput per GPU.
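The arithmetic behind both variants is simple: divide the number of completed requests, or the number of tokens processed, by the wall-clock time of the measurement window. The sketch below assumes a hypothetical `CompletedRequest` record. It counts only output tokens, although some benchmarks also count prompt tokens. It divides by a GPU count to give the per-GPU figure mentioned above.

```python
from dataclasses import dataclass


@dataclass
class CompletedRequest:
    # Hypothetical record: output tokens generated for one finished request.
    output_tokens: int


def throughput(requests: list[CompletedRequest], elapsed_s: float) -> tuple[float, float]:
    """Return (requests per second, tokens per second) over a wall-clock window.

    Assumption: only output tokens are counted; prompt tokens are ignored.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    total_tokens = sum(r.output_tokens for r in requests)
    return len(requests) / elapsed_s, total_tokens / elapsed_s


def per_gpu(value: float, num_gpus: int) -> float:
    """Normalize a throughput figure by the number of GPUs serving the load."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    return value / num_gpus


# Four requests finishing within a 2-second window on 2 GPUs.
reqs = [CompletedRequest(120), CompletedRequest(80), CompletedRequest(200), CompletedRequest(100)]
rps, tps = throughput(reqs, elapsed_s=2.0)
print(rps, tps)          # 2.0 250.0
print(per_gpu(tps, 2))   # 125.0
```

Because requests run concurrently, the elapsed time is the length of the whole window. It is not the sum of per-request latencies.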