vLLM
Also known as: vLLM inference server
An open-source, high-throughput inference server for transformer models, known for techniques such as PagedAttention for efficient KV-cache memory management; often used in private (self-hosted) serving stacks.
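As an illustration (a minimal sketch, not a production setup; the model name and port below are placeholders), vLLM is commonly run as an OpenAI-compatible HTTP server and queried with the standard chat completions API:

```shell
# Start vLLM's OpenAI-compatible server (model name and port are placeholders;
# requires a GPU and downloaded model weights).
vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000

# Query it with the standard OpenAI chat completions API.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Llama-3.1-8B-Instruct",
       "messages": [{"role": "user", "content": "Hello"}]}'
```

Because the server speaks the OpenAI API, existing OpenAI client libraries can usually be pointed at it by changing only the base URL.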
Contact us if you need a term added for a security or procurement review.