vLLM
Also known as: vLLM inference server
An open-source, high-throughput inference server for transformer models, known for techniques such as PagedAttention for efficient KV-cache memory management; often used in private (self-hosted) serving stacks.
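As an illustration (a minimal sketch, not a production setup; the model name and port below are placeholders), vLLM is commonly run as an OpenAI-compatible HTTP server and queried with the standard chat completions API:

```shell
# Start vLLM's OpenAI-compatible server (model name and port are placeholders;
# requires a GPU and downloaded model weights).
vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000

# Query it with the standard OpenAI chat completions API.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Llama-3.1-8B-Instruct",
       "messages": [{"role": "user", "content": "Hello"}]}'
```

Because the server speaks the OpenAI API, existing OpenAI client libraries can usually be pointed at it by changing only the base URL.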
Contact us if you need a term added for a security or procurement review.