Quantization
Also known as: INT8, INT4, weight quantization, post-training quantization
Reducing the numeric precision of model weights (and sometimes activations) to shrink memory footprint and speed up inference. Common schemes include 8-bit (INT8) and 4-bit (INT4) formats, which trade some accuracy for smaller size and faster execution.
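A minimal sketch of symmetric, per-tensor post-training INT8 quantization: a single scale maps floats into the int8 range, and dequantization multiplies back. The function names and the per-tensor scheme here are illustrative, not from any particular library.

```python
import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor quantization: one scale for the whole tensor,
    # chosen so the largest-magnitude weight maps to 127.
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights from int8 values.
    return q.astype(np.float32) * scale

weights = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(weights)
recovered = dequantize(q, scale)
# Rounding bounds the per-weight reconstruction error by scale / 2.
max_err = np.abs(weights - recovered).max()
```

Real deployments refine this basic idea with per-channel or per-group scales, zero points for asymmetric ranges, and calibration data for activation quantization.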