Long-form notes for technical and executive readers - strategy, infrastructure, and how we think about enterprise SLMs.
Compression is not a single knob. Here is how distillation, quantization, and pruning interact when you need smaller models without wrecking production metrics.
Read article →
Fine-tuning moves the loss curve, but production SLMs need latency, cost, and governance properties that training alone rarely delivers.
Read article →