SCLP: Exponent-First LLM Weight Compression

Series

A weight-compression scheme for LLMs that starts where quantization doesn't look: the exponent. SCLP turns the handful of exponent values a model actually uses into a tiny palette, stores the rare outliers exactly, and runs as a fused decode-GEMV kernel on the GPU. This series builds it from the core idea up to 4-bit mixed precision, imatrix-aware sidecars, and the llama.cpp kernels that make it fast on real hardware.
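To make the palette idea concrete, here is a minimal sketch in plain Python. It is not SCLP's actual format or kernel: the FP16 layout, the 3-bit palette width, the escape-code convention, and the names `to_half`, `fp16_fields`, `encode`, and `decode` are all assumptions chosen for illustration. It only shows the general pattern of replacing the 5-bit exponent field with a short palette index and keeping exponents outside the palette in an exact side table.

```python
import random
import struct
from collections import Counter


def to_half(x):
    """Round a Python float to IEEE 754 half precision (FP16)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fp16_fields(x):
    """Split x, rounded to FP16, into (sign, exponent, mantissa) bit fields."""
    (bits,) = struct.unpack("<H", struct.pack("<e", x))
    return bits >> 15, (bits >> 10) & 0x1F, bits & 0x3FF


def encode(weights, palette_bits=3):
    """Hypothetical encoder: palette index per weight, exact exponents for outliers."""
    fields = [fp16_fields(w) for w in weights]
    counts = Counter(exp for _, exp, _ in fields)
    # Keep the most common exponents; one code is reserved as the escape value.
    palette = [exp for exp, _ in counts.most_common((1 << palette_bits) - 1)]
    index = {exp: i for i, exp in enumerate(palette)}
    escape = len(palette)
    codes, outliers = [], {}
    for pos, (sign, exp, man) in enumerate(fields):
        if exp in index:
            codes.append((sign, index[exp], man))
        else:
            codes.append((sign, escape, man))
            outliers[pos] = exp  # rare exponent, stored exactly by position
    return palette, codes, outliers


def decode(palette, codes, outliers):
    """Rebuild the FP16 values from palette codes and the outlier table."""
    escape = len(palette)
    values = []
    for pos, (sign, code, man) in enumerate(codes):
        exp = outliers[pos] if code == escape else palette[code]
        bits = (sign << 15) | (exp << 10) | man
        values.append(struct.unpack("<e", struct.pack("<H", bits))[0])
    return values


# Synthetic stand-in for a weight tensor: small Gaussian values plus one large outlier.
rng = random.Random(0)
weights = [rng.gauss(0.0, 0.02) for _ in range(4096)] + [8.0]
palette, codes, outliers = encode(weights)
restored = decode(palette, codes, outliers)
print("palette size:", len(palette), "outliers:", len(outliers))
print("lossless vs FP16:", restored == [to_half(w) for w in weights])
```

The round trip is exact with respect to the FP16 values because only the exponent field is re-coded; sign and mantissa bits pass through unchanged. A real implementation would pack the codes into a bitstream and decode them inside the GEMV kernel rather than materializing the weights.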