Alexander Kerchum

4 posts
Lessons From the Bottom of the Stack: Shipping a Quant
Shipping a 4-Bit LLM Quant into llama.cpp
Jun 5, 2026 · 12 min read
From 8 Bits to 4: Sidecar, MoE, and the imatrix Trick That Worked
Last time we cut BF16 weights in half by treating the exponent as a 16-entry palette instead of an 8-bit field. SCLP8: 7.9 GB instead of 15.0, perplexity slightly better than the original, token gener
Jun 3, 2026 · 9 min read
LLMs Use Just 16 of 256 Exponents — So We Compressed the Rest Away
2× compression on Llama-3-8B — and perplexity went down.
May 29, 2026 · 9 min read
How to move a directory from one git repo to another (or new) without losing history
Make a copy of the repo: git clone dirtySourceRepo newSourceRepo. OR clone from the actual git repo and prevent pushes: git remote set-url --push origin no_push. Make sure to check out the correct branch before the next step. Cloning from another local directory al...
Nov 24, 2021 · 1 min read