
@drxddy commented on Jan 22, 2026

Summary

Add rANS entropy coding on top of quantized weights for 1.3-2x additional lossless compression.

Closes #3043

Motivation

Quantized LLM weights have entropy significantly below their bit-width:

  • 4-bit weights: ~2.17 bits entropy → 1.84x compression potential
  • 8-bit weights: ~4-5 bits entropy → 1.6-2x compression potential

This PR closes that gap with lossless entropy coding, reducing memory bandwidth during inference.
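For a concrete sense of where the numbers above come from, here is a small, self-contained sketch (not part of this PR) that estimates the empirical entropy of synthetic 4-bit quantized weights and the lossless compression ratio it implies. The weight distribution is an assumption chosen for illustration only.

```python
import numpy as np

# Illustrative only: estimate the Shannon entropy of 4-bit quantized weights
# and the lossless compression ratio it implies. The Laplacian weight
# distribution below is an assumption, not taken from this PR.
rng = np.random.default_rng(0)
w = rng.laplace(scale=0.05, size=1_000_000).astype(np.float32)

# 4-bit affine quantization into 16 bins
lo, hi = w.min(), w.max()
q = np.clip(np.round((w - lo) / (hi - lo) * 15), 0, 15).astype(np.uint8)

# Entropy of the quantized symbols, in bits per weight
counts = np.bincount(q, minlength=16)
p = counts / counts.sum()
entropy = -(p[p > 0] * np.log2(p[p > 0])).sum()

print(f"entropy ~ {entropy:.2f} bits/weight")
print(f"compression potential ~ {4 / entropy:.2f}x over plain 4-bit storage")
```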

Implementation

New Primitives

  • EntropyCodedMatmul - Fused per-row decode+GEMV
  • EntropyDecodeAsync - Async GPU decode for prefetching

Metal Kernel

  • entropy_coded.h - Per-row fused decode+dequant+GEMV kernel

Python API

```python
from mlx.nn.layers import EntropyCodedLinear

# Convert a quantized layer
ec_layer = EntropyCodedLinear.from_linear(linear, decode_mode="fused")
```
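For context, a hedged end-to-end sketch of how the converted layer might be used. It assumes `from_linear` accepts a standard `nn.QuantizedLinear` and that the result is a drop-in replacement at call time; neither is spelled out by the snippet above.

```python
import mlx.core as mx
import mlx.nn as nn
from mlx.nn.layers import EntropyCodedLinear

# Assumption: a 4-bit QuantizedLinear is a valid input to from_linear.
linear = nn.Linear(4096, 4096)
qlinear = nn.QuantizedLinear.from_linear(linear, group_size=64, bits=4)
ec_layer = EntropyCodedLinear.from_linear(qlinear, decode_mode="fused")

x = mx.random.normal((1, 4096))
y = ec_layer(x)  # per-row decode + dequantize + GEMV in one fused kernel
print(y.shape)   # (1, 4096)
```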

Decode Modes

| Mode | Memory | Speed | Use case |
|---|---|---|---|
| `fused` | 1.3-2x smaller | 1.1-1.5x overhead | Memory-constrained |
| `cached` | Same as quantized | 1.0x | Speed-critical |
| `gpu_async` | 1.3-2x smaller | ~1.0x | Deep models |
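To make the table concrete, here is a hedged sketch showing one conversion per mode; the layer setup is illustrative and the comments paraphrase the trade-offs above rather than measured behavior.

```python
import mlx.nn as nn
from mlx.nn.layers import EntropyCodedLinear

# Illustrative input layer; any quantized linear would do.
qlinear = nn.QuantizedLinear.from_linear(nn.Linear(4096, 4096), bits=4)

# fused: smallest footprint; weights stay entropy-coded and are decoded
# inside every matmul (memory-constrained setups).
ec_fused = EntropyCodedLinear.from_linear(qlinear, decode_mode="fused")

# cached: decode once up front; memory matches the plain quantized layer
# but there is no per-call decode overhead (speed-critical setups).
ec_cached = EntropyCodedLinear.from_linear(qlinear, decode_mode="cached")

# gpu_async: decode upcoming weights on the GPU while the current layer
# computes, hiding decode latency in deep models.
ec_async = EntropyCodedLinear.from_linear(qlinear, decode_mode="gpu_async")
```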

Testing

  • Build passes (cmake + make)
  • All 235 existing tests pass
  • Benchmarked on M3 Pro with synthetic and real model weights

@angeloskath (Member) commented:

This looks great, but I think it is a bit too niche to be merged into MLX core.

I believe it would be great as a standalone project and a very good showcase of custom C++ extensions for MLX as well (https://ml-explore.github.io/mlx/build/html/dev/extensions.html).
