multicache is an absurdly fast, multi-threaded, multi-tiered in-memory cache library for Go. In our benchmarks it outperforms every other Go cache library we measured.
It offers optional persistence with compression, and has been specifically optimized for cloud environments where the process is periodically restarted, such as Kubernetes or Google Cloud Run.
```sh
go get github.com/codeGROOVE-dev/multicache
```
```go
cache := multicache.New[string, int](multicache.Size(10000))
cache.Set("answer", 42)
val, ok := cache.Get("answer")
```

With persistence:
```go
store, _ := localfs.New[string, User]("myapp", "")
cache, _ := multicache.NewTiered(store)
cache.Set(ctx, "user:123", user)      // sync write
cache.SetAsync(ctx, "user:456", user) // async write
```

GetSet deduplicates concurrent loads to prevent thundering herd situations:
```go
user, err := cache.GetSet("user:123", func() (User, error) {
	return db.LoadUser("123")
})
```

Configuration options:

```go
multicache.Size(n)        // max entries (default 16384)
multicache.TTL(time.Hour) // default expiration
```

Tiered mode backs the memory cache with durable storage: reads check memory first; writes go to both.
| Backend | Import |
|---|---|
| Local filesystem | pkg/store/localfs |
| Valkey/Redis | pkg/store/valkey |
| Google Cloud Datastore | pkg/store/datastore |
| Auto-detect (Cloud Run) | pkg/store/cloudrun |
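To make the tiered behavior concrete, here is a minimal sketch of the read/write path. The memCache and durableStore interfaces are hypothetical stand-ins for illustration, not multicache's actual internals:

```go
package tiered

import "context"

// memCache and durableStore are assumed interfaces for this sketch only.
type memCache[K comparable, V any] interface {
	Get(key K) (V, bool)
	Set(key K, value V)
}

type durableStore[K comparable, V any] interface {
	Get(ctx context.Context, key K) (V, bool, error)
	Set(ctx context.Context, key K, value V) error
}

type tiered[K comparable, V any] struct {
	memory memCache[K, V]
	store  durableStore[K, V]
}

// Get checks memory first, then falls back to the durable store,
// backfilling memory on a hit so the next read is served from RAM.
func (t *tiered[K, V]) Get(ctx context.Context, key K) (V, bool) {
	if v, ok := t.memory.Get(key); ok {
		return v, true
	}
	v, ok, err := t.store.Get(ctx, key)
	if err != nil || !ok {
		var zero V
		return zero, false
	}
	t.memory.Set(key, v)
	return v, true
}

// Set writes to both tiers; the write is durable once the store accepts it.
func (t *tiered[K, V]) Set(ctx context.Context, key K, value V) error {
	t.memory.Set(key, value)
	return t.store.Set(ctx, key, value)
}
```

Backfilling memory on a store hit is what makes restarts cheap: after a redeploy, hot keys migrate back into RAM on first access.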
For maximum efficiency, all backends support S2 or Zstd compression via pkg/store/compress.
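Wiring compression in might look like the following; note that compress.Zstd is an assumed constructor name used for illustration, not the documented API:

```go
// Hypothetical: wrap the durable store so values are Zstd-compressed on
// the way to disk. compress.Zstd is an assumed name; see pkg/store/compress
// for the real constructors.
store, _ := localfs.New[string, User]("myapp", "")
cache, _ := multicache.NewTiered(compress.Zstd(store))
```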
multicache has been exhaustively tested for performance using gocachemark.
Where multicache wins:
- Throughput: 954M int gets/sec at 16 threads (2.2X faster than otter). 140M string sets/sec (9X faster than otter).
- Hit rate: Wins 7 of 9 workloads. Highest average across all datasets (+2.9% vs otter, +0.9% vs sieve).
- Latency: 8ns int gets, 10ns string gets, zero allocations (4X lower latency than otter)
Where others win:
- Memory: freelru and otter use less memory per entry (multicache's 73 bytes/item of overhead vs 15 for otter)
- Specific workloads: clock +0.07% on ibm-docker, theine +0.34% on zipf
Much of the credit for the throughput numbers goes to puzpuzpuz/xsync. Highly sharded maps and singleflight groups performed well, but neither beat xsync's lock-free data structures.
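For a sense of what that buys, here is a minimal standalone example of xsync's lock-free map (v3 API):

```go
package main

import (
	"fmt"

	"github.com/puzpuzpuz/xsync/v3"
)

func main() {
	// NewMapOf returns a lock-free, generics-based concurrent map;
	// reads never block, so Get-heavy workloads scale with core count.
	m := xsync.NewMapOf[string, int]()
	m.Store("answer", 42)
	if v, ok := m.Load("answer"); ok {
		fmt.Println(v) // 42
	}
}
```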
Run make benchmark for full results, or see benchmarks/gocachemark_results.md.
multicache uses S3-FIFO, which features three queues: small (new entries), main (promoted entries), and ghost (recently evicted keys). New items enter small; items accessed twice move to main. The ghost queue tracks evicted keys in a bloom filter to fast-track their return.
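That flow can be sketched as follows. This is a simplified, compiling illustration of the three-queue idea, not multicache's implementation (which adds the tweaks listed below):

```go
package s3fifo

const maxFreq = 3 // the paper's cap; multicache raises it to 7 (see below)

type entry[K comparable, V any] struct {
	key  K
	val  V
	freq int // bumped on each access, capped at maxFreq
}

type cache[K comparable, V any] struct {
	small, main []*entry[K, V]     // FIFO queues; slices for clarity
	ghost       map[K]struct{}     // recently evicted keys (a bloom filter in practice)
	index       map[K]*entry[K, V]
	smallCap    int
	mainCap     int
}

func (c *cache[K, V]) get(key K) (V, bool) {
	if e, ok := c.index[key]; ok {
		if e.freq < maxFreq {
			e.freq++
		}
		return e.val, true
	}
	var zero V
	return zero, false
}

func (c *cache[K, V]) set(key K, val V) {
	e := &entry[K, V]{key: key, val: val}
	c.index[key] = e
	if _, seen := c.ghost[key]; seen {
		// Ghost hit: the key was evicted recently, so fast-track it
		// into main instead of making it survive small again.
		delete(c.ghost, key)
		c.main = append(c.main, e)
	} else {
		c.small = append(c.small, e) // new keys start in small
	}
	for len(c.small) > c.smallCap {
		c.evictSmall()
	}
	for len(c.main) > c.mainCap {
		c.evictMain()
	}
}

// evictSmall pops the oldest small entry: promote it if it was accessed
// twice, otherwise evict it and remember the key in ghost.
func (c *cache[K, V]) evictSmall() {
	e := c.small[0]
	c.small = c.small[1:]
	if e.freq >= 2 {
		c.main = append(c.main, e)
		return
	}
	delete(c.index, e.key)
	c.ghost[e.key] = struct{}{}
}

// evictMain gives entries a second chance: decay frequency and reinsert
// until a cold entry (freq == 0) turns up.
func (c *cache[K, V]) evictMain() {
	for {
		e := c.main[0]
		c.main = c.main[1:]
		if e.freq == 0 {
			delete(c.index, e.key)
			return
		}
		e.freq--
		c.main = append(c.main, e)
	}
}
```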
multicache has been hyper-tuned for high performance, and deviates from the original paper in a handful of ways:
- Dynamic sharding - scales to 16×GOMAXPROCS shards; at 32 threads: 21x Get throughput, 6x Set throughput vs single shard
- Tuned small queue - 24.7% vs paper's 10%, chosen via sweep in 0.1% increments to maximize wins across 9 production traces
- Full ghost frequency restoration - returning keys restore 100% of their previous access count; +0.37% zipf, +0.05% meta, +0.04% tencentPhoto, +0.03% wikipedia
- Extended frequency cap - max freq=7 vs paper's 3; +0.9% meta, +0.8% zipf
- Hot item demotion - items that were once hot (freq≥4) are demoted to the small queue instead of being evicted; +0.24% zipf
- Death row buffer - 8-entry buffer per shard holds recently evicted items for instant resurrection; +0.04% meta/tencentPhoto, +0.03% wikipedia, +8% set throughput
- Ghost frequency ring buffer - a fixed-size 256-entry ring replaces map allocations (sketched below); -5.1% string latency, -44.5% memory
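For instance, the ghost frequency ring can be as simple as two parallel fixed-size arrays. The sketch below uses assumed field names for illustration, not multicache's actual layout:

```go
package ghost

// ghostRing remembers the access frequency of recently evicted keys in a
// fixed-size ring, so tracking evictions allocates nothing.
type ghostRing struct {
	hashes [256]uint64 // hashes of recently evicted keys (0 = empty slot)
	freqs  [256]uint8  // each key's frequency at eviction time
	next   int         // oldest slot, overwritten on the next eviction
}

func (g *ghostRing) remember(hash uint64, freq uint8) {
	g.hashes[g.next] = hash
	g.freqs[g.next] = freq
	g.next = (g.next + 1) % len(g.hashes)
}

// recall reports the stored frequency if the key was evicted recently,
// which lets a returning key restore its full previous access count.
func (g *ghostRing) recall(hash uint64) (uint8, bool) {
	for i, h := range g.hashes {
		if h == hash && hash != 0 {
			return g.freqs[i], true
		}
	}
	return 0, false
}
```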
License: Apache 2.0