
multicache

multicache is an absurdly fast multi-threaded, multi-tiered in-memory cache library for Go -- in our benchmarks it is faster than every other option available for the language.

It offers optional persistence with compression, and has been specifically optimized for cloud compute environments where the process is periodically restarted, such as Kubernetes or Google Cloud Run.

Install

go get github.com/codeGROOVE-dev/multicache

Use

cache := multicache.New[string, int](multicache.Size(10000))
cache.Set("answer", 42)
val, ok := cache.Get("answer")

With persistence:

store, _ := localfs.New[string, User]("myapp", "")
cache, _ := multicache.NewTiered(store)

cache.Set(ctx, "user:123", user)           // sync write
cache.SetAsync(ctx, "user:456", user)      // async write

GetSet deduplicates concurrent loads of the same key, preventing thundering herds:

user, err := cache.GetSet("user:123", func() (User, error) {
    return db.LoadUser("123")
})
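
For example, a burst of concurrent requests for a cold key triggers only one load (a sketch; sync is imported, and db.LoadUser stands in for your own loader):

var wg sync.WaitGroup
for i := 0; i < 100; i++ {
    wg.Add(1)
    go func() {
        defer wg.Done()
        // All 100 goroutines share a single in-flight load; the rest
        // block briefly and receive the result of that one call.
        user, err := cache.GetSet("user:123", func() (User, error) {
            return db.LoadUser("123")
        })
        _, _ = user, err
    }()
}
wg.Wait()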

Options

multicache.Size(n)           // max entries (default 16384)
multicache.TTL(time.Hour)    // default expiration
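
Options can be combined when constructing a cache; for example:

// A bounded cache whose entries expire an hour after being written.
cache := multicache.New[string, string](
    multicache.Size(10000),
    multicache.TTL(time.Hour),
)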

Persistence

Memory cache backed by durable storage. Reads check memory first; writes go to both.

Backend                     Import
Local filesystem            pkg/store/localfs
Valkey/Redis                pkg/store/valkey
Google Cloud Datastore      pkg/store/datastore
Auto-detect (Cloud Run)     pkg/store/cloudrun

For maximum efficiency, all backends support S2 or Zstd compression via pkg/store/compress.
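
A sketch of wiring a compressed backend into a tiered cache; compress.Zstd is a hypothetical wrapper name, so check pkg/store/compress for the actual API:

// Hypothetical wrapper: compress values with Zstd on their way to
// the filesystem store. The real constructor in pkg/store/compress
// may be named differently.
store, _ := localfs.New[string, User]("myapp", "")
cache, _ := multicache.NewTiered(compress.Zstd(store))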

Performance

multicache has been exhaustively tested for performance using gocachemark.

Where multicache wins:

  • Throughput: 954M int gets/sec at 16 threads (2.2X faster than otter). 140M string sets/sec (9X faster than otter).
  • Hit rate: Wins 7 of 9 workloads. Highest average across all datasets (+2.9% vs otter, +0.9% vs sieve).
  • Latency: 8ns int gets, 10ns string gets, zero allocations (4X lower latency than otter).

Where others win:

  • Memory: freelru and otter use less memory per entry (multicache carries 73 bytes/item of overhead vs 15 for otter)
  • Specific workloads: clock +0.07% on ibm-docker, theine +0.34% on zipf

Much of the credit for the high throughput goes to puzpuzpuz/xsync. While highly sharded maps and flight groups performed well, you can't beat xsync's lock-free data structures.

Run make benchmark for full results, or see benchmarks/gocachemark_results.md.

Algorithm

multicache uses S3-FIFO, which maintains three queues: small (new entries), main (promoted entries), and ghost (recently evicted keys). New items enter small; items accessed twice are promoted to main. The ghost queue tracks evicted keys in a Bloom filter so that a returning key is fast-tracked straight into main.
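
A minimal single-threaded sketch of that flow (illustrative only: the real implementation is sharded and lock-free, and a plain map stands in for the Bloom filter):

type entry struct {
    key  string
    val  int
    freq int // bumped on each Get hit (Get itself is omitted here)
}

type s3fifo struct {
    small, main []entry
    ghost       map[string]bool // stand-in for the ghost Bloom filter
    cap         int
}

func (c *s3fifo) set(key string, val int) {
    if len(c.small)+len(c.main) >= c.cap {
        c.evictOne()
    }
    if c.ghost[key] { // recently evicted key: fast-track into main
        delete(c.ghost, key)
        c.main = append(c.main, entry{key: key, val: val})
        return
    }
    c.small = append(c.small, entry{key: key, val: val}) // new keys start small
}

func (c *s3fifo) evictOne() {
    if len(c.small) == 0 {
        if len(c.main) > 0 {
            c.main = c.main[1:] // small is empty: FIFO-evict from main
        }
        return
    }
    e := c.small[0]
    c.small = c.small[1:]
    if e.freq >= 2 { // accessed twice while in small: promote
        c.main = append(c.main, e)
        return
    }
    c.ghost[e.key] = true // evict the value, remember only the key
}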

multicache has been hyper-tuned for high performance, and deviates from the original paper in a handful of ways:

  • Dynamic sharding - scales to 16×GOMAXPROCS shards; at 32 threads: 21x Get throughput, 6x Set throughput vs single shard
  • Tuned small queue - 24.7% vs paper's 10%, chosen via sweep in 0.1% increments to maximize wins across 9 production traces
  • Full ghost frequency restoration - returning keys restore 100% of their previous access count; +0.37% zipf, +0.05% meta, +0.04% tencentPhoto, +0.03% wikipedia
  • Extended frequency cap - max freq=7 vs paper's 3; +0.9% meta, +0.8% zipf
  • Hot item demotion - items that were once hot (freq≥4) get demoted to small queue instead of evicted; +0.24% zipf
  • Death row buffer - 8-entry buffer per shard holds recently evicted items for instant resurrection (see the sketch after this list); +0.04% meta/tencentPhoto, +0.03% wikipedia, +8% set throughput
  • Ghost frequency ring buffer - fixed-size 256-entry ring replaces map allocations; -5.1% string latency, -44.5% memory
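
To illustrate the death row idea (reusing the entry type from the sketch above; the names are stand-ins, not multicache's internals):

// A tiny fixed ring of recently evicted entries. A Set or Get that
// arrives just after eviction can pull the entry back without
// reallocating or reloading it.
type deathRow struct {
    slots [8]entry
    next  int
}

func (d *deathRow) push(e entry) {
    d.slots[d.next] = e
    d.next = (d.next + 1) % len(d.slots)
}

func (d *deathRow) resurrect(key string) (entry, bool) {
    for i, e := range d.slots {
        if e.key == key {
            d.slots[i] = entry{} // claim the slot
            return e, true
        }
    }
    return entry{}, false
}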

License

Apache 2.0