A PyTorch-native and Flexible Inference Engine with
Hybrid Cache Acceleration and Parallelism for 🤗DiTs

Baseline    SCM Slow    SCM Fast    SCM Ultra    +compile    +FP8*      +CP2
24.85s      15.4s       11.4s       8.2s        🎉7.1s      🎉4.5s     🎉2.9s

🤗Why Cache-DiT❓

Cache-DiT is built on top of the Diffusers library and now supports nearly 🔥ALL DiTs from Diffusers (🤗70+ DiTs). Please refer to our online documentation at readthedocs.io for more details. The optimizations provided by Cache-DiT include:

  • 🎉Hybrid Cache Acceleration (DBCache, DBPrune, TaylorSeer, SCM and more)
  • 🎉Context Parallelism (w/ Ulysses, Ring, USP, Ulysses Anything, FP8 Comm)
  • 🎉Tensor Parallelism (w/ PyTorch native DTensor and Tensor Parallelism APIs)
  • 🎉Hybrid 2D and 3D Parallelism (Scale up the performance of 💥Large DiTs)
  • 🎉Text Encoder Parallelism (TE-P w/ PyTorch native Tensor Parallelism APIs)
  • 🎉Auto Encoder Parallelism (VAE-P w/ Tile Parallelism; faster and avoids OOM)
  • 🎉ControlNet Parallelism (CN-P w/ Context Parallelism for ControlNet)
  • 🎉Built-in HTTP serving deployment support with simple REST APIs
  • 🎉Natively compatible with Compile, Offloading, Quantization, ... (see the sketch after this list)
  • 🎉Integration into vLLM-Omni, SGLang Diffusion, SD.Next, ...
  • 🎉Natively supports NVIDIA GPUs, Ascend NPUs (>= 1.2.0), ...
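
As noted in the list above, cache acceleration composes with stock PyTorch and Diffusers features such as torch.compile and offloading. Below is a minimal sketch, not an official recipe: the checkpoint is the same placeholder used in the Quick Start, enable_cache is cache-dit's public API, and the compile/offload calls are plain PyTorch/Diffusers rather than cache-dit-specific APIs.

# Hedged sketch: cache-dit + torch.compile (and, optionally, Diffusers offloading).
import torch
import cache_dit
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image", torch_dtype=torch.bfloat16  # placeholder checkpoint
).to("cuda")                                       # assumes a single CUDA device
# pipe.enable_model_cpu_offload()                  # Diffusers offloading, as an alternative to .to("cuda")

cache_dit.enable_cache(pipe)                        # hybrid cache acceleration (default config)
pipe.transformer = torch.compile(pipe.transformer)  # compile the DiT backbone with torch.compile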

🔥Latest News

  • [2026/02] 🎉v1.2.1 release is ready; the major updates include: Ring Attention w/ batched P2P, USP (hybrid Ring and Ulysses), Hybrid 2D and 3D Parallelism (💥USP + TP), and reduced VAE-P communication overhead.
  • [2026/01] 🎉v1.2.0 stable release is ready: new model support (Z-Image, FLUX.2, LTX-2, etc.), request-level Cache Context, HTTP serving, Ulysses Anything, TE-P, VAE-P, CN-P, and Ascend NPU support.

🚀Quick Start

You can install cache-dit from PyPI or from source:

pip3 install -U cache-dit # or, pip3 install git+https://github.com/vipshop/cache-dit.git

Then accelerate your DiTs with just ♥️one line♥️ of code ~

>>> import cache_dit
>>> from diffusers import DiffusionPipeline
>>> # The pipe can be any diffusion pipeline.
>>> pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image")
>>> # Cache Acceleration with One-line code.
>>> cache_dit.enable_cache(pipe)
>>> # Or, Hybrid Cache Acceleration + 1D Parallelism.
>>> from cache_dit import DBCacheConfig, ParallelismConfig
>>> cache_dit.enable_cache(
...   pipe, cache_config=DBCacheConfig(), # w/ default
...   parallelism_config=ParallelismConfig(ulysses_size=2))
>>> # Or, Use Distributed Inference without Cache Acceleration.
>>> cache_dit.enable_cache(
...   pipe, parallelism_config=ParallelismConfig(ulysses_size=2))
>>> # Or, Hybrid Cache Acceleration + 2D Parallelism.
>>> cache_dit.enable_cache(
...   pipe, cache_config=DBCacheConfig(), # w/ default
...   parallelism_config=ParallelismConfig(ulysses_size=2, tp_size=2))
>>> # Or, load the acceleration config from a custom YAML file.
>>> from cache_dit import load_configs
>>> cache_dit.enable_cache(pipe, **load_configs("config.yaml"))
>>> # Optional, set attention backend for better performance.
>>> cache_dit.set_attn_backend(pipe, attention_backend=...)
>>> output = pipe(...) # Just call the pipe as normal.
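
For reference, a complete script around the one-liner might look like the following sketch; the prompt, step count and output path are placeholders, and when a ParallelismConfig is used the script is typically launched with torchrun so that each process maps to one GPU.

# Minimal end-to-end sketch (prompt, steps and output path are placeholders).
import torch
import cache_dit
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image", torch_dtype=torch.bfloat16
).to("cuda")
cache_dit.enable_cache(pipe)  # one-line hybrid cache acceleration

image = pipe(
    "a photo of an astronaut riding a horse",  # placeholder prompt
    num_inference_steps=50,
).images[0]
image.save("output.png")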

Please refer to our online documentation at readthedocs.io for more details.

🚀Quick Links

  • 📊Examples - The easiest way to enable hybrid cache acceleration and parallelism for DiTs with cache-dit is to start with our examples for popular models: FLUX, Z-Image, Qwen-Image, Wan, etc.
  • 🌐HTTP Serving - Deploy cache-dit models with HTTP API for text-to-image, image editing, multi-image editing, and text/image-to-video generation.
  • 🎉User Guide - Detailed documentation of the more advanced features.
  • ❓FAQ - Frequently asked questions including attention backend configuration, troubleshooting, and optimization tips.

🌐Community Integration

cache-dit has been integrated into vLLM-Omni, SGLang Diffusion and SD.Next.

©️Acknowledgements

Special thanks to vipshop's Computer Vision AI Team for supporting the documentation, testing and deployment of this project. We learned from the design of, and reused code from, the following projects: Diffusers, SGLang, vLLM-Omni, ParaAttention, xDiT, TaylorSeer and LeMiCa.

©️Citations

@misc{cache-dit@2025,
  title={cache-dit: A PyTorch-native and Flexible Inference Engine with Hybrid Cache Acceleration and Parallelism for DiTs.},
  url={https://github.com/vipshop/cache-dit.git},
  note={Open-source software available at https://github.com/vipshop/cache-dit.git},
  author={DefTruth, vipshop.com},
  year={2025}
}
