TL;DR
A new post-training quantization paradigm for diffusion models that quantizes both the weights and activations of FLUX.1 to 4 bits, achieving a 3.5× memory reduction and an 8.7× latency reduction on a 16GB laptop 4090 GPU.
Paper: arxiv.org/abs/2411.05007
Weights: huggingface.co/mit-han-lab/svdquant-models
Code: github.com/mit-han-lab/nunchaku
Blog: hanlab.mit.edu/blog/svdquant
Project Page:
Demo: svdquant.mit.edu