MAGI-1: Autoregressive Video Generation at Scale

Submitted 1 year ago by Even_Adder@lemmy.dbzer0.com to stable_diffusion@lemmy.dbzer0.com

https://private-user-images.githubusercontent.com/28325733/435652797-5cfa90e0-f6ed-476b-a194-71f1d309903a.mp4?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3NDUzMzkyMDgsIm5iZiI6MTc0NTMzODkwOCwicGF0aCI6Ii8yODMyNTczMy80MzU2NTI3OTctNWNmYTkwZTAtZjZlZC00NzZiLWExOTQtNzFmMWQzMDk5MDNhLm1wND9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTA0MjIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwNDIyVDE2MjE0OFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWNmODIxNWY5NTlhMjY3YTcwYmViNGQ5OTNkZWYxNjYxNTM1NjA4MWNmMTMzYzUyMWU1MWFjZmMyZTE4Y2MyMjUmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.-0wtqNrzwzoA4KEUSrhnWI7GNhfnqPV_AxCXkDHfGRk

Abstract

We present MAGI-1, a world model that generates videos by autoregressively predicting a sequence of video chunks, defined as fixed-length segments of consecutive frames. Trained to denoise per-chunk noise that increases monotonically over time, MAGI-1 enables causal temporal modeling and naturally supports streaming generation. It achieves strong performance on image-to-video (I2V) tasks conditioned on text instructions, providing high temporal consistency and scalability, which are made possible by several algorithmic innovations and a dedicated infrastructure stack. MAGI-1 further supports controllable generation via chunk-wise prompting, enabling smooth scene transitions, long-horizon synthesis, and fine-grained text-driven control. We believe MAGI-1 offers a promising direction for unifying high-fidelity video generation with flexible instruction control and real-time deployment.
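To make "per-chunk noise that increases monotonically over time" concrete, here is a minimal sketch. It assumes a linear noise ramp and two hypothetical parameters, `window` (how many chunks are denoised at once) and `steps_per_chunk` (how many ticks separate consecutive chunks entering the window). It is a toy scheduler that only illustrates the idea. It is not MAGI-1's actual scheduler or API. Earlier chunks are always cleaner than later ones, so the oldest chunk finishes first and can be emitted while later chunks are still noisy. That ordering is what makes streaming possible.

```python
from typing import Iterator, List, Tuple


def active_noise_levels(tick: int, num_chunks: int, window: int,
                        steps_per_chunk: int) -> List[Tuple[int, float]]:
    """Return (chunk_index, noise_level) for every chunk being denoised at `tick`.

    Toy model (assumption): 1.0 is pure noise and 0.0 is clean. Chunk i enters
    at tick i * steps_per_chunk and needs window * steps_per_chunk ticks to
    become clean, so at most `window` chunks are in flight at once.
    """
    total = window * steps_per_chunk
    levels = []
    for i in range(num_chunks):
        elapsed = tick - i * steps_per_chunk
        if 0 <= elapsed < total:
            # Linear ramp (assumption): noise falls from 1.0 toward 0.0.
            levels.append((i, 1.0 - elapsed / total))
    return levels


def stream_chunks(num_chunks: int, window: int,
                  steps_per_chunk: int) -> Iterator[Tuple[int, int]]:
    """Yield (tick, chunk_index) each time a chunk becomes fully denoised."""
    total = window * steps_per_chunk
    last_tick = (num_chunks - 1) * steps_per_chunk + total
    for tick in range(last_tick + 1):
        levels = active_noise_levels(tick, num_chunks, window, steps_per_chunk)
        # Causal ordering: noise never decreases from earlier to later chunks.
        noise = [level for _, level in levels]
        assert noise == sorted(noise)
        for i in range(num_chunks):
            # A chunk is clean once it has spent `total` ticks in the window.
            if tick - i * steps_per_chunk == total:
                yield tick, i


if __name__ == "__main__":
    print(active_noise_levels(3, num_chunks=3, window=2, steps_per_chunk=2))
    print(list(stream_chunks(num_chunks=3, window=2, steps_per_chunk=2)))
```

With three chunks, `window=2` and `steps_per_chunk=2`, chunk 0 is finished at tick 4, before chunk 2 has finished denoising. In this toy model, the first frames can therefore be shown while the rest of the video is still being generated.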

Technical report: static.magi.world/static/files/MAGI_1.pdf

Code: github.com/SandAI-org/MAGI-1?tab=readme-ov-file

Model: huggingface.co/sand-ai/MAGI-1

Homepage: sand.ai
