HiDream-O1-Image is a natively unified image generation foundation model built on a Pixel-level Unified Transformer (UiT). It uses no external VAE and no separate text encoder: raw pixels, text, and task-specific conditions are all encoded in a single shared token space. It supports text-to-image generation, image editing, and subject-driven personalization at resolutions up to 2,048 × 2,048.
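To make the "single shared token space" idea concrete, here is a toy sketch of placing text tokens and raw-pixel patch tokens in one sequence. This is not HiDream-O1-Image's actual tokenization: the patch size, the whitespace "tokenizer", the modality tags, and the function names `patchify` and `build_sequence` are all assumptions made for illustration.

```python
from typing import List, Tuple

Token = Tuple[str, object]  # (modality tag, payload); hypothetical representation


def patchify(image: List[List[int]], patch: int) -> List[List[int]]:
    """Split a 2D grid of pixel values into flattened patch x patch blocks (row-major)."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be divisible by the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            block = [image[r][c]
                     for r in range(top, top + patch)
                     for c in range(left, left + patch)]
            patches.append(block)
    return patches


def build_sequence(prompt: str, image: List[List[int]], patch: int) -> List[Token]:
    """Concatenate text tokens and raw-pixel patch tokens into one ordered sequence."""
    # Whitespace splitting stands in for a real tokenizer (assumption).
    text_tokens = [("text", word) for word in prompt.split()]
    pixel_tokens = [("pixel", block) for block in patchify(image, patch)]
    return text_tokens + pixel_tokens


# Toy 4x4 single-channel image with 2x2 patches gives 4 pixel tokens.
toy_image = [[r * 4 + c for c in range(4)] for r in range(4)]
sequence = build_sequence("a red fox", toy_image, patch=2)
print(len(sequence))  # 3 text tokens + 4 pixel tokens = 7
```

In a real model, each token would presumably be projected into a common embedding dimension before entering the transformer; the sketch only shows that both modalities can share one sequence without a VAE latent or a separate text-encoder output.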

Technical Report: github.com/HiDream-ai/…/HiDream-O1-Image.pdf

Code: github.com/HiDream-ai/HiDream-O1-Image

Blog:

HiDream-O1-Image-Dev (28 steps): huggingface.co/HiDream-ai/HiDream-O1-Image-Dev

HiDream-O1-Image (50 steps): huggingface.co/HiDream-ai/HiDream-O1-Image
