Alibaba Tongyi Unveils Z-Image Non‑Distilled Base Model with Full CFG and Negative Prompt Support

Alibaba's Tongyi releases the Z-Image base model, a non‑distilled diffusion transformer that supports full classifier‑free guidance, negative prompts, higher diversity, and fine‑tuning, contrasting with the faster Turbo variant and providing detailed usage instructions and community resources.

AI Engineering

Alibaba Tongyi previously introduced Z-Image‑Turbo, a 6B‑parameter model that generates images in eight inference steps on a 16 GB GPU. The new Z-Image base model is a non‑distilled version that retains the complete training signal.

The model uses a single‑stream diffusion Transformer architecture that concatenates text tokens, visual semantic tokens, and image VAE tokens at the sequence level, maximizing parameter efficiency compared with dual‑stream approaches. The Z-Image family's core technical innovations are the Decoupled‑DMD distillation algorithm and DMDR, which combines DMD with reinforcement learning; the non‑distilled base model targets creators, researchers, and developers who need maximum creative freedom.
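To make the single‑stream idea concrete, here is a minimal sketch of sequence‑level concatenation. The token counts, patch size, and VAE downsample factor are illustrative assumptions, not confirmed Z‑Image internals:

```python
# Conceptual sketch of single-stream sequence construction: text tokens,
# visual-semantic tokens, and image VAE tokens are joined into one sequence
# that a single Transformer processes end to end. All counts below are
# illustrative assumptions.

def build_single_stream_sequence(text_tokens, semantic_tokens, vae_tokens):
    """Concatenate the three token groups along the sequence axis."""
    return text_tokens + semantic_tokens + vae_tokens

# Assumed counts: 77 text tokens, 256 semantic tokens, and VAE tokens for a
# 1024x1024 image with a hypothetical 8x VAE downsample and 2x2 patching.
vae_token_count = (1024 // 8 // 2) * (1024 // 8 // 2)  # 64 * 64 = 4096
sequence = build_single_stream_sequence(
    list(range(77)), list(range(256)), list(range(vae_token_count))
)
print(len(sequence))  # 77 + 256 + 4096 = 4429
```

A dual‑stream design would instead run separate towers for text and image tokens; folding everything into one sequence lets every parameter attend across modalities.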

Core features: full classifier‑free guidance (CFG) support, negative‑prompt capability, fine‑tuning support, high output diversity, and 28‑50 inference steps. It handles resolutions from 512×512 up to 2048×2048 at any aspect ratio, with a recommended guidance scale of 3‑5. By contrast, Z‑Image‑Turbo lacks CFG, negative prompts, and fine‑tuning; it runs in only eight steps and yields lower diversity.
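As a quick sanity check on those recommended ranges, a small helper (hypothetical, not part of the library) could validate pipeline arguments before a run:

```python
def check_zimage_settings(height, width, steps, guidance_scale):
    """Flag settings outside the ranges the article cites for the base model:
    512-2048 px per side, 28-50 inference steps, guidance scale 3-5.
    This helper is an illustration, not part of diffusers or Z-Image."""
    issues = []
    if not (512 <= height <= 2048 and 512 <= width <= 2048):
        issues.append("resolution outside 512-2048 px per side")
    if not (28 <= steps <= 50):
        issues.append("steps outside 28-50")
    if not (3 <= guidance_scale <= 5):
        issues.append("guidance scale outside 3-5")
    return issues

print(check_zimage_settings(1280, 720, 50, 4))   # [] -> all within range
print(check_zimage_settings(4096, 4096, 8, 1))   # three issues flagged
```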

Usage guide:

import torch
from diffusers import ZImagePipeline

# Load the non-distilled base model in bfloat16 and move it to the GPU.
pipe = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image",
    torch_dtype=torch.bfloat16
)
pipe.to("cuda")

# The base model supports negative prompts and full CFG; the recommended
# guidance scale is 3-5 with 28-50 inference steps.
image = pipe(
    prompt="your prompt",
    negative_prompt="negative prompt",
    height=1280,
    width=720,
    num_inference_steps=50,
    guidance_scale=4
).images[0]

The non‑distilled weights make the model well suited to LoRA training, ControlNet conditioning, and semantic control, and its negative‑prompt support gives creators stronger tools for precisely steering content and excluding specific elements.
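A LoRA fine‑tuning setup could be configured with the `peft` library roughly as follows. This is a configuration sketch only: the `target_modules` names are guesses, not confirmed Z‑Image layer names, and a real training loop (data, optimizer, loss) is omitted:

```python
from peft import LoraConfig

# LoRA configuration sketch for fine-tuning the Z-Image transformer.
# target_modules are illustrative; inspect the loaded model to find the
# actual attention projection names before training.
lora_config = LoraConfig(
    r=16,                          # low-rank dimension
    lora_alpha=16,                 # LoRA scaling factor
    init_lora_weights="gaussian",
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],
)

# diffusers models expose add_adapter() via the PEFT integration, e.g.:
# pipe.transformer.add_adapter(lora_config)
```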

The model is publicly available on Hugging Face (https://huggingface.co/Tongyi-MAI/Z-Image). The community quickly released GGUF and bf16 conversions using the ComfyUI‑ModelQuantizer tool, with quantization reportedly taking about ten minutes. Technical details are documented in the arXiv preprint “Z-Image: An Efficient Image Generation Foundation Model with Single‑Stream Diffusion Transformer” (arXiv:2511.22699).

Future expectations include an upcoming Z‑Image‑Edit variant to complete the model family.

Alibaba · Image Generation · Diffusion Transformer · Classifier-Free Guidance · Z-Image · Negative Prompt
Written by

AI Engineering

Focused on cutting‑edge product and technology information and practical experience sharing in the AI field (large models, MLOps/LLMOps, AI application development, AI infrastructure).
