Ele.me Vertical Business AIGC Image Model: Architecture, Training Pipeline, and Evaluation
Ele.me built a domain-specific AIGC image model for its vertical business, trained from scratch on its own data with a DiT backbone. Training follows a three-stage pipeline: transformer pre-training, prompt alignment, and aesthetic fine-tuning. The model uses custom T5‑E‑CLIP text and visual encoders and adds ControlNet for layout control. It is evaluated with FID, CLIP scores, and a human rubric, and it supports automated dish-image generation and UI asset creation.
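To make the two automatic metrics concrete, here is a minimal sketch of how a CLIP-style score and FID are commonly computed. It is not Ele.me's evaluation code. The function names, the scale factor of 100, and the toy vectors are illustrative assumptions, and the FID variant assumes diagonal covariances so that it runs without a linear algebra library; production FID uses full covariance matrices of Inception features.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def clip_score(text_emb, image_emb, scale=100.0):
    """CLIP-style text-image agreement: scaled cosine similarity, floored at 0.

    The embeddings would normally come from a CLIP text and image encoder;
    here they are plain lists of floats (an assumption for illustration).
    """
    return scale * max(cosine_similarity(text_emb, image_emb), 0.0)


def fid_diagonal(mu_real, var_real, mu_gen, var_gen):
    """Frechet distance between two Gaussians with DIAGONAL covariances.

    Simplification of FID: with diagonal covariances the trace term
    Tr(S1 + S2 - 2 * sqrt(S1 * S2)) reduces to sum((sqrt(v1) - sqrt(v2))**2).
    """
    mean_term = sum((m1 - m2) ** 2 for m1, m2 in zip(mu_real, mu_gen))
    cov_term = sum(
        (math.sqrt(v1) - math.sqrt(v2)) ** 2 for v1, v2 in zip(var_real, var_gen)
    )
    return mean_term + cov_term


# Toy usage: identical embeddings agree fully, orthogonal ones not at all.
aligned = clip_score([1.0, 0.0], [1.0, 0.0])
unrelated = clip_score([1.0, 0.0], [0.0, 1.0])
# Identical feature statistics give a distance of zero (lower FID is better).
same_distribution = fid_diagonal([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0])
```

A higher CLIP score indicates that a generated image matches its prompt more closely, while a lower FID indicates that generated images are statistically closer to real ones. Neither captures aesthetic quality well, which is presumably why the evaluation also includes a human rubric.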