Tagged articles
9 articles
Machine Heart
Mar 31, 2026 · Artificial Intelligence

ProMoE: Explicit Routing Breaks the Scaling Bottleneck of Diffusion‑Transformer MoE (ICLR 2026)

ProMoE introduces a two‑step routing MoE framework with explicit semantic guidance that tackles the high spatial redundancy and functional heterogeneity of visual tokens, enabling diffusion transformers to scale efficiently and outperform dense models and prior MoE approaches across generation, convergence, and scaling benchmarks.

Diffusion Transformer · Explicit Routing · Mixture of Experts
9 min read
AI Engineering
Jan 28, 2026 · Artificial Intelligence

Alibaba Tongyi Unveils Z-Image Non‑Distilled Base Model with Full CFG and Negative Prompt Support

Alibaba's Tongyi releases the Z-Image base model, a non‑distilled diffusion transformer that supports full classifier‑free guidance, negative prompts, higher diversity, and fine‑tuning, contrasting with the faster Turbo variant and providing detailed usage instructions and community resources.

Alibaba · Classifier-Free Guidance · Diffusion Transformer
4 min read
21CTO
Oct 20, 2025 · Artificial Intelligence

Real-Time Frame Model (RTFM): Single‑GPU World Model Redefines 3D Generation

World Labs unveiled RTFM, a real‑time frame model that runs on a single H100 GPU, generating persistent, interactive 3D worlds from 2D images without explicit 3D representations, highlighting the growing computational demands of generative world models and their potential to reshape AI-driven spatial intelligence.

3D generation · Diffusion Transformer · GPU Acceleration
9 min read
Bilibili Tech
Mar 4, 2025 · Artificial Intelligence

Engineering Practices and Optimizations for Text‑to‑Video Generation Models (OpenSora, CogVideoX) by the Bilibili TTV Team

The Bilibili TTV team optimized the OpenSora and CogVideoX text‑to‑video models by redesigning data storage with Alluxio, parallelizing VAE encoding, applying dynamic sequence parallelism and DeepSpeed‑Ulysses attention, and adapting GPU code for NPU execution. Profiling‑driven kernel fusion, FlashAttention, and expandable memory further increased training efficiency and frame throughput. The article also outlines future pipeline‑parallel and ZeRO‑3 scaling plans.

Diffusion Transformer · FlashAttention · Model Parallelism
26 min read
AIWalker
Jan 15, 2025 · Artificial Intelligence

Magic Mirror: Zero‑Shot Identity‑Preserved High‑Quality Personalized Video Generation

Magic Mirror introduces a single‑stage, zero‑shot framework that fuses dual facial embeddings with a conditional adaptive normalization module inside a Video Diffusion Transformer, achieving superior identity consistency, natural dynamics, and high visual quality compared with existing video generation methods.

Diffusion Transformer · Video Generation · conditional adaptive normalization
16 min read
Alibaba Cloud Big Data AI Platform
Jul 15, 2024 · Artificial Intelligence

How EasyAnimate v3 Generates High‑Resolution Videos with Diffusion Transformers

EasyAnimate v3, an open‑source video generation system from Alibaba Cloud AI Platform, introduces a Diffusion Transformer‑based architecture, a Hybrid Motion Module, and Slice VAE to enable image‑to‑video, text‑to‑video, and unlimited‑length video creation at up to 720p resolution and 144 frames on modest GPU memory.

AI · Computer Vision · Diffusion Transformer
5 min read
Alibaba Cloud Big Data AI Platform
Jun 4, 2024 · Artificial Intelligence

EasyAnimate: High‑Resolution Video Generation via Diffusion Transformers

EasyAnimate, an open‑source DiT‑based video generation framework from Alibaba Cloud AI Platform PAI, offers a complete pipeline—including data preprocessing, VAE and DiT training, LoRA fine‑tuning, motion‑module integration, and scalable inference up to 768×768 resolution and 144 frames—leveraging Diffusion Transformers to produce longer, higher‑quality videos.

AI video · Diffusion Transformer · LoRA
14 min read
Tencent Cloud Developer
May 15, 2024 · Artificial Intelligence

Tencent Open-Sources HunYuan DiT: First Chinese-Native Text-to-Image Model with 1.5B Parameters

Tencent has open‑sourced its upgraded 1.5‑billion‑parameter HunYuan DiT model—the first Chinese‑native, bilingual (Chinese‑English) text‑to‑image diffusion‑with‑transformer system—delivering about 20% visual quality improvement, multi‑round generation, video‑generation potential, and free commercial use, with full weights, inference code, and algorithms available on Hugging Face and GitHub for developers and enterprises.

Chinese-native AI · DiT architecture · Diffusion Transformer
6 min read
NewBeeNLP
Mar 20, 2024 · Artificial Intelligence

How Open‑Sora 1.0 Replicates Sora: Architecture, Training Pipeline & Performance Insights

This article provides a comprehensive technical walkthrough of Open‑Sora 1.0, covering its Diffusion‑Transformer architecture, three‑stage training strategy, data‑preprocessing scripts, generation quality, and the Colossal‑AI acceleration that together make Sora‑level video synthesis openly reproducible.

AI video · Diffusion Transformer · Open-Sora
12 min read