Tagged articles

diffusion transformer

10 articles

Jun 3, 2026 · Artificial Intelligence

TF-CoDiT: A New Approach to Synthesizing Treasury Futures Data

TF-CoDiT introduces a diffusion‑Transformer framework that converts multi‑channel treasury futures time series into discrete wavelet coefficients, encodes cross‑channel dependencies with a U‑shaped VAE, conditions generation on a structured FinMAP prompt, and achieves state‑of‑the‑art MSE and MAE scores across multiple contracts and horizons.

FinMAP · TF-CoDiT · U-VAE

17 min read

Machine Heart

Mar 31, 2026 · Artificial Intelligence

ProMoE: Explicit Routing Breaks the Scaling Bottleneck of Diffusion‑Transformer MoE (ICLR 2026)

ProMoE introduces a two‑step routing MoE framework with explicit semantic guidance that tackles the high spatial redundancy and functional heterogeneity of visual tokens, enabling diffusion transformers to scale efficiently and outperform dense models and prior MoE approaches across generation, convergence, and scaling benchmarks.

Explicit Routing · Mixture of Experts · Prototypical Routing

9 min read

AI Engineering

Jan 28, 2026 · Artificial Intelligence

Alibaba Tongyi Unveils Z-Image Non‑Distilled Base Model with Full CFG and Negative Prompt Support

Alibaba's Tongyi has released the Z-Image base model, a non‑distilled diffusion transformer that supports full classifier‑free guidance, negative prompts, and fine‑tuning, and produces more diverse outputs than the faster Turbo variant. The article also provides detailed usage instructions and community resources.

Alibaba · Classifier-Free Guidance · Negative Prompt

4 min read

21CTO

Oct 20, 2025 · Artificial Intelligence

Real-Time Frame Model (RTFM): Single‑GPU World Model Redefines 3D Generation

World Labs unveiled RTFM, a real‑time frame model that runs on a single H100 GPU, generating persistent, interactive 3D worlds from 2D images without explicit 3D representations, highlighting the growing computational demands of generative world models and their potential to reshape AI-driven spatial intelligence.

3D generation · GPU Acceleration · Generative AI

9 min read

Bilibili Tech

Mar 4, 2025 · Artificial Intelligence

Engineering Practices and Optimizations for Text‑to‑Video Generation Models (OpenSora, CogVideoX) from the Bilibili TTV Team

The Bilibili TTV team optimized the OpenSora and CogVideoX text‑to‑video models by redesigning data storage with Alluxio, parallelizing VAE encoding, applying dynamic sequence parallelism and DeepSpeed‑Ulysses attention, adapting GPU code to run on NPUs, and using profiling‑driven kernel fusion, FlashAttention, and expandable memory to significantly increase training efficiency and frame throughput. The article also outlines future scaling plans based on pipeline parallelism and ZeRO‑3.

FlashAttention · NPU · data pipeline

26 min read

AIWalker

Jan 15, 2025 · Artificial Intelligence

Magic Mirror: Zero‑Shot Identity‑Preserved High‑Quality Personalized Video Generation

Magic Mirror introduces a single‑stage, zero‑shot framework that fuses dual facial embeddings with a conditional adaptive normalization module inside a Video Diffusion Transformer, achieving superior identity consistency, natural dynamics, and high visual quality compared with existing video generation methods.

conditional adaptive normalization · diffusion transformer · identity preservation

16 min read

Alibaba Cloud Big Data AI Platform

Jul 15, 2024 · Artificial Intelligence

How EasyAnimate v3 Generates High‑Resolution Videos with Diffusion Transformers

EasyAnimate v3, an open‑source video generation system from Alibaba Cloud's AI Platform, introduces a Diffusion Transformer‑based architecture, a Hybrid Motion Module, and a Slice VAE that together enable image‑to‑video, text‑to‑video, and unlimited‑length video generation at up to 720p resolution and 144 frames with modest GPU memory.

AI · EasyAnimate · Generative AI

5 min read

Alibaba Cloud Big Data AI Platform

Jun 4, 2024 · Artificial Intelligence

EasyAnimate: High‑Resolution Video Generation via Diffusion Transformers

EasyAnimate, an open‑source DiT‑based video generation framework from Alibaba Cloud's AI Platform PAI, offers a complete pipeline covering data preprocessing, VAE and DiT training, LoRA fine‑tuning, motion‑module integration, and scalable inference at up to 768×768 resolution and 144 frames, using Diffusion Transformers to produce longer, higher‑quality videos.

AI video · LoRA · VAE

14 min read

Tencent Cloud Developer

May 15, 2024 · Artificial Intelligence

Tencent Open-Sources HunYuan DiT: First Chinese-Native Text-to-Image Model with 1.5B Parameters

Tencent has open‑sourced its upgraded 1.5‑billion‑parameter HunYuan DiT model, the first Chinese‑native, bilingual (Chinese‑English) text‑to‑image diffusion transformer. The model delivers about a 20% improvement in visual quality, supports multi‑round generation, shows potential for video generation, and is free for commercial use, with full weights, inference code, and algorithms available on Hugging Face and GitHub for developers and enterprises.

Chinese-native AI · DiT architecture · Multimodal Generation

6 min read

NewBeeNLP

Mar 20, 2024 · Artificial Intelligence

How Open‑Sora 1.0 Replicates Sora: Architecture, Training Pipeline & Performance Insights

This article provides a comprehensive technical walkthrough of Open‑Sora 1.0, covering its Diffusion‑Transformer architecture, three‑stage training strategy, data‑preprocessing scripts, generation quality, and the Colossal‑AI acceleration that together make Sora‑level video synthesis openly reproducible.

AI video · Open-Sora · diffusion transformer

12 min read