Tagged articles
8 articles
Kuaishou Tech
Sep 17, 2025 · Artificial Intelligence

How MIDAS Achieves Real‑Time Multimodal Digital‑Human Video Generation

The MIDAS framework introduced by the Kling Team combines autoregressive video generation with a lightweight diffusion denoising head to deliver real‑time, high‑quality digital‑human synthesis under multimodal control, achieving sub‑500 ms latency, 64× compression, and robust performance across multilingual dialogue, singing, and interactive world modeling tasks.

AI · Digital Human · Real-time Video
6 min read
AIWalker
Aug 4, 2025 · Artificial Intelligence

Can Lumina-mGPT 2.0 Replace Diffusion Models? A Deep Dive into Its Autoregressive Power

Lumina-mGPT 2.0 is a decoder‑only autoregressive image model trained from scratch. It rivals diffusion systems such as DALL·E 3 in quality while offering unified multimodal tokenization, flexible multi‑task generation, and several inference‑acceleration techniques, though it still faces licensing, scaling, and sampling‑time challenges.

AI model analysis · Inference Optimization · Lumina-mGPT
22 min read
AI Frontier Lectures
Jun 9, 2025 · Artificial Intelligence

How DiSA Accelerates Autoregressive Image Generation with Diffusion Step Annealing

The article introduces DiSA, a training‑free diffusion step annealing technique that speeds up autoregressive image generation by using fewer diffusion steps in the later phases of generation while preserving visual quality. The method is validated on several state‑of‑the‑art AR‑diffusion models.

AI research · DiSA · autoregressive
16 min read
AI Algorithm Path
Jun 8, 2025 · Artificial Intelligence

Autoregressive vs Diffusion Language Models: Principles, Trade‑offs, and Future Directions

The article compares autoregressive and diffusion language models, detailing their mathematical foundations, training and inference pipelines, performance trade‑offs such as speed, coherence and diversity, and explores hybrid approaches and emerging research directions for more efficient and controllable text generation.

AI research · Text Generation · Transformer
17 min read
AIWalker
May 6, 2025 · Artificial Intelligence

SimpleAR: High‑Quality 1024×1024 Images with Just 0.5B Parameters via Pretraining, SFT, and RL

SimpleAR shows that a vanilla autoregressive model with only 0.5B parameters, trained through pretraining, supervised fine‑tuning, and reinforcement learning, can generate high‑fidelity 1024×1024 images. It achieves competitive GenEval (0.59) and DPG‑Bench (79.66) scores, and vLLM and KV‑cache optimizations reduce inference time to about 14 seconds per image.

Benchmark · Supervised Fine‑Tuning · autoregressive
14 min read
DevOps
Apr 13, 2025 · Artificial Intelligence

The Amazing Magic of GPT‑4o and a Speculative Technical Roadmap

This article reviews GPT‑4o's breakthrough image‑generation capabilities, showcases diverse examples, and offers detailed speculation about its possible underlying autoregressive architecture, tokenization methods, advances in VQ‑VAE/VQGAN, and training strategies that could explain its performance.

AI research · GPT-4o · VQ-VAE
16 min read
JD Tech Talk
Mar 5, 2025 · Artificial Intelligence

GLM: General Language Model Pretraining with Autoregressive Blank Infilling

GLM introduces a unified pretraining framework that combines autoregressive blank infilling with 2D positional encoding and span shuffling, outperforming BERT, T5, and GPT on a range of NLU and generation tasks such as SuperGLUE, text infilling, and language modeling.

2D positional encoding · GLM · Language Model
27 min read
Rare Earth Juejin Tech Community
May 5, 2023 · Artificial Intelligence

Limitations of Generative Pre‑trained Transformers: Hallucinations, Memory, Planning, and Architectural Proposals

The article critically examines GPT‑4 and similar transformer models, highlighting persistent hallucinations, outdated knowledge, limited domain coverage, and the lack of planning and memory. It then proposes architectural extensions, inspired by fast and slow thinking and by differentiable modules, to overcome these fundamental constraints.

AI limitations · GPT-4 · Model architecture
24 min read