NeurIPS 2024 Best Paper Introduces Visual Autoregressive Modeling (VAR) for Image Generation
A NeurIPS 2024 Best Paper award highlights Visual Autoregressive Modeling (VAR), a novel approach that replaces next-token prediction with multi-scale, next-scale token-map prediction for image generation; the surrounding article also mentions a free book giveaway and a legal dispute involving the paper's author.
The article reports that Tian, an intern in ByteDance's commercialization technology department, co-authored a paper that received the NeurIPS 2024 Best Paper award and earned some of the highest reviewer scores of the conference (7, 8, 8, 8).
The paper, "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction," proposes a new paradigm for image generation that departs from traditional raster-scan, next-token prediction: instead of generating one token at a time, the model predicts the next scale (resolution) in a coarse-to-fine manner, emitting an entire token map at each step.
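To make the contrast concrete, here is a minimal sketch (not the paper's code) of the two prediction orders. The 16×16 latent grid size and the four-scale schedule are illustrative assumptions, not values taken from the paper.

```python
# Toy illustration of the two autoregressive prediction orders (assumed sizes).

def raster_scan_order(h, w):
    """Classic AR: one single-token step per position, in row-major order."""
    return [(i, j) for i in range(h) for j in range(w)]

def next_scale_order(scales):
    """VAR-style AR: one step per scale; each step emits a whole token map."""
    return [f"{h}x{w} token map" for (h, w) in scales]

# A 16x16 latent grid needs 256 sequential steps under raster-scan AR...
print(len(raster_scan_order(16, 16)))  # 256
# ...but only K=4 coarse-to-fine steps under a hypothetical VAR schedule.
print(next_scale_order([(1, 1), (2, 2), (4, 4), (16, 16)]))
```

The point of the sketch is that sequential depth drops from the number of tokens to the number of scales, which is where the efficiency claim comes from.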
VAR is trained in two stages: (1) a multi-scale VQ-VAE encodes each image into K token maps of increasing resolution, and (2) a VAR Transformer predicts each higher-resolution token map conditioned on all lower-resolution ones, using block-wise causal attention and a cross-entropy loss over the token vocabulary.
The approach lets autoregressive Transformers learn visual distributions more efficiently and generalize better, and the paper reports that it allows AR models to surpass diffusion Transformers such as DiT on image-generation benchmarks.
In parallel, the article notes that ByteDance has filed a lawsuit against Tian for code tampering, seeking 8 million CNY in damages and a public apology.