Luma’s Uni‑1.1 API Launch: Third‑Place Ranking and Text Rendering Near GPT‑Image 2

Luma released the Uni‑1.1 image‑generation API, which ranks third on the Arena blind‑test leaderboard, costs less than half as much per image as comparable models, and demonstrates production‑grade capabilities such as multi‑reference fusion and multi‑turn editing, built on a decoder‑only transformer that jointly models text and image tokens.

Machine Heart

Uni‑1.1 API and Market Position

Luma announced the Uni‑1.1 model upgrade and opened its API in February. On the third‑party blind‑test platform Arena, Uni‑1.1 and Uni‑1.1‑Max placed in the top three of the image‑generation leaderboard, behind only OpenAI and Google and ahead of Microsoft AI, xAI, Alibaba, Tencent, and ByteDance.

Pricing and Accessibility

The API charges a minimum of $0.0404 per image, with both latency and cost at less than half those of comparable models. Two billing tiers are offered: a pay‑as‑you‑go Build plan and a reserved‑throughput Scale plan that starts at eight units. SDKs for Python, JavaScript/TypeScript, and Go are provided, along with a CLI.
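The article does not reproduce the SDK surface, so the snippet below is only a minimal sketch of what a pay‑as‑you‑go (Build plan) call might look like from Python; the package name, client class, method signature, and response field are all illustrative assumptions, not the documented Luma SDK.

```python
# Minimal usage sketch of a pay-as-you-go (Build plan) call. The package
# name, client class, and method signature are illustrative assumptions,
# not the documented Luma SDK.
import os

from luma_client import LumaClient  # hypothetical package name

client = LumaClient(api_key=os.environ["LUMA_API_KEY"])

# One prompt in, one image out -- billed per image at the metered rate.
result = client.images.generate(
    model="uni-1.1",
    prompt="A technical blueprint of Sagittarius A* with labeled structures",
)

with open("blueprint.png", "wb") as f:
    f.write(result.image_bytes)  # assumed response field
```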

Production‑Grade Demonstrations

1. Full‑page 2036 news website – The prompt asks for a futuristic news site designed for AI agents. The generated image contains a header, navigation, breaking‑news ticker, headline image, multi‑column article, timestamps, sponsor labels, AI‑targeted ads, and a footer, all with readable English text.

2. Sagittarius A* blueprint – Rendered in a technical blueprint style, the image shows the supermassive black hole’s cross‑section with labeled structures (Schwarzschild radius, event horizon, photon sphere, etc.), scale bars, and a formal drawing footer.

3. Rocket illustration (1957‑2025) – Over twenty rockets are placed side‑by‑side at a consistent scale, each annotated with model, country, height, and launch year; operational rockets are highlighted with a red outline.

4. Chinese poster "水·韵" – The poster combines Chinese typography, a main title, subtitle, brand information, and a 3 × 4 grid of thumbnail faces that maintain the same identity while varying clothing and props, demonstrating cross‑language layout consistency.

5. Multi‑reference fusion & multi‑turn editing – The model accepts up to nine reference images in a single call, preserving each visual identity in the output. Users can iteratively edit images sentence by sentence (e.g., “remove the bear”, “add a black curtain”, “convert to black‑and‑white”) while the model retains subject identity and spatial relationships across turns; a minimal sketch of this call pattern follows below.
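The following sketch illustrates that call pattern with a hypothetical client; `luma_client`, `images.generate`, `images.edit`, and the `reference_images` parameter are assumed names for illustration, not documented API.

```python
# Hypothetical sketch of multi-reference fusion followed by multi-turn
# editing. The client, method names, and parameters are illustrative
# assumptions, not the documented SDK surface.
import os
from pathlib import Path

from luma_client import LumaClient  # hypothetical package name

client = LumaClient(api_key=os.environ["LUMA_API_KEY"])

# Multi-reference fusion: up to nine reference images in one call,
# each visual identity preserved in the output.
refs = [Path(f"ref_{i}.png").read_bytes() for i in range(9)]
image = client.images.generate(
    model="uni-1.1",
    prompt="Group portrait combining all referenced characters on one stage",
    reference_images=refs,
)

# Multi-turn editing: refine the same image sentence by sentence while
# subject identity and spatial relationships persist across turns.
for instruction in ("remove the bear",
                    "add a black curtain",
                    "convert to black-and-white"):
    image = client.images.edit(model="uni-1.1", image=image, prompt=instruction)
```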

Technical Architecture

Traditional multimodal systems separate vision encoders (e.g., CLIP, Florence, Grounding‑DINO) from diffusion or autoregressive generators. Uni‑1.1 instead uses a decoder‑only autoregressive Transformer that interleaves text tokens and image tokens in a single sequence, enabling simultaneous modeling of both modalities.
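Conceptually, the interleaving can be pictured as one flat sequence over a shared vocabulary. The sketch below is only an illustration of that idea; the special markers, vocabulary sizes, and patch count are assumptions, not Luma's actual tokenizer.

```python
# Conceptual sketch of one interleaved token sequence, as a decoder-only
# model would consume it. The special markers, vocabulary sizes, and patch
# count are illustrative assumptions, not Luma's actual tokenizer.
TEXT_VOCAB = 50_000          # assumed text vocabulary size
IMAGE_VOCAB = 8_192          # assumed image codebook size (e.g., a VQ codebook)
BOI, EOI = "<boi>", "<eoi>"  # assumed begin/end-of-image markers

def build_sequence(prompt_ids: list[int], image_ids: list[int]) -> list:
    """Interleave text and image tokens into a single autoregressive stream.

    Image tokens are shifted past the text vocabulary so both modalities
    share one embedding table, and the model learns text conditioning,
    image synthesis, and edits under a single next-token objective.
    """
    assert all(0 <= t < IMAGE_VOCAB for t in image_ids)
    return [*prompt_ids, BOI, *[t + TEXT_VOCAB for t in image_ids], EOI]

# A 16x16-patch image becomes 256 tokens appended after the prompt tokens.
seq = build_sequence(prompt_ids=[17, 942, 8], image_ids=list(range(256)))
```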

The architecture provides two API endpoints:

Reasoning endpoint – Handles instruction parsing, composition planning, and constraint locking for brand, character, or product identity.

Generation endpoint – Performs pixel‑level rendering based on the reasoning output.

Because constraints are encoded directly in the token stream, the model avoids external alignment modules, allowing character‑level control, multi‑reference constraints, and stable multi‑turn editing without additional post‑processing.
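End to end, a client round trip might resemble the two‑stage sketch below. The base URL, endpoint paths, payload fields, and response shapes are all assumptions, since the article does not specify the wire format.

```python
# Hypothetical two-stage round trip: reasoning first, then generation.
# The base URL, endpoint paths, payload fields, and response shapes are
# all assumptions; the article does not specify the wire format.
import os

import requests

BASE = "https://api.example.com/uni-1.1"  # placeholder base URL
HEADERS = {"Authorization": f"Bearer {os.environ['LUMA_API_KEY']}"}

# Stage 1: the reasoning endpoint parses the instruction, plans the
# composition, and locks identity constraints into the token stream.
plan = requests.post(
    f"{BASE}/reason",
    headers=HEADERS,
    json={
        "prompt": "Campaign poster; keep the brand mark and model's face intact",
        "locked_constraints": ["brand_mark", "model_face"],
    },
    timeout=60,
).json()

# Stage 2: the generation endpoint renders pixels from that plan. Since
# the constraints already live in the token sequence, no external
# alignment module or post-processing pass is applied.
image = requests.post(
    f"{BASE}/generate", headers=HEADERS, json={"plan": plan}, timeout=60
)
with open("poster.png", "wb") as f:
    f.write(image.content)
```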

Real‑World Impact

A case study describes a brand campaign, originally budgeted at $15 million over a year, that was completed in roughly 40 hours for under $20,000 using Uni‑1.1, delivering localized versions for multiple markets and passing internal quality review.

Clients such as Adidas, Mazda, Publicis Groupe, and creator platforms (Envato, Comfy, Runware, etc.) have already integrated the API, leveraging the nine‑reference capability for cross‑market asset production.

Research Team

The core team consists of fewer than 15 members, led by Jiaming Song (Tsinghua undergraduate, Stanford PhD, author of DDIM) and William Shen (Stanford undergraduate and PhD, CVPR 2018 Best Paper, RSS 2022 Best Student Paper). Rather than scaling separate specialist models, their strategy unifies understanding and generation in a single model.

Future Roadmap

Uni‑1.1 is the first deployed instance of Luma’s “unified intelligence” roadmap. Upcoming extensions aim to incorporate video, speech, and interactive world simulation, ultimately running perception, reasoning, and imagination in a continuous flow.
