Tagged articles

text-to-image

65 articles · Page 1 of 1

Jun 22, 2026 · Artificial Intelligence

Why Dropping VAE and Private Data Boosts Text-to-Image Generation Performance

MiniT2I, a minimalist pixel-space text-to-image model that discards VAE, AdaLN, and private data, achieves 0.87 GenEval and 84.2 DPG-Bench scores with only 258 M parameters, demonstrating that a stripped-down architecture and public data can outperform larger, more complex systems.

AI research · MiniT2I · Transformer

0 likes · 8 min read

Machine Learning Algorithms & Natural Language Processing

Jun 21, 2026 · Artificial Intelligence

Rank‑Only Rewards Accelerate One‑Step Text‑to‑Image Preference Optimization 3.5×

DrPO introduces a drifting‑field based, rank‑only reward mechanism for one‑step text‑to‑image models, enabling reinforcement‑learning post‑training without back‑propagating reward gradients; it speeds up training 3.51× versus DRaFT, works with non‑differentiable rewards, and improves generation quality on SD‑Turbo and SDXL‑Turbo.

DrPO · Drifting Model · HPSv3

0 likes · 11 min read

Machine Heart

May 12, 2026 · Artificial Intelligence

How DreamLite Enables Real-Time Text-to-Image Generation and Editing on Mobile Devices

DreamLite, a 0.39 B‑parameter diffusion model from ByteDance, unifies text‑to‑image generation and text‑guided editing in a single on‑device network, delivering 1024×1024 results in about three seconds on an iPhone 17 Pro while surpassing existing mobile and even many server‑side baselines.

DreamLite · RLHF · diffusion model

0 likes · 9 min read

SuanNi

May 7, 2026 · Artificial Intelligence

DreamLite: A 0.39B Mobile Model Matching Z‑Image for Real‑Time Text‑to‑Image Generation and Editing

DreamLite is a compact 0.39 B unified diffusion model open‑sourced by ByteDance that runs on smartphones, delivering text‑to‑image generation and text‑guided editing in about three seconds for 1024×1024 pictures, with performance comparable to Flux, Z‑Image and LongCat‑Image and offering two variants to balance fidelity and latency.

AI model · ByteDance · DreamLite

0 likes · 4 min read

Machine Heart

May 6, 2026 · Artificial Intelligence

PromptEcho: Leveraging Frozen Multimodal Models for High‑Quality Text‑to‑Image Rewards Without Labels

PromptEcho computes a continuous reward for text‑to‑image generation by measuring how well a frozen vision‑language model can reconstruct the original prompt from the generated image, eliminating the need for annotated data or a trained reward model and outperforming prior methods across multiple benchmarks.

PromptEcho · Reward Modeling · benchmark

0 likes · 10 min read

Machine Learning Algorithms & Natural Language Processing

Apr 27, 2026 · Artificial Intelligence

From Parameter Tuning to Control: CFG‑Ctrl Boosts Stability and Precision in Text‑to‑Image Generation

The paper introduces CFG‑Ctrl, a control‑theoretic redesign of classifier‑free diffusion guidance that treats the generation process as a dynamic system, achieving more stable and accurate text‑to‑image results across multiple model scales and evaluation metrics.

CFG-Ctrl · Classifier-Free Guidance · Control Theory

0 likes · 15 min read

Old Meng AI Explorer

Apr 25, 2026 · Artificial Intelligence

Stop Using Vague Prompts – Master GPT Image 2 with Top‑Tier Prompt Templates to End ‘Waste’ Images

The guide explains why GPT Image 2 dramatically reduces low‑quality outputs, outlines five essential prompt elements, provides eight ready‑to‑use scene templates, shares advanced tricks, common pitfalls, and concrete examples to help users generate professional AI images reliably.

AI image generation · CJK rendering · GPT Image 2

0 likes · 16 min read

Geek Labs

Apr 24, 2026 · Artificial Intelligence

One-Click Ad Video from Assets + Brief, plus Baidu’s 8B Text-to-Image – An AI Toolbox

The article introduces three open‑source AI tools—a video editor that turns raw footage and a brief into a finished ad, Baidu's 8‑billion‑parameter text‑to‑image model that runs on 24 GB GPUs, and a weekly AI‑developer digest that auto‑generates Chinese reports—detailing their workflows, benchmarks, usage commands, and target users.

AI content creation · AI video editing · Agentic workflow

0 likes · 9 min read

Machine Heart

Apr 9, 2026 · Artificial Intelligence

From Direct Generation to Agentic Text-to-Image: Introducing the Open-Source Gen-Searcher

Gen-Searcher equips text-to-image models with search, reasoning, and web‑browsing capabilities, turning the traditional direct‑generation pipeline into an agentic system that fetches and verifies real‑world knowledge, dramatically improving accuracy and quality across multiple benchmarks.

Agentic AI · Gen-Searcher · KnowGen

0 likes · 7 min read

PaperAgent

Mar 28, 2026 · Artificial Intelligence

How ACCORD Breaks Concept Coupling in Custom Text‑to‑Image Generation

The ACCORD framework formalizes the concept‑coupling issue in text‑to‑image diffusion models as a statistical dependency problem and resolves it with two plug‑and‑play regularization losses, dramatically improving fidelity and text control without altering model architecture.

ACCORD · AI research · Diffusion Models

0 likes · 7 min read

Design Hub

Feb 10, 2026 · Artificial Intelligence

AI‑Assisted Design Breakthrough: Qwen‑Image‑2.0 Becomes Your PPT, Poster, and Comic Creator

Qwen‑Image‑2.0, the latest text‑to‑image model from Tongyi Qianwen, delivers pixel‑perfect 2K text rendering, supports 1K‑token prompts, and combines generation and editing in one model, achieving a score of 1029 and third place in the global AI Arena benchmark, positioning it as an AI‑powered designer for PPTs, posters, infographics, and comics.

AI Arena benchmark · AI image generation · Design Automation

0 likes · 10 min read

Woodpecker Software Testing

Jan 21, 2026 · Artificial Intelligence

Build an AI Agent with FastAPI & Alibaba Cloud: Text Q&A, Image Recognition, and Text‑to‑Image

This guide walks through designing and implementing an AI assistant that connects FastAPI to Alibaba Cloud large‑model services, supports streaming text Q&A, image understanding, text‑to‑image generation, web search, and MCP‑based map queries, with full front‑end and back‑end code examples.

AI Chatbot · Alibaba Cloud · FastAPI

0 likes · 38 min read

Design Hub

Jan 17, 2026 · Artificial Intelligence

FLUX.2 Klein Generates Images in Under a Second and Unlocks Midjourney‑Style Prompts

The article reviews Black Forest Labs' FLUX.2 Klein model, highlighting its sub‑second 1024×1024 image generation, low‑VRAM requirements, four‑step inference speedups, and competitive quality versus SD3 and Midjourney V6, while also sharing Midjourney‑style prompt examples for creative design.

AI image generation · Diffusion Models · FLUX.2

0 likes · 8 min read

AI Info Trend

Jan 14, 2026 · Industry Insights

2026 AI Model Leaderboards: Google Dominates, Anthropic Surprises, OpenAI’s New Champion

The 2026 AI model leaderboards across Text, Web Development, Vision, and Text-to-Image arenas reveal Google’s Gemini series leading in text and vision, Anthropic’s Claude Opus unexpectedly topping web‑dev rankings, and OpenAI’s GPT‑Image‑1.5 clinching the top spot in creative image generation, highlighting an increasingly competitive AI landscape.

AI · Anthropic · Google

0 likes · 8 min read

Baobao Algorithm Notes

Jan 14, 2026 · Artificial Intelligence

How GLM-Image Generates High‑Quality Text‑to‑Image on Huawei Ascend Chips

GLM-Image, a Chinese text‑to‑image model trained end‑to‑end on Huawei Ascend 800T A2 NPUs, combines an autoregressive decoder with a diffusion encoder, supports resolutions up to 2048×2048, and offers open‑source code, API access, and detailed prompts that demonstrate its strong layout and typography capabilities.

GLM-Image · Huawei Ascend · diffusion

0 likes · 12 min read

Tencent Cloud Developer

Jan 14, 2026 · Artificial Intelligence

Turn Simple Text into Detailed AI Image Prompts: A Step‑by‑Step Guide

This guide explains how to use advanced AI models such as Gemini, Midjourney, and Stable Diffusion to expand brief, informal user descriptions into comprehensive, high‑quality English prompts that include visual style, subject details, environment, lighting, and camera parameters for image or video generation.

AI prompt engineering · Midjourney · Prompt Design

0 likes · 14 min read

PaperAgent

Dec 21, 2025 · Artificial Intelligence

Can a Text‑to‑Image Model Replace Traditional Vision Tools? Nano Banana Pro Zero‑Shot Test

This article evaluates the Nano Banana Pro text‑to‑image model, built on Gemini 3 Pro, across fourteen low‑level vision tasks and forty datasets using only prompts without fine‑tuning, revealing strong perceptual quality but weak pixel‑level metrics, and highlighting both its generative strengths and failure modes such as hallucinations and color shifts.

AI model analysis · Zero-shot Evaluation · image restoration

0 likes · 7 min read

AI Algorithm Path

Dec 17, 2025 · Artificial Intelligence

Flux.2 Max Unveiled: Black Forest Labs’ Most Powerful Image Generation Model

Black Forest Labs released Flux.2 Max, the top‑performing model in the Flux.2 series featuring real‑time context generation, superior texture handling, and strong instruction following, ranking second on the Artificial Analysis leaderboard, with detailed examples, API usage, and pricing information provided.

AI model · API · Flux.2 Max

0 likes · 11 min read

58UXD

Dec 5, 2025 · Artificial Intelligence

How AI Powered a 40‑Second Brand Video for China’s “Good Employer” Awards

This case study details how the 58.com UX team used AI tools to create a 40‑second teaser video for the China Good Employer awards, outlining the visual framework, prompt engineering, segmented generation workflow, and the balance between automation and designer insight.

AI video generation · Case Study · brand video

0 likes · 7 min read

Data Party THU

Sep 28, 2025 · Artificial Intelligence

How YOLO-Count Enables Precise Object Counting in Text-to-Image Generation

This article reviews the YOLO-Count model, a fully differentiable, open‑vocabulary object counting system that guides text‑to‑image generators to produce the exact number of objects specified in prompts, achieving state‑of‑the‑art results on both generic counting and controlled image synthesis tasks.

Generative AI · Object Counting · YOLO-Count

0 likes · 8 min read

AI Frontier Lectures

Sep 7, 2025 · Artificial Intelligence

How YOLO-Count Enables Precise Object Counting in Text-to-Image Generation

YOLO-Count introduces a fully differentiable, open‑vocabulary object counting model that guides text‑to‑image generators to produce the exact number of objects specified in prompts, achieving state‑of‑the‑art performance on both generic counting and controlled image synthesis tasks.

Generative AI · Object Counting · YOLO-Count

0 likes · 8 min read

AI Algorithm Path

Sep 2, 2025 · Artificial Intelligence

Google Unveils “Nano‑Banana”: A New AI Image Editing Model

Google's Gemini 2.5 Flash Image, nicknamed Nano‑Banana, tops community leaderboards with a 0.855 score, offers high‑fidelity likeness preservation for editing and generation at about $0.04 per 1024×1024 image, and is demonstrated through scene‑swap, virtual‑try‑on, and text‑to‑image examples.

AI Image Editing · Gemini · Google

0 likes · 7 min read

ShiZhen AI

Aug 27, 2025 · Artificial Intelligence

How to Craft Text Prompts for Stunning Images with Google Gemini

This guide explains how to write precise text prompts for Google Gemini’s image‑generation model, covering six essential prompt elements, feature overviews, and concrete examples that demonstrate character consistency, targeted edits, creative composition, style transfer, and logical reasoning, while also noting current limitations.

AI image generation · Google Gemini · Prompt engineering

0 likes · 10 min read

AI Software Product Manager

Jul 23, 2025 · Artificial Intelligence

Create and Run a Text-to-Image Workflow in Coze: Step-by-Step Guide

This tutorial walks you through logging into Coze, building a text‑to‑image workflow with start, image generation, and end nodes, testing and publishing it, then generating a personal access token and invoking the workflow via the Coze API to retrieve the image URL.

AI · API · Coze

0 likes · 4 min read

Kuaishou Tech

Jul 22, 2025 · Artificial Intelligence

How Orthus Achieves Lossless Multimodal Generation with a Unified Autoregressive Transformer

Orthus, a new unified multimodal model presented at ICML 2025, leverages an autoregressive Transformer backbone with separate language and diffusion heads to enable lossless image‑text interleaved generation, outperforming existing models on both understanding and generation benchmarks while remaining computationally efficient.

AI research · Diffusion Models · autoregressive transformer

0 likes · 11 min read

AI Frontier Lectures

May 13, 2025 · Artificial Intelligence

How T2I‑R1 Boosts Text‑to‑Image Generation with Dual‑Level CoT Reasoning

Recent large language models have shown strong reasoning abilities, and this work extends chain‑of‑thought reasoning to autoregressive image generation by introducing T2I‑R1, a dual‑level (Semantic‑CoT and Token‑CoT) framework trained with reinforcement learning that unifies high‑level planning and low‑level token generation, achieving state‑of‑the‑art results.

Generative AI · reinforcement learning · semantic planning

0 likes · 7 min read

Eric Tech Circle

Mar 28, 2025 · Artificial Intelligence

How to Build a High‑Performance Local Text‑to‑Image Service with Flux and Cursor IDE

Learn step‑by‑step how to set up a stable, high‑efficiency local text‑to‑image generation service using the Flux model series on Alibaba Cloud’s Bailian platform, integrate it with Cursor IDE’s MCP tool, configure environments, manage API keys, and run the service with sample code and results.

AI model deployment · Cloud Computing · Cursor IDE

0 likes · 13 min read

AIWalker

Mar 18, 2025 · Artificial Intelligence

How ImageRAG Boosts Text‑to‑Image Generation with Retrieval‑Augmented Generation

ImageRAG introduces a retrieval‑augmented generation framework that dynamically fetches relevant images to guide diffusion models, dramatically improving the synthesis of rare and fine‑grained concepts across multiple text‑to‑image systems, as demonstrated by extensive quantitative and user studies.

AI generation · Diffusion Models · ImageRAG

0 likes · 17 min read

AIWalker

Mar 15, 2025 · Artificial Intelligence

How SANA 1.5 Lets Small Models Reach New Text‑to‑Image SOTA

SANA 1.5 introduces an efficient model‑growth pipeline, depth‑pruning, and inference‑time scaling that reuse a 1.6 B‑parameter foundation to train a 4.8 B model with 8× lower memory, 60 % less training time, and GenEval scores that rival or surpass much larger diffusion models.

Efficient Training · Inference Scaling · Model Scaling

0 likes · 17 min read

Full-Stack Cultivation Path

Mar 13, 2025 · Cloud Native

Build a Free Flux Text-to-Image API on Cloudflare in 5 Minutes

This guide shows how to use Cloudflare Workers AI's free daily quota to quickly create a custom Flux‑1‑Schnell text‑to‑image API, covering project initialization, AI binding configuration, request validation, error handling, authentication, deployment, and testing with curl.

AI model · Cloudflare Workers · Flux Schnell

0 likes · 9 min read

AIWalker

Feb 15, 2025 · Artificial Intelligence

How 1.58‑bit Quantization Cuts FLUX Parameters by 99.5% While Matching Full‑Precision Quality

This article presents 1.58‑bit FLUX, which quantizes 99.5% of the 11.9 B parameters in the FLUX.1‑dev text‑to‑image model to 1.58 bits, introduces a custom low‑bit kernel, and achieves storage, memory, and latency improvements while preserving generation quality on standard benchmarks.

1.58-bit · AI inference · Flux

0 likes · 8 min read

AIWalker

Feb 8, 2025 · Artificial Intelligence

Join the CVPR 2025 NTIRE AI-Generated Image Quality Challenge: Dual Tracks, Big Prizes, and the EvalMuse Dataset

The CVPR 2025 NTIRE workshop launches an AI-generated image quality assessment competition featuring two tracks—fine‑grained text‑image matching and structural issue detection—supported by the large‑scale EvalMuse dataset, detailed evaluation metrics, baseline code, and a prize pool of up to $10,000.

AI competition · CVPR · EvalMuse

0 likes · 9 min read

AIWalker

Jan 18, 2025 · Artificial Intelligence

SnapGen Generates 1024px Images in 1.4 s with Lightweight On‑Device Architecture

SnapGen is a 379 M‑parameter text‑to‑image diffusion model that produces 1024 px images on mobile devices in about 1.4 seconds, using a compact U‑Net design, multi‑stage knowledge distillation, step distillation, and optimized training tricks to outperform much larger models on standard benchmarks.

Diffusion Models · SnapGen · knowledge distillation

0 likes · 22 min read

AIWalker

Jan 13, 2025 · Artificial Intelligence

ArtCrafter: A Controllable, Diverse Style Transfer Framework from Tsinghua

ArtCrafter introduces a novel text‑image style transfer framework that leverages attention‑based style extraction, text‑image alignment enhancement, and explicit modulation to achieve controllable, diverse, and high‑fidelity visual results, outperforming existing methods in both qualitative and quantitative evaluations.

Attention Mechanism · Diffusion Models · Style Transfer

0 likes · 10 min read

AIWalker

Jan 12, 2025 · Artificial Intelligence

SnapGen Generates 1024px Images in 1.4 s with Lightweight On‑Device Architecture

SnapGen introduces a compact 379M‑parameter diffusion model that produces 1024‑pixel text‑to‑image results in about 1.4 seconds on a mobile device, achieving competitive FID scores and outperforming much larger models through a series of architecture refinements, advanced training tricks, and multi‑level knowledge distillation.

Diffusion Models · SnapGen · knowledge distillation

0 likes · 23 min read

Tencent Cloud Developer

Oct 30, 2024 · Artificial Intelligence

Comprehensive Survey of AIGC Research: Papers, Resources, and Technical Overview

This survey acts as a comprehensive portal that organizes AIGC research across seven domains—text, image, and audio generation, cross‑modal association, text‑guided image and audio synthesis, and supporting resources—detailing seminal models such as GPT, Diffusion, CLIP, DALL·E, Stable Diffusion, MusicLM, and key papers that shaped each field.

AIGC · CLIP · Diffusion Models

0 likes · 19 min read

58UXD

Oct 25, 2024 · Artificial Intelligence

How Ideogram AI Generates Ready‑to‑Use Posters and Fonts in Seconds

This article introduces Ideogram, a free AI image tool that can instantly create high‑quality graphics with integrated text, walks through its simple two‑step workflow, showcases font and poster design examples, compares results with Midjourney, and discusses current limitations and pricing.

AI image generation · Ideogram · font design

0 likes · 7 min read

Alibaba Cloud Big Data AI Platform

Aug 11, 2024 · Artificial Intelligence

Alibaba Cloud PAI’s Breakthroughs in Chinese Diffusion, Prompting, and LLM Knowledge Editing

Recent ACL 2024 papers from Alibaba Cloud’s PAI platform showcase open‑source Chinese diffusion models, an interactive multi‑turn prompt generator, a long‑tail knowledge‑aware retrieval‑augmented LLM approach, and a dynamic fusion network for sequential model editing, all integrated into cloud services.

AI research · Diffusion Models · Retrieval-Augmented Generation

0 likes · 11 min read

Kuaishou Tech

Jul 31, 2024 · Artificial Intelligence

Kuaishou’s Kolors Text‑to‑Image Model: Architecture, Evaluation, and Real‑World Applications

The article presents a comprehensive overview of Kuaishou’s Kolors (formerly 可图) multimodal generative model, detailing its data collection strategy, diffusion‑based architecture, evaluation metrics, derived capabilities such as prompt refinement and interactive generation, and a range of practical applications from AI‑powered live‑stream gifts to virtual try‑on, while also offering strategic advice for the domestic visual‑generation community.

AI Applications · Diffusion Models · Kolors

0 likes · 27 min read

Kuaishou Tech

Jul 18, 2024 · Artificial Intelligence

Multidimensional Preference Model (MPS) for Text-to-Image Generation: Dataset, Architecture, and Experimental Analysis

This article introduces the Multidimensional Preference Model (MPS), the first multi‑dimensional scoring system for evaluating text‑to‑image generation, built on the newly released MHP dataset with extensive human annotations across aesthetic, semantic alignment, detail quality, and overall preference dimensions, and demonstrates its superior performance through comprehensive experiments and RLHF integration.

MHP dataset · MPS · RLHF

0 likes · 10 min read

Kuaishou Large Model

Jun 20, 2024 · Artificial Intelligence

Eight Kwai Papers Accepted at CVPR 2024 – Text-to-Image, Video Quality & 3D Generation

Kwai (Kuaishou) has eight papers accepted at CVPR 2024 covering multi‑dimensional human preference for text‑to‑image generation, short‑video quality assessment, efficient video quality assessment, compressed video enhancement, conditional unsigned distance fields, universal cross‑domain retrieval, perception‑oriented frame interpolation, and test‑time energy adaptation.

3D generation · CVPR 2024 · text-to-image

0 likes · 16 min read

DaTaobao Tech

Jun 3, 2024 · Artificial Intelligence

Transforming Interior Design: AIGC’s Text‑to‑Image, LoRA, and IP‑Adapter Techniques

This article explains how AI‑generated content (AIGC) technologies such as text‑to‑image diffusion models, LoRA fine‑tuning, and IP‑Adapter style transfer are applied to interior design, dramatically reducing design time, cutting costs, and enabling personalized, high‑quality visualizations for both consumers and furniture merchants.

AIGC · Generative AI · IP-Adapter

0 likes · 9 min read

58UXD

May 23, 2024 · Artificial Intelligence

How to Create Stunning Clay‑Style Images with Stable Diffusion

This guide walks you through generating eye‑catching clay‑style artwork using Stable Diffusion, covering model selection, prompt engineering, sampling settings, image‑to‑image techniques, and iterative refinements to achieve high‑quality, realistic results.

AI art · Stable Diffusion · clay style

0 likes · 6 min read

Sohu Tech Products

May 21, 2024 · Artificial Intelligence

OPPO Multimodal Pretrained Model Deployment in Cloud-Edge Scenarios: Practices and Optimizations

OPPO details how it deploys multimodal pretrained models on resource‑constrained edge devices by compressing CLIP‑based image‑text retrieval, adapting Chinese text‑to‑image generation with LoRA and adapters, and lightweighting diffusion models through layer pruning and progressive distillation, achieving sub‑3‑second generation while preserving cloud‑level quality.

CLIP · Distillation · Edge deployment

0 likes · 18 min read

Volcano Engine Developer Services

Mar 7, 2024 · Artificial Intelligence

How SDXL‑Lightning Generates High‑Quality Images in Just 2 Steps

SDXL‑Lightning, a new diffusion‑based text‑to‑image model from ByteDance, uses Progressive Adversarial Distillation to cut inference steps to as few as 2 while maintaining high resolution and fidelity, offering ten‑fold speed gains, open‑source access, and compatibility with SDXL, ControlNet, and ComfyUI.

AI acceleration · diffusion · model distillation

0 likes · 8 min read

DataFunTalk

Nov 15, 2023 · Artificial Intelligence

Contextual Learning for Personalized Text‑to‑Image Generation

This article explains how contextual learning can enhance text‑to‑image models by incorporating example image‑text pairs, redesigning the UNet architecture, building large in‑context training datasets, and training the SuTI model to achieve fast, controllable, and high‑quality personalized image generation.

AI · Diffusion Models · contextual learning

0 likes · 24 min read

Baidu Tech Salon

Nov 7, 2023 · Artificial Intelligence

How Baidu Is Shaping Text‑to‑Image AI: Trends, Challenges, and Future Outlook

In this interview, Baidu's search architect Tianbao explains the evolution of text‑to‑image generation since 2022, discusses data preparation, model quality, prompt engineering, multi‑style support, evaluation methods, and predicts when fully AI‑generated video and movies might become mainstream.

AI evaluation · AIGC · Baidu

0 likes · 24 min read

Baidu Geek Talk

Nov 7, 2023 · Artificial Intelligence

Interview on AI Image Generation (Text-to-Image) Technology and Baidu Search Applications

In a recent InfoQ Geek Talk, Baidu Search chief architect Tianbao discussed the rapid evolution of AI text‑to‑image technology—highlighting Chinese‑language data preparation, prompt‑engineering challenges, evaluation methods combining human feedback and metrics, and future video‑generation prospects—while announcing openings for visual algorithm engineers.

AI image generation · AIGC · Baidu

0 likes · 24 min read

Tencent Tech

Oct 26, 2023 · Artificial Intelligence

Unlocking Tencent Hunyuan Text‑to‑Image: A Complete Guide and Prompt Tips

This guide introduces Tencent Hunyuan's upgraded text‑to‑image model, explains its technical innovations, provides detailed prompt engineering advice, showcases example prompts and generated images across various styles, and highlights real‑world applications and performance metrics for developers and creators.

AI generation · Prompt engineering · Tencent Hunyuan

0 likes · 12 min read

DaTaobao Tech

Oct 13, 2023 · Artificial Intelligence

Understanding Stable Diffusion: Core Principles and Technical Architecture

The article demystifies Stable Diffusion by explaining its low‑cost latent‑space design and conditioning mechanisms, comparing it to autoregressive, VAE, flow‑based and GAN models, detailing the iterative noise‑to‑image process, token‑based text‑to‑image control, version differences, common generation issues, and providing implementation code examples.

AI image generation · Cross-Attention · Stable Diffusion

0 likes · 15 min read

Alibaba Cloud Big Data AI Platform

Jul 13, 2023 · Artificial Intelligence

Rapid Diffusion: Fast, Domain‑Specific Text‑to‑Image Generation for Chinese

Rapid Diffusion introduces a knowledge‑enhanced, high‑speed Chinese text‑to‑image diffusion model with one‑click deployment, achieving superior image quality and up to 1.73× faster inference through FlashAttention and BladeDISC optimizations, and demonstrates strong performance across e‑commerce, traditional painting, and food datasets.

Chinese NLP · diffusion model · fast inference

0 likes · 12 min read

Tencent Cloud Developer

May 25, 2023 · Artificial Intelligence

QQGC: A Two-Stage Text-to-Image Model with Prior and Decoder Architectures for Efficient AI Painting

QQGC is Tencent’s two‑stage text‑to‑image model, which separates CLIP‑based Prior mapping from a Stable Diffusion Decoder. It uses T5‑enhanced text embeddings and a suite of efficiency techniques (FP16, flash attention, ZeRO, and GPU‑RDMA) to train models of more than 2 B parameters on 64 GPUs, achieving state‑of‑the‑art FID and CLIP scores. It supports image variation, semantic img2img, precise CLIP‑vector edits, and unsafe‑content filtering, and now powers the company’s Magic Painting Room.

AI painting · CLIP embedding · Training Acceleration

0 likes · 12 min read

Top Architect

May 8, 2023 · Artificial Intelligence

Understanding Stable Diffusion: Architecture, Training, and Practical Applications

This article provides a comprehensive overview of Stable Diffusion, covering its latent diffusion architecture, training data and procedures, model components such as autoencoder, CLIP text encoder and UNet, as well as practical usage examples including text‑to‑image generation, image‑to‑image, inpainting, and advanced extensions like ControlNet and SD‑2.x.

AI image generation · Diffusion Models · Stable Diffusion

0 likes · 52 min read

DataFunTalk

Mar 7, 2023 · Artificial Intelligence

Overview of Text‑Controlled Image Generation Models: DALL‑E‑2, Imagen, Latent Stable Diffusion, and ControlNet

This article surveys the key challenges of controllable text‑to‑image generation and explains the architectures, components, and training details of major diffusion‑based models such as DALL‑E‑2, Google Imagen, Stability AI's Latent Stable Diffusion, and the ControlNet extension.

AI · ControlNet · DALL-E-2

0 likes · 16 min read

Laiye Technology Team

Mar 3, 2023 · Artificial Intelligence

Survey of Text‑Controlled Image Generation Models: DALL·E‑2, Imagen, Stable Diffusion, and ControlNet

This article reviews the key components and design choices of recent text‑controlled image generation systems—including DALL·E‑2, Google Imagen, Stability AI's Latent Stable Diffusion, and the ControlNet extension—highlighting how diffusion models, text encoders, prior modules, super‑resolution, and conditioning mechanisms enable high‑quality, controllable visual synthesis.

AI · ControlNet · DALL-E-2

0 likes · 16 min read

phodal

Feb 20, 2023 · Artificial Intelligence

Prompt Engineering Secrets: Text‑to‑Image, Article & Code Generation with AI

This guide explores how to craft effective prompts for Stable Diffusion image creation, ChatGPT article writing, and GitHub Copilot code generation, covering prompt evolution, negative prompts, ControlNet enhancements, model selection, and practical tips for iterative refinement and context building.

AI generation · ChatGPT · ControlNet

0 likes · 15 min read

DeWu Technology

Feb 13, 2023 · Artificial Intelligence

Overview of AI-Generated Art: GAN, Diffusion Models, and Stable Diffusion Applications

The article surveys AI‑generated art, explaining how the limitations of GANs led to the rise of diffusion models and the open‑source Stable Diffusion platform, which offers text‑to‑image, img2img, inpainting, and DreamBooth fine‑tuning and is widely deployed commercially and by hobbyists via cloud or local WebUI setups.

AI art · GAN · Stable Diffusion

0 likes · 13 min read

21CTO

Jan 13, 2023 · Artificial Intelligence

How Google’s Muse Is Redefining Text‑to‑Image Generation with Parallel Decoding

Google’s new Muse model, a Transformer‑based text‑to‑image system running on TPUv4, reportedly generates 256×256 images in 0.5 seconds, far faster than Imagen, while delivering unprecedented photorealism and deep language understanding through parallel decoding and large‑scale LLM‑conditioned training.

AI research · Google Muse · LLM conditioning

0 likes · 4 min read

Alibaba Cloud Big Data AI Platform

Dec 12, 2022 · Artificial Intelligence

Unlocking Chinese Text-to-Image Generation with Alibaba’s PAI‑Diffusion Models

This article introduces Alibaba Cloud’s open‑source PAI‑Diffusion series, details its Latent Diffusion Model foundation, Chinese CLIP alignment, and super‑resolution components, showcases diverse artistic and real‑world text‑to‑image generation scenarios, and explains how to access the models via Alibaba Cloud AI Center, PAI‑DSW, and HuggingFace Space.

Alibaba Cloud · Chinese AI · Diffusion Models

0 likes · 11 min read

Tencent Cloud Developer

Nov 1, 2022 · Artificial Intelligence

The Rise of AI-Generated Content: Technologies, Applications, and Risks

The article surveys the evolution of AI‑generated content from early art programs to modern diffusion‑based text‑to‑image and text‑to‑video models, outlines key milestones such as Stable Diffusion and DALL‑E 2, explores gaming applications, and highlights limitations, ethical concerns, and copyright risks of open‑source generative AI.

AI generation · creative AI · text-to-image

0 likes · 22 min read

Alibaba Cloud Big Data AI Platform

Jul 29, 2022 · Artificial Intelligence

Unlock Chinese Text-to-Image Generation with EasyNLP’s Open‑Source Models

This article introduces EasyNLP’s newly integrated Chinese text‑to‑image generation framework, explains the underlying Transformer‑VQGAN architecture, provides model specifications, code snippets, performance benchmarks on multiple datasets, and step‑by‑step tutorials for fine‑tuning and inference using open‑source checkpoints.

AI generation · Chinese NLP · EasyNLP

0 likes · 20 min read

Alibaba Cloud Developer

Jul 28, 2022 · Artificial Intelligence

Unlock Chinese Text‑to‑Image Generation with EasyNLP: Models, Code & Tutorials

This article introduces EasyNLP's Chinese text‑to‑image generation framework, explains the underlying Transformer‑VQGAN architecture, provides model specifications, showcases sample outputs, and offers step‑by‑step code and command‑line instructions for fine‑tuning and inference.

Chinese AI · EasyNLP · Multimodal

0 likes · 20 min read

Liangxu Linux

May 30, 2022 · Frontend Development

Discover Free Online Tools for Hex Editing, Text‑to‑Image, Code Snippets, and Diagramming

This guide introduces four free web‑based utilities—hexed.it for quick hex editing, text2image for converting text into images, Carbon for turning code snippets into stylish pictures, and draw.io for creating professional diagrams—highlighting their key features, interfaces, and practical use cases for developers.

code snippet image · diagramming · hex editor

0 likes · 7 min read

IT Services Circle

Apr 13, 2022 · Artificial Intelligence

Introducing DualStyleGAN, RQ‑VAE Transformer, and VFD: Recent CVPR 2022 Open‑Source Algorithms

Jack Cui presents three recently open‑sourced CVPR 2022 algorithms: DualStyleGAN for high‑resolution portrait style transfer, RQ‑VAE Transformer for improved text‑to‑image generation, and VFD for deep‑fake detection. The article details their functionality and usage options and provides links to code repositories and demo platforms.

AI · Style Transfer · deepfake detection

0 likes · 5 min read

Alibaba Cloud Developer

Nov 19, 2019 · Artificial Intelligence

Can AI Imagine Visually? Seq‑SG2SL for Scene‑to‑Semantic Layout

This article introduces the Seq‑SG2SL framework, which tackles the challenge of granting AI visual imagination by converting scene graphs into semantic layouts, discusses the limitations of existing text‑to‑image methods, proposes the SLEU metric for automatic evaluation, and presents experimental results demonstrating its effectiveness.

AI · SLEU · Scene Graph

0 likes · 16 min read
