Tagged articles
63 articles
SuanNi
May 7, 2026 · Artificial Intelligence

DreamLite: A 0.39B Mobile Model Matching Z‑Image for Real‑Time Text‑to‑Image Generation and Editing

DreamLite is a compact 0.39B‑parameter unified diffusion model open‑sourced by ByteDance. It runs on smartphones and performs both text‑to‑image generation and text‑guided editing of 1024×1024 images in about three seconds, with performance comparable to Flux, Z‑Image and LongCat‑Image. Two variants are offered to balance fidelity and latency.

AI model · ByteDance · DreamLite
4 min read
Machine Heart
May 6, 2026 · Artificial Intelligence

PromptEcho: Leveraging Frozen Multimodal Models for High‑Quality Text‑to‑Image Rewards Without Labels

PromptEcho computes a continuous reward for text‑to‑image generation by measuring how well a frozen vision‑language model can reconstruct the original prompt from the generated image, eliminating the need for annotated data or a trained reward model and outperforming prior methods across multiple benchmarks.

Benchmark · PromptEcho · Reward Modeling
10 min read
Machine Learning Algorithms & Natural Language Processing
Apr 27, 2026 · Artificial Intelligence

From Parameter Tuning to Control: CFG‑Ctrl Boosts Stability and Precision in Text‑to‑Image Generation

The paper introduces CFG‑Ctrl, a control‑theoretic redesign of classifier‑free guidance (CFG) for diffusion models that treats the generation process as a dynamical system, achieving more stable and accurate text‑to‑image results across multiple model scales and evaluation metrics.

CFG-Ctrl · Classifier-Free Guidance · control theory
15 min read
Old Meng AI Explorer
Apr 25, 2026 · Artificial Intelligence

Stop Using Vague Prompts – Master GPT Image 2 with Top‑Tier Prompt Templates to End ‘Waste’ Images

The guide explains why GPT Image 2 dramatically reduces low‑quality outputs, outlines five essential prompt elements, provides eight ready‑to‑use scene templates, shares advanced tricks, common pitfalls, and concrete examples to help users generate professional AI images reliably.

AI image generation · CJK rendering · GPT Image 2
16 min read
Geek Labs
Apr 24, 2026 · Artificial Intelligence

One-Click Ad Video from Assets + Brief, plus Baidu’s 8B Text-to-Image – An AI Toolbox

The article introduces three open‑source AI tools—a video editor that turns raw footage and a brief into a finished ad, Baidu's 8‑billion‑parameter text‑to‑image model that runs on 24 GB GPUs, and a weekly AI‑developer digest that auto‑generates Chinese reports—detailing their workflows, benchmarks, usage commands, and target users.

AI content creation · AI video editing · agentic workflow
9 min read
PaperAgent
Mar 28, 2026 · Artificial Intelligence

How ACCORD Breaks Concept Coupling in Custom Text‑to‑Image Generation

The ACCORD framework formalizes the concept‑coupling issue in text‑to‑image diffusion models as a statistical dependency problem and resolves it with two plug‑and‑play regularization losses, dramatically improving fidelity and text control without altering model architecture.

ACCORD · AI research · concept coupling
7 min read
Design Hub
Feb 10, 2026 · Artificial Intelligence

AI‑Assisted Design Breakthrough: Qwen‑Image‑2.0 Becomes Your PPT, Poster, and Comic Creator

Qwen‑Image‑2.0, the latest text‑to‑image model from Tongyi Qianwen, delivers pixel‑perfect 2K text rendering, supports 1K‑token prompts, and combines generation and editing in one model, achieving a score of 1029 and third place in the global AI Arena benchmark, positioning it as an AI‑powered designer for PPTs, posters, infographics, and comics.

AI Arena benchmark · AI image generation · Design Automation
10 min read
Woodpecker Software Testing
Jan 21, 2026 · Artificial Intelligence

Build an AI Agent with FastAPI & Alibaba Cloud: Text Q&A, Image Recognition, and Text‑to‑Image

This guide walks through designing and implementing an AI assistant that connects FastAPI to Alibaba Cloud large‑model services, supports streaming text Q&A, image understanding, text‑to‑image generation, web search, and MCP‑based map queries, with full front‑end and back‑end code examples.

AI chatbot · Alibaba Cloud · FastAPI
38 min read
Design Hub
Jan 17, 2026 · Artificial Intelligence

FLUX.2 Klein Generates Images in Under a Second and Unlocks Midjourney‑Style Prompts

The article reviews Black Forest Labs' FLUX.2 Klein model, highlighting its sub‑second 1024×1024 image generation, low‑VRAM requirements, four‑step inference speedups, and competitive quality versus SD3 and Midjourney V6, while also sharing Midjourney‑style prompt examples for creative design.

AI image generation · FLUX.2 · GPU Acceleration
8 min read
AI Info Trend
Jan 14, 2026 · Industry Insights

2026 AI Model Leaderboards: Google Dominates, Anthropic Surprises, OpenAI’s New Champion

The 2026 AI model leaderboards across Text, Web Development, Vision, and Text-to-Image arenas reveal Google’s Gemini series leading in text and vision, Anthropic’s Claude Opus unexpectedly topping web‑dev rankings, and OpenAI’s GPT‑Image‑1.5 clinching the top spot in creative image generation, highlighting an increasingly competitive AI landscape.

AI · Anthropic · Google
8 min read
Baobao Algorithm Notes
Jan 14, 2026 · Artificial Intelligence

How GLM-Image Generates High‑Quality Text‑to‑Image on Huawei Ascend Chips

GLM-Image, a Chinese text‑to‑image model trained end‑to‑end on Huawei Ascend 800T A2 NPUs, combines an autoregressive module with a diffusion decoder, supports resolutions up to 2048×2048, and offers open‑source code, API access, and detailed prompts that demonstrate its strong layout and typography capabilities.

GLM-Image · Huawei Ascend · diffusion
12 min read
Tencent Cloud Developer
Jan 14, 2026 · Artificial Intelligence

Turn Simple Text into Detailed AI Image Prompts: A Step‑by‑Step Guide

This guide explains how to use advanced AI models such as Gemini, Midjourney, and Stable Diffusion to expand brief, informal user descriptions into comprehensive, high‑quality English prompts that include visual style, subject details, environment, lighting, and camera parameters for image or video generation.

AI prompt engineering · Midjourney · Prompt Design
14 min read
PaperAgent
Dec 21, 2025 · Artificial Intelligence

Can a Text‑to‑Image Model Replace Traditional Vision Tools? Nano Banana Pro Zero‑Shot Test

This article evaluates the Nano Banana Pro text‑to‑image model, built on Gemini 3 Pro, across fourteen low‑level vision tasks and forty datasets using only prompts without fine‑tuning, revealing strong perceptual quality but weak pixel‑level metrics, and highlighting both its generative strengths and failure modes such as hallucinations and color shifts.

AI model analysis · Image Restoration · low-level vision
7 min read
AI Algorithm Path
Dec 17, 2025 · Artificial Intelligence

Flux.2 Max Unveiled: Black Forest Labs’ Most Powerful Image Generation Model

Black Forest Labs released Flux.2 Max, the top‑performing model in the Flux.2 series featuring real‑time context generation, superior texture handling, and strong instruction following, ranking second on the Artificial Analysis leaderboard, with detailed examples, API usage, and pricing information provided.

AI model · API · Benchmark
11 min read
58UXD
Dec 5, 2025 · Artificial Intelligence

How AI Powered a 40‑Second Brand Video for China’s “Good Employer” Awards

This case study details how the 58.com UX team used AI tools to create a 40‑second pre‑video for the China Good Employer awards, outlining the visual framework, prompt engineering, segmented generation workflow, and the balance between automation and designer insight.

AI video generation · Case Study · brand video
7 min read
Data Party THU
Sep 28, 2025 · Artificial Intelligence

How YOLO-Count Enables Precise Object Counting in Text-to-Image Generation

This article reviews the YOLO-Count model, a fully differentiable, open‑vocabulary object counting system that guides text‑to‑image generators to produce the exact number of objects specified in prompts, achieving state‑of‑the‑art results on both generic counting and controlled image synthesis tasks.

Object Counting · YOLO-Count · differentiable model
8 min read
AI Frontier Lectures
Sep 7, 2025 · Artificial Intelligence

How YOLO-Count Enables Precise Object Counting in Text-to-Image Generation

YOLO-Count introduces a fully differentiable, open‑vocabulary object counting model that guides text‑to‑image generators to produce the exact number of objects specified in prompts, achieving state‑of‑the‑art performance on both generic counting and controlled image synthesis tasks.

Object Counting · YOLO-Count · differentiable models
8 min read
AI Algorithm Path
Sep 2, 2025 · Artificial Intelligence

Google Unveils “Nano‑Banana”: A New AI Image Editing Model

Google's Gemini 2.5 Flash Image, nicknamed Nano‑Banana, tops community leaderboards with a 0.855 score, offers high‑fidelity likeness preservation for editing and generation at about $0.04 per 1024×1024 image, and is demonstrated through scene‑swap, virtual‑try‑on, and text‑to‑image examples.

AI Image Editing · Gemini · Google
7 min read
ShiZhen AI
Aug 27, 2025 · Artificial Intelligence

How to Craft Text Prompts for Stunning Images with Google Gemini

This guide explains how to write precise text prompts for Google Gemini’s image‑generation model, covering six essential prompt elements, feature overviews, and concrete examples that demonstrate character consistency, targeted edits, creative composition, style transfer, and logical reasoning, while also noting current limitations.

AI image generation · Google Gemini · Prompt engineering
10 min read
Kuaishou Tech
Jul 22, 2025 · Artificial Intelligence

How Orthus Achieves Lossless Multimodal Generation with a Unified Autoregressive Transformer

Orthus, a new unified multimodal model presented at ICML 2025, leverages an autoregressive Transformer backbone with separate language and diffusion heads to enable lossless image‑text interleaved generation, outperforming existing models on both understanding and generation benchmarks while remaining computationally efficient.

AI research · autoregressive transformer · diffusion models
11 min read
AI Frontier Lectures
May 13, 2025 · Artificial Intelligence

How T2I‑R1 Boosts Text‑to‑Image Generation with Dual‑Level CoT Reasoning

Recent large language models have shown strong reasoning abilities, and this work extends chain‑of‑thought reasoning to autoregressive image generation by introducing T2I‑R1, a dual‑level (Semantic‑CoT and Token‑CoT) framework trained with reinforcement learning that unifies high‑level planning and low‑level token generation, achieving state‑of‑the‑art results.

generative AI · reinforcement learning · semantic planning
7 min read
Eric Tech Circle
Mar 28, 2025 · Artificial Intelligence

How to Build a High‑Performance Local Text‑to‑Image Service with Flux and Cursor IDE

Learn step‑by‑step how to set up a stable, high‑efficiency local text‑to‑image generation service using the Flux model series on Alibaba Cloud’s Bailian platform, integrate it with Cursor IDE’s MCP tool, configure the environment, manage API keys, and run the service with sample code and results.

AI Model Deployment · Cursor IDE · Flux
13 min read
AIWalker
Mar 18, 2025 · Artificial Intelligence

How ImageRAG Boosts Text‑to‑Image Generation with Retrieval‑Augmented Generation

ImageRAG introduces a retrieval‑augmented generation framework that dynamically fetches relevant images to guide diffusion models, dramatically improving the synthesis of rare and fine‑grained concepts across multiple text‑to‑image systems, as demonstrated by extensive quantitative and user studies.

AI Generation · Benchmark · ImageRAG
17 min read
AIWalker
Mar 15, 2025 · Artificial Intelligence

How SANA 1.5 Lets Small Models Reach New Text‑to‑Image SOTA

SANA 1.5 introduces an efficient model‑growth pipeline, depth pruning, and inference‑time scaling. It reuses a 1.6B‑parameter foundation model to train a 4.8B model with 8× lower memory and 60% less training time, reaching GenEval scores that rival or surpass much larger diffusion models.

Inference Scaling · Model Scaling · diffusion
17 min read
Full-Stack Cultivation Path
Mar 13, 2025 · Cloud Native

Build a Free Flux Text-to-Image API on Cloudflare in 5 Minutes

This guide shows how to use Cloudflare Workers AI's free daily quota to quickly create a custom Flux‑1‑Schnell text‑to‑image API, covering project initialization, AI binding configuration, request validation, error handling, authentication, deployment, and testing with curl.

AI model · Cloudflare Workers · Flux Schnell
9 min read
AIWalker
Feb 8, 2025 · Artificial Intelligence

Join the CVPR 2025 NTIRE AI-Generated Image Quality Challenge: Dual Tracks, Big Prizes, and the EvalMuse Dataset

The CVPR 2025 NTIRE workshop launches an AI-generated image quality assessment competition featuring two tracks—fine‑grained text‑image matching and structural issue detection—supported by the large‑scale EvalMuse dataset, detailed evaluation metrics, baseline code, and a prize pool of up to $10,000.

AI competition · Benchmark · CVPR
9 min read
AIWalker
Jan 18, 2025 · Artificial Intelligence

SnapGen Generates 1024px Images in 1.4 s with Lightweight On‑Device Architecture

SnapGen is a 379M‑parameter text‑to‑image diffusion model that produces 1024 px images on mobile devices in about 1.4 seconds, using a compact U‑Net design, multi‑stage knowledge distillation, step distillation, and optimized training tricks to outperform much larger models on standard benchmarks.

Mobile AI · SnapGen · diffusion models
22 min read
AIWalker
Jan 13, 2025 · Artificial Intelligence

ArtCrafter: A Controllable, Diverse Style Transfer Framework from Tsinghua

ArtCrafter introduces a novel text‑image style transfer framework that leverages attention‑based style extraction, text‑image alignment enhancement, and explicit modulation to achieve controllable, diverse, and high‑fidelity visual results, outperforming existing methods in both qualitative and quantitative evaluations.

Attention Mechanism · Style Transfer · diffusion models
10 min read
AIWalker
Jan 12, 2025 · Artificial Intelligence

SnapGen Generates 1024px Images in 1.4 s with Lightweight On‑Device Architecture

SnapGen introduces a compact 379M‑parameter diffusion model that produces 1024‑pixel text‑to‑image results in about 1.4 seconds on a mobile device, achieving competitive FID scores and outperforming much larger models through a series of architecture refinements, advanced training tricks, and multi‑level knowledge distillation.

Mobile AI · SnapGen · diffusion models
23 min read
Tencent Cloud Developer
Oct 30, 2024 · Artificial Intelligence

Comprehensive Survey of AIGC Research: Papers, Resources, and Technical Overview

This survey acts as a comprehensive portal that organizes AIGC research across seven domains—text, image, and audio generation, cross‑modal association, text‑guided image and audio synthesis, and supporting resources—detailing seminal models such as GPT, Diffusion, CLIP, DALL·E, Stable Diffusion, MusicLM, and key papers that shaped each field.

AIGC · CLIP · Computer Vision
19 min read
58UXD
Oct 25, 2024 · Artificial Intelligence

How Ideogram AI Generates Ready‑to‑Use Posters and Fonts in Seconds

This article introduces Ideogram, a free AI image tool that can instantly create high‑quality graphics with integrated text, walks through its simple two‑step workflow, showcases font and poster design examples, compares results with Midjourney, and discusses current limitations and pricing.

AI image generation · Ideogram · font design
7 min read
Alibaba Cloud Big Data AI Platform
Aug 11, 2024 · Artificial Intelligence

Alibaba Cloud PAI’s Breakthroughs in Chinese Diffusion, Prompting, and LLM Knowledge Editing

Recent ACL 2024 papers from Alibaba Cloud’s PAI platform showcase open‑source Chinese diffusion models, an interactive multi‑turn prompt generator, a long‑tail knowledge‑aware retrieval‑augmented LLM approach, and a dynamic fusion network for sequential model editing, all integrated into cloud services.

AI research · Retrieval Augmented Generation · diffusion models
11 min read
Kuaishou Tech
Jul 31, 2024 · Artificial Intelligence

Kuaishou’s Kolors Text‑to‑Image Model: Architecture, Evaluation, and Real‑World Applications

The article presents a comprehensive overview of Kuaishou’s Kolors (Chinese name 可图) multimodal generative model, detailing its data collection strategy, diffusion‑based architecture, evaluation metrics, derived capabilities such as prompt refinement and interactive generation, and a range of practical applications from AI‑powered live‑stream gifts to virtual try‑on. It also offers strategic advice for China’s visual‑generation community.

AI applications · Kolors · Model Evaluation
27 min read
Kuaishou Tech
Jul 18, 2024 · Artificial Intelligence

Multidimensional Preference Model (MPS) for Text-to-Image Generation: Dataset, Architecture, and Experimental Analysis

This article introduces the Multidimensional Preference Model (MPS), the first multi‑dimensional scoring system for evaluating text‑to‑image generation, built on the newly released MHP dataset with extensive human annotations across aesthetic, semantic alignment, detail quality, and overall preference dimensions, and demonstrates its superior performance through comprehensive experiments and RLHF integration.

MHP dataset · MPS · RLHF
10 min read
Kuaishou Large Model
Jun 20, 2024 · Artificial Intelligence

Eight Kwai Papers Accepted at CVPR 2024 – Text-to-Image, Video Quality & 3D Generation

Kwai (Kuaishou) has eight papers accepted at CVPR 2024 covering multi‑dimensional human preference for text‑to‑image generation, short‑video quality assessment, efficient video quality assessment, compressed video enhancement, conditional unsigned distance fields, universal cross‑domain retrieval, perception‑oriented frame interpolation, and test‑time energy adaptation.

3D generation · CVPR 2024 · text-to-image
16 min read
DaTaobao Tech
Jun 3, 2024 · Artificial Intelligence

Transforming Interior Design: AIGC’s Text‑to‑Image, LoRA, and IP‑Adapter Techniques

This article explains how AI‑generated content (AIGC) technologies such as text‑to‑image diffusion models, LoRA fine‑tuning, and IP‑Adapter style transfer are applied to interior design, dramatically reducing design time, cutting costs, and enabling personalized, high‑quality visualizations for both consumers and furniture merchants.

AIGC · IP-Adapter · LoRA
9 min read
58UXD
May 23, 2024 · Artificial Intelligence

How to Create Stunning Clay‑Style Images with Stable Diffusion

This guide walks you through generating eye‑catching clay‑style artwork using Stable Diffusion, covering model selection, prompt engineering, sampling settings, image‑to‑image techniques, and iterative refinements to achieve high‑quality, realistic results.

AI art · Stable Diffusion · clay style
6 min read
Sohu Tech Products
May 21, 2024 · Artificial Intelligence

OPPO Multimodal Pretrained Model Deployment in Cloud-Edge Scenarios: Practices and Optimizations

OPPO details how it deploys multimodal pretrained models on resource‑constrained edge devices by compressing CLIP‑based image‑text retrieval, adapting Chinese text‑to‑image generation with LoRA and adapters, and lightweighting diffusion models through layer pruning and progressive distillation, achieving sub‑3‑second generation while preserving cloud‑level quality.

CLIP · Distillation · LoRA
18 min read
Volcano Engine Developer Services
Mar 7, 2024 · Artificial Intelligence

How SDXL‑Lightning Generates High‑Quality Images in Just 2 Steps

SDXL‑Lightning, a new diffusion‑based text‑to‑image model from ByteDance, uses Progressive Adversarial Distillation to cut inference steps to as few as 2 while maintaining high resolution and fidelity, offering ten‑fold speed gains, open‑source access, and compatibility with SDXL, ControlNet, and ComfyUI.

AI acceleration · diffusion · model distillation
8 min read
DataFunTalk
Nov 15, 2023 · Artificial Intelligence

Contextual Learning for Personalized Text‑to‑Image Generation

This article explains how contextual learning can enhance text‑to‑image models by incorporating example image‑text pairs, redesigning the UNet architecture, building large in‑context training datasets, and training the SuTI model to achieve fast, controllable, and high‑quality personalized image generation.

AI · contextual learning · diffusion models
24 min read
Baidu Geek Talk
Nov 7, 2023 · Artificial Intelligence

Interview on AI Image Generation (Text-to-Image) Technology and Baidu Search Applications

In a recent InfoQ Geek Talk, Baidu Search chief architect Tianbao discussed the rapid evolution of AI text‑to‑image technology—highlighting Chinese‑language data preparation, prompt‑engineering challenges, evaluation methods combining human feedback and metrics, and future video‑generation prospects—while announcing openings for visual algorithm engineers.

AI image generation · AIGC · Baidu
24 min read
Tencent Tech
Oct 26, 2023 · Artificial Intelligence

Unlocking Tencent Hunyuan Text‑to‑Image: A Complete Guide and Prompt Tips

This guide introduces Tencent Hunyuan's upgraded text‑to‑image model, explains its technical innovations, provides detailed prompt engineering advice, showcases example prompts and generated images across various styles, and highlights real‑world applications and performance metrics for developers and creators.

AI Generation · Large Model · Prompt engineering
12 min read
DaTaobao Tech
Oct 13, 2023 · Artificial Intelligence

Understanding Stable Diffusion: Core Principles and Technical Architecture

The article demystifies Stable Diffusion by explaining its low‑cost latent‑space design and conditioning mechanisms, comparing it to autoregressive, VAE, flow‑based and GAN models, detailing the iterative noise‑to‑image process, token‑based text‑to‑image control, version differences, common generation issues, and providing implementation code examples.

AI image generation · Computer Vision · Cross-Attention
15 min read
Alibaba Cloud Big Data AI Platform
Jul 13, 2023 · Artificial Intelligence

Rapid Diffusion: Fast, Domain‑Specific Text‑to‑Image Generation for Chinese

Rapid Diffusion introduces a knowledge‑enhanced, high‑speed Chinese text‑to‑image diffusion model with one‑click deployment, achieving superior image quality and up to 1.73× faster inference through FlashAttention and BladeDISC optimizations, and demonstrates strong performance across e‑commerce, traditional painting, and food datasets.

Chinese NLP · Knowledge Enhancement · diffusion model
12 min read
Tencent Cloud Developer
May 25, 2023 · Artificial Intelligence

QQGC: A Two-Stage Text-to-Image Model with Prior and Decoder Architectures for Efficient AI Painting

QQGC is Tencent’s two‑stage text‑to‑image model, which separates a CLIP‑based Prior mapping stage from a Stable Diffusion Decoder. It uses T5‑enhanced text embeddings and a suite of efficiency techniques (FP16, flash attention, ZeRO and GPU‑RDMA) to train models with more than 2B parameters on 64 GPUs, achieving state‑of‑the‑art FID and CLIP scores. The model supports image variation, semantic img2img, precise CLIP‑vector edits and unsafe‑content filtering, and now powers the company’s Magic Painting Room.

AI painting · CLIP embedding · Training Acceleration
12 min read
Top Architect
May 8, 2023 · Artificial Intelligence

Understanding Stable Diffusion: Architecture, Training, and Practical Applications

This article provides a comprehensive overview of Stable Diffusion, covering its latent diffusion architecture, training data and procedures, model components such as autoencoder, CLIP text encoder and UNet, as well as practical usage examples including text‑to‑image generation, image‑to‑image, inpainting, and advanced extensions like ControlNet and SD‑2.x.

AI image generation · Stable Diffusion · diffusion models
52 min read
Laiye Technology Team
Mar 3, 2023 · Artificial Intelligence

Survey of Text‑Controlled Image Generation Models: DALL·E‑2, Imagen, Stable Diffusion, and ControlNet

This article reviews the key components and design choices of recent text‑controlled image generation systems—including DALL·E‑2, Google Imagen, Stability AI's Latent Stable Diffusion, and the ControlNet extension—highlighting how diffusion models, text encoders, prior modules, super‑resolution, and conditioning mechanisms enable high‑quality, controllable visual synthesis.

AI · ControlNet · DALL-E-2
16 min read
phodal
Feb 20, 2023 · Artificial Intelligence

Prompt Engineering Secrets: Text‑to‑Image, Article & Code Generation with AI

This guide explores how to craft effective prompts for Stable Diffusion image creation, ChatGPT article writing, and GitHub Copilot code generation, covering prompt evolution, negative prompts, ControlNet enhancements, model selection, and practical tips for iterative refinement and context building.

AI Generation · ChatGPT · ControlNet
15 min read
21CTO
Jan 13, 2023 · Artificial Intelligence

How Google’s Muse Is Redefining Text‑to‑Image Generation with Parallel Decoding

Google’s new Muse model, a Transformer‑based text‑to‑image system running on TPUv4, claims to generate 256×256 images in 0.5 seconds—far faster than Imagen—while delivering unprecedented photorealism and deep language understanding through parallel decoding and large‑scale LLM‑conditioned training.

AI research · Google Muse · LLM conditioning
4 min read
Alibaba Cloud Big Data AI Platform
Dec 12, 2022 · Artificial Intelligence

Unlocking Chinese Text-to-Image Generation with Alibaba’s PAI‑Diffusion Models

This article introduces Alibaba Cloud’s open‑source PAI‑Diffusion series, detailing its Latent Diffusion Model foundation, Chinese CLIP alignment, super‑resolution components, and showcases diverse artistic and real‑world text‑to‑image generation scenarios, while providing guidance on accessing the models via Alibaba Cloud AI Center, PAI‑DSW, and HuggingFace Space.

Alibaba Cloud · Chinese AI · diffusion models
11 min read
Tencent Cloud Developer
Nov 1, 2022 · Artificial Intelligence

The Rise of AI-Generated Content: Technologies, Applications, and Risks

The article surveys the evolution of AI‑generated content from early art programs to modern diffusion‑based text‑to‑image and text‑to‑video models, outlines key milestones such as Stable Diffusion and DALL‑E 2, explores gaming applications, and highlights limitations, ethical concerns, and copyright risks of open‑source generative AI.

AI Generation · creative AI · text-to-image
22 min read
Alibaba Cloud Big Data AI Platform
Jul 29, 2022 · Artificial Intelligence

Unlock Chinese Text-to-Image Generation with EasyNLP’s Open‑Source Models

This article introduces EasyNLP’s newly integrated Chinese text‑to‑image generation framework, explains the underlying Transformer‑VQGAN architecture, provides model specifications, code snippets, performance benchmarks on multiple datasets, and step‑by‑step tutorials for fine‑tuning and inference using open‑source checkpoints.

AI Generation · Chinese NLP · EasyNLP
20 min read
Liangxu Linux
May 30, 2022 · Frontend Development

Discover Free Online Tools for Hex Editing, Text‑to‑Image, Code Snippets, and Diagramming

This guide introduces four free web‑based utilities—hexed.it for quick hex editing, text2image for converting text into images, Carbon for turning code snippets into stylish pictures, and draw.io for creating professional diagrams—highlighting their key features, interfaces, and practical use cases for developers.

Diagramming · code snippet image · hex editor
7 min read
IT Services Circle
Apr 13, 2022 · Artificial Intelligence

Introducing DualStyleGAN, RQ‑VAE Transformer, and VFD: Recent CVPR 2022 Open‑Source Algorithms

Jack Cui presents three recently open‑sourced CVPR 2022 algorithms—DualStyleGAN for high‑resolution portrait style transfer, RQ‑VAE Transformer for improved text‑to‑image generation, and VFD for deep‑fake detection—detailing their functionality, usage options, and providing links to code repositories and demo platforms.

AI · Generative Models · Style Transfer
5 min read
Alibaba Cloud Developer
Nov 19, 2019 · Artificial Intelligence

Can AI Imagine Visually? Seq‑SG2SL for Scene‑to‑Semantic Layout

This article introduces the Seq‑SG2SL framework, which tackles the challenge of granting AI visual imagination by converting scene graphs into semantic layouts, discusses the limitations of existing text‑to‑image methods, proposes the SLEU metric for automatic evaluation, and presents experimental results demonstrating its effectiveness.

AI · SLEU · scene graph
16 min read