Tag

text-to-image

Tencent Cloud Developer
Oct 30, 2024 · Artificial Intelligence

Comprehensive Survey of AIGC Research: Papers, Resources, and Technical Overview

This survey is a portal that organizes AIGC research into seven domains: text generation, image generation, audio generation, cross‑modal association, text‑guided image synthesis, text‑guided audio synthesis, and supporting resources. It details seminal models such as GPT, Diffusion, CLIP, DALL·E, Stable Diffusion, and MusicLM, along with the key papers that shaped each field.

AIGC · Clip · GPT
0 likes · 19 min read
Kuaishou Tech
Jul 31, 2024 · Artificial Intelligence

Kuaishou’s Kolors Text‑to‑Image Model: Architecture, Evaluation, and Real‑World Applications

The article presents a comprehensive overview of Kuaishou’s Kolors (formerly 可图) multimodal generative model, detailing its data collection strategy, diffusion‑based architecture, evaluation metrics, derived capabilities such as prompt refinement and interactive generation, and a range of practical applications from AI‑powered live‑stream gifts to virtual try‑on, while also offering strategic advice for the domestic visual‑generation community.

AI applications · Kolors · diffusion models
0 likes · 27 min read
Kuaishou Tech
Jul 18, 2024 · Artificial Intelligence

Multidimensional Preference Model (MPS) for Text-to-Image Generation: Dataset, Architecture, and Experimental Analysis

This article introduces the Multidimensional Preference Model (MPS), the first multi‑dimensional scoring system for evaluating text‑to‑image generation, built on the newly released MHP dataset with extensive human annotations across aesthetic, semantic alignment, detail quality, and overall preference dimensions, and demonstrates its superior performance through comprehensive experiments and RLHF integration.

AI evaluation · MHP dataset · MPS
0 likes · 10 min read
Kuaishou Large Model
Jun 20, 2024 · Artificial Intelligence

Eight Kwai Papers Accepted at CVPR 2024 – Text-to-Image, Video Quality & 3D Generation

Kwai (Kuaishou) has eight papers accepted at CVPR 2024 covering multi‑dimensional human preference for text‑to‑image generation, short‑video quality assessment, efficient video quality assessment, compressed video enhancement, conditional unsigned distance fields, universal cross‑domain retrieval, perception‑oriented frame interpolation, and test‑time energy adaptation.

3D Generation · Artificial Intelligence · CVPR 2024
0 likes · 16 min read
Sohu Tech Products
May 21, 2024 · Artificial Intelligence

OPPO Multimodal Pretrained Model Deployment in Cloud-Edge Scenarios: Practices and Optimizations

OPPO details how it deploys multimodal pretrained models on resource‑constrained edge devices by compressing CLIP‑based image‑text retrieval, adapting Chinese text‑to‑image generation with LoRA and adapters, and lightweighting diffusion models through layer pruning and progressive distillation, achieving sub‑3‑second generation while preserving cloud‑level quality.

Clip · LoRA · OPPO
0 likes · 18 min read
Tencent Cloud Developer
May 15, 2024 · Artificial Intelligence

Tencent Open-Sources HunYuan DiT: First Chinese-Native Text-to-Image Model with 1.5B Parameters

Tencent has open‑sourced its upgraded 1.5‑billion‑parameter HunYuan DiT model, the first Chinese‑native, bilingual (Chinese‑English) text‑to‑image model built on the Diffusion Transformer (DiT) architecture. It delivers about a 20% improvement in visual quality, multi‑round generation, video‑generation potential, and free commercial use, with full weights, inference code, and algorithms available on Hugging Face and GitHub for developers and enterprises.

Chinese-native AI · DiT architecture · Diffusion Transformer
0 likes · 6 min read
DataFunTalk
Nov 15, 2023 · Artificial Intelligence

Contextual Learning for Personalized Text‑to‑Image Generation

This article explains how contextual (in‑context) learning can enhance text‑to‑image models by incorporating example image‑text pairs, redesigning the UNet architecture, building large in‑context training datasets, and training the SuTI model to achieve fast, controllable, and high‑quality personalized image generation.

AI · contextual learning · diffusion models
0 likes · 24 min read
Baidu Geek Talk
Nov 7, 2023 · Artificial Intelligence

Interview on AI Image Generation (Text-to-Image) Technology and Baidu Search Applications

In a recent InfoQ Geek Talk, Baidu Search chief architect Tianbao discussed the rapid evolution of AI text‑to‑image technology—highlighting Chinese‑language data preparation, prompt‑engineering challenges, evaluation methods combining human feedback and metrics, and future video‑generation prospects—while announcing openings for visual algorithm engineers.

AI image generation · AIGC · Baidu
0 likes · 24 min read
Tencent Tech
Oct 26, 2023 · Artificial Intelligence

Unlocking Tencent Hunyuan Text‑to‑Image: A Complete Guide and Prompt Tips

This guide introduces Tencent Hunyuan's upgraded text‑to‑image model, explains its technical innovations, provides detailed prompt engineering advice, showcases example prompts and generated images across various styles, and highlights real‑world applications and performance metrics for developers and creators.

AI generation · Prompt Engineering · Tencent Hunyuan
0 likes · 12 min read
DaTaobao Tech
Oct 13, 2023 · Artificial Intelligence

Understanding Stable Diffusion: Core Principles and Technical Architecture

The article demystifies Stable Diffusion by explaining its low‑cost latent‑space design and conditioning mechanisms, comparing it to autoregressive, VAE, flow‑based and GAN models, detailing the iterative noise‑to‑image process, token‑based text‑to‑image control, version differences, common generation issues, and providing implementation code examples.

AI image generation · Cross-Attention · Stable Diffusion
0 likes · 15 min read
Tencent Cloud Developer
May 25, 2023 · Artificial Intelligence

QQGC: A Two-Stage Text-to-Image Model with Prior and Decoder Architectures for Efficient AI Painting

QQGC is Tencent’s two‑stage text‑to‑image model, which separates a CLIP‑based Prior mapping stage from a Stable Diffusion Decoder. It uses T5‑enhanced text embeddings and a suite of efficiency techniques (FP16, flash attention, ZeRO, and GPU‑RDMA) to train models with more than 2B parameters on 64 GPUs, achieving state‑of‑the‑art FID and CLIP scores. The model supports image variation, semantic img2img, precise CLIP‑vector edits, and unsafe‑content filtering, and now powers the company’s Magic Painting Room.

AI Painting · CLIP embedding · Training Acceleration
0 likes · 12 min read
Top Architect
May 8, 2023 · Artificial Intelligence

Understanding Stable Diffusion: Architecture, Training, and Practical Applications

This article provides a comprehensive overview of Stable Diffusion, covering its latent diffusion architecture, training data and procedures, and model components such as the autoencoder, CLIP text encoder, and UNet, as well as practical usage examples including text‑to‑image generation, image‑to‑image, inpainting, and advanced extensions like ControlNet and SD‑2.x.

AI image generation · Stable Diffusion · diffusion models
0 likes · 52 min read
DaTaobao Tech
Mar 22, 2023 · Artificial Intelligence

A Comprehensive Overview of Text-to-Image Generation: From GANs to Stable Diffusion and Advanced Techniques

The article traces the evolution of text‑to‑image generation from early GANs through auto‑regressive and CLIP‑guided diffusion models, explains Stable Diffusion’s architecture and prompt engineering, and reviews advanced personalization techniques such as Textual Inversion, DreamBooth, and ControlNet, along with efficient OneFlow deployment and diverse real‑world applications.

AI art · Prompt Engineering · Stable Diffusion
0 likes · 17 min read
Laiye Technology Team
Mar 3, 2023 · Artificial Intelligence

Survey of Text‑Controlled Image Generation Models: DALL·E‑2, Imagen, Stable Diffusion, and ControlNet

This article reviews the key components and design choices of recent text‑controlled image generation systems—including DALL·E‑2, Google Imagen, Stability AI's Latent Stable Diffusion, and the ControlNet extension—highlighting how diffusion models, text encoders, prior modules, super‑resolution, and conditioning mechanisms enable high‑quality, controllable visual synthesis.

AI · ControlNet · DALL-E-2
0 likes · 16 min read
DeWu Technology
Feb 13, 2023 · Artificial Intelligence

Overview of AI-Generated Art: GAN, Diffusion Models, and Stable Diffusion Applications

The article surveys AI‑generated art, explaining how GANs’ limitations gave way to diffusion models and the open‑source Stable Diffusion platform, which offers text‑to‑image, img2img, inpainting, DreamBooth fine‑tuning, and widespread commercial and DIY deployments via cloud or local WebUI setups.

AI art · GAN · Stable Diffusion
0 likes · 13 min read
Tencent Cloud Developer
Nov 1, 2022 · Artificial Intelligence

The Rise of AI-Generated Content: Technologies, Applications, and Risks

The article surveys the evolution of AI‑generated content from early art programs to modern diffusion‑based text‑to‑image and text‑to‑video models, outlines key milestones such as Stable Diffusion and DALL‑E 2, explores gaming applications, and highlights limitations, ethical concerns, and copyright risks of open‑source generative AI.

AI generation · Creative AI · diffusion models
0 likes · 22 min read
IT Services Circle
Apr 13, 2022 · Artificial Intelligence

Introducing DualStyleGAN, RQ‑VAE Transformer, and VFD: Recent CVPR 2022 Open‑Source Algorithms

Jack Cui presents three recently open‑sourced CVPR 2022 algorithms—DualStyleGAN for high‑resolution portrait style transfer, RQ‑VAE Transformer for improved text‑to‑image generation, and VFD for deep‑fake detection—detailing their functionality, usage options, and providing links to code repositories and demo platforms.

AI · Deepfake Detection · Generative Models
0 likes · 5 min read