Tagged articles

158 articles

Jan 13, 2025 · Artificial Intelligence

ArtCrafter: A Controllable, Diverse Style Transfer Framework from Tsinghua

ArtCrafter introduces a novel text‑image style transfer framework that leverages attention‑based style extraction, text‑image alignment enhancement, and explicit modulation to achieve controllable, diverse, and high‑fidelity visual results, outperforming existing methods in both qualitative and quantitative evaluations.

Attention Mechanism · Style Transfer · diffusion models

10 min read

AIWalker

Jan 12, 2025 · Artificial Intelligence

SnapGen Generates 1024px Images in 1.4 s with Lightweight On‑Device Architecture

SnapGen introduces a compact 379M‑parameter diffusion model that produces 1024×1024‑pixel text‑to‑image results in about 1.4 seconds on a mobile device. Through a series of architecture refinements, advanced training tricks, and multi‑level knowledge distillation, it achieves competitive FID scores and outperforms much larger models.

Mobile AI · SnapGen · diffusion models

23 min read

DaTaobao Tech

Dec 30, 2024 · Artificial Intelligence

AI Research Highlights: AAAI 2025 & NeurIPS 2024 Breakthroughs in Image Generation

This article compiles recent AI research breakthroughs presented at AAAI 2025 and NeurIPS 2024, summarizing eight papers on multi‑condition image generation, mixed auto‑regressive models, hallucination mitigation in vision‑language models, quantized diffusion denoising, facial part swapping, language‑guided concept vectors, attribution consistency, and video virtual try‑on, with links to each work.

AAAI 2025 · AI research · Generative Models

13 min read

Bilibili Tech

Dec 24, 2024 · Artificial Intelligence

AniSora: An Integrated System for Anime Video Generation with Data Flywheel, Controllable Diffusion Models, and Evaluation Benchmark

AniSora combines a 10‑million‑pair anime text‑video dataset, a controllable diffusion transformer whose temporal‑mask conditioning supports text‑to‑video generation, frame interpolation, and region‑guided animation, and a 948‑video evaluation benchmark. The system delivers industry‑leading character and motion consistency and already powers low‑cost motion‑comic production for multiple IPs.

AI_Animation · Dataset Benchmark · Temporal Masking

21 min read

DaTaobao Tech

Dec 16, 2024 · Artificial Intelligence

Reference Image Generation for Subject‑Driven Diffusion

This work presents a subject‑driven diffusion pipeline that injects multi‑scale reference features (ReferenceNet‑style) into high‑fidelity backbones such as SD‑XL and Flux. It enables zero‑shot, fine‑grained product consistency across diverse scenes and outperforms current fine‑tuned and zero‑shot methods; the authors note remaining limits in category coverage and human interaction.

AI · DreamBooth · IP-Adapter

9 min read

AntTech

Nov 27, 2024 · Artificial Intelligence

EchoMimicV2: An End-to-End Audio‑Driven Semi‑Body Human Animation Framework

EchoMimicV2, an open‑source project from Ant Group's Alipay AI team, introduces an end‑to‑end audio‑driven framework that generates high‑quality semi‑body portrait videos by jointly coordinating audio, pose, and image inputs, while addressing challenges of condition complexity, model stability, and computational cost.

Digital Human · audio-driven animation · diffusion models

16 min read

Alimama Tech

Nov 27, 2024 · Artificial Intelligence

FlowDCN: Efficient Arbitrary-Resolution Image Generation via Groupwise Multi‑Scale Deformable Convolution

FlowDCN introduces Groupwise‑MSDCN, a sparse deformable convolution that replaces attention, enabling efficient arbitrary‑resolution image generation with linear complexity, fewer parameters and FLOPs, and achieving state‑of‑the‑art FID scores on ImageNet while requiring far fewer training steps.

Deformable Convolution · arbitrary resolution · diffusion models

12 min read

Alipay Experience Technology

Nov 27, 2024 · Artificial Intelligence

EchoMimicV2: High‑Quality Audio‑Driven Half‑Body Human Animation with Simple Inputs

EchoMimicV2 is an open‑source digital‑human framework that generates high‑quality half‑body animation videos from a single reference image, an audio clip, and a hand‑gesture sequence. It addresses three challenges in audio‑driven animation: the restriction of earlier methods to facial portraits, complex condition injection, and inference latency.

AI research · Digital Human · Video Generation

18 min read

JD Tech

Nov 15, 2024 · Artificial Intelligence

Reliable Feedback Network (RFNet) for Improving Usable Advertising Image Generation

The paper proposes a multimodal Reliable Feedback Network (RFNet) and a consistency‑regularized fine‑tuning method (RFFT) that dramatically increase the proportion of usable advertising images generated by diffusion models while preserving visual appeal, and introduces the large‑scale RF1M dataset for training and evaluation.

RFNet · advertising images · diffusion models

9 min read

JD Retail Technology

Nov 14, 2024 · Artificial Intelligence

Improving Advertisement Image Generation with a Multimodal Reliable Feedback Network (ECCV 2024)

The paper introduces a Multimodal Reliable Feedback Network (RFNet) and a consistency‑condition regularization technique that together boost the usable rate of automatically generated advertisement images while preserving visual quality. The work is supported by a new million‑image annotated dataset and extensive experiments presented at ECCV 2024.

AI · ECCV2024 · advertisement generation

8 min read

JD Tech Talk

Nov 14, 2024 · Artificial Intelligence

Can Human Feedback Make Advertising Image Generation Reliable? Introducing RFNet

This paper presents a multimodal Reliable Feedback Network (RFNet) and a consistency regularization method that use human feedback to automatically evaluate and fine‑tune diffusion models, dramatically increasing the usable rate of e‑commerce advertising images while preserving visual quality.

Computer Vision · Human Feedback · RFNet

8 min read

JD Cloud Developers

Nov 14, 2024 · Artificial Intelligence

Boosting Advertising Image Generation Reliability with Human Feedback

This paper presents a multimodal Reliable Feedback Network (RFNet) and a consistency regularization method that use human feedback to dramatically improve the usability and visual quality of automatically generated e‑commerce advertising images while reducing manual inspection costs.

AI · Human Feedback · Reliability

9 min read

NewBeeNLP

Nov 11, 2024 · Artificial Intelligence

Inside MIT’s Deep Generative Models Course: Topics, Schedule, and Resources

MIT’s 6.S978 Deep Generative Models seminar, taught by Associate Professor He Kaiming, offers graduate students a 15‑week deep dive into VAEs, autoregressive models, GANs, diffusion techniques, and cross‑disciplinary applications, with detailed weekly topics, required assignments, and publicly available lecture PDFs.

Deep Generative Models · GAN · He Kaiming

5 min read

Tencent Cloud Developer

Oct 30, 2024 · Artificial Intelligence

Comprehensive Survey of AIGC Research: Papers, Resources, and Technical Overview

This survey acts as a comprehensive portal that organizes AIGC research across seven domains (text, image, and audio generation; cross‑modal association; text‑guided image and audio synthesis; and supporting resources). It details seminal models such as GPT, diffusion models, CLIP, DALL·E, Stable Diffusion, and MusicLM, along with the key papers that shaped each field.

AIGC · CLIP · Computer Vision

19 min read

Alibaba Cloud Big Data AI Platform

Oct 16, 2024 · Artificial Intelligence

How VICTORIA Revolutionizes Multi‑Object Image Editing with Language‑Aware Diffusion

The VICTORIA algorithm, presented by Alibaba Cloud AI Platform PAI and South China University of Technology at ACM MM 2024, leverages linguistic dependency parsing to guide cross‑attention in Stable Diffusion, enabling accurate, training‑free multi‑object image editing while preserving spatial structure and achieving state‑of‑the‑art results on benchmark datasets.

AI research · Stable Diffusion · VICTORIA

10 min read

Alibaba Cloud Big Data AI Platform

Oct 15, 2024 · Artificial Intelligence

How VICTORIA Boosts Text‑Guided Image Editing with Language‑Aware Diffusion

The VICTORIA algorithm, presented by Alibaba Cloud's PAI team at ACM MM2024, leverages linguistic dependency parsing and cross‑attention control to overcome multi‑object editing challenges in training‑free text‑guided image editing, delivering precise, structure‑preserving results across diverse scenes.

AI research · diffusion models · image manipulation

6 min read

Xiaohongshu Tech REDtech

Sep 19, 2024 · Artificial Intelligence

Target-Driven Distillation (TDD): A Multi‑Goal Distillation Method for Accelerating Diffusion Models

Target‑Driven Distillation (TDD) is a multi‑goal distillation method that flexibly selects short‑range target steps and decouples guidance during training, enabling 4‑to‑8‑step diffusion generation that preserves high‑resolution detail. It is compatible with LoRA, ControlNet, and InstantID, and outperforms existing consistency distillation techniques in speed and quality.

AI acceleration · Distillation · diffusion models

9 min read

Alimama Tech

Aug 16, 2024 · Artificial Intelligence

SPLAM: Sub‑Path Linear Approximation for Accelerating Diffusion Model Sampling

SPLAM (Sub‑Path Linear Approximation Model) accelerates diffusion‑model image synthesis by linearly approximating short sub‑paths of the probability‑flow ODE, allowing high‑quality generation in as few as four steps. It outperforms prior fast‑sampling methods on COCO benchmarks and has been deployed in Alimama’s recommendation system.

AI image generation · SPLAM · diffusion models

11 min read

Alibaba Cloud Big Data AI Platform

Aug 11, 2024 · Artificial Intelligence

Alibaba Cloud PAI’s Breakthroughs in Chinese Diffusion, Prompting, and LLM Knowledge Editing

Recent ACL 2024 papers from Alibaba Cloud’s PAI platform showcase open‑source Chinese diffusion models, an interactive multi‑turn prompt generator, a long‑tail knowledge‑aware retrieval‑augmented LLM approach, and a dynamic fusion network for sequential model editing, all integrated into cloud services.

AI research · Retrieval Augmented Generation · diffusion models

11 min read

Kuaishou Tech

Jul 31, 2024 · Artificial Intelligence

Kuaishou’s Kolors Text‑to‑Image Model: Architecture, Evaluation, and Real‑World Applications

The article presents a comprehensive overview of Kuaishou’s Kolors (Chinese name 可图, Ketu) multimodal generative model, detailing its data collection strategy, diffusion‑based architecture, evaluation metrics, and derived capabilities such as prompt refinement and interactive generation. It also covers practical applications ranging from AI‑powered live‑stream gifts to virtual try‑on and offers strategic advice for China’s visual‑generation community.

AI applications · Kolors · Model Evaluation

27 min read

Tencent Advertising Technology

Jul 31, 2024 · Artificial Intelligence

MimicMotion: A Controllable Video Generation Framework for High-Quality Human Motion Synthesis

MimicMotion is a controllable video generation framework that produces smooth, high-quality human motion videos by leveraging skeletal action guidance, addressing challenges in video generation such as limited length, weak controllability, and lack of dynamic detail.

AI · MimicMotion · Video Generation

13 min read

AntTech

Jul 24, 2024 · Artificial Intelligence

EchoMimic: An Open‑Source AIGC‑Driven Framework for 2D/3D Digital Human Generation

EchoMimic, an open‑source project from Ant Group, presents a flexible audio‑ and pose‑driven digital‑human generation pipeline that combines 2D, 3D, and AIGC techniques, reduces production costs, and achieves real‑time inference. The article also describes its architecture in detail, analyzes related work, and outlines future research directions.

AIGC · Digital Human · audio-driven animation

18 min read

Architecture and Beyond

Jul 7, 2024 · Artificial Intelligence

How Does ControlNet Extend Stable Diffusion for Precise Image Generation?

This article explains the core principles of Stable Diffusion, its training pipeline and limitations, then details how ControlNet adds controllable signals to diffusion models, outlines its architecture, ecosystem of model variants, and showcases diverse real‑world applications.

AI · Computer Vision · ControlNet

16 min read

Kuaishou Large Model

Jun 27, 2024 · Artificial Intelligence

How I2V-Adapter Turns Images into Videos with Minimal Training

The article introduces I2V‑Adapter, a lightweight plug‑in for Stable Diffusion‑based video diffusion models that converts a single static image into a coherent video without altering the original T2V architecture, and details its design, frame‑similarity prior, experimental results, and real‑world applications.

AI · Computer Vision · I2V-Adapter

9 min read

Kuaishou Tech

Jun 26, 2024 · Artificial Intelligence

I2V-Adapter: A Lightweight Image‑to‑Video Adapter for Stable Diffusion Video Diffusion Models

The I2V-Adapter paper introduces a plug‑and‑play lightweight module that enables static images to be converted into dynamic videos using Stable Diffusion‑based text‑to‑video diffusion models without altering the original architecture or pretrained parameters, achieving competitive quality with far less training cost.

AI · Computer Vision · I2V-Adapter

8 min read

Huolala Tech

May 23, 2024 · Artificial Intelligence

How to Detect and Remove Moiré Patterns with AI and Diffusion Models

This article explains the nature of moiré patterns in digital imaging, reviews manual mitigation techniques, introduces direct and indirect AI‑based recognition methods—including traditional feature extraction and deep‑learning models such as CNNs and diffusion frameworks—and details practical applications and evaluation metrics used by Huolala.

AI · Computer Vision · Deep Learning

17 min read

DataFunTalk

May 3, 2024 · Artificial Intelligence

Advances, Challenges, and Industrial Practices in Text‑to‑Video Generation – From Diffusion Models to Sora

This article reviews the rapid progress of text‑to‑video generation, explains diffusion‑based video synthesis, outlines key technical challenges such as motion modeling, semantic alignment and quality, and presents Tencent’s solutions and real‑world applications, while also discussing future directions and the impact of OpenAI’s Sora model.

AI · Sora · Tencent

23 min read

Rare Earth Juejin Tech Community

May 1, 2024 · Artificial Intelligence

Hyper‑SD: Trajectory‑Segmented Consistency Model for Accelerating Diffusion Image Generation

Hyper‑SD introduces a trajectory‑segmented consistency distillation framework that combines trajectory‑preserving and trajectory‑reconstruction strategies, integrates human‑feedback learning and score distillation, and achieves state‑of‑the‑art low‑step image generation performance on both SD1.5 and SDXL models.

AI acceleration · RLHF · diffusion models

10 min read

JD Tech

Apr 29, 2024 · Artificial Intelligence

Relation-Aware Diffusion Models for Automated Poster Layout and Product Background Generation

This article presents JD Advertising's 2023 AI-driven framework that uses a relation‑aware diffusion model with visual‑text and geometric modules, combined with category‑common and personalized generators and a planning‑and‑rendering network, to automate high‑quality, scalable e‑commerce poster creation and background synthesis.

Multimodal AI · diffusion models · e-commerce advertising

18 min read

Xiaohe Frontend Team

Apr 21, 2024 · Artificial Intelligence

What’s New in Generative AI? VASA‑1, Llama‑3, Stable Diffusion 3 & More

The article reviews the latest breakthroughs in generative AI, including Microsoft’s VASA‑1 video synthesis model, Meta’s open‑source Llama‑3 large language model, Stability AI’s Stable Diffusion 3 API, Adobe’s integration of third‑party AI video tools into Premiere Pro, and a free image‑style‑recreation platform from Freepik, highlighting their technical details and potential applications.

AI tools · diffusion models · generative AI

13 min read

DaTaobao Tech

Mar 25, 2024 · Artificial Intelligence

Survey of AIGC Video Generation Algorithms

Since 2023, research on AI‑generated video has expanded across six algorithmic categories: text‑to‑video, image‑to‑video, video editing, style transfer, human motion, and long‑video generation. The survey highlights works such as CogVideo, Imagen Video, MagicVideo, ControlVideo, DCTNet, NUWA‑XL, and OpenAI’s Sora. Its analysis finds that diffusion models excel at short clips, editing remains costly, style transfer is efficient, and truly long, temporally consistent video generation remains an open challenge.

AI · AIGC · Video Editing

13 min read

Rare Earth Juejin Tech Community

Feb 28, 2024 · Artificial Intelligence

A Survey of Multimodal Image Synthesis and Editing with Generative AI

This comprehensive review examines the rapid advances in generative AI for multimodal image synthesis and editing, covering visual, textual, and audio guidance, model families such as GANs, diffusion, autoregressive, and NeRF, as well as datasets, challenges, and future research directions.

GAN · NeRF · diffusion models

6 min read

Architect

Feb 22, 2024 · Artificial Intelligence

Sora: OpenAI’s Text‑to‑Video Model – Principles, Impact, and Outlook

The article provides a comprehensive technical overview of OpenAI’s Sora text‑to‑video model, explaining its background, underlying diffusion‑Transformer architecture, key breakthroughs, potential industry impacts, success factors, limitations, and future prospects for AI‑generated video content.

AI · OpenAI · Sora

15 min read

Rare Earth Juejin Tech Community

Feb 19, 2024 · Artificial Intelligence

Technical Review of OpenAI's Sora Video Generation Model

This article reviews OpenAI's Sora video generation model, summarizing its technical report and key innovations, such as patch-based visual tokens, a video compression network, scaling transformers, and language understanding, and discussing its capabilities, highlights, and current limitations in AI video synthesis.

AI · OpenAI · Sora

9 min read

21CTO

Feb 17, 2024 · Artificial Intelligence

How OpenAI’s Sora Is Pushing Video Generation to New Frontiers

OpenAI’s Sora model demonstrates large‑scale text‑conditional video generation using a diffusion transformer that operates on spatiotemporal patches, supporting variable durations, resolutions, and aspect ratios while showcasing emergent simulation abilities, flexible sampling, and multimodal editing capabilities, though it still has notable limitations.

AI research · Sora · Transformer

19 min read

Architect

Feb 16, 2024 · Artificial Intelligence

Can OpenAI’s Sora Redefine Text‑to‑Video Generation? An In‑Depth Technical Review

OpenAI’s newly unveiled Sora model transforms short text prompts into up‑to‑one‑minute high‑definition videos, showcasing advanced diffusion‑Transformer architecture, improved occlusion handling, and detailed visual fidelity, while the article examines its technical breakthroughs, compares it to earlier models, and discusses emerging safety and misuse concerns.

AI Safety · OpenAI · Sora

12 min read

Xiaohongshu Tech REDtech

Feb 9, 2024 · Artificial Intelligence

How InstantID Generates High‑Fidelity Holiday Portraits in 30 Seconds

InstantID is a plug‑in adapter that adds identity‑preserving capabilities to text‑to‑image diffusion models, allowing users to upload a single photo and, within 30 seconds, produce a Spring Festival‑styled portrait with accurate facial features, customizable prompts, and strong text control.

AI image generation · Hugging Face · InstantID

7 min read

Alimama Tech

Jan 24, 2024 · Artificial Intelligence

Hierarchical Masked 3D Diffusion Model for Video Outpainting

The Hierarchical Masked 3D Diffusion Model (M3DDM) introduces a masking‑based training strategy and cross‑attention with global video clips to achieve temporally consistent video outpainting, while a hybrid coarse‑to‑fine inference pipeline mitigates error accumulation, delivering state‑of‑the‑art results and deployment in Alibaba’s creative center.

3D diffusion · ACM MM2023 · AI video processing

12 min read

Baidu Geek Talk

Dec 19, 2023 · Industry Insights

Inside Baidu Search Innovation Contest: Winning AI Solutions Across Five Tracks

The second Baidu Search Innovation Contest attracted over 2,800 participants from 45 regions, featured five AI‑focused tracks, and highlighted champion teams that employed techniques such as LoRA‑fine‑tuned LLMs, vector‑intersection Top‑K search, GPU‑optimized algorithms, and diffusion‑based image generation to push the boundaries of search technology.

AI competition · GPU Optimization · LLM fine-tuning

12 min read

DaTaobao Tech

Dec 4, 2023 · Artificial Intelligence

AIGC Poster Generation Project: Methods and Optimizations

The AIGC Poster Generation Project employs Stable Diffusion enhanced with VAE, ControlNet, LoRA and other extensions to create product posters in four visual styles, exploring outpainting, inpainting, reference‑based diffusion and DreamBooth prototypes, and optimizes detail preservation, super‑resolution text, and masking to achieve over 90% detail fidelity, 95% success rate, and 3–5 second inference per image.

AIGC · ControlNet · Stable Diffusion

7 min read

JD Retail Technology

Nov 23, 2023 · Artificial Intelligence

Recent Advances in Advertising Recommendation Algorithms and Their Applications

This article reviews recent progress in advertising recommendation technologies, covering deep learning‑based ranking, sequence modeling, self‑supervised learning, online and reinforcement learning, multimodal recommendation, and fairness, and details four key breakthroughs—data‑driven incremental learning, dynamic group parameter modeling, bilateral interactive graph convolution, and a relation‑aware diffusion model for poster layout generation, along with experimental results and future challenges.

Deep Learning · Incremental Learning · advertising recommendation

25 min read

DataFunTalk

Nov 15, 2023 · Artificial Intelligence

Contextual Learning for Personalized Text‑to‑Image Generation

This article explains how contextual learning can enhance text‑to‑image models by incorporating example image‑text pairs, redesigning the UNet architecture, building large in‑context training datasets, and training the SuTI model to achieve fast, controllable, and high‑quality personalized image generation.

AI · contextual learning · diffusion models

24 min read

php Courses

Nov 14, 2023 · Artificial Intelligence

Google and UC Berkeley Introduce Idempotent Generative Network (IGN) as a New Generative AI Method

Google, in collaboration with UC Berkeley, has unveiled a novel generative AI approach called the Idempotent Generative Network (IGN) that can produce images from any input in a single step, offering an alternative to GANs, diffusion models, and consistency models.

GAN · IGN · diffusion models

3 min read

DataFunSummit

Nov 4, 2023 · Artificial Intelligence

AIGC Generation Models and Diffusion‑Based Planning for Embodied AI

This article explores powerful AIGC models and large language models such as ChatGPT, explains how diffusion models can be applied to robotic planning, introduces AdaptDiffuser, self‑evolving data generation, and the challenges of embodied AI, and summarizes recent research and practical implementations.

AIGC · AdaptDiffuser · Embodied AI

20 min read

Bilibili Tech

Nov 1, 2023 · Artificial Intelligence

Neural Radiance Fields and Generative Intelligent Media: Recent Advances and Applications

Professor Hu Qiang presented recent progress in Neural Radiance Fields—covering implicit/explicit representations, hybrid models, and solutions for dynamic scenes, cloud‑based and edge‑cloud rendering—while also reviewing generative AI advances such as diffusion‑based text‑to‑image/video/3D, LoRA fine‑tuning, and large‑scale story‑book datasets, highlighting applications in virtual‑real content, smart‑city modeling, and 6‑DoF e‑commerce displays.

3D reconstruction · NeRF · diffusion models

14 min read

Alibaba Cloud Big Data AI Platform

Oct 24, 2023 · Artificial Intelligence

Accelerating Diffusion Model Sampling with OLSS: A Linear Subspace Approach

This article presents the OLSS (Optimal Linear Subspace Search) algorithm, a novel diffusion‑model sampling accelerator that models acceleration as a linear subspace expansion, unifies existing methods, introduces trainable scheduler coefficients solved via least‑squares, and demonstrates significant speed and quality gains on Stable Diffusion benchmarks.

AI · diffusion models · image generation

8 min read

Alibaba Cloud Big Data AI Platform

Oct 23, 2023 · Artificial Intelligence

How the New OLSS Algorithm Supercharges Diffusion Model Sampling

The article announces that Alibaba Cloud’s AI platform PAI and ECNU researchers’ paper on the Optimal Linear Subspace Search (OLSS) algorithm was selected for CIKM 2023, explains how OLSS accelerates diffusion‑model sampling by operating in higher‑dimensional linear subspaces, and provides details of the paper and its visual results.

OLSS · diffusion models · generative AI

5 min read

AntTech

Sep 27, 2023 · Artificial Intelligence

DiffUTE: A Universal Multilingual Text Editing Diffusion Model for High-Fidelity Image Text Manipulation

The article presents DiffUTE, an end‑to‑end self‑supervised multilingual text‑editing diffusion model that leverages fine‑grained position and glyph guidance together with large language model control to achieve high‑quality, high‑fidelity text modifications in images, and demonstrates its effectiveness through extensive experiments and real‑world deployments at Ant Group.

AI · AIGC · Text Editing

8 min read

58UXD

Sep 21, 2023 · Artificial Intelligence

How AI is Redefining Design: From Midjourney to Stable Diffusion

This article explores the rise of AI‑driven design, examines real‑world projects such as AI‑Soup, virtual jewelry, and e‑commerce checkpoints, explains diffusion model fundamentals, compares major tools, and discusses the opportunities and risks for creators and businesses adopting generative AI.

AI design · creative workflow · diffusion models

13 min read

Alimama Tech

Aug 2, 2023 · Artificial Intelligence

Can AI Fully Automate Advertising Poster Creation and Video Outpainting?

This article reviews four ACM MM 2023 papers that introduce AI‑driven systems for automatic advertising poster generation, multimodal text‑image creation, few‑shot style‑guided visual captioning, and hierarchical 3D diffusion models for video outpainting, detailing their methods, datasets, and experimental results.

AI-generated design · Poster Automation · diffusion models

9 min read

Alibaba Cloud Big Data AI Platform

Jul 10, 2023 · Artificial Intelligence

Alibaba’s PAI Platform Powers Three Groundbreaking ACL 2023 AI Papers

Three papers from Alibaba Cloud's PAI platform were selected for ACL 2023, showcasing FashionKLIP for e‑commerce image‑text retrieval, ConaCLIP's lightweight dual‑encoder distillation, and a fast domain‑specific diffusion model, all of which will be open‑sourced for the AI community.

ACL 2023 · Knowledge Graph · Multimodal AI

8 min read

Top Architect

May 8, 2023 · Artificial Intelligence

Understanding Stable Diffusion: Architecture, Training, and Practical Applications

This article provides a comprehensive overview of Stable Diffusion, covering its latent diffusion architecture, training data and procedures, and model components such as the autoencoder, CLIP text encoder, and UNet. It also gives practical usage examples, including text‑to‑image generation, image‑to‑image, and inpainting, and covers later developments such as the ControlNet extension and SD‑2.x.

AI image generation · Stable Diffusion · diffusion models

0 likes · 52 min read

Tencent Cloud Developer

Apr 10, 2023 · Artificial Intelligence

How Computers Generate Realistic Images: An In‑Depth Guide to AI Image Generation, Diffusion Models, ControlNet, LoRA and More

This guide explains how AI creates photorealistic images, tracing the shift from VAEs and GANs to diffusion models, detailing latent diffusion, ControlNet conditioning, CLIP text‑image alignment, and lightweight fine‑tuning methods like DreamBooth and LoRA, plus practical tips for higher‑resolution results.

AI image generation · ControlNet · LoRA

0 likes · 22 min read

DataFunTalk

Mar 7, 2023 · Artificial Intelligence

Overview of Text‑Controlled Image Generation Models: DALL‑E‑2, Imagen, Latent Stable Diffusion, and ControlNet

This article surveys the key challenges of controllable text‑to‑image generation and explains the architectures, components, and training details of major diffusion‑based models such as DALL‑E‑2, Google Imagen, Stability AI's Latent Stable Diffusion, and the ControlNet extension.

AI · ControlNet · DALL-E-2

0 likes · 16 min read

Laiye Technology Team

Mar 3, 2023 · Artificial Intelligence

Survey of Text‑Controlled Image Generation Models: DALL·E‑2, Imagen, Stable Diffusion, and ControlNet

This article reviews the key components and design choices of recent text‑controlled image generation systems—including DALL·E‑2, Google Imagen, Stability AI's Latent Stable Diffusion, and the ControlNet extension—highlighting how diffusion models, text encoders, prior modules, super‑resolution, and conditioning mechanisms enable high‑quality, controllable visual synthesis.

AI · ControlNet · DALL-E-2

0 likes · 16 min read

Alibaba Cloud Big Data AI Platform

Dec 12, 2022 · Artificial Intelligence

Unlocking Chinese Text-to-Image Generation with Alibaba’s PAI‑Diffusion Models

This article introduces Alibaba Cloud’s open‑source PAI‑Diffusion series, detailing its Latent Diffusion Model foundation, Chinese CLIP alignment, super‑resolution components, and showcases diverse artistic and real‑world text‑to‑image generation scenarios, while providing guidance on accessing the models via Alibaba Cloud AI Center, PAI‑DSW, and HuggingFace Space.

Alibaba Cloud · Chinese AI · diffusion models

0 likes · 11 min read

IT Services Circle

Jun 6, 2022 · Artificial Intelligence

AI Image Generation Showdown: Google Imagen vs OpenAI DALL·E on the "Tiger Wearing VR" Prompt

The article compares Google’s Imagen and OpenAI’s DALL·E by feeding them the whimsical "Tiger Wearing VR" prompt, showcasing each model’s visual style, its underlying architecture (including CLIP, diffusion, and T5‑XXL), and community reactions to the resulting AI‑generated artwork.

AI · CLIP · Google Imagen

0 likes · 5 min read

Code DAO

May 26, 2022 · Artificial Intelligence

Understanding Denoising Diffusion Probabilistic Models: Fundamentals and Process

This article explains the fundamentals of denoising diffusion probabilistic models, detailing the forward Gaussian noise injection, the reverse reconstruction via learned conditional densities, model architecture, loss functions, and experimental results on synthetic datasets, all supported by key research citations.

Generative Models · Markov chain · Neural Networks

0 likes · 8 min read
