Tagged articles
154 articles
Geek Labs
May 20, 2026 · Frontend Development

Two Open‑Source AI Tools to Auto‑Generate HTML Slides and Hand‑Drawn Technical Diagrams

This article introduces two open‑source projects—beautiful‑html‑templates, which lets an AI generate complete HTML slide decks from plain instructions, and ian‑handdrawn‑ppt, which converts articles or outlines into a series of Chinese hand‑drawn style technical illustration images—detailing their features, usage steps, target users, and limitations.

AI · HTML slides · frontend
9 min read
Machine Heart
May 6, 2026 · Artificial Intelligence

Luma’s Uni‑1.1 API Launch: Third‑Place Ranking and Text Rendering Near GPT‑Image 2

Luma released the Uni‑1.1 image‑generation API, which ranks third on the Arena blind‑test leaderboard, offers sub‑half‑price per image, and demonstrates production‑grade capabilities such as multi‑reference fusion, multi‑turn editing, and a decoder‑only transformer that jointly models text and image tokens.

API pricing · Benchmark · Luma
13 min read
AI Explorer
Apr 24, 2026 · Artificial Intelligence

Open Generative AI: 200+ Open‑Source Models for Image, Video, and Lip‑Sync Creation

Open Generative AI is an open‑source, MIT‑licensed desktop suite that bundles over 200 cutting‑edge image, video, and lip‑sync models into four dedicated studios, offering unrestricted generation without content filters, subscription fees, or closed ecosystems, and provides online, desktop, and self‑hosted deployment options.

AI media generation · MIT license · Open Generative AI
6 min read
Machine Heart
Apr 24, 2026 · Artificial Intelligence

Vision Banana Shows That Image Generation Equals Understanding – DeepMind’s GPT‑like Leap

DeepMind’s Vision Banana model demonstrates that large‑scale image‑generation pre‑training can produce powerful, universal visual representations, achieving state‑of‑the‑art results on segmentation, depth, and normal estimation without task‑specific heads, thereby supporting the hypothesis that generation and understanding are fundamentally linked.

DeepMind · Vision Banana · generative AI
13 min read
Architect's Must-Have
Apr 23, 2026 · Artificial Intelligence

OpenAI Images 2.0 Deep Dive: How AI Image Generation Enters the “Thinking Era”

The article provides a comprehensive technical analysis of OpenAI's ChatGPT Images 2.0 (gpt‑image‑2), detailing its strategic launch, new autoregressive architecture, integrated reasoning and web‑search capabilities, multi‑image consistency, pricing model, competitive landscape, limitations, and future impact on visual AI workflows.

AI Architecture · GPT Image 2 · Multimodal AI
28 min read
Geek Labs
Apr 23, 2026 · Artificial Intelligence

7 Must‑Watch Open‑Source Prompt Libraries for AI Image and Video Generation (2025‑2026)

Against the rapid rise of prompt engineering in 2025‑2026, this article reviews seven standout open‑source GitHub repositories (covering Nano Banana Pro, GPT‑Image‑2, multi‑model prompts, and video generation), detailing their star counts, content structure, multilingual support, and ideal use cases for creators.

AI prompt engineering · GitHub · Nano Banana Pro
14 min read
SuanNi
Apr 22, 2026 · Artificial Intelligence

OpenAI’s ChatGPT Images 2.0: A Leap Ahead in AI‑Generated Visual Design

OpenAI’s newly released ChatGPT Images 2.0 transforms image generation into a full‑featured visual design system, delivering 2K resolution, multilingual text rendering, complex layout handling, and up to eight concurrent images, while also exposing current physical limits such as intricate spatial puzzles.

AI · ChatGPT · image generation
9 min read
IT Services Circle
Apr 22, 2026 · Artificial Intelligence

GPT-Image-2 Launches: How Designers Can Ditch Old‑School Workflows

OpenAI's newly released ChatGPT Images 2.0 (GPT‑Image‑2) lets users generate photorealistic screenshots, posters, and even homework from ultra‑short prompts, outperforms the previous Nano Banana model, supports 2K resolution, multi‑language input, and is already available via API with pricing details.

AI model · ChatGPT Images 2.0 · OpenAI
7 min read
DaTaobao Tech
Apr 22, 2026 · Artificial Intelligence

How MNN‑Sana‑Edit‑V2 Brings Comic‑Style Image Editing to Your Phone in 15 seconds

MNN‑Sana‑Edit‑V2, a collaborative effort between Taobao’s Meta team and Hangzhou University, combines a frozen Qwen3‑0.6B LLM, Learnable Query, Connector, Linear DiT and Deep Compression Autoencoder with 4/8‑bit quantization to run fully on mobile devices, delivering 512×512 comic‑style conversions in about 15 seconds—2.5× faster than cloud alternatives—while providing open‑source code, detailed training stages, and extensive performance benchmarks.

Mobile AI · Model Quantization · diffusion
13 min read
Java Architecture Diary
Apr 22, 2026 · Artificial Intelligence

Why OpenAI’s gpt-image-2 Turns Image Generation into a Practical Tool

OpenAI’s new gpt-image-2 model improves dense Chinese text rendering, follows detailed prompts more reliably, and offers precise edit capabilities, making it suitable for real‑world business graphics such as posters, banners, and dashboards, and the article shows how to integrate it with Spring AI in Java.

AI Editing · GPT Image 2 · Java
7 min read
Design Hub
Apr 15, 2026 · Artificial Intelligence

Overnight AI Shifts: Core Models, Agents, Design Tools, and More

A rapid roundup of today’s AI news shows the industry moving beyond marginal model gains toward lower cost and latency, agents entering task and browser workflows, redesign of the design‑code gap, 3D/web expansion, and open‑source tools reaching smaller teams.

AI · Chip Collaboration · agents
8 min read
Machine Heart
Apr 10, 2026 · Artificial Intelligence

AdaGen: Enabling Adaptive, Data‑Driven Strategies for Image Generation Models

AdaGen replaces handcrafted static schedules in multi‑step image generators with a universal, learnable policy network trained via reinforcement learning, using an MDP formulation, adversarial rewards and action smoothing, achieving consistent quality and efficiency gains across diffusion, autoregressive, mask and flow models while adding negligible overhead.

MDP · action smoothing · adaptive policy
11 min read
Machine Heart
Apr 10, 2026 · Artificial Intelligence

Keeping Image Quality with Only 20 Diffusion Steps: The TC‑Padé Acceleration Method

TC‑Padé uses a Padé‑based residual prediction framework, step‑aware strategies, and a trajectory‑stability indicator to accelerate diffusion sampling to as few as 20 steps while preserving visual fidelity, achieving up to 2.88× speed‑up on image generation and 1.72× on video generation.

Inference Acceleration · Padé approximation · TC-Padé
12 min read
Machine Heart
Apr 5, 2026 · Artificial Intelligence

GPT-Image-2 Leak Sparks Fear That Nano Banana Pro Is About to Be Dethroned

A leaked GPT-Image-2 model, tested under codenames like maskingtape-alpha, shows dramatically improved text rendering, world‑knowledge understanding and image editing that many claim surpasses Google’s Nano Banana Pro, prompting a perceived paradigm shift in multimodal AI generation.

AI model comparison · GPT Image 2 · Multimodal AI
5 min read
SuanNi
Apr 3, 2026 · Artificial Intelligence

How GEMS Lets a 6B Open‑Source Model Beat Top Closed‑Source Image Generators

The article presents the GEMS (Agent‑Native Multimodal Generation with Memory and Skills) framework, detailing its multi‑agent loop, hierarchical memory compression, on‑demand skill modules, and extensive benchmark results that show a lightweight 6B model surpassing larger proprietary systems on complex image‑generation tasks.

GEMS · Multimodal AI · Skill Library
14 min read
vivo Internet Technology
Apr 1, 2026 · Artificial Intelligence

Why Fixed CFG Fails and How Time‑Adaptive C²FG Boosts Diffusion Image Generation

This article introduces C²FG, a training‑free, plug‑and‑play time‑adaptive exponential control function that replaces the fixed classifier‑free guidance scale, theoretically justifies its superiority with score discrepancy bounds, and demonstrates significant FID and IS improvements across multiple diffusion architectures on ImageNet.

CVPR 2026 · Classifier-Free Guidance · Plug-and-Play
7 min read
NiuNiu MaTe
Mar 30, 2026 · Artificial Intelligence

Which AI Drawing Tool Is Best for Scientific Comics? A Hands‑On Comparison

This article reviews three AI‑powered illustration platforms—Banana Painter, NotebookLM, and JiMeng AI—detailing their free and paid plans, usage methods, visual styles, strengths and weaknesses, and provides a side‑by‑side comparison table to help creators choose the most suitable tool for scientific comic production.

AI · Tool comparison · illustration
7 min read
AIWalker
Mar 20, 2026 · Artificial Intelligence

Plug‑and‑Play reAR Boosts Visual AR to SOTA Quality with Only 177M Parameters

The paper introduces reAR, a plug‑and‑play regularization framework that aligns generator and tokenizer representations in visual autoregressive models, dramatically improving image quality and matching large diffusion models while using far fewer parameters, and validates the approach with extensive experiments, ablations, and scalability analysis.

AI research · Regularization · image generation
20 min read
Design Hub
Mar 19, 2026 · Artificial Intelligence

Why This AI-Generated Image Post Gets Saved: A Deep Dive into Its Prompt Structure

By dissecting a popular AI‑generated image series, the article reveals how a carefully ordered prompt—detailing character, action, setting, lighting, lens effects, and emotion—creates a reusable template that transforms simple tags into vivid, collectible visual narratives.

AI prompt · image generation · lighting
10 min read
AIWalker
Mar 17, 2026 · Artificial Intelligence

How a 4B-Parameter Open-Source Model Outperforms 14B Multimodal Giants

InternVL-U, a 4‑billion‑parameter unified multimodal model released as open source, combines a 2B MLLM backbone with a 1.7B visual generation head and, through a reasoning‑centric data pipeline and Chain‑of‑Thought guidance, achieves superior understanding, generation, and editing performance that surpasses much larger 14‑20B models on multiple benchmarks.

AI research · InternVL-U · image generation
22 min read
AIWalker
Mar 12, 2026 · Artificial Intelligence

Mind-Brush: ‘Think‑Research‑Create’ Intent Reasoning for Image Generation

Mind-Brush introduces a ‘think‑research‑create’ agentic framework that unifies intent analysis, multimodal evidence retrieval, and knowledge‑driven reasoning to transform text‑to‑image generation from static decoding into an active cognitive workflow, achieving large accuracy gains on the new Mind‑Bench benchmark and surpassing existing SOTA models.

Agentic AI · Benchmark · Mind-Brush
15 min read
Machine Learning Algorithms & Natural Language Processing
Mar 6, 2026 · Artificial Intelligence

15‑Person Overseas Chinese Team Builds Uni‑1, a Unified Image Model Surpassing Nano Banana

The article reviews Uni‑1, a decoder‑only transformer that unifies visual understanding and generation, details its architecture, benchmark superiority on RISEBench and ODinW‑13, showcases diverse visual examples where it outperforms GPT Image 1.5 and Nano Banana Pro, and highlights the small elite team behind the breakthrough.

AI research · Luma AI · Multimodal AI
14 min read
DataFunTalk
Feb 27, 2026 · Artificial Intelligence

Google’s Nano Banana 2: Turning Image Generation into a Scalable Creation Engine

Google’s Nano Banana 2 (Gemini 3.1 Flash Image) upgrades image generation with real‑time web knowledge, clearer text rendering, consistent character/object handling, and broad product integration, positioning the model as a fast, configurable rendering engine rather than a niche creative tool.

AI models · Gemini · Google AI
9 min read
Machine Learning Algorithms & Natural Language Processing
Feb 14, 2026 · Artificial Intelligence

Latent Forcing: Reordering Diffusion Steps Boosts Pixel‑Level Image Quality

The new Latent Forcing technique from Fei‑Fei Li’s team reorders the diffusion trajectory, first generating a latent structural sketch and then refining pixel details, which restores the efficiency of latent‑space models while preserving 100% pixel fidelity and achieves state‑of‑the‑art FID scores on ImageNet‑256.

AI research · ImageNet · diffusion models
6 min read
AI Engineering
Jan 28, 2026 · Artificial Intelligence

Alibaba Tongyi Unveils Z-Image Non‑Distilled Base Model with Full CFG and Negative Prompt Support

Alibaba's Tongyi releases the Z-Image base model, a non‑distilled diffusion transformer that supports full classifier‑free guidance, negative prompts, higher diversity, and fine‑tuning, contrasting with the faster Turbo variant and providing detailed usage instructions and community resources.

Alibaba · Classifier-Free Guidance · Diffusion Transformer
4 min read
Woodpecker Software Testing
Jan 27, 2026 · Artificial Intelligence

How to Build a Multimodal AI Assistant with FastAPI, Alibaba Cloud and DashScope

This guide walks through configuring Alibaba Cloud credentials, implementing a FastAPI backend with email function calling, Alibaba OpenSearch, image generation via DashScope, speech recognition, and a responsive HTML/CSS/JavaScript front‑end that supports text chat, image recognition, image synthesis, and voice interaction.

Alibaba Cloud · Dashscope · FastAPI
38 min read
AI Insight Log
Jan 22, 2026 · Artificial Intelligence

Cursor 2.4 Adds Subagents—AI Becomes a Project Manager and Generates UI Mockups Instantly

Cursor 2.4 launches with Subagents that enable parallel, specialized AI assistants, improving context handling and speed, plus a Google‑powered image generator for UI mockups, an AI‑enhanced blame feature for code attribution, proactive clarification questions, and numerous performance upgrades such as a ten‑fold faster built‑in browser and 40× faster hooks.

AI · Code Attribution · Cursor
8 min read
Tencent Cloud Developer
Jan 14, 2026 · Artificial Intelligence

Turn Simple Text into Detailed AI Image Prompts: A Step‑by‑Step Guide

This guide explains how to use advanced AI models such as Gemini, Midjourney, and Stable Diffusion to expand brief, informal user descriptions into comprehensive, high‑quality English prompts that include visual style, subject details, environment, lighting, and camera parameters for image or video generation.

AI prompt engineering · Midjourney · Prompt Design
14 min read
Design Hub
Dec 28, 2025 · Artificial Intelligence

AI‑Assisted Design: Prompt Recipes for Nano Banana Pro E‑Commerce Ads

The article showcases a series of AI‑generated ultra‑realistic commercial visuals, breaks down the exact prompt language behind each, and explains the design insights that turn imaginative concepts into high‑impact advertising imagery for products like Nano Banana Pro.

AI · Prompt engineering · commercial advertising
10 min read
AI Algorithm Path
Dec 17, 2025 · Artificial Intelligence

Flux.2 Max Unveiled: Black Forest Labs’ Most Powerful Image Generation Model

Black Forest Labs released Flux.2 Max, the top‑performing model in the Flux.2 series featuring real‑time context generation, superior texture handling, and strong instruction following, ranking second on the Artificial Analysis leaderboard, with detailed examples, API usage, and pricing information provided.

AI model · API · Benchmark
11 min read
Design Hub
Dec 9, 2025 · Artificial Intelligence

AI Frontiers: GLM‑4.6V, AutoGLM 2.0 & RealGen for Designers & Developers

The article reviews three recent AI breakthroughs—GLM‑4.6V’s multimodal large‑model with 128K context and native function calling, AutoGLM 2.0’s open‑source mobile‑operating AI agent, and RealGen’s detector‑rewarded image generator that achieves a 50.15% realism win rate—highlighting how they expand toolkits for designers and developers.

AI agents · AutoGLM · GLM-4.6V
11 min read
Kuaishou Tech
Dec 3, 2025 · Artificial Intelligence

Can Diffusion Models Be Their Own Reward Model? Latent Reward Modeling & Step-Level Preference Optimization

This article presents a novel paradigm—Latent Reward Model (LRM) and Latent Preference Optimization (LPO)—that repurposes diffusion models as noise‑aware latent reward models for step‑level preference optimization, addressing the shortcomings of pixel‑level reward models, introducing multi‑preference consistent filtering, and demonstrating significant performance and efficiency gains on benchmarks such as PickScore and T2I‑CompBench++.

AI Alignment · diffusion models · image generation
9 min read
Kuaishou Tech
Nov 25, 2025 · Artificial Intelligence

How Flow‑GRPO Boosts Image Generation Accuracy to 95% with Online Reinforcement Learning

Flow‑GRPO introduces online reinforcement learning into flow‑matching models by converting deterministic ODE sampling to stochastic SDE sampling and reducing denoising steps, raising SD‑3.5‑Medium's GenEval accuracy from 63% to 95%—surpassing GPT‑4o—and demonstrating strong gains in complex composition, text rendering, and human‑preference alignment across multiple generative tasks.

AI research · Deep Learning · flow matching
8 min read
Kuaishou Tech
Nov 14, 2025 · Artificial Intelligence

How GRPO‑Guard Stops Over‑Optimization in Flow‑Based Visual Generators

This article explains the over‑optimization problem in GRPO‑based flow models, analyzes why importance‑ratio clipping fails, and introduces GRPO‑Guard with RatioNorm and cross‑step gradient balancing, showing through extensive experiments that it stabilizes training and improves image quality across multiple diffusion backbones and tasks.

GRPO-Guard · flow matching · generative AI
9 min read
AntTech
Oct 28, 2025 · Artificial Intelligence

Ming-Flash-Omni-Preview: 103B Open-Source Multimodal Model Excelling in Image, Video, and Speech

Introducing Ming‑Flash‑Omni‑Preview, a 103‑billion‑parameter open‑source multimodal model built on a sparse MoE architecture that delivers state‑of‑the‑art performance in controllable image generation, streaming video understanding, and context‑aware speech recognition, surpassing prior models on GenEval and GEdit benchmarks.

image generation · large language model · multimodal
8 min read
Alimama Tech
Oct 22, 2025 · Artificial Intelligence

How Alibaba’s AIGC Model Revolutionizes Virtual Fashion Try‑On

This article details Alibaba’s Taobao Star fashion AIGC model, explaining its data pipeline, captioning strategy, multi‑stage training, and impressive virtual try‑on results for users and merchants, while showcasing model‑based and model‑free generation and pose‑transfer capabilities.

AI · AIGC · Computer Vision
11 min read
Data Party THU
Oct 6, 2025 · Artificial Intelligence

Why Data, Not Architecture, Drives Locality in Diffusion Models

A recent MIT‑Toyota study shows that the locality observed in image diffusion models emerges from the statistical structure of training data rather than from architectural biases, and a simple linear denoiser can replicate this behavior, reshaping how we think about model design.

Data Statistics · U-Net · diffusion models
10 min read
Data Party THU
Aug 31, 2025 · Artificial Intelligence

How Google’s Gemini 2.5 “Nano Banana” Redefines Image Generation and Editing

Google’s Gemini 2.5 Flash model, codenamed “Nano Banana”, dramatically improves visual quality, natural editing, identity consistency, instruction following, and generation speed, while researchers discuss its new metrics, interleaved generation capabilities, comparisons with Imagen, and future directions for smarter, more factual multimodal AI.

AI model · Gemini · image generation
23 min read
JD Cloud Developers
Aug 27, 2025 · Artificial Intelligence

How AI Virtual Try‑On Boosted Fashion Sales by 80%: A Technical Deep‑Dive

This article details how JD.com’s AI‑driven virtual fitting solution, integrated with an A/B testing platform, transformed fashion e‑commerce by generating realistic model images and videos, cutting production costs to zero, accelerating design cycles, and increasing conversion rates by over 80% during major sales events.

A/B testing · AI · Fashion E‑commerce
14 min read
Ops Development & AI Practice
Aug 15, 2025 · Artificial Intelligence

How Google’s Imagen 4 Redefines AI Image Generation: Breakthroughs & Prompt Tips

Google’s Imagen 4 family—Ultra, Standard, and Fast—introduces unprecedented realism, reliable text rendering, multilingual prompts, and higher instruction fidelity, while the article explains each model’s trade‑offs and offers concrete prompt‑engineering techniques to help creators harness this next‑generation AI image generator.

AI · Google · Imagen 4
8 min read
AIWalker
Aug 4, 2025 · Artificial Intelligence

Can Lumina-mGPT 2.0 Replace Diffusion Models? A Deep Dive into Its Autoregressive Power

Lumina-mGPT 2.0 is a decoder‑only, zero‑shot trained autoregressive image model that rivals diffusion systems like DALL·E 3 in quality while offering unified multimodal tokenization, flexible multi‑task generation, and several inference‑speed tricks, yet it still faces licensing, scaling and sampling‑time challenges.

AI model analysis · Inference Optimization · Lumina-mGPT
22 min read
AI Frontier Lectures
Jul 31, 2025 · Artificial Intelligence

Can a 32‑Token Compressor Generate Images Without Training?

This article reviews a recent study that demonstrates how a highly compressed one‑dimensional tokenizer, using only 32 discrete tokens and gradient‑based test‑time optimization, can generate high‑quality images without training a separate generative model, and explores its methodology, findings, applications, and limitations.

1D tokenizer · AI research · TiTok
10 min read
JD Tech
Jul 23, 2025 · Artificial Intelligence

AI Virtual Try‑On Transforms Fashion E‑Commerce, Raising Conversion 80%

JD Retail’s “JingDianDian” AI virtual try‑on platform leverages a 12‑billion‑parameter Flux‑Fill diffusion model and multimodal pose estimation to automatically create realistic model images and videos, integrates with the JingMai A/B testing system, and delivers up to an 80% boost in conversion while cutting production costs and time dramatically.

A/B testing · AI · Fashion Tech
13 min read
Kuaishou Tech
Jul 22, 2025 · Artificial Intelligence

How Orthus Achieves Lossless Multimodal Generation with a Unified Autoregressive Transformer

Orthus, a new unified multimodal model presented at ICML 2025, leverages an autoregressive Transformer backbone with separate language and diffusion heads to enable lossless image‑text interleaved generation, outperforming existing models on both understanding and generation benchmarks while remaining computationally efficient.

AI research · autoregressive transformer · diffusion models
11 min read
JD Retail Technology
Jul 15, 2025 · Artificial Intelligence

How AI Virtual Try‑On Boosted Fashion Sales by 80%: JD’s Innovative Solution

This article details JD Retail Technology’s AI‑driven virtual try‑on system that combines a 12B Flux‑Fill diffusion model with a high‑quality virtual model library and integrates with the JingMai A/B testing platform, cutting production costs to zero, slashing cycle time to half a day, and increasing order conversion rates by over 80% during the 618 shopping festival.

A/B testing · AI · Fashion E‑commerce
13 min read
AI Frontier Lectures
Jul 13, 2025 · Artificial Intelligence

How HarmoniCa Boosts Diffusion Model Speed with Joint Training‑Inference Caching

HarmoniCa, a new feature‑caching framework co‑designed by HKUST, Beihang University, and SenseTime, tackles diffusion model inference bottlenecks by aligning training and inference through Step‑Wise Denoising Training and an Image Error Proxy Objective, achieving up to 2× speedup while preserving image quality.

diffusion models · feature caching · image generation
9 min read
Amap Tech
Jul 11, 2025 · Artificial Intelligence

Unified Self‑Supervised Pretraining Boosts Image Generation and Understanding

The USP framework introduces masked latent modeling within a VAE space to pretrain ViT encoders, enabling seamless weight transfer to both image classification and diffusion‑based generation tasks, dramatically accelerating training while preserving strong performance across multiple benchmarks.

Vision Transformer · diffusion models · image generation
10 min read
JD Tech Talk
Jul 4, 2025 · Artificial Intelligence

How AI‑Driven Virtual Try‑On Boosted Fashion Sales by 80%

This article details how JD.com’s AI-powered virtual try‑on system, integrated with the Jingmai A/B testing platform, transformed fashion e‑commerce by generating realistic model images and videos, reducing production costs to near zero, cutting design cycles from weeks to hours, and increasing conversion rates by over 80% during major sales events.

A/B testing · AI · AIGC
14 min read
Kuaishou Large Model
Jul 3, 2025 · Artificial Intelligence

How EvoSearch Boosts Image & Video Generation with Test‑Time Evolutionary Search

The EvoSearch method introduced by HKUST and Kuaishou’s KuaLing team leverages test‑time scaling to dramatically improve diffusion‑based image and video generation without training, using evolutionary search along the denoising trajectory, achieving state‑of‑the‑art results on SD2.1, Flux‑1‑dev and other models.

Test-Time Scaling · Video Generation · diffusion models
8 min read
Kuaishou Tech
Jul 2, 2025 · Artificial Intelligence

How EvoSearch Supercharges Image and Video Generation with Test‑Time Evolutionary Search

EvoSearch, a test‑time evolutionary search method, dramatically improves image and video generation by increasing inference compute without extra training, outperforming existing scaling techniques on diffusion and flow models while maintaining robustness and diversity across multiple benchmarks.

AI research · Test-Time Scaling · Video Generation
8 min read
AI Frontier Lectures
Jun 9, 2025 · Artificial Intelligence

How DiSA Accelerates Autoregressive Image Generation with Diffusion Step Annealing

The article introduces DiSA, a training‑free diffusion step annealing technique that dramatically speeds up autoregressive image generation by reducing diffusion steps in later generation phases while preserving high visual quality, and validates the method across several state‑of‑the‑art AR‑Diffusion models.

AI research · DiSA · autoregressive
16 min read
AI Frontier Lectures
May 19, 2025 · Artificial Intelligence

DreamO: Multi‑Condition Image Customization with a 400M Flux‑Based Model

DreamO, a collaborative effort by ByteDance and Peking University, introduces a unified 400M‑parameter framework built on Flux‑1.0‑dev that enables simultaneous control of identity, style, appearance, and virtual try‑on, offering open‑source, low‑cost, and fast image customization comparable to commercial large models.

AI research · DreamO · Flux model
0 likes · 6 min read
AI Algorithm Path
May 15, 2025 · Artificial Intelligence

Understanding Diffusion Models: Core Principles Explained

This article explains the fundamental principles of diffusion models, using physics and machine‑learning analogies to describe forward and reverse diffusion, the role of Gaussian noise, iteration trade‑offs, U‑Net architecture, and shared‑weight training for image generation.

U-Net · diffusion models · forward diffusion
0 likes · 8 min read
Baidu MEUX
Apr 28, 2025 · Artificial Intelligence

Top 10 AI Model Breakthroughs of 2024: From ChatGPT‑4o to 3D Digital Humans

This article surveys the latest AI breakthroughs, covering ChatGPT‑4o's native image generation, Runway's Gen‑4 video model, Midjourney V7, AnimeGamer's infinite anime simulation, JiMeng 3.0 poster creator, ComfyUI‑Copilot workflow assistant, DomoAI's voice‑image digital humans, Ready AI web builder, DeepSeek‑V3, and Alibaba's ultra‑realistic 3D digital human model.

AI · Video Generation · digital humans
0 likes · 8 min read
DevOps
Apr 13, 2025 · Artificial Intelligence

The Amazing Magic of GPT‑4o and a Speculative Technical Roadmap

This article reviews the breakthrough image‑generation capabilities of GPT‑4o, showcases diverse examples, and offers a detailed speculation on its underlying autoregressive architecture, tokenization methods, VQ‑VAE/GAN advances, and training strategies that could explain its performance.

AI research · GPT-4o · VQ-VAE
0 likes · 16 min read
Meituan Technology Team
Apr 10, 2025 · Artificial Intelligence

Meituan's 10 Papers at CVPR 2025 and ICLR 2025

This article presents concise summaries of ten selected ICLR 2025 and CVPR 2025 papers covering LLM alignment, temporal‑decay DPO, joint‑embedding predictive architecture, 4‑bit quantization, token‑focused VQA, universal visual segmentation, document understanding, fine‑grained spatio‑temporal modeling, visual quality evaluation, and ultra‑high‑resolution diffusion, and also announces face‑to‑face and online sharing sessions hosted by Meituan.

CVPR 2025 · ICLR 2025 · Large Language Model Alignment
0 likes · 19 min read
Ele.me Technology
Apr 10, 2025 · Artificial Intelligence

Ele.me Vertical Business AIGC Image Model: Architecture, Training Pipeline, and Evaluation

Ele.me created a domain-specific AIGC image model built from scratch on its own data using the DiT backbone, a three-stage training pipeline (transformer pre-training, prompt alignment, aesthetic fine-tuning), custom T5‑E‑CLIP text and visual encoders, ControlNet for layout control, and evaluated via FID, CLIP scores and a human rubric, enabling automated dish-image generation and UI asset creation for its vertical business.

AIGC · ControlNet · DiT
0 likes · 8 min read
Beijing SF i-TECH City Technology Team
Apr 7, 2025 · Artificial Intelligence

Common Applications, Tools, and Practical Scenarios of AIGC in Design and Business

This article outlines the rapid growth of AIGC technologies, describes key image‑generation and language models, demonstrates step‑by‑step design workflows, explores user‑experience research enhancements, and envisions future business uses while offering practical tips for mastering AI‑generated content.

AIGC · Design · User experience
0 likes · 8 min read
Nightwalker Tech
Mar 28, 2025 · Artificial Intelligence

Comprehensive Evaluation of GPT-4o Multimodal Image Generation Capabilities

This article presents a thorough assessment of GPT‑4o’s new image generation features, detailing multiple test scenarios—from simple portrait creation and style transfer to UI design, product rendering, and educational illustrations—comparing its output with Claude‑3.7‑Sonnet, highlighting strengths in realism and weaknesses in Chinese text handling.

AI Evaluation · GPT-4o · image generation
0 likes · 16 min read
Rare Earth Juejin Tech Community
Mar 24, 2025 · Artificial Intelligence

AI SDK 4.2 Release: New Reasoning, MCP Client, useChat Message Components, Image Generation, URL Sources, and Provider Updates

The AI SDK 4.2 release introduces powerful new features such as step‑by‑step reasoning support, a Model Context Protocol (MCP) client for tool integration, useChat message components, multimodal image generation, standardized URL sources, OpenAI Responses API support, Svelte 5 compatibility, and numerous middleware and provider enhancements, all illustrated with practical JavaScript/TypeScript examples.

AI SDK · JavaScript · MCP
0 likes · 19 min read
JD Tech Talk
Mar 19, 2025 · Artificial Intelligence

Reliable Advertising Image Generation and Creative Selection Using Multimodal Feedback and MLLM Representations

In 2024, the advertising team introduced a suite of AI‑driven techniques—including a trustworthy feedback network, a large‑scale human‑annotated dataset, multimodal large language model representations, and online ranking architecture upgrades—to dramatically improve the quality, coverage, and personalization of generated ad creatives.

AIGC · Advertising · MLLM
0 likes · 10 min read
JD Cloud Developers
Mar 19, 2025 · Artificial Intelligence

How AIGC Boosts Ad Creative Quality: Trustworthy Image Generation & Selection

2024 saw the advertising team achieve major breakthroughs in AI-generated ad creatives by introducing a multimodal reliable feedback network to improve image usability, releasing a large human-annotated dataset, and leveraging multimodal large language models for richer representation and more effective online/offline creative selection.

AIGC · ad optimization · image generation
0 likes · 10 min read
AIWalker
Mar 17, 2025 · Artificial Intelligence

How UNIFIEDREWARD Breaks Task Boundaries to Boost Image and Video Performance

The paper introduces UNIFIEDREWARD, the first unified reward model for multimodal understanding and generation that supports pairwise ranking and pointwise scoring, builds a 236K human‑preference dataset across image and video tasks, and uses DPO to align VLMs and diffusion models, achieving significant performance gains on both image and video benchmarks.

Direct Preference Optimization · Multimodal AI · Preference Modeling
0 likes · 19 min read
AIWalker
Mar 11, 2025 · Artificial Intelligence

Introducing FAR: A Frequency‑Progressive Autoregressive Paradigm for Image Generation

The paper presents FAR, a frequency‑aware autoregressive framework that predicts image tokens from low‑frequency to high‑frequency components using a continuous tokenizer, and demonstrates its efficiency and quality on ImageNet and text‑to‑image benchmarks compared with existing AR and VAR methods.

AI research · Autoregressive Models · FAR
0 likes · 20 min read
AIWalker
Mar 10, 2025 · Artificial Intelligence

FlexVAR: Autoregressive Image Generation with Inpainting and Speed‑Quality Control

FlexVAR replaces residual prediction with direct ground‑truth prediction in visual autoregressive modeling, enabling generation of arbitrary resolutions and aspect ratios, supporting image‑to‑image tasks such as inpainting and upscaling, and offering adjustable inference steps that trade speed for quality while achieving state‑of‑the‑art FID scores.

Autoregressive Models · VQVAE · flexvar
0 likes · 17 min read
AIWalker
Mar 5, 2025 · Artificial Intelligence

Attention Distillation in Diffusion Models: CVPR 2025 Technique Outperforms Traditional Image Generation

The paper introduces a novel attention‑distillation loss and a guided‑sampling scheme that together enable diffusion models to faithfully transfer visual features from reference images, dramatically speeding synthesis and surpassing prior plug‑and‑play attention methods across style transfer, text‑to‑image generation, and texture synthesis tasks.

AI research · Style Transfer · attention distillation
0 likes · 15 min read
AI Algorithm Path
Mar 2, 2025 · Artificial Intelligence

Exploring Flux Labs AI’s New Virtual Try‑On Feature

The article reviews Flux Labs AI’s newly added virtual try‑on tool, explaining how AI, machine learning and computer vision enable seamless clothing overlays, outlining its main applications, providing a step‑by‑step usage guide, detailing pricing plans, and sharing the author’s positive impressions of its performance.

AI · Flux Labs · fashion technology
0 likes · 5 min read
AIWalker
Feb 23, 2025 · Artificial Intelligence

U‑ViT: How a ViT‑Based Diffusion Model Beats DiT and Redefines Image Generation

U‑ViT replaces the convolutional U‑Net backbone of diffusion models with a Vision Transformer, treats time, condition and noisy patches as tokens, adds long skip connections and a lightweight 3×3 convolution, and through extensive ablations and scaling studies achieves state‑of‑the‑art FID scores on unconditional, class‑conditional and text‑to‑image generation tasks.

AdaLN · FID · Long Skip Connections
0 likes · 16 min read
AIWalker
Feb 21, 2025 · Artificial Intelligence

DC-ControlNet: Decoupling Control Conditions for More Flexible and Precise Image Generation

DC-ControlNet introduces intra‑ and inter‑element controllers that decouple global conditions into separate content and layout signals, enabling finer‑grained, conflict‑aware control of multi‑condition image generation and achieving higher flexibility and accuracy than traditional ControlNet approaches.

ControlNet · DC-ControlNet · Multi-Condition Control
0 likes · 20 min read
AIWalker
Feb 20, 2025 · Artificial Intelligence

Transfusion: A Single Model for Unified Image Generation and Understanding

Transfusion is a 7B‑parameter transformer that jointly trains language modeling and diffusion losses on mixed text‑image data, enabling seamless text generation, image generation, and image understanding within one model and outperforming prior multimodal approaches such as Chameleon across multiple benchmarks.

AI research · Language Modeling · Transformer
0 likes · 20 min read
AIWalker
Feb 16, 2025 · Artificial Intelligence

VARGPT: A Unified Autoregressive Architecture for Multimodal Understanding and Generation

VARGPT is a novel multimodal large language model that unifies visual understanding and autoregressive image generation within a single architecture, extending LLaVA with next‑token and next‑scale prediction, trained through three staged data‑curated phases and achieving superior performance on numerous vision‑language benchmarks.

AI research · VARGPT · image generation
0 likes · 20 min read
AIWalker
Feb 12, 2025 · Artificial Intelligence

Goku: How HKU and ByteDance’s New Model Sets New Benchmarks in Commercial Image and Video Generation

The paper presents Goku, a rectified‑flow transformer that jointly generates high‑quality images and videos at commercial scale, detailing its novel architecture, massive high‑quality data pipeline, efficient large‑scale training tricks, and state‑of‑the‑art results on GenEval, DPG‑Bench, VBench and UCF‑101.

Large-Scale Training · Multimodal AI · Video Generation
0 likes · 29 min read
AIWalker
Feb 4, 2025 · Artificial Intelligence

How Chain‑of‑Thought Boosts Text‑to‑Image Generation: The New o1 Inference Scheme

This article reviews a comprehensive study that applies Chain‑of‑Thought reasoning to autoregressive text‑to‑image generation, introducing extended test‑time computation, direct preference optimization, and two custom reward models (PARM and PARM++) that together improve generation quality by up to 15% over Stable Diffusion 3.

Direct Preference Optimization · Inference · Multimodal AI
0 likes · 13 min read
Code Mala Tang
Jan 30, 2025 · Artificial Intelligence

Is Janus-Pro the Open‑Source Rival to DALL·E 3? A Deep Dive Review

This article reviews DeepSeek's Janus‑Pro image model, explains its multimodal architecture, benchmarks it against DALL·E 3 and Stable Diffusion, provides usage instructions and inference code, and offers a critical assessment of its image quality and practical limitations.

AI model · Benchmark · Janus-Pro
0 likes · 12 min read
AIWalker
Jan 21, 2025 · Artificial Intelligence

PKU Introduces Next Patch Prediction for Image Generation, Cutting Training Cost to ~0.6×

The paper proposes a Next Patch Prediction (NPP) paradigm that groups image tokens into high‑density patches, enabling autoregressive models to predict patches instead of individual tokens, which reduces training cost to about 0.6× and improves ImageNet FID scores by up to 1.0 across models ranging from 100 M to 1.4 B parameters.

Autoregressive Models · FID improvement · LlamaGen
0 likes · 10 min read
DaTaobao Tech
Dec 16, 2024 · Artificial Intelligence

Reference Image Generation for Subject‑Driven Diffusion

This work presents a subject‑driven diffusion pipeline that injects multi‑scale reference features (ReferenceNet‑style) into high‑fidelity backbones such as SD‑XL and Flux, enabling zero‑shot, fine‑grained product consistency across diverse scenes and outperforming current fine‑tuned and zero‑shot methods while noting limits in category coverage and human interactions.

AI · DreamBooth · IP-Adapter
0 likes · 9 min read
DataFunTalk
Dec 5, 2024 · Artificial Intelligence

VAR: Scalable Image Generation via Next‑Scale Prediction Wins NeurIPS 2024 Best Paper

The VAR model, a Visual AutoRegressive framework that introduces a novel multi‑scale “next‑scale prediction” paradigm, dramatically improves image generation efficiency and quality, surpasses diffusion models, validates scaling laws in vision, and earned the Best Paper award at NeurIPS 2024.

Autoregressive Models · NeurIPS2024 · image generation
0 likes · 7 min read
Alimama Tech
Nov 27, 2024 · Artificial Intelligence

FlowDCN: Efficient Arbitrary-Resolution Image Generation via Groupwise Multi‑Scale Deformable Convolution

FlowDCN introduces Groupwise‑MSDCN, a sparse deformable convolution that replaces attention, enabling efficient arbitrary‑resolution image generation with linear complexity, fewer parameters and FLOPs, and achieving state‑of‑the‑art FID scores on ImageNet while requiring far fewer training steps.

Deformable Convolution · arbitrary resolution · diffusion models
0 likes · 12 min read
JD Tech
Nov 15, 2024 · Artificial Intelligence

Reliable Feedback Network (RFNet) for Improving Usable Advertising Image Generation

The paper proposes a multimodal Reliable Feedback Network (RFNet) and a consistency‑regularized fine‑tuning method (RFFT) that dramatically increase the proportion of usable advertising images generated by diffusion models while preserving visual appeal, and introduces the large‑scale RF1M dataset for training and evaluation.

RFNet · advertising images · diffusion models
0 likes · 9 min read
JD Cloud Developers
Nov 14, 2024 · Artificial Intelligence

Boosting Advertising Image Generation Reliability with Human Feedback

This paper presents a multimodal Reliable Feedback Network (RFNet) and a consistency regularization method that use human feedback to dramatically improve the usability and visual quality of automatically generated e‑commerce advertising images while reducing manual inspection costs.

AI · Human Feedback · Reliability
0 likes · 9 min read
360 Tech Engineering
Oct 31, 2024 · Artificial Intelligence

HiCo: Hierarchical Controllable Diffusion Model for Layout-to-Image Generation

The paper introduces HiCo, a hierarchical controllable diffusion model that enables precise layout‑to‑image generation by decoupling object and background features through weight‑shared branches and a fusion module, achieving high‑quality results and efficient inference as demonstrated on the HiCo‑7K benchmark.

AI painting · HiCo · NeurIPS2024
0 likes · 9 min read
DataFunSummit
Oct 10, 2024 · Artificial Intelligence

AIGC‑Assisted Marketing Material Generation at Shujia Technology

This article describes Shujia Technology's use of artificial intelligence to generate marketing images and videos, outlining the background, challenges of high-volume content production, detailed solutions for image and video assets—including layout models, diffusion models, and digital human synthesis—and future research directions.

AIGC · Digital Human · Marketing
0 likes · 12 min read
DaTaobao Tech
Sep 25, 2024 · Artificial Intelligence

Consistent Style Generation in AIGC: Style Aligned and Story Diffusion

The article reviews two AIGC techniques—Style Aligned, which shares self‑attention across a batch to keep style consistent, and Story Diffusion, which uses a training‑free Consistent Self‑Attention module followed by a transformer to generate coherent image sequences—showing promising results in home‑decoration scenarios while noting remaining challenges in fine‑grained spatial and detail alignment.

AI · AIGC · Consistent Self-Attention
0 likes · 5 min read
58UXD
Sep 24, 2024 · Artificial Intelligence

Unlock AI-Powered Icon Design: Techniques, Parameters, and Real-World Examples

This guide explores how AI can streamline icon creation by explaining key parameters like image weight, style reference, and content reference, showcasing step‑by‑step workflows and real examples that demonstrate customized, high‑quality icon production for web and app design.

AI icon design · Midjourney parameters · content reference
0 likes · 7 min read
Xiaohongshu Tech REDtech
Sep 19, 2024 · Artificial Intelligence

Target-Driven Distillation (TDD): A Multi‑Goal Distillation Method for Accelerating Diffusion Models

Target‑Driven Distillation (TDD) is a multi‑goal distillation method that flexibly selects short‑range target steps and decouples guidance during training, enabling 4‑to‑8‑step diffusion generation that preserves high‑resolution detail, works with LoRA, ControlNet, InstantID, and outperforms existing consistency distillation techniques in speed and quality.

AI acceleration · Distillation · diffusion models
0 likes · 9 min read
Qunar Tech Salon
Aug 8, 2024 · Backend Development

Using Satori and Resvg (or Sharp) for Efficient Backend Image Generation: Architecture, Implementation, and Optimizations

This article examines various image‑generation approaches, compares web‑frontend, client‑side, and backend methods, introduces a new Node‑backend solution based on Satori to convert HTML to SVG and then to PNG with Resvg (later Sharp), and details performance and memory optimizations that dramatically improve speed, resource usage, and stability for large‑scale image‑service deployments.

Node.js · Resvg · Satori
0 likes · 14 min read