Tagged articles

image generation

168 articles · Page 1 of 2

Jul 1, 2026 · Artificial Intelligence

Beyond One-Word Prompts: How the Open-Source GenEvolve Agent Uses Tool Orchestration for Image Generation

GenEvolve, an open-source self-evolving image-generation agent, orchestrates search, image retrieval, and knowledge tools into a prompt-reference program, handling knowledge-anchored and quality-anchored tasks; experiments show it outperforms baseline generators on both standard and strong renderers, with open data and code released.

Agentic AI · GenEvolve · benchmark

0 likes · 9 min read

Machine Heart

Jun 25, 2026 · Artificial Intelligence

Do AI-Generated Images Invert Aesthetic Preferences? ICML 2026 Spotlight

The ICML 2026 spotlight paper argues that universal aesthetic alignment in image‑generation models narrows artistic expression, presents six interrelated concerns, and demonstrates through extensive prompts and benchmark tests that reward models and aligned generators stubbornly favor homogenized, overly positive imagery while failing to honor anti‑aesthetic or negative‑emotion requests.

AI-generated art · ICML 2026 · aesthetic alignment

0 likes · 13 min read

CodeTrend

Jun 12, 2026 · Artificial Intelligence

Vision Banana: Turning Image Generation Models into Generalist Vision Learners

Vision Banana shows that large‑scale image‑generation models can be instruction‑tuned to perform zero‑shot visual‑understanding tasks such as semantic segmentation, instance segmentation, and depth and normal estimation, matching or surpassing specialist SOTA results while preserving their original generative capabilities.

Instruction Tuning · RGB encoding · Vision Banana

0 likes · 32 min read

Machine Heart

Jun 7, 2026 · Artificial Intelligence

Why Is ChatGPT Generating Bizarre Images? A Prompt‑Injection Case Study

A recent investigation shows that when given a deceptive prompt asking it to "restore" a non‑existent photo, ChatGPT produces surreal, sometimes disturbing images, revealing a jailbreak‑style vulnerability and highlighting safety‑check trade‑offs.

AI safety · ChatGPT · image generation

0 likes · 4 min read

James' Growth Diary

Jun 6, 2026 · Artificial Intelligence

Master GPT‑Image‑2: Multi‑Round Iteration, Local Editing, Batch Generation, Reference Images

This guide explains how to unlock GPT‑Image‑2’s four advanced capabilities—multi‑round iteration, natural‑language local editing, multi‑image generation, and reference‑image mode—by showing concrete prompts, code snippets, best‑practice formulas, performance data, and common pitfalls to avoid.

GPT Image 2 · Prompt Engineering · batch generation

0 likes · 15 min read

James' Growth Diary

Jun 1, 2026 · Artificial Intelligence

Ready‑to‑Use Prompt Templates for Social Media Covers, E‑commerce Images, and Logos

This article provides ready‑to‑copy AI prompt templates for four common visual scenarios—social‑media cover images, e‑commerce product main images, marketing posters, and logos—detailing dimensions, style guidelines, and specific element descriptions so you can generate appropriate graphics instantly.

AI prompts · Social Media · e-commerce

0 likes · 13 min read

Design Hub

May 30, 2026 · Artificial Intelligence

5 Proven GPT‑Image‑2 Prompt Templates for E‑Commerce Visuals

The article breaks down five practical GPT‑Image‑2 prompts for e‑commerce graphics, explains the underlying four‑step structure (scenario, protagonist, material, and typography with constraints), and provides reusable templates that turn raw style words into commercially viable visual assets.

AI design · GPT Image 2 · Prompt Engineering

0 likes · 16 min read

SuanNi

May 28, 2026 · Artificial Intelligence

How a 3.8B Model Beats 6B+ Models Using Just 20% of the Compute – Inside Microsoft Lens

Microsoft’s Lens team shows that a 3.8B‑parameter image‑generation model can match or surpass 6B‑plus models while consuming only about 19% of the GPU compute, thanks to aggressive model compression, dense captioning, mixed‑resolution training, optimized VAE and language encoders, and targeted RL fine‑tuning.

Benchmarking · Model Efficiency · dense captioning

0 likes · 14 min read

JD Tech Talk

May 26, 2026 · Artificial Intelligence

AI Powers Cross‑Border Growth: JD Oxygen Vision Image‑Set Generation Practice & Outlook

The article examines JD’s Oxygen Vision AI solution for cross‑border e‑commerce, detailing how automated product‑image set generation tackles high costs, slow turnaround, multilingual and platform compliance challenges, delivers up to 90% time and cost reductions, and outlines future multimodal, personalization, and ecosystem expansions.

AI · Automation · cross‑border e‑commerce

0 likes · 15 min read

Machine Heart

May 25, 2026 · Artificial Intelligence

Breaking the Reward Trade‑off: Flow‑OPD Brings Multi‑Teacher OPD to Image Generation

Flow‑OPD introduces on‑policy distillation into flow‑matching diffusion models, using a multi‑teacher online rollout framework and manifold‑anchor regularization to resolve the seesaw effect of single and mixed rewards, achieving superior multi‑task performance and surpassing specialist models in image generation.

Diffusion Models · Flow-OPD · Manifold Anchor Regularization

0 likes · 9 min read

SuanNi

May 22, 2026 · Artificial Intelligence

All‑In‑One Image & Video: ByteDance’s Deployable Native Multimodal Model Lance

Lance, ByteDance’s newly open‑sourced 3‑billion‑parameter multimodal model, runs on a single 40 GB GPU, tops the Hugging Face trending charts, and achieves leading scores on DPG Bench, GenEval, and video generation benchmarks while surpassing several state‑of‑the‑art single‑modal models.

AI research · ByteDance · Lance

0 likes · 3 min read

Machine Heart

May 21, 2026 · Artificial Intelligence

RAEv2: How a Simple Extra Operation Makes Image Generation Train Ten Times Faster

The RAEv2 framework replaces traditional VAEs by summing multiple layers of pretrained vision encoders, combines RAE with REPA for complementary semantic and spatial gains, and leverages free guidance, achieving up to ten‑fold faster convergence, higher image quality, and lower compute on ImageNet‑256 diffusion training.

Diffusion Models · RAEv2 · Representation Autoencoder

0 likes · 11 min read

JD Tech

May 20, 2026 · Artificial Intelligence

How AI Powers Cross‑Border Growth: Inside JD’s Oxygen Vision Image‑Generation Solution

The article details how JD’s Oxygen Vision leverages AI to overhaul cross‑border product visual creation, cutting SKU image‑generation cost and time by over 90%, automating multi‑platform compliance, multilingual localization, and high‑quality output through a three‑step workflow.

AI · Automation · cross‑border e‑commerce

0 likes · 15 min read

Geek Labs

May 20, 2026 · Frontend Development

Two Open‑Source AI Tools to Auto‑Generate HTML Slides and Hand‑Drawn Technical Diagrams

This article introduces two open‑source projects—beautiful‑html‑templates, which lets an AI generate complete HTML slide decks from plain instructions, and ian‑handdrawn‑ppt, which converts articles or outlines into a series of Chinese hand‑drawn style technical illustration images—detailing their features, usage steps, target users, and limitations.

AI · HTML slides · frontend

0 likes · 9 min read

Machine Heart

May 6, 2026 · Artificial Intelligence

Luma’s Uni‑1.1 API Launch: Third‑Place Ranking and Text Rendering Near GPT‑Image 2

Luma released the Uni‑1.1 image‑generation API, which ranks third on the Arena blind‑test leaderboard, offers sub‑half‑price per‑image pricing, and demonstrates production‑grade capabilities such as multi‑reference fusion and multi‑turn editing, built on a decoder‑only transformer that jointly models text and image tokens.

API pricing · Luma · Multimodal AI

0 likes · 13 min read

AI Explorer

Apr 24, 2026 · Artificial Intelligence

Open Generative AI: 200+ Open‑Source Models for Image, Video, and Lip‑Sync Creation

Open Generative AI is an open‑source, MIT‑licensed desktop suite that bundles over 200 cutting‑edge image, video, and lip‑sync models into four dedicated studios, offering unrestricted generation without content filters, subscription fees, or closed ecosystems, and provides online, desktop, and self‑hosted deployment options.

AI media generation · MIT license · Open Generative AI

0 likes · 6 min read

Machine Heart

Apr 24, 2026 · Artificial Intelligence

Generating High‑Resolution Images with Only 64 Tokens: How MacTok Overcomes Posterior Collapse

MacTok introduces semantic masking and dual‑space alignment to prevent posterior collapse in continuous image tokenizers, enabling high‑quality generation with just 64‑128 tokens and achieving strong gFID scores on ImageNet at 256×256 and 512×512 resolutions.

ImageNet · MacTok · continuous tokenizer

0 likes · 9 min read

Machine Heart

Apr 24, 2026 · Artificial Intelligence

Vision Banana Shows That Image Generation Equals Understanding – DeepMind’s GPT‑like Leap

DeepMind’s Vision Banana model demonstrates that large‑scale image‑generation pre‑training can produce powerful, universal visual representations, achieving state‑of‑the‑art results on segmentation, depth, and normal estimation without task‑specific heads, thereby supporting the hypothesis that generation and understanding are fundamentally linked.

DeepMind · Generative AI · Vision Banana

0 likes · 13 min read

Architect's Must-Have

Apr 23, 2026 · Artificial Intelligence

OpenAI Images 2.0 Deep Dive: How AI Image Generation Enters the “Thinking Era”

The article provides a comprehensive technical analysis of OpenAI's ChatGPT Images 2.0 (gpt‑image‑2), detailing its strategic launch, new autoregressive architecture, integrated reasoning and web‑search capabilities, multi‑image consistency, pricing model, competitive landscape, limitations, and future impact on visual AI workflows.

AI Architecture · GPT Image 2 · Multimodal AI

0 likes · 28 min read

Geek Labs

Apr 23, 2026 · Artificial Intelligence

7 Must‑Watch Open‑Source Prompt Libraries for AI Image and Video Generation (2025‑2026)

Amid the rapid rise of prompt engineering in 2025‑2026, this article reviews seven standout open‑source GitHub repositories—covering Nano Banana Pro, GPT‑Image‑2, multi‑model prompts, and video generation—detailing their stars, content structure, multilingual support, and ideal use cases for creators.

AI prompt engineering · GitHub · Nano Banana Pro

0 likes · 14 min read

SuanNi

Apr 22, 2026 · Artificial Intelligence

OpenAI’s ChatGPT Images 2.0: A Leap Ahead in AI‑Generated Visual Design

OpenAI’s newly released ChatGPT Images 2.0 transforms image generation into a full‑featured visual design system, delivering 2K resolution, multilingual text rendering, complex layout handling, and up to eight concurrent images, while also exposing current limitations on tasks such as intricate spatial puzzles.

AI · ChatGPT · image generation

0 likes · 9 min read

IT Services Circle

Apr 22, 2026 · Artificial Intelligence

GPT-Image-2 Launches: How Designers Can Ditch Old‑School Workflows

OpenAI's newly released ChatGPT Images 2.0 (GPT‑Image‑2) lets users generate photorealistic screenshots, posters, and even homework from ultra‑short prompts. It outperforms the previous Nano Banana model, supports 2K resolution and multi‑language input, and is already available via API; the article also covers pricing.

AI model · ChatGPT Images 2.0 · OpenAI

0 likes · 7 min read

DaTaobao Tech

Apr 22, 2026 · Artificial Intelligence

How MNN‑Sana‑Edit‑V2 Brings Comic‑Style Image Editing to Your Phone in 15 Seconds

MNN‑Sana‑Edit‑V2, a collaborative effort between Taobao’s Meta team and Hangzhou University, combines a frozen Qwen3‑0.6B LLM, Learnable Query, Connector, Linear DiT and Deep Compression Autoencoder with 4/8‑bit quantization to run fully on mobile devices, delivering 512×512 comic‑style conversions in about 15 seconds—2.5× faster than cloud alternatives—while providing open‑source code, detailed training stages, and extensive performance benchmarks.

Edge deployment · Model Quantization · diffusion

0 likes · 13 min read

Java Architecture Diary

Apr 22, 2026 · Artificial Intelligence

Why OpenAI’s gpt-image-2 Turns Image Generation into a Practical Tool

OpenAI’s new gpt-image-2 model improves dense Chinese text rendering, follows detailed prompts more reliably, and offers precise edit capabilities, making it suitable for real‑world business graphics such as posters, banners, and dashboards, and the article shows how to integrate it with Spring AI in Java.

AI Editing · GPT Image 2 · Java

0 likes · 7 min read

Design Hub

Apr 15, 2026 · Artificial Intelligence

Overnight AI Shifts: Core Models, Agents, Design Tools, and More

A rapid roundup of today’s AI news shows the industry moving beyond marginal model gains toward lower cost and latency, agents entering task and browser workflows, a rethinking of the design‑to‑code gap, expansion into 3D and the web, and open‑source tools reaching smaller teams.

AI · Agents · Chip Collaboration

0 likes · 8 min read

Machine Heart

Apr 10, 2026 · Artificial Intelligence

AdaGen: Enabling Adaptive, Data‑Driven Strategies for Image Generation Models

AdaGen replaces handcrafted static schedules in multi‑step image generators with a universal, learnable policy network trained via reinforcement learning, using an MDP formulation, adversarial rewards and action smoothing, achieving consistent quality and efficiency gains across diffusion, autoregressive, mask and flow models while adding negligible overhead.

MDP · action smoothing · adaptive policy

0 likes · 11 min read

Machine Heart

Apr 10, 2026 · Artificial Intelligence

Keeping Image Quality with Only 20 Diffusion Steps: The TC‑Padé Acceleration Method

TC‑Padé uses a Padé‑based residual prediction framework, step‑aware strategies, and a trajectory‑stability indicator to accelerate diffusion sampling to as few as 20 steps while preserving visual fidelity, achieving up to 2.88× speed‑up on image generation and 1.72× on video generation.

Padé approximation · TC-Padé · image generation

0 likes · 12 min read

Machine Heart

Apr 9, 2026 · Artificial Intelligence

How TDM‑R1 Boosts Few‑Step Image Generation: GenEval Jumps from 61% to 92% and Beats GPT‑4o

The TDM‑R1 framework introduces a two‑stage reinforcement learning pipeline that lets 4‑step diffusion models achieve a GenEval score of 92%, surpassing 80‑step baselines and GPT‑4o while also fixing instruction compliance, text rendering, and compositional generation issues.

GenEval · OCR improvement · TDM-R1

0 likes · 15 min read

Machine Heart

Apr 5, 2026 · Artificial Intelligence

GPT-Image-2 Leak Sparks Fear That Nano Banana Pro Is About to Be Dethroned

A leaked GPT-Image-2 model, tested under codenames like maskingtape-alpha, shows dramatically improved text rendering, world‑knowledge understanding and image editing that many claim surpasses Google’s Nano Banana Pro, prompting a perceived paradigm shift in multimodal AI generation.

AI model comparison · GPT Image 2 · Multimodal AI

0 likes · 5 min read

SuanNi

Apr 3, 2026 · Artificial Intelligence

How GEMS Lets a 6B Open‑Source Model Beat Top Closed‑Source Image Generators

The article presents the GEMS (Agent‑Native Multimodal Generation with Memory and Skills) framework, detailing its multi‑agent loop, hierarchical memory compression, on‑demand skill modules, and extensive benchmark results that show a lightweight 6B model surpassing larger proprietary systems on complex image‑generation tasks.

GEMS · Memory compression · Multimodal AI

0 likes · 14 min read

vivo Internet Technology

Apr 1, 2026 · Artificial Intelligence

Why Fixed CFG Fails and How Time‑Adaptive C²FG Boosts Diffusion Image Generation

This article introduces C²FG, a training‑free, plug‑and‑play time‑adaptive exponential control function that replaces the fixed classifier‑free guidance scale, theoretically justifies its superiority with score discrepancy bounds, and demonstrates significant FID and IS improvements across multiple diffusion architectures on ImageNet.

CVPR 2026 · Classifier-Free Guidance · Diffusion Models

0 likes · 7 min read

NiuNiu MaTe

Mar 30, 2026 · Artificial Intelligence

Which AI Drawing Tool Is Best for Scientific Comics? A Hands‑On Comparison

This article reviews three AI‑powered illustration platforms—Banana Painter, NotebookLM, and JiMeng AI—detailing their free and paid plans, usage methods, visual styles, strengths and weaknesses, and provides a side‑by‑side comparison table to help creators choose the most suitable tool for scientific comic production.

AI · illustration · image generation

0 likes · 7 min read

AIWalker

Mar 20, 2026 · Artificial Intelligence

Plug‑and‑Play reAR Boosts Visual AR to SOTA Quality with Only 177M Parameters

The paper introduces reAR, a plug‑and‑play regularization framework that aligns generator and tokenizer representations in visual autoregressive models, dramatically improving image quality and matching large diffusion models while using far fewer parameters, and validates the approach with extensive experiments, ablations, and scalability analysis.

AI research · Parameter Efficiency · Regularization

0 likes · 20 min read

Design Hub

Mar 19, 2026 · Artificial Intelligence

Why This AI-Generated Image Post Gets Saved: A Deep Dive into Its Prompt Structure

By dissecting a popular AI‑generated image series, the article reveals how a carefully ordered prompt—detailing character, action, setting, lighting, lens effects, and emotion—creates a reusable template that transforms simple tags into vivid, collectible visual narratives.

AI prompt · image generation · lighting

0 likes · 10 min read

AIWalker

Mar 17, 2026 · Artificial Intelligence

How a 4B-Parameter Open-Source Model Outperforms 14B Multimodal Giants

InternVL-U, a 4‑billion‑parameter unified multimodal model released as open source, combines a 2B MLLM backbone with a 1.7B visual generation head and, through a reasoning‑centric data pipeline and Chain‑of‑Thought guidance, achieves superior understanding, generation, and editing performance that surpasses much larger 14‑20B models on multiple benchmarks.

AI research · InternVL-U · Large Language Model

0 likes · 22 min read

Design Hub

Mar 16, 2026 · Artificial Intelligence

Why Google Flow’s NanoBanana 2 Is the Easiest Way for Beginners to Master AI Portraits

The article explains how Google Flow’s NanoBanana 2 provides a low‑cost, beginner‑friendly workflow for AI portrait generation, emphasizes the shift from vague descriptions to photographic‑language prompts, showcases four distinct style prompts, and offers practical tips for refining results.

AI portrait · Google Flow · NanoBanana 2

0 likes · 18 min read

AIWalker

Mar 12, 2026 · Artificial Intelligence

Mind-Brush: ‘Think‑Research‑Create’ Intent Reasoning for Image Generation

Mind-Brush introduces a ‘think‑research‑create’ agentic framework that unifies intent analysis, multimodal evidence retrieval, and knowledge‑driven reasoning to transform text‑to‑image generation from static decoding into an active cognitive workflow, achieving large accuracy gains on the new Mind‑Bench benchmark and surpassing existing SOTA models.

Agentic AI · Mind-Brush · Multimodal Reasoning

0 likes · 15 min read

Machine Learning Algorithms & Natural Language Processing

Mar 6, 2026 · Artificial Intelligence

15‑Person Overseas Chinese Team Builds Uni‑1, a Unified Image Model Surpassing Nano Banana

The article reviews Uni‑1, a decoder‑only transformer that unifies visual understanding and generation, details its architecture, benchmark superiority on RISEBench and ODinW‑13, showcases diverse visual examples where it outperforms GPT Image 1.5 and Nano Banana Pro, and highlights the small elite team behind the breakthrough.

AI research · Luma AI · Multimodal AI

0 likes · 14 min read

DataFunTalk

Feb 27, 2026 · Artificial Intelligence

Google’s Nano Banana 2: Turning Image Generation into a Scalable Creation Engine

Google’s Nano Banana 2 (Gemini 3.1 Flash Image) upgrades image generation with real‑time web knowledge, clearer text rendering, consistent character/object handling, and broad product integration, positioning the model as a fast, configurable rendering engine rather than a niche creative tool.

AI models · Gemini · Google AI

0 likes · 9 min read

Machine Learning Algorithms & Natural Language Processing

Feb 14, 2026 · Artificial Intelligence

Latent Forcing: Reordering Diffusion Steps Boosts Pixel‑Level Image Quality

The new Latent Forcing technique from Fei‑Fei Li’s team reorders the diffusion trajectory, first generating a latent structural sketch and then refining pixel details, which restores the efficiency of latent‑space models while preserving 100% pixel fidelity, achieving state‑of‑the‑art FID scores on ImageNet‑256.

AI research · Diffusion Models · ImageNet

0 likes · 6 min read

AI Engineering

Jan 28, 2026 · Artificial Intelligence

Alibaba Tongyi Unveils Z-Image Non‑Distilled Base Model with Full CFG and Negative Prompt Support

Alibaba's Tongyi releases the Z-Image base model, a non‑distilled diffusion transformer that supports full classifier‑free guidance, negative prompts, higher diversity, and fine‑tuning, contrasting with the faster Turbo variant and providing detailed usage instructions and community resources.

Alibaba · Classifier-Free Guidance · Negative Prompt

0 likes · 4 min read

Woodpecker Software Testing

Jan 27, 2026 · Artificial Intelligence

How to Build a Multimodal AI Assistant with FastAPI, Alibaba Cloud and DashScope

This guide walks through configuring Alibaba Cloud credentials, implementing a FastAPI backend with email function calling, Alibaba OpenSearch, image generation via DashScope, speech recognition, and a responsive HTML/CSS/JavaScript front‑end that supports text chat, image recognition, image synthesis, and voice interaction.

Alibaba Cloud · DashScope · FastAPI

0 likes · 38 min read

AI Insight Log

Jan 22, 2026 · Artificial Intelligence

Cursor 2.4 Adds Subagents—AI Becomes a Project Manager and Generates UI Mockups Instantly

Cursor 2.4 launches with Subagents that enable parallel, specialized AI assistants, improving context handling and speed, plus a Google‑powered image generator for UI mockups, an AI‑enhanced blame feature for code attribution, proactive clarification questions, and numerous performance upgrades such as a ten‑fold faster built‑in browser and 40× faster hooks.

AI · Code Attribution · Cursor

0 likes · 8 min read

Tencent Cloud Developer

Jan 14, 2026 · Artificial Intelligence

Turn Simple Text into Detailed AI Image Prompts: A Step‑by‑Step Guide

This guide explains how to use advanced AI models such as Gemini, Midjourney, and Stable Diffusion to expand brief, informal user descriptions into comprehensive, high‑quality English prompts that include visual style, subject details, environment, lighting, and camera parameters for image or video generation.

AI prompt engineering · Midjourney · Prompt Design

0 likes · 14 min read

Design Hub

Dec 28, 2025 · Artificial Intelligence

AI‑Assisted Design: Prompt Recipes for Nano Banana Pro E‑Commerce Ads

The article showcases a series of AI‑generated ultra‑realistic commercial visuals, breaks down the exact prompt language behind each, and explains the design insights that turn imaginative concepts into high‑impact advertising imagery for products like Nano Banana Pro.

AI · Prompt Engineering · commercial advertising

0 likes · 10 min read

AI Algorithm Path

Dec 17, 2025 · Artificial Intelligence

Flux.2 Max Unveiled: Black Forest Labs’ Most Powerful Image Generation Model

Black Forest Labs released Flux.2 Max, the top‑performing model in the Flux.2 series featuring real‑time context generation, superior texture handling, and strong instruction following, ranking second on the Artificial Analysis leaderboard, with detailed examples, API usage, and pricing information provided.

AI model · API · Flux.2 Max

0 likes · 11 min read

Design Hub

Dec 9, 2025 · Artificial Intelligence

AI Frontiers: GLM‑4.6V, AutoGLM 2.0 & RealGen for Designers & Developers

The article reviews three recent AI breakthroughs—GLM‑4.6V’s multimodal large‑model with 128K context and native function calling, AutoGLM 2.0’s open‑source mobile‑operating AI agent, and RealGen’s detector‑rewarded image generator that achieves a 50.15% realism win rate—highlighting how they expand toolkits for designers and developers.

AI Agents · AutoGLM · GLM-4.6V

0 likes · 11 min read

Kuaishou Tech

Dec 3, 2025 · Artificial Intelligence

Can Diffusion Models Be Their Own Reward Model? Latent Reward Modeling & Step-Level Preference Optimization

This article presents a novel paradigm—Latent Reward Model (LRM) and Latent Preference Optimization (LPO)—that repurposes diffusion models as noise‑aware latent reward models for step‑level preference optimization, addressing the shortcomings of pixel‑level reward models, introducing multi‑preference consistent filtering, and demonstrating significant performance and efficiency gains on benchmarks such as PickScore and T2I‑CompBench++.

AI alignment · Diffusion Models · Preference Optimization

0 likes · 9 min read

Kuaishou Tech

Nov 25, 2025 · Artificial Intelligence

How Flow‑GRPO Boosts Image Generation Accuracy to 95% with Online Reinforcement Learning

Flow‑GRPO introduces online reinforcement learning into flow‑matching models by converting deterministic ODE sampling to stochastic SDE sampling and reducing denoising steps, raising SD‑3.5‑Medium's GenEval accuracy from 63% to 95%—surpassing GPT‑4o—and demonstrating strong gains in complex composition, text rendering, and human‑preference alignment across multiple generative tasks.

AI research · Online RL · deep learning

0 likes · 8 min read

Kuaishou Tech

Nov 19, 2025 · Artificial Intelligence

Can a Single Number Create a Whole New Visual Style? Inside CoTyle’s Code‑to‑Style Generation

CoTyle introduces a novel open‑source framework that generates unique image styles from a numeric style code, eliminating the need for reference images, lengthy prompts, or LoRA modules, and demonstrates superior style consistency compared to existing solutions like Midjourney.

Generative AI · Transformer · diffusion model

0 likes · 8 min read

Kuaishou Tech

Nov 14, 2025 · Artificial Intelligence

How GRPO‑Guard Stops Over‑Optimization in Flow‑Based Visual Generators

This article explains the over‑optimization problem in GRPO‑based flow models, analyzes why importance‑ratio clipping fails, and introduces GRPO‑Guard with RatioNorm and cross‑step gradient balancing, showing through extensive experiments that it stabilizes training and improves image quality across multiple diffusion backbones and tasks.

GRPO-Guard · Generative AI · flow matching

0 likes · 9 min read

AntTech

Oct 28, 2025 · Artificial Intelligence

Ming-Flash-Omni-Preview: 103B Open-Source Multimodal Model Excelling in Image, Video, and Speech

Introducing Ming‑Flash‑Omni‑Preview, a 103‑billion‑parameter open‑source multimodal model built on a sparse MoE architecture that delivers state‑of‑the‑art performance in controllable image generation, streaming video understanding, and context‑aware speech recognition, surpassing prior models on GenEval and GEdit benchmarks.

Large Language Model · Multimodal · Sparse MoE

0 likes · 8 min read

Alimama Tech

Oct 22, 2025 · Artificial Intelligence

How Alibaba’s AIGC Model Revolutionizes Virtual Fashion Try‑On

This article details Alibaba’s Taobao Star fashion AIGC model, explaining its data pipeline, captioning strategy, multi‑stage training, and impressive virtual try‑on results for users and merchants, while showcasing model‑based and model‑free generation and pose‑transfer capabilities.

AI · AIGC · Model Training

0 likes · 11 min read

Data Party THU

Oct 6, 2025 · Artificial Intelligence

Why Data, Not Architecture, Drives Locality in Diffusion Models

A recent MIT‑Toyota study shows that the locality observed in image diffusion models emerges from the statistical structure of training data rather than from architectural biases, and a simple linear denoiser can replicate this behavior, reshaping how we think about model design.

Data Statistics · Diffusion Models · U-Net

0 likes · 10 min read

AIWalker

Sep 17, 2025 · Artificial Intelligence

InfGen Enables Arbitrary-Resolution Image Generation: 4K Images in 7 Seconds, 10× Faster

InfGen introduces a resolution‑agnostic generation paradigm that replaces the VAE decoder in diffusion models, allowing any‑size image synthesis with up to ten‑fold speed gains, achieving 4K outputs in under 7 seconds while improving visual quality.

Diffusion Models · InfGen · arbitrary resolution

0 likes · 15 min read

Data Party THU

Aug 31, 2025 · Artificial Intelligence

How Google’s Gemini 2.5 “Nano Banana” Redefines Image Generation and Editing

Google’s Gemini 2.5 Flash Image model, codenamed “Nano Banana”, dramatically improves visual quality, natural editing, identity consistency, instruction following, and generation speed, while researchers discuss its new metrics, interleaved generation capabilities, comparisons with Imagen, and future directions for smarter, more factual multimodal AI.

AI model · Gemini · Multimodal

0 likes · 23 min read

JD Cloud Developers

Aug 27, 2025 · Artificial Intelligence

How AI Virtual Try‑On Boosted Fashion Sales by 80%: A Technical Deep‑Dive

This article details how JD.com’s AI‑driven virtual fitting solution, integrated with an A/B testing platform, transformed fashion e‑commerce by generating realistic model images and videos, cutting production costs to zero, accelerating design cycles, and increasing conversion rates by over 80% during major sales events.

A/B testing · AI · Fashion E‑commerce

0 likes · 14 min read

Ops Development & AI Practice

Aug 15, 2025 · Artificial Intelligence

How Google’s Imagen 4 Redefines AI Image Generation: Breakthroughs & Prompt Tips

Google’s Imagen 4 family—Ultra, Standard, and Fast—introduces unprecedented realism, reliable text rendering, multilingual prompts, and higher instruction fidelity, while the article explains each model’s trade‑offs and offers concrete prompt‑engineering techniques to help creators harness this next‑generation AI image generator.

AI · Google · Imagen 4

0 likes · 8 min read

AIWalker

Aug 4, 2025 · Artificial Intelligence

Can Lumina-mGPT 2.0 Replace Diffusion Models? A Deep Dive into Its Autoregressive Power

Lumina-mGPT 2.0 is a decoder‑only autoregressive image model trained from scratch that rivals diffusion systems like DALL·E 3 in quality while offering unified multimodal tokenization, flexible multi‑task generation, and several inference‑speed tricks, yet it still faces licensing, scaling and sampling‑time challenges.

AI model analysis · Inference Optimization · Lumina-mGPT

0 likes · 22 min read

AI Frontier Lectures

Jul 31, 2025 · Artificial Intelligence

Can a 32‑Token Compressor Generate Images Without Training?

This article reviews a recent study that demonstrates how a highly compressed one‑dimensional tokenizer, using only 32 discrete tokens and gradient‑based test‑time optimization, can generate high‑quality images without training a separate generative model, and explores its methodology, findings, applications, and limitations.

1D tokenizer · AI research · TiTok

0 likes · 10 min read

JD Tech

Jul 23, 2025 · Artificial Intelligence

AI Virtual Try‑On Transforms Fashion E‑Commerce, Raising Conversion 80%

JD Retail’s “JingDianDian” AI virtual try‑on platform leverages a 12‑billion‑parameter Flux‑Fill diffusion model and multimodal pose estimation to automatically create realistic model images and videos, integrates with the JingMai A/B testing system, and delivers up to an 80% boost in conversion while cutting production costs and time dramatically.

A/B testing · AI · Fashion Tech

0 likes · 13 min read

Kuaishou Tech

Jul 22, 2025 · Artificial Intelligence

How Orthus Achieves Lossless Multimodal Generation with a Unified Autoregressive Transformer

Orthus, a new unified multimodal model presented at ICML 2025, leverages an autoregressive Transformer backbone with separate language and diffusion heads to enable lossless image‑text interleaved generation, outperforming existing models on both understanding and generation benchmarks while remaining computationally efficient.

AI research · Diffusion Models · autoregressive transformer

0 likes · 11 min read

JD Retail Technology

Jul 15, 2025 · Artificial Intelligence

How AI Virtual Try‑On Boosted Fashion Sales by 80%: JD’s Innovative Solution

This article details JD Retail Technology’s AI‑driven virtual try‑on system that combines a 12B Flux‑Fill diffusion model with a high‑quality virtual model library and integrates with the JingMai A/B testing platform, cutting production costs to zero, slashing cycle time to half a day, and increasing order conversion rates by over 80% during the 618 shopping festival.

A/B testing · AI · Fashion E‑commerce

0 likes · 13 min read

AI Frontier Lectures

Jul 13, 2025 · Artificial Intelligence

How HarmoniCa Boosts Diffusion Model Speed with Joint Training‑Inference Caching

HarmoniCa, a new feature‑caching framework co‑designed by HKUST, Beihang University, and SenseTime, tackles diffusion model inference bottlenecks by aligning training and inference through Step‑Wise Denoising Training and an Image Error Proxy Objective, achieving up to 2× speedup while preserving image quality.

Diffusion Models · Performance Acceleration · feature caching

0 likes · 9 min read

Amap Tech

Jul 11, 2025 · Artificial Intelligence

Unified Self‑Supervised Pretraining Accelerates Image Generation and Improves Understanding

The USP framework introduces masked latent modeling within a VAE space to pre‑train ViT encoders, enabling seamless weight transfer to image classification, segmentation, and diffusion‑based generation tasks, dramatically speeding up DiT and SiT models while preserving strong visual representations.

Diffusion Models · VAE · ViT³

0 likes · 13 min read

Amap Tech

Jul 11, 2025 · Artificial Intelligence

Unified Self‑Supervised Pretraining Boosts Image Generation and Understanding

The USP framework introduces masked latent modeling within a VAE space to pretrain ViT encoders, enabling seamless weight transfer to both image classification and diffusion‑based generation tasks, dramatically accelerating training while preserving strong performance across multiple benchmarks.

Diffusion Models · Vision Transformer · image generation

0 likes · 10 min read

JD Tech Talk

Jul 4, 2025 · Artificial Intelligence

How AI‑Driven Virtual Try‑On Boosted Fashion Sales by 80%

This article details how JD.com’s AI-powered virtual try‑on system, integrated with the Jingmai A/B testing platform, transformed fashion e‑commerce by generating realistic model images and videos, reducing production costs to near zero, cutting design cycles from weeks to hours, and increasing conversion rates by over 80% during major sales events.

A/B testing · AI · AIGC

0 likes · 14 min read

Kuaishou Large Model

Jul 3, 2025 · Artificial Intelligence

How EvoSearch Boosts Image & Video Generation with Test‑Time Evolutionary Search

The EvoSearch method, introduced by HKUST and Kuaishou’s Kling team, uses test‑time scaling to improve diffusion‑based image and video generation without additional training. It runs an evolutionary search along the denoising trajectory and achieves state‑of‑the‑art results on SD2.1, FLUX.1-dev and other models.

Diffusion Models · Evolutionary Search · image generation

0 likes · 8 min read

Kuaishou Tech

Jul 2, 2025 · Artificial Intelligence

How EvoSearch Supercharges Image and Video Generation with Test‑Time Evolutionary Search

EvoSearch, a test‑time evolutionary search method, dramatically improves image and video generation by increasing inference compute without extra training, outperforming existing scaling techniques on diffusion and flow models while maintaining robustness and diversity across multiple benchmarks.

AI research · Diffusion Models · Evolutionary Search

0 likes · 8 min read

AI Frontier Lectures

Jun 9, 2025 · Artificial Intelligence

How DiSA Accelerates Autoregressive Image Generation with Diffusion Step Annealing

The article introduces DiSA, a training‑free diffusion step annealing technique that dramatically speeds up autoregressive image generation by reducing diffusion steps in later generation phases while preserving high visual quality, and validates the method across several state‑of‑the‑art AR‑Diffusion models.

AI research · DiSA · autoregressive

0 likes · 16 min read

AI Frontier Lectures

May 19, 2025 · Artificial Intelligence

DreamO: Multi‑Condition Image Customization with a 400M Flux‑Based Model

DreamO, a collaborative effort by ByteDance and Peking University, introduces a unified 400M‑parameter framework built on FLUX.1-dev that enables simultaneous control of identity, style, appearance, and virtual try‑on, offering open‑source, low‑cost, and fast image customization comparable to commercial large models.

AI research · DreamO · Flux model

0 likes · 6 min read

AI Algorithm Path

May 15, 2025 · Artificial Intelligence

Understanding Diffusion Models: Core Principles Explained

This article explains the fundamental principles of diffusion models, using physics and machine‑learning analogies to describe forward and reverse diffusion, the role of Gaussian noise, iteration trade‑offs, U‑Net architecture, and shared‑weight training for image generation.

Diffusion Models · Generative AI · U-Net

0 likes · 8 min read

Baidu MEUX

Apr 28, 2025 · Artificial Intelligence

Top 10 AI Model Breakthroughs of 2024: From ChatGPT‑4o to 3D Digital Humans

This article surveys the latest AI breakthroughs, covering ChatGPT‑4o's native image generation, Runway's Gen‑4 video model, Midjourney V7, AnimeGamer's infinite anime simulation, JiMeng 3.0 poster creator, ComfyUI‑Copilot workflow assistant, DomoAI's voice‑image digital humans, Ready AI web builder, DeepSeek‑V3, and Alibaba's ultra‑realistic 3D digital human model.

AI · digital humans · image generation

0 likes · 8 min read

AI Algorithm Path

Apr 26, 2025 · Artificial Intelligence

OpenAI Launches GPT-Image-1: Bringing ChatGPT‑Style Image Generation to Developers

OpenAI has opened the GPT‑Image‑1 API, a multimodal model that supports both image generation and editing, offers configurable quality, size, and format options, provides JavaScript code samples, outlines token‑based pricing, and is already being integrated by platforms such as Adobe, Canva, and HeyGen.

API · GPT-Image-1 · JavaScript

0 likes · 9 min read

AIWalker

Apr 14, 2025 · Artificial Intelligence

Breaking the Binary: FlexIP Enables Both Identity Preservation and Personalized Editing

FlexIP introduces a dual‑adapter architecture and a dynamic weight‑gating mechanism that decouple identity preservation from personalized editing, allowing continuous control over image generation and outperforming prior SOTA methods in both fidelity and flexibility.

AI · Diffusion Models · dual-adapter

0 likes · 16 min read

DevOps

Apr 13, 2025 · Artificial Intelligence

The Amazing Magic of GPT‑4o and a Speculative Technical Roadmap

This article reviews the breakthrough image‑generation capabilities of GPT‑4o, showcases diverse examples, and offers a detailed speculation on its underlying autoregressive architecture, tokenization methods, VQ‑VAE/GAN advances, and training strategies that could explain its performance.

AI research · GPT-4o · Tokenization

0 likes · 16 min read

Meituan Technology Team

Apr 10, 2025 · Artificial Intelligence

Meituan's 10 Papers at CVPR 2025 and ICLR 2025

This article presents concise summaries of ten selected ICLR 2025 and CVPR 2025 papers covering LLM alignment, temporal‑decay DPO, joint‑embedding predictive architecture, 4‑bit quantization, token‑focused VQA, universal visual segmentation, document understanding, fine‑grained spatio‑temporal modeling, visual quality evaluation, and ultra‑high‑resolution diffusion, and also announces face‑to‑face and online sharing sessions hosted by Meituan.

CVPR 2025 · ICLR 2025 · Large Language Model Alignment

0 likes · 19 min read

Ele.me Technology

Apr 10, 2025 · Artificial Intelligence

Ele.me Vertical Business AIGC Image Model: Architecture, Training Pipeline, and Evaluation

Ele.me created a domain-specific AIGC image model built from scratch on its own data using the DiT backbone, a three-stage training pipeline (transformer pre-training, prompt alignment, aesthetic fine-tuning), custom T5‑E‑CLIP text and visual encoders, ControlNet for layout control, and evaluated via FID, CLIP scores and a human rubric, enabling automated dish-image generation and UI asset creation for its vertical business.

AIGC · ControlNet · DiT

0 likes · 8 min read

Beijing SF i-TECH City Technology Team

Apr 7, 2025 · Artificial Intelligence

Common Applications, Tools, and Practical Scenarios of AIGC in Design and Business

This article outlines the rapid growth of AIGC technologies, describes key image‑generation and language models, demonstrates step‑by‑step design workflows, explores user‑experience research enhancements, and envisions future business uses while offering practical tips for mastering AI‑generated content.

AIGC · artificial-intelligence · creative workflow

0 likes · 8 min read

Alibaba Cloud Developer

Apr 3, 2025 · Artificial Intelligence

Build a Text‑and‑Image Article with Alibaba Cloud AI Custom Plugin in 5 Steps

This tutorial shows how to use Alibaba Cloud's Bailian platform to create a workflow that generates a Xiaohongshu‑style article together with matching images by leveraging a custom large‑model plugin, Python script nodes, and image‑generation tools, complete with step‑by‑step configuration and code examples.

AI · Plugins · Python

0 likes · 14 min read

AI Frontier Lectures

Mar 31, 2025 · Industry Insights

Why GPT‑4o’s Image Generation Is Overwhelming Users—and What It Means for AI

OpenAI’s GPT‑4o image generation, launched only for paid users, quickly hit performance bottlenecks and sparked a flood of viral content, prompting technical analysis of its multimodal capabilities, speed issues, copyright concerns, and the broader impact on the AI industry.

AI industry · AI multimodal · GPT-4o

0 likes · 5 min read

Nightwalker Tech

Mar 28, 2025 · Artificial Intelligence

Comprehensive Evaluation of GPT-4o Multimodal Image Generation Capabilities

This article presents a thorough assessment of GPT‑4o’s new image generation features, detailing multiple test scenarios—from simple portrait creation and style transfer to UI design, product rendering, and educational illustrations—comparing its output with Claude‑3.7‑Sonnet, highlighting strengths in realism and weaknesses in Chinese text handling.

AI evaluation · GPT-4o · Multimodal

0 likes · 16 min read

Rare Earth Juejin Tech Community

Mar 24, 2025 · Artificial Intelligence

AI SDK 4.2 Release: New Reasoning, MCP Client, useChat Message Components, Image Generation, URL Sources, and Provider Updates

The AI SDK 4.2 release introduces powerful new features such as step‑by‑step reasoning support, a Model Context Protocol (MCP) client for tool integration, useChat message components, multimodal image generation, standardized URL sources, OpenAI Responses API support, Svelte 5 compatibility, and numerous middleware and provider enhancements, all illustrated with practical JavaScript/TypeScript examples.

AI SDK · JavaScript · MCP

0 likes · 19 min read

JD Tech Talk

Mar 19, 2025 · Artificial Intelligence

Reliable Advertising Image Generation and Creative Selection Using Multimodal Feedback and MLLM Representations

In 2024, the advertising team introduced a suite of AI‑driven techniques, including a trustworthy feedback network, a large‑scale human‑annotated dataset, multimodal large language model representations, and online ranking architecture upgrades, to improve the quality, coverage, and personalization of generated ad creatives.

AIGC · Advertising · MLLM

0 likes · 10 min read

JD Cloud Developers

Mar 19, 2025 · Artificial Intelligence

How AIGC Boosts Ad Creative Quality: Trustworthy Image Generation & Selection

2024 saw the advertising team achieve major breakthroughs in AI-generated ad creatives by introducing a multimodal reliable feedback network to improve image usability, releasing a large human-annotated dataset, and leveraging multimodal large language models for richer representation and more effective online/offline creative selection.

AIGC · Multimodal · ad optimization

0 likes · 10 min read

AIWalker

Mar 17, 2025 · Artificial Intelligence

How UNIFIEDREWARD Breaks Task Boundaries to Boost Image and Video Performance

The paper introduces UNIFIEDREWARD, the first unified reward model for multimodal understanding and generation that supports pairwise ranking and pointwise scoring, builds a 236K human‑preference dataset across image and video tasks, and uses DPO to align VLMs and diffusion models, achieving significant performance gains on both image and video benchmarks.

Direct Preference Optimization · Multimodal AI · Preference Modeling

0 likes · 19 min read

AIWalker

Mar 11, 2025 · Artificial Intelligence

Introducing FAR: A Frequency‑Progressive Autoregressive Paradigm for Image Generation

The paper presents FAR, a frequency‑aware autoregressive framework that predicts image tokens from low‑frequency to high‑frequency components using a continuous tokenizer, and demonstrates its efficiency and quality on ImageNet and text‑to‑image benchmarks compared with existing AR and VAR methods.

AI research · Autoregressive Models · FAR

0 likes · 20 min read

AIWalker

Mar 10, 2025 · Artificial Intelligence

FlexVAR: Autoregressive Image Generation with Inpainting and Speed‑Quality Control

FlexVAR replaces residual prediction with direct ground‑truth prediction in visual autoregressive modeling, enabling generation of arbitrary resolutions and aspect ratios, supporting image‑to‑image tasks such as inpainting and upscaling, and offering adjustable inference steps that trade speed for quality while achieving state‑of‑the‑art FID scores.

Autoregressive Models · VQVAE · flexvar

0 likes · 17 min read

AIWalker

Mar 5, 2025 · Artificial Intelligence

Attention Distillation in Diffusion Models: CVPR 2025 Technique Outperforms Traditional Image Generation

The paper introduces a novel attention‑distillation loss and a guided‑sampling scheme that together enable diffusion models to faithfully transfer visual features from reference images, dramatically speeding synthesis and surpassing prior plug‑and‑play attention methods across style transfer, text‑to‑image generation, and texture synthesis tasks.

AI research · Diffusion Models · Style Transfer

0 likes · 15 min read

AI Algorithm Path

Mar 2, 2025 · Artificial Intelligence

Exploring Flux Labs AI’s New Virtual Try‑On Feature

The article reviews Flux Labs AI’s newly added virtual try‑on tool, explaining how AI, machine‑learning and computer‑vision enable seamless clothing overlays, outlining its main applications, providing a step‑by‑step usage guide, detailing pricing plans, and sharing the author’s positive performance impressions.

AI · Flux Labs · fashion technology

0 likes · 5 min read

AIWalker

Feb 23, 2025 · Artificial Intelligence

U‑ViT: How a ViT‑Based Diffusion Model Beats DiT and Redefines Image Generation

U‑ViT replaces the convolutional U‑Net backbone of diffusion models with a Vision Transformer, treats time, condition and noisy patches as tokens, adds long skip connections and a lightweight 3×3 convolution, and through extensive ablations and scaling studies achieves state‑of‑the‑art FID scores on unconditional, class‑conditional and text‑to‑image generation tasks.

AdaLN · FID · Long Skip Connections

0 likes · 16 min read

AIWalker

Feb 21, 2025 · Artificial Intelligence

DC-ControlNet: Decoupling Control Conditions for More Flexible and Precise Image Generation

DC-ControlNet introduces intra‑ and inter‑element controllers that decouple global conditions into separate content and layout signals, enabling finer‑grained, conflict‑aware control of multi‑condition image generation and achieving higher flexibility and accuracy than traditional ControlNet approaches.

ControlNet · DC-ControlNet · Diffusion Models

0 likes · 20 min read

AIWalker

Feb 20, 2025 · Artificial Intelligence

Transfusion: A Single Model for Unified Image Generation and Understanding

Transfusion is a 7B‑parameter transformer that jointly trains language modeling and diffusion losses on mixed text‑image data, enabling seamless text generation, image generation, and image understanding within one model and outperforming prior multimodal approaches such as Chameleon across multiple benchmarks.

AI research · Language Modeling · Multimodal

0 likes · 20 min read

Smart Era Software Development

Feb 20, 2025 · Industry Insights

Which Creative AI Tools Are Shaping Multimodal Generative Content in 2024?

The article reviews the most notable open‑source and commercial creative AI tools released in 2024 across image, video, and audio generation, explains key technical shifts such as diffusion Transformers and zero‑shot personalization, and forecasts major trends and new releases expected in 2025.

AI art · Multimodal AI · audio generation

0 likes · 14 min read

AIWalker

Feb 16, 2025 · Artificial Intelligence

VARGPT: A Unified Autoregressive Architecture for Multimodal Understanding and Generation

VARGPT is a novel multimodal large language model that unifies visual understanding and autoregressive image generation within a single architecture, extending LLaVA with next‑token and next‑scale prediction, trained through three staged data‑curated phases and achieving superior performance on numerous vision‑language benchmarks.

AI research · Large Language Model · Multimodal

0 likes · 20 min read

AIWalker

Feb 12, 2025 · Artificial Intelligence

Goku: How HKU and ByteDance’s New Model Sets New Benchmarks in Commercial Image and Video Generation

The paper presents Goku, a rectified‑flow transformer that jointly generates high‑quality images and videos at commercial scale, detailing its novel architecture, massive high‑quality data pipeline, efficient large‑scale training tricks, and state‑of‑the‑art results on GenEval, DPG‑Bench, VBench and UCF‑101.

Large‑Scale Training · Multimodal AI · flow-based models

0 likes · 29 min read

AIWalker

Feb 5, 2025 · Artificial Intelligence

How SANA 1.5’s Efficient Linear Diffusion Transformer Sets a New SOTA in Text‑to‑Image Generation

The paper introduces SANA 1.5, an efficient linear diffusion transformer that scales training and inference compute via model growth, depth‑wise pruning, and inference‑time scaling, achieving a GenEval score of 0.80 and matching larger models while using far less resources.

AI · SANA · diffusion

0 likes · 23 min read

AIWalker

Feb 4, 2025 · Artificial Intelligence

How Chain‑of‑Thought Boosts Text‑to‑Image Generation: The New o1 Inference Scheme

This article reviews a comprehensive study that applies Chain‑of‑Thought reasoning to autoregressive text‑to‑image generation, introducing extended test‑time computation, direct preference optimization, and two custom reward models (PARM and PARM++) that together improve generation quality by up to 15% over Stable Diffusion 3.

Chain-of-Thought · Direct Preference Optimization · Multimodal AI

0 likes · 13 min read

Code Mala Tang

Jan 30, 2025 · Artificial Intelligence

Is Janus-Pro the Open‑Source Rival to DALL·E 3? A Deep Dive Review

This article reviews DeepSeek's Janus‑Pro image model, explains its multimodal architecture, benchmarks it against DALL·E 3 and Stable Diffusion, provides usage instructions and inference code, and offers a critical assessment of its image quality and practical limitations.

AI model · Janus-Pro · benchmark

0 likes · 12 min read

AIWalker

Jan 21, 2025 · Artificial Intelligence

PKU Introduces Next Patch Prediction for Image Generation, Cutting Training Cost to ~0.6×

The paper proposes a Next Patch Prediction (NPP) paradigm that groups image tokens into high‑density patches, enabling autoregressive models to predict patches instead of individual tokens, which reduces training cost to about 0.6× and improves ImageNet FID scores by up to 1.0 across models ranging from 100 M to 1.4 B parameters.

Autoregressive Models · FID improvement · LlamaGen

0 likes · 10 min read
