
Master GPT‑Image‑2: Multi‑Round Iteration, Local Editing, Batch Generation, Reference Images

This guide covers four advanced GPT‑Image‑2 capabilities (multi‑round iteration, natural‑language local editing, multi‑image generation, and reference‑image mode) with concrete prompts, code snippets, best‑practice formulas, performance data, and common pitfalls to avoid.

James' Growth Diary

Jun 6, 2026


01 Multi‑Round Iteration

Unlike DALL‑E 3, GPT‑Image‑2 keeps the previous image within the same session, so you can treat the model like a designer and refine the image round by round.

Basic iteration workflow

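Round 1, base image: A top‑down shot of a white ceramic coffee cup on a dark wooden table, with a few coffee beans beside it, soft side lighting, and shallow depth of field.

Round 2, fine‑tune: Change the cup to matte dark green; keep every other element unchanged.

Round 3, continue: Replace the wooden table with a white marble tabletop; keep the position and number of coffee beans exactly as they are.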

Key point: change only one variable per round, the same way you would isolate a single change when debugging code.
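The article describes iteration inside one session. Over the stateless Images API, the same round‑by‑round workflow can be approximated by feeding each round's output back into `client.images.edit`. The sketch below assumes gpt-image-2 behaves like gpt-image-1 here (it returns base64 data in `b64_json` and accepts the previous image as an edit input); `run_iterations` is a hypothetical helper name, and the client is passed in so the function works with any object exposing `images.generate` and `images.edit`.

```python
import base64


def run_iterations(client, base_prompt, round_prompts, model="gpt-image-2"):
    """Generate a base image, then apply one single-change prompt per round.

    Assumes the model returns base64 data in `b64_json` (as gpt-image-1 does).
    Returns the PNG bytes of every round, base image first.
    """
    # Round 1: the base image.
    result = client.images.generate(model=model, prompt=base_prompt)
    images = [base64.b64decode(result.data[0].b64_json)]

    # Later rounds: edit the most recent image with a prompt that changes one variable.
    for round_no, prompt in enumerate(round_prompts, start=2):
        result = client.images.edit(
            model=model,
            image=(f"round_{round_no - 1}.png", images[-1], "image/png"),
            prompt=prompt,
        )
        images.append(base64.b64decode(result.data[0].b64_json))
    return images
```

With the official SDK this would be called as `run_iterations(OpenAI(), round1_prompt, [round2_prompt, round3_prompt])`, with exactly one change written into each round prompt.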

Three iteration strategies

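Attribute Replacement – change color, material, or text. Example: "Change the title text from blue to red."

Element Add/Remove – add or delete objects or characters. Example: "Add a potted plant on the right side of the frame."

Style Transfer – keep the composition, change the style. Example: "Keep the composition unchanged and switch to a watercolor style."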

Iteration boundaries

When the modification requires a structural change, restart the session:

Completely change composition (e.g., from landscape to portrait).

Change the number of main subjects (e.g., from single person to multiple people).

Drastically alter perspective (e.g., from top‑down to eye‑level).

In one sentence: iteration suits changes that alter attributes but not structure; for structural changes, write a new prompt.
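The iterate‑or‑restart rule can be written down as a small lookup, so that a script driving the loop refuses structural changes. This is a sketch only: the change‑kind labels are hypothetical names chosen for illustration, mapped to the three iteration strategies and the three boundaries listed above.

```python
# Hypothetical labels for the three structural boundaries (restart with a new prompt).
STRUCTURAL_CHANGES = {
    "composition",    # e.g. landscape -> portrait
    "subject_count",  # e.g. one person -> several people
    "perspective",    # e.g. top-down -> eye-level
}
# Hypothetical labels for the three iteration strategies (safe to iterate).
ITERABLE_CHANGES = {"attribute", "add_remove", "style"}


def next_step(change_kind):
    """Return 'iterate' for attribute-level changes, 'restart' for structural ones."""
    if change_kind in STRUCTURAL_CHANGES:
        return "restart"
    if change_kind in ITERABLE_CHANGES:
        return "iterate"
    raise ValueError(f"unknown change kind: {change_kind!r}")
```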

02 Local Editing

Local editing lets you describe the region to change in natural language, avoiding manual masks.

Basic usage (Python SDK)

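import base64
from openai import OpenAI

client = OpenAI()
# Step 1: generate the base image
base = client.images.generate(
    model="gpt-image-2",
    prompt="A modern living room with a light-colored sofa, floor-to-ceiling windows, and a city skyline outside",
)
# GPT-Image models return base64 data in b64_json rather than a URL
# (assumed to hold for gpt-image-2 as it does for gpt-image-1)
base_png = base64.b64decode(base.data[0].b64_json)
# Step 2: local edit; images.edit expects file data, not a URL
edit = client.images.edit(
    model="gpt-image-2",
    image=("base.png", base_png, "image/png"),
    prompt="Change the sofa from light gray to deep emerald green; keep the lighting and all other furniture exactly the same",
)
with open("edited.png", "wb") as f:
    f.write(base64.b64decode(edit.data[0].b64_json))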

Five common editing operations

Add Element

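Add a cup of steaming coffee on the right side of the table, on a wooden coaster, with lighting that matches the existing scene.

Delete Element

Remove the person on the left side of the frame; keep the background intact and fill in the gap.

Replace Element

Replace the painting on the wall with a black‑and‑white architectural photo of a city; keep the frame unchanged.

Color Adjustment

Change the sky from blue to a dusk orange‑purple gradient, and shift the lighting on the ground to warm tones to match.

Style Change

Convert the whole image to a watercolor style with soft brushstrokes and muted colors.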

Control‑Variable Method: 3W1K formula

What to change – specify the object or region (e.g., "the second vase from the left in the image").

What attribute – define the attribute to modify (color, material, shape, text, etc.).

What to keep – list elements that must stay unchanged.

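Key constraint – add strict qualifiers such as "strictly keep" or "everything else unchanged".

Example prompt: "Change only the phone in the model's left hand from black to silver; the clothing texture, background buildings, pose, expression, and lighting direction must all remain strictly unchanged."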
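The 3W1K parts can be assembled mechanically, which makes it harder to forget the keep list. `build_3w1k_prompt` is a hypothetical helper and its English sentence template is only one possible phrasing; the function rejects an empty keep list because that list is what the formula relies on to protect the rest of the image.

```python
def build_3w1k_prompt(target, attribute, old, new, keep, constraint="strictly unchanged"):
    """Assemble a local-edit prompt from the 3W1K parts.

    target     -- What to change: the object or region
    attribute  -- What attribute: color, material, shape, text, ...
    old, new   -- current and desired value of that attribute
    keep       -- What to keep: elements that must not change
    constraint -- Key constraint: the strict qualifier applied to the keep list
    """
    if not keep:
        raise ValueError("3W1K needs a non-empty 'what to keep' list")
    keep_list = ", ".join(keep)
    return (
        f"Change only the {attribute} of {target} from {old} to {new}; "
        f"{keep_list} must all remain {constraint}."
    )
```

For the phone example, pass `"the phone in the model's left hand"`, `"color"`, `"black"`, `"silver"` and the five elements to keep; the result has the same shape as the example prompt above.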

Empirical data (single‑step success rate)

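Outfit change – black T‑shirt → white shirt – 100% success – 3.8 s response.

Furniture restyle – modern sofa → vintage carved sofa – 100% success – 4.2 s response.

UI component change – rounded‑corner button → pill‑shaped button – 100% success – 3.5 s response.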

Data source: internal benchmark, April 2026, 1024×1024 source images, API calls.

03 Multi‑Image Generation

GPT‑Image‑2 can generate up to 8 images per API call, producing distinct variants of the same theme.

Basic multi‑image call

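import base64
from openai import OpenAI

client = OpenAI()
response = client.images.generate(
    model="gpt-image-2",
    prompt="A modern minimalist brand logo whose main shape is an abstract flying bird, in dark blue and gold",
    n=4,
    size="1024x1024",
    quality="medium",
)
# GPT-Image models return base64 data; save each variant to its own file
for i, img in enumerate(response.data, start=1):
    with open(f"logo_{i}.png", "wb") as f:
        f.write(base64.b64decode(img.b64_json))
    print(f"Image {i}: logo_{i}.png")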

Typical scenarios

Brand logo exploration – generate 4‑6 concepts in one request, roughly 3‑4× faster than looping over single‑image requests.

Comic storyboard – generate a set of 6 panels showing a person running at dawn against a city skyline, with a consistent character appearance and a blue‑orange gradient palette. GPT‑Image‑2 keeps characters more consistent than DALL‑E 3, but occasional drift still occurs, so define character traits explicitly.

A/B test assets – produce multiple stylistic variants of a marketing image in one call and select the best performer.

Multi‑image vs. single‑image loop

Speed: n=4 takes 2‑4 s; a loop of 4 × n=1 takes 8‑16 s.

Diversity: controlled by the model for n=4; independently random for the loop.

Cost: identical for both approaches.

Control: n=4 gives weak control, since one prompt covers all four images; the loop allows strong per‑image prompt adjustments.

Recommendation: use n=4 during the exploration phase, then switch to single‑image loops for fine‑tuning.
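The recommended two‑phase workflow might look like the sketch below: one `n=4` request for exploration, then one request per chosen image for fine‑tuning. It assumes, as with gpt-image-1, that results arrive as base64 in `b64_json` and that per‑image tweaks go through `client.images.edit`; `explore` and `refine` are hypothetical helper names.

```python
import base64


def explore(client, prompt, n=4, model="gpt-image-2"):
    """Exploration phase: a single request returns n variants of one theme."""
    resp = client.images.generate(
        model=model, prompt=prompt, n=n, size="1024x1024", quality="medium"
    )
    return [base64.b64decode(item.b64_json) for item in resp.data]


def refine(client, picks, model="gpt-image-2"):
    """Fine-tuning phase: one request per chosen image, each with its own prompt.

    picks is a list of (png_bytes, prompt) pairs.
    """
    refined = []
    for i, (png, prompt) in enumerate(picks):
        resp = client.images.edit(
            model=model, image=(f"pick_{i}.png", png, "image/png"), prompt=prompt
        )
        refined.append(base64.b64decode(resp.data[0].b64_json))
    return refined
```

Splitting the phases this way keeps the cheap, fast batch for breadth and reserves per‑image prompts for the few candidates worth polishing.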

04 Reference Image

Reference images let the model learn a desired style, composition, or character traits.

Two reference modes

Mode 1 – Style reference

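import base64
from openai import OpenAI

client = OpenAI()
# Assumption: as with gpt-image-1, reference images are passed to images.edit
# as input files; images.generate has no reference-image parameter
with open("style_reference.jpg", "rb") as ref:
    response = client.images.edit(
        model="gpt-image-2",
        image=[ref],
        prompt="Create a coffee shop poster with the text 'COFFEE & CODE', in the style of the provided reference image",
    )
with open("poster.png", "wb") as f:
    f.write(base64.b64decode(response.data[0].b64_json))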

Mode 2 – Character/Object reference

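Using this person's photo as a reference, generate a scene of him on a beach vacation, keeping his facial features, hairstyle, and build unchanged.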

Best‑practice checklist

Use a high‑resolution reference; low‑resolution images cause the model to guess details.

Ensure the main subject dominates the frame; keep background simple.

Prefer a single reference image; mixing multiple images confuses the model.

Complement the image with a clear text description that conveys what the picture alone cannot.
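Part of this checklist can be enforced before any API call. The sketch below reads the pixel size from a PNG header using only the standard library and flags more than one reference image or a short side below a threshold. The 1024 px default is an assumption, not a documented limit; `png_size` and `check_references` are hypothetical helpers, and JPEG references would need an imaging library such as Pillow (`Image.open(path).size`) instead.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data):
    """Read (width, height) from a PNG's IHDR chunk, which must come first."""
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", then width and height.
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    return struct.unpack(">II", data[16:24])


def check_references(images, min_side=1024):
    """Return warnings for a list of PNG reference images (empty list = looks fine)."""
    warnings = []
    if len(images) != 1:
        warnings.append(f"{len(images)} reference images supplied; prefer exactly one")
    for i, data in enumerate(images):
        w, h = png_size(data)
        if min(w, h) < min_side:
            warnings.append(f"image {i}: {w}x{h} is below {min_side}px on its short side")
    return warnings
```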

Reference image vs. local editing

Input: reference image vs. image to be edited.

Output: completely new generation vs. modification of the original.

Applicable scenarios: style transfer / character preservation vs. fine‑tuning, recoloring, add/remove.

Composition preservation: not guaranteed for reference images; strictly preserved for local editing.

One‑sentence summary: reference images tell the model "draw like this", while local editing tells it "modify the existing picture".

05 Common Pitfalls

Pitfall 1 – Too many iterations cause degradation

Each round introduces small unintended changes that accumulate; after about five rounds the image may look blurry or color‑shifted.

Solution: after 3‑4 iterations, restart from the first image and merge the earlier modifications into a single prompt.
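The reset step can be sketched by keeping the prompt as named parts, so each accepted modification overwrites the part it changes and the merged prompt never describes, say, a white cup and a dark‑green cup at once. The part names, the reset threshold of 4, and both helper names are assumptions for illustration.

```python
def should_reset(rounds_done, limit=4):
    """True once the session has run `limit` edit rounds (the article suggests 3-4)."""
    return rounds_done >= limit


def merge_into_prompt(base_spec, modifications):
    """Fold accepted round-by-round changes into one fresh prompt.

    base_spec     -- dict mapping a part name to its description
    modifications -- (part, new_description) pairs in the order they were applied;
                     a later change to the same part replaces the earlier one
    """
    spec = dict(base_spec)  # copy so the original spec is left untouched
    for part, description in modifications:
        spec[part] = description
    return ", ".join(spec.values())
```

Applied to the coffee‑cup rounds, a spec with `cup` and `table` parts plus the round‑2 and round‑3 changes yields a single prompt for a matte dark‑green cup on a white marble tabletop, which then starts a new session.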

Pitfall 2 – Local editing unintentionally changes other areas

Changing one element can affect unrelated parts.

Solution: use the 3W1K formula to spell out an explicit list of elements that must not change. If the edit still spills over, split it into two steps: first lock the unchanged parts, then modify the target.

Pitfall 3 – Inconsistent characters in multi‑image batches

Characters across a batch may look different.

Solution: explicitly define character attributes in the prompt (e.g., "30‑year‑old Asian male, short hair, black‑frame glasses, dark‑blue shirt"). The more specific, the better.
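One way to keep the character description explicit and identical across a batch is to define it once and reuse the same string in every prompt. The `Character` fields and the `panel_prompts` helper are hypothetical; whether you send one prompt with `n` images or one prompt per panel, the description text stays exactly the same.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Character:
    age: int
    ethnicity: str
    gender: str
    hair: str
    accessories: str
    clothing: str

    def describe(self):
        """Render the fixed description reused verbatim in every prompt."""
        return (f"{self.age}-year-old {self.ethnicity} {self.gender}, "
                f"{self.hair}, {self.accessories}, {self.clothing}")


def panel_prompts(character, scenes, palette):
    """Build one prompt per panel, each repeating the same character description."""
    desc = character.describe()
    return [f"Panel {i}: {scene}. Main character: {desc}. Color palette: {palette}."
            for i, scene in enumerate(scenes, start=1)]
```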

Pitfall 4 – Reference image style dilution

The model may only partially adopt the reference style.

Solution: add "Strictly follow the style of the uploaded image" to the prompt and remove conflicting style descriptors.
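Both halves of this fix, stripping conflicting style words and adding the strict instruction, can be applied to a prompt automatically. This is a minimal sketch: `emphasize_reference_style` is a hypothetical helper, it assumes the conflicting terms are plain words or phrases, and you supply the list of terms yourself.

```python
import re

STRICT_STYLE = "Strictly follow the style of the uploaded reference image."


def emphasize_reference_style(prompt, conflicting_terms):
    """Remove style terms that compete with the reference, then add a strict instruction."""
    for term in conflicting_terms:
        # Drop the term together with a preceding comma/whitespace, case-insensitively.
        prompt = re.sub(rf"\s*,?\s*\b{re.escape(term)}\b", "", prompt, flags=re.IGNORECASE)
    # Trim leftover separators at either end before appending the instruction.
    return f"{prompt.strip(' ,')} {STRICT_STYLE}"
```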

Summary

Multi‑Round Iteration – best for changing attributes without altering structure; modify one variable per round and consider resetting after 3‑4 rounds.

Local Editing – follow the 3W1K formula; in the April 2026 internal benchmark, single‑step success reached 100%.

Multi‑Image Generation – use n=4 in the exploration phase to quickly generate options, then fine‑tune each image individually.

Reference Images – two modes (style reference and character/object reference); a single high‑resolution image plus clear textual description yields the best results.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


Written by

James' Growth Diary

I am James, focusing on AI Agent learning and growth. I continuously update two series: “AI Agent Mastery Path,” which systematically outlines core theories and practices of agents, and “Claude Code Design Philosophy,” which deeply analyzes the design thinking behind top AI tools. Helping you build a solid foundation in the AI era.
