Comprehensive Evaluation of GPT-4o Multimodal Image Generation Capabilities
This article presents a thorough assessment of GPT‑4o’s new image generation features, detailing multiple test scenarios—from simple portrait creation and style transfer to UI design, product rendering, and educational illustrations—comparing its output with Claude‑3.7‑Sonnet, highlighting strengths in realism and weaknesses in Chinese text handling.
On March 26, 2025 GPT‑4o received an update that added advanced image generation capabilities, prompting a series of experiments to determine whether natural‑language prompts could replace traditional “magic‑spell” methods.
Test Scenario 1 – Portrait Creation : The prompt asked for a 40‑year‑old Asian man standing on the Shanghai Bund, wearing a black T‑shirt with the word "Black" printed, long hair, and glasses. The generated image was realistic, though the cartoon version differed slightly and the face appeared slightly rounded.
Test Scenario 2 – Image Editing and Style Transfer : Starting from the portrait, the following operations were performed:
Remove all people from the original photo, keeping only the scenery.
Add a white horse that blends naturally with the environment.
Place a historically dressed woman riding the horse.
Change the viewpoint to a top‑down aerial view without losing any elements.
Convert the whole scene to a Studio Ghibli‑style cartoon.
Evaluations noted high fidelity in most edits, with minor color differences and occasional facial distortions.
Test Scenario 3 – Product Content Translation : The prompt requested translation of all Chinese text in an image to English while preserving layout and colors. The resulting image showed reasonable translation quality, though some characters were mis‑rendered.
Test Scenario 4 – Poster Design (Xiaohongshu Cover) : A detailed prompt defined a tech‑heavy DeepSeek model cover with specific colors, fonts, and symbols. GPT‑4o produced a visually striking poster, while Claude‑3.7‑Sonnet generated a comparable layout but with less polish.
Test Scenario 5 – Recruitment Advertisement : The prompt asked for a compelling recruitment poster for an operations talent. GPT‑4o’s design was aesthetically strong, though a few Chinese characters were incorrect; Claude’s version was less visually appealing.
Test Scenario 6 – Industrial Product Design (Car) : Using the Chinese Ideal Mega car as a base, the prompt requested a sleek, glossy gray‑black redesign. GPT‑4o delivered a high‑quality, realistic rendering that was praised for its premium feel.
Test Scenario 7 – UI Design (E‑commerce) : Two prompts generated a JD‑style desktop website and an iPhone app mock‑up. GPT‑4o produced a complete, well‑structured UI with correct colors and layout; Claude’s output was comparable in content but slightly less polished.
Test Scenario 8 – Book Cover Design : The prompt described a DeepSeek model book cover attached to a realistic book. GPT‑4o generated a convincing 3‑D book rendering; Claude’s version was less realistic but contained richer textual details.
Test Scenario 9 – Educational Comic (Quadratic Equation) : The prompt asked for a multi‑panel comic explaining a quadratic equation. GPT‑4o produced a clear, colorful comic; Claude’s version was more text‑dense but less visually engaging.
Test Scenario 10 – Chinese Blackboard Writing : The prompt required a chalk‑style rendering of Li Bai’s poem with the words "黄鹤楼" highlighted in red. The generated image missed the exact highlight location and used a more calligraphic style than requested.
Test Scenario 11 – Math Formula Board : A LaTeX block describing an ellipse was turned into a chalkboard style formula. The output contained several transcription errors and omitted some terms.
Test Scenario 12 – Physics Experiment Illustration : The prompt described a momentum‑conservation experiment with air‑track carts and photogates. GPT‑4o’s illustration was visually appealing but omitted some LaTeX symbols; Claude’s SVG version retained the formulas but looked less realistic.
Test Scenario 13 – Chemistry Experiment Illustration : The prompt asked for a side‑by‑side depiction of oxygen generation via heated KClO₃ and H₂O₂ decomposition. GPT‑4o produced a colorful, semi‑realistic diagram; Claude’s version was more schematic but contained fewer visual errors.
Overall Findings :
Advantages : High realism and detail in product rendering, scene editing, and UI design; strong visual appeal in poster and book‑cover creation; effective LUI (language‑user‑interface) interaction that lowers the barrier to image generation.
Limitations : Inconsistent handling of Chinese characters (misspellings, misplaced highlights); occasional errors in complex mathematical or chemical notation; occasional failure to follow fine‑grained instructions.
Comparison with Claude‑3.7‑Sonnet : GPT‑4o generally outperforms Claude in photorealism and artistic style, while Claude sometimes provides more complete technical content.
In conclusion, GPT‑4o’s multimodal image generation marks a significant step forward, especially for users seeking intuitive, natural‑language driven creation, though further refinement is needed for precise Chinese text and specialist notation handling.
Nightwalker Tech
[Nightwalker Tech] is the tech sharing channel of "Nightwalker", focusing on AI and large model technologies, internet architecture design, high‑performance networking, and server‑side development (Golang, Python, Rust, PHP, C/C++).
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.