Fin2.0 AI‑Powered Design Assistant: Text‑to‑Image Generation, Prompt Engineering, and Practical Case Study
Fin2.0, NetEase Cloud Music’s AI‑driven design assistant, combines text‑to‑image, text‑to‑icon, and text‑to‑copy generation with an internal Stable Diffusion engine and streamlined prompt templates. It lets non‑designers create high‑quality promotional banners in hours while avoiding the costs and data‑security risks of external services.
Fin2.0 is an AI‑driven design assistant developed by NetEase Cloud Music's Public Technology Department. Its vision is to empower the design process with AIGC, lower design thresholds and costs, and simplify business innovation.
Background
The article presents a real‑world case where a business colleague, Jersey, needed promotional banner images for a new song but could not secure a designer. Using Fin2.0’s text‑to‑image ("文生图") feature, Jersey generated two high‑quality banner images within half a day, and the promoted song reached #2 on the daily rising chart.
AIGC Capability Matrix
Fin2.0 integrates three AIGC capabilities—text‑to‑image, text‑to‑icon, and text‑to‑copy—to reconstruct the entire design workflow, improving efficiency, reducing communication cost, and avoiding data‑security risks associated with external services.
Challenges with Existing Tools
Internal Dreammaker (Stable Diffusion) requires complex configuration (model, LoRA, prompts, negative prompts, ControlNet, sampler, VAE, etc.).
Midjourney incurs external costs and requires multiple accounts for team usage.
External tools raise data‑security concerns for confidential projects.
Fin2.0 partnered with Dreammaker, keeping all generated data inside the company and benefiting from Dreammaker’s abundant compute resources.
Three‑Step Image Generation Process
Using Stable Diffusion, a single text‑to‑image operation involves more than 30 configuration parameters, grouped into three categories:
1. Mandatory Parameters
model_name: Base model name
prompt: Positive prompt
2. Basic Parameters
negative_prompt: Negative prompt
sampler_name: Sampling method
steps: Number of sampling steps
width: Image width
height: Image height
cfg_scale: Prompt relevance (CFG) scale
n_iter: Iteration count (number of images)
seed: Random seed
3. Auxiliary Parameters
enable_hr: Enable high‑resolution generation
hr_scale: High‑resolution upscale factor
denoising_strength: Redraw (denoising) strength
hr_upscaler: High‑resolution upscaler algorithm
hr_resize_x: Target width after resize
hr_resize_y: Target height after resize
Additional modules such as LoRA (for style or subject‑specific fine‑tuning) and ControlNet (for special scene control) are also supported, with their own parameter tables provided in the source.
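The three parameter groups above can be sketched as a single request payload. This is a minimal illustration, not Fin2.0's actual API: the field names follow the tables above, while the concrete values (model name, prompt text) are hypothetical placeholders.

```python
import json

# Sketch of a text-to-image request payload, grouped as in the tables
# above. All values here are illustrative assumptions.
payload = {
    # 1. Mandatory parameters
    "model_name": "sd_v1.5",                     # hypothetical base model
    "prompt": "far desert, large lake, yurt, rich details",
    # 2. Basic parameters
    "negative_prompt": "lowres, blurry",
    "sampler_name": "Euler a",
    "steps": 28,
    "width": 512,
    "height": 512,
    "cfg_scale": 7,
    "n_iter": 2,                                 # generate two images
    "seed": -1,                                  # -1 picks a random seed
    # 3. Auxiliary parameters (high-resolution pass)
    "enable_hr": True,
    "hr_scale": 2,
    "denoising_strength": 0.5,
    "hr_upscaler": "Latent",
}

print(json.dumps(payload, indent=2))
```

Hiding most of these fields behind sensible defaults is exactly the simplification Fin2.0 provides: a user only has to supply the mandatory model and prompt.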
Prompt Template
The recommended prompt formula is:
Subject + Subject Modifiers + Camera & Lighting + Style Settings
Four components:
Subject: Main visual element (e.g., teenager, vinyl record, lake).
Subject Modifiers: Attributes like facial features, expressions, clothing, actions, environment.
Camera & Lighting: Angle, perspective, lighting conditions, image quality descriptors.
Style Settings: Artistic style (e.g., Ghibli, Pixar), image type (photo, illustration, Chinese‑style).
Example prompt: far desert, nearby poplar forest, large lake, Gobi, sheep, yurt, rich details, close‑up, landscape, children’s watercolor
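The four-part formula can be turned into a small helper that assembles a comma-separated prompt; the function below is an illustrative sketch (not part of Fin2.0), shown with the example prompt's components.

```python
def build_prompt(subject, modifiers, camera_lighting, style):
    """Assemble a prompt from the four-part formula:
    Subject + Subject Modifiers + Camera & Lighting + Style Settings."""
    parts = [subject] + list(modifiers) + list(camera_lighting) + list(style)
    return ", ".join(parts)

prompt = build_prompt(
    subject="large lake",
    modifiers=["far desert", "nearby poplar forest", "Gobi", "sheep", "yurt"],
    camera_lighting=["rich details", "close-up"],
    style=["landscape", "children's watercolor"],
)
print(prompt)
```

Keeping the components separate also makes it easy to swap only the style settings (for example, "children's watercolor" for "Ghibli") while reusing the rest of the prompt.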
Advanced Settings, History, and Asset Library
For power users, Fin2.0 offers advanced controls such as resolution selection (512×512 for most models, 1024×1024 for SDXL), iteration count, prompt strength, and seed. Generated history and an internal asset gallery allow users to bookmark and reuse high‑quality outputs.
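The resolution rule mentioned above (512×512 for most models, 1024×1024 for SDXL) can be expressed as a small default-picker. The name-based check below is a simple heuristic assumed for illustration, not an official rule.

```python
def default_resolution(model_name: str) -> tuple[int, int]:
    """Pick a base resolution matching the model family: most SD 1.x
    models are trained at 512x512, while SDXL expects 1024x1024.
    Matching on the substring "xl" is an assumed heuristic."""
    if "xl" in model_name.lower():
        return (1024, 1024)
    return (512, 512)

print(default_resolution("sd_xl_base_1.0"))  # (1024, 1024)
print(default_resolution("v1-5-pruned"))     # (512, 512)
```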
Practical Experience
Common pitfalls include using mismatched models (e.g., a landscape‑oriented model for portrait generation) or inappropriate image sizes, leading to artifacts. The recommended workflow is to preview model capabilities, select the appropriate model, keep image size consistent with training data, and optionally provide reference images for image‑to‑image generation.
Understanding the Diffusion Model
Stable Diffusion converts textual prompts into latent image representations via a text encoder and a noise predictor. Repeated prediction‑and‑denoise steps gradually transform pure noise into an image that aligns with the semantic vector derived from the prompt.
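The predict-and-denoise loop can be illustrated with a toy numerical sketch. Nothing here is a real U-Net or text encoder: random vectors stand in for the prompt embedding and the latent, and the "noise predictor" simply pulls the latent toward the embedding so that repeated small steps converge, mirroring how iterative denoising moves pure noise toward an image consistent with the prompt.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_noise_predictor(latent, text_embedding, t):
    # Stand-in for the trained noise predictor (U-Net): here it just
    # reports how far the latent is from the target embedding.
    return latent - text_embedding

# Stand-in for the semantic vector a text encoder derives from the prompt.
text_embedding = rng.normal(size=(4, 8))

# Start from pure noise and repeatedly predict-and-subtract noise.
latent = rng.normal(size=(4, 8))
for t in range(50):
    predicted_noise = toy_noise_predictor(latent, text_embedding, t)
    latent = latent - 0.1 * predicted_noise  # one small denoising step

# After many steps the latent has converged toward the prompt's vector.
print(np.abs(latent - text_embedding).mean())
```

Each iteration removes a fraction of the estimated noise, which is why sampling-step count (the `steps` parameter above) trades generation time against how fully the image resolves.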
Understanding Prompts
Prompts are tokenized strings that guide the diffusion process. While older models relied heavily on tag‑based prompts, newer SDXL models accept natural language descriptions, reducing the need for rigid token structures.
Creating Complex Images
When a single prompt cannot achieve the desired composition, a two‑stage approach is suggested: first generate partial elements (e.g., character heads) using text‑to‑image, then assemble them in a design tool (MasterGo or Figma), and finally refine the composite with an image‑to‑image pass.
Conclusion
Fin2.0’s text‑to‑image feature has been applied to various business scenarios such as promotional banners, H5 hero images, and live‑stream assets. Continuous user feedback drives iterative improvements, aiming to make AI‑assisted design more accessible and efficient.
NetEase Cloud Music Tech Team