How Large Language Models Transform Advertising Copy Generation
This article examines the adoption of large language models for intelligent advertising copy creation, covering the business challenges, model selection criteria, training data preparation, fine‑tuning methods, performance evaluation, and deployment results, and highlighting the trade‑offs between model size, cost, and output quality.
Introduction
Advertising platforms rely on copywriting for virtually every creative asset. The rapid development of large language models (LLMs) such as ChatGPT creates an opportunity to automate ad copy generation across diverse formats, including short‑video and audio scripts. This work investigates how a unified LLM can address three business challenges: (1) the large variety of copy types required by different ad channels, (2) the need for richer, more creative content, and (3) emerging formats that demand longer, more expressive text.
Model Selection
Why Use a Large Model?
LLMs provide strong generative capabilities but incur higher development and serving costs. The team identified three pain points that justify a large model:
Variety of copy types: Search, display, external placement, and tool‑based ads each have distinct style and format requirements.
Richness of content: Product‑related creative copy benefits from the generative power and world knowledge of LLMs.
Emerging formats: Short‑video and audio ads require longer, more expressive narratives.
A single LLM that can handle all these scenarios reduces maintenance overhead while improving creative quality.
Evaluation Metrics
Two groups of metrics were defined:
Objective metrics – controllability of length, compliance with predefined formats, and ability to differentiate between business scenarios.
Subjective metrics – fluency, elegance, and alignment with product information. Human evaluation is expensive, so GPT‑based scoring was used with carefully crafted prompts that enforce mutually exclusive judgments to avoid metric interference.
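For illustration, a GPT‑based subjective scoring call could be framed as in the sketch below. The rubric wording, the JSON output format, and the score_copy helper are assumptions for this example, not the team's actual evaluation prompt; it assumes access to an OpenAI‑compatible endpoint.

```python
import json
from openai import OpenAI  # assumes an OpenAI-compatible judge endpoint

client = OpenAI()

JUDGE_PROMPT = """You are scoring one piece of ad copy. Rate each dimension
independently on a 1-5 scale and do not let one dimension influence another:
- fluency: is the text natural and grammatical?
- elegance: is the wording polished and appealing?
- product_alignment: does the copy match the product information?

Product information:
{product}

Ad copy:
{copy}

Return only JSON, e.g. {{"fluency": 4, "elegance": 3, "product_alignment": 5}}."""

def score_copy(product: str, copy_text: str, judge_model: str = "gpt-4") -> dict:
    """Ask the judge model for per-dimension scores, one key per metric."""
    resp = client.chat.completions.create(
        model=judge_model,
        temperature=0,  # deterministic judgments make evaluation repeatable
        messages=[{"role": "user",
                   "content": JUDGE_PROMPT.format(product=product, copy=copy_text)}],
    )
    return json.loads(resp.choices[0].message.content)
```

Scoring each dimension on its own scale, with an instruction that the judgments are independent, is what keeps one metric from bleeding into another.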
Model Families Considered
The shortlist included Chinese‑tuned LLaMA variants, Baichuan, ChatGLM, and the QWen series. Zero‑shot tests showed acceptable fluency but insufficient control over length and format. After fine‑tuning on a curated copy dataset, all models achieved >97% compliance with format and quantity constraints.
Performance highlights:
Length control: QWen performed best; LLaMA‑Chinese lagged.
Fluency & creativity: Baichuan excelled.
Fluency & product relevance: ChatGLM balanced both.
Overall (fluency + product alignment): QWen 1.5 (latest version) outperformed all others.
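The objective metrics above (length control, format compliance, copy count) reduce to simple rule checks. A minimal sketch follows; the length bounds, banned‑character pattern, and check_batch helper are illustrative placeholders, not the team's actual thresholds.

```python
import re

def check_copy(text: str, min_len: int, max_len: int,
               banned_pattern: str = r"[\[\]{}<>]") -> dict:
    """Check one generated copy against length and format constraints."""
    length_ok = min_len <= len(text) <= max_len
    format_ok = re.search(banned_pattern, text) is None  # e.g. no leftover template markers
    return {"length_ok": length_ok, "format_ok": format_ok}

def check_batch(copies: list[str], expected_count: int,
                min_len: int, max_len: int) -> dict:
    """Aggregate compliance rates for a batch, mirroring the objective metrics."""
    results = [check_copy(c, min_len, max_len) for c in copies]
    n = max(len(results), 1)
    return {
        "count_ok": len(copies) == expected_count,                 # quantity constraint
        "length_rate": sum(r["length_ok"] for r in results) / n,   # length compliance
        "format_rate": sum(r["format_ok"] for r in results) / n,   # format compliance
    }
```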
Parameter Size Decision
Models from 6B to 14B parameters were benchmarked. Larger models improved objective metrics (e.g., tighter length control) but did not yield noticeable gains on subjective metrics. Considering inference cost, QWen 1.5‑7B was selected for production; the 14B variant required quantization to fit a single NVIDIA A10 GPU, whereas the 7B model runs in FP16 without quantization.
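For reference, a 7B model in FP16 weighs roughly 14 GB and fits on a 24 GB A10 without quantization. A minimal loading sketch with the Hugging Face transformers API follows; it uses the public Qwen/Qwen1.5-7B-Chat checkpoint and a made‑up prompt for illustration, since the production checkpoint is internal.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer  # requires transformers >= 4.37 for Qwen1.5

model_id = "Qwen/Qwen1.5-7B-Chat"  # public checkpoint, used here only for illustration

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # FP16 weights fit a single 24 GB A10 without quantization
    device_map="auto",
)

prompt = "Write a 20-character ad title for a wireless vacuum cleaner."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```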
Model Training
Data Preparation
High‑quality, diverse data are essential. The pipeline consisted of:
Rule‑based cleaning of massive raw text corpora to enforce copy length, format, and category coverage (see the sketch after this list).
Enrichment with spoken‑style video narration obtained via ASR/OCR pipelines.
Synthetic data generation using GPT‑4 with engineered prompts to cover scarce copy scenarios.
Mixing generic instruction‑following data to preserve the model’s general language ability and avoid over‑fitting to proprietary copy.
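A hedged sketch of the rule‑based cleaning and data‑mixing steps is shown below; the length bounds, record fields, noise filters, and mixing ratio are illustrative assumptions, not the production pipeline.

```python
import random

def clean_copy_records(records, min_len=10, max_len=120, allowed_categories=None):
    """Rule-based cleaning: enforce length bounds, crude format checks, category coverage."""
    cleaned = []
    for r in records:                       # each record assumed as {"text": ..., "category": ...}
        text = r["text"].strip()
        if not (min_len <= len(text) <= max_len):
            continue                        # length constraint
        if "http" in text or text.count("!") > 3:
            continue                        # crude noise / format filter
        if allowed_categories and r["category"] not in allowed_categories:
            continue                        # keep only covered ad categories
        cleaned.append({**r, "text": text})
    return cleaned

def mix_with_general_instructions(copy_data, general_data, general_ratio=0.2, seed=0):
    """Blend proprietary copy data with generic instruction data to preserve general ability."""
    random.seed(seed)
    n_general = int(len(copy_data) * general_ratio / (1 - general_ratio))
    mixed = copy_data + random.sample(general_data, min(n_general, len(general_data)))
    random.shuffle(mixed)
    return mixed
```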
Training Process
Several fine‑tuning techniques were evaluated (Prompt Tuning, Prefix Tuning, LoRA, full‑parameter fine‑tuning). Full‑parameter fine‑tuning, combined with DeepSpeed and ZeRO optimizations, was chosen as the primary method because it yielded the best trade‑off between convergence speed and final quality.
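A minimal sketch of full‑parameter fine‑tuning with DeepSpeed ZeRO through the Hugging Face Trainer follows. The ZeRO stage, batch sizes, learning rate, and the tiny stand‑in dataset are assumptions for illustration, not the team's actual configuration.

```python
# launch with: deepspeed train_copy.py
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)

model_id = "Qwen/Qwen1.5-7B-Chat"          # illustrative base checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_id)

# Tiny stand-in dataset; in practice this is the mixed copy + instruction corpus.
raw = Dataset.from_dict({"text": ["Instruction: write a short ad title for product X.\nOutput: ..."]})
train_dataset = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=1024),
    batched=True, remove_columns=["text"],
)

# ZeRO stage 3 shards optimizer state, gradients, and parameters across GPUs,
# which is what makes full-parameter fine-tuning of a 7B model practical.
ds_config = {
    "zero_optimization": {"stage": 3, "overlap_comm": True},
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

args = TrainingArguments(
    output_dir="qwen1.5-7b-adcopy-sft",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=1e-5,
    bf16=True,
    deepspeed=ds_config,                    # hand the ZeRO config to the Trainer
    logging_steps=50,
    save_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Full‑parameter fine‑tuning updates every weight, so ZeRO's sharding of optimizer state and parameters is the main lever that keeps per‑GPU memory within reach compared with parameter‑efficient methods such as LoRA.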
Results and Impact
Unified Copy Generation
Replacing multiple legacy models with a single QWen 1.5‑7B model simplified maintenance and improved copy diversity. Internal A/B tests showed higher acceptance rates for the 10‑copy output set, and external‑placement titles exhibited broader length and style coverage, reducing manual review effort.
Business Enhancements
In the search‑ad (“Direct Train”) scenario, style‑rich product summaries (e.g., flamboyant, scientific) generated by the LLM led to measurable performance lifts. In video‑editing tools, long‑form copy enabled richer narration styles such as influencer recommendations, storytelling, and promotional scripts.
New Business Exploration
For video ads, the team focused on generating engaging openings, vivid imagery, and controllable outputs. Keyword extraction was also upgraded: leveraging the LLM’s long‑context understanding, keywords now combine product attributes with inferred user intent, improving ad recall relevance.
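One way to picture the upgraded keyword extraction is a long‑context prompt that asks the model to pair product attributes with inferred user intent. The prompt text and the extract_keywords helper below are illustrative assumptions, not the deployed implementation.

```python
KEYWORD_PROMPT = """Below is the full product page text. Extract up to {k} ad-recall
keywords. Each keyword should combine a concrete product attribute with a likely
user intent (e.g. "lightweight stroller for travel"), not just repeat attribute words.
Return one keyword per line.

Product page:
{page_text}
"""

def extract_keywords(generate, page_text: str, k: int = 10) -> list[str]:
    """`generate` is any text-generation callable (e.g. a wrapper around the deployed model).
    A long-context model can read the whole product page before extracting keywords."""
    raw = generate(KEYWORD_PROMPT.format(k=k, page_text=page_text))
    return [line.strip("-• ").strip() for line in raw.splitlines() if line.strip()][:k]
```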
Future Outlook
Planned directions include:
Scaling models to explore the limits of personalized copy generation.
Investigating ultra‑compact models trained on extremely high‑quality data to lower operational costs.
Extracting world knowledge from LLMs to enhance product understanding, user‑preference inference, and forward‑looking ad concepts.