Unlocking AI Creativity with Just Eight Words: The Verbalized Sampling Breakthrough

A recent Stanford and West Virginia University study reveals that a simple eight‑word prompt technique, called Verbalized Sampling, can double the creative output of large language models without costly retraining, by exposing hidden diversity suppressed by conventional alignment methods.


The Problem with Current Prompting

Repeated attempts to get ChatGPT to tell a coffee joke produced the same boring answer each time, illustrating how post‑training alignment (RLHF, DPO, reward models) drives models into a "mode collapse" that favors safe, stereotypical responses.

What Researchers Discovered

A paper from Stanford and West Virginia University introduced Verbalized Sampling, a technique that requires only eight carefully chosen words to unlock the creativity that alignment had hidden.

The authors analyzed 6,874 human preference scores from the HelpSteer dataset and found systematic human bias: annotators tend to select familiar, typical answers due to exposure effect, availability heuristic, processing fluency, and pattern consistency, resulting in a typicality bias weight of α ≈ 0.57.
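One way to read this finding is as a typicality-weighted reward model. The following is a hedged reconstruction from the summary above, not the paper's exact notation: r_true is the quality the annotator actually cares about, π_ref is the reference (base) model, and ε is noise, so the reward the annotator effectively assigns is

r_perceived(x, y) = r_true(x, y) + α · log π_ref(y | x) + ε

With α ≈ 0.57 > 0, answers that are more typical under the reference model receive systematically inflated rewards, which is the mechanism behind the mode collapse described above.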

Why It Works

Instead of asking for a single response, prompting the model to generate multiple responses with associated probabilities forces it to sample from the tails of its learned distribution, revealing diverse outputs that were previously inaccessible.

How to Apply It

Method 1 – Copy‑Paste Prompt: Insert the following XML‑style instruction into any chat interface:

<instructions>
Generate 5 responses to the user query, each within a separate <response> tag. Each <response> must include a <text> and a numeric <probability>. Randomly sample responses from the full distribution.
</instructions>
[Your actual query]
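If the template works as intended, the reply comes back in the same XML shape, which also makes it easy to scan or parse. Schematically (this is a placeholder, not an actual model output), you get five blocks like:

<response>
  <text>First candidate answer…</text>
  <probability>0.08</probability>
</response>
<response>
  <text>Second, less typical candidate…</text>
  <probability>0.05</probability>
</response>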

Method 2 – System Prompt (for custom instructions or API use):

You are a helpful assistant.
For each query, generate five possible responses, each within a separate <response> tag.
Each response should include a <text> and a numeric <probability>.
Sample from the distribution tails so each probability is < 0.10.
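To make the API route concrete, here is a minimal Python sketch that sends the system prompt above through the OpenAI Chat Completions client and then picks one candidate weighted by the model's own verbalized probabilities. The model name, the regex parsing, and the weighted draw are illustrative assumptions, not part of the paper or of the verbalized-sampling package.

# Minimal sketch: Verbalized Sampling via a system prompt over the OpenAI
# Chat Completions API. Assumes the "openai" SDK (>= 1.0) is installed and
# OPENAI_API_KEY is set; the model name and parsing logic are illustrative.
import random
import re

from openai import OpenAI

SYSTEM_PROMPT = """You are a helpful assistant.
For each query, generate five possible responses, each within a separate <response> tag.
Each response should include a <text> and a numeric <probability>.
Sample from the distribution tails so each probability is < 0.10."""

client = OpenAI()

def verbalized_sample(query: str, model: str = "gpt-4o") -> str:
    completion = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": query},
        ],
        temperature=0.9,
    )
    raw = completion.choices[0].message.content
    # Extract each <text>…</text> and <probability>…</probability> pair.
    texts = re.findall(r"<text>(.*?)</text>", raw, re.DOTALL)
    probs = [float(p) for p in re.findall(r"<probability>(.*?)</probability>", raw)]
    if not texts:
        return raw  # fall back to the raw reply if the tags are missing
    # Draw one candidate, weighted by the verbalized probabilities when they
    # line up with the extracted texts, uniformly otherwise.
    weights = probs if len(probs) == len(texts) else None
    return random.choices(texts, weights=weights, k=1)[0]

print(verbalized_sample("Tell me a joke about coffee."))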

Method 3 – Python Package (for developers):

# Install the package first:
#   pip install verbalized-sampling

from verbalized_sampling import verbalize

# k = number of candidate responses, tau = probability threshold for tail
# sampling, temperature = decoding temperature (parameter meanings inferred
# from the article above).
dist = verbalize(
    "Write a 100-word story about an astronaut who discovers something unexpected",
    k=5,
    tau=0.10,
    temperature=0.9,
)

# Draw one candidate from the verbalized distribution, reproducibly.
print(dist.sample(seed=42).text)

Results Across Models

Testing on major LLMs showed:

Creative writing diversity increased 1.6–2.1×, restoring 66.8% of the base model’s creativity (vs. 23.8% without the technique).

Human preference scores rose 25.7% in 2,700 evaluations.

Dialogue tasks matched fine‑tuned models while sounding more human.

Open‑ended question diversity grew 1.9×.

Synthetic data generated with Verbalized Sampling improved downstream task accuracy by 14–28%.

Larger models benefited the most; GPT‑4.1, for example, saw roughly double the diversity gain of its smaller counterpart.

Implications

The findings overturn the belief that alignment permanently destroys model creativity. Instead, creativity remains in the weights; it is merely hidden by the way we prompt models. Prompt engineering, not algorithmic change, can recover this latent diversity.

Practical Uses

Brainstorming: obtain genuinely different ideas.

Content creation: more varied blog titles, social posts, email subjects.

Problem solving: multiple viable solutions instead of a single “safe” answer.

Image generation: richer prompts for Midjourney or DALL‑E.

Synthetic data: diverse examples for training smaller models.

Tags: prompt engineering, large language models, synthetic data, AI creativity, LLM sampling techniques, verbalized sampling
Written by

Code Mala Tang

Read source code together, write articles together, and enjoy spicy hot pot together.
