5 Design Patterns to Control LLM Output in Generative AI Applications
This article presents five design patterns for steering the output of generative AI models: Logits Masking, Grammar, Style Transfer, Reverse Neutralization, and Content Optimization. For each pattern it compares suitable scenarios, advantages, drawbacks, and anti-patterns, and provides concrete implementation steps, code snippets, and flowcharts to help developers reliably enforce style, format, and compliance constraints.
Logits Masking
Logits Masking injects user-defined constraints directly into the model's token-selection step during decoding (including beam search). At each generation step, the logits of tokens that would violate a rule are set to -inf, so their probability becomes exactly zero after softmax and they can never be sampled.
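To make the masking step concrete, here is a toy sketch in plain Python (the six-token vocabulary and the banned token ids are invented for illustration): assigning a logit of -inf drives that token's softmax probability to exactly zero, so a sampler can never pick it.

```python
import math

# Toy vocabulary of 6 tokens; rule: token ids 2 and 4 are banned
logits = [1.2, 0.5, 3.1, -0.4, 2.2, 0.9]
banned_ids = {2, 4}

# Masking step: set the logit of every banned token to -inf before softmax
masked = [(-math.inf if i in banned_ids else x) for i, x in enumerate(logits)]

# Softmax over the masked logits; exp(-inf) evaluates to 0.0
exps = [math.exp(x) for x in masked]
total = sum(exps)
probs = [e / total for e in exps]
```

The legal tokens keep their relative ordering; only the banned tokens drop out of the distribution.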
Typical anti‑pattern
Repeatedly calling the model until the output satisfies a rule set, which incurs high latency and cost.
Solution steps
Define Rule objects (e.g., keyword blacklists, regexes, token lists).
At each generation step, obtain the candidate continuations and set the logits of illegal tokens to -inf (zero probability after softmax).
Continue as long as at least one legal continuation exists.
If no legal continuation remains, backtrack to a previous step.
After a configurable number of retries, return a refusal response if constraints cannot be satisfied.
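The Rule objects from step 1 plug into generation as a logits processor. The sketch below mimics the `(input_ids, scores) -> scores` interface of a Hugging Face `LogitsProcessor` using plain lists, so the masking logic is visible without framework code; `BannedTokenProcessor` and the token ids are illustrative, not part of any library.

```python
class BannedTokenProcessor:
    """Minimal sketch of a logits processor enforcing a token blacklist.

    Mirrors the transformers LogitsProcessor call signature
    (input_ids, scores) -> scores, but works on plain Python lists
    so the masking rule is visible without framework dependencies.
    """

    def __init__(self, banned_token_ids):
        self.banned_token_ids = set(banned_token_ids)

    def __call__(self, input_ids, scores):
        # Set the logit of every banned token to -inf so it can never be sampled
        return [
            (float("-inf") if i in self.banned_token_ids else s)
            for i, s in enumerate(scores)
        ]


# Hypothetical usage: token id 3 is illegal under the active rule set
processor = BannedTokenProcessor(banned_token_ids=[3])
masked = processor(input_ids=[0, 1], scores=[0.2, 1.5, -0.3, 2.8])
```

A real implementation would subclass `transformers.LogitsProcessor` and operate on the batched `scores` tensor, but the rule-to-mask mapping is the same.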
from transformers import LogitsProcessorList, pipeline

MODEL_ID = "/Users/mario/.cache/modelscope/hub/models/LLM-Research/Phi-3-mini-4k-instruct"

pipe = pipeline(
    task="text-generation",
    model=MODEL_ID,
)

results = pipe(
    input_message,  # the prompt, prepared earlier
    max_new_tokens=512,
    do_sample=True,
    temperature=0.8,
    num_beams=10,
    use_cache=True,
    return_full_text=False,
    logits_processor=LogitsProcessorList([MyLogitsProcessor()]),  # custom processor implementing the rules
)