How to Turn Bad Prompts into High‑Scoring AI Prompts: A Step‑by‑Step Guide

This article walks through a complete prompt‑engineering workflow—starting from a weak baseline, building an evaluation pipeline, and applying four concrete techniques (clarity, specificity, XML structuring, and examples) that lift a Claude score from 3.4 to over 9, with code, metrics, and real‑world examples.

Su San Talks Tech

Prompt Engineering Overview

Prompt engineering means iteratively improving a prompt until the AI model consistently produces reliable, high‑quality output. The process follows a clear loop: set a goal → write an initial prompt → evaluate → apply engineering tricks → re‑evaluate, repeating until the score meets expectations.
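The loop can be sketched in a few lines of Python. Everything below is a toy stand-in, not the article's actual evaluator: the stub scorer simply rewards each technique present in the prompt, tuned so the trajectory mirrors the article's 3.4 → 5.4 → 7.4 → 9+ progression.

```python
# Toy sketch of the prompt-engineering loop: evaluate -> refine -> re-evaluate
# until the score meets the target. Scorer and refinements are hypothetical.
TECHNIQUES = ["clear opening command", "quality guidelines",
              "XML tags", "one-shot example"]

def evaluate(prompt: str) -> float:
    # Stub scorer: baseline 3.4, +2.0 per technique present in the prompt.
    return 3.4 + 2.0 * sum(t in prompt for t in TECHNIQUES)

def refine(prompt: str) -> str:
    # Apply the next technique that is not yet in the prompt.
    for t in TECHNIQUES:
        if t not in prompt:
            return prompt + "\n" + t
    return prompt

prompt = "What should this person eat?"   # weak baseline
scores = [evaluate(prompt)]
while scores[-1] < 9.0 and len(scores) < 10:
    prompt = refine(prompt)
    scores.append(evaluate(prompt))
# scores climbs 3.4 -> 5.4 -> 7.4 -> 9.4, one refinement per iteration
```

With a real evaluator in place of the stub, only the loop body changes; the structure of the iteration stays the same.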

Building an Evaluation Pipeline

To demonstrate the workflow, the author creates a concrete example: generating a one‑day diet plan for an athlete. The evaluation uses a PromptEvaluator instance, where the max_concurrent_tasks parameter controls parallelism (start with 3 to avoid rate‑limit errors, increase up to 5 if the API quota allows).

evaluator = PromptEvaluator(max_concurrent_tasks=5)

Dataset generation is automated with evaluator.generate_dataset, specifying the task description and input fields (height, weight, goal, restrictions). Only a few cases (2‑3) are generated during development to speed up iteration.

dataset = evaluator.generate_dataset(
    task_description="Write a compact, concise one-day diet plan for a single athlete",
    prompt_inputs_spec={
        "height": "Athlete's height in centimeters (cm)",
        "weight": "Athlete's weight in kilograms (kg)",
        "goal": "The athlete's goal",
        "restrictions": "Dietary restrictions"
    },
    output_file="dataset.json",
    num_cases=3
)

Running the evaluation produces a numeric score and an HTML report that explains why each test case received its rating.
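The shape of that output can be sketched as follows. The per-case scores and reasoning strings below are invented (chosen to average out to the article's 3.4 baseline), and the real PromptEvaluator report format may differ; the point is the structure: one row per test case, a score, and an explanation.

```python
# Hypothetical per-case results; the real PromptEvaluator output may differ.
results = [
    {"case": 1, "score": 4.0, "reasoning": "No calorie totals or portion sizes."},
    {"case": 2, "score": 3.0, "reasoning": "Ignored the dietary restriction."},
    {"case": 3, "score": 3.2, "reasoning": "Vague answer with no meal timing."},
]

average = sum(r["score"] for r in results) / len(results)  # 3.4 baseline

# Minimal HTML report: one row per test case with its score and explanation.
rows = "".join(
    f"<tr><td>{r['case']}</td><td>{r['score']}</td><td>{r['reasoning']}</td></tr>"
    for r in results
)
html = (
    "<table><tr><th>Case</th><th>Score</th><th>Reasoning</th></tr>"
    + rows + "</table>"
)
```

Reading the per-case reasoning is what drives the next iteration: each explanation points at a concrete deficiency the next technique should address.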

1. Clear & Direct

The first line of a prompt is the most influential. Using simple language and an imperative verb removes ambiguity for Claude. Example transformation:

Original: "What should this person eat?"
Improved: "Generate a one-day diet plan for the athlete that meets their dietary restrictions."

This change alone raised the score from 3.4 to 5.4.
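In code, the rewrite is just the template's opening line. The helper below is an illustrative sketch (the field names follow the dataset spec above; the function name is my own):

```python
def build_prompt(prompt_inputs: dict) -> str:
    # Open with an imperative instruction instead of a vague question,
    # then append the athlete's data from the dataset fields.
    return (
        "Generate a one-day diet plan for the athlete that meets "
        "their dietary restrictions.\n"
        f"Height: {prompt_inputs['height']}, weight: {prompt_inputs['weight']}, "
        f"goal: {prompt_inputs['goal']}, restrictions: {prompt_inputs['restrictions']}"
    )

p = build_prompt({"height": "180 cm", "weight": "75 kg",
                  "goal": "build muscle", "restrictions": "vegetarian"})
```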

2. Specificity

Adding explicit output‑quality guidelines (length, macro breakdown, meal timing, ingredient constraints, portion units, budget) guides Claude toward the desired format. After applying these guidelines the score jumped from 5.4 to 7.4.

Guidelines:
1. Include an accurate total daily calorie count
2. Show protein, fat, and carbohydrate amounts
3. Specify the timing of each meal
4. Use only ingredients that comply with the dietary restrictions
5. List all portion sizes (in grams)
6. If a budget is mentioned, keep the plan affordable

3. XML Tag Structuring

When prompts contain large blocks of data, XML‑style tags create clear boundaries, helping Claude parse sections correctly. Example:

prompt = f"""
Generate a one-day diet plan for the athlete below that meets their dietary restrictions:

<athlete_information>
- Height: {prompt_inputs['height']}
- Weight: {prompt_inputs['weight']}
- Goal: {prompt_inputs['goal']}
- Dietary restrictions: {prompt_inputs['restrictions']}
</athlete_information>

Guidelines:
1. Include an accurate total daily calorie count
2. Show protein, fat, and carbohydrate amounts
3. Specify the timing of each meal
4. Use only ingredients that comply with the dietary restrictions
5. List all portion sizes (in grams)
6. If a budget is mentioned, keep the plan affordable
"""

Adding these tags lifted the score from 7.4 to above 9.

4. Providing Examples (One‑Shot / Multi‑Shot)

Supplying input‑output pairs is the most powerful trick. In a sentiment‑analysis case, the author shows how a sarcastic tweet can be mis‑classified unless a negative‑sentiment example is given.

Positive example: "Tonight's game was amazing!" → Positive
Negative (sarcastic) example: "Oh great, I was just hoping for a flight delay tonight! Fantastic!" → Negative

Examples are wrapped in XML tags (<sample_input> and <ideal_output>) so Claude knows each part's role. The author also notes a pitfall: an overly long example conflicted with a concise‑output requirement, causing the score to drop to 7.5. Replacing it with a shorter, high‑scoring example restored the score to 9.5+.
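Assembled as code, the one-shot structure looks like the sketch below. The tag names follow the article; the sarcastic tweet is the article's negative example (translated), and the function name is my own:

```python
# One-shot example wrapped in the article's XML tags.
one_shot = """<sample_input>
Oh great, I was just hoping for a flight delay tonight! Fantastic!
</sample_input>
<ideal_output>
Negative
</ideal_output>"""

def build_sentiment_prompt(tweet: str) -> str:
    # Keep the example short: per the article, an overly long example
    # conflicted with the concise-output requirement and lowered the score.
    return (
        "Classify the sentiment of the tweet as Positive or Negative.\n\n"
        + one_shot
        + f"\n\n<tweet>\n{tweet}\n</tweet>"
    )

prompt = build_sentiment_prompt("Tonight's game was amazing!")
```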

Putting It All Together

The final prompt combines the four techniques: a clear opening command, detailed quality guidelines, XML sections for athlete data, and a concise one‑shot example. Running the evaluation on this composite prompt consistently yields scores above 9, demonstrating how systematic, data‑driven prompt engineering transforms a weak baseline into a robust, high‑performing prompt.
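A sketch of such a composite prompt is below. This is my own illustrative reconstruction, not the author's exact final prompt: the one-shot meal plan and its sample values are invented filler, and the wording is translated.

```python
# Illustrative composite prompt: imperative opening, XML-tagged athlete data,
# quality guidelines, and one concise one-shot example.
GUIDELINES = """Guidelines:
1. Include an accurate total daily calorie count
2. Show protein, fat, and carbohydrate amounts
3. Specify the timing of each meal
4. Use only ingredients that comply with the dietary restrictions
5. List all portion sizes (in grams)
6. If a budget is mentioned, keep the plan affordable"""

# Hypothetical one-shot example; kept deliberately short and concrete.
ONE_SHOT = """<sample_input>Height: 170 cm, weight: 60 kg, goal: endurance, restrictions: none</sample_input>
<ideal_output>Total: 2400 kcal (protein 120 g, fat 70 g, carbs 330 g).
08:00 oatmeal 80 g + milk 250 g; 12:30 chicken 150 g + rice 200 g; 19:00 fish 150 g + vegetables 200 g.</ideal_output>"""

def build_final_prompt(prompt_inputs: dict) -> str:
    return f"""Generate a one-day diet plan for the athlete below that meets their dietary restrictions.

<athlete_information>
- Height: {prompt_inputs['height']}
- Weight: {prompt_inputs['weight']}
- Goal: {prompt_inputs['goal']}
- Dietary restrictions: {prompt_inputs['restrictions']}
</athlete_information>

{GUIDELINES}

{ONE_SHOT}"""

final = build_final_prompt({"height": "180 cm", "weight": "75 kg",
                            "goal": "build muscle", "restrictions": "vegetarian"})
```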

Self‑Check Quiz

At the end, the article includes a short quiz that asks readers to choose the best opening sentence, the optimal way to handle sarcasm, and the definition of prompt engineering, reinforcing the learned concepts.

Tags: AI, prompt engineering, XML, evaluation, prompt optimization, Claude
Written by Su San Talks Tech

Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
