Create a Python Semantic Validator with Claude AI & LangChain in 150 Lines

This article shows QA engineers how to build a Python semantic validator using Claude AI and LangChain, demonstrating rule‑based checks, a concrete e‑commerce example, performance tricks, and cost estimates to catch logical inconsistencies that traditional schema validators miss.

Woodpecker Software Testing

QA engineers often encounter test cases that pass syntactic checks but contain nonsensical business logic, such as orders delivered before they are placed, three‑year‑old customers ordering, or a $1 item charged $500 shipping. Traditional validators only verify JSON structure, not semantic correctness.

Rule‑Based Evaluation

Rule‑based evaluation applies predefined business rules to assess whether data makes sense in the domain. Unlike schema validation, which asks "Is this valid JSON?", rule‑based evaluation asks "Are these values reasonable for our business?" For example, an e‑commerce order must have a customer age of at least 13 and a delivery date no earlier than the order date.
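To make the distinction concrete, the two example rules can be hard-coded in plain Python before any LLM is involved. This is only an illustrative sketch: the function name `check_order_rules` is hypothetical, the field names are assumed to match the order example used later in the article, and same-day delivery is treated as acceptable.

```python
from datetime import date


def check_order_rules(order: dict) -> list[str]:
    """Return human-readable violations for one order (empty list if none)."""
    violations = []

    # Rule 1: the customer must be at least 13 years old.
    if order["customer_age"] < 13:
        violations.append("customer_age is below the minimum of 13")

    # Rule 2: delivery cannot happen before the order was placed.
    order_date = date.fromisoformat(order["order_date"])
    delivery_date = date.fromisoformat(order["delivery_date"])
    if delivery_date < order_date:
        violations.append("delivery_date is earlier than order_date")

    return violations


# Structurally valid JSON that still breaks both business rules.
order = {"customer_age": 3, "order_date": "2024-12-01", "delivery_date": "2024-11-15"}
print(check_order_rules(order))
```

Hand-written checks like these are cheap and deterministic, but every rule has to be coded individually. The LLM-based validator built in the rest of the article takes the rules as natural-language text instead.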

Environment Setup

Install the required packages and set the Anthropic API key:

pip install langchain langchain-anthropic pydantic
export ANTHROPIC_API_KEY='your-api-key-here'

Core Validator Construction

Define the result model:

from typing import Dict, List, Optional

from pydantic import BaseModel, Field

class ValidationResult(BaseModel):
    valid: bool = Field(description="Whether the test data is valid")
    violations: List[str] = Field(default_factory=list, description="List of violations found")
    severity: Dict[str, str] = Field(default_factory=dict, description="Severity of each violation")
    suggestions: List[str] = Field(default_factory=list, description="Suggested fixes")

Initialize the validator with Claude‑3‑Haiku:

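from langchain_anthropic import ChatAnthropic

class TestDataValidator:
    def __init__(self, model: str = "claude-3-haiku-20240307"):
        # temperature=0 keeps verdicts as repeatable as possible
        self.llm = ChatAnthropic(model=model, temperature=0, max_tokens=1000)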

Define domain‑specific rules (e‑commerce example):

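# Method of TestDataValidator; this version returns the e-commerce rules for any data_type
def create_validation_rules(self, data_type: str) -> List[str]:
    return [
        "Order date must be on or before the current date",
        "Delivery date must be on or after the order date",
        "Customer age must be realistic (13-120 years)",
        "Order total must be greater than 0",
        "Shipping cost should be proportional to the order total",
        "Item count must match the length of the items array",
        "Order status transitions must be reasonable",
    ]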

Run validation by constructing a prompt that lists the fixture and rules, then asks the LLM to return violations in JSON:

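from langchain_core.prompts import ChatPromptTemplate

def validate_fixture(self, fixture: Dict, data_type: Optional[str] = None) -> ValidationResult:
    rules = self.create_validation_rules(data_type)
    prompt = ChatPromptTemplate.from_template("""
    You are a QA expert responsible for validating the semantic correctness of test data.
    Test data: {fixture}
    Validation rules: {rules}
    Identify violations that could cause tests to fail, and return the result as JSON.
    """)
    # LLM call omitted for brevity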
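The omitted call is not reproduced in the source. As a rough sketch of what the remaining step involves, the following standard-library code builds the prompt text, sends it through a caller-supplied function, and parses the JSON reply, failing closed when the reply cannot be parsed. All names here (`ParsedResult`, `build_prompt`, `parse_reply`, `validate`, `ask_model`, `fake_model`) are hypothetical, and the requested reply shape is an assumption. With LangChain, `ask_model` could plausibly be something like `lambda p: validator.llm.invoke(p).content`.

```python
import json
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ParsedResult:
    valid: bool
    violations: List[str] = field(default_factory=list)
    suggestions: List[str] = field(default_factory=list)


def build_prompt(fixture: Dict, rules: List[str]) -> str:
    """Render the fixture and rules into a single prompt string."""
    rule_lines = "\n".join(f"- {rule}" for rule in rules)
    return (
        "You are a QA expert validating the semantic correctness of test data.\n"
        f"Test data: {json.dumps(fixture, ensure_ascii=False)}\n"
        f"Validation rules:\n{rule_lines}\n"
        'Reply with JSON only: {"valid": bool, "violations": [...], "suggestions": [...]}'
    )


def parse_reply(reply: str) -> ParsedResult:
    """Parse the outermost {...} span of the reply; fail closed if it is not JSON."""
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if match:
        try:
            data = json.loads(match.group(0))
            return ParsedResult(
                valid=bool(data.get("valid", False)),
                violations=list(data.get("violations", [])),
                suggestions=list(data.get("suggestions", [])),
            )
        except (json.JSONDecodeError, AttributeError, TypeError):
            pass
    return ParsedResult(valid=False, violations=["Model reply was not valid JSON"])


def validate(fixture: Dict, rules: List[str], ask_model: Callable[[str], str]) -> ParsedResult:
    """ask_model is any function that takes a prompt and returns the reply text."""
    return parse_reply(ask_model(build_prompt(fixture, rules)))


# Stand-in for a real model call, used only to exercise the parsing path.
def fake_model(prompt: str) -> str:
    return (
        "Here is the result:\n"
        '{"valid": false, "violations": ["Delivery date is earlier than order date"], '
        '"suggestions": []}'
    )


result = validate(
    {"order_date": "2024-12-01", "delivery_date": "2024-11-15"},
    ["Delivery date must be on or after the order date"],
    fake_model,
)
print(result.valid, result.violations)
```

Passing the model call in as a function keeps the parsing logic testable without an API key, and treating an unparseable reply as a failure avoids silently accepting bad fixtures.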

Real‑World Example

A deliberately flawed order demonstrates the validator:

suspicious_order = {
    "order_id": "ORD001",
    "order_date": "2024-12-01",
    "delivery_date": "2024-11-15",  # delivery before order
    "customer_age": 3,               # child ordering
    "order_total": 0.01,
    "items_count": 47,              # count mismatch
    "items": [{"id": "item1", "price": 0.01}],
    "shipping_cost": 500.00,        # excessive shipping
}

Validation output:

✅ Valid: False

Delivery date is earlier than the order date

Customer age is unrealistic

Shipping cost is disproportionate to the order total

Suggested fixes:

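Set the delivery date after the order date

Set the customer age to a reasonable value between 13 and 120

Adjust the shipping cost so that it is proportional to the order total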

Custom Rules

Users can define rule sets for other domains, such as financial transactions or user subscriptions:

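VALIDATION_RULES = {
    "financial_transaction": [
        "Transaction amounts must follow currency precision rules",
        "Timestamps must be in chronological order",
        "Account balances must not fall below regulatory minimums",
    ],
    "user_subscription": [
        "Subscription end date must be later than the start date",
        "Plan features must match the subscription tier",
        "Payment history must be consistent with the billing cycle",
    ],
}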
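The source does not show how a rule set is selected at validation time. One possible approach, sketched below, is a small lookup helper that falls back to a default list for unknown data types and lets callers append extra rules. The names `rules_for`, `DEFAULT_RULES`, and `extra_rules` are hypothetical, and the dictionary is shortened to keep the example self-contained.

```python
from typing import Dict, List, Optional

# Shortened copy of the rule table, for illustration only.
VALIDATION_RULES: Dict[str, List[str]] = {
    "user_subscription": [
        "Subscription end date must be later than the start date",
    ],
}

# Hypothetical fallback used when the data type is missing or unknown.
DEFAULT_RULES: List[str] = ["Dates must be in chronological order"]


def rules_for(data_type: Optional[str], extra_rules: Optional[List[str]] = None) -> List[str]:
    """Return a fresh list of rules for data_type, plus any caller-supplied extras."""
    rules = list(VALIDATION_RULES.get(data_type or "", DEFAULT_RULES))  # copy, never mutate
    if extra_rules:
        rules.extend(extra_rules)
    return rules


print(rules_for("user_subscription"))
print(rules_for("unknown_type", ["Trial period must not exceed 30 days"]))
```

Returning a copy means a caller can add rules for a single run without changing the shared table.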

Performance & Cost Optimizations

Cache validation results to avoid duplicate LLM calls. The article reports savings of more than 70% of API usage; the actual figure depends on how often identical fixtures recur.

Batch parallel processing with ThreadPoolExecutor to validate many fixtures concurrently.

Trigger validation intelligently (e.g., pre‑commit, nightly jobs, CI/CD) instead of on every test run.
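The first two optimizations can be combined in a few lines. The sketch below is an assumption about how this might look, not the author's implementation: it keys an in-memory cache on a SHA-256 hash of the canonical JSON form of each fixture and sends only distinct, uncached fixtures to a `ThreadPoolExecutor`. `CachedBatchValidator`, `validate_one`, and `fake_validate` are hypothetical names; in practice `validate_one` would wrap the LLM call.

```python
import hashlib
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


class CachedBatchValidator:
    def __init__(self, validate_one: Callable[[Dict], Dict], max_workers: int = 4):
        self._validate_one = validate_one
        self._max_workers = max_workers
        self._cache: Dict[str, Dict] = {}

    @staticmethod
    def _key(fixture: Dict) -> str:
        # Sorting keys makes the hash independent of dict insertion order.
        canonical = json.dumps(fixture, sort_keys=True, ensure_ascii=False)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def validate_many(self, fixtures: List[Dict]) -> List[Dict]:
        keys = [self._key(f) for f in fixtures]

        # Keep one pending call per distinct fixture that is not cached yet.
        pending: Dict[str, Dict] = {}
        for key, fixture in zip(keys, fixtures):
            if key not in self._cache and key not in pending:
                pending[key] = fixture

        if pending:
            with ThreadPoolExecutor(max_workers=self._max_workers) as pool:
                # pool.map preserves input order, so results line up with keys.
                results = pool.map(self._validate_one, pending.values())
                for key, result in zip(pending.keys(), results):
                    self._cache[key] = result

        return [self._cache[key] for key in keys]


# Stand-in for the LLM-backed validator; records how often it is called.
calls: List[str] = []


def fake_validate(fixture: Dict) -> Dict:
    calls.append(fixture["order_id"])
    return {"valid": fixture["order_total"] > 0}


validator = CachedBatchValidator(fake_validate)
batch = [
    {"order_id": "A", "order_total": 5},
    {"order_id": "B", "order_total": 0},
    {"order_id": "A", "order_total": 5},  # duplicate of the first fixture
]
results = validator.validate_many(batch)
validator.validate_many(batch)  # second run is served entirely from the cache
print(results, len(calls))
```

Threads suit this workload because each validation mostly waits on the network. A persistent cache (for example a file or database keyed by the same hash) would be needed for savings to carry across CI runs.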

Cost estimates:

≈ $0.0005 per validation call

≈ $0.50 for 1,000 records

≈ $5 for 10,000 records
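These figures follow from multiplying the per-call cost by the number of records. A small helper makes it easy to rerun the arithmetic for other volumes. The function name `estimate_cost` and the `cache_hit_rate` parameter are hypothetical additions, and the per-call price is simply the article's estimate, not a quoted API price.

```python
COST_PER_CALL_USD = 0.0005  # per-validation estimate taken from the article


def estimate_cost(records: int, cache_hit_rate: float = 0.0) -> float:
    """Estimated spend in USD, assuming cached records cost nothing."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    billable_calls = records * (1.0 - cache_hit_rate)
    return round(billable_calls * COST_PER_CALL_USD, 4)


for n in (1_000, 10_000):
    print(n, estimate_cost(n))
```

Passing a nonzero `cache_hit_rate` shows how caching changes the estimate for a given fixture set.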

Conclusion

Semantic validation adds an AI‑driven “common‑sense” layer to QA pipelines, so that test data becomes less of a hidden risk and more of a dependable part of quality assurance. With roughly 150 lines of Python, teams can automatically catch logical inconsistencies that pure schema checks miss, helping keep orders realistic, ages plausible, and shipping costs proportional.

Written by

Woodpecker Software Testing

The Woodpecker Software Testing public account shares software testing knowledge and connects testing enthusiasts. Founded by Gu Xiang; website: www.3testing.com. Author of five books, including "Mastering JMeter Through Case Studies".
