Build a Test‑Specific AI Agent to Auto‑Generate Pytest Cases and Analyze Allure Reports

This guide presents an end‑to‑end solution for creating a test‑focused AI agent that indexes project code and defect data, integrates a large language model via LangChain, generates compliant Pytest cases, parses Allure reports, and offers deployment tips for seamless PyCharm integration.

Developers can empower their testing workflow by building a dedicated AI agent that understands project structure, historical defects, and testing standards, enabling automatic generation of high‑quality Pytest cases and intelligent analysis of Allure reports.

Why a test‑specific AI Agent?

Unlike generic AI assistants, a test‑oriented agent knows the tests/api and tests/ui directory conventions, can access private defect databases such as Jira, understands business rules like payment idempotency, and can parse Allure reports to provide actionable insights.

System Architecture

Core components include:

Knowledge base: project source code, defect data, and test documentation.

Vector engine: converts text to semantic vectors.

Large model: Qwen for accuracy, or a local CodeLlama where privacy matters.

Agent router: selects the appropriate knowledge source based on query type.

PyCharm integration: seamless embedding within the IDE.

[Figure: system architecture diagram]

Step‑by‑step implementation

1. Build the knowledge base

1.1 Index project code

# index_code.py
from langchain_community.document_loaders import DirectoryLoader, PythonLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from langchain_ollama import OllamaEmbeddings

def index_project_code(project_path: str, persist_dir: str):
    # Load all Python files (PythonLoader reads .py sources directly and
    # avoids DirectoryLoader's default dependency on the unstructured package)
    loader = DirectoryLoader(project_path, glob="**/*.py", loader_cls=PythonLoader)
    docs = loader.load()
    # Split by function/class while preserving context
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=600,
        chunk_overlap=100,
        separators=["

def ", "

class "]
    )
    splits = text_splitter.split_documents(docs)
    # Vectorize and store
    vectorstore = Chroma.from_documents(
        documents=splits,
        embedding=OllamaEmbeddings(model="nomic-embed-text"),
        persist_directory=persist_dir
    )
    # Chroma persists automatically when persist_directory is set
    print(f"Indexed {len(splits)} code chunks to {persist_dir}")

index_project_code("./my_project", "./vectorstores/code")

1.2 Build defect knowledge base

# index_defects.py
import pandas as pd
from langchain_chroma import Chroma
from langchain_ollama import OllamaEmbeddings

def index_defects(jira_csv: str, persist_dir: str):
    df = pd.read_csv(jira_csv)
    defect_texts = []
    for _, row in df.iterrows():
        text = f"""
        缺陷ID: {row['issue_key']}
        标题: {row['summary']}
        描述: {row['description']}
        根因: {row['root_cause']}
        解决方案: {row['resolution']}
        模块: {row['component']}
        """
        defect_texts.append(text)
    vectorstore = Chroma.from_texts(
        texts=defect_texts,
        embedding=OllamaEmbeddings(model="nomic-embed-text"),
        persist_directory=persist_dir
    )
    # Chroma persists automatically when persist_directory is set

index_defects("jira_defects.csv", "./vectorstores/defects")

Recommended data sources:

Code: Git repository

Defects: Jira or Azure DevOps export (a minimal Jira export sketch follows this list)

Documentation: Confluence or Markdown test specifications
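
If you take the Jira route, the sketch below pulls bugs via the Jira REST search API and writes the CSV consumed by index_defects. It is a minimal sketch, not a hardened exporter: the base URL, credentials, JQL filter, and the customfield_10001 id standing in for "root cause" are all assumptions that vary per Jira instance.

# export_jira_defects.py
import csv
import requests

JIRA_URL = "https://your-company.atlassian.net"  # assumption: your Jira base URL
AUTH = ("bot@example.com", "API_TOKEN")          # assumption: email + API token

def export_defects(jql: str, out_csv: str) -> None:
    issues, start = [], 0
    while True:
        # Page through /rest/api/2/search until all matching issues are fetched
        resp = requests.get(
            f"{JIRA_URL}/rest/api/2/search",
            params={"jql": jql, "startAt": start, "maxResults": 100,
                    "fields": "summary,description,resolution,components,customfield_10001"},
            auth=AUTH, timeout=30,
        )
        resp.raise_for_status()
        page = resp.json()
        if not page["issues"]:
            break
        issues.extend(page["issues"])
        start += len(page["issues"])
        if start >= page["total"]:
            break
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["issue_key", "summary", "description",
                         "root_cause", "resolution", "component"])
        for issue in issues:
            fields = issue["fields"]
            writer.writerow([
                issue["key"],
                fields.get("summary") or "",
                fields.get("description") or "",
                fields.get("customfield_10001") or "",       # hypothetical "root cause" field
                (fields.get("resolution") or {}).get("name", ""),
                ", ".join(c["name"] for c in fields.get("components", [])),
            ])

export_defects("project = PAY AND issuetype = Bug", "jira_defects.csv")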

2. Create the AI Agent core engine

2.1 Define tools

# tools.py
from langchain_core.tools import tool
from langchain_chroma import Chroma
from langchain_ollama import OllamaEmbeddings

@tool
def search_code(query: str) -> str:
    """Search the project code base and return relevant snippets"""
    vectorstore = Chroma(persist_directory="./vectorstores/code", embedding_function=OllamaEmbeddings(model="nomic-embed-text"))
    results = vectorstore.similarity_search(query, k=3)
    return "

".join([doc.page_content for doc in results])

@tool
def search_defects(query: str) -> str:
    """Search the historical defect library and return similar cases"""
    vectorstore = Chroma(persist_directory="./vectorstores/defects", embedding_function=OllamaEmbeddings(model="nomic-embed-text"))
    results = vectorstore.similarity_search(query, k=2)
    return "

".join([doc.page_content for doc in results])

2.2 Build the agent executor

# agent.py
from langchain import hub
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_ollama import ChatOllama
from tools import search_code, search_defects

def create_test_agent():
    # Initialize the model; create_tool_calling_agent needs a chat model that
    # supports tool calling, so use ChatOllama rather than the plain OllamaLLM
    # (can be swapped for a hosted Qwen API)
    llm = ChatOllama(model="qwen:7b", temperature=0.2)
    # Load a prompt that supports function calling
    prompt = hub.pull("hwchase17/openai-functions-agent")
    tools = [search_code, search_defects]
    agent = create_tool_calling_agent(llm, tools, prompt)
    return AgentExecutor(agent=agent, tools=tools, verbose=True)

# Global agent instance
test_agent = create_test_agent()
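
A quick smoke test of the executor looks like this (the question is illustrative):

if __name__ == "__main__":
    # AgentExecutor takes a dict with "input" and returns a dict with "output"
    response = test_agent.invoke({"input": "Which historical defects relate to payment timeouts?"})
    print(response["output"])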

3. Add advanced capabilities

3.1 Allure report analysis

# allure_analyzer.py
import json
from langchain_core.tools import tool

def parse_allure_report(report_path: str) -> str:
    """Parse an Allure report and extract failure hotspots"""
    # widgets/suites.json is JSON (not XML) and lists per-suite statistics
    with open(f"{report_path}/widgets/suites.json", encoding="utf-8") as f:
        suites = json.load(f)
    failed_tests = []
    for suite in suites.get("items", []):
        failed_count = suite.get("statistic", {}).get("failed", 0)
        if failed_count:
            failed_tests.append(f"{suite['name']}: {failed_count} failed cases")
    return "Recent failure hotspots:\n" + "\n".join(failed_tests[:5])

@tool
def analyze_allure() -> str:
    """Analyze the latest Allure report and summarize failure hotspots"""
    return parse_allure_report("./allure-report")
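
Assuming a report has already been generated (for example via allure generate), the tool can be exercised on its own:

if __name__ == "__main__":
    print(analyze_allure.invoke({}))  # zero-argument tools are invoked with an empty dict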

3.2 Test case generator

# test_generator.py
from langchain_core.tools import tool

TEST_PROMPT_TEMPLATE = """
You are a senior test engineer. Generate pytest cases for the given function.
Requirements:
1. Cover normal flow, exception flow, and boundary values.
2. Use @pytest.mark.parametrize.
3. Include Chinese comments.
4. Follow team naming convention: test_ prefix.
Function code:
{code}
"""

@tool
def generate_test_cases(code_snippet: str) -> str:
    """Generate test cases from a code snippet"""
    from langchain_ollama import OllamaLLM
    llm = OllamaLLM(model="qwen:7b")
    prompt = TEST_PROMPT_TEMPLATE.format(code=code_snippet)
    return llm.invoke(prompt)
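
Called directly, the tool returns the generated tests as plain text; the snippet below is illustrative:

if __name__ == "__main__":
    snippet = "def calculate_discount(price, user_level): ..."
    print(generate_test_cases.invoke(snippet))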

4. Deploy and integrate

4.1 Start an HTTP service

# server.py
from flask import Flask, request, jsonify
from agent import test_agent
from allure_analyzer import analyze_allure
from test_generator import generate_test_cases
app = Flask(__name__)

@app.route('/ask', methods=['POST'])
def ask_agent():
    query = request.json['query']
    if "生成测试" in query or "写用例" in query:
        code = request.json.get('code', '')
        result = generate_test_cases(code)
    elif "allure" in query or "报告" in query:
        result = analyze_allure()
    else:
        response = test_agent.invoke({"input": query})
        result = response["output"]
    return jsonify({"answer": result})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)

4.2 PyCharm integration via REST Client plugin

### Generate test case
POST http://localhost:8080/ask
Content-Type: application/json
{
  "query": "生成用户登录测试用例",
  "code": "def login(username, password): ..."
}

### Analyze defects
POST http://localhost:8080/ask
Content-Type: application/json
{
  "query": "支付超时有哪些历史案例?"
}

Validating the results

Scenario 1 – generate a compliant pytest case for calculate_discount:

import pytest

@pytest.mark.parametrize("user_level, expected_discount", [
    ("VIP", 0.8),          # VIP gets 20% off
    ("regular", 1.0),      # Regular users no discount
    ("", 1.0),             # Empty level treated as regular
])
def test_calculate_discount(user_level, expected_discount):
    """Test discount calculation for different user levels"""
    price = 100.0
    result = calculate_discount(price, user_level)
    assert result == price * expected_discount

Scenario 2 – analyze recent payment module failures:

DEF-1234: Third‑party gateway timeout (network spikes) – add 3 s retry.

DEF-1567: Over‑selling under concurrency – implement row lock + pre‑allocation.

Recommendation: introduce network‑delay chaos experiments in automated tests.

Pitfalls and optimization tips

Slow response: enable GPU acceleration for Ollama; shard the knowledge base for parallel retrieval.

Inaccurate answers: refine chunking strategy to split by function rather than fixed length; add post‑processing validation.

Memory overflow: limit vector store size; switch to FAISS for larger-scale embeddings (a minimal sketch follows this list).
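
A minimal sketch of that FAISS swap, assuming the langchain_community and faiss-cpu packages; the texts and index path are placeholders:

from langchain_community.vectorstores import FAISS
from langchain_ollama import OllamaEmbeddings

embeddings = OllamaEmbeddings(model="nomic-embed-text")
defect_texts = ["example defect text"]  # stand-in for the texts built in step 1.2
# Build the index in memory, then write it to disk
vectorstore = FAISS.from_texts(defect_texts, embedding=embeddings)
vectorstore.save_local("./vectorstores/defects_faiss")
# Reload later; the flag acknowledges the pickled index is trusted local data
vectorstore = FAISS.load_local("./vectorstores/defects_faiss", embeddings,
                               allow_dangerous_deserialization=True)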

Key success factors: high‑quality knowledge injection, precise prompt design (e.g., "You are a financial testing expert"), and a feedback loop that records user ratings for continuous improvement (sketched below).
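One lightweight way to close that feedback loop is an append-only JSONL log of rated answers; the schema below is an assumption, not part of the original design:

# feedback.py
import json
import time

def record_feedback(query: str, answer: str, rating: int,
                    path: str = "feedback.jsonl") -> None:
    """Append one rated Q&A pair for later prompt and knowledge-base tuning"""
    entry = {"ts": time.time(), "query": query, "answer": answer, "rating": rating}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")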

Conclusion

Embedding a team’s testing expertise into an AI agent transforms implicit knowledge into reusable assets, delivering a 24/7 quality consultant, rapid onboarding for newcomers, and predictive defect‑prevention capabilities.

Tags: LangChain, test automation, AI Agent, Qwen, Ollama, pytest, Allure