Build a Test‑Specific AI Agent to Auto‑Generate Pytest Cases and Analyze Allure Reports

This guide presents an end‑to‑end solution for creating a test‑focused AI agent that indexes project code and defect data, integrates a large language model via LangChain, generates compliant Pytest cases, parses Allure reports, and offers deployment tips for seamless PyCharm integration.

Developers can empower their testing workflow by building a dedicated AI agent that understands project structure, historical defects, and testing standards, enabling automatic generation of high‑quality Pytest cases and intelligent analysis of Allure reports.

Why a test‑specific AI Agent?

Unlike generic AI assistants, a test‑oriented agent knows the tests/api and tests/ui directory conventions, can access private defect databases such as Jira, understands business rules like payment idempotency, and can parse Allure reports to provide actionable insights.

System Architecture

Core components include:

Knowledge base: project source code, defect data, and test documentation.

Vector engine: converts text to semantic vectors.

Large model: Qwen for accuracy, or a local CodeLlama where privacy matters.

Agent router: selects the appropriate knowledge source based on query type.

PyCharm integration: seamless embedding within the IDE.

[Figure: system architecture diagram]

Step‑by‑step implementation

1. Build the knowledge base

1.1 Index project code

# index_code.py
from langchain_community.document_loaders import DirectoryLoader, PythonLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from langchain_ollama import OllamaEmbeddings

def index_project_code(project_path: str, persist_dir: str):
    # Load all Python files (PythonLoader reads .py sources directly and
    # avoids DirectoryLoader's default dependency on the unstructured package)
    loader = DirectoryLoader(project_path, glob="**/*.py", loader_cls=PythonLoader)
    docs = loader.load()
    # Split by function/class while preserving context
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=600,
        chunk_overlap=100,
        separators=["

def ", "

class "]
    )
    splits = text_splitter.split_documents(docs)
    # Vectorize and store
    vectorstore = Chroma.from_documents(
        documents=splits,
        embedding=OllamaEmbeddings(model="nomic-embed-text"),
        persist_directory=persist_dir
    )
    # Chroma persists automatically when persist_directory is set
    print(f"Indexed {len(splits)} code chunks to {persist_dir}")

index_project_code("./my_project", "./vectorstores/code")

1.2 Build defect knowledge base

# index_defects.py
import pandas as pd
from langchain_chroma import Chroma
from langchain_ollama import OllamaEmbeddings

def index_defects(jira_csv: str, persist_dir: str):
    df = pd.read_csv(jira_csv)
    defect_texts = []
    for _, row in df.iterrows():
        text = f"""
        缺陷ID: {row['issue_key']}
        标题: {row['summary']}
        描述: {row['description']}
        根因: {row['root_cause']}
        解决方案: {row['resolution']}
        模块: {row['component']}
        """
        defect_texts.append(text)
    vectorstore = Chroma.from_texts(
        texts=defect_texts,
        embedding=OllamaEmbeddings(model="nomic-embed-text"),
        persist_directory=persist_dir
    )
    # Chroma persists automatically when persist_directory is set

index_defects("jira_defects.csv", "./vectorstores/defects")

Recommended data sources:

Code: Git repository

Defects: Jira or Azure DevOps export (a minimal Jira export sketch follows this list)

Documentation: Confluence or Markdown test specifications
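
If you take the Jira route, the sketch below pulls bugs via the Jira REST search API and writes the CSV consumed by index_defects. It is a minimal sketch, not a hardened exporter: the base URL, credentials, JQL filter, and the customfield_10001 id standing in for "root cause" are all assumptions that vary per Jira instance.

# export_jira_defects.py
import csv
import requests

JIRA_URL = "https://your-company.atlassian.net"  # assumption: your Jira base URL
AUTH = ("bot@example.com", "API_TOKEN")          # assumption: email + API token

def export_defects(jql: str, out_csv: str) -> None:
    issues, start = [], 0
    while True:
        # Page through /rest/api/2/search until all matching issues are fetched
        resp = requests.get(
            f"{JIRA_URL}/rest/api/2/search",
            params={"jql": jql, "startAt": start, "maxResults": 100,
                    "fields": "summary,description,resolution,components,customfield_10001"},
            auth=AUTH, timeout=30,
        )
        resp.raise_for_status()
        page = resp.json()
        if not page["issues"]:
            break
        issues.extend(page["issues"])
        start += len(page["issues"])
        if start >= page["total"]:
            break
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["issue_key", "summary", "description",
                         "root_cause", "resolution", "component"])
        for issue in issues:
            fields = issue["fields"]
            writer.writerow([
                issue["key"],
                fields.get("summary") or "",
                fields.get("description") or "",
                fields.get("customfield_10001") or "",       # hypothetical "root cause" field
                (fields.get("resolution") or {}).get("name", ""),
                ", ".join(c["name"] for c in fields.get("components", [])),
            ])

export_defects("project = PAY AND issuetype = Bug", "jira_defects.csv")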

2. Create the AI Agent core engine

2.1 Define tools

# tools.py
from langchain_core.tools import tool
from langchain_chroma import Chroma
from langchain_ollama import OllamaEmbeddings

@tool
def search_code(query: str) -> str:
    """Search the project code base and return relevant snippets"""
    vectorstore = Chroma(persist_directory="./vectorstores/code", embedding_function=OllamaEmbeddings(model="nomic-embed-text"))
    results = vectorstore.similarity_search(query, k=3)
    return "

".join([doc.page_content for doc in results])

@tool
def search_defects(query: str) -> str:
    """Search the historical defect library and return similar cases"""
    vectorstore = Chroma(persist_directory="./vectorstores/defects", embedding_function=OllamaEmbeddings(model="nomic-embed-text"))
    results = vectorstore.similarity_search(query, k=2)
    return "

".join([doc.page_content for doc in results])

2.2 Build the agent executor

# agent.py
from langchain import hub
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_ollama import ChatOllama
from tools import search_code, search_defects

def create_test_agent():
    # Initialize the model; create_tool_calling_agent needs a chat model that
    # supports tool calling, so use ChatOllama rather than the plain OllamaLLM
    # (can be swapped for a hosted Qwen API)
    llm = ChatOllama(model="qwen:7b", temperature=0.2)
    # Load a prompt that supports function calling
    prompt = hub.pull("hwchase17/openai-functions-agent")
    tools = [search_code, search_defects]
    agent = create_tool_calling_agent(llm, tools, prompt)
    return AgentExecutor(agent=agent, tools=tools, verbose=True)

# Global agent instance
test_agent = create_test_agent()
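
A quick smoke test of the executor looks like this (the question is illustrative):

if __name__ == "__main__":
    # AgentExecutor takes a dict with "input" and returns a dict with "output"
    response = test_agent.invoke({"input": "Which historical defects relate to payment timeouts?"})
    print(response["output"])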

3. Add advanced capabilities

3.1 Allure report analysis

# allure_analyzer.py
import json
from langchain_core.tools import tool

def parse_allure_report(report_path: str) -> str:
    """Parse an Allure report and extract failure hotspots"""
    # widgets/suites.json is JSON (not XML) and lists per-suite statistics
    with open(f"{report_path}/widgets/suites.json", encoding="utf-8") as f:
        suites = json.load(f)
    failed_tests = []
    for suite in suites.get("items", []):
        failed_count = suite.get("statistic", {}).get("failed", 0)
        if failed_count:
            failed_tests.append(f"{suite['name']}: {failed_count} failed cases")
    return "Recent failure hotspots:\n" + "\n".join(failed_tests[:5])

@tool
def analyze_allure() -> str:
    """Analyze the latest Allure report and summarize failure hotspots"""
    return parse_allure_report("./allure-report")
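
Assuming a report has already been generated (for example via allure generate), the tool can be exercised on its own:

if __name__ == "__main__":
    print(analyze_allure.invoke({}))  # zero-argument tools are invoked with an empty dict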

3.2 Test case generator

# test_generator.py
from langchain_core.tools import tool

TEST_PROMPT_TEMPLATE = """
You are a senior test engineer. Generate pytest cases for the given function.
Requirements:
1. Cover normal flow, exception flow, and boundary values.
2. Use @pytest.mark.parametrize.
3. Include Chinese comments.
4. Follow team naming convention: test_ prefix.
Function code:
{code}
"""

@tool
def generate_test_cases(code_snippet: str) -> str:
    """Generate test cases from a code snippet"""
    from langchain_ollama import OllamaLLM
    llm = OllamaLLM(model="qwen:7b")
    prompt = TEST_PROMPT_TEMPLATE.format(code=code_snippet)
    return llm.invoke(prompt)
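
Called directly, the tool returns the generated tests as plain text; the snippet below is illustrative:

if __name__ == "__main__":
    snippet = "def calculate_discount(price, user_level): ..."
    print(generate_test_cases.invoke(snippet))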

4. Deploy and integrate

4.1 Start an HTTP service

# server.py
from flask import Flask, request, jsonify
from agent import test_agent
from allure_analyzer import analyze_allure
from test_generator import generate_test_cases
app = Flask(__name__)

@app.route('/ask', methods=['POST'])
def ask_agent():
    query = request.json['query']
    if "生成测试" in query or "写用例" in query:
        code = request.json.get('code', '')
        result = generate_test_cases(code)
    elif "allure" in query or "报告" in query:
        result = analyze_allure()
    else:
        response = test_agent.invoke({"input": query})
        result = response["output"]
    return jsonify({"answer": result})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)

4.2 PyCharm integration via REST Client plugin

### Generate test case
POST http://localhost:8080/ask
Content-Type: application/json
{
  "query": "生成用户登录测试用例",
  "code": "def login(username, password): ..."
}

### Analyze defects
POST http://localhost:8080/ask
Content-Type: application/json
{
  "query": "支付超时有哪些历史案例?"
}

Validating the results

Scenario 1 – generate a compliant pytest case for calculate_discount:

import pytest

@pytest.mark.parametrize("user_level, expected_discount", [
    ("VIP", 0.8),          # VIP gets 20% off
    ("regular", 1.0),      # Regular users no discount
    ("", 1.0),             # Empty level treated as regular
])
def test_calculate_discount(user_level, expected_discount):
    """Test discount calculation for different user levels"""
    price = 100.0
    result = calculate_discount(price, user_level)
    assert result == price * expected_discount

Scenario 2 – analyze recent payment module failures:

DEF-1234: Third‑party gateway timeout (network spikes) – add 3 s retry.

DEF-1567: Over‑selling under concurrency – implement row lock + pre‑allocation.

Recommendation: introduce network‑delay chaos experiments in automated tests.

Pitfalls and optimization tips

Slow response: enable GPU acceleration for Ollama; shard the knowledge base for parallel retrieval.

Inaccurate answers: refine chunking strategy to split by function rather than fixed length; add post‑processing validation.

Memory overflow: limit vector store size; switch to FAISS for larger-scale embeddings (a minimal sketch follows this list).
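
A minimal sketch of that FAISS swap, assuming the langchain_community and faiss-cpu packages; the texts and index path are placeholders:

from langchain_community.vectorstores import FAISS
from langchain_ollama import OllamaEmbeddings

embeddings = OllamaEmbeddings(model="nomic-embed-text")
defect_texts = ["example defect text"]  # stand-in for the texts built in step 1.2
# Build the index in memory, then write it to disk
vectorstore = FAISS.from_texts(defect_texts, embedding=embeddings)
vectorstore.save_local("./vectorstores/defects_faiss")
# Reload later; the flag acknowledges the pickled index is trusted local data
vectorstore = FAISS.load_local("./vectorstores/defects_faiss", embeddings,
                               allow_dangerous_deserialization=True)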

Key success factors: high‑quality knowledge injection, precise prompt design (e.g., "You are a financial testing expert"), and a feedback loop that records user ratings for continuous improvement (sketched below).
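One lightweight way to close that feedback loop is an append-only JSONL log of rated answers; the schema below is an assumption, not part of the original design:

# feedback.py
import json
import time

def record_feedback(query: str, answer: str, rating: int,
                    path: str = "feedback.jsonl") -> None:
    """Append one rated Q&A pair for later prompt and knowledge-base tuning"""
    entry = {"ts": time.time(), "query": query, "answer": answer, "rating": rating}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")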

Conclusion

Embedding a team’s testing expertise into an AI agent transforms implicit knowledge into reusable assets, delivering a 24/7 quality consultant, rapid onboarding for newcomers, and predictive defect‑prevention capabilities.

Tags: LangChain, test automation, AI Agent, Qwen, Ollama, pytest, Allure