
Design and Implementation of an Automated JSON Diff Tool for API Regression Testing

This article details the design, algorithmic challenges, and implementation of an automated JSON diff tool used in Qunar's Hackathon to compare complex API responses, handle unordered arrays, ignore configurable nodes, and provide fast, accurate regression testing for large JSON payloads.

Qunar Tech Salon

Sep 27, 2018


The Qunar Hackathon team CodeCode tackled a real‑world regression testing problem by building an automated JSON diff tool that compares API responses, supports large (up to 1 MB) JSON objects, and allows selective node ignoring.

Problem definition: Regression testing often requires comparing new API results with baseline JSON data while handling nested objects and unordered arrays and ignoring volatile fields such as dynamic query IDs.

Design approach: The solution is split into a jsondiff‑util JAR that holds the core diff logic and a jsondiff‑web WAR that provides the UI and handles requests. A syntax tree (AST) is built for each JSON document; each node stores its key, value, and depth, plus features (objectKey and the arrayKeyWeight and arrayValueWeight vectors) that enable fast similarity calculations.
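To make the tree-building step concrete, here is a minimal Java sketch, not the jsondiff‑util implementation. It assumes the JSON has already been parsed into nested Map/List values by some JSON library; the class JsonTreeSketch, the FlatNode record, and the path format are hypothetical. It first copies every object into a TreeMap (the normalization listed among the key algorithms), then walks the tree once, recording each node's path and depth.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not the jsondiff-util code. Assumes JSON is already parsed into
// Map<String, Object> (objects), List<Object> (arrays) and scalars (String, Number, Boolean, null).
final class JsonTreeSketch {

    /** Hypothetical flattened node: key path, scalar value (null for containers), depth. */
    record FlatNode(String path, Object value, int depth) {}

    /** Recursively copies objects into TreeMaps so that key order no longer matters. */
    static Object normalize(Object json) {
        if (json instanceof Map<?, ?> map) {
            TreeMap<String, Object> sorted = new TreeMap<>();
            for (Map.Entry<?, ?> e : map.entrySet()) {
                sorted.put(String.valueOf(e.getKey()), normalize(e.getValue()));
            }
            return sorted;
        }
        if (json instanceof List<?> list) {
            List<Object> copy = new ArrayList<>(list.size());
            for (Object item : list) {
                copy.add(normalize(item)); // element order is kept; arrays are matched later
            }
            return copy;
        }
        return json; // scalar values are returned unchanged
    }

    /** Depth-first walk that emits one FlatNode per value, visiting each value exactly once. */
    static void flatten(Object json, String path, int depth, List<FlatNode> out) {
        if (json instanceof Map<?, ?> map) {
            out.add(new FlatNode(path, null, depth));
            for (Map.Entry<?, ?> e : map.entrySet()) {
                flatten(e.getValue(), path + "/" + e.getKey(), depth + 1, out);
            }
        } else if (json instanceof List<?> list) {
            out.add(new FlatNode(path, null, depth));
            for (int i = 0; i < list.size(); i++) {
                flatten(list.get(i), path + "[" + i + "]", depth + 1, out);
            }
        } else {
            out.add(new FlatNode(path, json, depth));
        }
    }
}
```

Because each value is visited once, the walk grows linearly with the number of nodes. Array order is deliberately left untouched, since unordered arrays are handled by similarity matching rather than by sorting.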

Key algorithms:

Pre‑process JSON into ordered TreeMap structures to normalize object key order.

Generate feature vectors by counting byte occurrences (256‑bucket histogram) for both keys and values.

Use weighted similarity (key weight = 30, value weight = 1) and a "same‑weight" marking scheme to quickly prune identical nodes.

For unordered arrays, apply similarity‑based matching (weight‑based ranking), followed by dynamic programming to find the pairings with the fewest differences.
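The feature-vector and weighting steps above can be sketched as follows. This is an illustration under stated assumptions, not the tool's code: the article specifies 256‑bucket byte histograms and the 30:1 key/value weights, but the UTF‑8 encoding, the L1 (sum of absolute differences) comparison, and the names FeatureSketch and weightedDistance are assumptions. A plain string-equality check stands in for the "same‑weight" pruning.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch. Assumptions: UTF-8 bytes and an L1 distance between histograms;
// only the 256 buckets and the 30:1 key/value weights come from the article.
final class FeatureSketch {
    static final int KEY_WEIGHT = 30;  // key weight from the article
    static final int VALUE_WEIGHT = 1; // value weight from the article

    /** 256-bucket histogram: bucket b counts how often byte value b occurs in the text. */
    static int[] histogram(String text) {
        int[] buckets = new int[256];
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            buckets[b & 0xFF]++; // mask to 0..255 because Java bytes are signed
        }
        return buckets;
    }

    /** Sum of absolute per-bucket differences; 0 means identical byte counts. */
    static int l1(int[] a, int[] b) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    /** Weighted distance between two key/value pairs; key differences dominate. */
    static int weightedDistance(String keyA, String valueA, String keyB, String valueB) {
        // Prune early: identical key and value text needs no histogram work at all.
        if (keyA.equals(keyB) && valueA.equals(valueB)) {
            return 0;
        }
        return KEY_WEIGHT * l1(histogram(keyA), histogram(keyB))
                + VALUE_WEIGHT * l1(histogram(valueA), histogram(valueB));
    }
}
```

With these weights, a single-character substitution in a key adds 60 to the distance (two buckets each change by one, times 30), while the same substitution in a value adds 2, so key mismatches dominate the ranking. Because histograms ignore byte order, anagrams such as "ab" and "ba" also score 0, so a zero distance is a cue for an exact comparison rather than proof of equality.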

Implementation highlights (excerpt from the Node class):

```java
import java.io.Serializable;

public class Node implements Serializable {
    private String key;              // JSON key of this node
    private Object value;            // node value
    private String objectKey;        // object-key feature used for similarity
    private int[] arrayKeyWeight;    // 256-bucket byte histogram of the key
    private int[] arrayValueWeight;  // 256-bucket byte histogram of the value
    private int depth;               // nesting depth in the syntax tree
    private int arrayQuantity;       // element count (for array nodes)
    private NodeEnum valueType;      // value type; NodeEnum is defined elsewhere in jsondiff-util
    // getters, setters, and toString omitted for brevity
}
```

Performance: Building the syntax tree is O(n), where n is the number of JSON nodes. Diffing is O(n) for identical nodes and O(n²·m) for unordered arrays, where n and m here denote the lengths of the arrays being matched. Typical end‑to‑end response time is about 2 seconds, roughly 70% of which is network latency.
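The super-linear cost for unordered arrays comes from comparing every element of one array with every element of the other. The sketch below shows that pairwise step and then pairs elements greedily, cheapest pair first. The greedy pass is a stand-in: the article ranks candidates by weight and then applies dynamic programming whose details it does not give, so ArrayMatchSketch and its distance parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToIntBiFunction;

// Hypothetical sketch: greedy similarity-ranked pairing, NOT the article's DP step.
final class ArrayMatchSketch {

    /** Returns match[i] = index in right paired with left[i], or -1 if left[i] is unmatched. */
    static <T> int[] match(List<T> left, List<T> right, ToIntBiFunction<T, T> distance) {
        int n = left.size();
        int m = right.size();
        // Pairwise step: n * m distance evaluations, each stored as {distance, i, j}.
        List<int[]> pairs = new ArrayList<>(n * m);
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) {
                pairs.add(new int[] {distance.applyAsInt(left.get(i), right.get(j)), i, j});
            }
        }
        // Rank candidate pairs from most to least similar (stable sort, ties keep i/j order).
        pairs.sort((p, q) -> Integer.compare(p[0], q[0]));

        int[] match = new int[n];
        Arrays.fill(match, -1);
        boolean[] rightUsed = new boolean[m];
        // Take the cheapest pair whose elements are both still free.
        for (int[] p : pairs) {
            if (match[p[1]] == -1 && !rightUsed[p[2]]) {
                match[p[1]] = p[2];
                rightUsed[p[2]] = true;
            }
        }
        return match;
    }
}
```

For example, matching ["apple", "kiwi", "banana"] against ["banana", "apple", "grape"] with a distance of 0 for equal strings and 10 otherwise pairs the two equal strings first and leaves kiwi paired with grape, the only remaining candidate; that leftover pair is what a diff would report as changed. Greedy pairing is not guaranteed to find the globally cheapest assignment; an exact assignment method (for example, the Hungarian algorithm) would be needed for that.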

Conclusion: By combining tree‑based representation, dimensionality reduction, weighted similarity, and pruning, the tool achieves a good balance between accuracy and efficiency, making it suitable for large‑scale API regression testing in production environments.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


Written by

Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
