Design and Practices of a Data‑Driven OCR Testing System
This article describes Laiye's shift to a data‑driven deep‑learning workflow and presents the design of its OCR testing system: macro‑ and micro‑analysis features, visual diff tools, distributed tracing, and code examples that together accelerate model evaluation and iterative optimization.
In industry, deep learning is shifting from a model‑centric to a data‑driven paradigm: model capacity is usually sufficient, so continuously adding training data is what improves performance, enabling rapid quality gains with limited engineering resources.
Tesla's Autopilot data‑engine framework is a classic example of this data‑driven approach.
Laiye has also been transitioning to a data‑driven mode, which requires strong integration of data, algorithms, models, compute, inference, and testing; the testing system is crucial for determining production readiness and guiding model optimization.
This article briefly outlines the key design concepts and practices of Laiye's OCR testing system.
Overview
The data‑driven workflow and core testing system requirements are illustrated in the following diagram:
Solid lines in the flowchart indicate steps executed for every model training; dashed lines indicate optional steps. The testing system appears as two purple diamonds in the lower‑left corner.
Background: Laiye already offers many OCR capabilities; this article focuses on generic OCR, which is evaluated by character‑level F1 score. More than 80 scenario‑specific sub‑test sets ensure comprehensive coverage.
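As a concrete illustration of the metric, here is a minimal character‑level F1 sketch in Python. It uses one common alignment‑based definition (matched characters via sequence alignment); the article does not specify Laiye's exact scoring rules, so treat this as an assumption.

```python
import difflib

def char_f1(pred: str, gt: str) -> float:
    """Character-level F1 between a predicted string and ground truth.

    Matched characters are counted via sequence alignment, then
    precision = matched / len(pred) and recall = matched / len(gt).
    """
    if not pred and not gt:
        return 1.0
    if not pred or not gt:
        return 0.0
    matcher = difflib.SequenceMatcher(None, pred, gt)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    precision = matched / len(pred)
    recall = matched / len(gt)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Averaging this score over each sub‑test set yields the per‑scenario numbers that the macro views below filter, trend, and compare.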
Macro Analysis
The system provides three main macro functions: metric view and filtering, metric trend viewing, and metric comparison.
1. Metric view and filtering: Users can inspect core quantitative metrics for a test run, sort to surface poor‑performing sub‑sets, and trace results back to the model code version, training data, and version identifiers.
2. Metric trend viewing: Since not every sub‑set improves after each training run, the system tracks metric trends over time; large variance on a sub‑set may indicate insufficient samples or conflicting data, and the trend view helps locate the cause.
3. Metric comparison: Users can pin a version (production, test, competitor, or a lightweight CPU model) and compare it with any other version. Differences are sorted by the selected evaluation metric, highlighting sub‑sets with significant changes; in the latest optimization, for example, five sub‑sets showed a performance decline.
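The comparison view's "sort differences by metric" behavior can be sketched as follows. The data layout (a dict mapping sub‑test‑set name to F1) and the function name are illustrative assumptions, not Laiye's actual schema.

```python
def rank_regressions(pinned: dict, candidate: dict) -> list:
    """Rank sub-test sets by F1 change, largest regression first.

    pinned and candidate map sub-test-set name -> F1 score for the
    pinned baseline version and the version under comparison.
    Returns a list of (subset_name, delta) pairs sorted ascending,
    so the worst regressions come first.
    """
    deltas = {
        name: candidate[name] - pinned[name]
        for name in pinned
        if name in candidate
    }
    return sorted(deltas.items(), key=lambda kv: kv[1])
```

Sorting ascending puts the largest drops at the top, matching the triage workflow: engineers start from the most degraded sub‑sets and drill into micro analysis from there.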
Micro Analysis
When macro metrics reveal problematic sub‑sets, micro analysis pinpoints specific errors to guide data augmentation.
1. Annotation Diff : A visual diff page shows end‑to‑end model output versus ground truth, highlighting mismatches with colored bounding boxes. Clicking a box expands the discrepancy details, e.g., missing punctuation in handwritten text.
2. Historical Version Diff : Similar to the annotation diff, this compares the current model with a previous version to identify regressions, such as reduced exposure augmentation causing failures on bright scenes.
3. Intermediate Result Trace : A complete OCR product consists of multiple models (angle detection, text detection, recognition, semantic correction) and associated pre/post‑processing. Laiye's generic OCR runs as six microservices plus a TensorFlow‑Serving service, communicating via gRPC. Distributed tracing (OpenTracing) records intermediate results when a request is sampled, enabling rapid pinpointing of the failing stage.
Go example for logging images in a trace:
```go
// When forced sampling is enabled, this call takes effect.
// values is a map[string][]byte keyed by the image's annotation text;
// JSON is recommended for structured (non-image) logs.
values := make(map[string][]byte)
for idx, jpg := range in.JpegImgs {
	values[resp.Items[idx].Content] = jpg
}
// "ocr-text-rec" is the trace method name; images are stored under this
// method's folder, and multiple images are uploaded in parallel.
tracing.LogImages(ctx, "ocr-text-rec", values)
```

Python example for generating the required header:
```python
import hashlib
import json
import time

import requests

def get_header():
    headers = {}
    # Use an MD5 of the current nanosecond timestamp as the forced-sampling id.
    material = hashlib.md5(str(time.time_ns()).encode("UTF-8")).hexdigest()
    headers["x-b3-img-sampled-id"] = material
    return headers

# host and req come from the caller's context.
requests.post(host, json.dumps(req), headers=get_header())
```

Sample trace result image:
By analyzing the trace group "ocr-text-rec", engineers can quickly see which microservice produced an error, such as incomplete text detection.
Conclusion
The testing system described here ensures continuous, robust improvement of production models, gives engineers efficient tools for locating issues, and has greatly accelerated internal model iteration. Future articles will share more of Laiye's data‑driven and MLOps experience.
Laiye Technology Team
Official account of Laiye Technology, featuring its best tech innovations, practical implementations, and cutting‑edge industry insights.
