How We Built a Full‑Chain Rendering Debug Platform to Cut Debug Time by 30%
This case study describes how a rendering middle‑platform team tackled complex micro‑service chains, inconsistent data rules, and massive 3D data to build a full‑chain problem‑location and visualization platform. The platform enables self‑service debugging, cuts developer debugging effort by more than 30%, and serves over 170 monthly active users.
Background
From a business perspective, non‑rendering teams treat “rendering” as a standalone tool, but in reality the rendering output service spans the entire tool chain. As a result, every rendering issue is routed to the rendering middle‑platform, concentrating the debugging burden on a single team.
Under a micro‑service architecture the call chain is long and complex: dozens of services are involved, and a single rendering task can generate more than 500 calls.
The rendering chain involves complex 3D graphics logic, large data structures (≈1 MB per task), and specialized domain knowledge.
Historically, the rendering middle‑platform lacked fast problem‑location tools, forcing developers to repeat manual debugging for every reported issue. To improve efficiency, the team built a full‑chain rendering location platform.
Challenges
Key difficulties include:
How to expose detailed internal logic to non‑developers?
How to interpret complex data structures without domain expertise?
How to build a systematic location platform when the existing architecture offers little support for one?
How to enable business users and support staff to self‑diagnose issues without overloading developers?
Platformization Process
The system is divided into four modules: rendering tasks, album queries, rendering results, and rendering rights.
The most complex feature is the full‑chain data query. Rendering failures fall into three categories: missing assets, task failure or hang, and result mismatch. Several obstacles ruled out fully automated tracing:
Inconsistent naming rules across business parties make it hard to locate missing asset IDs.
Different TaskId conversion rules across services produce one‑to‑many ID mappings (see the sketch after this list).
No fixed error signature for visual issues because causes vary.
Incomplete or irregular logs hinder automated tracing.
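As a rough illustration of the one‑to‑many mapping problem, the sketch below shows how a single business‑facing TaskId can fan out into several internal task IDs under per‑business conversion rules. All names and rules here are hypothetical, not the team's actual conventions:

```typescript
// Hypothetical per-business conversion rules: one external TaskId can
// map to several internal IDs, so a reverse lookup must return a list.
type Converter = (externalId: string) => string[];

const CONVERTERS: Record<string, Converter> = {
  // Hard-decoration splits a task into one sub-task per room (illustrative).
  'hard-decoration': (id) => [`hd-${id}-living`, `hd-${id}-bedroom`],
  // Custom furniture prefixes the ID and adds a retry suffix (illustrative).
  'custom': (id) => [`cf-${id}-0`, `cf-${id}-1`],
  // DIY passes the ID through unchanged (illustrative).
  'diy': (id) => [id],
};

function resolveInternalIds(business: string, externalId: string): string[] {
  const convert = CONVERTERS[business];
  if (!convert) throw new Error(`unknown business party: ${business}`);
  return convert(externalId);
}

// One external ID, several internal tasks to inspect:
console.log(resolveInternalIds('hard-decoration', 'T20240101'));
// -> ['hd-T20240101-living', 'hd-T20240101-bedroom']
```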
Given these constraints, the team abandoned one‑click tracing and built a link‑level location tool that records key information at each node for manual analysis.
Link Decomposition
The rendering workflow is broken down into key nodes: front‑end entry, rendering middle‑platform, business side (hard‑decoration, custom, DIY), Mesh middle‑platform, and rendering back‑end. By tracking input/output at each node, the source of a problem can be identified.
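A minimal sketch of what recording input/output at a node boundary could look like, assuming an in‑house trace store keyed by TaskId (the type and function names are illustrative, not the team's actual code):

```typescript
// One trace record per node execution, saved even when the node fails.
interface NodeTrace {
  taskId: string;
  node: string;            // e.g. "mesh-middle-platform" (illustrative)
  input: unknown;          // request data recorded on entry
  output?: unknown;        // output data recorded on success
  status: 'success' | 'failed' | 'running';
  startedAt: number;
  endedAt?: number;
  error?: string;
}

async function withTrace<T>(
  taskId: string,
  node: string,
  input: unknown,
  fn: () => Promise<T>,
  store: { save(t: NodeTrace): Promise<void> },
): Promise<T> {
  const trace: NodeTrace = { taskId, node, input, status: 'running', startedAt: Date.now() };
  try {
    const output = await fn();
    Object.assign(trace, { output, status: 'success', endedAt: Date.now() });
    return output;
  } catch (e) {
    Object.assign(trace, { status: 'failed', error: String(e), endedAt: Date.now() });
    throw e;
  } finally {
    // Persist the record regardless of outcome so the chain stays inspectable.
    await store.save(trace);
  }
}
```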
Location Model
Each rendering request is split into sub‑tasks (scene preprocessing, model package request, data merge/clean). Sub‑tasks become nodes in a flow tree, each containing request data, output data, description, and status. Complex processing is represented as a pipeline of stages.
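This description maps naturally onto a recursive node type. The sketch below is one possible shape; the field names are assumptions, not the team's actual schema:

```typescript
type NodeStatus = 'success' | 'failed' | 'running' | 'skipped';

interface FlowNode {
  id: string;
  name: string;          // e.g. "scene preprocessing" (illustrative)
  request: unknown;      // input data recorded at this node
  output?: unknown;      // output data recorded at this node
  description: string;   // human-readable explanation for non-developers
  status: NodeStatus;
  stages?: FlowNode[];   // a complex node expands into a pipeline of stages
  children?: FlowNode[]; // downstream sub-tasks in the flow tree
}
```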
Visualization
To make the tree viewable, the team used AntV‑G6’s Combo node grouping to display nested pipelines.
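For illustration, nested pipelines map onto G6 combos roughly as follows. This sketch assumes G6 v4's combo data format; the node and combo IDs are made up:

```typescript
import G6 from '@antv/g6';

// Each pipeline becomes a combo; its stages become nodes inside the combo.
const data = {
  nodes: [
    { id: 'preprocess', label: 'Scene preprocessing', comboId: 'middle-platform' },
    { id: 'merge', label: 'Data merge/clean', comboId: 'middle-platform' },
    { id: 'render', label: 'Render task', comboId: 'backend' },
  ],
  edges: [
    { source: 'preprocess', target: 'merge' },
    { source: 'merge', target: 'render' },
  ],
  combos: [
    { id: 'middle-platform', label: 'Rendering middle-platform' },
    { id: 'backend', label: 'Rendering back-end' },
  ],
};

const graph = new G6.Graph({
  container: 'container',   // id of the DOM mount point
  width: 1200,
  height: 600,
  groupByTypes: false,      // required for correct combo rendering in v4
  defaultCombo: { type: 'rect' },
  layout: { type: 'dagre', rankdir: 'LR', sortByCombo: true },
});
graph.data(data);
graph.render();
```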
Three visualization standards were followed:
Expose the entire rendering chain with detailed information.
Design the UI for non‑technical users, keeping entry points shallow and data semantics clear.
Graphically render complex data in 2D/3D.
In the main UI, entering a TaskId shows a colored tree; node colors indicate status (a color‑mapping sketch follows this list), and clicking a node reveals:
Basic info (request data, logs, queues).
Middle‑platform detailed stages.
Back‑end task chain (status, cluster info, engine parameters).
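A hypothetical status‑to‑color mapping for the tree nodes might look like the following; the colors are illustrative (Ant Design palette), not the platform's actual scheme:

```typescript
type NodeStatus = 'success' | 'failed' | 'running' | 'skipped';

const STATUS_COLOR: Record<NodeStatus, string> = {
  success: '#52c41a', // green: node completed normally
  failed:  '#f5222d', // red: the problem likely originates here
  running: '#1890ff', // blue: still executing, or hung
  skipped: '#d9d9d9', // grey: node was never reached
};

function nodeStyle(status: NodeStatus) {
  return { fill: STATUS_COLOR[status], stroke: STATUS_COLOR[status] };
}
```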
Because raw 3D scene data is too large to read directly, the platform provides a 3D reconstruction of the house layout showing object positions, lighting, and camera. A re‑render tool lets users select data subsets and trigger a new render for different business scenarios.
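The re‑render tool's call shape could look roughly like the following sketch; the endpoint and payload fields are hypothetical:

```typescript
// Hypothetical request shape for replaying a render on a data subset.
interface ReRenderRequest {
  taskId: string;
  subset: {
    includeLighting: boolean;
    includeCamera: boolean;
    roomIds?: string[];      // restrict to selected rooms in the layout
  };
  scenario: 'hard-decoration' | 'custom' | 'diy';
}

async function triggerReRender(req: ReRenderRequest): Promise<{ newTaskId: string }> {
  const res = await fetch('/api/render/replay', {  // endpoint is made up
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(req),
  });
  if (!res.ok) throw new Error(`re-render failed: ${res.status}`);
  return res.json();
}
```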
Results
After launch, the platform reached over 170 monthly active users and 11,500 page views. More than 10 business units can now locate issues on their own, 80% of rendering problems are diagnosed via the platform, and developer debug time dropped by at least 30%.
Summary & Planning
Experience Summary
Even with extensive visualization, the platform has a learning curve; detailed guides and training were provided.
Build problem‑location capability into new business designs early, so it can be integrated into the platform later.
Location data is stored in OSS with a reasonable TTL, and storage can be toggled via switches to control cost (see the sketch after this list).
Implementing location features requires careful iteration and regression testing.
Reuse existing capabilities and aim for modular, middle‑platform design.
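As a sketch of the TTL point above, an OSS lifecycle rule can expire location data automatically. This assumes the ali-oss SDK; the bucket name, prefix, TTL, and environment‑variable names are all illustrative:

```typescript
import OSS from 'ali-oss';

// Configure a lifecycle rule that deletes trace objects after a fixed TTL.
async function configureTraceTTL(): Promise<void> {
  const client = new OSS({
    region: 'oss-cn-hangzhou',               // illustrative region
    accessKeyId: process.env.OSS_AK!,
    accessKeySecret: process.env.OSS_SK!,
    bucket: 'render-trace-data',             // illustrative bucket name
  });

  await client.putBucketLifecycle('render-trace-data', [
    {
      id: 'expire-location-traces',
      status: 'Enabled',
      prefix: 'traces/',        // only trace objects are affected
      expiration: { days: 30 }, // TTL: delete traces after 30 days
    },
  ]);
}
```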
Future Planning
Continue building the location system, improve modules for business users, tackle hard rendering‑effect issues, optimize middle‑platform architecture, and eventually integrate one‑click tracing. Integration with ticketing and customer‑service systems is planned to pre‑classify issues.