Design and Management of a Unified Error Code System for Distributed Backend Services
This article explains the concept, benefits, and implementation details of a unified error‑code management platform, covering code allocation, handling strategies in microservice architectures, API security considerations, and performance optimizations to reduce development friction and improve system reliability.
Preface
While discussing error codes with our backend and frontend teams, I found that people understood them very differently. This article shares my views on error codes and my suggestions for handling them.
What Is an Error Code?
An error code is a number (or alphanumeric string) linked to an error message, used to identify a specific exceptional situation in a system.
What can error codes bring us?
First, they help us identify what problem occurred in the system.
Second, they indicate which subsystem or service caused the problem.
Finally, they enable us to decide what information should be shown to the customer.
To achieve these goals, we need constraints on naming and using error codes. In Hujiang's API specification, each product line is assigned a distinct error‑code range, and each internal system within a product line should also have its own sub‑range to avoid duplication.
For example, the range assigned to my product line is -0x1400000 to -0x14FFFFF, which corresponds to the decimal range -20,971,520 to -22,020,095. Reading the absolute value as an eight‑digit decimal number, I use the first four digits (2098~2201) as the system code. The last four digits are divided as follows: 0000~3999 for business exceptions, 9000~9999 for system‑level exceptions, and the remaining numbers (4000~8999) reserved for future expansion.
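The allocation scheme can be sketched in a few lines of Java. The class and method names below are hypothetical, not part of Hujiang's specification. The sketch also shows why the system codes run from 2098 to 2201 rather than 2097 to 2202: the assigned range only partially covers the prefixes 2097 and 2202, so they cannot offer a full 0000~9999 block.

```java
/** Hypothetical helper illustrating the allocation scheme; names are assumptions. */
final class ErrorCodes {
    // Range assigned to the product line, as absolute decimal values.
    static final int RANGE_MIN = 0x1400000; // 20,971,520
    static final int RANGE_MAX = 0x14FFFFF; // 22,020,095

    // System codes: the first four decimal digits of the absolute value.
    static final int SYSTEM_MIN = 2098;
    static final int SYSTEM_MAX = 2201;

    private ErrorCodes() {}

    /** Builds the negative error code from a system code and a four-digit detail code. */
    static int compose(int systemCode, int detailCode) {
        if (systemCode < SYSTEM_MIN || systemCode > SYSTEM_MAX) {
            throw new IllegalArgumentException("system code out of range: " + systemCode);
        }
        if (detailCode < 0 || detailCode > 9999) {
            throw new IllegalArgumentException("detail code out of range: " + detailCode);
        }
        return -(systemCode * 10_000 + detailCode);
    }

    /** Extracts the system code (first four digits). */
    static int systemCodeOf(int errorCode) {
        return Math.abs(errorCode) / 10_000;
    }

    /** Extracts the detail code (last four digits). */
    static int detailCodeOf(int errorCode) {
        return Math.abs(errorCode) % 10_000;
    }

    /** Classifies a code by its last four digits. */
    static String categoryOf(int errorCode) {
        int detail = detailCodeOf(errorCode);
        if (detail <= 3999) {
            return "BUSINESS";
        }
        if (detail >= 9000) {
            return "SYSTEM";
        }
        return "RESERVED";
    }
}
```

With this layout, ErrorCodes.compose(2098, 1) yields -20980001, and both the owning system and the exception category can be recovered from the number alone.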
When a new system is created, the system owner must apply in advance for its HTTP port, Dubbo port, system code, and JMX port. These values are allocated centrally, and duplicate ports are prohibited. Central allocation keeps owners from independently grabbing popular ports (e.g., 8888), which avoids the conflicts that cause deployment failures or error‑code collisions.
Error‑Code Handling Approach
In a distributed microservice architecture, a single transaction may involve 7–8 services, each with its own error code. When a downstream service returns a failure with an error code, the caller should propagate the error code and context upward, allowing the top‑level request handler to process it.
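As a rough illustration of propagating a downstream error code upward, the sketch below wraps a downstream call so that the original code and source service survive while the caller adds its own context. ServiceException, DownstreamCalls, and the service names are hypothetical. A real system would carry the same fields across the wire in the RPC or HTTP response.

```java
import java.util.function.Supplier;

/** Hypothetical exception that carries an error code and the service that raised it. */
class ServiceException extends RuntimeException {
    private final int errorCode;
    private final String sourceService;

    ServiceException(int errorCode, String sourceService, String message, Throwable cause) {
        super(message, cause);
        this.errorCode = errorCode;
        this.sourceService = sourceService;
    }

    int getErrorCode() {
        return errorCode;
    }

    String getSourceService() {
        return sourceService;
    }
}

final class DownstreamCalls {
    private DownstreamCalls() {}

    /**
     * Invokes a downstream call. On failure, the downstream error code and source
     * service are kept unchanged, and the caller's name is prepended as context.
     */
    static <T> T call(String callerService, Supplier<T> downstreamCall) {
        try {
            return downstreamCall.get();
        } catch (ServiceException e) {
            throw new ServiceException(
                    e.getErrorCode(),
                    e.getSourceService(),
                    callerService + " -> " + e.getMessage(),
                    e);
        }
    }
}
```

The caller does not translate or swallow the code, so the top‑level handler still sees which service failed and why.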
Within a single application, we typically have a layered architecture such as Controller, Service, and DAO. Exceptions can be thrown at any layer, but they are usually caught and handled uniformly at the entry point, which offers several benefits:
Comprehensive stack information can be collected.
Exception handling can be encapsulated at the framework level.
Code becomes much cleaner.
Anyone who dislikes cluttered code will cringe at seeing try/catch blocks scattered throughout the codebase.
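A minimal sketch of uniform handling at the entry point follows, written in plain Java rather than any particular web framework. CodedException, EntryPoint, the result format, and the UNKNOWN_ERROR value are all assumptions made for illustration. In practice this logic would usually live in a framework‑level filter or exception handler.

```java
import java.util.concurrent.Callable;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical exception carrying a business error code. */
class CodedException extends RuntimeException {
    final int errorCode;

    CodedException(int errorCode, String message) {
        super(message);
        this.errorCode = errorCode;
    }
}

final class EntryPoint {
    private static final Logger LOG = Logger.getLogger(EntryPoint.class.getName());

    // Assumed system-level code for unexpected failures (system 2098, detail 9000).
    static final int UNKNOWN_ERROR = -20989000;

    private EntryPoint() {}

    /**
     * Runs a request handler. Inner layers simply throw. This is the single place
     * where exceptions are logged (with full stack trace) and mapped to a result.
     */
    static String handle(Callable<String> handler) {
        try {
            return "OK:" + handler.call();
        } catch (CodedException e) {
            LOG.log(Level.WARNING, "request failed with code " + e.errorCode, e);
            return "ERROR:" + e.errorCode;
        } catch (Exception e) {
            LOG.log(Level.SEVERE, "unexpected failure", e);
            return "ERROR:" + UNKNOWN_ERROR;
        }
    }
}
```

Controller, Service, and DAO code then contains no try/catch of its own unless it can actually recover from the failure.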
Relationship Between Error Codes and HTTP Codes
Hujiang uses RESTful‑style HTTP communication between systems, and the API specification defines constraints between HTTP status codes and error codes. I argue that HTTP codes should be decoupled from business error codes; the latter should be abstracted as a common field in the response body.
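One way to picture this decoupling is a common response envelope in which the business error code is a field of the body, independent of the HTTP status. The field names and the convention that 0 means success are assumptions for this sketch, not Hujiang's actual API specification.

```java
/** Hypothetical response envelope; field names and the 0-for-success rule are assumptions. */
final class ApiResponse<T> {
    static final int SUCCESS = 0;

    final int status;     // business error code, SUCCESS when the call succeeded
    final String message; // developer-facing description, not necessarily shown to customers
    final T data;         // payload, null on failure

    private ApiResponse(int status, String message, T data) {
        this.status = status;
        this.message = message;
        this.data = data;
    }

    static <T> ApiResponse<T> ok(T data) {
        return new ApiResponse<>(SUCCESS, "OK", data);
    }

    static <T> ApiResponse<T> fail(int errorCode, String message) {
        return new ApiResponse<>(errorCode, message, null);
    }

    boolean isSuccess() {
        return status == SUCCESS;
    }
}
```

Under this design the HTTP status only describes the transport‑level outcome, while clients branch on the status field in the body for business failures.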
Project Background
There has been a long‑standing disagreement between frontend and backend teams on how to handle error codes. Some teams let the frontend display the backend‑provided description directly, others perform the conversion in a middle‑platform, and some maintain the mapping on the frontend.
If the backend description is authoritative, any UI or product change that requires a different message forces a backend release, and the backend cannot provide different messages for the same error code on different pages.
Using a middle‑platform for conversion burdens the platform team because any backend change may require platform code updates.
Maintaining the mapping on the frontend is common but still suffers from the same pain points when product or interaction teams need to modify messages or when new error codes are added.
The lack of a unified error‑code handling approach increases communication overhead and development cost, which this article aims to address with a standardized solution.
Proposed Solution
Optimized Error‑Code Processing Flow
When interaction or product teams need to adjust an error‑code description, they no longer need to coordinate with the technical team; they can configure the description directly through the error‑code operation platform.
After configuration, the backend stores the mapping in cache and database, and exposes it via an error‑code API for external pages to consume.
Error‑Code Domain Model
Because different product managers and designers may want different messages for the same error code on different pages, the domain model was designed around multiple dimensions.
The front end retrieves an error‑code description by providing the following parameters:
Organization (e.g., online school, CC, tools, Xuejin…)
Scenario (specific page)
Locale (Chinese, English, Japanese)
This multi‑dimensional design allows each business line and each page to have tailored messages for customers.
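The multi‑dimensional model can be sketched as a composite lookup key. MessageKey and ErrorMessageCatalog are hypothetical names, and the in‑memory map merely stands in for the platform's real storage.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

/** Hypothetical composite key: error code plus the three query dimensions. */
final class MessageKey {
    final int errorCode;
    final String organization;
    final String scenario;
    final String locale;

    MessageKey(int errorCode, String organization, String scenario, String locale) {
        this.errorCode = errorCode;
        this.organization = organization;
        this.scenario = scenario;
        this.locale = locale;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof MessageKey)) {
            return false;
        }
        MessageKey other = (MessageKey) o;
        return errorCode == other.errorCode
                && organization.equals(other.organization)
                && scenario.equals(other.scenario)
                && locale.equals(other.locale);
    }

    @Override
    public int hashCode() {
        return Objects.hash(errorCode, organization, scenario, locale);
    }
}

/** In-memory stand-in for the platform's configured mappings. */
final class ErrorMessageCatalog {
    private final Map<MessageKey, String> messages = new HashMap<>();

    void put(int errorCode, String organization, String scenario, String locale, String text) {
        messages.put(new MessageKey(errorCode, organization, scenario, locale), text);
    }

    Optional<String> find(int errorCode, String organization, String scenario, String locale) {
        return Optional.ofNullable(
                messages.get(new MessageKey(errorCode, organization, scenario, locale)));
    }
}
```

The same error code can then map to one text on a checkout page and another on a profile page, and each locale gets its own entry.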
Error‑Code API Security Considerations
Initially I was concerned about the security of the API. After discussions with the security team, we concluded that because error codes and their descriptions are intended for customers, they do not pose a significant security risk. The security design therefore includes:
Filtering through the security department’s protection layer.
Black‑ and white‑list controls on the error‑code service itself.
Other Technical Details
From an implementation perspective, building the error‑code platform is straightforward. The main considerations are:
Performance: We use Codis (with degradation) plus MyBatis caching (5‑minute refresh) to reduce database load.
Reliability: If the front‑end query parameters are incorrect or a product/interaction team forgets to configure a mapping, the platform returns a default message so that customers are not shown an abrupt or empty prompt.
Scalability: The service is stateless and runs on Docker containers managed by Hujiang OCS, providing load monitoring and auto‑scaling to handle varying traffic.
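The performance and reliability points above can be illustrated with a small time‑based cache that falls back to a default message. This is only an in‑process sketch: it does not use Codis or MyBatis, and the class name, the default text, and the loader convention of returning null for "no mapping" are assumptions made for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** In-process sketch of "cache with periodic refresh plus default message". */
final class CachedMessageLookup {
    // Assumed default text returned when no mapping is configured.
    static final String DEFAULT_MESSAGE = "Something went wrong. Please try again later.";

    private static final class Entry {
        final String value;
        final long loadedAt;

        Entry(String value, long loadedAt) {
            this.value = value;
            this.loadedAt = loadedAt;
        }
    }

    private final Function<String, String> loader; // returns null when no mapping exists
    private final long ttlMillis;
    private final LongSupplier clock;
    private final Map<String, Entry> cache = new ConcurrentHashMap<>();

    CachedMessageLookup(Function<String, String> loader, long ttlMillis, LongSupplier clock) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    String get(String key) {
        long now = clock.getAsLong();
        Entry cached = cache.get(key);
        if (cached != null && now - cached.loadedAt < ttlMillis) {
            return cached.value; // still fresh: no trip to the backing store
        }
        try {
            String loaded = loader.apply(key);
            String value = loaded != null ? loaded : DEFAULT_MESSAGE;
            cache.put(key, new Entry(value, now));
            return value;
        } catch (RuntimeException e) {
            // Degrade: serve the stale value if there is one, otherwise the default.
            return cached != null ? cached.value : DEFAULT_MESSAGE;
        }
    }
}
```

Constructed with a TTL of 300,000 ms and System::currentTimeMillis as the clock, this mirrors the five‑minute refresh described above. Injecting the clock keeps the expiry logic testable.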
These experiences and thoughts aim to provide guidance on unified error‑code management and improve overall development efficiency.
Hujiang Technology