
How LApiGateway Achieves 99.999% Uptime: Architecture, SLA & Risk Mitigation

LApiGateway, Huolala's internal micro‑service gateway, achieves five‑nines availability through a dual‑plane architecture, comprehensive monitoring, an explicit SLA definition, risk classification, heartbeat health checks, traffic migration, strict change governance, and regular fault drills. This technical overview covers each in turn.

Huolala Tech

LApiGateway Overview

LApiGateway is the internal micro‑service gateway of Huolala, responsible for traffic forwarding and providing features such as authentication, rate‑limiting, parameter modification and validation to improve developer efficiency.

Architecture

The gateway consists of a control plane and a data plane.

Control Plane

Service configuration is managed by the LApi Management Platform and the Apollo configuration center.

Service discovery runs on Consul, which supplies node registration info (IP, group, gray version, etc.).

Monitoring is provided by the Trace service and HLL Monitor, which handle request monitoring and alerting.

Data Plane

Requests enter through load balancers (KONG, SLB), pass through LApi nodes where a series of plugins process them before being forwarded to downstream services. Plugins include account authentication (depends on Account Service) and SSO authentication (depends on SSO Service).

During data processing LApi may rely on:

Account Service – user authentication.

Kafka – persisting request‑generated messages.

Lone – publishing windows and service permission management.

SSO Service – employee authentication.
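The data-plane flow described above, where a request passes through a series of plugins before being forwarded, can be illustrated with a minimal sketch. This is not LApi's actual code: the `Request`, `Response`, `account_auth`, `make_rate_limiter`, and `handle` names, the `X-Account-Token` header, and the in-memory counter are all assumptions chosen for illustration. A real account plugin would call Account Service rather than check for a header.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Request:
    path: str
    headers: Dict[str, str] = field(default_factory=dict)


@dataclass
class Response:
    status: int
    body: str


# A plugin returns a Response to reject the request, or None to let it continue.
Plugin = Callable[[Request], Optional[Response]]


def account_auth(request: Request) -> Optional[Response]:
    # Hypothetical check: a real plugin would validate the token with Account Service.
    if "X-Account-Token" not in request.headers:
        return Response(401, "account authentication failed")
    return None


def make_rate_limiter(max_requests: int) -> Plugin:
    # Hypothetical in-memory counter; it never resets, which is enough for a sketch.
    state = {"count": 0}

    def rate_limit(request: Request) -> Optional[Response]:
        state["count"] += 1
        if state["count"] > max_requests:
            return Response(429, "rate limit exceeded")
        return None

    return rate_limit


def handle(request: Request, plugins: List[Plugin],
           forward: Callable[[Request], Response]) -> Response:
    # Run plugins in order; the first rejection short-circuits the chain.
    for plugin in plugins:
        rejected = plugin(request)
        if rejected is not None:
            return rejected
    # Every plugin passed, so forward to the downstream service.
    return forward(request)
```

Ordering matters in a chain like this: placing authentication before rate limiting means unauthenticated requests never consume rate-limit quota.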

LApiGateway architecture diagram

SLA Definition

The SLA is defined as an availability percentage: the success rate of proxied service requests within a calculation period, counting only failures caused by LApi itself (failures that originate outside the gateway are excluded). A calculation period is 5 minutes, so there are 105,120 periods per year (365 × 24 × 12).

Achieving five‑nines (99.999 %) means the total unavailable time in a year must stay under about 5.26 minutes (0.001 % of 525,600 minutes), which is roughly one 5‑minute calculation period.
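The arithmetic behind these figures can be checked with a short sketch. The function names are illustrative, and `period_availability` is only one plausible reading of "success rate within a calculation period"; the article does not give the exact formula used in production.

```python
PERIOD_MINUTES = 5
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year

# Number of 5-minute calculation periods in a year.
PERIODS_PER_YEAR = MINUTES_PER_YEAR // PERIOD_MINUTES  # 105,120


def downtime_budget_minutes(target: float) -> float:
    """Minutes per year that may be unavailable while still meeting `target`."""
    return MINUTES_PER_YEAR * (1 - target)


def period_availability(total_requests: int, counted_failures: int) -> float:
    """Success rate for one period; `counted_failures` are the failures
    that count against the SLA (an assumption about the bookkeeping)."""
    if total_requests == 0:
        return 1.0  # no traffic, nothing failed
    return (total_requests - counted_failures) / total_requests


if __name__ == "__main__":
    print(PERIODS_PER_YEAR)
    print(f"{downtime_budget_minutes(0.99999):.3f} minutes per year")
```

At five‑nines the budget is barely more than a single calculation period, which is why detection and recovery times measured in seconds matter so much in the sections that follow.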

SLA calculation

Challenges and Solutions

External Risks

External risks are uncontrollable factors such as ECS instance failures, network jitter, and traffic attacks. Because they cannot be prevented, mitigation relies on health‑check mechanisms and rapid recovery.

Node heartbeat checks:

KONG TCP connection heartbeat (~9 s detection).

Consul heartbeat (~6 s detection).

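In a 4‑node cluster where one node fails, 25 % of requests fail until that node is taken out of rotation. Without heartbeat checks this lasts the whole 5‑minute period, so availability for the period is 75 %. With heartbeat checks the failure window shrinks to the detection time, which raises availability for the period to 99.25 % for KONG traffic (9 s detection) and 99.50 % for SOA traffic (6 s detection).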
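These percentages can be reproduced with a simple model, sketched below. It assumes traffic is spread evenly across nodes, that every request sent to a failed node fails until the heartbeat removes it, and that the remaining nodes absorb the load without errors. The function name and parameters are illustrative.

```python
def period_availability(nodes: int, failed: int, detection_seconds: float,
                        period_seconds: float = 300.0) -> float:
    """Availability over one period when `failed` of `nodes` nodes go down
    at the start of the period and are removed after `detection_seconds`."""
    failed_share = failed / nodes  # fraction of traffic hitting dead nodes
    # Fraction of the period during which dead nodes still receive traffic.
    exposure = min(detection_seconds, period_seconds) / period_seconds
    return 1 - failed_share * exposure


if __name__ == "__main__":
    # No heartbeat: the dead node receives traffic for the whole period.
    print(f"{period_availability(4, 1, 300):.2%}")  # 75.00%
    # ~9 s detection (KONG TCP heartbeat).
    print(f"{period_availability(4, 1, 9):.2%}")    # 99.25%
    # ~6 s detection (Consul heartbeat).
    print(f"{period_availability(4, 1, 6):.2%}")    # 99.50%
```

Under this model the loss scales linearly with detection time, so halving the heartbeat detection window halves the failed share of a period.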

KONG traffic availability
SOA traffic availability

Cluster Faults

If more than half of the nodes fail, simply removing the faulty nodes pushes all traffic onto the few that remain, which can overload them and turn a partial failure into a total outage. In that case traffic must be migrated to a healthy cluster within minutes.

Migration steps:

Detect fault via LApi Management Platform and Consul service registry.

Shift traffic to a reserve cluster group with spare capacity.

Complete migration within 2–3 minutes (goal: <30 s with full automation).
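The decision behind these steps can be sketched as follows. This is an illustration under stated assumptions, not the LApi Management Platform's logic: the `Cluster` type, the `load_per_node` figure (utilization per node when all nodes are healthy), and the function names are hypothetical, and in practice node health would come from the Consul service registry.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    total_nodes: int
    healthy_nodes: int
    load_per_node: float  # utilization per node (0.0 to 1.0) with all nodes healthy


def surviving_load(cluster: Cluster) -> float:
    """Utilization per healthy node if faulty nodes are simply removed
    and the same traffic is spread evenly over the survivors."""
    if cluster.healthy_nodes == 0:
        return float("inf")
    return cluster.load_per_node * cluster.total_nodes / cluster.healthy_nodes


def should_migrate(cluster: Cluster) -> bool:
    """True when more than half of the nodes have failed."""
    return cluster.healthy_nodes * 2 < cluster.total_nodes


def choose_target(primary: Cluster, reserve: Cluster) -> str:
    """Name of the cluster that should receive traffic."""
    if should_migrate(primary) and reserve.healthy_nodes > 0:
        return reserve.name
    return primary.name


if __name__ == "__main__":
    primary = Cluster("primary", total_nodes=4, healthy_nodes=1, load_per_node=0.4)
    reserve = Cluster("reserve", total_nodes=4, healthy_nodes=4, load_per_node=0.1)
    print(surviving_load(primary))          # the last node would need 160% capacity
    print(choose_target(primary, reserve))  # reserve
```

With four nodes at 40 % utilization each, losing three leaves one node facing 160 % of its capacity, which is why removal alone is not enough and the traffic has to move.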

Traffic migration diagram

Internal Risks

Internal risks are mitigated through three measures:

Exception case protection – cataloguing system, application and third‑party component failure cases and their solutions.

Change governance – strict code‑review, regression testing, staged gray releases, and service‑integration procedures.

Daily operations – continuous health‑status monitoring, routing change notifications, and post‑change load verification.

Exception case protection diagram

Fault Drills

Regular drills simulate potential failures to uncover hidden issues. Past drill records are shown below.

Drill records
Drill report

Conclusion

Through continuous investment in stability, LApiGateway has maintained five‑nines availability for two years. Ongoing optimization will keep the platform reliable and provide users with a high‑quality service experience.
