Front-End Service Availability: Definition, Measurement, and Assurance Practices at Meituan-Dianping Checkout

This article outlines how Meituan‑Dianping approaches front‑end service availability for its checkout system. It defines availability across code, static resources, and network links, measures it by failure duration, lists typical problem areas, and describes a three‑stage assurance strategy built on people processes, engineering tools, and lightweight technology choices. Concrete practices include TypeScript adoption, automated testing, health checks, DNS protection, and post‑incident monitoring.

Meituan Technology Team

This article, adapted from an InfoQ piece titled “Meituan-Dianping Checkout Front-End Availability Practice,” discusses how to define, measure, and ensure the availability of front‑end services.

Definition of Front‑End Service Availability

Availability is usually discussed for back‑end services, but any UI‑driven business also has front‑end availability concerns. Front‑end availability consists of three parts:

Front‑end code availability – test quality and online exceptions.

Static‑resource service availability.

Network‑link availability – DNS hijacking and network performance.

All problems a user may encounter on the UI belong to front‑end availability.

How to Measure Front‑End Availability

The measurement method mirrors back‑end metrics: focus on the duration of failures rather than the scope, aiming to maximize the availability ratio. Impact metrics such as affected users, orders, or GMV are used for incident classification, not for the availability calculation itself.
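As a minimal sketch of the duration-based calculation described above, the availability ratio can be taken as one minus the share of the measurement window spent in failure. The `Incident` shape, its field names, and the choice to merge overlapping incidents are assumptions made for illustration; the source does not spell out the exact formula the team uses.

```typescript
// Hypothetical record of one front-end incident; field names are illustrative.
interface Incident {
  startMs: number;        // when users first saw the failure (epoch ms)
  endMs: number;          // when the failure was resolved (epoch ms)
  affectedOrders: number; // impact metric: used for incident grading only
}

// availability = 1 - (failure duration inside the window / window length).
// Overlapping incidents are merged so the same time is not counted twice
// (an assumption of this sketch). Impact metrics are deliberately ignored.
function availabilityRatio(
  incidents: Incident[],
  windowStartMs: number,
  windowEndMs: number
): number {
  const windowMs = windowEndMs - windowStartMs;
  if (windowMs <= 0) {
    throw new Error("measurement window must have positive length");
  }

  // Clip each incident to the window, drop empty spans, sort by start time.
  const spans = incidents
    .map((i) => ({
      start: Math.max(i.startMs, windowStartMs),
      end: Math.min(i.endMs, windowEndMs),
    }))
    .filter((s) => s.end > s.start)
    .sort((a, b) => a.start - b.start);

  let downMs = 0;
  let coveredUntil = windowStartMs; // end of the failure time already counted
  for (const s of spans) {
    const from = Math.max(s.start, coveredUntil);
    if (s.end > from) {
      downMs += s.end - from;
      coveredUntil = s.end;
    }
  }
  return 1 - downMs / windowMs;
}
```

Note that `affectedOrders` never enters the ratio: a short outage that hits many orders and a short outage that hits few count the same here, and differ only in how the incident is classified.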

Typical Problem Areas

Front‑end code availability:

Null‑pointer errors caused by JavaScript’s weak typing.

Insufficient business‑logic coverage due to complex, branching workflows.

Compatibility issues across browsers, platforms, WebViews, and hybrid bridges.

Static‑resource service availability:

Stability of the static‑resource chain (NGINX, Node, etc.).

CDN problems such as SSL certificate chain failures or origin‑service outages.

Network‑link availability:

DNS hijacking by ISPs or malicious actors.
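To illustrate the first item under front‑end code availability, the sketch below shows the kind of null‑pointer error that plain JavaScript only reveals at runtime and that a typed language can surface at compile time. The `Order` and `Coupon` types and the `payableCents` function are hypothetical and not taken from the checkout code base; the sketch assumes the TypeScript `strictNullChecks` option is enabled.

```typescript
// Hypothetical shape of a checkout order; field names are illustrative.
interface Coupon {
  code: string;
  amountCents: number;
}

interface Order {
  id: string;
  totalCents: number;
  coupon?: Coupon; // absent when the user has no coupon
}

// In plain JavaScript, `order.coupon.amountCents` throws a TypeError at
// runtime whenever `coupon` is undefined. With strictNullChecks enabled,
// the compiler rejects that expression, so the missing case must be handled.
function payableCents(order: Order): number {
  const discount = order.coupon?.amountCents ?? 0;
  // Never return a negative amount if the coupon exceeds the total.
  return Math.max(order.totalCents - discount, 0);
}
```

The optional `coupon?` field is what forces the check: removing the `?.` and `??` handling turns a potential production crash into a compile error.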

Assurance Strategy

The assurance process is divided into three stages – pre‑incident, during‑incident, and post‑incident – and three categories of measures:

Soft (people): process guarantees, standard enforcement, testing.

Hard (engineering tools): static code analysis, unit tests, web‑automation tests, CI pipelines, and monitoring at the front‑end, business, service, and network levels.

Root (technology selection): simple, robust solutions that fit the scenario.

Technology Selection Example

For the checkout SPA, the team evaluated React, Angular, and Ember but chose to build a lightweight view‑lifecycle manager called Cyra, inspired by Cocoa’s view controller, because the application’s complexity did not justify heavyweight frameworks and maintainability was a priority.
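The source does not show Cyra's API, so the following is only a hypothetical sketch of what a small view‑lifecycle manager in the style of Cocoa's view controller might look like. Every name here (`ViewController`, `ViewStack`, `LoggingView`, and the hook names borrowed from Cocoa) is an assumption for illustration, not Cyra's actual interface.

```typescript
// NOT Cyra's API: a hypothetical illustration of Cocoa-style lifecycle hooks.
abstract class ViewController {
  private loaded = false;

  // Hooks that subclasses may override; the defaults do nothing.
  protected viewDidLoad(): void {}      // once, before the first appearance
  protected viewWillAppear(): void {}   // every time the view becomes visible
  protected viewDidDisappear(): void {} // every time the view is hidden

  show(): void {
    if (!this.loaded) {
      this.viewDidLoad();
      this.loaded = true;
    }
    this.viewWillAppear();
  }

  hide(): void {
    this.viewDidDisappear();
  }
}

// A navigation stack that drives the hooks as views are pushed and popped.
class ViewStack {
  private stack: ViewController[] = [];

  push(view: ViewController): void {
    const current = this.stack[this.stack.length - 1];
    if (current) current.hide();
    this.stack.push(view);
    view.show();
  }

  pop(): void {
    const removed = this.stack.pop();
    if (!removed) return;
    removed.hide();
    const next = this.stack[this.stack.length - 1];
    if (next) next.show();
  }
}

// Usage: record the order in which the hooks fire.
const lifecycleLog: string[] = [];

class LoggingView extends ViewController {
  private label: string;
  constructor(label: string) {
    super();
    this.label = label;
  }
  protected viewDidLoad(): void { lifecycleLog.push(this.label + ":load"); }
  protected viewWillAppear(): void { lifecycleLog.push(this.label + ":appear"); }
  protected viewDidDisappear(): void { lifecycleLog.push(this.label + ":disappear"); }
}

const views = new ViewStack();
views.push(new LoggingView("cashier"));
views.push(new LoggingView("result"));
views.pop(); // back to the cashier view, which appears again without reloading
```

A manager of this size has very little surface area to break, which is the trade‑off the paragraph above describes: fewer features than a full framework, but less to maintain on a critical payment path.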

Avoiding Core‑Link Bottlenecks

Three common first‑paint optimization methods are manual pre‑rendering, compile‑time pre‑rendering, and server‑side rendering (SSR). Although SSR offers the best performance, the team selected compile‑time pre‑rendering to avoid the reliability issues of a Node layer that sits in the critical path.
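A rough sketch of compile‑time pre‑rendering, under stated assumptions: a build step renders the first screen to an HTML string and injects it into a static template, so the resulting file can be served as a plain static resource with no Node process in the request path. The `FirstScreen` type, the `<!--APP-->` marker, and the function names are hypothetical; the source does not describe the team's actual build tooling.

```typescript
// Hypothetical description of the first screen; illustrative only.
interface FirstScreen {
  title: string;
  payMethods: string[];
}

// Escape text before placing it into HTML.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render the first-paint markup to a string. This runs at build time only.
function renderFirstScreen(screen: FirstScreen): string {
  const items = screen.payMethods
    .map((m) => "<li>" + escapeHtml(m) + "</li>")
    .join("");
  return "<h1>" + escapeHtml(screen.title) + "</h1><ul>" + items + "</ul>";
}

// Inject the rendered markup into a static template at a placeholder marker.
// The marker string is an assumption of this sketch.
function prerender(template: string, screen: FirstScreen): string {
  const marker = "<!--APP-->";
  if (template.indexOf(marker) === -1) {
    throw new Error("template is missing the " + marker + " marker");
  }
  const html = renderFirstScreen(screen);
  // A replacer function avoids special handling of "$" in the inserted HTML.
  return template.replace(marker, () => html);
}
```

In a real build, the output string would be written to the HTML file that ships to the CDN. If the build step fails, the deploy fails, but users already being served are unaffected, which is the reliability argument for keeping rendering out of the critical path.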

Concrete Implementations

Adopted TypeScript in 2015 in place of weakly typed JavaScript, which eliminated roughly 99% of null‑pointer crashes.

Developed an internal web‑automation testing tool “Freekite” to achieve high business‑logic coverage and provide visual case management, mock capabilities, and smart assertions.

Used an internal cloud‑testing platform to cover >95% of device models, ensuring compatibility on mobile browsers and WebViews.

Static‑resource health checks are handled by SRE‑managed NGINX health probes and CDN origin monitoring.

Mitigated DNS hijacking by enforcing HTTPS, disabling HTTP on critical domains, and migrating to fresh domains when necessary.

Post‑incident monitoring includes payment‑backend alerts, JS error aggregation, performance analytics, and network diagnostics (CAT).
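The JS error aggregation mentioned in the last item can be sketched as grouping raw error reports by a fingerprint and counting each group, so that one recurring bug shows up as one line rather than thousands. The `ErrorReport` shape and the fingerprint rule (message with digits normalised, plus the first stack frame) are assumptions of this sketch, not a description of the team's monitoring system.

```typescript
// Hypothetical shape of a client-side error report; illustrative only.
interface ErrorReport {
  message: string;
  stack?: string;
  page: string;
}

interface ErrorGroup {
  fingerprint: string;
  count: number;
  pages: string[]; // distinct pages on which the error was seen
}

// Build a grouping key: the message with digit runs replaced by "N"
// (so "order 123 not found" and "order 456 not found" group together),
// plus the first "at ..." frame of the stack trace if one is present.
function fingerprintOf(report: ErrorReport): string {
  const message = report.message.replace(/\d+/g, "N");
  const frame =
    (report.stack ?? "")
      .split("\n")
      .map((line) => line.trim())
      .filter((line) => line.indexOf("at ") === 0)[0] ?? "";
  return message + " | " + frame;
}

// Aggregate reports into groups, most frequent first.
function aggregateErrors(reports: ErrorReport[]): ErrorGroup[] {
  const groups = new Map<string, ErrorGroup>();
  for (const report of reports) {
    const key = fingerprintOf(report);
    let group = groups.get(key);
    if (!group) {
      group = { fingerprint: key, count: 0, pages: [] };
      groups.set(key, group);
    }
    group.count += 1;
    if (group.pages.indexOf(report.page) === -1) {
      group.pages.push(report.page);
    }
  }
  return Array.from(groups.values()).sort((a, b) => b.count - a.count);
}
```

Sorting by count puts the most frequent group first, which is usually what an on‑call engineer wants to see when deciding whether a release needs to be rolled back.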

Author

Yu Lin, front‑end technology expert at Meituan‑Dianping, responsible for the financial wallet and payment front‑end teams.
