How Meituan Overcame Performance Bottlenecks with a Custom Analysis Framework
This article details Meituan's 2014 presentation on building a performance analysis framework and monitoring platform, describing the bottlenecks they faced, the metrics they collected, the insights gained, and the tangible improvements achieved across front‑end and back‑end processing.
What Was Covered?
The importance of performance is obvious, but this talk deliberately avoids generic best‑practice tips and instead focuses on process‑level metrics that help engineers pinpoint where optimization effort is needed.
What Bottlenecks Were Encountered?
Initial work consisted of simple result‑oriented data collection (full load time, DOMReady) and aggressive adoption of best‑practice techniques such as async loading, static assets, lazy loading, and BigRender. Because only result metrics were available, decisions were based largely on external experience rather than on concrete application data, which led to three main bottlenecks:
1. Best‑practice “fuel” runs out quickly; generic optimizations have limited impact.
2. Without internal metrics, abnormal performance spikes are hard to diagnose.
3. Lack of fine‑grained insight prevents discovery of deeper optimization opportunities.
How Were the Bottlenecks Overcome?
By asking what exactly is being optimized—document generation, resource loading, rendering, or overall user experience—the team built a detailed waterfall analysis of Meituan’s project detail page.
The page resources were divided into main document, critical CSS & first‑screen images, critical JS, and other assets (sprites, analytics scripts). The waterfall visualizes each stage—from DNS lookup, TCP handshake, document request, to resource download and rendering.
According to the guidelines in "High Performance Web Sites", overall load time splits roughly into network time (≈10%), backend time (≈20%), and front‑end time (≈70‑80%). Front‑end resource loading order and concurrency are therefore the biggest performance levers.
How to Control Performance?
Using the analysis framework, the team injected lightweight, non‑intrusive statistical scripts that collect multi‑dimensional, real‑time process metrics without affecting page performance.
Data collection requirements:
Capture metrics per page, geographic region, and browser.
Provide real‑time visibility of each stage.
The scripts must:
Avoid invasive changes to business code.
Never degrade the measured page’s performance.
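One way to meet both requirements is to keep collection off the page's critical path: wait for the load event, defer the work to a later task, and ship the data as a query string on a small GET request. The sketch below assumes that approach. The names `WindowLike`, `buildBeaconUrl`, and `reportAfterLoad` are illustrative and do not come from Meituan's framework.

```typescript
// Minimal structural types so the sketch does not depend on DOM typings.
interface WindowLike {
  addEventListener(type: "load", listener: () => void): void;
  setTimeout(callback: () => void, delayMs: number): unknown;
}

type Metrics = Record<string, string | number>;

// Serialise metrics into a query string for a GET beacon (keys sorted for stable output).
function buildBeaconUrl(endpoint: string, metrics: Metrics): string {
  const query = Object.keys(metrics)
    .sort()
    .map((key) => encodeURIComponent(key) + "=" + encodeURIComponent(String(metrics[key])))
    .join("&");
  return query ? endpoint + "?" + query : endpoint;
}

// Collect and send only after the load event, on a later task, so that the
// measurement work does not compete with the page's own loading.
function reportAfterLoad(
  win: WindowLike,
  collect: () => Metrics,
  send: (url: string) => void,
  endpoint: string
): void {
  win.addEventListener("load", () => {
    win.setTimeout(() => send(buildBeaconUrl(endpoint, collect())), 0);
  });
}
```

In a browser, `win` would be `window` and `send` could assign the URL to the `src` of a new `Image`, so business code needs no change beyond including the script.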
Key data sources include:
Navigation Timing API for main document load.
Resource Timing API for static asset load.
performance.timing.msFirstPaint (IE) or chrome.loadTimes() (Chrome) for first‑paint timing.
Backend instrumentation for document generation time.
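The two first-paint sources report in different units, so they need normalising before they can be compared. The sketch below assumes `msFirstPaint` is an epoch timestamp in milliseconds and `chrome.loadTimes().firstPaintTime` is an epoch timestamp in seconds, and expresses both relative to `navigationStart`. The `FirstPaintSources` type and the `firstPaintMs` function are hypothetical names. Note that `chrome.loadTimes()` has since been deprecated in Chrome.

```typescript
// Values are passed in explicitly so the function stays independent of any one browser.
interface FirstPaintSources {
  navigationStart: number;        // performance.timing.navigationStart, epoch ms
  msFirstPaint?: number;          // performance.timing.msFirstPaint (IE), epoch ms
  chromeFirstPaintTime?: number;  // chrome.loadTimes().firstPaintTime (Chrome), epoch seconds
}

// Returns first paint in ms since navigationStart, or null when no source is usable.
function firstPaintMs(src: FirstPaintSources): number | null {
  let paintEpochMs: number | null = null;
  if (typeof src.msFirstPaint === "number" && src.msFirstPaint > 0) {
    paintEpochMs = src.msFirstPaint;
  } else if (typeof src.chromeFirstPaintTime === "number" && src.chromeFirstPaintTime > 0) {
    paintEpochMs = src.chromeFirstPaintTime * 1000; // seconds -> milliseconds
  }
  if (paintEpochMs === null || paintEpochMs < src.navigationStart) {
    return null; // not painted yet, or inconsistent clocks
  }
  return Math.round(paintEpochMs - src.navigationStart);
}
```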
Implementation steps:
Provide a cacheable endpoint before the main document loads.
Inject the data‑collection script after the main document load.
Use Navigation Timing to compute the metrics shown above.
Tag each data point with page, location, and browser identifiers.
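The last two steps can be sketched as a pure function over the Navigation Timing timestamps plus a tagging helper. This is a minimal sketch: the stage names (`dns`, `tcp`, `ttfb`, `download`, `domReady`, `load`) and the `Tags` shape are assumptions made for illustration, since the talk's exact metric list is not reproduced in the text.

```typescript
// Subset of performance.timing fields used here (all epoch ms).
interface NavTimingLike {
  navigationStart: number;
  domainLookupStart: number;
  domainLookupEnd: number;
  connectStart: number;
  connectEnd: number;
  requestStart: number;
  responseStart: number;
  responseEnd: number;
  domContentLoadedEventStart: number;
  loadEventStart: number;
}

interface StageMetrics {
  dns: number;       // DNS lookup
  tcp: number;       // TCP handshake
  ttfb: number;      // request sent -> first byte received
  download: number;  // first byte -> last byte of the document
  domReady: number;  // navigation start -> DOMContentLoaded
  load: number;      // navigation start -> load event
}

interface Tags {
  page: string;
  region: string;
  browser: string;
}

function navigationStages(t: NavTimingLike): StageMetrics {
  return {
    dns: t.domainLookupEnd - t.domainLookupStart,
    tcp: t.connectEnd - t.connectStart,
    ttfb: t.responseStart - t.requestStart,
    download: t.responseEnd - t.responseStart,
    domReady: t.domContentLoadedEventStart - t.navigationStart,
    load: t.loadEventStart - t.navigationStart,
  };
}

// Attach the dimensions that the data will later be sliced by.
function tagDataPoint(stages: StageMetrics, tags: Tags): StageMetrics & Tags {
  return { ...stages, ...tags };
}
```

In a browser, `performance.timing` could be passed as the argument once the load event has fired, because `loadEventStart` is 0 before that point.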
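For static resources, a similar breakdown is collected (see image). When assets are served from a CDN on a different origin, the CDN must send the Timing-Allow-Origin response header; without it, the browser withholds the detailed per‑stage timing for those resources.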
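A per-resource breakdown can be sketched in the same style. Browsers report the detailed timestamps of a cross-origin resource as 0 when the response lacks Timing-Allow-Origin, so the sketch treats a zero `requestStart` or `responseStart` as "details unavailable". That check is a heuristic, and `ResourceEntryLike`, `ResourceBreakdown`, and `resourceBreakdown` are illustrative names.

```typescript
// Subset of PerformanceResourceTiming fields (ms relative to navigation start).
interface ResourceEntryLike {
  name: string;
  startTime: number;
  duration: number;
  domainLookupStart: number;
  domainLookupEnd: number;
  connectStart: number;
  connectEnd: number;
  requestStart: number;
  responseStart: number;
  responseEnd: number;
}

interface ResourceBreakdown {
  name: string;
  total: number;      // always available
  detailed: boolean;  // false when the browser withheld per-stage timing
  dns: number | null;
  tcp: number | null;
  ttfb: number | null;
  download: number | null;
}

function resourceBreakdown(e: ResourceEntryLike): ResourceBreakdown {
  // Heuristic: zeroed request/response timestamps indicate a cross-origin
  // resource served without Timing-Allow-Origin.
  const detailed = e.requestStart > 0 && e.responseStart > 0;
  return {
    name: e.name,
    total: e.duration,
    detailed,
    dns: detailed ? e.domainLookupEnd - e.domainLookupStart : null,
    tcp: detailed ? e.connectEnd - e.connectStart : null,
    ttfb: detailed ? e.responseStart - e.requestStart : null,
    download: detailed ? e.responseEnd - e.responseStart : null,
  };
}
```

In a browser, the entries returned by `performance.getEntriesByType("resource")` carry these fields and could be mapped through `resourceBreakdown`.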
What Were the Real Results?
With comprehensive data, the team could evaluate optimizations more precisely.
Was Flush Early Effective?
Disabling Flush Early increased the time to first byte by ~100 ms but reduced total document transfer time by ~150 ms. However, first‑paint time grew by >300 ms, showing that some optimizations may look neutral on aggregate metrics but have clear impact when examined in detail.
Discovering New Optimization Points
Analyzing document generation revealed that 30 % of time was spent interacting with cache. Optimizing the cache layer cut backend time dramatically, reducing cache share to <10 %.
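A share such as "30 % of time in cache" comes from timing individual stages inside the request handler. The sketch below shows one way to accumulate such timings, assuming synchronous stages and an injected clock. `createStageTimer` is a hypothetical helper, not the instrumentation Meituan used.

```typescript
// Accumulates wall-clock time per named stage. The clock is injected so the
// timer can run against Date.now, a high-resolution timer, or a fake in tests.
function createStageTimer(now: () => number) {
  const totals: Record<string, number> = {};
  return {
    // Run `work`, adding its duration to the running total for `stage`.
    time<T>(stage: string, work: () => T): T {
      const start = now();
      try {
        return work();
      } finally {
        totals[stage] = (totals[stage] || 0) + (now() - start);
      }
    },
    // Total milliseconds recorded for a stage so far.
    totalMs(stage: string): number {
      return totals[stage] || 0;
    },
    // Fraction of the whole request that a stage accounts for.
    share(stage: string, requestMs: number): number {
      return requestMs > 0 ? (totals[stage] || 0) / requestMs : 0;
    },
  };
}
```

Wrapping every cache call in `timer.time("cache", ...)` and dividing by the total document generation time gives the cache share per request, which can then be reported alongside the front-end metrics.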
Frequent deployments (≈50 per day) caused slight performance regressions each time. Merging many JS files meant that a change to a single module forced the whole bundle to be re‑downloaded, highlighting the need for better module splitting and loading strategies.
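The cost of a merged bundle can be made concrete with a little arithmetic: every bundle that contains a changed module is invalidated in the browser cache and must be downloaded again in full. The sketch below computes that cost for a given bundling layout. The `Bundle` type, the module names, and the sizes are made up for illustration.

```typescript
interface Bundle {
  name: string;
  modules: string[]; // modules packed into this bundle
  bytes: number;     // size of the bundle as served
}

// Bytes a returning visitor must re-download after the given modules change.
function bytesInvalidated(bundles: Bundle[], changedModules: string[]): number {
  return bundles
    .filter((bundle) => bundle.modules.some((m) => changedModules.indexOf(m) !== -1))
    .reduce((sum, bundle) => sum + bundle.bytes, 0);
}

// Hypothetical layouts: one merged bundle versus one bundle per module.
const mergedLayout: Bundle[] = [
  { name: "all.js", modules: ["cart", "deal", "search"], bytes: 300000 },
];
const splitLayout: Bundle[] = [
  { name: "cart.js", modules: ["cart"], bytes: 100000 },
  { name: "deal.js", modules: ["deal"], bytes: 120000 },
  { name: "search.js", modules: ["search"], bytes: 80000 },
];
```

With these made-up sizes, a change to `deal` alone invalidates 300000 bytes in the merged layout but 120000 bytes in the split layout. Multiplied by dozens of deployments per day, that difference is one reason finer-grained splitting can pay off, although it trades against a higher request count.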
Performance Monitoring Platform
Beyond fixing immediate bottlenecks, the team built a company‑wide performance monitoring platform that aggregates any metric across any dimension in real time, serving over 20 internal systems.
The platform provides dashboards for each project and simple data‑analysis tools, enabling engineers to drive performance improvements with data.
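"Any metric across any dimension" boils down to grouping tagged data points by a chosen dimension and summarising each group. The sketch below does this in memory with a nearest-rank percentile. A real platform would aggregate in a streaming or storage layer, and the `DataPoint` shape and function names here are assumptions for illustration.

```typescript
interface DataPoint {
  metric: string;                // e.g. "ttfb"
  value: number;                 // milliseconds
  dims: Record<string, string>;  // e.g. { page, region, browser }
}

// Nearest-rank percentile over an ascending-sorted array; NaN when empty.
function percentile(sortedValues: number[], p: number): number {
  if (sortedValues.length === 0) return NaN;
  const rank = Math.ceil((p / 100) * sortedValues.length);
  const index = Math.min(sortedValues.length - 1, Math.max(0, rank - 1));
  return sortedValues[index];
}

// Group one metric by one dimension and return the requested percentile per group.
function aggregate(
  points: DataPoint[],
  metric: string,
  dimension: string,
  p: number
): Record<string, number> {
  const groups: Record<string, number[]> = {};
  for (const point of points) {
    if (point.metric !== metric) continue;
    const key = point.dims[dimension];
    if (key === undefined) continue; // point is not tagged with this dimension
    (groups[key] = groups[key] || []).push(point.value);
  }
  const result: Record<string, number> = {};
  for (const key of Object.keys(groups)) {
    const values = (groups[key] || []).slice().sort((a, b) => a - b);
    result[key] = percentile(values, p);
  }
  return result;
}
```

Percentiles are used here rather than averages because load-time distributions tend to have long tails, so a median or 90th percentile usually describes user experience better than a mean.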
Takeaways
1. A thorough analysis framework exposes hidden performance issues.
2. Focusing solely on result metrics limits insight; process metrics reveal deeper optimization opportunities.
3. Solving a class of problems yields reusable tools and benefits multiple teams.
21CTO
