Applying Data‑Driven Thinking to Page Performance Optimization
This article demonstrates how a data‑driven mindset can be used to identify, quantify, and solve performance bottlenecks in a hotel‑listing H5 page by defining clear metrics, collecting telemetry via Kafka/Hive, analyzing with SQL and Spark, automating Lighthouse audits with Puppeteer, and iterating on optimizations.
The author, a research director at Ctrip, introduces the concept of “data thinking” – a methodology that treats data as the core element for understanding problems, selecting analysis methods, and achieving project goals.
Data thinking is valuable because it provides an objective, quantifiable basis for prioritizing features, assessing incident severity, and measuring user‑experience improvements, especially when resources are limited.
While data scientists and analysts traditionally handle data work, the article argues that every role – developers, testers, product managers – should adopt data thinking, even without deep statistical expertise.
Process Overview
The author walks through a concrete case: optimizing the performance of a hotel‑listing page (H5, first‑screen list). The steps include defining the problem and goal, selecting quantifiable metrics, collecting data, analyzing results, implementing solutions, and iterating.
Step 1 – Define Problem & Goal
Problems: the page is slow and resources for improvement are scarce. Goal: improve user experience with minimal R&D effort on a specific scenario (business‑travel hotel channel, H5 list page, first page).
Step 2 – Choose Metrics
Four metrics are defined, three of them collected by the team's own client-side instrumentation:
Self‑collected TTI (Time to Interactive) – 95th percentile latency.
Self‑collected FMP (First Meaningful Paint) – 95th percentile latency.
Self‑collected BFF latency – 95th percentile latency of the backend‑for‑frontend API.
Google Lighthouse score – a standardized performance score.
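Three of these metrics are reported as a 95th percentile (P95), the value that 95% of samples fall at or below. The sketch below shows one common definition: linear interpolation between the two closest ranks at position p * (n - 1). As far as we know, Hive's exact percentile() aggregate also interpolates this way, but check your Hive version before relying on exact agreement. Other definitions, such as nearest-rank, give slightly different numbers on small samples. The helper name and the sample values are illustrative, not from the article.

```javascript
// Sketch: percentile of raw latency samples (ms) using linear interpolation
// between closest ranks, position = p * (n - 1). `percentile` is a hypothetical helper name.
function percentile(samples, p) {
  if (samples.length === 0) throw new RangeError("percentile of an empty sample set");
  if (!(p >= 0 && p <= 1)) throw new RangeError("p must be between 0 and 1");
  // Copy before sorting, and sort numerically (the default sort is lexicographic)
  const sorted = [...samples].sort((a, b) => a - b);
  const pos = p * (sorted.length - 1);
  const lower = Math.floor(pos);
  const upper = Math.ceil(pos);
  return sorted[lower] + (sorted[upper] - sorted[lower]) * (pos - lower);
}

// Usage: P95 of a batch of made-up TTI samples (ms)
const ttiSamples = [1729, 2310, 980, 4120, 3050, 1500, 2875, 5210, 1990, 2460];
console.log("TTI P95 (ms):", percentile(ttiSamples, 0.95));
```

In the pipeline described here, the same aggregation runs in Hive over a full day partition. A small helper like this is mostly useful for sanity-checking query results on a sample.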
Data collection relies on client‑side instrumentation that sends JSON payloads to Kafka, which are stored in a Hive table edw_corp_frontend_tracelog:
{
"key": "o_corp_htl_performance",
"ts": 1630653123093,
"userId": "012345",
"pageId": "1234567890",
"sessionId": 2940023,
"data": "{\"name\":\"customTTI\",\"latency\":1729,\"startTime\":1630653123093 }"
}
Hive DDL for the table:
CREATE TABLE `edw_corp_frontend_tracelog`(
`ts` bigint COMMENT 'access timestamp',
`key` string COMMENT 'tracking key',
`userId` string COMMENT 'user ID',
`pageId` string COMMENT 'page ID',
`sessionId` string COMMENT 'Session ID',
`data` string COMMENT 'custom tracking payload'
) COMMENT 'client-side page custom tracking table'
PARTITIONED BY (`d` string COMMENT 'date')
STORED AS ORC;
Aggregating daily TTI:
SELECT d,
avg(latency) as avg_latency,
percentile(latency, 0.5) AS P50_latency,
percentile(latency, 0.95) AS P95_latency
FROM (
SELECT d, userId, pageId,
CAST(get_json_object(`data`, '$.latency') AS bigint) AS latency
FROM mytracedb.edw_corp_frontend_tracelog
WHERE d > '2021-01-01' AND get_json_object(`data`, '$.name')='customTTI'
) dx
GROUP BY d;
For Lighthouse automation, a Docker image is built with Node.js, Puppeteer, and the Lighthouse npm package:
FROM centos
LABEL maintainer="Graviton"
RUN curl -sL https://rpm.nodesource.com/setup_12.x | bash -
# Node.js plus the shared libraries Puppeteer's bundled Chromium needs at runtime
RUN yum -y install nodejs at-spi2-atk libdrm libxkbcommon libXcomposite libXdamage libXrandr libgbm pango alsa-lib-devel
RUN mkdir -p /home/lighthouse
WORKDIR /home/lighthouse
RUN npm install --save puppeteer lighthouse
# esm lets Node 12 load the ES-module syntax used by the scripts (e.g. node -r esm audit.js)
RUN npm install --save-dev esm
COPY lighthouse-util.js /home/lighthouse/
The helper script lighthouse-util.js provides functions to launch a headless Chrome and run Lighthouse on a target URL:
import puppeteer from "puppeteer";
import lighthouse from "lighthouse";

// Launch headless Chrome; --no-sandbox is typically needed when Chrome runs as root in a container
export function initBrowser() {
  return puppeteer.launch({ args: ["--no-sandbox"] });
}

// Run Lighthouse against the already-running browser by reusing its DevTools port
export function audit(browser, targetUrl, options = { output: "json" }) {
  const endpoint = browser.wsEndpoint(); // e.g. ws://127.0.0.1:<port>/devtools/browser/<id>
  const url = new URL(endpoint);
  return lighthouse(targetUrl, Object.assign({}, { port: Number(url.port) }, options));
}
An example audit script (audit.js) logs in, navigates to the target page, runs Lighthouse, and stores the JSON report:
import { initBrowser, audit } from "/home/lighthouse/lighthouse-util.js";
import fs from "fs";
import puppeteer from "puppeteer";

(async () => {
  // Device descriptor API of older Puppeteer releases (newer ones expose KnownDevices instead)
  const iPhone = puppeteer.devices['iPhone X'];
  const usr = '<username>';
  const pwd = '<password>';
  const browser = await initBrowser();
  try {
    const page = await browser.newPage();
    await page.emulate(iPhone);

    // Log in; start waiting for the navigation before clicking so a fast redirect is not missed
    await page.goto("https://mydomain.com/my_login_page");
    await page.type('input.accout', usr);
    await page.type('input.pwd', pwd);
    await Promise.all([page.waitForNavigation(), page.click('button.login')]);

    // From the home page, move on to the list page that will be audited
    await page.goto("https://mydomain.com/homepage");
    await page.click('button.next');
    await page.waitForTimeout(3000); // crude settle time before reading the final URL
    const targetUrl = page.url();
    console.log("ready to audit:", targetUrl);

    // Performance category only; result.report holds the JSON report as a string
    const result = await audit(browser, targetUrl, { output: "json", onlyCategories: ["performance"] });
    fs.writeFileSync("report.json", result.report, "utf-8");
  } finally {
    await browser.close();
  }
})().then(
  () => { console.log("done!"); process.exit(0); },
  (err) => { console.error(err); process.exit(1); }
);
SQL scripts are used to load the Lighthouse JSON into Hive and query it via Spark‑SQL.
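The loading SQL itself is not shown. As a hedged illustration of the step before it, the sketch below flattens a Lighthouse JSON report, such as the report.json written by audit.js, into one tab-separated row that a Hive text staging table could ingest. The report fields it reads (requestedUrl, fetchTime, categories.performance.score, and each audit's score and numericValue) come from Lighthouse's result format. The row layout, column order, and helper names are assumptions. Lighthouse stores scores on a 0 to 1 scale, so they are multiplied by 100 to match the 0 to 100 scores shown in its UI.

```javascript
// Sketch: flatten a Lighthouse JSON report into a TSV row for a (hypothetical) Hive staging table.
function lighthouseReportToRow(reportJson) {
  const lhr = JSON.parse(reportJson);
  const audit = (id) => (lhr.audits && lhr.audits[id]) || {};
  // Lighthouse scores are 0..1 (or null when not applicable); convert to 0..100
  const toScore = (s) => (typeof s === "number" ? Math.round(s * 100) : null);
  const perf = lhr.categories && lhr.categories.performance;
  return {
    url: lhr.requestedUrl ?? null,
    fetchTime: lhr.fetchTime ?? null,
    perfScore: toScore(perf ? perf.score : null),
    fcpScore: toScore(audit("first-contentful-paint").score),
    fcpMs: audit("first-contentful-paint").numericValue ?? null,
    ttiMs: audit("interactive").numericValue ?? null,
  };
}

// Serialize in a fixed column order; \N is Hive's default NULL marker for text files
const COLUMNS = ["url", "fetchTime", "perfScore", "fcpScore", "fcpMs", "ttiMs"];
function rowToTsv(row) {
  return COLUMNS.map((c) => (row[c] === null ? "\\N" : String(row[c]))).join("\t");
}

// Usage (Node.js): read the report written by audit.js and print one row
// const fs = require("fs");
// console.log(rowToTsv(lighthouseReportToRow(fs.readFileSync("report.json", "utf-8"))));
```

Missing audits become NULL rather than failing the load. This matters because the set of audits in a report can change between Lighthouse versions.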
Analysis of the collected metrics revealed that FMP (~2.7 s) accounted for more than half of the 95th‑percentile TTI (~4.8 s), indicating that front‑end rendering was the primary bottleneck. Lighthouse’s FCP score further confirmed the gap relative to a benchmark hotel‑listing page.
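Comparing two P95 values shows only that the tails are related: the P95 of TTI minus the P95 of FMP is not the P95 of the gap between them. A hedged way to check the claim per page view is to pair each view's TTI and FMP samples from the trace table and look at the distribution of FMP / TTI. The sketch below does this over rows shaped like edw_corp_frontend_tracelog, parsing the JSON-encoded data column the way get_json_object does in the Hive query. Two things are assumptions, because only customTTI appears in the article's payload example: the event name customFMP, and the use of sessionId plus pageId as a page-view key.

```javascript
// Sketch: per-page-view FMP / TTI ratio from trace rows
// ({ sessionId, pageId, data } where `data` is a JSON string, as in the Hive table).
// Hypothetical: FMP events are named "customFMP"; sessionId + pageId identifies one page view.
function fmpShareOfTti(rows) {
  const views = new Map(); // "sessionId|pageId" -> { tti, fmp }
  for (const row of rows) {
    let payload;
    try {
      payload = JSON.parse(row.data); // the data column is itself JSON-encoded
    } catch {
      continue; // skip malformed payloads (get_json_object would yield NULL)
    }
    if (!payload || (payload.name !== "customTTI" && payload.name !== "customFMP")) continue;
    const key = `${row.sessionId}|${row.pageId}`;
    const view = views.get(key) || {};
    if (payload.name === "customTTI") view.tti = payload.latency;
    else view.fmp = payload.latency;
    views.set(key, view);
  }
  // Keep only views that reported both metrics
  const shares = [];
  for (const { tti, fmp } of views.values()) {
    if (typeof tti === "number" && typeof fmp === "number" && tti > 0) shares.push(fmp / tti);
  }
  return shares;
}
```

Feeding the returned ratios into a percentile calculation, or running the equivalent join in Hive, shows what fraction of time-to-interactive is typically spent before the first meaningful paint.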
Based on these insights, the team applied several optimizations: server‑side rendering (SSR), JavaScript bundle splitting, BFF pre‑loading, and cache tuning. The metrics showed consistent reductions in TTI and FMP, and a 500 ms advantage during peak traffic for the pre‑loaded BFF path.
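The article does not show how BFF pre-loading was implemented. One common pattern starts the BFF request as early as possible, for example from a small inline script in the page shell. The list component then reuses that in-flight promise instead of issuing its own request after the JavaScript bundle has downloaded and the component has mounted. The sketch below implements that pattern with an injected fetch function, so it stays environment-agnostic. createBffPreloader and its key/URL arguments are hypothetical.

```javascript
// Sketch: one-shot BFF request pre-loader (hypothetical API).
function createBffPreloader(fetchJson) {
  const inflight = new Map(); // key -> Promise of the pre-loaded response

  return {
    // Kick off the request early; repeated calls for the same key are ignored.
    preload(key, url) {
      if (inflight.has(key)) return;
      const pending = fetchJson(url);
      pending.catch(() => {}); // avoid an unhandled rejection if nobody consumes it
      inflight.set(key, pending);
    },
    // Consume the pre-loaded response once; fall back to a fresh request if the key was
    // never pre-loaded or the pre-load failed.
    get(key, url) {
      const pending = inflight.get(key);
      if (!pending) return fetchJson(url);
      inflight.delete(key); // later refreshes go to the network
      return pending.catch(() => fetchJson(url));
    },
  };
}
```

In a browser, fetchJson could be `(url) => fetch(url).then((r) => r.json())`, with `preload` called from the page shell and `get` called from the list component. Making the cache one-shot keeps pull-to-refresh or filter changes from ever receiving the stale first-screen response.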
Finally, the article stresses the importance of continuous iteration: define quantitative targets (e.g., a Lighthouse FCP score above 75), monitor real‑time dashboards in Grafana, and keep expanding instrumentation to cover new scenarios.
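Turning such targets into an automated check helps keep the iteration loop honest. Each scheduled audit either meets every target or produces a list of misses that can be surfaced next to the dashboards. The sketch below is a minimal, hypothetical version. Only the "FCP score above 75" target comes from the article; the TTI budget in the usage example is an illustrative placeholder.

```javascript
// Sketch: compare measured metrics against quantitative targets.
// Each target may set `above` (value must be strictly greater) and/or `below` (strictly smaller).
function checkTargets(metrics, targets) {
  const misses = [];
  for (const [name, { above, below }] of Object.entries(targets)) {
    const value = metrics[name];
    if (typeof value !== "number") {
      misses.push(`${name}: no data`);
      continue;
    }
    if (above !== undefined && !(value > above)) misses.push(`${name}: ${value} is not above ${above}`);
    if (below !== undefined && !(value < below)) misses.push(`${name}: ${value} is not below ${below}`);
  }
  return misses;
}

// Usage: FCP score target from the article; the TTI P95 budget is a placeholder.
const targets = { fcpScore: { above: 75 }, ttiP95Ms: { below: 4000 } };
const misses = checkTargets({ fcpScore: 80, ttiP95Ms: 4800 }, targets);
console.log(misses.length === 0 ? "all targets met" : misses.join("\n"));
```

The check treats missing data as a miss, not a pass. This way a broken collector shows up as a failure and not as silent success.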
In summary, the piece illustrates that data‑driven thinking is not about complex algorithms but about establishing a repeatable workflow: define goals, instrument, collect, analyze, act, and iterate, which can dramatically improve performance with limited resources.
This article has been distilled and summarized from source material, then republished for learning and reference.
Ctrip Technology
Official Ctrip Technology account, sharing and discussing growth.
