How Meizu Built an Agile Big Data Platform for Millions of Users

The Meizu Tech Open Day showcased the company's rapid evolution to a data‑driven mobile internet firm, detailing its DW1.0 and DW2.0 data‑warehouse architectures, recommendation pipelines, Spark adoption, and ELK‑based log analytics, while sharing practical lessons and future challenges.

ITPUB

Jan 20, 2016

Meizu Data Warehouse Architecture (DW1.0)

DW1.0 integrates website logs, ERP data, and real‑time messages. It is organized into four layers: data source, ingestion, warehouse, and application. In the ingestion layer, AnyLoader batch‑loads files and databases, while AnyStream ingests NoSQL streams in near real time. The platform also provides a data development environment and multiple data product libraries.

DW2.0 Roadmap

Goals include improving data‑product user experience, addressing core business pain points, delivering role‑specific personalized data products, and consolidating processes and integrations.

Best‑Practice Recommendations

Avoid non‑standard business designs and inconsistent data definitions.

Ensure reliable data sources and complete SDK tracking.

Prevent repeated platform migrations, maintain a unified data portal, and use consistent visualization tools.

Alibaba Recommendation Technology

The offline pipeline consists of four modules:

I2I similarity – item‑to‑item similarity computed with collaborative filtering, co‑occurrence, purchase‑probability, log‑odds‑ratio, and mutual‑information algorithms.

I2I pairing – compute category‑level relationships and map them to item‑level pairings.

C2I – category‑to‑item: rank high‑quality items within each leaf category using sales, CTR, and quality metrics.

U2I – user‑to‑item: model long‑term user preferences across channels, account for purchase cycles, and filter out items similar to those already purchased.
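
To make the I2I similarity module more concrete, here is a minimal sketch of one of the listed approaches, co‑occurrence counting with cosine normalization, followed by a simple filter that drops items the user already owns. It is an illustration only, not Alibaba's implementation: the function names, the basket input format, and the sample data are all assumptions.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def item_similarity(baskets):
    """Cosine-normalized co-occurrence similarity between items.

    baskets: one collection of items per user (hypothetical input format).
    Returns {item: {other_item: score}}.
    """
    item_count = defaultdict(int)  # users who interacted with each item
    pair_count = defaultdict(int)  # users who interacted with both items

    for basket in baskets:
        items = sorted(set(basket))
        for item in items:
            item_count[item] += 1
        for a, b in combinations(items, 2):
            pair_count[(a, b)] += 1

    sim = defaultdict(dict)
    for (a, b), n_ab in pair_count.items():
        score = n_ab / sqrt(item_count[a] * item_count[b])
        sim[a][b] = score
        sim[b][a] = score
    return dict(sim)


def recommend(sim, purchased, top_n=3):
    """Sum similarity from purchased items; skip items already purchased."""
    scores = defaultdict(float)
    for item in purchased:
        for other, score in sim.get(item, {}).items():
            if other not in purchased:
                scores[other] += score
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


# Hypothetical sample data.
baskets = [
    ["phone", "case", "charger"],
    ["phone", "case"],
    ["phone", "earbuds"],
    ["case", "charger"],
]
sim = item_similarity(baskets)
print(recommend(sim, {"phone"}))  # ['case', 'earbuds', 'charger']
```

The normalization divides each pair count by the geometric mean of the two item counts, so very popular items do not dominate every similarity list purely because of their volume.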

The real‑time pipeline framework provides systematic componentization, unified interfaces, centralized logging for debugging, and end‑to‑end (full‑link) data analysis. Its key steps are:

Source retrieval – collect search terms, clicks, purchase history, UGC, etc.

Source arbitration – rank sources for relevance, diversity, and context match.

Candidate set recall – multi‑channel recall (item‑item, user‑user, feature‑based, novelty, hot items).

Filtering – apply scenario‑specific filters.

Scoring and ranking – online fine‑grained ranking with real‑time models.

Post‑processing – handle diversity, novelty, business metrics, pagination, and explanation generation.
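
The steps above can be sketched as a chain of small stages that share one request context and one trace log, which is roughly what "unified interfaces" and "centralized logging" suggest. This is a toy model under stated assumptions: the Context class, the stage functions, the scoring weights, and the lookup tables are all hypothetical and stand in for real recall services and ranking models.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """Per-request state handed to every stage (hypothetical structure)."""
    user_events: dict
    sources: list = field(default_factory=list)     # trigger items
    candidates: dict = field(default_factory=dict)  # item -> score
    trace: list = field(default_factory=list)       # one entry per stage


def retrieve_sources(ctx):
    # Source retrieval: collect trigger items from clicks and purchases.
    ctx.sources = ctx.user_events.get("clicks", []) + ctx.user_events.get("purchases", [])


def arbitrate_sources(ctx, max_sources=2):
    # Source arbitration: deduplicate and cap the triggers
    # (a stand-in for ranking by relevance, diversity, and context).
    unique = list(dict.fromkeys(ctx.sources))
    ctx.sources = unique[:max_sources]


def recall(ctx, related):
    # Candidate recall: union of items related to each trigger.
    for src in ctx.sources:
        for item, score in related.get(src, {}).items():
            ctx.candidates[item] = max(score, ctx.candidates.get(item, 0.0))


def filter_candidates(ctx):
    # Filtering: one scenario-specific rule, drop purchased items.
    purchased = set(ctx.user_events.get("purchases", []))
    ctx.candidates = {i: s for i, s in ctx.candidates.items() if i not in purchased}


def rank(ctx, popularity):
    # Scoring: blend recall score with popularity (stand-in for a real-time model).
    ctx.candidates = {
        i: 0.8 * s + 0.2 * popularity.get(i, 0.0) for i, s in ctx.candidates.items()
    }


def post_process(ctx, page_size=2):
    # Post-processing: order by score and return one page.
    ordered = sorted(ctx.candidates, key=lambda i: (-ctx.candidates[i], i))
    return ordered[:page_size]


def run_pipeline(user_events, related, popularity):
    ctx = Context(user_events=user_events)
    stages = [
        ("source_retrieval", retrieve_sources),
        ("source_arbitration", arbitrate_sources),
        ("recall", lambda c: recall(c, related)),
        ("filtering", filter_candidates),
        ("ranking", lambda c: rank(c, popularity)),
    ]
    for name, stage in stages:
        stage(ctx)
        # Centralized trace: stage name, number of sources, number of candidates.
        ctx.trace.append((name, len(ctx.sources), len(ctx.candidates)))
    return post_process(ctx), ctx.trace


# Hypothetical sample data.
related = {
    "phone": {"case": 0.9, "charger": 0.6},
    "case": {"phone": 0.9, "screen_film": 0.7},
}
popularity = {"case": 0.5, "charger": 0.9, "screen_film": 0.1}
user_events = {"clicks": ["phone", "case", "phone"], "purchases": ["case"]}

items, trace = run_pipeline(user_events, related, popularity)
print(items)
for entry in trace:
    print(entry)
```

Because every stage takes and mutates the same context object, the trace shows how many candidates survive each step, which is the kind of per‑request visibility that makes a multi‑stage recommender easier to debug.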

Baidu Spark Deployment

Spark is used for both batch and streaming workloads. Key advantages:

In‑memory processing reduces disk I/O and enables caching of intermediate and final results.

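RDD lineage records how each dataset was derived, so Spark can recompute lost partitions from their inputs instead of replicating intermediate data; very long lineage chains and stateful streaming jobs can still benefit from checkpointing.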

Rich transformation APIs simplify complex data pipelines.

Spark Streaming and Spark SQL allow seamless integration of real‑time and batch processing.
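
The interplay between in‑memory caching and lineage‑based recovery can be illustrated with a toy model in plain Python. This is deliberately not the Spark API: ToyDataset and its methods are hypothetical, and unlike Spark (where caching is opt‑in through cache() or persist()) every dataset here caches its result for brevity.

```python
class ToyDataset:
    """A tiny stand-in for an RDD: it remembers its parent and how it was derived."""

    def __init__(self, source=None, parent=None, fn=None):
        self.source = source    # raw rows; only set on the root dataset
        self.parent = parent    # lineage: the dataset this one derives from
        self.fn = fn            # lineage: the transformation applied to the parent
        self._cached = None     # in-memory copy of the computed rows
        self.computations = 0   # how many times the rows were actually computed

    def map(self, f):
        return ToyDataset(parent=self, fn=lambda rows: [f(r) for r in rows])

    def filter(self, pred):
        return ToyDataset(parent=self, fn=lambda rows: [r for r in rows if pred(r)])

    def collect(self):
        if self._cached is None:
            if self.parent is None:
                rows = list(self.source)
            else:
                # Recompute from the parent by replaying the recorded transformation.
                rows = self.fn(self.parent.collect())
            self.computations += 1
            self._cached = rows
        return self._cached

    def evict(self):
        # Simulate losing the in-memory copy, e.g. when a worker fails.
        self._cached = None


logs = ToyDataset(source=["INFO ok", "ERROR disk", "ERROR net"])
errors = logs.filter(lambda line: line.startswith("ERROR"))
devices = errors.map(lambda line: line.split()[1])

first = devices.collect()    # computed once, then cached
devices.collect()            # served from memory, no recomputation

devices.evict()
errors.evict()               # lose two cached datasets
second = devices.collect()   # rebuilt by replaying the lineage from logs

print(first, second, devices.computations)
```

Only the evicted datasets are recomputed; the root dataset keeps its cached rows, so recovery replays just the missing part of the chain.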

Meizu Log‑Analysis Platform

Daily log volume reaches hundreds of GB to several TB. The ELK stack (Elasticsearch, Logstash, Kibana) was selected for its stability, horizontal scalability, ease of use, and real‑time capabilities.

Elasticsearch stores distributed log data and supports fast search.

Logstash normalizes heterogeneous log formats.

Kibana provides a unified query UI for troubleshooting.
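
The normalization role described for Logstash can be sketched in Python: lines in different formats are parsed into one common document shape that a search index could then store. This is an illustration of the idea rather than Logstash configuration or Meizu's actual rules; the two input formats, the field names, and the UTC assumption for application logs are all hypothetical.

```python
import json
import re
from datetime import datetime, timezone

# Two hypothetical input formats; real services will differ.
ACCESS_LOG = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)
APP_LOG = re.compile(
    r'(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<level>[A-Z]+) (?P<message>.*)'
)


def normalize(line, service):
    """Map one raw log line to a common document shape."""
    m = ACCESS_LOG.match(line)
    if m:
        ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
        return {
            "@timestamp": ts.astimezone(timezone.utc).isoformat(),
            "service": service,
            "level": "ERROR" if m["status"].startswith("5") else "INFO",
            "message": f'{m["method"]} {m["path"]} -> {m["status"]}',
        }

    m = APP_LOG.match(line)
    if m:
        # Assumption: application logs carry no zone and are written in UTC.
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
        return {
            "@timestamp": ts.isoformat(),
            "service": service,
            "level": m["level"],
            "message": m["message"],
        }

    # Keep unparsed lines so they remain searchable instead of being dropped.
    return {"@timestamp": None, "service": service, "level": "UNKNOWN", "message": line}


access = normalize(
    '10.0.0.1 - - [20/Jan/2016:08:00:00 +0800] "GET /api/v1/items HTTP/1.1" 502',
    service="gateway",
)
app = normalize("2016-01-20 00:00:05 WARN cache miss ratio high", service="store")
other = normalize("free-form text", service="legacy")

for doc in (access, app, other):
    print(json.dumps(doc))
```

Once every line shares the same timestamp, service, level, and message fields, a single query UI can filter across services without knowing each source format.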

Meizu operations architect Lin Zhonghong
