How Meizu Built an Agile Big Data Platform for Millions of Users
The Meizu Tech Open Day showcased the company's rapid evolution to a data‑driven mobile internet firm, detailing its DW1.0 and DW2.0 data‑warehouse architectures, recommendation pipelines, Spark adoption, and ELK‑based log analytics, while sharing practical lessons and future challenges.
Meizu Data Warehouse Architecture (DW1.0)
DW1.0 integrates website logs, ERP data, and real‑time messages. It is organized into four layers: data source, ingestion, warehouse, and application. The ingestion layer uses AnyLoader for batch loading of files and databases, and AnyStream for near‑real‑time ingestion of NoSQL streams. The platform supports a data development environment and multiple data product libraries.
DW2.0 Roadmap
Goals include improving data‑product user experience, addressing core business pain points, delivering role‑specific personalized data products, and consolidating processes and integrations.
Best‑Practice Recommendations
Avoid non‑standard business designs and inconsistent data definitions.
Ensure reliable data sources and complete SDK tracking.
Prevent repeated platform migrations, maintain a unified data portal, and use consistent visualization tools.
Alibaba Recommendation Technology
The offline pipeline consists of four modules:
I2I similarity – item‑to‑item similarity computed with collaborative filtering, co‑occurrence, purchase‑probability, log‑odds‑ratio, and mutual‑information algorithms.
I2I pairing – compute category‑level relationships and map to item‑level pairings.
C2I – rank high‑quality items per leaf category using sales, CTR, quality metrics.
U2I – model long‑term user preferences across channels, consider purchase cycles, and filter already purchased similar items.
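As a rough illustration of the co‑occurrence variant of I2I similarity listed above, the sketch below counts how often two items appear in the same basket and normalizes by each item's popularity. The function name, the cosine‑style normalization, and the sample baskets are assumptions made for this example; the talk does not describe the exact formula used in production.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def cooccurrence_similarity(baskets):
    """Return {(a, b): score} with score = co(a, b) / sqrt(n(a) * n(b)).

    Hypothetical helper: pairs are keyed in sorted order so (a, b)
    and (b, a) are never stored twice.
    """
    item_count = defaultdict(int)   # number of baskets containing each item
    pair_count = defaultdict(int)   # number of baskets containing both items
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates inside one basket
        for item in items:
            item_count[item] += 1
        for a, b in combinations(items, 2):
            pair_count[(a, b)] += 1
    return {
        (a, b): n / sqrt(item_count[a] * item_count[b])
        for (a, b), n in pair_count.items()
    }


# Made-up sample data for illustration only.
baskets = [
    ["phone", "case"],
    ["phone", "case", "charger"],
    ["phone", "charger"],
    ["case"],
]
sim = cooccurrence_similarity(baskets)
```

Dividing by the square root of the two item counts keeps very popular items from dominating every pairing, which is one common reason to prefer a normalized score over raw co‑occurrence counts.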
The real‑time pipeline framework provides systematic componentization, unified interfaces, centralized logging for debugging, and full‑link data analysis. Its key steps are:
Source retrieval – collect search terms, clicks, purchase history, UGC, etc.
Source arbitration – rank sources for relevance, diversity, and context match.
Candidate set recall – multi‑channel recall (item‑item, user‑user, feature‑based, novelty, hot items).
Filtering – apply scenario‑specific filters.
Scoring and ranking – online fine‑grained ranking with real‑time models.
Post‑processing – handle diversity, novelty, business metrics, pagination, and explanation generation.
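The steps above can be sketched as a chain of small functions. This is a minimal toy, not the framework described in the talk: the `Item` type, every function name, and the "at most N items per category" diversity rule are assumptions introduced for illustration, and the `score` field stands in for the output of a real‑time ranking model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: str
    category: str
    score: float  # placeholder for a real-time model's score


def recall(channels):
    """Candidate set recall: merge several channels, dropping duplicates."""
    seen, merged = set(), []
    for candidates in channels:
        for item in candidates:
            if item.item_id not in seen:
                seen.add(item.item_id)
                merged.append(item)
    return merged


def filter_items(items, purchased_ids):
    """Filtering: here, drop items the user has already purchased."""
    return [i for i in items if i.item_id not in purchased_ids]


def rank(items):
    """Scoring and ranking: highest score first."""
    return sorted(items, key=lambda i: i.score, reverse=True)


def diversify(items, max_per_category):
    """Post-processing: cap how many items one category may contribute."""
    kept, per_category = [], {}
    for item in items:
        n = per_category.get(item.category, 0)
        if n < max_per_category:
            kept.append(item)
            per_category[item.category] = n + 1
    return kept


def recommend(channels, purchased_ids, page_size, max_per_category=1):
    candidates = filter_items(recall(channels), purchased_ids)
    return diversify(rank(candidates), max_per_category)[:page_size]


# Made-up recall channels for illustration only.
item_to_item = [Item("a", "phones", 0.9), Item("b", "phones", 0.8)]
hot_items = [Item("b", "phones", 0.8), Item("c", "cases", 0.7),
             Item("d", "audio", 0.6)]
result = recommend([item_to_item, hot_items], purchased_ids={"a"}, page_size=2)
```

Keeping each step behind the same list‑in, list‑out interface is what makes the stages easy to swap, log, and test in isolation, which matches the componentization goal stated above.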
Baidu Spark Deployment
Spark is used for both batch and streaming workloads. Key advantages:
In‑memory processing reduces disk I/O and enables caching of intermediate and final results.
RDD lineage lets lost partitions be recomputed from their source data, so fault recovery does not depend on replicating or checkpointing every intermediate result.
Rich transformation APIs simplify complex data pipelines.
Spark Streaming and Spark SQL run on the same engine, so real‑time and batch processing can be combined in one application.
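The lineage idea can be shown with a small pure‑Python toy. This is not Spark's API: `ToyDataset` and its methods are hypothetical names that only mimic the principle that a derived dataset remembers its parent and the transformation applied, so a lost cached result can be rebuilt by replaying those steps.

```python
class ToyDataset:
    """Toy model of lineage: remembers its parent and how it was derived."""

    def __init__(self, source=None, parent=None, fn=None):
        self._source = source   # raw rows, only set on the root dataset
        self._parent = parent   # dataset this one was derived from
        self._fn = fn           # transformation applied to the parent's rows
        self._cache = None      # in-memory result, may be lost

    def map(self, fn):
        return ToyDataset(parent=self, fn=lambda rows: [fn(r) for r in rows])

    def filter(self, pred):
        return ToyDataset(parent=self,
                          fn=lambda rows: [r for r in rows if pred(r)])

    def collect(self):
        # Recompute from lineage whenever the cached result is missing.
        if self._cache is None:
            if self._parent is None:
                self._cache = list(self._source)
            else:
                self._cache = self._fn(self._parent.collect())
        return self._cache

    def lose_cache(self):
        """Simulate losing the in-memory result, e.g. a failed worker."""
        self._cache = None


base = ToyDataset(source=[1, 2, 3, 4, 5, 6])
evens_squared = base.filter(lambda x: x % 2 == 0).map(lambda x: x * x)

first = evens_squared.collect()
evens_squared.lose_cache()          # the cached result disappears
recovered = evens_squared.collect() # rebuilt by replaying filter + map
```

In real Spark the same principle applies per partition rather than per dataset, and long lineage chains are often truncated with explicit checkpoints to bound recomputation cost.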
Meizu Log‑Analysis Platform
Daily log volume reaches hundreds of GB to several TB. The ELK stack (Elasticsearch, Logstash, Kibana) was selected for its stability, horizontal scalability, ease of use, and real‑time capabilities.
Elasticsearch stores distributed log data and supports fast search.
Logstash normalizes heterogeneous log formats.
Kibana provides a unified query UI for troubleshooting.
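To make the normalization step concrete, the sketch below maps two differently shaped log lines onto one common record, which is the kind of work Logstash filters do before events are indexed in Elasticsearch. It is plain Python rather than Logstash configuration, and the two input formats, the field names other than the conventional `@timestamp`, and all function names are assumptions chosen for this example.

```python
import json
import re
from datetime import datetime, timezone

# Assumed access-log shape: ip [timestamp] "METHOD path" status
ACCESS_RE = re.compile(
    r'(?P<ip>\S+) \[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)" '
    r'(?P<status>\d{3})'
)


def parse_access(line):
    m = ACCESS_RE.match(line)
    if not m:
        return None
    ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
    return {
        "@timestamp": ts.astimezone(timezone.utc).isoformat(),
        "type": "access",
        "message": f'{m["method"]} {m["path"]}',
        "status": int(m["status"]),
    }


def parse_json_app(line):
    # Assumed application-log shape: JSON with epoch seconds in "time".
    try:
        raw = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(raw, dict) or "time" not in raw:
        return None
    ts = datetime.fromtimestamp(raw["time"], tz=timezone.utc)
    return {
        "@timestamp": ts.isoformat(),
        "type": "app",
        "message": raw.get("msg", ""),
        "level": raw.get("level", "INFO"),
    }


def normalize(line):
    """Try each parser in turn; keep unparsed lines instead of dropping them."""
    for parser in (parse_access, parse_json_app):
        event = parser(line)
        if event is not None:
            return event
    return {"type": "unparsed", "message": line}


access = normalize('10.0.0.1 [10/Oct/2023:13:55:36 +0800] "GET /index.html" 200')
app = normalize('{"time": 1696917336, "level": "ERROR", "msg": "db timeout"}')
other = normalize("free-form text")
```

Converting every timestamp to UTC under a single field name is what lets one Kibana query span logs from different services.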
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
