How OPPO’s Hybrid Cloud Transforms Big Data at Scale
OPPO’s big‑data platform adopts a hybrid‑cloud architecture that combines on‑premise IDC resources with public‑cloud elasticity, addressing massive data volumes, cost, security, and vendor lock‑in while delivering higher resource utilization, stability, and autonomous evolution for future workloads.
01 Facing the Future: Hybrid Cloud for Big Data Infrastructure
To meet the growing demand for compute‑storage capacity, OPPO’s big‑data team chose a hybrid‑cloud approach, merging the stability of on‑premise IDC with the elasticity of public cloud, enabling a resilient and scalable platform.
Moving a massive, complex big‑data system, with nearly a million offline tasks and diverse system dependencies, into a highly elastic hybrid environment goes far beyond a simple migration.
02 Big Data Hybrid Cloud Is More Than a Technical Issue
Hybrid cloud leverages the strengths of both public cloud and IDC, requiring careful consideration of data and compute migration.
Key technical challenges include:
Massive data and task migration to the cloud.
Construction of cloud‑based big‑data infrastructure.
Hybrid‑cloud compute‑storage scheduling capabilities.
These factors determine migration speed, stability, and the ability to switch compute resources seamlessly between cloud and on‑premise environments.
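The ability to switch compute between cloud and on‑premise environments can be pictured as a placement policy. The sketch below is a minimal illustration under one assumed rule, "fill IDC capacity first, burst overflow to the public cloud". The `Pool` model, the `place` function, and the rule itself are hypothetical, not OPPO's actual scheduler.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pool:
    """A compute pool with a number of free vcores (hypothetical model)."""
    name: str
    free_vcores: int


def place(task_vcores: int, idc: Pool, cloud: Pool) -> Optional[str]:
    """Place a task, preferring the IDC pool and bursting to the cloud pool.

    Returns the name of the pool that accepted the task, or None if
    neither pool currently has enough free vcores.
    """
    for pool in (idc, cloud):  # order encodes the assumed preference: IDC first
        if pool.free_vcores >= task_vcores:
            pool.free_vcores -= task_vcores  # reserve capacity for the task
            return pool.name
    return None
```

For example, with `Pool("idc", 8)` and `Pool("cloud", 16)`, a first 6‑vcore task lands in the IDC, a second one bursts to the cloud, and a 12‑vcore task is rejected until capacity frees up. A real hybrid scheduler would also weigh data locality, price, and queue priority.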
03 Prerequisite for Hybrid Cloud – Migration
Successful migration of a large‑scale, complex big‑data platform demands cross‑team collaboration; OPPO completed its migration in eight months, a quarter ahead of schedule.
Three core questions to address before migration:
Data security – highly sensitive data must be encrypted, and the migrated data does not include user data; major cloud providers also hold industry‑recognized security certifications.
Public‑cloud cost – effective use of elastic compute and object storage can reduce costs, and hybrid models further optimize expenses.
Vendor lock‑in – hybrid cloud mitigates binding to a single provider and facilitates future migrations.
Answering these concerns aligns stakeholders and accelerates migration.
04 Continuous Innovation in OPPO’s Hybrid Cloud Architecture
After migration, OPPO focuses on improving speed, stability, cost efficiency, and autonomy.
The architecture relies on Kubernetes as the compute foundation, OSS and HDFS for storage, and open‑source components such as YARN, Spark, and Flink. Three in‑house components, HBO, Curvine Cache, and MCN, play critical roles:
HBO (History‑Based Optimizer): Optimizes task parameters based on historical runs, boosting execution efficiency.
Curvine Cache: A Rust‑based high‑performance distributed cache that alleviates I/O bottlenecks, now open‑source.
MCN: A metadata routing layer built on HDFS NameNode, enabling transparent integration with cloud object storage.
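HBO's internals are not described here. As a rough illustration of history‑based tuning, the sketch below picks an executor memory setting from a job's past peak usage. Treating memory as the tuned parameter, the 20% headroom, the 512 MB rounding step, and the name `recommend_memory_mb` are all assumptions made for this example.

```python
import math


def recommend_memory_mb(peak_mb_history: list[int], default_mb: int = 4096,
                        headroom: float = 0.2, step_mb: int = 512) -> int:
    """Suggest an executor memory setting from past peak usage (hypothetical rule).

    Takes the largest observed peak, adds a safety headroom, and rounds up
    to a multiple of step_mb. Falls back to default_mb with no history.
    """
    if not peak_mb_history:
        return default_mb  # no history yet: keep the static default
    target = max(peak_mb_history) * (1 + headroom)
    return math.ceil(target / step_mb) * step_mb
```

With past peaks of 2000, 2500, and 2200 MB, the rule suggests 3072 MB instead of the 4096 MB default. Shrinking over‑provisioned defaults in this way is one mechanism by which history‑based tuning can free capacity for other tasks.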
These components deliver four major benefits:
Resource savings – HBO’s dynamic tuning raises CPU utilization to ~80%.
Higher stability – Curvine’s fast read/write path handles Spark shuffle data, relieving shuffle hotspot issues.
Faster execution – The cache layer accelerates data access and reduces OSS request costs.
Greater autonomy – Containerized, cloud‑native design preserves control over storage technologies.
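To show why a cache layer cuts object‑store request costs, here is a minimal in‑process read‑through LRU cache that counts how many reads reach the backend. Curvine itself is a distributed cache written in Rust; this Python class, its name `ReadThroughCache`, and the `fetch` callback standing in for an OSS GET are illustrative assumptions only.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """In-process LRU read-through cache in front of a slow backend (illustrative only)."""

    def __init__(self, fetch: Callable[[str], bytes], capacity: int):
        self._fetch = fetch        # backend read, e.g. an object-store GET (assumed)
        self._capacity = capacity  # maximum number of cached keys
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.backend_reads = 0     # number of requests that reached the backend

    def get(self, key: str) -> bytes:
        if key in self._entries:
            self._entries.move_to_end(key)  # hit: mark as most recently used
            return self._entries[key]
        data = self._fetch(key)             # miss: read from the backend
        self.backend_reads += 1
        self._entries[key] = data
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used key
        return data
```

Reading keys `a, a, b, a` through a cache of capacity 2 costs only two backend requests; every repeated hit is a request the object store never sees, which is where the savings come from.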
By adapting HDFS NameNode to support multiple object stores, OPPO achieves transparent data migration while retaining HDFS performance and availability.
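MCN's interface is not documented in this article. A common way to express "route each path to the store that holds it" is a mount table resolved by longest prefix, similar in spirit to HDFS ViewFs or Router‑Based Federation mount tables. The sketch below assumes that model; the `resolve` function, the mount entries, and the `oss://example-bucket` URI are hypothetical.

```python
def resolve(path: str, mounts: dict[str, str]) -> str:
    """Map a logical path to a physical URI using the longest matching mount prefix."""
    best = None
    for prefix in mounts:
        # Match whole path components only: /warehouse must not match /warehouse2.
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        raise KeyError(f"no mount point for {path}")
    suffix = path[len(best):].lstrip("/")  # remainder of the path below the mount
    target = mounts[best].rstrip("/")
    return f"{target}/{suffix}" if suffix else target


mounts = {
    "/": "hdfs://idc-cluster",
    "/warehouse": "hdfs://idc-cluster/warehouse",
    "/warehouse/logs": "oss://example-bucket/logs",  # directory already moved to object storage
}
```

Here `/warehouse/logs/2024/part-0` resolves to `oss://example-bucket/logs/2024/part-0`, while `/warehouse/orders/p1` stays on HDFS. In this model, migrating a directory means copying its data and then updating one mount entry; clients keep using the same logical path.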
05 Hybrid Cloud as a Starting Point for Future Architectures
OPPO’s hybrid‑cloud migration represents a foundational upgrade, delivering faster scheduling, higher resource efficiency, and better observability. It underscores that the core of modern enterprise technology lies in how systems are composed and evolved together, not in any single component.
Key success factors include mature public‑cloud services whose costs now approach or even beat IDC costs, and OPPO’s embrace of cloud elasticity to run big‑data operations in a lightweight, cost‑effective way.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
