How OPPO Cloud Platform Implements Containerized Storage on Kubernetes – Key Insights
In a live Q&A, a senior backend engineer on the OPPO cloud platform shares practical insights on hybrid deployment of offline jobs on Kubernetes, custom high‑performance monitoring with a distributed TSDB, storage scalability via CSI, pod readiness detection, and deploying Ceph/HDFS within the container cluster.
Q&A Highlights from OPPO Cloud Platform Storage Containerization Live Session
Q1: Any tips for hybrid deployment of offline tasks on an online Kubernetes cluster?
Our offline transcoding jobs run on the online cluster. Because offline tasks are CPU‑intensive, we use affinity rules (in Kubernetes terms, pod anti‑affinity) to force their pods onto different physical nodes, and we adjust resource allocations based on the cluster monitoring dashboard.
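As a rough illustration of forcing pods onto different physical nodes, the sketch below builds a pod spec fragment with a required pod anti‑affinity rule keyed on the node hostname. The label `offline-transcode`, the image name, and the CPU values are hypothetical placeholders, not OPPO's actual settings.

```python
import json


def offline_job_pod_spec(app_label: str, cpu_request: str, cpu_limit: str) -> dict:
    """Build a pod spec fragment that keeps pods of one offline job on separate nodes."""
    return {
        "affinity": {
            "podAntiAffinity": {
                # Hard rule: never schedule two pods with the same app label
                # onto the same node (topology domain = node hostname).
                "requiredDuringSchedulingIgnoredDuringExecution": [
                    {
                        "labelSelector": {"matchLabels": {"app": app_label}},
                        "topologyKey": "kubernetes.io/hostname",
                    }
                ]
            }
        },
        "containers": [
            {
                "name": app_label,
                "image": "example.invalid/transcoder:latest",  # hypothetical image
                # CPU requests and limits are the values one would tune
                # after looking at the monitoring dashboard.
                "resources": {
                    "requests": {"cpu": cpu_request},
                    "limits": {"cpu": cpu_limit},
                },
            }
        ],
    }


spec = offline_job_pod_spec("offline-transcode", "2", "4")
print(json.dumps(spec, indent=2))
```

A softer variant would use `preferredDuringSchedulingIgnoredDuringExecution`, which lets the scheduler co‑locate pods when no other node has room.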
Q2: What monitoring system do you use and have you made any improvements?
We built a custom monitoring solution centered on a distributed TSDB that can ingest tens of millions of metrics per second, allowing us to collect metrics every second instead of the typical 30‑second interval, which provides richer data for performance‑spike analysis.
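To see why a one‑second interval helps with spike analysis, consider the toy example below. The CPU series is invented for illustration: a flat 20% load with a five‑second burst to 95%. Sampling every 30 seconds misses the burst entirely, while sampling every second captures it.

```python
def sample(series, interval_seconds):
    """Keep one reading every `interval_seconds` from a per-second series."""
    return series[::interval_seconds]


# 60 seconds of hypothetical CPU usage (percent) with a short spike at t=40..44.
usage = [20.0] * 60
for t in range(40, 45):
    usage[t] = 95.0

every_1s = sample(usage, 1)
every_30s = sample(usage, 30)

print(len(every_1s), max(every_1s))    # 60 95.0
print(len(every_30s), max(every_30s))  # 2 20.0
```

The cost is a 30‑fold increase in samples per series, which is why the ingest rate of the TSDB matters.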
Q3: How does your custom monitoring compare with Prometheus?
Our TSDB implements the Prometheus API, so we can keep using the open‑source Prometheus ecosystem; in effect, it is a distributed version of Prometheus.
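If "implements the Prometheus API" includes the standard HTTP query API (an assumption; the session does not say which parts are covered), existing tools could query the TSDB the same way they query Prometheus. The sketch below only builds a `query_range` URL; the host name is a placeholder and nothing is sent over the network.

```python
from urllib.parse import urlencode


def query_range_url(base_url: str, promql: str, start: int, end: int, step: str) -> str:
    """Build a Prometheus-style range query URL (GET /api/v1/query_range)."""
    params = urlencode({"query": promql, "start": start, "end": end, "step": step})
    return f"{base_url.rstrip('/')}/api/v1/query_range?{params}"


url = query_range_url(
    "http://tsdb.example.invalid:9090",  # hypothetical endpoint
    "rate(container_cpu_usage_seconds_total[5s])",
    start=1700000000,
    end=1700000060,
    step="1s",  # one-second resolution, matching the collection interval
)
print(url)
```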
Q4: How do you handle storage scalability?
We see two aspects of scalability:
Storage cluster scalability – the cluster itself provides expansion and contraction mechanisms.
Volume scalability – CSI’s expand interface enables dynamic resizing of persistent volumes.
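For volume scalability, the usual Kubernetes workflow is to raise the storage request on the PVC; the CSI driver then handles the resize through its expand calls (`ControllerExpandVolume` and `NodeExpandVolume`), provided the StorageClass sets `allowVolumeExpansion: true`. The helper below is a minimal sketch that builds such a patch body and rejects shrinking, which Kubernetes does not allow for PVCs. The function names are illustrative, and the quantity parser only handles whole numbers with binary suffixes.

```python
_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def to_bytes(quantity: str) -> int:
    """Parse a simple quantity such as '10Gi' (integers with binary suffixes only)."""
    for suffix, factor in _UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain byte count


def pvc_resize_patch(current_size: str, new_size: str) -> dict:
    """Return a merge-patch body that grows a PVC's storage request."""
    if to_bytes(new_size) <= to_bytes(current_size):
        raise ValueError("a PVC can only be expanded, not shrunk")
    return {"spec": {"resources": {"requests": {"storage": new_size}}}}


patch = pvc_resize_patch("10Gi", "20Gi")
print(patch)
```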
Q5: Any reliable method to determine pod readiness when updating versions?
We developed a custom controller that watches Kubernetes pod events and determines readiness based on those events.
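The session does not describe what the controller inspects in each event. One plausible approach, sketched below with plain dictionaries instead of a live API connection, is to follow the watch stream (`ADDED`, `MODIFIED`, `DELETED`) and treat a pod as ready when its `Ready` condition is `"True"`. The pod names and the `track_rollout` helper are hypothetical.

```python
def is_ready(pod: dict) -> bool:
    """A pod counts as ready when its 'Ready' condition has status 'True'."""
    conditions = pod.get("status", {}).get("conditions", [])
    return any(c.get("type") == "Ready" and c.get("status") == "True" for c in conditions)


def track_rollout(events, expected_replicas: int) -> bool:
    """Consume watch events; return True once enough pods are ready."""
    ready = set()
    for event in events:
        pod = event["object"]
        name = pod["metadata"]["name"]
        if event["type"] == "DELETED" or not is_ready(pod):
            ready.discard(name)  # gone or not ready: stop counting it
        else:
            ready.add(name)
        if len(ready) >= expected_replicas:
            return True
    return False


def pod(name, ready=None):
    """Build a minimal pod object for the example."""
    conditions = [] if ready is None else [{"type": "Ready", "status": str(ready)}]
    return {"metadata": {"name": name}, "status": {"conditions": conditions}}


events = [
    {"type": "ADDED", "object": pod("web-1")},
    {"type": "MODIFIED", "object": pod("web-1", True)},
    {"type": "ADDED", "object": pod("web-2", False)},
    {"type": "MODIFIED", "object": pod("web-2", True)},
]
print(track_rollout(events, expected_replicas=2))      # True
print(track_rollout(events[:3], expected_replicas=2))  # False
```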
Q6: Besides using StorageClass, PV/PVC, have you tried deploying Ceph, HDFS, or other storage clusters inside the container platform?
Yes, some of our data is stored in Ceph. While StorageClass, PV, and PVC abstract distributed storage for Kubernetes, we also use FlexVolume to embed storage binaries directly into the cluster. FlexVolume is simpler, but it lacks a controller and offers weaker support for dynamic expansion and snapshots.
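Part of FlexVolume's simplicity is that a driver is just an executable placed on each node: the kubelet invokes it with an operation name such as `init`, `mount`, or `unmount` and reads a JSON status from standard output. The skeleton below is a minimal sketch of that call‑out protocol, not OPPO's driver; it answers `init` and reports every other operation as not supported.

```python
import json
import sys


def handle(args):
    """Map a FlexVolume call-out (operation name plus arguments) to a JSON-able reply."""
    op = args[0] if args else ""
    if op == "init":
        # attach=False tells Kubernetes this driver has no attach/detach step.
        return {"status": "Success", "capabilities": {"attach": False}}
    # A real driver would implement mount and unmount here.
    return {"status": "Not supported", "message": f"operation {op!r} is not implemented"}


if __name__ == "__main__":
    print(json.dumps(handle(sys.argv[1:])))
```

Because everything happens in these per‑node call‑outs, there is no central component to coordinate operations such as resizing or snapshots, which matches the limitation described above.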
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
