Scaling Qiniu Cloud's Custom Data Processing with Docker Containerization
Qiniu Cloud scaled its high-traffic data processing platform by containerizing services with Docker. It addressed massive request volume, CPU-intensive workloads, I/O bottlenecks, and burst traffic through architecture evolution, queueing, rate limiting, auto-scaling, and secure, isolated custom processing pipelines.
Background and Data Processing Types
Qiniu Cloud offers three data processing modes: official processing (built‑in image, audio, video services), custom processing (users upload private processing services that integrate with Qiniu storage), and third‑party processing (open platform for services like image moderation, face detection, translation, TTS, etc.).
All modes are invoked via a URL that encodes the source file, a processing command (e.g., Facecrop), and parameters such as output size.
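To make the URL structure concrete, here is a minimal sketch that composes such a request. The layout `<key>?<command>/<name>/<value>/...` is an assumption modeled on Qiniu's imageView2-style URLs; the domain, the `facecrop` command spelling, and the `w`/`h` parameter names are hypothetical.

```python
from __future__ import annotations

from urllib.parse import quote


def build_processing_url(domain: str, key: str, command: str,
                         params: dict[str, int | str]) -> str:
    """Compose a processing URL: source file, then command, then parameters.

    Assumed (hypothetical) layout: http://<domain>/<key>?<command>/<name>/<value>/...
    """
    # Percent-encode the object key but keep "/" so nested paths stay readable.
    path = quote(key, safe="/")
    # Parameters are appended as alternating name/value segments after the command.
    segments = [command]
    for name, value in params.items():
        segments.extend([name, str(value)])
    return f"http://{domain}/{path}?{'/'.join(segments)}"


if __name__ == "__main__":
    print(build_processing_url("img.example.com", "photos/cat.jpg",
                               "facecrop", {"w": 200, "h": 200}))
```

Running it prints `http://img.example.com/photos/cat.jpg?facecrop/w/200/h/200`: the source file is the path, and the command plus its parameters form the query.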
Challenges of Official Data Processing
Huge request volume – billions of requests per day, expected to grow several-fold.
Frequent traffic bursts when customers migrate large data sets.
CPU‑intensive workloads (image/video transcoding) requiring many cores.
Heavy I/O – frequent disk and network reads to fetch source files.
Architecture Evolution for Official Processing
The early architecture used a single FopGate gateway with a static configuration of the workers on each node. Adding workers required reloading the gateway, and because control and data traffic shared the same path, the gateway became overloaded.
The newer design introduces a Discovery service that collects capabilities from Agent and Worker nodes. Gateways query Discovery to route requests to appropriate agents, which handle downloading files directly from storage, thus removing data flow from the gateway.
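The routing idea can be sketched with an in-memory registry. This is a hypothetical stand-in for the Discovery service, not Qiniu's implementation: agents register the commands they support, the gateway picks an agent per command (round-robin is an assumed policy), and only request metadata passes through the gateway while the agent fetches the file itself.

```python
from __future__ import annotations


class Discovery:
    """Hypothetical in-memory stand-in for the Discovery service."""

    def __init__(self) -> None:
        self._agents: dict[str, list[str]] = {}  # command -> agent addresses
        self._cursor: dict[str, int] = {}        # command -> round-robin position

    def register(self, agent: str, commands: list[str]) -> None:
        # Agents report which processing commands they can serve.
        for cmd in commands:
            agents = self._agents.setdefault(cmd, [])
            if agent not in agents:
                agents.append(agent)

    def unregister(self, agent: str) -> None:
        for agents in self._agents.values():
            if agent in agents:
                agents.remove(agent)

    def pick_agent(self, command: str) -> str:
        agents = self._agents.get(command)
        if not agents:
            raise LookupError(f"no agent serves {command!r}")
        # Round-robin over the agents that advertise this command.
        i = self._cursor.get(command, 0) % len(agents)
        self._cursor[command] = i + 1
        return agents[i]


def route(discovery: Discovery, key: str, command: str) -> dict[str, str]:
    # The gateway forwards only metadata; the chosen agent downloads `key`
    # from storage itself, so file bytes never flow through the gateway.
    return {"agent": discovery.pick_agent(command), "key": key, "command": command}
```

Because workers are discovered at runtime, adding an agent is just another `register` call and no gateway reload is needed.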
Mitigations for Official Processing
System Measurement: Benchmark FopGate capacity, the resource usage pattern of each processing type, and per-instance limits, then use these figures to size CPU threads and concurrency.
Queueing: Add a per-node queue so that no single instance is overloaded; queue length limits can differ between free and paid users.
Rate Limiting: Limit concurrent HTTP connections, per-user request counts, and per-command request counts, protecting the service against long-lived connections and against bursts that would overflow the queues.
IO-CPU Coordination: Co-locate download and compute on the same machine, and optionally mount a RAM-based filesystem (such as tmpfs) to reduce disk I/O.
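Queueing and rate limiting can be combined into a single admission check per node. The sketch below is illustrative only: the class name and all limit values are hypothetical, and it does not model HTTP connection limits. It rejects a request when the user or command is over its count, and lets paid requests queue deeper than free ones.

```python
from __future__ import annotations

from collections import Counter, deque
from dataclasses import dataclass


@dataclass
class Request:
    user: str
    command: str
    paid: bool


class NodeAdmission:
    """Hypothetical per-node admission control: rate limits plus a bounded queue."""

    def __init__(self, max_per_user: int, max_per_command: dict[str, int],
                 free_queue_limit: int, paid_queue_limit: int) -> None:
        self.max_per_user = max_per_user
        self.max_per_command = max_per_command
        self.free_queue_limit = free_queue_limit
        self.paid_queue_limit = paid_queue_limit
        self.queue: deque[Request] = deque()
        self.per_user: Counter[str] = Counter()     # queued + running, per user
        self.per_command: Counter[str] = Counter()  # queued + running, per command

    def admit(self, req: Request) -> bool:
        if self.per_user[req.user] >= self.max_per_user:
            return False
        cmd_limit = self.max_per_command.get(req.command)
        if cmd_limit is not None and self.per_command[req.command] >= cmd_limit:
            return False
        # Paid users may queue deeper than free users before being rejected.
        limit = self.paid_queue_limit if req.paid else self.free_queue_limit
        if len(self.queue) >= limit:
            return False
        self.queue.append(req)
        self.per_user[req.user] += 1
        self.per_command[req.command] += 1
        return True

    def next_request(self) -> Request | None:
        # A worker takes the oldest queued request; counts stay until finish().
        return self.queue.popleft() if self.queue else None

    def finish(self, req: Request) -> None:
        self.per_user[req.user] -= 1
        self.per_command[req.command] -= 1
```

Rejecting early at admission keeps a burst from filling the queue with work the node cannot finish in time.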
Custom Data Processing Challenges
Security and isolation – user‑provided programs must not access other resources.
Uncertain workload scale – need elastic capacity.
Docker was chosen in 2014 to meet security, isolation, and scalability requirements.
Custom Processing Workflow
Register a custom UFOP (user-defined file operation) via qufopctl reg ufop-demo -m 2.
Develop the processing program locally following the UFOP specification.
Submit the program as a tarball; the backend converts the UFOP description into a Dockerfile, builds a Docker image, and pushes it to a Docker registry.
Resize the service (e.g., qufopctl resize ufop-demo -n 3) to set the desired instance count.
Images illustrate each step (registration, build, deployment).
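A resize request such as `-n 3` only states the desired instance count; the backend still has to decide where to add or remove containers. The sketch below shows one possible policy, which is an assumption and not Qiniu's documented behavior: start new instances on the least-loaded nodes and stop instances on the most-loaded ones, so a service stays spread across machines. Node and instance names are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def plan_resize(placement: dict[str, str], desired: int,
                nodes: list[str]) -> tuple[list[str], list[str]]:
    """Plan a resize to `desired` instances.

    placement maps instance id -> node. Returns (nodes to start new instances on,
    instance ids to stop). Scaling up requires a non-empty `nodes` list.
    """
    if desired < 0:
        raise ValueError("desired must be >= 0")
    load = Counter({n: 0 for n in nodes})
    load.update(placement.values())  # instances of this service per node
    current = len(placement)
    to_start: list[str] = []
    to_stop: list[str] = []
    if desired > current:
        for _ in range(desired - current):
            # Least-loaded node first; ties broken by node name.
            node = min(nodes, key=lambda n: (load[n], n))
            to_start.append(node)
            load[node] += 1
    else:
        remaining = dict(placement)
        for _ in range(current - desired):
            # Most-loaded node first, so the remaining instances stay spread out.
            node = max(load, key=lambda n: (load[n], n))
            victim = sorted(i for i, n in remaining.items() if n == node)[-1]
            to_stop.append(victim)
            del remaining[victim]
            load[node] -= 1
    return to_start, to_stop
```

Instances selected for stopping would then be shut down gracefully rather than killed outright.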
Build Pipeline Details
The build pipeline uploads the tarball to Kodo, forwards a build request to a builder, generates a Dockerfile from the UFOP description, builds the image, and stores it in a Docker registry. Common pitfalls include Debian package mirrors that time out and misuse of the Docker build cache. The solution is to combine download, extraction, and cleanup into a single Dockerfile RUN instruction; deleting files in a later instruction does not shrink the image layer that added them.
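The Dockerfile generation step might look like the following sketch. The `UfopDescription` fields are hypothetical (the article does not show the real description format), and the base image is assumed to provide `curl` and `tar`. The key point is that download, extraction, and cleanup share one `RUN` instruction.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass
class UfopDescription:
    """Hypothetical fields; the real UFOP description format is not shown in the article."""
    base_image: str
    package_url: str       # where the builder downloads the uploaded tarball
    entrypoint: list[str]  # command that starts the processing service


def render_dockerfile(desc: UfopDescription, app_dir: str = "/app") -> str:
    # One RUN instruction downloads, extracts, and deletes the archive. If the
    # rm ran in a separate instruction, the archive would still be stored in
    # the layer created by the download step.
    fetch = (
        f"RUN mkdir -p {app_dir}"
        f" && curl -fsSL -o /tmp/pkg.tar.gz {desc.package_url}"
        f" && tar -xzf /tmp/pkg.tar.gz -C {app_dir}"
        f" && rm -f /tmp/pkg.tar.gz"
    )
    return "\n".join([
        f"FROM {desc.base_image}",
        fetch,
        f"WORKDIR {app_dir}",
        # Exec-form CMD must be a JSON array.
        f"CMD {json.dumps(desc.entrypoint)}",
    ]) + "\n"
```

For example, `UfopDescription("ubuntu:22.04", "https://example.com/ufop-demo.tar.gz", ["./ufop-demo", "-port", "9100"])` renders a four-line Dockerfile ending in `CMD ["./ufop-demo", "-port", "9100"]`.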
Instance Management and Upgrade
Scaling Instances: Use qufopctl resize to add or remove containers.
Gray-scale Upgrade: Deploy new-version instances alongside the old ones, then retire the old instances once they finish their in-flight work.
Warm-up & Cool-down: New instances need a warm-up period to initialize their pools; old instances should be stopped with a graceful timeout (docker stop --time), so they first receive a stop signal and time to finish in-flight work before Docker sends SIGKILL.
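A gray-scale upgrade can be planned so that serving capacity never drops: start a batch of new instances, wait for them to warm up, then retire the same number of old ones. This is a minimal sketch; the step names and batch policy are assumptions. The `simulate` helper shows the trade-off, since the peak container count reveals how much spare compute the upgrade needs.

```python
from __future__ import annotations


def grayscale_plan(old_ids: list[str], batch: int) -> list[tuple[str, object]]:
    """Replace old instances in batches: start new, warm up, then stop old."""
    if batch < 1:
        raise ValueError("batch must be >= 1")
    steps: list[tuple[str, object]] = []
    remaining = list(old_ids)
    while remaining:
        n = min(batch, len(remaining))
        steps.append(("start_new", n))
        steps.append(("wait_warmup", n))
        steps.append(("stop_old", remaining[:n]))
        remaining = remaining[n:]
    return steps


def simulate(steps: list[tuple[str, object]], initial: int) -> tuple[int, int]:
    """Return (minimum ready instances, peak container count) over the plan."""
    ready, warming = initial, 0
    min_ready, peak = initial, initial
    for action, arg in steps:
        if action == "start_new":
            warming += arg
            peak = max(peak, ready + warming)
        elif action == "wait_warmup":
            warming -= arg
            ready += arg
        elif action == "stop_old":
            ready -= len(arg)
            min_ready = min(min_ready, ready)
    return min_ready, peak


def stop_command(container_id: str, grace_seconds: int) -> list[str]:
    # docker stop sends the container's stop signal (SIGTERM by default),
    # then SIGKILL once the grace period expires.
    return ["docker", "stop", "--time", str(grace_seconds), container_id]
```

With three old instances and a batch of two, ready capacity never falls below three, but the node pool must briefly host five containers.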
Data Flow Architecture
Requests arrive at a front end and are routed by a scheduler to nodes running the daemon containers. Each node runs a Fetcher that downloads the source file to local storage, so user instances do not download files from outside directly (for privacy reasons). A DiskCache cluster caches data. The V2 design adds an optional queue in front of the instances, which enables automatic scaling.
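The Fetcher's role can be sketched as a node-local "download once, then serve a local path" helper. This is a simplification under stated assumptions: the `download` callable is a hypothetical stand-in for the storage and DiskCache clients, and the on-disk layout (file names derived from a SHA-256 of the key) is an illustrative choice.

```python
from __future__ import annotations

import hashlib
import os
import tempfile
from typing import Callable


class Fetcher:
    """Hypothetical node-local fetcher: instances get a local file path
    instead of downloading source files from outside themselves."""

    def __init__(self, cache_dir: str, download: Callable[[str], bytes]) -> None:
        self.cache_dir = cache_dir
        self.download = download  # stand-in for the storage / DiskCache client
        os.makedirs(cache_dir, exist_ok=True)

    def local_path(self, key: str) -> str:
        # Hash the key so arbitrary object names map to safe file names.
        name = hashlib.sha256(key.encode("utf-8")).hexdigest()
        path = os.path.join(self.cache_dir, name)
        if not os.path.exists(path):
            data = self.download(key)
            # Write to a temp file, then rename, so readers never see a partial file.
            fd, tmp = tempfile.mkstemp(dir=self.cache_dir)
            with os.fdopen(fd, "wb") as f:
                f.write(data)
            os.replace(tmp, path)
        return path
```

A second request for the same key reuses the local file, which is the same I/O-saving idea as co-locating download and compute.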
Automatic Scaling Mechanism
The system monitors queue length; when the average number of pending tasks per instance exceeds a configured threshold, a Scaler component requests additional instances from the scheduler, which launches containers on suitable nodes and records the state in a Keeper service.
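The trigger rule can be written as a small function. How many instances the Scaler asks for is not specified in the article; this sketch assumes it targets `ceil(queue_len / threshold)` instances, capped at a hypothetical `max_instances`. The `request_instances` callable stands in for the scheduler API.

```python
from __future__ import annotations

import math
from typing import Callable


def instances_to_add(queue_len: int, instances: int, threshold: float,
                     max_instances: int) -> int:
    """How many instances to request so pending tasks per instance <= threshold."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    if queue_len == 0:
        return 0
    if instances > 0 and queue_len / instances <= threshold:
        return 0  # average backlog per instance is within the threshold
    target = min(max_instances, math.ceil(queue_len / threshold))
    return max(0, target - instances)


def scaler_tick(service: str, queue_len: Callable[[str], int],
                instance_count: Callable[[str], int],
                request_instances: Callable[[str, int], None],
                threshold: float, max_instances: int) -> int:
    # One pass of the Scaler loop: read the queue, decide, ask the scheduler.
    n = instances_to_add(queue_len(service), instance_count(service),
                         threshold, max_instances)
    if n > 0:
        request_instances(service, n)
    return n
```

For example, 30 pending tasks on 3 instances with a threshold of 5 gives a target of 6, so the Scaler requests 3 more.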
Solutions for Custom Processing Challenges
Security: Restrict inter-container network access by limiting the port ranges containers may use.
Isolation: Enforce CPU and memory quotas on each container.
Scalability: Provide a fast container scheduler, expose scaling APIs for manual adjustment, and trigger automatic scaling based on queue length.
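Per-container quotas and a reserved port range can be expressed when the container is launched. The sketch below builds a `docker run` argument list using standard flags (`--cpus`, `--memory`, `--memory-swap`, `-p`); the port range, names, and image are hypothetical, and actually blocking traffic outside the range (for example with firewall rules) is not shown.

```python
from __future__ import annotations


def allocate_port(used: set[int], low: int, high: int) -> int:
    """Pick the first free host port from the range reserved for user containers."""
    for port in range(low, high + 1):
        if port not in used:
            return port
    raise RuntimeError(f"no free port in {low}-{high}")


def docker_run_args(name: str, image: str, cpus: float, memory_mb: int,
                    host_port: int, container_port: int) -> list[str]:
    return [
        "docker", "run", "-d", "--name", name,
        "--cpus", str(cpus),               # CPU quota
        "--memory", f"{memory_mb}m",       # hard memory limit
        "--memory-swap", f"{memory_mb}m",  # equal to --memory: no extra swap
        "-p", f"{host_port}:{container_port}",
        image,
    ]
```

Setting `--memory-swap` equal to `--memory` prevents a container from exceeding its memory quota by swapping, which keeps one user's workload from degrading its neighbors.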
The article concludes with practical tips on instance warm‑up, graceful shutdown, and maintaining sufficient compute redundancy during upgrades.
dbaplus Community
