Scaling Qiniu Cloud's Custom Data Processing with Docker Containerization

Qiniu Cloud transformed its high‑traffic data processing platform by containerizing services with Docker, addressing challenges such as massive request volume, CPU‑intensive workloads, IO bottlenecks, and burst traffic through architecture evolution, queueing, rate limiting, auto‑scaling, and secure, isolated custom processing pipelines.

dbaplus Community

Aug 9, 2016

Background and Data Processing Types

Qiniu Cloud offers three data processing modes: official processing (built‑in image, audio, video services), custom processing (users upload private processing services that integrate with Qiniu storage), and third‑party processing (open platform for services like image moderation, face detection, translation, TTS, etc.).

All modes are invoked via a URL that encodes the source file, a processing command (e.g., Facecrop), and parameters such as output size.
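As a rough illustration, such a URL could be assembled as in the sketch below. The `?command/name/value` layout, the host `example.com`, and the parameter names `w` and `h` are assumptions made for this example, not Qiniu's documented format.

```python
from urllib.parse import quote

def build_fop_url(base_url, key, command, params):
    """Join source file, command, and parameters into one URL (hypothetical layout)."""
    # Percent-encode the object key so spaces and non-ASCII characters survive.
    path = quote(key, safe="/")
    # Flatten parameters into name/value segments after the command name.
    segments = [command] + [f"{name}/{value}" for name, value in params.items()]
    return f"{base_url.rstrip('/')}/{path}?{'/'.join(segments)}"

url = build_fop_url("http://example.com", "photos/team.jpg", "Facecrop", {"w": 200, "h": 200})
print(url)  # http://example.com/photos/team.jpg?Facecrop/w/200/h/200
```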

Challenges of Official Data Processing

Huge request volume – billions of requests per day, with expected multi‑fold growth.

Frequent traffic bursts when customers migrate large data sets.

CPU‑intensive workloads (image/video transcoding) requiring many cores.

Heavy I/O – frequent disk and network reads to fetch source files.

Architecture Evolution for Official Processing

The early architecture used a single FopGate with the workers on each node statically configured in the gateway. Adding workers required a gateway reload, and control and data traffic shared the same path, which overloaded the gateway.

The newer design introduces a Discovery service that collects capabilities from Agent and Worker nodes. Gateways query Discovery to route requests to appropriate agents, which handle downloading files directly from storage, thus removing data flow from the gateway.
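The routing idea can be sketched with a small in-memory registry. This is a minimal sketch, not Qiniu's implementation: the class name, method names, agent addresses, and the round-robin policy are all assumptions for illustration.

```python
class Discovery:
    """In-memory registry mapping a command name to the agents that can run it."""

    def __init__(self):
        self._agents = {}  # command -> list of agent addresses
        self._next = {}    # command -> index of the next agent to use

    def register(self, agent_addr, commands):
        # Agents report their capabilities, so the gateway needs no reload.
        for cmd in commands:
            agents = self._agents.setdefault(cmd, [])
            if agent_addr not in agents:
                agents.append(agent_addr)

    def pick(self, command):
        # Round-robin over the agents offering this command; None if nobody does.
        agents = self._agents.get(command)
        if not agents:
            return None
        i = self._next.get(command, 0) % len(agents)
        self._next[command] = i + 1
        return agents[i]

discovery = Discovery()
discovery.register("agent-1:9000", ["imageView", "Facecrop"])
discovery.register("agent-2:9000", ["Facecrop"])
picks = [discovery.pick("Facecrop") for _ in range(3)]
```

The gateway only receives an agent address from `pick`; the agent itself fetches the file from storage, so file data never passes through the gateway.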

Mitigations for Official Processing

System Measurement: Benchmark FopGate capacity, resource usage patterns of each processing type, and per‑instance limits to size CPU threads and concurrency.

Queueing: Add a per‑node queue to avoid overloading a single instance; queue length can differentiate free vs. paid users.

Rate Limiting: Limit concurrent HTTP connections, per‑user request counts, and command‑specific counts to protect against long‑lived connections and burst‑induced queue overflow.

IO‑CPU Coordination: Co‑locate download and compute on the same machine; optionally mount a RAM‑based filesystem to reduce disk I/O.
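The queueing and rate-limiting items above can be sketched together as a per-node admission check. The limit values, the tier names, and the `NodeQueue` class are hypothetical, and the sketch is single-threaded; a real service would need locking.

```python
from collections import deque

# Hypothetical limits: paid users are admitted until the queue is longer.
QUEUE_LIMITS = {"free": 2, "paid": 5}
MAX_QUEUED_PER_USER = 3

class NodeQueue:
    def __init__(self):
        self._tasks = deque()
        self._per_user = {}  # user -> number of tasks currently queued

    def try_enqueue(self, user, tier, task):
        # Per-user limit: one user cannot fill the whole queue.
        if self._per_user.get(user, 0) >= MAX_QUEUED_PER_USER:
            return False
        # Tier limit: free requests are rejected at a shorter queue length.
        if len(self._tasks) >= QUEUE_LIMITS[tier]:
            return False
        self._tasks.append((user, task))
        self._per_user[user] = self._per_user.get(user, 0) + 1
        return True

    def next_task(self):
        # Hand the oldest task to a worker and release the user's slot.
        user, task = self._tasks.popleft()
        self._per_user[user] -= 1
        return task

q = NodeQueue()
accepted = [
    q.try_enqueue("alice", "free", "t1"),
    q.try_enqueue("bob", "free", "t2"),
    q.try_enqueue("carol", "free", "t3"),  # queue already holds 2 tasks
    q.try_enqueue("dave", "paid", "t4"),   # paid tier still has room
]
```

Rejecting early keeps a burst from piling up behind a slow instance; the caller can retry or be routed elsewhere.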

Custom Data Processing Challenges

Security and isolation – user‑provided programs must not access other resources.

Uncertain workload scale – need elastic capacity.

Docker was chosen in 2014 to meet security, isolation, and scalability requirements.

Custom Processing Workflow

Develop the processing program locally following the UFOP specification.

Submit the program as a tarball; the backend converts the UFOP description into a Dockerfile, builds a Docker image, and pushes it to a Docker registry.

Resize the service (e.g., qufopctl resize ufop-demo -n 3) to set the desired instance count.

Build Pipeline Details

The build pipeline uploads the tarball to Kodo, forwards a build request to a builder, generates a Dockerfile from the UFOP description, builds the image, and stores it in a Docker registry. Common pitfalls include Debian mirrors that time out and misuse of the Docker build cache; the solution is to combine download, extraction, and cleanup into a single Dockerfile instruction.
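A generator for such a Dockerfile might look like the sketch below. The function name, its arguments, the `/app` layout, and the use of `curl` (assumed to exist in the base image) are illustrative assumptions; the actual UFOP description format is not shown in this article.

```python
def render_dockerfile(base_image, archive_url, run_cmd):
    """Render a Dockerfile whose fetch, extract, and cleanup share one RUN step."""
    # One RUN instruction means one layer: the archive is deleted before the
    # layer is committed, and the build cache cannot reuse a stale partial step.
    fetch = " && \\\n    ".join([
        f"curl -fsSL {archive_url} -o /tmp/app.tar.gz",
        "mkdir -p /app",
        "tar -xzf /tmp/app.tar.gz -C /app",
        "rm -f /tmp/app.tar.gz",
    ])
    lines = [
        f"FROM {base_image}",
        f"RUN {fetch}",
        "WORKDIR /app",
        f'CMD ["{run_cmd}"]',
    ]
    return "\n".join(lines) + "\n"

dockerfile = render_dockerfile(
    "ubuntu:14.04",
    "http://example.com/ufop-demo.tar.gz",  # hypothetical archive location
    "./ufop-demo",
)
print(dockerfile)
```

If the three steps were separate RUN instructions, the downloaded archive would remain in an earlier image layer even after a later `rm`, and a cached download layer could be reused when the archive had changed.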

Instance Management and Upgrade

Scaling Instances: Use qufopctl resize to add or remove containers.

Gray‑scale Upgrade: Deploy new version instances alongside old ones, then retire the old instances after they finish processing.

Warm‑up & Cool‑down: New instances need a warm‑up period while their pools initialize; old instances should be stopped with a graceful timeout (docker stop --time) so they can finish in‑flight work before SIGKILL is sent.
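The cool-down behavior follows the same pattern as `docker stop`: send SIGTERM, wait for a grace period, then send SIGKILL. The sketch below reproduces that pattern for an ordinary child process; the `graceful_stop` helper and the 5-second timeout are assumptions for illustration, not part of Qiniu's tooling.

```python
import subprocess
import sys

def graceful_stop(proc, timeout_s):
    """Ask a process to exit, and force-kill it only if the grace period expires."""
    proc.terminate()  # SIGTERM on POSIX
    try:
        proc.wait(timeout=timeout_s)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()   # SIGKILL on POSIX
        proc.wait()
        return "killed"

# A stand-in worker that would otherwise run for a minute.
worker = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
result = graceful_stop(worker, timeout_s=5)
```

A processing instance that traps SIGTERM can use the grace period to finish its current task and stop accepting new ones; the timeout should be at least as long as the slowest expected task.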

Data Flow Architecture

Requests arrive at a front‑end and are routed by a scheduler to nodes running daemon containers. Each node has a Fetcher that downloads the source file locally (avoiding direct external downloads, for privacy). A DiskCache cluster caches data. The V2 design adds an optional queue between instances to enable automatic scaling.

Automatic Scaling Mechanism

The system monitors queue length; when the average number of pending tasks per instance exceeds a configured threshold, a Scaler component requests additional instances from the scheduler, which launches containers on suitable nodes and records the state in a Keeper service.
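The scaling decision itself reduces to a small calculation. The sketch below is one possible rule, assuming the Scaler targets just enough instances to bring the average back under the threshold; the function name and the `max_instances` cap are hypothetical.

```python
import math

def desired_instances(pending_tasks, current, threshold, max_instances):
    """Return the instance count the Scaler should request (scale-out only)."""
    average = pending_tasks / max(current, 1)
    if average <= threshold:
        return current  # under the threshold: no change
    # Enough instances so that pending / instances <= threshold, within the cap.
    needed = math.ceil(pending_tasks / threshold)
    return min(max(needed, current), max_instances)

print(desired_instances(pending_tasks=120, current=3, threshold=10, max_instances=20))  # 12
print(desired_instances(pending_tasks=20, current=3, threshold=10, max_instances=20))   # 3
```

With 120 pending tasks on 3 instances the average is 40, above the threshold of 10, so the rule asks for 12 instances; with 20 pending tasks the average is below the threshold and the count stays at 3.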

Solutions for Custom Processing Challenges

Security: Limit inter‑container network access by restricting port ranges.

Isolation: Enforce CPU and memory quotas per container.

Scalability: Provide a fast container scheduler, expose scaling APIs for manual adjustments, and use queue length‑based triggers for automatic scaling.
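For the isolation item, per-container quotas map onto standard `docker run` options. The sketch below only builds the argument list; the helper name, image name, and quota values are assumptions, and the article does not say which flags Qiniu's scheduler actually passes.

```python
def docker_run_args(image, name, cpu_cores, memory_mb):
    """Build a `docker run` command line with CPU and memory limits."""
    period_us = 100_000  # CFS scheduling period in microseconds
    return [
        "docker", "run", "-d",
        "--name", name,
        "--cpu-period", str(period_us),
        # Quota per period: 1.5 cores means 150000 us of CPU time per 100000 us.
        "--cpu-quota", str(int(cpu_cores * period_us)),
        "--memory", f"{memory_mb}m",
        image,
    ]

args = docker_run_args(
    "registry.example.com/ufop-demo:v1",  # hypothetical image name
    "ufop-demo-1",
    cpu_cores=1.5,
    memory_mb=512,
)
print(" ".join(args))
```

The list can be passed to `subprocess.run(args, check=True)` on a host with Docker installed.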

The article concludes with practical tips on instance warm‑up, graceful shutdown, and maintaining sufficient compute redundancy during upgrades.


