
Unlocking Unmanned Ops: DataOps & SRE Strategies for Big Data Management

The article explains how DataOps and SRE practices enable large‑scale, data‑driven operations in big‑data environments, aiming for fully automated, intelligent, and ultimately unmanned management of complex systems.

Alibaba Cloud Big Data AI Platform

DataOps Trend

In the DT (data technology) era, cloud computing, big data, and AI are reshaping every industry. Alibaba’s "Five New" strategy (new retail, new finance, new manufacturing, new technology, and new energy) positions data as a core driver of business transformation.

Alibaba Big Data Operations

Alibaba’s big‑data platform stores 99% of the group’s data and provides 95% of its computing power. More than 100,000 servers support offline batch processing, and the real‑time streaming platform handled a peak of 472 million log entries per second during the Double‑11 shopping festival. This rapid growth in data workloads creates new challenges in stability, cost, efficiency, and security, and meeting those challenges defines the mission of big‑data SRE.

The massive scale of big‑data systems leads to a fragmented technology stack, diverse standards, and demanding infrastructure requirements for distributed storage, massive iterative computation, and low‑latency high‑bandwidth networks, making DevOps adoption especially difficult.

Organizational Models for Ops

Two extreme organizational models are discussed. A vertically integrated DevOps organization lacks industry standards and transparency when operating at large scale. A horizontally separated structure, with distinct Dev and Ops teams, struggles with the sheer variety of services, metrics, and data dimensions, which lowers satisfaction with operations.

The recommended approach balances both sides by fostering collaboration through an SRE middle platform that automates hand‑offs, continuously upgrades product and operational systems, and frees operations teams to focus on higher‑value work.

DataOps Definition and Practices

DataOps (AliDataOps) aims to digitize and automate operations across platforms, applications, and data assets. Its scope covers compute platforms such as Hadoop, MaxCompute, and StreamCompute, which together span offline, near‑real‑time, and real‑time workloads, as well as downstream data products such as TT/DataHub, HBase, and ADS.

By establishing standardized data definitions, metrics, and collection rules, organizations can build a data‑analysis platform that enables data‑driven operations, risk control, efficiency gains, and reduced manual effort. This foundation supports the evolution toward AI‑Ops, the ultimate goal of unmanned, intelligent operations.
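The link between standardized definitions and data-driven operations can be illustrated with a minimal Python sketch. Everything in it is a hypothetical illustration, not AliDataOps' actual schema or API: the `MetricDefinition` and `Sample` types, the `validate` and `breaches` helpers, the metric names, and the thresholds are all assumptions. Samples that do not match a registered definition are rejected at collection time, and the remaining ones are averaged per metric and label set so that a simple threshold rule can flag risk.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass(frozen=True)
class MetricDefinition:
    """Hypothetical standardized metric: one agreed name, unit, label set, and threshold."""
    name: str
    unit: str
    dimensions: frozenset  # label keys every sample of this metric must carry
    max_value: float       # hypothetical risk threshold on the per-series mean


@dataclass
class Sample:
    """A single collected data point with its labels."""
    metric: str
    value: float
    labels: dict = field(default_factory=dict)


def validate(sample, registry):
    """Collection rule: accept only registered metrics whose label keys match exactly."""
    definition = registry.get(sample.metric)
    return definition is not None and set(sample.labels) == definition.dimensions


def breaches(samples, registry):
    """Average valid samples per (metric, label set); return series above threshold."""
    series = {}
    for s in samples:
        if validate(s, registry):
            key = (s.metric, tuple(sorted(s.labels.items())))
            series.setdefault(key, []).append(s.value)
    return sorted(key for key, values in series.items()
                  if mean(values) > registry[key[0]].max_value)


# Hypothetical registry and samples, for illustration only.
REGISTRY = {d.name: d for d in [
    MetricDefinition("job_queue_latency", "seconds", frozenset({"cluster", "queue"}), 300.0),
    MetricDefinition("disk_usage_ratio", "ratio", frozenset({"cluster"}), 0.85),
]}

samples = [
    Sample("job_queue_latency", 120.0, {"cluster": "c1", "queue": "etl"}),
    Sample("job_queue_latency", 600.0, {"cluster": "c1", "queue": "adhoc"}),
    Sample("disk_usage_ratio", 0.91, {"cluster": "c2"}),
    Sample("disk_usage_ratio", 0.95, {"host": "h7"}),  # wrong label keys: rejected
    Sample("cpu_util", 0.99, {"cluster": "c1"}),       # unregistered metric: rejected
]

for metric, labels in breaches(samples, REGISTRY):
    print(metric, dict(labels))
# disk_usage_ratio {'cluster': 'c2'}
# job_queue_latency {'cluster': 'c1', 'queue': 'adhoc'}
```

One reason to reject non-conforming samples at ingestion rather than clean them later is that every downstream consumer (dashboards, alerting, capacity analysis) then works from the same numbers. A production platform would likely need versioned definitions and richer rules than a fixed threshold on the mean.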

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Big Data, Operations, SRE, DataOps, AI Ops
Written by

Alibaba Cloud Big Data AI Platform

The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
