Unlocking Unmanned Ops: DataOps & SRE Strategies for Big Data Management
The article explains how DataOps and SRE practices enable large‑scale, data‑driven operations in big‑data environments, aiming for fully automated, intelligent, and ultimately unmanned management of complex systems.
DataOps Trend
In the DT (data technology) era, cloud computing, big data, and AI are reshaping every industry. Alibaba’s "five new" strategy (new retail, new finance, new manufacturing, new technology, and new energy) positions data as a core driver of business transformation.
Alibaba Big Data Operations
Alibaba’s big‑data platform stores 99% of the group’s data and provides 95% of its computing power, with over 100,000 servers supporting offline batch processing and a real‑time streaming platform that handles 4.72 billion log entries per second during the Double‑11 shopping festival. The rapid growth of data workloads creates new challenges for stability, cost, efficiency, and security, defining the mission of big‑data SRE.
The massive scale of big‑data systems leads to a fragmented technology stack, diverse standards, and demanding infrastructure requirements for distributed storage, massive iterative computation, and low‑latency high‑bandwidth networks, making DevOps adoption especially difficult.
Organizational Models for Ops
Two extreme models are discussed. A vertically integrated DevOps organization lacks industry standards and transparency for large‑scale operations. A horizontally separated Dev and Ops structure struggles with the sheer variety of services, metrics, and data dimensions, which erodes satisfaction with operations.
The recommended approach balances both sides by fostering collaboration through an SRE middle platform that automates hand‑offs, continuously upgrades product and operational systems, and frees operations teams to focus on higher‑value work.
DataOps Definition and Practices
DataOps (AliDataOps) aims to digitize and automate operations across platforms, applications, and data assets. Core components include offline, near‑real‑time, and real‑time compute platforms (Hadoop, StreamCompute, MaxCompute) and downstream data products such as TT/DataHub, HBase, and ADS.
By establishing standardized data definitions, metrics, and collection rules, organizations can build a data‑analysis platform that enables data‑driven operations, risk control, efficiency gains, and reduced manual effort. This foundation supports the evolution toward AI‑Ops, the ultimate goal of unmanned, intelligent operations.
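To make the idea of standardized definitions and collection rules concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration only: the `MetricDefinition` fields, the `REGISTRY` entries, and the `validate_sample` helper are hypothetical and do not describe Alibaba's actual schema or tooling. It shows one way a shared registry could reject samples that do not follow the agreed standard before they reach the analysis platform.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class MetricDefinition:
    """One entry in a shared metric standard (hypothetical fields)."""
    name: str
    unit: str
    required_tags: Tuple[str, ...]
    min_value: float
    max_value: float


# Hypothetical registry: in practice this would be maintained centrally
# so every team collects the same metric in the same way.
REGISTRY: Dict[str, MetricDefinition] = {
    "cpu_utilization": MetricDefinition(
        "cpu_utilization", "percent", ("host", "cluster"), 0.0, 100.0
    ),
    "job_latency": MetricDefinition(
        "job_latency", "seconds", ("job_id", "cluster"), 0.0, float("inf")
    ),
}


def validate_sample(sample: dict) -> List[str]:
    """Return the list of rule violations; an empty list means the sample conforms."""
    definition = REGISTRY.get(sample.get("metric"))
    if definition is None:
        return [f"unknown metric: {sample.get('metric')!r}"]

    errors: List[str] = []

    # Rule 1: the unit must match the standard definition.
    if sample.get("unit") != definition.unit:
        errors.append(f"unit must be {definition.unit!r}, got {sample.get('unit')!r}")

    # Rule 2: every required dimension (tag) must be present.
    tags = sample.get("tags", {})
    missing = [tag for tag in definition.required_tags if tag not in tags]
    if missing:
        errors.append(f"missing tags: {missing}")

    # Rule 3: the value must be numeric and inside the allowed range.
    value = sample.get("value")
    if not isinstance(value, (int, float)) or not (
        definition.min_value <= value <= definition.max_value
    ):
        errors.append(f"value out of range or not numeric: {value!r}")

    return errors


if __name__ == "__main__":
    good = {
        "metric": "cpu_utilization",
        "unit": "percent",
        "tags": {"host": "h1", "cluster": "c1"},
        "value": 42.5,
    }
    bad = {
        "metric": "cpu_utilization",
        "unit": "ratio",
        "tags": {"host": "h1"},
        "value": 130,
    }
    print(validate_sample(good))
    print(validate_sample(bad))
```

Validating at collection time is one possible design choice: it keeps non‑conforming data out of downstream analysis, at the cost of requiring every producer to adopt the shared registry.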
