Tagged articles
2 articles
Efficient Ops
Jun 23, 2022 · Cloud Native

How Vivo Scales Kubernetes: Automated Multi‑Cluster Management with a Custom Operator

Vivo’s rapid migration to Kubernetes across multiple data centers called for a secure, efficient, and reliable way to manage thousands of nodes. In response, Vivo developed a custom k8s‑operator that streamlines cluster deployment, CI testing, declarative APIs, and automated repair for large‑scale cloud‑native environments.

Cloud Native · Cluster Automation · DevOps
3 min read
dbaplus Community
Aug 19, 2019 · Big Data

Automating Fault Recovery in 5,000‑Node Hadoop Clusters with Fabric & CM_API

This article explains how a large‑scale Hadoop environment can automatically detect common failures (swap usage, clock drift, agent crashes, role outages, and disk imbalance) and recover from them using Prometheus alerts, Fabric/Paramiko remote execution, and Cloudera Manager APIs, complete with code examples and step‑by‑step commands.

Big Data Operations · CM_API · Cluster Automation
12 min read