Tagged articles

Cluster Automation

2 articles · Page 1 of 1

Jun 23, 2022 · Cloud Native

How Vivo Scales Kubernetes: Automated Multi‑Cluster Management with a Custom Operator

Vivo’s rapid migration to Kubernetes across multiple data centers required a secure, efficient, and reliable way to manage thousands of nodes, leading them to develop a custom k8s‑operator that streamlines cluster deployment, CI testing, declarative APIs, and automated repair for large‑scale cloud‑native environments.

Cluster AutomationKubernetesLarge Scale

0 likes · 3 min read

How Vivo Scales Kubernetes: Automated Multi‑Cluster Management with a Custom Operator

dbaplus Community

Aug 19, 2019 · Big Data

Automating Fault Recovery in 5,000‑Node Hadoop Clusters with Fabric & CM_API

This article explains how a large‑scale Hadoop environment can automatically detect common failures—such as swap usage, clock drift, agent crashes, role outages, and disk imbalance—and recover them using Prometheus alerts, Fabric/Paramiko remote execution, and Cloudera Manager APIs, complete with code examples and step‑by‑step commands.

Big Data OperationsCM_APICluster Automation

0 likes · 12 min read

Automating Fault Recovery in 5,000‑Node Hadoop Clusters with Fabric & CM_API