Tagged articles
51 articles
Ops Community
Mar 3, 2026 · Databases

Master PostgreSQL 17: Installation, Backup, Recovery, and Performance Tuning

This comprehensive guide walks you through PostgreSQL 17 deployment, explains its multi‑process architecture and MVCC model, details environment requirements, shows essential configuration parameters, provides step‑by‑step backup and PITR procedures, demonstrates streaming replication setup, and shares best‑practice tuning, security, and monitoring tips for reliable production use.

Backup · Database Administration · HA
0 likes · 24 min read
Raymond Ops
Jan 30, 2026 · Big Data

Build an Enterprise‑Grade HDFS HA and YARN Scheduler from Scratch

This guide walks you through designing and deploying a highly available HDFS architecture with dual NameNodes, ZooKeeper‑based failover, and a tuned YARN resource scheduler, covering detailed configuration files, failover testing, performance tuning, monitoring, automated health checks, capacity planning, and best‑practice checklists for production‑grade big‑data platforms.

Automation · Big Data · HA
0 likes · 28 min read
Raymond Ops
Apr 26, 2025 · Operations

How to Enable Ceph NFS Service with nfs-ganesha: Step‑by‑Step Guide

This article walks through configuring Ceph to provide NFS services using nfs‑ganesha, covering module checks, cluster creation, export setup, client mounting, data verification, and high‑availability configuration with haproxy and keepalived, complete with command‑line examples.

Ceph · HA · Linux
0 likes · 7 min read
Java High-Performance Architecture
Aug 28, 2024 · Databases

NewSQL vs Sharding: Which Database Architecture Truly Wins?

This article objectively compares NewSQL distributed databases with traditional middleware‑based sharding solutions, examining their architectural differences, transaction support, scalability, HA, storage engines, and ecosystem maturity, to help readers decide which approach best fits their performance, consistency, and operational needs.

HA · NewSQL · Scalability
0 likes · 18 min read
Linux Ops Smart Journey
Aug 4, 2024 · Cloud Native

How to Build a High-Performance, Highly-Available Production Kubernetes Cluster

This guide walks you through planning, configuring, and deploying a production‑grade Kubernetes cluster with high performance and availability, covering host planning, HA load balancing with keepalived and HAProxy, Harbor setup, node initialization, and essential system tweaks, all illustrated with ready‑to‑run code snippets.

Docker · HA · Kubernetes
0 likes · 12 min read
Architect's Guide
Dec 5, 2022 · Cloud Native

Step-by-Step Guide to Deploying a High‑Availability Kubernetes Cluster with NFS, Ingress, Dashboard, and Harbor

This comprehensive tutorial walks through preparing the operating system, installing Docker and containerd, configuring yum repositories, initializing a multi‑master HA Kubernetes cluster with IPVS, deploying the Kubernetes dashboard, setting up NFS storage, installing an Ingress controller, and finally installing Harbor with Helm and a custom NFS provisioner, providing all necessary commands and configuration files.

Docker · HA · Harbor
0 likes · 38 min read
Architect
Nov 3, 2022 · Cloud Native

Step-by-Step Guide to Deploying a High‑Availability Kubernetes Cluster with Dashboard and Harbor

This comprehensive tutorial walks through preparing multiple Linux nodes, installing Docker and containerd, setting up kubeadm, kubelet and kubectl, initializing an HA Kubernetes control plane, configuring Flannel networking, deploying the Kubernetes dashboard, installing Nginx + Keepalived for load balancing, setting up NFS with rsync, provisioning storage via Helm, and finally installing a secure Harbor image registry, all with detailed commands and configuration snippets.

Dashboard · HA · Harbor
0 likes · 39 min read
MaGe Linux Operations
Aug 25, 2022 · Cloud Native

Build a Highly Available Kubernetes Cluster with Dashboard, Nginx HA & Harbor

This comprehensive tutorial walks you through deploying a production‑grade Kubernetes cluster on multiple nodes, configuring Docker and containerd, setting up kubeadm, enabling IPVS, installing a high‑availability Nginx + Keepalived load balancer, deploying the Kubernetes dashboard, and installing a secure Harbor image registry with NFS storage.

Docker · HA · Harbor
0 likes · 44 min read
Big Data Technology Architecture
Jun 14, 2022 · Big Data

Applying Apache DolphinScheduler in a Big Data Platform: Architecture, Migration, and Future Plans

This presentation details the background, redesign, and migration of a large‑scale data platform at Dangbei Network Technology, focusing on the adoption of Apache DolphinScheduler, ClickHouse migration, storage and compute separation, monitoring solutions, and the roadmap for future upgrades and open‑source involvement.

Apache DolphinScheduler · ClickHouse · HA
0 likes · 12 min read
Aikesheng Open Source Community
Jan 5, 2022 · Databases

Understanding ProxySQL Configuration Tables for MySQL HA (Read/Write Splitting and Failover)

This article explains ProxySQL's built‑in databases, key configuration tables such as mysql_servers, mysql_users, mysql_replication_hostgroups, mysql_group_replication_hostgroups, and mysql_query_rules, and demonstrates how to set up read/write splitting and automatic failover for MySQL primary‑replica and group replication environments.

DatabaseProxy · HA · ProxySQL
0 likes · 14 min read
Open Source Linux
Oct 22, 2021 · Cloud Native

Deploy a High‑Availability k0s Kubernetes Cluster with k0sctl

This guide explains how to install and configure k0s, a lightweight Kubernetes distribution, using k0sctl for both standard and high‑availability clusters, covering binary deployment, offline image handling, custom CNI integration, HA load‑balancer setup, certificate management, backup, and advanced features such as etcd replacement and user management.

CNI · HA · Kubernetes
0 likes · 25 min read
MaGe Linux Operations
Oct 14, 2021 · Cloud Native

Mastering k0s: Deploy a Fully Automated HA Kubernetes Cluster with k0sctl

This guide walks through installing k0s, a certified Kubernetes distribution, using k0sctl for automated, customizable deployments—including binary installation, offline image handling, CNI plugin switching, HA setup with external load balancers, backup, restore, and advanced features like etcd replacement and user management.

Cluster Deployment · HA · Kubernetes
0 likes · 24 min read
Ops Development Stories
Sep 17, 2021 · Operations

Master Keepalived: Build Reliable Linux Load‑Balancing and HA

This guide explains Keepalived’s role in Linux load‑balancing and high‑availability, covering its VRRP‑based architecture, core modules, layered operation, configuration syntax, practical deployment with Nginx, common split‑brain issues, and advanced settings such as nopreempt and multicast conflict resolution.

HA · VRRP · failover
0 likes · 21 min read
DataFunTalk
Sep 10, 2021 · Big Data

Presto High‑Performance Engine Practice at Meitu: Technical Selection, HA Design, and Cross‑Cluster Scheduling

This article details Meitu's adoption of the Presto ad‑hoc ROLAP engine, comparing it with Hive on Spark and Impala, describing enhancements for coordinator high‑availability, and explaining a cross‑cluster scheduling strategy that leverages idle Presto resources to improve overall big‑data workload efficiency.

Big Data · Cross-Cluster Scheduling · HA
0 likes · 16 min read
Wukong Talks Architecture
Jul 14, 2021 · Operations

Understanding High Availability: Lessons from the Bilibili Outage

This article analyzes Bilibili's recent service disruption, explains the concept and quantitative metrics of high availability, and outlines practical techniques such as rate limiting, isolation, failover, timeout control, circuit breaking, degradation, and multi‑region active‑active deployments to improve system reliability.

Distributed Systems · HA · MTBF
0 likes · 13 min read
dbaplus Community
Jan 12, 2021 · Operations

Choosing Between Prometheus and Zabbix: A Practical Guide to High‑Availability Monitoring

This technical guide walks through the fundamentals of Prometheus, compares it with Zabbix, demonstrates high‑availability setups, remote storage with InfluxDB, multi‑instance Redis monitoring, and Grafana integration, providing concrete configuration examples and best‑practice recommendations for reliable ops monitoring.

Grafana · HA · InfluxDB
0 likes · 17 min read
dbaplus Community
Dec 10, 2020 · Databases

How GitLab Achieved a Near-Perfect PostgreSQL 9.6→11 Upgrade

In May 2020 GitLab partnered with OnGres to upgrade a 12‑node PostgreSQL 9.6 cluster to version 11, using a carefully planned pg_upgrade process, automated Ansible playbooks, Patroni HA, and a detailed rollback strategy to keep a 6 TB dataset consistent while serving 300 k transactions per second.

Ansible · GitLab · HA
0 likes · 16 min read
DevOps Cloud Academy
Dec 7, 2020 · Operations

How to Upgrade a Single‑Master Kubernetes Cluster to a Multi‑Master High‑Availability Setup

This guide walks through converting a single‑master Kubernetes cluster into a highly available multi‑master deployment by configuring a load‑balancing Nginx front‑end, updating API server certificates with additional SAN entries, adjusting kubeconfig files, and adding extra control‑plane nodes while verifying etcd health.

HA · NGINX · kubeadm
0 likes · 20 min read
Programmer DD
Oct 22, 2020 · Operations

Mastering Prometheus: Principles, Pitfalls, and Scaling Strategies

This article explores Prometheus as a cloud‑native monitoring solution, covering core principles, limitations, metric selection, exporter consolidation, Kubernetes deployment nuances, memory and storage planning, high‑availability designs, and advanced features like rate calculations, cardinality management, and predictive alerts.

HA · Kubernetes · Prometheus
0 likes · 33 min read
Big Data Technology & Architecture
Aug 16, 2020 · Big Data

Comprehensive Overview of HDFS: Architecture, Advantages, Limitations, Commands, and Advanced Features

This article provides a detailed introduction to HDFS, covering its application scenarios, core architecture, fault‑tolerance benefits, drawbacks such as high latency and small‑file inefficiency, essential shell and API commands, cluster management procedures, and newer Hadoop 2.0 features like HA, Federation, snapshots, ACLs, and heterogeneous storage.

Big Data · CLI · HA
0 likes · 10 min read
Programmer DD
Jul 30, 2020 · Cloud Native

Master Prometheus: Practical Tips, Exporter Strategies, and Scaling Challenges

This comprehensive guide explores Prometheus monitoring fundamentals, key design principles, exporter selection for Kubernetes, advanced configuration tricks, capacity planning, high‑cardinality pitfalls, HA architectures, and integration with Grafana, Alertmanager, and Thanos to help you build reliable cloud‑native observability pipelines.

Alerting · Exporter · Grafana
0 likes · 36 min read
dbaplus Community
Mar 15, 2020 · Databases

Step-by-Step Guide to Installing GaussDB T 1.0.2 HA Cluster on CentOS

This tutorial walks you through preparing the environment, configuring cluster components, installing required packages, setting up Python 3.7, running pre‑install checks, executing the GaussDB installation script, handling common errors, and finally uninstalling the cluster on a CentOS 7.5 system.

CentOS · Database Cluster · GaussDB
0 likes · 14 min read
ITPUB
Oct 18, 2019 · Cloud Native

Deploy a Highly Available Kubernetes 1.12.5 Cluster with Kubespray

This guide walks through setting up password‑less SSH, downloading a specific Kubespray tag, customizing image repositories, configuring Docker mirrors, adjusting DNS and network plugins, and finally running Ansible playbooks to provision an HA Kubernetes 1.12.5 cluster.

Ansible · Docker · HA
0 likes · 10 min read
21CTO
May 30, 2019 · Backend Development

How Weibo Handles Billion‑Scale Short Video Traffic: High‑Concurrency Architecture Deep Dive

This article explains how Weibo's video team designs a highly available, high‑concurrency architecture for short‑video services, covering team responsibilities, business scenarios, microservice design, caching layers, multi‑data‑center HA, and circuit‑breaker mechanisms to sustain unpredictable traffic spikes.

Backend · HA · Microservices
0 likes · 12 min read
Tencent Database Technology
May 15, 2018 · Databases

Improving MySQL Asynchronous Replication by Aligning Read_Master_Log_Pos with Exec_Master_Log_Pos

This article analyzes a MySQL master‑crash scenario where mismatched Read_Master_Log_Pos and Exec_Master_Log_Pos prevent HA failover, explains binlog event structures, and proposes updating Read_Master_Log_Pos only after a full transaction is received to ensure reliable asynchronous replication.

Asynchronous · Binlog · Exec_Master_Log_Pos
0 likes · 9 min read
Tencent Cloud Developer
Mar 14, 2018 · Cloud Computing

Business Continuity Solutions on Tencent Cloud: High Availability and Disaster Recovery

Tencent Cloud’s business continuity solutions combine high‑availability clusters, multi‑AZ load balancing, and cross‑region disaster‑recovery architectures—such as CLB‑CVM‑MySQL configurations, CDB hot‑standby instances, DNS‑based failover, and data‑sync services—to ensure continuous operation and rapid recovery from localized or regional failures.

HA · Tencent Cloud · business continuity
0 likes · 10 min read
Architect's Tech Stack
Jan 17, 2018 · Operations

RabbitMQ Cluster Installation and Configuration Guide

This guide explains how to install RabbitMQ, set up a three-node Erlang-based cluster on CentOS, configure hostnames, Erlang cookies, designate disk and RAM nodes, manage services, enable mirrored queues, and verify cluster status using command‑line tools.

Erlang · HA · RabbitMQ
0 likes · 9 min read
Meituan Technology Team
Mar 17, 2017 · Big Data

Optimizing Hadoop NameNode Restart in HA with QJM

By applying a series of JIRA patches and configuration tweaks—such as shrinking the fsLock scope, increasing checkpoint transaction thresholds, off‑loading quota calculations, simplifying BlockReport handling, and async processing of mis‑replicated blocks—the Hadoop HA NameNode restart time in a 540 MB metadata cluster drops from roughly 4000 seconds to about 2000 seconds, cutting total downtime to around 35 minutes and greatly improving cluster availability.

HA · HDFS · Hadoop
0 likes · 18 min read
Architecture Digest
May 8, 2016 · Databases

MySQL High Availability Architectures: Overview of Common Solutions

This article reviews the main MySQL high‑availability architectures—including shared‑storage SAN, DRBD disk replication, keepalived/heartbeat, MHA, ZooKeeper‑based HA, Galera/PXC clustering, and middleware proxy solutions—detailing their principles, advantages, limitations, and suitability for different business scenarios.

Cluster · HA · Replication
0 likes · 17 min read
21CTO
Mar 11, 2016 · Databases

How to Build Reliable MySQL HA: Replication, Monitoring, and Failover Strategies

This article explores practical MySQL high‑availability solutions, covering asynchronous and semi‑synchronous replication, monitoring with keepalived or ZooKeeper, failover decision criteria, GTID and pseudo‑GTID techniques, and lessons learned from real‑world deployments.

GTID · HA · Pseudo GTID
0 likes · 13 min read
dbaplus Community
Oct 17, 2015 · Databases

Master PostgreSQL: From Origins to Hands‑On Labs and HA Strategies

This article presents a comprehensive overview of PostgreSQL, covering its history, architecture, core features, step‑by‑step lab exercises for database creation, CRUD operations, configuration tuning, performance monitoring, backup and recovery, Hot Standby replication, PGPOOL clustering, and a curated Q&A session addressing common DBA challenges.

Backup · HA · PGPOOL
0 likes · 19 min read
Architects' Tech Alliance
Sep 5, 2015 · Cloud Computing

Overview of VMware Storage Path Reliability, vConverter, HA, DRS, FT, vMotion, Storage vMotion, DPM, Networking and Automation Features

This article provides a comprehensive overview of VMware's storage path reliability, pluggable storage architecture, virtual machine conversion tools, high‑availability and resource‑management features such as HA, DRS, FT, vMotion, storage vMotion, DPM, as well as networking components like vDS and VMkernel ports, plus upgrade automation utilities.

HA · Networking · VMware
0 likes · 9 min read