
What Real‑World DBA Lessons Reveal About Database Reliability

The article shares a DBA’s three‑year journey at Ganji, detailing core responsibilities, painful incidents like accidental table deletions and massive Redis growth, and practical lessons on stability, backup, hardware prioritization, business alignment, and improving communication between operations and development teams.

Efficient Ops

Introduction

In early 2012 I joined Ganji as a DBA, just as its traffic was growing rapidly. Over the next three years I learned many lessons, above all about the communication gap between operations and development that knowledge asymmetry creates.

DBA Responsibilities

Planning, designing, managing, and migrating database systems.

Daily maintenance, backup, optimization, and recovery.

Building and maintaining master‑slave architectures.

Supporting production releases, reviewing designs, and providing architectural solutions.

Databases include MySQL, Oracle, and, when needed, NoSQL stores such as Redis and MongoDB. The work centers on high availability, data safety (backups once rescued the data of millions of mobile users), and serving business needs.

Disastrous Cases

1. Delete Without WHERE

A colleague ran a script whose DELETE statement lacked a WHERE clause, wiping out an entire table; the data had to be recovered from the binlog.

Reflection: New developers keep repeating this mistake; the dependable safeguards are a proxy that blocks dangerous SQL and stricter code review.
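As a minimal sketch of the kind of check such a proxy could run, the Python below flags UPDATE and DELETE statements that carry no WHERE clause. The function names `is_unsafe_write` and `guard` are hypothetical, and the regex approach is a simplification; a production proxy would use a real SQL parser. MySQL itself offers a related safety net: enabling the `sql_safe_updates` system variable (or starting the `mysql` client with `--safe-updates`) makes the server reject UPDATE and DELETE statements that do not restrict rows by a key in the WHERE clause or by a LIMIT.

```python
import re

# Hypothetical proxy-side guard. Regex matching is a simplification: it ignores
# edge cases such as "where" inside string literals or SQL comments.
_WRITE_RE = re.compile(r"^\s*(delete|update)\b", re.IGNORECASE)
_WHERE_RE = re.compile(r"\bwhere\b", re.IGNORECASE)


def is_unsafe_write(sql: str) -> bool:
    """Return True if sql is an UPDATE or DELETE with no WHERE clause."""
    return bool(_WRITE_RE.match(sql)) and not _WHERE_RE.search(sql)


def guard(sql: str) -> str:
    """Pass the statement through, or raise before it reaches the database."""
    if is_unsafe_write(sql):
        raise ValueError(f"blocked: {sql.strip()!r} has no WHERE clause")
    return sql
```

A statement such as `DELETE FROM orders` is rejected, while `DELETE FROM orders WHERE id = 42` passes through unchanged.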

2. Large‑Seller Issue

After a free port was opened, a commercial table swelled to 100 GB and destabilized the database; it took three months to shrink the table and split out its text field.

Reflection: Insufficient monitoring of large tables.
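Large‑table monitoring can start as a periodic query against `information_schema.TABLES`, whose `DATA_LENGTH` and `INDEX_LENGTH` columns report approximate on‑disk size (for InnoDB they are estimates). The sketch below assumes the rows have already been fetched with any MySQL driver and flags tables above a threshold; the 20 GiB default and the function name `flag_large_tables` are illustrative assumptions, not values from the original incident.

```python
from typing import Iterable, List, Tuple

GIB = 1024 ** 3

# Query to run periodically; excludes MySQL's own system schemas.
LARGE_TABLE_QUERY = """
SELECT table_schema, table_name, data_length + index_length AS total_bytes
FROM information_schema.TABLES
WHERE table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
"""


def flag_large_tables(
    rows: Iterable[Tuple[str, str, int]], threshold_gib: float = 20.0
) -> List[Tuple[str, str, float]]:
    """Return (schema, table, size_gib) for tables at or above the threshold, largest first."""
    flagged = [
        (schema, table, size / GIB)
        for schema, table, size in rows
        if size >= threshold_gib * GIB
    ]
    return sorted(flagged, key=lambda row: row[2], reverse=True)
```

Sending the flagged list to the team on a schedule turns a three‑month cleanup into an early warning.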

3. Cross‑Master Subqueries

Developers issued large numbers of subqueries that ran on the master instead of the slaves, a common problem that often goes unnoticed.

Reflection: Without proxy protection, master‑slave setups are vulnerable.
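A proxy addresses this by routing statements itself rather than relying on each developer to pick the right connection. The sketch below shows the basic idea under simplifying assumptions: plain SELECTs rotate round‑robin across the slaves, while writes, locking reads (`SELECT ... FOR UPDATE`), and anything inside a transaction stay on the master. The `ReadWriteRouter` class and host names are hypothetical, and a real proxy must also account for replication lag.

```python
import itertools
import re
from typing import List

_SELECT_RE = re.compile(r"^\s*select\b", re.IGNORECASE)
_FOR_UPDATE_RE = re.compile(r"\bfor\s+update\b", re.IGNORECASE)


class ReadWriteRouter:
    """Hypothetical router: plain reads go to slaves, everything else to the master."""

    def __init__(self, master: str, slaves: List[str]) -> None:
        if not slaves:
            raise ValueError("need at least one slave")
        self.master = master
        self._slaves = itertools.cycle(slaves)  # simple round-robin

    def route(self, sql: str, in_transaction: bool = False) -> str:
        # Transactional and locking reads must see the master's latest writes.
        if (
            in_transaction
            or not _SELECT_RE.match(sql)
            or _FOR_UPDATE_RE.search(sql)
        ):
            return self.master
        return next(self._slaves)
```

Centralizing the decision in one place means a new developer's query lands on a slave by default instead of by discipline.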

4. OLAP Reporting Database

A reporting system was built on MyISAM, whose table‑level locks make reads and writes block each other. When traffic surged, requests piled up behind those locks, and the resulting outage led to costly penalties.

Reflection: The sudden opening of free ports and a lack of proper up‑front design caused the incident; a Hadoop/Spark‑based solution would have been a better fit.

5. 50 GB Redis Instance

Redis usage grew from 20 GB to 50 GB; after I left, the data was eventually lost during a failure.
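Growth like this is easy to catch early by watching `used_memory` in the output of Redis's `INFO memory` command. The sketch below parses that raw text and compares usage against a memory budget you choose; the 80% warning ratio and the function names `parse_info` and `memory_alert` are assumptions. If you use the redis‑py client, `Redis.info("memory")` already returns a parsed dict, so only the comparison step is needed.

```python
from typing import Dict, Tuple

GIB = 1024 ** 3


def parse_info(text: str) -> Dict[str, str]:
    """Parse raw Redis INFO output ("key:value" lines, "# Section" headers)."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def memory_alert(info_text: str, budget_bytes: int, warn_ratio: float = 0.8) -> Tuple[float, bool]:
    """Return (fraction of budget in use, whether it reached warn_ratio)."""
    used = int(parse_info(info_text)["used_memory"])
    ratio = used / budget_bytes
    return ratio, ratio >= warn_ratio
```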

My Work

Most of my time went into communicating with developers. Later I helped build the automan platform, which automated SQL review, simulated execution, and backups, greatly reducing manual effort.
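The article does not describe how automan is built. As a hypothetical illustration of the workflow it automated, the sketch below chains review, simulated execution, backup, and execution so that a change cannot reach production until every earlier step has passed. The `Change` class and step functions are invented for this example, the step bodies are placeholders, and the single review rule is only an example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Change:
    """A hypothetical change request submitted by a developer."""
    sql: str
    completed: List[str] = field(default_factory=list)


def review(change: Change) -> None:
    # Example rule only; a real platform would apply many parsed-SQL checks.
    if change.sql.strip().lower().startswith(("drop ", "truncate ")):
        raise PermissionError("DROP/TRUNCATE needs manual DBA approval")


def simulate(change: Change) -> None:
    pass  # placeholder: e.g. run EXPLAIN or apply to a staging copy first


def backup(change: Change) -> None:
    pass  # placeholder: e.g. dump the affected tables or rows


def execute(change: Change) -> None:
    pass  # placeholder: send the statement to the production master


STEPS: List[Tuple[str, Callable[[Change], None]]] = [
    ("review", review),
    ("simulate", simulate),
    ("backup", backup),
    ("execute", execute),
]


def run_pipeline(change: Change) -> Change:
    for name, step in STEPS:
        step(change)  # any exception aborts all remaining steps
        change.completed.append(name)
    return change
```

Because backup is a fixed step before execute, no change can skip it, which is the kind of guarantee manual processes struggle to keep.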

DBA Insights

Solid Foundations: Stability is paramount; use MHA/GTID for master failover, LVS for slave traffic, and maintain regular full and incremental backups with verification.
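Full plus incremental backups only help if you know which files a restore needs. The sketch below, assuming each incremental builds on the backup before it, picks the newest full backup at or before the target time plus every incremental after it; binlogs would then be replayed for point‑in‑time recovery. Restoring such a chain on a spare host on a schedule is one practical way to verify the backups. The `Backup` record and file names are hypothetical.

```python
from datetime import datetime
from typing import List, NamedTuple


class Backup(NamedTuple):
    taken_at: datetime
    kind: str  # "full" or "incr"
    path: str


def restore_chain(backups: List[Backup], target: datetime) -> List[Backup]:
    """Backups to apply, in order, to restore the state as of target."""
    usable = sorted(
        (b for b in backups if b.taken_at <= target), key=lambda b: b.taken_at
    )
    full_positions = [i for i, b in enumerate(usable) if b.kind == "full"]
    if not full_positions:
        raise LookupError("no full backup at or before the target time")
    # Everything after the newest full backup is an incremental we must apply.
    return usable[full_positions[-1]:]
```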

Hardware First: Scale up or out by adding memory, SSDs, or flash when buffers are insufficient.
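Before buying hardware, it helps to confirm that the InnoDB buffer pool really is too small. MySQL's `SHOW GLOBAL STATUS` exposes `Innodb_buffer_pool_read_requests` (logical reads) and `Innodb_buffer_pool_reads` (reads that missed the pool and went to disk). Both are cumulative since server start, so the sketch below compares two samples taken some time apart. The function name is hypothetical, and what counts as a low hit ratio depends on the workload, so no fixed threshold is assumed here.

```python
from typing import Mapping, Union

Value = Union[int, str]  # SHOW GLOBAL STATUS values often arrive as strings


def buffer_pool_hit_ratio(before: Mapping[str, Value], after: Mapping[str, Value]) -> float:
    """InnoDB buffer pool hit ratio over the interval between two status samples."""

    def delta(name: str) -> int:
        return int(after[name]) - int(before[name])

    requests = delta("Innodb_buffer_pool_read_requests")
    disk_reads = delta("Innodb_buffer_pool_reads")
    if requests <= 0:
        return 1.0  # no logical reads in the interval
    return 1.0 - disk_reads / requests
```

A ratio that stays low under normal load is the kind of evidence that justifies adding memory, SSDs, or flash.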

Prepare Ahead: Optimize slow SQL, monitor large tables, archive old data, and regularly shrink tables.
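Archiving old data is safer in small batches than in one giant DELETE, because each short transaction holds locks briefly and gives replication a chance to keep up. The sketch below uses Python's built‑in `sqlite3` so it runs anywhere; the `orders` and `orders_archive` tables, the `created_at` column, and the batch size are hypothetical, and against MySQL you would issue equivalent statements through your usual driver.

```python
import sqlite3


def archive_old_rows(conn: sqlite3.Connection, cutoff: str, batch_size: int = 500) -> int:
    """Move rows with created_at < cutoff from orders to orders_archive, batch by batch.

    batch_size stays below 999, the bound-parameter limit of older SQLite builds.
    """
    moved = 0
    while True:
        with conn:  # one short transaction per batch, committed on exit
            ids = [
                row[0]
                for row in conn.execute(
                    "SELECT id FROM orders WHERE created_at < ? ORDER BY id LIMIT ?",
                    (cutoff, batch_size),
                )
            ]
            if not ids:
                break
            marks = ",".join("?" * len(ids))
            conn.execute(
                f"INSERT INTO orders_archive SELECT * FROM orders WHERE id IN ({marks})", ids
            )
            conn.execute(f"DELETE FROM orders WHERE id IN ({marks})", ids)
        moved += len(ids)
    return moved
```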

Align with Business: Sometimes a developer can fix an issue faster than a DBA; understanding business logic helps prioritize work.

Learn to Say No: Distinguish reasonable from unreasonable requests; defer or reject non‑urgent tasks.

Effective Communication: Clarify responsibilities and set timelines.

Continuous Learning: DBAs need development skills to stay effective.

Ops vs. Development Tensions

Operations and development are measured by different KPIs, which creates knowledge gaps that hit newcomers hardest. Remedies include mentorship, thorough wiki documentation, and automation that enforces standards.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends and regularly publishes widely read original technical articles. It focuses on the transformation of operations work and aims to accompany readers throughout their operations careers.
