What Real‑World DBA Lessons Reveal About Database Reliability
The article shares a DBA’s three‑year journey at Ganji, detailing core responsibilities, painful incidents like accidental table deletions and massive Redis growth, and practical lessons on stability, backup, hardware prioritization, business alignment, and improving communication between operations and development teams.
Introduction
In early 2012 I joined Ganji as a DBA during a period of rapid traffic growth. Over three years I learned many lessons, above all about the communication gap between operations and development that knowledge asymmetry creates.
DBA Responsibilities
Planning, designing, managing, and migrating database systems.
Daily maintenance, backup, optimization, and recovery.
Building and maintaining master‑slave architectures.
Supporting production releases, reviewing designs, and providing architectural solutions.
Databases include MySQL, Oracle, and, when needed, NoSQL such as Redis and MongoDB. The work focuses on high availability, data safety (e.g., backups that rescued millions of mobile users), and serving business needs.
Disastrous Cases
1. Delete Without WHERE
A colleague ran a script missing a WHERE clause, wiping an entire table; recovery required binlog restoration.
Reflection: New developers keep repeating the same mistakes; the only reliable fixes are a proxy that blocks illegal SQL and stricter code review.
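The kind of rule such a proxy might enforce can be sketched as follows. This is a minimal illustration, not the tool used at Ganji: the function name is_blocked and the regex-based approach are assumptions, and a production proxy would rely on a real SQL parser.

```python
import re

# Hypothetical rule for illustration: reject DELETE/UPDATE statements without WHERE.
# String literals and comments are blanked first so a "where" inside them is ignored.
_SKIP = re.compile(
    r"'(?:[^'\\]|\\.)*'"      # single-quoted string
    r'|"(?:[^"\\]|\\.)*"'     # double-quoted string
    r"|/\*.*?\*/"             # block comment
    r"|--[^\n]*"              # line comment
    r"|#[^\n]*",              # MySQL-style hash comment
    re.DOTALL,
)
_WRITE = re.compile(r"^\s*(delete|update)\b", re.IGNORECASE)
_WHERE = re.compile(r"\bwhere\b", re.IGNORECASE)


def is_blocked(sql: str) -> bool:
    """Return True when a DELETE or UPDATE carries no WHERE clause."""
    cleaned = _SKIP.sub(" ", sql)
    if not _WRITE.match(cleaned):
        return False  # not a statement this rule cares about
    return _WHERE.search(cleaned) is None


if __name__ == "__main__":
    for stmt in ("DELETE FROM posts", "DELETE FROM posts WHERE id = 7"):
        print(stmt, "->", "blocked" if is_blocked(stmt) else "allowed")
```

A check like this only catches the missing clause; a statement such as DELETE ... WHERE 1=1 would still pass, which is why review remains necessary alongside it.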
2. Large‑Seller Issue
Opening a free port caused a commercial table to swell to 100 GB, leading to database instability; it took three months to shrink the table and split a text field.
Reflection: Insufficient monitoring of large tables.
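Large-table monitoring of this sort could look roughly like the sketch below. The query reads columns that exist in MySQL's information_schema.TABLES; the 20 GiB threshold, the TableSize type, and the oversized function are assumptions chosen for illustration, and the rows would come from whatever client library the monitoring job uses.

```python
from typing import Iterable, List, NamedTuple

# Query a monitoring job could run against MySQL. For InnoDB the reported
# sizes are estimates, so treat the result as a trend indicator.
SIZE_QUERY = """
SELECT TABLE_SCHEMA, TABLE_NAME, DATA_LENGTH + INDEX_LENGTH AS total_bytes
FROM information_schema.TABLES
WHERE TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
"""

GIB = 1024 ** 3


class TableSize(NamedTuple):
    schema: str
    table: str
    total_bytes: int


def oversized(rows: Iterable[TableSize], limit_gib: float = 20.0) -> List[TableSize]:
    """Return tables at or above the limit, largest first (limit is an assumed value)."""
    flagged = [r for r in rows if r.total_bytes >= limit_gib * GIB]
    return sorted(flagged, key=lambda r: r.total_bytes, reverse=True)


if __name__ == "__main__":
    sample = [
        TableSize("biz", "commercial", 100 * GIB),
        TableSize("biz", "users", 2 * GIB),
    ]
    for t in oversized(sample):
        print(f"{t.schema}.{t.table}: {t.total_bytes / GIB:.0f} GiB")
```

Running such a check daily and alerting on growth would surface a table long before it reaches 100 GB.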
3. Cross‑Master Subqueries
Excessive subqueries from developers hit the master instead of slaves, a common but often unnoticed problem.
Reflection: Without proxy protection, master‑slave setups are vulnerable.
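The routing decision a proxy makes here can be sketched in a few lines. This is an assumed, simplified design (the Router class and its round-robin policy are hypothetical): plain reads go to slaves, while writes, locking reads, and anything inside a transaction stay on the master.

```python
import itertools
import re
from typing import Sequence

_READ = re.compile(r"^\s*(select|show)\b", re.IGNORECASE)
_LOCKING = re.compile(r"\b(?:for\s+update|lock\s+in\s+share\s+mode)\b", re.IGNORECASE)


class Router:
    """Pick a backend name for each statement (illustrative only)."""

    def __init__(self, master: str, slaves: Sequence[str]):
        self.master = master
        # Round-robin over slaves; None means every statement goes to the master.
        self._slaves = itertools.cycle(slaves) if slaves else None

    def route(self, sql: str, in_transaction: bool = False) -> str:
        if in_transaction or self._slaves is None:
            return self.master
        if _READ.match(sql) and not _LOCKING.search(sql):
            return next(self._slaves)
        return self.master


if __name__ == "__main__":
    router = Router("master", ["slave1", "slave2"])
    print(router.route("SELECT id FROM t WHERE uid IN (SELECT uid FROM u)"))
    print(router.route("UPDATE t SET hits = hits + 1 WHERE id = 3"))
```

The sketch ignores replication lag; reads that must see a just-committed write would still need to be pinned to the master.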
4. OLAP Reporting Database
A reporting system built on MyISAM ran into table-level lock contention between reads and writes after traffic surged, causing massive request blocking and costly penalties.
Reflection: Sudden opening of free ports and lack of proper design led to the incident; a Hadoop/Spark solution would have been better.
5. 50 GB Redis Instance
Redis usage grew from 20 GB to 50 GB; after I left, the data was eventually lost during a failure.
My Work
Most of my time involved communicating with developers. I later helped build the automan platform, which automated SQL review, simulated execution, and performed backups, greatly reducing manual effort.
DBA Insights
Solid Foundations: Stability is paramount; use MHA/GTID for master failover, LVS for slave traffic, and maintain regular full and incremental backups with verification.
Hardware First: Scale up or out by adding memory, SSDs, or flash when buffers are insufficient.
Prepare Ahead: Optimize slow SQL, monitor large tables, archive old data, and regularly shrink tables.
Align with Business: Sometimes a developer can fix an issue faster than a DBA; understanding business logic helps prioritize work.
Learn to Say No: Distinguish reasonable from unreasonable requests; defer or reject non‑urgent tasks.
Effective Communication: Clarify responsibilities and set timelines.
Continuous Learning: DBAs need development skills to stay effective.
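The "archive old data" advice under Prepare Ahead can be illustrated with a batched move. The sketch below uses Python's built-in sqlite3 module purely so it runs anywhere; it is a stand-in for MySQL, and the table names posts and posts_archive, the created_at column, and the batch size are all hypothetical.

```python
import sqlite3


def archive_old_rows(conn: sqlite3.Connection, cutoff: str, batch_size: int = 500) -> int:
    """Move rows older than cutoff from posts to posts_archive in small batches.

    Each batch runs in its own short transaction, so locks are held briefly
    and an interrupted run can simply be restarted.
    """
    moved = 0
    while True:
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM posts WHERE created_at < ? ORDER BY id LIMIT ?",
            (cutoff, batch_size),
        )]
        if not ids:
            return moved
        marks = ",".join("?" * len(ids))
        with conn:  # commits on success, rolls back on error
            conn.execute(
                f"INSERT INTO posts_archive SELECT * FROM posts WHERE id IN ({marks})", ids
            )
            conn.execute(f"DELETE FROM posts WHERE id IN ({marks})", ids)
        moved += len(ids)


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, created_at TEXT, body TEXT)")
    db.execute("CREATE TABLE posts_archive (id INTEGER PRIMARY KEY, created_at TEXT, body TEXT)")
    db.executemany(
        "INSERT INTO posts (created_at, body) VALUES (?, ?)",
        [("2011-05-01", "old"), ("2013-02-01", "new")],
    )
    db.commit()
    print("archived rows:", archive_old_rows(db, "2012-01-01"))
```

Small batches are a design choice here: one huge DELETE on a busy table holds locks for a long time and can stall replication, while many short transactions spread the cost.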
Ops vs. Development Tensions
Different KPIs create knowledge gaps, especially for newcomers. Solutions include mentorship, comprehensive wiki documentation, and using automation to enforce standards.
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.