Databases 33 min read

How a Self‑Built Database Audit Platform Boosts DBA Efficiency and Quality

This article details the challenges of large‑scale database operations, the selection and design of a custom audit platform, its architecture, data collection, rule parsing, implementation, results, and lessons learned, while sharing the open‑source repository for the system.

dbaplus Community
dbaplus Community
dbaplus Community
How a Self‑Built Database Audit Platform Boosts DBA Efficiency and Quality

Background

Rapid growth of Oracle, MySQL, MongoDB, and Redis instances in a financial‑services environment created a mismatch between operational scale and DBA resources, leading to inconsistent design quality, SQL performance problems and difficulty maintaining overall database health.

Challenges

Scale vs. manpower: dozens of instances serving thousands of developers.

Inconsistent design and development quality across teams.

Manual, error‑prone processes for reviewing schema, SQL and execution plans.

Lack of measurable, automated feedback mechanisms.

Platform selection

Three approaches were evaluated: (1) self‑developed SQL engine (high difficulty), (2) self‑developed data collector with rule‑based post‑audit (low difficulty, flexible), (3) commercial products (powerful but costly and still require manual steps). The team chose a hybrid “tool + manual review” model and built a custom audit platform.

Architecture

Core modules (see image):

Data collection (Oracle & MySQL).

OBJ/SQL repository (MongoDB).

Core management (object and SQL lifecycle).

Rule engine (definition, configuration, execution).

Future optimization engine (phase 2).

System management (task scheduling, space management, report generation).

Data collection

Two audit object levels are supported:

Object‑level: tables, partitions, indexes, views, triggers, etc.

SQL‑level: raw SQL text, execution plans, execution features.

Oracle data is gathered from AWR snapshots via DBMS_WORKLOAD_REPOSITORY, transformed to JSON and inserted into MongoDB collection oracle_audit, minimizing production impact. MySQL data is collected from slow‑query logs using Percona Toolkit pt‑query‑digest and from real‑time queries routed to read‑only replicas; results are stored as JSON in collection mysql_audit.

Rule definition

Rules are stored as JSON documents with the following fields:

{
  "db_type": "Oracle",
  "rule_name": "large_table_full_scan",
  "rule_type": "plan",
  "rule_complexity": "simple",
  "rule_cmd": { "$match": { "operation": "TABLE ACCESS FULL", "object_size": { "$gt": ${size_threshold} } } },
  "weight": 5,
  "max_score": 20,
  "solution": "Consider partitioning or adding appropriate indexes",
  "rule_status": "ON"
}

Four rule categories exist:

Object‑level (e.g., tables without partitions).

Text‑level (regular‑expression checks on SQL text).

Plan‑level (analysis of execution plans).

Feature‑level (runtime characteristics such as block‑scan ratio).

Simple rules are expressed as MongoDB queries; complex rules invoke external Python scripts referenced by rule_name.

Implementation example – large‑table full scan

Use

db.plan.find({ "operation": "TABLE ACCESS FULL", "object_size": { "$gt": 1000000 } })

to locate execution plans that scan a large table.

Join the result with the OBJ collection to retrieve table metadata (record count, physical size).

Extract sql_id, plan_hash_value and object_name for reporting.

Apply the rule’s weight and max_score to compute a penalty.

Store the audit record in the audit_result collection for front‑end visualization.

Task scheduling and execution

All audit jobs are defined in a task table and executed asynchronously by Celery workers; Flower provides a web UI for monitoring. Typical workflow:

Administrator creates a task specifying data source (IP, port, schema), audit date range and selected rule set.

Celery worker triggers the data‑collection module, writes raw data to MongoDB.

After collection, the rule‑engine worker parses the JSON documents, runs the rule queries or scripts, and writes results to audit_result.

When the job finishes, its status changes to “SUCCESS” and results can be viewed or exported as Excel.

Result presentation

Audit results are displayed in a web dashboard. Overview charts (pie charts) show the distribution of rule violations; detailed tables list each violation with fields such as sql_id, object_name, score and suggested solution. Scores are normalized to a 0‑100 scale, enabling longitudinal tracking of database quality per system.

Operational lessons

When parsing MySQL JSON execution plans, set wait_timeout (e.g., 3 s) to avoid session hangs.

Large result sets may block workers; use pt‑kill with a unique marker (e.g., “eXplAin format=json”) to terminate offending sessions.

Rule thresholds must balance strictness and practicality to keep developers engaged.

Adoption workflow

The rollout consists of four phases: (1) information collection across all DB instances, (2) manual analysis of key systems, (3) on‑site training for developers and DBAs, and (4) continuous feedback‑driven rule refinement. After six months in production the platform reduced manual DBA effort by roughly 30 %.

Open‑source repository

The full source code is available at https://github.com/CreditEaseDBA/Themis

Q&A highlights

Typical questions covered rule creation for type mismatches, the fact that the engine is policy‑based (no machine‑learning algorithm), and effort estimates (≈ ½ year, one developer plus two DBAs).

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

rule enginePerformance Optimizationdatabase auditDBA toolsSQL quality
dbaplus Community
Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.