Implementing Daily User Metrics (DNU, DAU, Retention) with Shell Scripts and MySQL Partitioned Tables
This guide explains how to collect, store, clean, and aggregate daily user metrics—including new users, active users, and multi‑day retention—using shell scripts, cron jobs, MySQL partitioned tables, and stored procedures, suitable for small‑to‑medium sized products.
Background
After a product launch, operations need to track user data and calculate daily new users (DNU), daily active users (DAU), and retention rates for each cohort over subsequent days.
Scope of Use
The solution fits projects with daily active users under one million, ideal for small teams or solo developers who handle both backend development and operations.
Tools and Technical Points
Shell Scripts
Basic knowledge of shell scripting and crontab is required.
MySQL
Proficiency with SQL, stored procedures, and partitioned tables is needed.
Implementation Process
1. Goal
Aggregate raw logs into a single table that can be queried with one SELECT statement to obtain daily new users, active users, and retention percentages, while allowing re‑runs without affecting existing data.
2. Data Collection
Assume three dimensions: date, country, and version. Log entries are written as '||'-delimited fields (matching the LOAD DATA statement below) to a file that MySQL can load:
2022-02-09 13:14:15||aaaaaa||CN||100
2022-02-09 13:14:16||bbbbbb||US||100
2022-02-09 13:14:17||cccccc||NL||100

When data volume grows, split the logs by hour and load each part separately. Use a partitioned table so old data can be deleted efficiently.
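The article never shows the raw table itself. A minimal sketch of st_base that is consistent with the LOAD DATA and ALTER TABLE ... PARTITION statements below might look like this; the column types and the sample partition name are assumptions:

```sql
-- Assumed definition of the raw log table, LIST-partitioned on the
-- integer day key dd (e.g. 20220209) so whole days can be dropped cheaply.
CREATE TABLE `st_base` (
  `ts` DATETIME NOT NULL,
  `device_id` VARCHAR(64) NOT NULL,
  `country` VARCHAR(8),
  `version` INT NOT NULL,
  `dd` INT NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
PARTITION BY LIST (`dd`) (
  PARTITION p20220209 VALUES IN (20220209)
);
```

LIST partitioning keeps each day in its own partition, so expiring a day is a metadata-only DROP PARTITION rather than a slow DELETE.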
3. Data Import
Shell script steps (run daily via cron):
Delete the previous day's partition.
Delete the partition for the current execution date (useful for re‑runs).
Create a new partition for the execution date.
Load the log file into MySQL with the LOAD DATA LOCAL INFILE command.
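The script below leaves the partition names and day keys as placeholders (pxxxxxxxx). Assuming partitions are named p<YYYYMMDD> after the day they hold, the values for a run could be derived with GNU date, as in this sketch:

```shell
#!/bin/sh
# Sketch (assumes GNU date and p<YYYYMMDD> partition naming, which the
# article leaves as pxxxxxxxx). The job runs just after midnight, so the
# log being loaded belongs to the previous calendar day.
dd=$(date -d "1 day ago" +%Y%m%d)        # day key being (re)loaded
expired=$(date -d "2 days ago" +%Y%m%d)  # oldest partition, no longer needed
drop_old_sql="alter table st_base drop partition p${expired}"
drop_redo_sql="alter table st_base drop partition p${dd}"  # enables re-runs
add_sql="alter table st_base add partition (partition p${dd} values in (${dd}) engine=innodb)"
echo "$drop_old_sql"
echo "$drop_redo_sql"
echo "$add_sql"
```

Dropping and recreating the execution date's partition before loading is what makes the job safe to re-run for the same day.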
drop_sql="alter table st_base drop partition pxxxxxxxx"
add_sql="alter table st_base add partition (partition pxxxxxxxx values in (xxxxxxxx) engine=innodb)"
mysql -u${username} -p${password} -D${database} -e "${drop_sql}"
mysql -u${username} -p${password} -D${database} -e "${add_sql}"
log_file=xxxx
load_sql="load data local infile '${log_file}' ignore into table st_base fields terminated by '||' lines terminated by '\n' (ts,device_id,country,version) set dd='${dd}'"
mysql -u${username} -p${password} -D${database} -e "${load_sql}"

Schedule the script at 00:10 each day with crontab:
10 0 * * * /path/to/job/st_base.sh

4. Data Cleaning
Use temporary tables and stored procedures to transform raw data, deduplicate, and populate three main tables:
tmp_base – temporary table for type conversion.
total_base – one row per unique device.
flow_base – daily flow table carrying each device's registration date (rdt) for retention calculations.
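Note that INSERT IGNORE only deduplicates when a unique key exists, so total_base presumably carries one on device_id. The article omits both table definitions; the sketches below are assumptions (types, key names, and the dd day key mirror the rest of the article):

```sql
-- Assumed sketch of total_base: one row per device, first-seen date in dt.
CREATE TABLE `total_base` (
  `dt` DATE NOT NULL,               -- registration (first-seen) date
  `device_id` VARCHAR(64) NOT NULL,
  `country` VARCHAR(8),
  `version` INT NOT NULL,
  UNIQUE KEY `uk_device` (`device_id`)  -- makes INSERT IGNORE a dedup
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- Assumed sketch of flow_base: one row per device per active day. Since it
-- is partitioned by the integer day key dd (like st_base), any unique key
-- must include dd.
CREATE TABLE `flow_base` (
  `dt` DATE NOT NULL,               -- activity date
  `device_id` VARCHAR(64) NOT NULL,
  `country` VARCHAR(8),
  `version` INT NOT NULL,
  `rdt` DATE NOT NULL,              -- registration date from total_base
  `dd` INT NOT NULL,                -- day key, e.g. 20220209
  UNIQUE KEY `uk_day_device` (`dd`,`device_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
PARTITION BY LIST (`dd`) (
  PARTITION p20220209 VALUES IN (20220209)
);
```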
CREATE PROCEDURE `sp_data_cleaning`(IN v_dt VARCHAR(10))
BEGIN
TRUNCATE TABLE tmp_base;
INSERT INTO tmp_base(`dt`,`device_id`,`country`,`version`)
SELECT v_dt, `device_id`, `country`, `version`
FROM `st_base` WHERE `dd` = REPLACE(v_dt,'-','');
INSERT IGNORE INTO total_base(`dt`,`device_id`,`country`,`version`)
SELECT `dt`,`device_id`,`country`,`version` FROM tmp_base;
/* partition handling omitted for brevity */
INSERT IGNORE INTO flow_base(`dt`,`device_id`,`country`,`version`,`rdt`,`dd`)
SELECT v_dt, t1.`device_id`, t1.`country`, t1.`version`, t2.`dt`, REPLACE(v_dt,'-','')
FROM tmp_base t1 LEFT OUTER JOIN total_base t2 ON t1.`device_id`=t2.`device_id`;
END

5. Data Aggregation
Create two reporting tables:
rpt_base_active – stores daily new users and active users.
rpt_base – stores cohort retention counts d0 through d7, where d0 is the cohort's new-user count and d1-d7 count the users returning one to seven days later.
CREATE TABLE `rpt_base_active` (
`id` BIGINT NOT NULL AUTO_INCREMENT,
`dt` DATE NOT NULL,
`country` VARCHAR(8),
`version` INT NOT NULL,
`new_users` INT NOT NULL DEFAULT 0,
`active_users` INT NOT NULL DEFAULT 0,
PRIMARY KEY (`id`),
KEY `index1` (`dt`),
KEY `index3` (`country`,`version`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Daily new and active users';
CREATE TABLE `rpt_base` (
`id` BIGINT NOT NULL AUTO_INCREMENT,
`dt` DATE NOT NULL,
`country` VARCHAR(8),
`version` INT NOT NULL,
`d0` INT NOT NULL DEFAULT 0,
`d1` INT NOT NULL DEFAULT 0,
`d2` INT NOT NULL DEFAULT 0,
`d3` INT NOT NULL DEFAULT 0,
`d4` INT NOT NULL DEFAULT 0,
`d5` INT NOT NULL DEFAULT 0,
`d6` INT NOT NULL DEFAULT 0,
`d7` INT NOT NULL DEFAULT 0,
PRIMARY KEY (`id`),
KEY `index1` (`dt`),
KEY `index3` (`country`,`version`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='User retention';

Aggregate data with stored procedures that compute the required dates, delete previous results, and insert fresh aggregates.
CREATE PROCEDURE `sp_rpt_base`(IN v_dt CHAR(10))
BEGIN
/* d0..d7: DATE variables holding v_dt back through v_dt minus 7 days
   (declarations and SET statements elided in the original) */
DELETE FROM rpt_base_active WHERE `dt` = v_dt;
INSERT INTO rpt_base_active(`dt`,`country`,`version`,`new_users`,`active_users`)
SELECT `dt`,`country`,`version`,
SUM(IF(`dt`=`rdt`,1,0)) AS new_users,
COUNT(*) AS active_users
FROM flow_base WHERE dt=v_dt
GROUP BY `dt`,`country`,`version`;
DELETE FROM rpt_base WHERE `dt` IN (d0,d1,d2,d3,d4,d5,d6,d7);
INSERT INTO rpt_base(`dt`,`country`,`version`,`d0`,`d1`,`d2`,`d3`,`d4`,`d5`,`d6`,`d7`)
SELECT t1.`rdt`, t1.`country`, t1.`version`,
SUM(CASE WHEN `rdays`=0 THEN `users` ELSE 0 END) AS d0,
SUM(CASE WHEN `rdays`=1 THEN `users` ELSE 0 END) AS d1,
SUM(CASE WHEN `rdays`=2 THEN `users` ELSE 0 END) AS d2,
SUM(CASE WHEN `rdays`=3 THEN `users` ELSE 0 END) AS d3,
SUM(CASE WHEN `rdays`=4 THEN `users` ELSE 0 END) AS d4,
SUM(CASE WHEN `rdays`=5 THEN `users` ELSE 0 END) AS d5,
SUM(CASE WHEN `rdays`=6 THEN `users` ELSE 0 END) AS d6,
SUM(CASE WHEN `rdays`=7 THEN `users` ELSE 0 END) AS d7
FROM a_flow_base t1 /* intermediate table (built by sp_a_flow_base, not shown) with `users` counted per rdt, rdays, country, version */
GROUP BY t1.`rdt`, t1.`country`, t1.`version`;
END

Full Process Wrapper
Combine all steps into a single procedure that can be called daily:
CREATE PROCEDURE `sp_user`(IN v_dt CHAR(10))
BEGIN
CALL sp_tmp_base(v_dt);
CALL sp_data_cleaning(v_dt);
CALL sp_a_flow_base(v_dt);
CALL sp_rpt_base(v_dt);
END

Final Notes
The flow_base table should retain enough days to cover the longest retention window (e.g., keep 8 days for 7‑day retention). If processing takes longer than a couple of hours, consider a Hadoop‑based solution.
Note: Index design for the reporting tables should be tuned to the specific queries used by the operations team.
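With both reporting tables populated, the single query promised in the Goal section might look like the sketch below; the percentage columns and the null-safe country comparison are choices of this example, not from the article:

```sql
-- Daily new users, active users, and day-1/day-7 retention in one SELECT.
SELECT a.`dt`, a.`country`, a.`version`,
       a.`new_users`, a.`active_users`,
       ROUND(100 * r.`d1` / NULLIF(r.`d0`, 0), 2) AS day1_retention_pct,
       ROUND(100 * r.`d7` / NULLIF(r.`d0`, 0), 2) AS day7_retention_pct
FROM rpt_base_active a
JOIN rpt_base r
  ON a.`dt` = r.`dt`
 AND a.`country` <=> r.`country`   -- null-safe: country is nullable
 AND a.`version` = r.`version`
ORDER BY a.`dt` DESC;
```

NULLIF guards against division by zero for cohorts with no new users on a given day.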