SQL Approach to Identify Continuous User Activity Periods in Big Data
This article demonstrates how to use SQL, including dense_rank and date arithmetic, to detect users who have recorded a specific event for at least seven consecutive days within a month, providing step‑by‑step queries and a complete combined statement.
In daily work, analysts often receive requests to calculate the longest continuous days for various metrics, such as user sign‑ins, forum activity, or product sales.
This example focuses on finding users who, during April, have event type 24 records for a week or more, indicating strong interest that warrants targeted operations.
The solution is built with a series of sub‑queries to keep the logic clear and avoid excessive nesting.
Step 1: Group records by user_id and assign a ranking based on the date using dense_rank(), because a user may have multiple records per day.
select user_id, pt_date, dense_rank() over (partition by user_id order by pt_date) as date_rank from user_calendar_record where pt_date >= 20190401 and pt_date <= 20190430 and event_type = 24 and del_status = 0Step 2: Subtract the rank from the date to obtain a start_point , which marks the first day of each continuous sequence.
select user_id, pt_date, date_sub(pt_date, cast(date_rank as int)) as start_point from t_aStep 3: Group by user_id and start_point and count distinct dates to get the length of each continuous sequence ( day_count).
Step 4: Filter for sequences where day_count is at least 7 days.
select user_id, max(day_count) as max_day_count from t_c group by user_id having max(day_count) >= 7The four steps can be combined into a single statement:
select user_id, max(day_count) as max_day_count from ( select user_id, start_point, count(distinct pt_date) as day_count from ( select user_id, pt_date, date_sub(pt_date, cast(date_rank as int)) as start_point from ( select user_id, pt_date, dense_rank() over (partition by user_id order by pt_date) as date_rank from user_calendar_record where pt_date >= 20190401 and pt_date <= 20190430 and event_type = 24 and del_status = 0 ) t_a ) t_b group by user_id, start_point ) t_c group by user_id having max(day_count) >= 7;If the start date of the longest streak is also needed, the day counts can be stored in a temporary table and joined with another temporary table that holds the maximum counts to retrieve the corresponding start dates.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
