Big Data 24 min read

Unlock Complex Data Scenarios with Simple MaxCompute SQL Techniques

This article shows how flexible, divergent thinking combined with basic MaxCompute (ODPS) SQL syntax can solve complex data problems such as generating sequences, splitting intervals, performing permutations and combinations, and analyzing continuous activity, providing step‑by‑step examples, SQL code snippets, and practical results.

ITPUB
ITPUB
ITPUB
Unlock Complex Data Scenarios with Simple MaxCompute SQL Techniques

Introduction

The goal of this article is to demonstrate how to fully exploit SQL capabilities, especially MaxCompute (ODPS) SQL, by applying flexible and divergent data‑processing thinking to solve complex scenarios using only the most basic syntax.

Target Audience and Structure

The content is useful for both beginners and experienced users, but it is most suitable for intermediate to advanced readers. It focuses on data‑processing concepts rather than deep syntax details, and readers can explore specific functions as needed.

1. Sequence Generation

1.1 Simple Incremental Sequence

Generate a sequence from 0 to 3 using a three‑step approach: create an array of the required length, explode it with posexplode to obtain indices, and select the index as the sequence value.

-- SQL - 1
select t.pos as a_n
from (
    select posexplode(split(space(3), space(1), false))
) t;

Result: [0,1,2,3]

1.2 Arithmetic Sequence

Given a first term a and common difference d, the nth term is a + (n‑1)*d. The SQL implementation uses the same array‑explosion technique.

-- SQL - 2
select a + t.pos * d as a_n
from (
    select posexplode(split(space(n-1), space(1), false))
) t;

1.3 Geometric Sequence

With first term a and ratio q, the nth term is a * q^(n‑1). The SQL follows the same pattern.

-- SQL - 3
select a * pow(q, t.pos) as a_n
from (
    select posexplode(split(space(n-1), space(1), false))
) t;

MaxCompute also provides the built‑in sequence function for quick generation:

-- SQL - 4
select sequence(1, 3, 1);

2. Interval Operations

2.1 Interval Splitting

To split a numeric interval into k equal sub‑intervals, first generate a sequence of length k+1, then compute start and end points using the arithmetic‑sequence formula.

-- SQL - 7
select a + t.pos * d as sub_interval_start,
       a + (t.pos + 1) * d as sub_interval_end
from (
    select posexplode(split(space(n-1), space(1), false))
) t;

2.2 Interval Overlap and Merging

For two overlapping date ranges, the solution expands each range into a daily sequence, counts occurrences, and derives the merged range and overlap dates.

-- SQL - 8
with dummy_table as (
    select stack(
        2,
        '2024-01-01','2024-01-03',
        '2024-01-02','2024-01-04'
    ) as (date_start, date_end)
)
select min(date_item) as date_start_merged,
       max(date_item) as date_end_merged,
       collect_set(case when date_item_cnt > 1 then concat(date_item, ':', date_item_cnt) else null end) as overlap_date
from (
    select date_add(date_start, pos) as date_item,
           count(1) over (partition by date_add(date_start, pos)) as date_item_cnt
    from dummy_table
    lateral view posexplode(split(space(datediff(date_end, date_start)), space(1), false)) t;

The same logic extends to an arbitrary number of intervals (SQL - 9), producing merged intervals and overlap counts.

3. Permutations and Combinations

3.1 Permutations with Repetition

Using two lateral explode calls on a character array yields all ordered pairs.

-- SQL - 12
select concat(val1, val2) as perm
from (select split('A,B,C', ',') as characters) dummy
lateral view explode(characters) t1 as val1
lateral view explode(characters) t2 as val2;

3.2 Combinations with Repetition

By ordering each pair and grouping by the ordered values, duplicate permutations collapse into unique combinations.

-- SQL - 13
select concat(least(val1, val2), greatest(val1, val2)) as comb
from (select split('A,B,C', ',') as characters) dummy
lateral view explode(characters) t1 as val1
lateral view explode(characters) t2 as val2
group by least(val1, val2), greatest(val1, val2);

MaxCompute also offers the combinations function (SQL - 14).

4. Continuous Activity Analysis

4.1 Ordinary Continuous Activity (Threshold‑Based)

Identify users with at least two consecutive active days using lag or dense‑rank techniques (SQL - 18, 19). A more detailed version also returns the length of each continuous segment (SQL - 20) and the start/end dates of each segment (SQL - 21).

4.2 Dynamic Continuous Activity (Longest Streak)

Find the top two users with the longest continuous active periods, returning user ID, streak length, and date range (SQL - 23).

5. Advanced Continuous Segmentation

For Wi‑Fi scan/connection logs, the task is to split behavior into intervals where the same status repeats and the time gap between consecutive records does not exceed 30 minutes. The solution groups by user and Wi‑Fi, orders by timestamp, and uses lag together with a conditional max window to assign a group index.

-- SQL - 25
select user_id,
       wifi,
       max(status) as status,
       min(time) as start_time,
       max(time) as end_time
from (
    select *,
           max(if(lag_status is null or lag_time is null or status <> lag_status or datediff(time, lag_time, 'ss') > 60*30, rn, null))
               over (partition by user_id, wifi order by time) as group_idx
    from (
        select *,
               row_number() over (partition by user_id, wifi order by time) as rn,
               lag(time,1) over (partition by user_id, wifi order by time) as lag_time,
               lag(status,1) over (partition by user_id, wifi order by time) as lag_status
        from user_wifi_log
    ) t1
) t2
group by user_id, wifi, group_idx;

The result lists each user‑Wi‑Fi pair, the status (scan or conn), and the start/end timestamps of each continuous behavior segment.

Conclusion

By adopting a flexible, divergent data‑processing mindset, even the most basic MaxCompute SQL syntax can address complex scenarios such as sequence generation, interval manipulation, combinatorial calculations, and continuous activity analysis. The article provides reusable templates and concrete examples that readers can adapt to their own data‑processing tasks.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

SQLdata-processingMaxComputecombinatoricsSequencesIntervals
ITPUB
Written by

ITPUB

Official ITPUB account sharing technical insights, community news, and exciting events.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.