Databases 8 min read

Parsing and Normalizing Space-Delimited Valid Periods in PostgreSQL

The article demonstrates how to split a space‑separated "validperiods" column into separate begin, end, and price fields in PostgreSQL using string_to_array, to_timestamp, regexp_replace, and unnest to transform the data into a readable tabular format.

Qunar Tech Salon

Apr 30, 2015

Parsing and Normalizing Space-Delimited Valid Periods in PostgreSQL

Many developers store formatted data in a single text column for convenience, such as a validperiods field that contains three values (start time, end time, price) separated by spaces. The article shows an initial query that retrieves this raw data:

qunar_group=# select id, validperiods from rrs limit 10;
   id    |        validperiods        
---------+----------------------------
 1522764 | 1363996800 1371945600 4512
 1522774 | 1363996800 1371945600 4941
 ... (more rows) ...
(10 rows)

Using string_to_array() the three components can be extracted into separate columns:

qunar_group=# select id,
       (string_to_array(validperiods, ' '))[1] as begin,
       (string_to_array(validperiods, ' '))[2] as end,
       (string_to_array(validperiods, ' '))[3] as price
  from rrs limit 10;
   id    |   begin    |    end     | price 
---------+------------+------------+-------
 1522764 | 1363996800 | 1371945600 | 4512
 ...

Converting the timestamps to readable dates improves clarity:

qunar_group=# select id,
       to_timestamp((string_to_array(validperiods, ' '))[1]::int) as begin,
       to_timestamp((string_to_array(validperiods, ' '))[2]::int) as end,
       (string_to_array(validperiods, ' '))[3] as price
  from rrs limit 10;
   id    |         begin          |          end           | price 
---------+------------------------+------------------------+-------
 1522764 | 2013-03-23 08:00:00+08 | 2013-06-23 08:00:00+08 | 4512
 ...

The problematic design appears when a row contains multiple "start‑end‑price" groups, all separated by spaces, making simple splitting impossible. An example query shows such a record:

qunar_group=# select id, validperiods from rrs where id = '2669032';
   id    |                                                              validperiods                                                               
---------+-----------------------------------------------------------------------------------------------------------------------------------------
 2669032 | 1363824000 1363824000 7080 1364688000 1364688000 7080 1365206400 1365206400 6580 ...

Because both groups and values use spaces as delimiters, the article proposes inserting commas between groups using regexp_replace():

qunar_group=# select id,
       regexp_replace(validperiods, '([0-9]+[ ][0-9]+[ ][0-9]+)([ ])', '\1,', 'g')
  from rrs where id = '2669032';
   id    |                                                             regexp_replace                                                               
---------+-----------------------------------------------------------------------------------------------------------------------------------------
 2669032 | 1363824000 1363824000 7080,1364688000 1364688000 7080,1365206400 1365206400 6580, ...

After adding commas, the string can be split into individual groups with string_to_array() and expanded into rows using unnest():

qunar_group=# select id,
       unnest(string_to_array(regexp_replace(validperiods, '([0-9]+[ ][0-9]+[ ][0-9]+)([ ])', '\1,', 'g'), ','))
  from rrs where id = '2669032';
   id    |           unnest           
---------+----------------------------
 2669032 | 1363824000 1363824000 7080
 2669032 | 1364688000 1364688000 7080
 2669032 | 1365206400 1365206400 6580
 ...

Finally, the groups are parsed again with string_to_array() and converted to timestamps for a clean, tabular result:

qunar_group=# select id,
       to_timestamp((string_to_array(validperiods, ' '))[1]::int) as begin,
       to_timestamp((string_to_array(validperiods, ' '))[2]::int) as end,
       (string_to_array(validperiods, ' '))[3] as price
  from (
        select id,
               unnest(string_to_array(regexp_replace(validperiods, '([0-9]+[ ][0-9]+[ ][0-9]+)([ ])', '\1,', 'g'), ',')) as validperiods
        from rrs where id = '2669032') a;
   id    |         begin          |          end           | price 
---------+------------------------+------------------------+-------
 2669032 | 2013-03-21 08:00:00+08 | 2013-03-21 08:00:00+08 | 7080
 2669032 | 2013-03-31 08:00:00+08 | 2013-03-31 08:00:00+08 | 7080
 ...

The author, a search and database researcher at Qunar, provides these examples to illustrate practical PostgreSQL techniques for handling poorly designed, space‑delimited data structures.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

sql postgresql regex data normalization string_to_array

Written by

Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.