Implementing Row-to-Column Pivot in Hive: Traditional and Map Approaches
This article explains how to perform row-to-column transformations (pivot) in Hive using two methods: a traditional SQL approach mimicking Oracle/SQL Server pivot syntax and a more concise map-based technique, comparing their syntax, performance, and memory considerations.
Preface
Traditional relational databases such as Oracle (post‑11g) and SQL Server (post‑2005) provide a PIVOT function for row‑to‑column conversion; this article describes two ways to achieve the same in Hive.
Traditional Database Method
This method follows the approach used before PIVOT functions existed in Oracle/SQL Server, with only syntax changes to Hive.
with testtable(
select 1 id,'k1' key,123 value
union all
select 1,'k2' key,124 value
union all
select 2,'k1' key,234 value
)
select id,
max(case when key='k1' then value else null end) k1,
max(case when key='k2' then value else null end) k2
from testtable
group by idMap Method
This approach builds a key‑value map in Hive by concatenating the column to be pivoted with its value, then extracts the values from the map.
with testtable(
select 1 id,'k1' key,123 value
union all
select 1,'k2' key,124 value
union all
select 2,'k1' key,234 value
)
select id,kv['k1'],kv['k2']
from (
select id,str_to_map(concat_ws(',', collect_set(concat(key, '-', value))),',','-') kv
from testtable
group by id) tConclusion
Both methods can pivot rows to columns; the traditional method is easier to understand but becomes cumbersome when many keys exist, while the map method is more concise but requires more memory because all data are collected together. The traditional method uses aggregation functions to reduce data on the fly.
Likes and shares are the biggest support~
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
