
What Is Hive and How Does It Turn SQL into MapReduce?

This article explains Hive as a SQL‑based interface for Hadoop, shows why it simplifies large‑scale data analysis, provides practical command‑line examples for table creation, data loading, and queries, and details how HiveQL is internally converted into MapReduce jobs.

Java High-Performance Architecture

Oct 21, 2016

What Is Hive

In short, Hive lets you use SQL to query and analyze large datasets stored in Hadoop.

Hive maps structured data files to database tables and translates SQL statements into MapReduce jobs.

Why Use Hive

Before Hive, analyzing large files on Hadoop meant writing a custom MapReduce program, packaging it as a JAR, and submitting it to the cluster. Repeating that cycle for every new statistical query was cumbersome.

SQL is widely known and cheap to learn, so letting analysts query Hadoop data in SQL makes this kind of analysis much faster to write.

Usage Examples

Table Operations

Create a table:

hive> CREATE TABLE pokes (foo INT, bar STRING);

Describe the table:

hive> DESCRIBE pokes;
OK
foo    int
bar    string
Time taken: 0.17 seconds, Fetched: 2 row(s)

Drop the table:

hive> DROP TABLE pokes;

Loading Data

Load a local file into the table:

hive> LOAD DATA LOCAL INPATH 'kv1.txt' OVERWRITE INTO TABLE pokes;
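LOAD DATA only copies the file into the table's storage location; the column types are applied when the table is read. The Python sketch below illustrates that read-time step for the pokes schema. It assumes the file uses Hive's default field delimiter (Ctrl-A, '\x01'). The function name parse_row and the sample lines are hypothetical and are not taken from kv1.txt.

```python
# Sketch: reading delimited text lines as (foo INT, bar STRING) rows.
# Assumption: fields are separated by Hive's default delimiter, Ctrl-A.
FIELD_DELIM = "\x01"


def parse_row(line):
    """Split one text line into the two columns of the pokes table.

    A value that cannot be read as an integer becomes None, mirroring
    how Hive returns NULL for a malformed INT field.
    """
    fields = line.rstrip("\n").split(FIELD_DELIM)
    try:
        foo = int(fields[0])
    except ValueError:
        foo = None
    bar = fields[1] if len(fields) > 1 else None
    return foo, bar


# Hypothetical sample lines for illustration only.
sample_lines = ["3\x01val_3\n", "86\x01val_86\n", "oops\x01val_x\n"]
rows = [parse_row(line) for line in sample_lines]
```

Because the schema is applied only at read time, a badly formatted line does not make the load fail; it shows up later as NULL values in query results.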

Queries

Example 1 – select rows where foo < 5:

hive> SELECT * FROM pokes WHERE foo<5;
... (output omitted)

Example 2 – count rows where foo < 5:

hive> SELECT COUNT(*) FROM pokes WHERE foo<5;
... (output omitted)
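To make the connection to MapReduce concrete, here is a minimal Python sketch of how a filtered count like Example 2 could be expressed as a map step and a reduce step. It is an illustration of the idea, not the plan Hive actually generates. The names map_count and reduce_count and the sample rows are hypothetical.

```python
from collections import defaultdict


def map_count(row):
    """Map step: apply the WHERE filter and emit a 1 for each surviving row."""
    foo, _bar = row
    if foo is not None and foo < 5:
        yield ("count", 1)


def reduce_count(key, values):
    """Reduce step: add up the 1s emitted for the single 'count' key."""
    return key, sum(values)


# Hypothetical rows of the pokes table.
rows = [(3, "val_3"), (86, "val_86"), (4, "val_4"), (0, "val_0")]

# Group the mapper output by key, as the framework would between the phases.
grouped = defaultdict(list)
for row in rows:
    for key, value in map_count(row):
        grouped[key].append(value)

result = [reduce_count(key, values) for key, values in grouped.items()]
```

A plain filter such as Example 1 needs no reduce step at all, since each row can be kept or dropped on its own.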

How HiveQL Is Translated to MapReduce

Background

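Consider a user table (user_id, name) and an order table (user_id, order_id). The join query below returns each user's name together with that user's order IDs. ORDER is a reserved word in HiveQL, and USER is reserved as of Hive 1.2, so both table names are quoted with backticks.

hive> SELECT u.name, o.order_id FROM `order` o JOIN `user` u ON o.user_id = u.user_id;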

Map Phase

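The map operation emits the join key (user_id) as the key and the corresponding record as the value. Each value is prefixed with a tag that marks its source table, 1 for user and 2 for order, so the reducer can tell the two kinds of records apart.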

Example key‑value pairs:

key: 1   value: <1,张三>
key: 1   value: <2,001>
key: 1   value: <2,002>
key: 2   value: <1,李四>
key: 2   value: <2,003>
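The pairs above can be reproduced with a small Python sketch of the map step. It assumes the tag values shown in the example (1 for user records, 2 for order records). The function names map_user and map_order are hypothetical.

```python
# Tags identifying the source table of each value (taken from the example).
USER_TAG, ORDER_TAG = 1, 2


def map_user(record):
    """Emit (user_id, <tag, name>) for one row of the user table."""
    user_id, name = record
    return user_id, (USER_TAG, name)


def map_order(record):
    """Emit (user_id, <tag, order_id>) for one row of the order table."""
    user_id, order_id = record
    return user_id, (ORDER_TAG, order_id)


# Sample rows matching the example data.
users = [(1, "张三"), (2, "李四")]
orders = [(1, "001"), (1, "002"), (2, "003")]

# Mapper output from both tables, not yet grouped by key.
map_output = [map_user(u) for u in users] + [map_order(o) for o in orders]
```

At this point the pairs are still in input order; nothing has been grouped yet.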

Shuffle and Sort

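The framework sorts the map output by key and routes every pair with the same key to the same reducer, so each user's record and that user's orders arrive together.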
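A rough single-process Python sketch of this step follows. In a real cluster the grouping happens across machines; here a sort followed by groupby stands in for it. The function name shuffle_and_sort is hypothetical, and the input repeats the example's tagged pairs.

```python
from itertools import groupby
from operator import itemgetter

# Tagged mapper output from the example: (user_id, (table_tag, field)).
map_output = [
    (1, (1, "张三")),
    (2, (1, "李四")),
    (1, (2, "001")),
    (1, (2, "002")),
    (2, (2, "003")),
]


def shuffle_and_sort(pairs):
    """Sort pairs by key, then collect the values that share each key."""
    ordered = sorted(pairs, key=itemgetter(0))  # stable sort on the join key
    return {
        key: [value for _key, value in group]
        for key, group in groupby(ordered, key=itemgetter(0))
    }


grouped = shuffle_and_sort(map_output)
```

Each entry of grouped corresponds to the input of one reduce call.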

Reduce Phase

The reducer combines records with the same key to produce the final joined result:

name   order_id
张三   001
张三   002
李四   003
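The reduce step can be sketched in Python as well. This is a simplified illustration that assumes the tag convention from the example (1 for user, 2 for order). The function name reduce_join is hypothetical.

```python
# Tags identifying the source table of each value (taken from the example).
USER_TAG, ORDER_TAG = 1, 2


def reduce_join(key, values):
    """Pair every user name with every order ID that shares this user_id."""
    names = [field for tag, field in values if tag == USER_TAG]
    order_ids = [field for tag, field in values if tag == ORDER_TAG]
    return [(name, order_id) for name in names for order_id in order_ids]


# Grouped values as they would arrive after shuffle and sort.
grouped = {
    1: [(1, "张三"), (2, "001"), (2, "002")],
    2: [(1, "李四"), (2, "003")],
}

joined = [
    row
    for key in sorted(grouped)
    for row in reduce_join(key, grouped[key])
]
```

A key that has values from only one table produces no output rows, which matches the behavior of an inner join.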

In this way, Hive lets a single SQL statement stand in for a hand-written MapReduce job.
