
Hive SQL Table Creation, Data Loading, and Query Examples for Student, Course, Teacher, and Score Datasets

This article shows how to create Hive tables for student, course, teacher, and score data, prepare the data files, and load them into Hive. It then walks through a set of Hive SQL queries covering retrieval, aggregation, ranking, and statistical analysis of the dataset.

Big Data Technology & Architecture

This guide defines four Hive tables (student, course, teacher, and score) with string and integer columns, using a tab-delimited row format.

create table student(s_id string,s_name string,s_birth string,s_sex string) row format delimited fields terminated by '\t';
create table course(c_id string,c_name string,t_id string) row format delimited fields terminated by '\t';
create table teacher(t_id string,t_name string) row format delimited fields terminated by '\t';
create table score(s_id string,c_id string,s_score int) row format delimited fields terminated by '\t';
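One way to confirm that the definitions took effect is to inspect them from the Hive CLI or Beeline. The sketch below assumes the four tables were created in the current database and uses only standard Hive statements.

```sql
-- List the tables in the current database; student, course, teacher,
-- and score should appear.
show tables;

-- Print column names and types for one table.
describe student;

-- Show storage details for the same table, including the field delimiter.
describe formatted student;
```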

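Sample data files are created with vi (e.g., /export/data/hivedatas/student.csv). Despite the .csv extension, the fields in each file must be separated by tabs to match the table definitions. Each student record holds an ID, a name, a birth date, and a gender (男 = male, 女 = female).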

01 赵雷 1990-01-01 男
02 钱电 1990-12-21 男
03 孙风 1990-05-20 男
04 李云 1990-08-06 男
05 周梅 1991-12-01 女
06 吴兰 1992-03-01 女
07 郑竹 1989-07-01 女
08 王菊 1990-01-20 女

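Data files for course, teacher, and score are prepared the same way. The course names 语文, 数学, and 英语 are Chinese, Math, and English.

course.csv (course ID, course name, teacher ID):

01 语文 02
02 数学 01
03 英语 03

teacher.csv (teacher ID, teacher name):

01 张三
02 李四
03 王五

score.csv (student ID, course ID, score):
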
01 01 80
01 02 90
01 03 99
02 01 70
02 02 60
02 03 80
03 01 80
03 02 80
03 03 80
04 01 50
04 02 30
04 03 20
05 01 76
05 02 87
06 01 31
06 03 34
07 02 89
07 03 98

These CSV files are loaded into Hive tables with LOAD DATA statements.

load data local inpath '/export/data/hivedatas/student.csv' into table student;
load data local inpath '/export/data/hivedatas/course.csv' into table course;
load data local inpath '/export/data/hivedatas/teacher.csv' into table teacher;
load data local inpath '/export/data/hivedatas/score.csv' into table score;
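A quick sanity check after loading can catch delimiter problems early. The sketch below assumes the sample files shown above were loaded exactly once into empty tables, and that the Hive version accepts a top-level UNION ALL (0.13 or later); the aliases tbl and row_cnt are illustrative names.

```sql
-- Row counts should match the sample files: 8, 3, 3, and 18.
select 'student' as tbl, count(*) as row_cnt from student
union all
select 'course'  as tbl, count(*) as row_cnt from course
union all
select 'teacher' as tbl, count(*) as row_cnt from teacher
union all
select 'score'   as tbl, count(*) as row_cnt from score;

-- If the file is not tab-delimited, the whole line lands in the first
-- column and the remaining columns come back as NULL.
select * from score where s_score is null;
```

The second query should return no rows when the files are correctly tab-separated.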

As a reminder, the general shape of a Hive SELECT statement is:

SELECT [ALL | DISTINCT] select_expr, select_expr, ...
FROM table_reference
[WHERE where_condition]
[GROUP BY col_list [HAVING condition]]
[ORDER BY col_list]
[CLUSTER BY col_list | [DISTRIBUTE BY col_list] [SORT BY col_list]]
[LIMIT number];
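To make the clause order concrete, here is a small sketch against the score table that uses WHERE, GROUP BY, HAVING, ORDER BY, and LIMIT together. The threshold of 60 and the alias avg_score are illustrative choices.

```sql
-- Average score per course, keeping only courses whose average is at
-- least 60, highest average first.
select c_id,
       round(avg(s_score), 2) as avg_score
from score
where s_score is not null
group by c_id
having avg(s_score) >= 60
order by avg_score desc
limit 3;
```

With the sample data, all three courses pass the HAVING filter, and course 02 has the highest average.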

Following this, the article lists more than forty practical Hive queries (labeled 1‑50) that illustrate common analytical tasks, such as:

Finding students whose scores in course "01" are higher than those in course "02".

Calculating average scores per student and filtering by a threshold.

Counting students per gender, per age group, or per birth month.

Ranking students by total or average score using row_number() (where supported).

Identifying students who have taken all courses, or who have missed specific courses.

Aggregating statistics per course (max, min, average, pass rate, etc.).

Finding teachers with a specific surname, or students taught by a particular teacher.

Generating top‑N lists per course and overall.
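Two of the tasks above can be sketched as follows. These are minimal illustrations written for this summary, not the article's own numbered queries. The first assumes a Hive version with window functions (0.11 or later); the aliases ranked, rn, st, and sc are illustrative.

```sql
-- Top two scores per course. row_number() breaks ties arbitrarily;
-- rank() or dense_rank() would keep tied students together.
select c_id, s_id, s_score, rn
from (
  select c_id, s_id, s_score,
         row_number() over (partition by c_id order by s_score desc) as rn
  from score
) ranked
where rn <= 2;

-- Students with no score recorded for course '03'.
-- The left join keeps every student; unmatched rows have sc.s_id = NULL.
select st.s_id, st.s_name
from student st
left join score sc on st.s_id = sc.s_id and sc.c_id = '03'
where sc.s_id is null;
```

With the sample data, the second query returns students 05 and 08.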

Each query is presented in a SELECT ... block, often with two alternative versions (e.g., using JOIN vs. comma‑separated tables). The examples cover joins, subqueries, group by, having, union, left join, case expressions, and window functions.
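As one illustration of combining a join, GROUP BY, and a CASE expression, the sketch below computes per-course statistics. It assumes a passing mark of 60, consistent with the failing threshold used in Example 15; the column aliases are illustrative.

```sql
-- Max, min, average, and pass rate per course.
-- In Hive the / operator returns a double, so no cast is needed.
select c.c_id,
       c.c_name,
       max(s.s_score) as max_score,
       min(s.s_score) as min_score,
       round(avg(s.s_score), 2) as avg_score,
       round(sum(case when s.s_score >= 60 then 1 else 0 end) / count(*), 2) as pass_rate
from course c
join score s on c.c_id = s.c_id
group by c.c_id, c.c_name;
```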

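-- Example 1: Students whose score in course '01' is higher than in course '02'.
-- Aliases that start with a digit are backtick-quoted to keep them unambiguous.
-- Students without a '02' score compare against NULL and are dropped by the WHERE clause.
select student.*, a.s_score as `01_score`, b.s_score as `02_score`
from student
join score a on student.s_id = a.s_id and a.c_id = '01'
left join score b on student.s_id = b.s_id and b.c_id = '02'
where a.s_score > b.s_score;

-- Example 15: ID, name, and average score of students failing two or more courses (score < 60).
-- tmp2 keeps students with more than one failing score;
-- tmp holds each student's average over all scores, rounded to the nearest whole number.
select student.s_id, student.s_name, tmp.avg_score
from student
inner join (
  select s_id from score where s_score < 60 group by s_id having count(s_id) > 1
) tmp2 on student.s_id = tmp2.s_id
left join (
  select s_id, round(avg(s_score)) avg_score from score group by s_id
) tmp on tmp.s_id = student.s_id;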
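For comparison, Example 1 can also be written with comma-separated tables, the alternative style mentioned earlier. This is a sketch that assumes a Hive version supporting implicit join notation (0.13 or later); the alias st is illustrative.

```sql
-- Same result as Example 1: the join conditions move into WHERE.
select st.*, a.s_score as `01_score`, b.s_score as `02_score`
from student st, score a, score b
where st.s_id = a.s_id
  and st.s_id = b.s_id
  and a.c_id = '01'
  and b.c_id = '02'
  and a.s_score > b.s_score;
```

With the sample data, both forms return students 02 and 04. The explicit JOIN form is usually easier to read because it keeps join conditions apart from row filters.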

Later sections also show how to compute age from birth date, find birthdays within the current week/month, and calculate ranking statistics.

select s_name, s_birth,
  (year(current_date) - year(s_birth) -
    case when month(current_date) < month(s_birth) then 1
         when month(current_date) = month(s_birth) and day(current_date) < day(s_birth) then 1
         else 0 end) as age
from student;
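The birth-month tasks can be sketched in the same style. The queries below assume s_birth is stored as a yyyy-MM-dd string, as in the sample data, and that current_date is available (Hive 1.2 or later); the aliases are illustrative.

```sql
-- Students whose birthday falls in the current calendar month.
select s_id, s_name, s_birth
from student
where month(s_birth) = month(current_date);

-- Number of students born in each month.
select month(s_birth) as birth_month, count(*) as student_cnt
from student
group by month(s_birth)
order by birth_month;
```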


Overall, the document serves as a practical reference for Hive users needing to create tables, import data, and perform a wide range of analytical queries on educational datasets.

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
