Sqoop Tutorial: Importing and Exporting Data between Relational Databases, HDFS, Hive, and HBase
This article provides a comprehensive guide to using Sqoop for importing data from relational databases into HDFS, Hive, and HBase, as well as exporting data back to databases, covering command syntax, options, and practical examples for big‑data workflows.
Sqoop is a tool that translates user‑written commands into MapReduce jobs to transfer data between relational databases and Hadoop components such as HDFS, Hive, and HBase.
Import to HDFS – examples of full‑table import, column‑specific import, and filtered import using --where or custom --query statements.
bin/sqoop import \
--connect jdbc:mysql://hadoop102:3306/test \
--username root \
--password 123 \
--table t_emp \
--target-dir /sqoopTest \
--delete-target-dir \
--fields-terminated-by "\t" \
--num-mappers 2 \
--split-by id \
--columns id,name,ageImport to Hive – use --hive-import and --hive-overwrite to load data directly into a Hive table, with notes on table creation and field delimiters.
bin/sqoop import \
--connect jdbc:mysql://hadoop102:3306/test \
--username root \
--password 123 \
--query 'select * from t_emp where id>3 and $CONDITIONS' \
--target-dir /sqoopTest \
--fields-terminated-by "\t" \
--delete-target-dir \
--hive-import \
--hive-overwrite \
--hive-table t_emp \
--num-mappers 1 \
--split-by idImport to HBase – enable --hbase-create-table, specify --hbase-table, --hbase-row-key, and column family with --column-family. Compatibility issues may require manual table creation.
bin/sqoop import \
--connect jdbc:mysql://hadoop102:3306/test \
--username root \
--password 123 \
--query 'select * from t_emp where id>3 and $CONDITIONS' \
--target-dir /sqoopTest \
--delete-target-dir \
--hbase-create-table \
--hbase-table "t_emp" \
--hbase-row-key "id" \
--column-family "info" \
--num-mappers 2 \
--split-by idExport from HDFS to a relational database – basic export command, handling empty vs. non‑empty target tables, and conflict resolution using --update-key with updateonly or allowinsert modes.
bin/sqoop export \
--connect 'jdbc:mysql://hadoop103:3306/mydb?useUnicode=true&characterEncoding=utf-8' \
--username root \
--password 123456 \
--table t_emp2 \
--export-dir /hive/t_emp \
--input-fields-terminated-by "\t" \
--update-key id \
--update-mode allowinsertAdditional tips include using --null-string and --null-non-string for proper NULL handling, compressing output with --compress, and troubleshooting steps such as checking MySQL binlogs.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
