Integrating SparkSQL with Hive: Configuration, MetaStore Setup, and Example Scala Code

This article explains the difference between Spark on Hive and Hive on Spark, gives step-by-step instructions for configuring the Hive MetaStore and setting up SparkSQL to use Hive, and walks through a complete Scala program that creates a Hive table, loads data into it, and queries it.

Big Data Technology & Architecture

Differences between Spark on Hive and Hive on Spark

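Spark on Hive: SparkSQL reads table metadata from the Hive MetaStore, parses and optimizes the Hive statements itself, and executes them as Spark jobs on RDDs. Hive on Spark: Hive still parses and plans the query but replaces its traditional MapReduce execution engine with Spark RDDs, which requires recompilation and additional JARs.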

Prerequisites

Refer to the official Apache Spark documentation for SQL data sources with Hive tables: http://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html .

Configuration of Hive

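Modify hive/conf/hive-site.xml to set the warehouse directory, select a remote MetaStore rather than a local one (hive.metastore.local=false), and specify the MetaStore Thrift URI. Since Hive 0.10 the hive.metastore.local property is no longer needed: setting hive.metastore.uris is enough to select remote mode, so the property below is harmless but redundant on newer versions.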

<?xml version="1.0"?>
<configuration>
    <property>
      <name>hive.metastore.warehouse.dir</name>
      <value>/user/hive/warehouse</value>
    </property>
    <property>
      <name>hive.metastore.local</name>
      <value>false</value>
    </property>
    <property>
      <name>hive.metastore.uris</name>
      <value>thrift://node01:9083</value>
    </property>
</configuration>

Start the Hive MetaStore service:

nohup /export/servers/hive/bin/hive --service metastore >> /var/log.log 2>&1 &
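Because the service is started in the background, it can be useful to confirm that the Thrift port is accepting connections before pointing Spark at it. The following is a minimal sketch, not part of the original setup: the object name MetaStoreCheck and the helper isReachable are hypothetical, and it only tests that a TCP connection to node01:9083 succeeds, not that the MetaStore itself is healthy.

```scala
import java.io.IOException
import java.net.{InetSocketAddress, Socket}

// Hypothetical helper: checks only TCP reachability of the MetaStore port.
object MetaStoreCheck {
  /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
  def isReachable(host: String, port: Int, timeoutMs: Int = 2000): Boolean = {
    val socket = new Socket()
    try {
      socket.connect(new InetSocketAddress(host, port), timeoutMs)
      true
    } catch {
      // Covers connection refused, timeouts, and unknown hosts
      case _: IOException => false
    } finally {
      socket.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // node01:9083 matches hive.metastore.uris in hive-site.xml
    println(s"MetaStore reachable: ${isReachable("node01", 9083)}")
  }
}
```

If this prints false, checking the log file written by the nohup command is a reasonable next step.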

SparkSQL integration with Hive MetaStore

Copy the Hive and Hadoop configuration files into Spark’s configuration directory so Spark can access the MetaStore and HDFS warehouse.

cp /export/servers/hive-1.1.0-cdh5.14.0/conf/hive-site.xml /export/servers/spark/conf
cp /export/servers/hadoop-2.6.0-cdh5.14.0/etc/hadoop/core-site.xml /export/servers/spark/conf
cp /export/servers/hadoop-2.6.0-cdh5.14.0/etc/hadoop/hdfs-site.xml /export/servers/spark/conf

Tip: when testing locally in IDEA, place these three files in the project's resources directory so that they end up on the classpath, which is where Spark looks for them.
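A quick way to check that the copied hive-site.xml is actually visible on the classpath is to load it as a resource and print the MetaStore URI it contains. The sketch below assumes the Hadoop-style property layout shown earlier; the object name HiveSiteReader and its parse method are hypothetical and use only the JDK's XML parser.

```scala
import java.io.InputStream
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element

// Hypothetical helper: reads <property><name/><value/></property> pairs.
object HiveSiteReader {
  /** Parses a Hadoop-style configuration file into a name -> value map. */
  def parse(in: InputStream): Map[String, String] = {
    val doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in)
    val props = doc.getElementsByTagName("property")
    (0 until props.getLength).flatMap { i =>
      val prop = props.item(i).asInstanceOf[Element]
      val names = prop.getElementsByTagName("name")
      val values = prop.getElementsByTagName("value")
      // Skip malformed entries that lack a name or a value
      if (names.getLength > 0 && values.getLength > 0)
        Some(names.item(0).getTextContent.trim -> values.item(0).getTextContent.trim)
      else
        None
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    val in = getClass.getClassLoader.getResourceAsStream("hive-site.xml")
    if (in == null) {
      println("hive-site.xml not found on the classpath")
    } else {
      try println(parse(in).get("hive.metastore.uris"))
      finally in.close()
    }
  }
}
```

A "not found" message usually means the file was placed outside the resources directory or the project was not rebuilt after copying it.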

Example Scala program

import org.apache.spark.sql.SparkSession

object HiveSupport {
  def main(args: Array[String]): Unit = {
    // Create SparkSession with Hive support
    val spark = SparkSession.builder()
      .appName("HiveSupport")
      .master("local[*]")
      .config("spark.sql.warehouse.dir", "hdfs://node01:8020/user/hive/warehouse")
      .config("hive.metastore.uris", "thrift://node01:9083")
      .enableHiveSupport() // connect to the Hive MetaStore and enable Hive SerDes and UDFs
      .getOrCreate()

    spark.sparkContext.setLogLevel("WARN")

    // Show existing tables
    spark.sql("show tables").show()

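    // Create the table if it does not exist yet, so the program can be rerun
    spark.sql("CREATE TABLE IF NOT EXISTS person (id int, name string, age int) row format delimited fields terminated by ' '")

    // Load data from a local file (the path is relative to the working directory);
    // without OVERWRITE, each run appends the file's rows to the table again
    spark.sql("LOAD DATA LOCAL INPATH 'in/person.txt' INTO TABLE person")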

    // Query the table
    spark.sql("select * from person").show()

    spark.stop()
  }
}
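The program expects in/person.txt to contain one record per line with the three fields separated by a single space, matching the table's field delimiter. The original file is not shown, so the sketch below generates a stand-in: the object name PersonFile and the sample rows are made up purely for illustration.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, Paths}

// Hypothetical generator for a test input file; the rows are invented sample data.
object PersonFile {
  val sampleRows: Seq[(Int, String, Int)] =
    Seq((1, "alice", 30), (2, "bob", 25), (3, "carol", 41))

  /** Formats one (id, name, age) row with single-space delimiters. */
  def format(row: (Int, String, Int)): String = s"${row._1} ${row._2} ${row._3}"

  /** Writes the rows to path, one per line, creating parent directories if needed. */
  def write(path: Path, rows: Seq[(Int, String, Int)]): Unit = {
    val parent = path.toAbsolutePath.getParent
    if (parent != null) Files.createDirectories(parent)
    val content = rows.map(format).mkString("", "\n", "\n")
    Files.write(path, content.getBytes(StandardCharsets.UTF_8))
  }

  def main(args: Array[String]): Unit =
    write(Paths.get("in/person.txt"), sampleRows)
}
```

Since the delimiter is a space, a name that itself contains a space would be split across columns; a different delimiter in both the file and the CREATE TABLE statement would avoid that.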

Before running the program, list the existing tables in the Hive shell (show tables;). After it runs, the new person table appears in Hive, and its contents can be verified both in the SparkSQL output and from the Hive CLI.

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
