
Spark Thrift Server: Introduction, Deployment Guide, Architecture, and Comparison with HiveServer2

This article introduces Spark Thrift Server, explains how to deploy it by copying configuration files and required JARs, details its architecture and SQL execution flow, compares it with HiveServer2, and discusses its advantages, limitations, and practical suitability.


Spark Thrift Server is a Thrift service that the Spark community built on top of HiveServer2. It aims for seamless compatibility with HiveServer2's APIs and protocols, so users can access Spark SQL through Hive's Beeline client.
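A minimal sketch of what that client access looks like, assuming a server running on a host named thrift-host at the default HiveServer2 port 10000 (hostname, port, and user are placeholders):

```shell
# Connect with Beeline exactly as you would to HiveServer2
beeline -u "jdbc:hive2://thrift-host:10000" -n etl_user

# Inside the Beeline session, ordinary SQL statements work, e.g.:
#   0: jdbc:hive2://thrift-host:10000> SHOW DATABASES;
#   0: jdbc:hive2://thrift-host:10000> SELECT count(*) FROM web_logs;
```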

Deployment steps:

1. Copy hive-site.xml, hdfs-site.xml, and core-site.xml into spark/conf.
2. If the Hive Metastore version is not 1.2, set hive.metastore.schema.verification=false in hive-site.xml.
3. Copy the required JARs into spark/jars, for example: cp hive/lib/hive-shims* spark/jars and cp hadoop/share/hadoop/yarn/hadoop-yarn-server-resourcemanager-2.7.4.jar spark/jars.
4. Start the server. Because the Thrift Server is itself submitted as a Spark job, a YARN queue must be specified: sbin/start-thriftserver.sh --hiveconf spark.yarn.queue=root.bigdata.date.
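The steps above can be collected into a single sketch. The HIVE_HOME, HADOOP_HOME, and SPARK_HOME variables and the config-file locations under them are assumptions about a typical layout, not part of the original instructions:

```shell
# 1. Copy Hive/Hadoop client configs into Spark's conf directory
cp "$HIVE_HOME/conf/hive-site.xml" \
   "$HADOOP_HOME/etc/hadoop/hdfs-site.xml" \
   "$HADOOP_HOME/etc/hadoop/core-site.xml" \
   "$SPARK_HOME/conf/"

# 2. If the Metastore version is not 1.2, disable schema verification
#    by setting this property in hive-site.xml:
#      hive.metastore.schema.verification=false

# 3. Copy the JARs the server needs at runtime
cp "$HIVE_HOME"/lib/hive-shims* "$SPARK_HOME/jars/"
cp "$HADOOP_HOME/share/hadoop/yarn/hadoop-yarn-server-resourcemanager-2.7.4.jar" \
   "$SPARK_HOME/jars/"

# 4. The Thrift Server is submitted as a Spark application, so name a YARN queue
"$SPARK_HOME/sbin/start-thriftserver.sh" --hiveconf spark.yarn.queue=root.bigdata.date
```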

Architecture: Spark Thrift Server reuses most of HiveServer2's code, keeping the same CLIService, OperationManager, and other request‑handling components. It implements its own SparkSQLCLIService, SparkSQLSessionManager, and SparkSQLOperationManager, plus a custom SparkExecuteStatementOperation that actually runs SQL statements.

SQL execution: The server runs as a Spark application (driver) submitted via spark-submit in client mode. When a SQL request arrives, it is delegated to SparkExecuteStatementOperation, which obtains a SQLContext and calls SQLContext.sql() to execute the statement, just like a regular Spark job.

Differences from HiveServer2: While the interfaces are identical, Spark Thrift Server runs as a single long‑lived Spark application, tying its resource usage to one application rather than leveraging dynamic cluster scheduling fully. It also inherits HiveServer2's code for metadata operations, table creation, etc.
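Because the whole server is one long‑lived Spark application, it appears as a single entry in YARN. A quick way to confirm this (the application name can vary by Spark version; "Thrift JDBC/ODBC Server" is the usual default):

```shell
# List running YARN applications; the Thrift Server shows up once,
# and every user's queries share that one application's resources
yarn application -list -appStates RUNNING | grep -i "thrift"
```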

Advantages:

Generally better performance than Hive on Spark.

Active SparkSQL community with frequent releases, leading to continual performance improvements.

Disadvantages:

Resource allocation is limited to the single application; without dynamic allocation it cannot fully utilize cluster resources, making it less suitable for enterprise workloads.

Official Spark JIRA (e.g., SPARK-11100) indicates limited enthusiasm from the Spark project for this component.

Built on Hive 1.2, so incompatibilities may arise with newer Metastore versions.

Conclusion: Spark Thrift Server is essentially a lightweight modification of HiveServer2 with modest code changes. Although it offers a familiar Hive interface, its architecture as a single Spark application makes it more appropriate for experimental or internal quick‑query scenarios rather than production‑grade enterprise deployments.

Tags: architecture, big data, SQL, deployment, Spark, Thrift Server, HiveServer2
Written by

Big Data Technology Architecture

Exploring Open Source Big Data and AI Technologies
