Hive Performance Tuning: Parallel Execution, Strict Mode, JVM Reuse, and Speculative Execution

This article explains Hive performance tuning techniques, including enabling parallel execution, configuring strict mode to prevent risky queries, reusing JVMs to reduce overhead, and using speculative execution to mitigate slow tasks, with configuration examples and practical considerations.

Big Data Technology & Architecture

Nov 1, 2020

Below is a review of Hive performance tuning techniques, covering parallel execution, strict mode, JVM reuse, and speculative execution.

Parallel Execution

Enable task parallelism with the following settings:

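-- Enable parallel execution of independent stages.
set hive.exec.parallel=true;
-- Maximum number of stages of a single SQL statement that may run in parallel; the default is 8.
set hive.exec.parallel.thread.number=16;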

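Parallel execution pays off only when the query contains stages that do not depend on each other and the cluster has spare capacity. On a busy cluster the stages compete for the same resources, so little is gained.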
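As a rough illustration, the query below contains two aggregations that do not depend on each other, which is the kind of shape that can benefit from parallel execution. The tables orders_2019 and orders_2020 and the column user_id are hypothetical names chosen for this sketch, not part of the original article.

```sql
-- Hypothetical tables: orders_2019 and orders_2020, each with a user_id column.
SET hive.exec.parallel=true;
SET hive.exec.parallel.thread.number=16;

-- The two branches of the UNION ALL read different tables and share no
-- intermediate result, so their stages are candidates for running concurrently.
SELECT t.src, t.cnt
FROM (
  SELECT '2019' AS src, COUNT(DISTINCT user_id) AS cnt FROM orders_2019
  UNION ALL
  SELECT '2020' AS src, COUNT(DISTINCT user_id) AS cnt FROM orders_2020
) t;
```

Prefixing the query with EXPLAIN prints the stage dependencies, which is one way to check whether a given query has independent stages at all.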

Strict Mode

Hive provides a strict mode to prevent execution of high‑risk queries. Set hive.mapred.mode to strict to enable it.

<property>
  <name>hive.mapred.mode</name>
  <value>strict</value>
  <description>
    The mode in which the Hive operations are being performed.
    In strict mode, some risky queries are not allowed to run. They include:
      Cartesian Product.
      No partition being picked up for a query.
      Comparing bigints and strings.
      Comparing bigints and doubles.
      Orderby without limit.
  </description>
</property>

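For partitioned tables, a query is rejected unless its WHERE clause filters on a partition column, which prevents accidental scans of every partition.

Queries that use ORDER BY must include a LIMIT clause. ORDER BY sends all rows through a single reducer to produce a global ordering, so an unbounded sort can run for a very long time.

Cartesian product queries are disallowed. A relational database can often turn a join condition written in the WHERE clause into an efficient join, but Hive does not perform that optimization, so a JOIN without an ON clause can produce an enormous intermediate result.

Comparisons between bigint and string values, or between bigint and double values, are also rejected, because the implicit type conversion can lose precision.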
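The restrictions above can be sketched with a few queries. The tables page_views (partitioned by dt) and users, and all column names, are hypothetical and used only for illustration. The queries that strict mode is expected to reject are left commented out, and the exact error text depends on the Hive version.

```sql
SET hive.mapred.mode=strict;

-- Hypothetical tables:
--   page_views(user_id BIGINT, url STRING) PARTITIONED BY (dt STRING)
--   users(user_id BIGINT, name STRING)

-- Expected to be rejected: no partition filter on a partitioned table.
-- SELECT user_id, url FROM page_views;

-- Accepted: the WHERE clause restricts the scan to one partition.
SELECT user_id, url
FROM page_views
WHERE dt = '2020-11-01';

-- Expected to be rejected: ORDER BY without LIMIT.
-- SELECT user_id, url FROM page_views WHERE dt = '2020-11-01' ORDER BY user_id;

-- Accepted: the LIMIT bounds the output of the single sorting reducer.
SELECT user_id, url
FROM page_views
WHERE dt = '2020-11-01'
ORDER BY user_id
LIMIT 100;

-- Expected to be rejected: JOIN without ON is treated as a Cartesian product.
-- SELECT v.url, u.name FROM page_views v JOIN users u
-- WHERE v.user_id = u.user_id AND v.dt = '2020-11-01';

-- Accepted: the join condition is stated in the ON clause.
SELECT v.url, u.name
FROM page_views v
JOIN users u ON v.user_id = u.user_id
WHERE v.dt = '2020-11-01';
```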

JVM Reuse

Reusing JVM instances reduces the overhead of launching a new JVM for each map or reduce task, which is especially useful for jobs with many short‑lived tasks.

<property>
  <name>mapreduce.job.jvm.numtasks</name>
  <value>10</value>
  <description>How many tasks to run per jvm. If set to -1, there is
  no limit.
  </description>
</property>

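In Hive you can set the same limit for a session. The statement below uses the older property name, mapred.job.reuse.jvm.num.tasks, which Hadoop 2 keeps as a deprecated alias of mapreduce.job.jvm.numtasks: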

set mapred.job.reuse.jvm.num.tasks=10;

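Note that with JVM reuse the job holds on to its task slots until the whole job finishes. If the job is unbalanced and a few reduce tasks run much longer than the rest, the slots reserved by the finished tasks sit idle and cannot be used by other jobs in the meantime.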
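A minimal session-level sketch using the property name from the XML snippet above. Whether the setting has any effect depends on the execution framework and Hadoop version of the cluster, which this sketch assumes supports JVM reuse.

```sql
-- Allow each JVM to run up to 10 tasks of the same job before it exits.
SET mapreduce.job.jvm.numtasks=10;

-- SET with only a property name prints the value currently in effect.
SET mapreduce.job.jvm.numtasks;
```

For a job whose reducers are known to be badly skewed, setting the value back to 1 for that session avoids holding idle slots.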

Speculative Execution

Speculative execution launches duplicate tasks for slow‑running map or reduce tasks, using the result of the task that finishes first.

Enable it in Hadoop’s mapred-site.xml:

<property>
  <name>mapreduce.map.speculative</name>
  <value>true</value>
  <description>If true, then multiple instances of some map tasks 
               may be executed in parallel.</description>
</property>

<property>
  <name>mapreduce.reduce.speculative</name>
  <value>true</value>
  <description>If true, then multiple instances of some reduce tasks 
               may be executed in parallel.</description>
</property>

Hive also provides its own setting:

<property>
  <name>hive.mapred.reduce.tasks.speculative.execution</name>
  <value>true</value>
  <description>Whether speculative execution for reducers should be turned on.</description>
</property>

Whether to enable speculative execution depends on workload characteristics; it can be disabled for latency‑sensitive jobs or enabled for large‑scale jobs where task stragglers would otherwise delay completion.
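Because the choice depends on the workload, it can be convenient to override these properties for a single Hive session instead of changing the cluster-wide files. The sketch below assumes the cluster configuration does not mark the properties as final. It uses only the three property names shown above.

```sql
-- Turn speculative execution off for the jobs launched from this session.
SET mapreduce.map.speculative=false;
SET mapreduce.reduce.speculative=false;
SET hive.mapred.reduce.tasks.speculative.execution=false;

-- Print the values now in effect for this session.
SET mapreduce.map.speculative;
SET mapreduce.reduce.speculative;
SET hive.mapred.reduce.tasks.speculative.execution;
```

Setting the same properties to true in another session re-enables speculation for jobs where stragglers are the main concern.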


Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
