Big Data 20 min read

How to Upgrade Hive from 0.13 to 2.1 Without Downtime: Tips, Pitfalls, and Best Practices

This article walks through a gray‑scale, controlled upgrade of Hive from version 0.13 to 2.1, covering metadata schema analysis, syntax compatibility, new Hive‑2.1 features, UDF adjustments, performance tweaks, and a step‑by‑step procedure to ensure stability and zero service interruption.

dbaplus Community

Mar 20, 2018

How to Upgrade Hive from 0.13 to 2.1 Without Downtime: Tips, Pitfalls, and Best Practices

Background

Hive is the most widely used SQL engine on big‑data platforms, providing a SQL abstraction layer that translates queries into distributed jobs on engines such as MapReduce, Spark, or Tez. Its metadata model has become a de‑facto standard, and most modern SQL engines support Hive‑compatible metastore schemas.

Upgrade Motivation

Our team performed a cross‑version upgrade from Hive 0.13 to Hive 2.1 in a production environment handling more than 80 000 daily SQL queries, mixed workloads (ETL on Spark, ad‑hoc queries on Presto/Kylin), and a variety of custom SQL syntax.

Key Upgrade Goals

Gray‑scale rollout with full control.

Maintain service availability throughout the upgrade.

Ensure stability and compatibility of existing workloads.

1. Metadata Schema Upgrade

Upgrading only the Hive binaries without a careful analysis of the metastore schema can cause obscure errors. The 0.13‑to‑2.1 upgrade scripts reside in ${HIVE_HOME}/metastore/scripts/upgrade/${DB_TYPE}/ and contain 17 sub‑scripts. They are classified as follows:

Recommended upgrade (hive‑13076) : adds table‑level primary‑key and foreign‑key support. Low risk; apply before the gray‑scale client rollout.

Transaction‑related scripts : only needed if Hive transactions are enabled.

Hcatalog changes : required only when Hcatalog is used.

Other non‑critical changes : indexes, column length adjustments, etc.

For most environments that do not use Hive transactions, Hcatalog, or Avro export, only hive‑13076 needs to be applied.

2. Syntax Compatibility

Hive 2.1 enforces stricter ANSI‑SQL compliance, introducing new reserved words and changing return types of some functions. Common issues and fixes include:

New reserved words – disable checking with set hive.support.sql11.reserved.keywords=false.

New keywords (e.g., OFFSET, SUBQUERY, PRIMARY) – wrap conflicting identifiers in backticks.

Case‑sensitive configuration parameters – ensure correct case or disable strict checking.

Missing column types in CREATE TABLE – always specify explicit types.

Unnamed derived columns – give meaningful aliases that do not start with an underscore.

Window frame boundaries – avoid zero‑preceding values; use current row instead.

Union‑all queries – only the final SELECT may contain ORDER BY, CLUSTER BY, DISTRIBUTE BY, SORT BY, or LIMIT.

Example of a failing CREATE statement and its fix:

create table test_table_null_no_type as select null as id, "zhangsan" as name;

Fix:

create table test_table_fixed (id int, name string) as select null as id, "zhangsan" as name;

3. Pre‑upgrade Compatibility Testing

Before the actual upgrade, spin up a Hive 2.1 HiveServer2 instance and run EXPLAIN on a representative set of production SQLs to catch syntax errors early. Collect workloads via custom client hooks or by replaying MapReduce/Spark event logs.

4. New Features in Hive 2.1

ORC & Parquet support : ORC is preferred for its vectorized execution and better integration with Hive/Presto. Compression choices: zlib for large, infrequently accessed ODS tables; snappy for hot DM tables.

Cost‑Based Optimizer (CBO) : still experimental; works best with primary‑key/foreign‑key metadata.

Vectorized execution : currently only for ORC files.

Hive on Spark / Tez : Spark is the primary acceleration path; Tez adoption remains low.

5. UDF Compatibility

Some built‑in UDFs changed behavior between versions: trim no longer accepts non‑string types – copy the 0.13 implementation if needed. date_add and date_sub now return date instead of string. Roll back to 0.13 code or adjust queries accordingly.

6. Known Issues & Work‑arounds

Metastore schema version mismatch – set hive.metastore.schema.verification=false or disable the version check entirely.

Metastore server memory leaks (BoneCP+DirectSQL) – apply patch from HIVE‑15551 or upgrade to 2.2.

HiveServer2 impersonation changes – configure proxy users on the NameNode or revert to the old CreateRemoteUser mechanism.

Operation log not sent to client – set hive.async.log.enabled=false (see HIVE‑14183).

PermGen OOM – add export HADOOP_OPTS="-XX:MaxPermSize=128m" to hive-env.sh.

Parallel compilation disabled by default – set hive.driver.parallel.compilation=true to allow concurrent query compilation.

Stats collection now HDFS‑based and single‑threaded – consider disabling the task or reverting to DB‑based stats.

Column pruning may reorder columns causing ArrayIndexOutOfBoundsException – limit queries or apply HIVE‑14564 fix.

7. Upgrade Procedure

Backup the metastore schema.

Batch‑validate production SQLs against a Hive 2.1 instance and resolve compatibility issues.

Apply only the necessary metadata scripts (e.g., hive‑13076).

Patch custom UDFs and code paths for known incompatibilities.

Gray‑scale upgrade Hive 2.1 client, then HiveServer2, finally Metastore Server.

After client stability, optionally upgrade the remaining metadata to 2.1.1.

Upgrade JDBC drivers if required.

8. Future Plans

We intend to migrate most MapReduce‑based workloads to SparkSQL, keep Hive as a fallback, and build a unified SQL routing layer that combines Kylin, Presto, SparkSQL, and Hive. Data‑caching will leverage hot‑table snapshots and automatic Kylin cube creation. Storage will gradually adopt CarbonData while maintaining Hive‑compatible metadata.

References

Hive reserved‑word list: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Keywords,Non-reservedKeywordsandReservedKeywords

UDF date‑function documentation: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions

Issue trackers: HIVE‑14183, HIVE‑15551, HIVE‑16877, HIVE‑17063, HIVE‑17309

Metadata upgrade details (gist): https://gist.github.com/ericsahit/9622411305e42bf6735197fdec612794

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

performance Big Data Metadata Hive Upgrade SQL Compatibility

Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.

Background

Upgrade Motivation

Key Upgrade Goals

1. Metadata Schema Upgrade

2. Syntax Compatibility

3. Pre‑upgrade Compatibility Testing

4. New Features in Hive 2.1

5. UDF Compatibility

6. Known Issues & Work‑arounds

7. Upgrade Procedure

8. Future Plans

References

dbaplus Community

How this landed with the community

Was this worth your time?

0 Comments

4. New Features in Hive 2.1