What’s New in the Big Data Ecosystem? Hadoop 3.0 Alpha, Druid 0.9.2, Kudu 1.1 and More
This article summarizes the latest releases and feature updates in the big data ecosystem, including Hadoop 3.0 Alpha, Druid 0.9.2, Apache Kudu 1.1.0, and HAWQ 2.1.0 Enterprise, along with a brief overview of Docker’s 2015-2016 version history and its adoption status in China.
Hadoop 3.0.0 Alpha (September 3 2016)
Hadoop 3.0.0 Alpha is built for Java 8 (Java 7 reached end‑of‑life in April 2015) and is intended for testing only, not for production use.
Key changes compared with Hadoop 2.7.0:
Minimum Java version raised to 8; all Hadoop JARs are compiled for Java 8.
HDFS adds Erasure Coding (EC) to reduce storage overhead compared with triple‑replication. EC saves space but incurs extra CPU and network cost during reconstruction.
YARN Timeline Service v2 makes the timeline service more scalable and reliable, and improves usability by introducing flows and aggregation.
Shell scripts have been rewritten to fix long‑standing bugs. Incompatible changes are documented in https://issues.apache.org/jira/browse/HADOOP-9902.
MapReduce task-level native optimization: a native (C/C++) implementation of the map output collector can speed up shuffle-intensive jobs by 30% or more (see https://issues.apache.org/jira/browse/MAPREDUCE-2841).
Support for more than two NameNodes, allowing multiple standby nodes for higher fault tolerance.
Default service ports moved out of the Linux ephemeral range to avoid startup conflicts (see https://issues.apache.org/jira/browse/HDFS-9427 and https://issues.apache.org/jira/browse/HADOOP-12811).
A Microsoft Azure Data Lake filesystem connector is added, so Azure Data Lake can be used as an alternative Hadoop-compatible filesystem.
Intra‑DataNode balancer for disk‑level rebalancing (see HDFS commands documentation).
Heap management for daemons and tasks redesigned; HADOOP_HEAPSIZE deprecated (see https://issues.apache.org/jira/browse/HADOOP-10950 and https://issues.apache.org/jira/browse/MAPREDUCE-5785).
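The port change can be illustrated with a small check. This is only a sketch: the ephemeral range (32768-61000) and the old and new port numbers are listed as understood from HDFS-9427 and should be verified against the release notes before relying on them.

```python
# Linux ephemeral port range as commonly cited (assumption: 32768-61000 inclusive).
EPHEMERAL_RANGE = range(32768, 61001)

# (old Hadoop 2 default, new Hadoop 3 default) per service.
# Values are assumptions taken from HDFS-9427; verify against the release notes.
PORT_CHANGES = {
    "NameNode HTTP": (50070, 9870),
    "NameNode HTTPS": (50470, 9871),
    "Secondary NameNode HTTP": (50090, 9868),
    "DataNode data transfer": (50010, 9866),
    "DataNode IPC": (50020, 9867),
    "DataNode HTTP": (50075, 9864),
}


def in_ephemeral_range(port: int) -> bool:
    """Return True if the OS may hand this port out to an outgoing connection."""
    return port in EPHEMERAL_RANGE


for service, (old_port, new_port) in PORT_CHANGES.items():
    print(
        f"{service}: {old_port} (ephemeral: {in_ephemeral_range(old_port)}) -> "
        f"{new_port} (ephemeral: {in_ephemeral_range(new_port)})"
    )
```

A daemon whose default port sits inside the ephemeral range can fail to start if the kernel has already assigned that port to a client socket, which is the conflict the new defaults avoid.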
Official release notes: http://hadoop.apache.org/docs/r3.0.0-alpha1/hadoop-project-dist/hadoop-common/release/3.0.0-alpha1/RELEASENOTES.3.0.0-alpha1.html
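To make the erasure coding trade-off concrete, the arithmetic below compares the storage overhead of triple replication with a Reed-Solomon layout of 6 data cells and 3 parity cells. The RS(6,3) layout is an assumption chosen for illustration; the article does not say which policy is used.

```python
def storage_overhead(data_units: int, redundancy_units: int) -> float:
    """Extra raw storage needed per unit of user data, as a fraction."""
    return redundancy_units / data_units


def raw_storage(user_tb: float, overhead: float) -> float:
    """Raw capacity (TB) consumed to store user_tb of user data."""
    return user_tb * (1 + overhead)


# Triple replication: one original block plus two extra copies.
replication_overhead = storage_overhead(data_units=1, redundancy_units=2)

# Reed-Solomon RS(6,3): 6 data cells plus 3 parity cells per stripe
# (assumed policy, for illustration only).
ec_overhead = storage_overhead(data_units=6, redundancy_units=3)

print(f"3x replication overhead: {replication_overhead:.0%}")  # 200%
print(f"RS(6,3) overhead: {ec_overhead:.0%}")                  # 50%
print(f"100 TB of data, replicated: {raw_storage(100, replication_overhead):.0f} TB raw")
print(f"100 TB of data, RS(6,3):    {raw_storage(100, ec_overhead):.0f} TB raw")
```

Under these assumptions the same 100 TB of user data needs 300 TB of raw capacity with replication but 150 TB with erasure coding. The price is that rebuilding a lost cell requires reading several surviving cells and recomputing, which is where the extra CPU and network cost comes from.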
Druid 0.9.2 (December 1 2016)
Druid 0.9.2 introduces a rewritten Group‑By engine, optional roll‑up disabling, enhanced integer column filtering, new long‑type encoding options, and performance improvements for DataSketch, HyperUnique, and query caching.
The new groupBy engine (v2), which can optionally spill to disk when memory runs short, delivers a 2–5× speedup. It is disabled by default and must be enabled explicitly. Configuration details: http://druid.io/docs/0.9.2/querying/groupbyquery.html#implementation-details
Roll‑up can be turned off during ingestion to retain raw data while still supporting aggregations. Configuration: http://druid.io/docs/0.9.2/ingestion/index.html
Advanced filters for integer and __time columns, useful for retention analysis.
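Long column storage options: auto encoding (picks a more compact representation when the values allow it) and longs encoding (fixed 64-bit values), plus a new none compression option (no compression).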
DataSketch performance improves by up to 80%; HyperUnique is 19-30% faster.
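The features above are mostly switched on through JSON settings. The sketch below builds the relevant fragments as Python dictionaries. The field names (`groupByStrategy`, `rollup`, `longEncoding`) follow the Druid 0.9.2 documentation as understood here, while the data source `events`, the dimension `country`, and the metric `clicks` are hypothetical names used only for illustration.

```python
import json

# A groupBy query that opts in to the new engine through the query context.
# "events", "country" and "clicks" are hypothetical names.
group_by_query = {
    "queryType": "groupBy",
    "dataSource": "events",
    "granularity": "day",
    "dimensions": ["country"],
    "aggregations": [
        {"type": "longSum", "name": "total_clicks", "fieldName": "clicks"}
    ],
    "intervals": ["2016-12-01/2016-12-02"],
    "context": {"groupByStrategy": "v2"},  # the default strategy remains v1
}

# Ingestion-side fragment: keep raw rows by disabling roll-up.
granularity_spec = {
    "type": "uniform",
    "segmentGranularity": "DAY",
    "queryGranularity": "NONE",
    "rollup": False,
}

# Tuning fragment: let Druid choose a compact encoding for long columns.
index_spec = {"longEncoding": "auto"}

for name, fragment in [
    ("groupBy query", group_by_query),
    ("granularitySpec", granularity_spec),
    ("indexSpec", index_spec),
]:
    print(f"--- {name} ---")
    print(json.dumps(fragment, indent=2))
```

The query body would typically be sent to a Broker, while the two spec fragments belong inside an ingestion task definition. Check the linked 0.9.2 documentation for the exact placement before using them.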
Apache Kudu 1.1.0 (November 15 2016)
Kudu is a columnar storage system designed for fast analytics on rapidly changing data, complementing HDFS and HBase.
The Python client reaches feature parity with the C++ and Java clients.
Support for IN-list predicates.
Java client now includes client‑side tracing.
Spark 2.0 jars compiled for Scala 2.11 are provided.
Raft‑based leader election improves stability.
More details: http://kudu.apache.org/2016/11/15/weekly-update.html
Apache HAWQ 2.1.0 Enterprise (December 2016)
HAWQ 2.1.0 integrates with YARN for dynamic resource management and adds elastic query execution, a new scheduler, block‑level storage optimizations, HDFS metadata caching, PXF enhancements, and a unified hawq management command.
YARN‑based resource allocation with multi‑level queues.
Virtual‑segment based elastic query execution; queries are dynamically assigned to a subset of cluster nodes.
Dynamic cluster scaling without redistributing tables.
Block‑level storage optimizations for AO and Parquet tables, enabling faster parallel reads.
Single‑directory table storage on HDFS for easier data exchange.
PXF now integrates with HCatalog and supports predicate and projection push‑down for Hive/ORC tables.
HAWQ Register allows direct registration of external Parquet files.
GPORCA optimizer upgraded with new features and bug fixes.
Heartbeat‑driven fault‑tolerance service automatically detects and removes failed nodes.
Support for HDP 2.5, Ambari 2.2 plugins, and automatic Kerberos configuration.
Docker development overview (up to 2016)
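Docker has been open source since 2013. In 2015 the project released major versions V1.5 through V1.9; in 2016 it released V1.10, V1.11, and V1.12, followed by release candidates of V1.13 (the final V1.13 release arrived in January 2017).
The most significant change in V1.12 was the integration of SwarmKit into the Docker engine, giving the engine built-in orchestration (swarm mode) without requiring the separate Swarm project.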
Official release list: https://github.com/docker/docker/releases
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.