Deploy Hadoop on Kubernetes with Helm: A Complete Step‑by‑Step Guide
This tutorial walks you through deploying Hadoop 3.x on a Kubernetes cluster using Helm, covering repository setup, Docker image creation, Helm chart customization, service configuration, installation, verification, and clean‑up, with all necessary commands and YAML snippets.
Overview
Hadoop is an Apache open‑source distributed computing platform built around HDFS (Hadoop Distributed File System) and MapReduce; Hadoop 2.0 introduced YARN as a fine‑grained resource scheduler that can also run other frameworks such as Spark. Its high fault tolerance, scalability and efficiency allow deployment on inexpensive hardware, and the current stable release is 3.x.
Start Deployment
1) Add Helm Repository
helm repo add apache-hadoop-helm https://pfisterer.github.io/apache-hadoop-helm/
helm pull apache-hadoop-helm/hadoop --version 1.2.0
tar -xf hadoop-1.2.0.tgz
2) Build Docker Image
FROM myharbor.com/bigdata/centos:7.9.2009
RUN rm -f /etc/localtime && ln -sv /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && echo "Asia/Shanghai" > /etc/timezone
# ENV (not RUN export) so the setting persists into later layers and the running container
ENV LANG=zh_CN.UTF-8
# Create user and group for securityContext.runAsUser: 9999
RUN groupadd --system --gid=9999 admin && useradd --system --home-dir /home/admin --uid=9999 --gid=admin admin
# Install sudo and grant permissions
RUN yum -y install sudo && chmod 640 /etc/sudoers
RUN echo "admin ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers
RUN yum -y install net-tools telnet wget
RUN mkdir /opt/apache/
ADD jdk-8u212-linux-x64.tar.gz /opt/apache/
ENV JAVA_HOME=/opt/apache/jdk1.8.0_212
ENV PATH=$JAVA_HOME/bin:$PATH
ENV HADOOP_VERSION=3.3.2
ENV HADOOP_HOME=/opt/apache/hadoop
ENV HADOOP_COMMON_HOME=${HADOOP_HOME} \
HADOOP_HDFS_HOME=${HADOOP_HOME} \
HADOOP_MAPRED_HOME=${HADOOP_HOME} \
HADOOP_YARN_HOME=${HADOOP_HOME} \
HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop \
PATH=${PATH}:${HADOOP_HOME}/bin
ADD hadoop-${HADOOP_VERSION}.tar.gz /opt/apache
RUN ln -s /opt/apache/hadoop-${HADOOP_VERSION} ${HADOOP_HOME}
RUN chown -R admin:admin /opt/apache
WORKDIR $HADOOP_HOME
# Expose ports
EXPOSE 50010 50020 50070 50075 50090 8020 9000
EXPOSE 19888
EXPOSE 8030 8031 8032 8033 8040 8042 8088
EXPOSE 49707 2122
3) Build Image
docker build -t myharbor.com/bigdata/hadoop:3.3.2 . --no-cache
# -t: image name and tag; .: build context (the directory containing the Dockerfile); --no-cache: build without the layer cache
4) Push Image
docker push myharbor.com/bigdata/hadoop:3.3.2
5) Adjust Directory Structure
# Group the chart templates into hdfs/ and yarn/ subdirectories
mkdir -p hadoop/templates/hdfs hadoop/templates/yarn
mv hadoop/templates/hdfs-* hadoop/templates/hdfs/
mv hadoop/templates/yarn-* hadoop/templates/yarn/
6) Modify Configuration
hadoop/values.yaml – set the image repository, tag and pullPolicy, persistence for the NameNode and DataNode, service ports, and securityContext (runAsUser, privileged).
hadoop/templates/hdfs/hdfs-nn-pv.yaml – PersistentVolume definition for the NameNode.
hadoop/templates/hdfs/hdfs-dn-pv.yaml – PersistentVolume definition for the DataNode.
hadoop/templates/hdfs/hdfs-nn-svc.yaml – headless Service for the NameNode.
hadoop/templates/hdfs/hdfs-dn-svc.yaml – headless Service for the DataNode.
hadoop/templates/yarn/yarn-rm-svc.yaml – Service for the YARN ResourceManager UI.
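As a rough illustration of what the NameNode PersistentVolume file might contain, the shell sketch below prints one local PersistentVolume per NameNode data directory. It assumes the /opt/bigdata/servers/hadoop/nn/data/data{1..3} layout created in the Installation step. The storage class name, node name, capacity, PV names, and the function name render_nn_pvs are hypothetical placeholders, not values taken from the chart.

```shell
#!/bin/sh
# Sketch: print local PersistentVolume manifests for the NameNode.
# STORAGE_CLASS, NODE_NAME and CAPACITY are hypothetical placeholders;
# replace them with values that match your cluster.
STORAGE_CLASS="${STORAGE_CLASS:-hadoop-nn-local-storage}"
NODE_NAME="${NODE_NAME:-k8s-node-1}"
CAPACITY="${CAPACITY:-10Gi}"
BASE_DIR="/opt/bigdata/servers/hadoop/nn/data"

render_nn_pvs() {
  # One PV per data directory (data1..data3).
  for i in 1 2 3; do
    cat <<EOF
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: hadoop-nn-pv-${i}
spec:
  capacity:
    storage: ${CAPACITY}
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ${STORAGE_CLASS}
  local:
    path: ${BASE_DIR}/data${i}
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - ${NODE_NAME}
EOF
  done
}

render_nn_pvs
```

The output could be redirected into hadoop/templates/hdfs/hdfs-nn-pv.yaml, or piped to kubectl apply -f - to try it outside the chart. A DataNode variant would differ only in the names and the dn/ path.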
Update controllers to include securityContext.runAsUser and securityContext.privileged.
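A minimal sketch of that controller change, assuming values.yaml carries securityContext.runAsUser and securityContext.privileged keys as described above. The fragment is meant to sit under a container entry in each controller template (StatefulSets in this chart, judging by the pod names); the indentation and the function name print_security_context are illustrative and need adapting to the actual template.

```shell
#!/bin/sh
# Sketch: print the securityContext fragment for a container spec.
# The heredoc delimiter is quoted so the Helm {{ ... }} expressions
# are printed literally instead of being touched by the shell.
print_security_context() {
  cat <<'EOF'
        securityContext:
          runAsUser: {{ .Values.securityContext.runAsUser }}
          privileged: {{ .Values.securityContext.privileged }}
EOF
}

print_security_context
```

With runAsUser set to 9999 in values.yaml, the containers would run as the admin user created in the Dockerfile.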
Adjust hadoop/templates/hadoop-configmap.yaml – replace /root with /opt/apache and set TMP_URL for YARN UI.
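The path replacement could be scripted with sed. The sketch below assumes every occurrence of /root in the ConfigMap template is a path that should move to /opt/apache, so it is worth reviewing the result before installing. The function name rewrite_root_paths is hypothetical, and TMP_URL is left to manual editing because its value depends on your environment.

```shell
#!/bin/sh
# Sketch: replace /root with /opt/apache in the given file.
# The result is written through a temporary file so the original
# is only overwritten if sed succeeds.
rewrite_root_paths() {
  file="$1"
  tmp="$(mktemp)" || return 1
  if sed 's#/root#/opt/apache#g' "$file" > "$tmp"; then
    cat "$tmp" > "$file"
  fi
  rm -f "$tmp"
}

# Example, run from the directory that contains the unpacked chart:
# cp hadoop/templates/hadoop-configmap.yaml /tmp/hadoop-configmap.yaml.orig
# rewrite_root_paths hadoop/templates/hadoop-configmap.yaml
# diff /tmp/hadoop-configmap.yaml.orig hadoop/templates/hadoop-configmap.yaml
```

Running the function twice is harmless, because /opt/apache contains no /root substring to match again.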
Installation
# Create storage directories
mkdir -p /opt/bigdata/servers/hadoop/{nn,dn}/data/data{1..3}
# Install chart
helm install hadoop ./hadoop -n hadoop --create-namespace
Post‑Installation Notes
NAME: hadoop
LAST DEPLOYED: Sat Sep 24 17:00:55 2022
NAMESPACE: hadoop
STATUS: deployed
# Check HDFS status
kubectl exec -n hadoop -it hadoop-hadoop-hdfs-nn-0 -- /opt/hadoop/bin/hdfs dfsadmin -report
# List YARN nodes
kubectl exec -n hadoop -it hadoop-hadoop-yarn-rm-0 -- /opt/hadoop/bin/yarn node -list
# Port‑forward YARN ResourceManager UI
kubectl port-forward -n hadoop hadoop-hadoop-yarn-rm-0 8088:8088
# Then open http://localhost:8088 in a browser
# Run Hadoop test (TestDFSIO)
kubectl exec -n hadoop -it hadoop-hadoop-yarn-nm-0 -- /opt/hadoop/bin/hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.3.2-tests.jar TestDFSIO -write -nrFiles 5 -fileSize 128MB -resFile /tmp/TestDFSIOwrite.txt
# List MapReduce jobs
kubectl exec -n hadoop -it hadoop-hadoop-yarn-rm-0 -- /opt/hadoop/bin/mapred job -list
# Use with Zeppelin chart
helm install --namespace hadoop --set hadoop.useConfigMap=true,hadoop.configMapName=hadoop-hadoop stable/zeppelin
# Scale Yarn NodeManagers
helm upgrade hadoop --set yarn.nodeManager.replicas=4 stable/hadoop
Access Web UIs
HDFS web UI: http://192.168.182.110:30870/
YARN web UI: http://192.168.182.110:30088/
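To check from a terminal that both UIs respond, something like the following could be used. It is a sketch that assumes the node IP and NodePorts shown above and that curl is installed; the function names ui_url and check_ui are hypothetical.

```shell
#!/bin/sh
# Sketch: build the web UI URLs and probe them with curl.
# NODE_IP defaults to the address used in this article.
NODE_IP="${NODE_IP:-192.168.182.110}"

# Print the URL for a given UI name (hdfs or yarn).
ui_url() {
  case "$1" in
    hdfs) echo "http://${NODE_IP}:30870/" ;;
    yarn) echo "http://${NODE_IP}:30088/" ;;
    *)    echo "unknown UI: $1" >&2; return 1 ;;
  esac
}

# Return 0 if the UI answers without an HTTP error within 5 seconds.
check_ui() {
  url="$(ui_url "$1")" || return 1
  curl -fsS -o /dev/null --max-time 5 "$url"
}

# Example:
# check_ui hdfs && echo "HDFS UI is up"
# check_ui yarn && echo "YARN UI is up"
```

Setting NODE_IP in the environment before sourcing the script points the checks at a different node.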
HDFS Test Verification
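kubectl exec -it hadoop-hadoop-hdfs-nn-0 -n hadoop -- bash
# The following commands run inside the NameNode pod
hdfs dfs -mkdir -p /tmp
hdfs dfs -ls /
# Create a small local file to upload
echo "hello hdfs" > test.txt
hdfs dfs -put test.txt /tmp/
hdfs dfs -cat /tmp/test.txt
Uninstall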
helm uninstall hadoop -n hadoop
kubectl delete pod -n hadoop $(kubectl get pod -n hadoop | awk 'NR>1{print $1}') --force
kubectl patch ns hadoop -p '{"metadata":{"finalizers":null}}'
kubectl delete ns hadoop --force
The Helm chart source code is available at https://gitee.com/hadoop-bigdata/hadoop-on-k8s. This single‑node deployment is intended for testing; a future article will cover high‑availability Hadoop on Kubernetes.