Build a Hadoop Cluster with Docker: Step‑by‑Step Guide
Learn how to set up a multi-node Hadoop cluster on a single machine using Docker containers. The guide covers image preparation, SSH configuration, fixed-IP assignment with pipework, and building a custom Hadoop image, yielding a lightweight, cost-effective big-data environment for development and testing.
After writing an article on Hadoop cluster setup, a friend suggested using Docker for deployment. Docker simplifies creating a learning environment on a personal computer without needing multiple virtual machines.
Setting up a cluster traditionally requires several servers, which is a barrier for individuals. Using Docker, you can download a CentOS image, run multiple containers that act like lightweight virtual machines, and assign each an IP address for SSH access.
1. Install Docker.
2. Obtain a CentOS image.
3. Install SSH.
4. Configure container IP addresses.
5. Install Java and Hadoop.
6. Configure Hadoop.
The first step is straightforward: download Docker from the official site. Steps 5 and 6 are the same as on a physical server, so the guide focuses on steps 2‑4.
Get the centos7 image
$ docker pull centos

The image is about 70 MB; using a Docker registry mirror such as Alibaba Cloud's speeds up the download. List images with:
$ docker images

Install SSH
Create a Dockerfile based on the centos7 image to add SSH support:
FROM centos
MAINTAINER dys
RUN yum install -y openssh-server sudo
RUN sed -i 's/UsePAM yes/UsePAM no/g' /etc/ssh/sshd_config
RUN yum install -y openssh-clients
RUN echo "root:111111" | chpasswd
RUN echo "root ALL=(ALL) ALL" >> /etc/sudoers
# -N '' supplies an empty passphrase so the build does not hang on a prompt
RUN ssh-keygen -t dsa -f /etc/ssh/ssh_host_dsa_key -N ''
RUN ssh-keygen -t rsa -f /etc/ssh/ssh_host_rsa_key -N ''
RUN mkdir /var/run/sshd
EXPOSE 22
CMD ["/usr/sbin/sshd", "-D"]

This Dockerfile installs the SSH server and client, sets the root password to 111111, generates host keys, and starts the SSH daemon in the foreground.
Build the image and name it centos7-ssh:

$ docker build -t="centos7-ssh" .

Verify the new image appears in the list:
$ docker images

Set a Fixed IP
Use pipework to assign IP addresses to containers.
$ git clone https://github.com/jpetazzo/pipework.git
$ cp pipework/pipework /usr/local/bin/

Install bridge-utils:

$ yum -y install bridge-utils

Create a bridge network:
$ brctl addbr br1
$ ip link set dev br1 up
$ ip addr add 192.168.3.1/24 dev br1

Run a container from the centos7-ssh image:

$ docker run -d --name=centos7.ssh centos7-ssh

Assign it an IP:

$ pipework br1 centos7.ssh 192.168.3.20/24

Verify connectivity with ping and ssh:
$ ping 192.168.3.20
$ ssh 192.168.3.20

Repeat to create two more containers, giving them the IPs 192.168.3.22 and 192.168.3.23. The result is three SSH-accessible containers that behave like three servers.
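The repetition can be scripted. A minimal sketch, assuming the `centos7-ssh` image and `br1` bridge created above; the container names `centos7.ssh2` and `centos7.ssh3` are hypothetical, chosen here to match the first container's naming:

```shell
# Sketch: create two more SSH containers and pin their IPs with pipework.
# Assumes the centos7-ssh image and the br1 bridge already exist.
for i in 2 3; do
    name="centos7.ssh${i}"      # hypothetical names for the extra containers
    ip="192.168.3.2${i}"        # yields 192.168.3.22 and 192.168.3.23
    docker run -d --name="${name}" centos7-ssh
    pipework br1 "${name}" "${ip}/24"
done
```

After the loop, `ping` and `ssh` against each new IP should work just as they did for the first container.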
Build a Hadoop Image
Create another Dockerfile based on centos7-ssh to add Java and Hadoop:
FROM centos7-ssh
ADD jdk-8u101-linux-x64.tar.gz /usr/local/
RUN mv /usr/local/jdk1.8.0_101 /usr/local/jdk1.8
ENV JAVA_HOME /usr/local/jdk1.8
ENV PATH $JAVA_HOME/bin:$PATH
ADD hadoop-2.7.3.tar.gz /usr/local
RUN mv /usr/local/hadoop-2.7.3 /usr/local/hadoop
ENV HADOOP_HOME /usr/local/hadoop
ENV PATH $HADOOP_HOME/bin:$PATH
RUN yum install -y which sudo

Place the JDK and Hadoop tarballs in the same directory as the Dockerfile, then build the image and name it hadoop:

$ docker build -t="hadoop" .

Run three containers from this image, naming them hadoop0, hadoop1, and hadoop2. hadoop0 is the master, so publish ports 50070 and 8088 for the HDFS and YARN web UIs:
$ docker run --name hadoop0 --hostname hadoop0 -d -P -p 50070:50070 -p 8088:8088 hadoop
$ docker run --name hadoop1 --hostname hadoop1 -d -P hadoop
$ docker run --name hadoop2 --hostname hadoop2 -d -P hadoop

Assign fixed IPs to the Hadoop containers:
$ pipework br1 hadoop0 192.168.3.30/24
$ pipework br1 hadoop1 192.168.3.31/24
$ pipework br1 hadoop2 192.168.3.32/24

Configure the Hadoop Cluster
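Before touching any configuration files, it is worth confirming that the JDK and Hadoop resolve inside each container. A quick sanity check, sketched with the container names created above:

```shell
# Sketch: confirm Java and Hadoop are on the PATH in every container.
for c in hadoop0 hadoop1 hadoop2; do
    echo "== ${c} =="
    docker exec "${c}" java -version     # prints the JDK version (to stderr)
    docker exec "${c}" hadoop version    # prints the Hadoop version
done
```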
Open three terminal windows and attach to each container:
$ docker exec -it hadoop0 /bin/bash
$ docker exec -it hadoop1 /bin/bash
$ docker exec -it hadoop2 /bin/bash

In each container, edit /etc/hosts to add:
192.168.3.30 master
192.168.3.31 slave1
192.168.3.32 slave2

Proceed with password-less SSH setup and the Hadoop configuration files as described in the original Hadoop cluster tutorial.
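The original tutorial covers these steps in full; as a reminder, the password-less SSH setup on the master looks roughly like this. A sketch, run inside hadoop0, using the root password 111111 set in the Dockerfile:

```shell
# Sketch: generate a key on the master and push it to every node,
# including the master itself, so the Hadoop start scripts can log in
# over SSH without a password.
ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
for host in master slave1 slave2; do
    ssh-copy-id -i ~/.ssh/id_rsa.pub root@"${host}"   # prompts for 111111 once per node
done
```

Once this succeeds, `ssh slave1` from the master should open a shell without asking for a password.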
With that, the Hadoop cluster is up and running inside Docker containers.