How to Set Up Hadoop Java Development on Windows and Access HDFS via Java API
This guide walks through installing Hadoop on Windows, configuring environment variables and XML files, adding the required winutils binaries, verifying the setup with HDFS shell commands, and then building a Maven project that uses the Java API to list and inspect files in HDFS.
Install Hadoop on Windows using the same version as the CentOS cluster (2.8.0). Download the tarball, extract it with 7‑zip (first to .tar, then to a directory), and move the resulting folder to a path without Chinese characters or spaces.
Download the matching hadoop.dll and winutils.exe (e.g., from https://github.com/cdarlint/winutils) and copy them into the Hadoop bin directory.
Add the Hadoop bin and sbin directories to the system PATH environment variable.
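The Hadoop client code on Windows locates winutils.exe through the HADOOP_HOME environment variable (or the hadoop.home.dir system property), so it is worth confirming both before going further. A small sanity check, sketched in Java (the HadoopEnvCheck class and winutilsPresent helper are illustrative names, not part of any Hadoop API):

```java
import java.io.File;

public class HadoopEnvCheck {

    // Returns true if winutils.exe exists under <hadoopHome>\bin.
    public static boolean winutilsPresent(String hadoopHome) {
        if (hadoopHome == null) {
            return false;
        }
        return new File(hadoopHome, "bin" + File.separator + "winutils.exe").exists();
    }

    public static void main(String[] args) {
        String hadoopHome = System.getenv("HADOOP_HOME");
        System.out.println("HADOOP_HOME = " + hadoopHome);
        System.out.println("winutils.exe present: " + winutilsPresent(hadoopHome));
    }
}
```

If the check prints false, revisit the previous two steps before touching the XML configuration.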
Configuration files
hadoop-env.cmd: set JAVA_HOME to the JDK path, e.g., set JAVA_HOME=C:\PROGRA~1\Java\jdk1.8.0_202.
core-site.xml:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://192.168.148.128:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>D:\SoftWare\hadoop-2.8.0\hdfs\tmp</value>
    </property>
</configuration>
hdfs-site.xml:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
</configuration>
mapred-site.xml:
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
yarn-site.xml (replace all manager addresses with the master IP 192.168.148.128):
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>192.168.148.128:18040</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>192.168.148.128:18030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>192.168.148.128:18025</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>192.168.148.128:18141</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>192.168.148.128:18088</value>
    </property>
</configuration>
Verify the installation in a Windows cmd prompt:
hadoop version
List the HDFS root directory:
hdfs dfs -ls /
Maven project for Java API access
Create a Maven project in IntelliJ IDEA. The pom.xml must contain Hadoop dependencies matching the cluster version (2.8.0) and JUnit for testing:
<dependencies>
    <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.8.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.8.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.8.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs-client</artifactId>
        <version>2.8.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
        <version>2.8.0</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
        <scope>test</scope>
    </dependency>
</dependencies>
Create package com.badao.hdfsdemo and class hellohdfs with the following source:
package com.badao.hdfsdemo;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;

import java.io.IOException;

public class hellohdfs {

    public static void main(String[] args) throws IOException {
        FileSystem fileSystem = getFileSystem();
        // Recursively list every file under the HDFS root directory.
        RemoteIterator<LocatedFileStatus> listFiles = fileSystem.listFiles(new Path("/"), true);
        while (listFiles.hasNext()) {
            LocatedFileStatus status = listFiles.next();
            System.out.println(status.getPath().getName());   // file name
            System.out.println(status.getLen());              // length in bytes
            System.out.println(status.getPermission());       // permission
            System.out.println(status.getOwner());            // owner
            System.out.println(status.getGroup());            // group
            System.out.println(status.getModificationTime()); // modification time
            // Print the datanode hosts that hold each block of the file.
            BlockLocation[] blockLocations = status.getBlockLocations();
            for (BlockLocation blockLocation : blockLocations) {
                String[] hosts = blockLocation.getHosts();
                for (String host : hosts) {
                    System.out.println(host);
                }
            }
        }
        fileSystem.close();
    }

    /** Obtain the HDFS FileSystem instance. */
    public static FileSystem getFileSystem() throws IOException {
        Configuration configuration = new Configuration();
        configuration.set("fs.defaultFS", "hdfs://192.168.148.128:9000");
        configuration.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
        System.setProperty("HADOOP_USER_NAME", "root");
        return FileSystem.get(configuration);
    }
}
Key configuration lines in getFileSystem():
configuration.set("fs.defaultFS", "hdfs://192.168.148.128:9000") matches the address defined in core-site.xml.
configuration.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem") binds the HDFS implementation class and prevents the runtime error "No FileSystem for scheme: hdfs".
System.setProperty("HADOOP_USER_NAME", "root") avoids an AccessControlException caused by missing user permissions.
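As an alternative to the process-wide HADOOP_USER_NAME property, the user can be passed directly to the FileSystem.get(URI, Configuration, String) overload. A minimal sketch under that assumption (the HdfsConnect class and connectAs helper are illustrative names, not from the original code):

```java
package com.badao.hdfsdemo;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

import java.io.IOException;
import java.net.URI;

public class HdfsConnect {

    // Illustrative helper: obtain a FileSystem for the given URI, acting as the given user.
    public static FileSystem connectAs(String uri, String user)
            throws IOException, InterruptedException {
        Configuration conf = new Configuration();
        // Passing the user here scopes it to this connection instead of
        // setting a process-wide system property.
        return FileSystem.get(URI.create(uri), conf, user);
    }

    public static void main(String[] args) throws Exception {
        // Cluster address as configured in core-site.xml.
        try (FileSystem fs = connectAs("hdfs://192.168.148.128:9000", "root")) {
            System.out.println(fs.getUri());
        }
    }
}
```

This keeps the user choice local to one connection, which matters when a single JVM talks to HDFS as several different users.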
Running the main method prints the name, size, permissions, owner, group, modification timestamp, and the host nodes for each block of every file under the HDFS root, confirming that the Java API can successfully interact with the Hadoop cluster from a Windows workstation.
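The same FileSystem handle also supports writes and reads, which makes for a natural next experiment. A hedged sketch of a create/read round trip, assuming the cluster address from core-site.xml (the /badao/hello.txt path and HdfsReadWrite class are examples, not from the original article):

```java
package com.badao.hdfsdemo;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

import java.nio.charset.StandardCharsets;

public class HdfsReadWrite {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://192.168.148.128:9000");
        System.setProperty("HADOOP_USER_NAME", "root");

        try (FileSystem fs = FileSystem.get(conf)) {
            Path file = new Path("/badao/hello.txt"); // example path
            // Write a small file, overwriting it if it already exists.
            try (FSDataOutputStream out = fs.create(file, true)) {
                out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
            }
            // Read the file back and copy its contents to stdout.
            try (FSDataInputStream in = fs.open(file)) {
                IOUtils.copyBytes(in, System.out, 4096, false);
            }
        }
    }
}
```

Both streams and the FileSystem itself are Closeable, so try-with-resources handles cleanup even when the cluster connection fails mid-operation.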
The Dominant Programmer
Resources and tutorials for programmers advancing in Java, Python, and C#. Blog: https://blog.csdn.net/badao_liumang_qizhi
