Configuring Hadoop to Support LZO Compression
This guide explains how to enable LZO compression in Hadoop: install the Twitter-provided hadoop-lzo library, update core-site.xml, synchronize the files across nodes, create LZO indexes, and run a WordCount MapReduce job with LZO-compressed output.
Hadoop does not natively support LZO compression, so the Twitter-provided hadoop-lzo component must be compiled against Hadoop and the LZO library and then installed.
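Before working through the steps, it can be useful to check whether a node already has a compiled hadoop-lzo JAR in Hadoop's common directory. The following is a minimal shell sketch under that assumption. The name find_lzo_jar is a hypothetical helper introduced here, not part of Hadoop or hadoop-lzo, and it only looks for files named hadoop-lzo-*.jar in the directory it is given.

```shell
# Print every hadoop-lzo JAR found in the given directory.
# Returns 0 if at least one JAR exists, 1 otherwise.
# find_lzo_jar is a hypothetical helper name (an assumption of this sketch).
find_lzo_jar() {
    local dir="$1"
    local found=1
    local jar
    for jar in "$dir"/hadoop-lzo-*.jar; do
        # An unmatched glob stays literal, so confirm the file really exists.
        if [ -f "$jar" ]; then
            printf '%s\n' "$jar"
            found=0
        fi
    done
    return "$found"
}
```

For example, running find_lzo_jar /opt/module/hadoop-2.7.2/share/hadoop/common on a node prints the JAR path if one is present and returns a non-zero status otherwise.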
1. Place the compiled hadoop-lzo-0.4.20.jar into /opt/module/hadoop-2.7.2/share/hadoop/common/ on the master node:
[atguigu@hadoop102 common]$ pwd
/opt/module/hadoop-2.7.2/share/hadoop/common
[atguigu@hadoop102 common]$ ls
hadoop-lzo-0.4.20.jar
2. Synchronize the JAR to the other Hadoop nodes (hadoop103, hadoop104) using the xsync command:
[atguigu@hadoop102 common]$ xsync hadoop-lzo-0.4.20.jar
3. Add LZO support to core-site.xml by inserting the following properties:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>io.compression.codecs</name>
        <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
    </property>
    <property>
        <name>io.compression.codec.lzo.class</name>
        <value>com.hadoop.compression.lzo.LzoCodec</value>
    </property>
</configuration>
4. Synchronize the updated core-site.xml to the other nodes:
[atguigu@hadoop102 hadoop]$ xsync core-site.xml
5. Start the Hadoop cluster:
[atguigu@hadoop102 hadoop-2.7.2]$ sbin/start-dfs.sh
[atguigu@hadoop103 hadoop-2.7.2]$ sbin/start-yarn.sh
6. An LZO file is splittable only when an index file exists for it, so create an index for each LZO output file with DistributedLzoIndexer. The path below is the part file written by the WordCount example in step 7, so run this command after that job has finished:
[atguigu@hadoop102 bin]$ hadoop jar /opt/module/hadoop-2.7.2/share/hadoop/common/hadoop-lzo-0.4.20.jar com.hadoop.compression.lzo.DistributedLzoIndexer /output/part-r-00000.lzo
7. Example: run the built-in WordCount job with LZO compression.
a) Create a small word.txt file and upload it to HDFS:
[atguigu@hadoop102 bin]$ vim word.txt
[atguigu@hadoop102 bin]$ hdfs dfs -put word.txt /input
b) Execute the WordCount job with LZO output compression:
[atguigu@hadoop102 bin]$ hadoop jar /opt/module/hadoop-2.7.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount -D mapreduce.output.fileoutputformat.compress=true -D mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec /input /output
The job produces LZO-compressed part files. After indexing, each file has an accompanying .index file, which lets Hadoop split it for parallel processing.
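Because an unindexed .lzo file is not split, it can help to confirm that every part file actually received its index. The following is a minimal shell sketch of such a check, assuming the index for a file such as part-r-00000.lzo is stored next to it as part-r-00000.lzo.index. The name missing_lzo_indexes is a hypothetical helper introduced here, and it works purely on a list of paths read from standard input.

```shell
# Read file paths on stdin (one per line) and print each .lzo file
# that has no matching .lzo.index entry in the same listing.
# missing_lzo_indexes is a hypothetical helper name (an assumption of this sketch).
missing_lzo_indexes() {
    awk '
        # Remember the data file each index belongs to (strip ".index", 6 chars).
        /\.lzo\.index$/ { indexed[substr($0, 1, length($0) - 6)] = 1; next }
        # Collect the .lzo data files in the order they were listed.
        /\.lzo$/        { lzo[++n] = $0 }
        END {
            for (i = 1; i <= n; i++)
                if (!(lzo[i] in indexed)) print lzo[i]
        }
    '
}
```

One possible way to feed it a listing is hdfs dfs -ls /output | awk '{print $NF}' | missing_lzo_indexes, which assumes the path is the last column of the ls output and contains no spaces. Any path it prints still needs to be passed to DistributedLzoIndexer.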
© This article is compiled by the "Big Data Technology and Architecture" team; redistribution without permission is prohibited.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
