Deploy and Configure a Single-Node Hadoop Cluster

Many cloud platforms and third-party service providers offer Hadoop as a service or as a VM or container image, which lowers the barrier to entry for anyone getting started with Hadoop. In this hands-on lab, you will deploy a single-node Hadoop cluster in a pseudo-distributed configuration. Doing so walks you through deploying and configuring each individual Hadoop component, which prepares you for working with a multi-node cluster where Hadoop services are separated and clustered.

In this learning activity, you will perform the following:

* Installing Java
* Deploying Hadoop from an archive file
* Configuring Hadoop's `JAVA_HOME`
* Configuring the default filesystem for Hadoop
* Configuring HDFS replication
* Setting up passwordless SSH
* Formatting the Hadoop Distributed File System (HDFS)
* Starting Hadoop
* Creating files and directories in Hadoop
* Examining a text file with a MapReduce job

Level

Beginner

Duration

2h 0m

Published

Jan 14, 2019

Challenge

Install Java
Log in to Node 1 as cloud_user and install the java-19-amazon-corretto-devel package:
```
sudo yum -y install java-19-amazon-corretto-devel
```
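Before moving on, it can help to confirm that the JDK tools are actually reachable. The following is a minimal sketch and not part of the original lab; the helper name `require_cmd` is hypothetical.

```shell
# Hypothetical helper: succeed only if the named command is on PATH.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# On the lab node, after the yum install above, you might run:
#   require_cmd java && java -version
#   require_cmd jps
```

If either command is reported missing, the package installation likely did not complete.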
Challenge

Deploy Hadoop
From the cloud_user home directory, download Hadoop 3.3.5 from the Apache download site or a mirror of your choice:
```
curl -O https://dlcdn.apache.org/hadoop/common/hadoop-3.3.5/hadoop-3.3.5.tar.gz
```
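Since the archive comes from a mirror, you may want to check its integrity before unpacking it. The sketch below is an optional addition, not a step from the original lab. It assumes `sha512sum` is available on the node; the helper name `verify_sha512` and the `EXPECTED_DIGEST` placeholder are hypothetical, and the real digest has to be copied from the checksum published alongside the release.

```shell
# Hypothetical helper: compare a file's SHA-512 digest with an expected hex string.
verify_sha512() {
  file=$1
  # Normalize the expected digest to lowercase, as sha512sum prints lowercase hex.
  expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
  actual=$(sha512sum "$file" | awk '{print $1}')
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file" >&2
    return 1
  fi
}

# Usage on the lab node (EXPECTED_DIGEST is a placeholder, not a real value):
#   verify_sha512 hadoop-3.3.5.tar.gz "$EXPECTED_DIGEST"
```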
Unpack the archive in place:
```
tar -xzf hadoop-3.3.5.tar.gz
```
Delete the archive file:
```
rm hadoop-3.3.5.tar.gz
```
Rename the installation directory:
```
mv hadoop-3.3.5/ hadoop/
```
Challenge

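Configure JAVA_HOME
From /home/cloud_user/hadoop, set JAVA_HOME in etc/hadoop/hadoop-env.sh by changing the following line (depending on the Hadoop version, the line may instead appear commented out as `# export JAVA_HOME=`):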
```
export JAVA_HOME=${JAVA_HOME}
```
Change it to this:
```
export JAVA_HOME=/usr/lib/jvm/java-19-amazon-corretto/
```
Save and close the file.
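If you would rather script this change than edit the file by hand, something like the following could work. It is a sketch under assumptions: the helper name `set_java_home` is hypothetical, and it assumes the Java path contains no `|` or `&` characters, which would confuse the `sed` substitution.

```shell
# Hypothetical helper: rewrite the JAVA_HOME export in a hadoop-env.sh style file.
# The pattern matches the export line whether or not it is commented out.
set_java_home() {
  env_file=$1
  java_home=$2
  tmp=$(mktemp) || return 1
  sed -E "s|^#?[[:space:]]*export JAVA_HOME=.*|export JAVA_HOME=${java_home}|" \
    "$env_file" > "$tmp" && cat "$tmp" > "$env_file"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Usage from /home/cloud_user/hadoop:
#   set_java_home etc/hadoop/hadoop-env.sh /usr/lib/jvm/java-19-amazon-corretto/
#   grep '^export JAVA_HOME=' etc/hadoop/hadoop-env.sh
```

Writing through a temporary file and copying the result back keeps the original file's permissions intact.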
Challenge

Configure Core Hadoop
Set the default filesystem to HDFS on localhost in /home/cloud_user/hadoop/etc/hadoop/core-site.xml by changing the following lines:
```
<configuration>
</configuration>
```
Change them to this:
```
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```
Save and close the file.
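The same edit can be generated from the shell. The sketch below writes a site file that contains exactly one property; the helper name `write_site_xml` is hypothetical. Note that it overwrites the target file, including any license header comment that ships with it, so treat it as an illustration rather than the lab's prescribed method.

```shell
# Hypothetical helper: write a Hadoop site file containing a single property.
# WARNING: replaces the whole file, including any existing header comments.
write_site_xml() {
  target=$1
  name=$2
  value=$3
  cat > "$target" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>${name}</name>
    <value>${value}</value>
  </property>
</configuration>
EOF
}

# Usage from /home/cloud_user/hadoop:
#   write_site_xml etc/hadoop/core-site.xml fs.defaultFS hdfs://localhost:9000
```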
Challenge

Configure HDFS
Set the default block replication to 1 in /home/cloud_user/hadoop/etc/hadoop/hdfs-site.xml by changing the following lines:
```
<configuration>
</configuration>
```
Change them to this:
```
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```
Save and close the file.
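To confirm that Hadoop picks up the values you just saved, you can ask it to print the effective configuration with `hdfs getconf -confKey`. The wrapper below is a small sketch; the name `show_conf_key` is hypothetical, and the default path `bin/hdfs` assumes you are in /home/cloud_user/hadoop.

```shell
# Hypothetical helper: print the effective value of a Hadoop configuration key.
# The second argument (path to the hdfs launcher) defaults to bin/hdfs.
show_conf_key() {
  hdfs_bin=${2:-bin/hdfs}
  if [ -x "$hdfs_bin" ]; then
    "$hdfs_bin" getconf -confKey "$1"
  else
    echo "hdfs launcher not found at $hdfs_bin" >&2
    return 1
  fi
}

# Usage from /home/cloud_user/hadoop:
#   show_conf_key fs.defaultFS
#   show_conf_key dfs.replication
```

With the edits above in place, these should report the filesystem URI and replication factor you configured.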
Challenge

Set Up Passwordless SSH Access to localhost
As cloud_user, generate a public/private RSA key pair with:
```
ssh-keygen
```
The default option for each prompt will suffice.

Add your newly generated public key to your authorized keys list with:
```
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
```
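If SSH still asks for a password afterwards, overly permissive file modes are a common cause, because sshd may ignore an authorized_keys file that others can write to. The following sketch tightens the permissions; the helper name `secure_ssh_dir` is hypothetical, and this step is an optional addition to the lab.

```shell
# Hypothetical helper: restrict an SSH directory and its authorized_keys file
# to the owning user. Defaults to the current user's ~/.ssh directory.
secure_ssh_dir() {
  ssh_dir=${1:-"$HOME/.ssh"}
  chmod 700 "$ssh_dir" &&
  chmod 600 "$ssh_dir/authorized_keys"
}

# On the lab node you might then run:
#   secure_ssh_dir
#   ssh localhost exit   # the first connection asks you to confirm the host key
```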
Challenge

Format the Filesystem
From /home/cloud_user/hadoop/, format the DFS with:
```
bin/hdfs namenode -format
```
Challenge

Start Hadoop
Start the NameNode and DataNode daemons from /home/cloud_user/hadoop with:
```
sbin/start-dfs.sh
```
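Once the script returns, the JDK's `jps` tool lists the running Java processes, which is a quick way to see whether the HDFS daemons came up. The sketch below reads `jps` output on standard input and prints any expected daemon that is absent. The helper name `missing_daemons` is hypothetical, and the list of expected daemons (NameNode, DataNode, SecondaryNameNode) is an assumption about what start-dfs.sh launches in this single-node setup.

```shell
# Hypothetical helper: read `jps` output on stdin and print each expected
# HDFS daemon that does not appear in it. Prints nothing if all are present.
missing_daemons() {
  # jps prints "<pid> <name>" per line; keep only the name column.
  names=$(awk '{print $2}')
  for daemon in NameNode DataNode SecondaryNameNode; do
    printf '%s\n' "$names" | grep -qx "$daemon" || printf '%s\n' "$daemon"
  done
}

# Usage on the lab node:
#   jps | missing_daemons
```

If a daemon is listed as missing, the log files under the logs directory of the Hadoop installation are the usual place to look next.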
Challenge

Download and Copy the Latin Text to Hadoop
From /home/cloud_user/hadoop, download the latin.txt file with:
```
curl -O https://raw.githubusercontent.com/linuxacademy/content-hadoop-quick-start/master/latin.txt
```
From /home/cloud_user/hadoop, create the /user and /user/cloud_user directories in Hadoop with:
```
bin/hdfs dfs -mkdir -p /user/cloud_user
```
From /home/cloud_user/hadoop/, copy the latin.txt file to Hadoop at /user/cloud_user/latin with:
```
bin/hdfs dfs -put latin.txt latin
```
Challenge

Examine the latin.txt Text with MapReduce
From /home/cloud_user/hadoop/, use the hadoop-mapreduce-examples-*.jar to calculate the average length of the words in the /user/cloud_user/latin file and save the job output to /user/cloud_user/latin_wordmean_output in Hadoop with:
```
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.5.jar wordmean latin latin_wordmean_output
```
From /home/cloud_user/hadoop/, examine your wordmean job output files with:
```
bin/hdfs dfs -cat latin_wordmean_output/*
```
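As a sanity check, you can compute the same statistic locally on the downloaded file and compare it with the job output. The sketch below splits each line on whitespace and averages the token lengths with `awk`. The helper name `mean_word_length` is hypothetical, and the local figure is only an approximation of what the Hadoop example reports, since tokenization and character counting may differ for non-ASCII text.

```shell
# Hypothetical helper: print the mean length of whitespace-separated words
# in the given files (or stdin), to four decimal places.
mean_word_length() {
  awk '{ for (i = 1; i <= NF; i++) { words++; chars += length($i) } }
       END { if (words > 0) printf "%.4f\n", chars / words }' "$@"
}

# Usage from /home/cloud_user/hadoop:
#   mean_word_length latin.txt
```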

Author

Pluralsight Skills

