Hadoop setup Single node cluster
- Download and install VMWare Player.
- Download Ubuntu and set it up in VMWare Player. 4096 MB (4 GB) of RAM and a fixed hard disk of 16 GB are sufficient for the single-node cluster setup for our learning.
Important
Windows PCs have had problems supporting the 64-bit Ubuntu desktop client. Please try the 32-bit desktop client version. Remember to choose the 32-bit JDK if you choose a 32-bit OS.
Inside the VM, follow the instructions below. These instructions are not specific to the VM setup.
We create a separate user for the Hadoop environment. Set up the user with a password and a default group. Press Enter to accept all default options while creating the user. Then add the new user to sudo.
m@ubuntu:~$ sudo addgroup hadoop
m@ubuntu:~$ sudo adduser --ingroup hadoop hduser
m@ubuntu:~$ sudo adduser hduser sudo
Verify by changing to hduser:
m@ubuntu:~$ su hduser
Password:
hduser@ubuntu:$ cd ~
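The `id` command is a quick way to confirm the group memberships created above. The sketch below just parses a sample line of the kind `id hduser` would print after the setup (the uid/gid numbers are illustrative, not from a real machine):

```shell
# Sample output of `id hduser` after the adduser commands above (numbers illustrative)
id_out='uid=1001(hduser) gid=1001(hadoop) groups=1001(hadoop),27(sudo)'

# The user should belong to both the hadoop group and sudo
echo "$id_out" | grep -q '(hadoop)' && echo "$id_out" | grep -q '(sudo)' && echo "groups ok"
```

On the real VM, simply run `id hduser` and check that `(hadoop)` and `(sudo)` both appear.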
- ssh: the client program used to connect to a remote machine (most Linux distributions have this by default).
- sshd: the daemon that runs on the server. It listens for client connection requests and facilitates connection of clients to the server (the main installation installs this daemon).
- rsync: remote-sync is used to sync files across Linux machines. It is not required for a single-node cluster environment (most Linux distributions have this by default).
hduser@ubuntu:~$ sudo apt-get install ssh
Verify the installation
hduser@ubuntu:~$ which ssh
should display something like '/usr/bin/ssh'
hduser@ubuntu:~$ which sshd
should display something like '/usr/sbin/sshd'
We can set up the JDK for Hadoop in two ways:
- Install a separate JDK for Hadoop so that it won't interfere with system-wide settings (recommended).
- Use the existing installation of the JDK.
We are going to set up the JDK within the user's home directory so that it will not interfere with system-wide settings. Choose your desired version of the JDK for your current platform (64- or 32-bit). We have chosen 'jdk-8u112-linux-i586.tar.gz' as we have a 32-bit Ubuntu OS. I have tested Hadoop 2.7.3 with JDK 8 and it works with no major issues (so far). We will create a symlink as part of the instructions.
hduser@ubuntu:~$ mkdir /home/hduser/apps
hduser@ubuntu:~$ cd /home/hduser/apps/
hduser@ubuntu:~$ wget --no-cookies --no-check-certificate --header \
    "Cookie: oraclelicense=accept-securebackup-cookie" \
    "http://download.oracle.com/otn-pub/java/jdk/8u112-b15/jdk-8u112-linux-i586.tar.gz"
hduser@ubuntu:~$ tar zxvf jdk-8u112-linux-i586.tar.gz
hduser@ubuntu:~$ rm jdk-8u112-linux-i586.tar.gz
hduser@ubuntu:~$ cd ~
hduser@ubuntu:~$ ln -s /home/hduser/apps/jdk1.8.0_112 /home/hduser/apps/jdk_for_hadoop
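The point of the symlink is that `jdk_for_hadoop` can always point at whichever versioned JDK directory you extracted, so later config never has to change when the JDK is upgraded. The sketch below recreates the pattern in a throwaway directory (paths are illustrative):

```shell
# Recreate the symlink pattern in a scratch directory; the real setup uses
# /home/hduser/apps and the extracted JDK folder.
base=$(mktemp -d)
mkdir -p "$base/jdk1.8.0_112"
ln -s "$base/jdk1.8.0_112" "$base/jdk_for_hadoop"

# The symlink resolves to the versioned JDK directory
readlink "$base/jdk_for_hadoop"
```

Upgrading the JDK later is then just re-pointing the symlink with `ln -sfn`.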
Follow: Setup Java Home
If you have decided to use an already installed Java, follow the steps below. Find the Java location with the commands below, then create a symlink to the existing JDK.
hduser@ubuntu:~$ which javac
/usr/bin/javac
hduser@ubuntu:~$ readlink -f /usr/bin/javac
/usr/lib/jvm/java-8-oracle/bin/javac
hduser@ubuntu:~$ readlink -f /usr/lib/jvm/java-8-oracle/bin/javac
/usr/lib/jvm/java-8-oracle/bin/javac
hduser@ubuntu:~$ cd ~
hduser@ubuntu:~$ ln -s /usr/lib/jvm/java-8-oracle /home/hduser/apps/jdk_for_hadoop
Use readlink repeatedly on the previous output until there are no more links, i.e. the command produces no output or repeats the same output.
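Once the fully resolved `.../bin/javac` path is known, JAVA_HOME is just that path with the trailing `/bin/javac` stripped. A minimal sketch, using the example path from the readlink output above (substitute whatever readlink prints on your machine):

```shell
# Fully resolved javac path (example value from the readlink steps above)
javac_path="/usr/lib/jvm/java-8-oracle/bin/javac"

# JAVA_HOME is the JDK root: strip the trailing /bin/javac suffix
java_home="${javac_path%/bin/javac}"
echo "$java_home"   # /usr/lib/jvm/java-8-oracle
```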
Follow: Setup Java Home
Edit the ~/.bashrc file and append the below content to the end of the file.
hduser@ubuntu:~$ vi ~/.bashrc
export JAVA_HOME=/home/hduser/apps/jdk_for_hadoop
export PATH=$JAVA_HOME/bin:$PATH
Save the file and execute the command below to reload the environment variables.
hduser@ubuntu:~$ source ~/.bashrc
Verify if the installation is successful
hduser@ubuntu:~$ java -version
should output something similar to:
java version "1.8.0_112"
Java(TM) SE Runtime Environment (build 1.8.0_112-b15)
Java HotSpot(TM) Client VM (build 25.112-b15, mixed mode)
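If you want to check the major version in a script, the first line of the output can be parsed with sed. The sketch below works on the sample version line shown above rather than a live `java -version` call (which, note, writes to stderr):

```shell
# Sample first line of `java -version` output (from the example above)
ver_line='java version "1.8.0_112"'

# Pull out the major version from the legacy "1.<major>.<minor>" scheme
major=$(echo "$ver_line" | sed -n 's/.*"1\.\([0-9]*\)\..*/\1/p')
echo "$major"   # 8
```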
We need to set up passwordless SSH login for the user we just created, so that the Hadoop application running as that user can communicate with other machines in the cluster. This setup is still needed in our single-node environment. Now we will create the RSA key; press Enter and accept all default options.
hduser@ubuntu:~$ ssh-keygen -t rsa -P ""
Now add the generated key to the key store. In a distributed environment we need to add this key to all other machines.
hduser@ubuntu:~$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
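SSH is strict about permissions: a common reason passwordless login still prompts for a password is a group- or world-writable `~/.ssh` or `authorized_keys`. The sketch below demonstrates the append plus the usual permission fix in a throwaway directory (the key content is a placeholder, not a real key):

```shell
# Throwaway stand-in for the user's home directory
tmp=$(mktemp -d)
mkdir -p "$tmp/.ssh"
echo "ssh-rsa PLACEHOLDER hduser@ubuntu" > "$tmp/.ssh/id_rsa.pub"   # placeholder key

# Append the public key to authorized_keys, as in the command above
cat "$tmp/.ssh/id_rsa.pub" >> "$tmp/.ssh/authorized_keys"

# Tighten permissions: sshd refuses keys in files others can write to
chmod 700 "$tmp/.ssh"
chmod 600 "$tmp/.ssh/authorized_keys"
stat -c '%a' "$tmp/.ssh/authorized_keys"   # 600
```

On the real VM, apply the same two `chmod` commands to `~/.ssh` and `~/.ssh/authorized_keys` if `ssh localhost` still asks for a password.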
Verify that ssh works; type 'yes' when asked whether to continue connecting.
hduser@ubuntu:~$ ssh localhost
hduser@ubuntu:~$ exit
Download the desired version of the Hadoop binary from your nearest mirror and set up Hadoop in the 'usr' directory.
hduser@ubuntu:$ cd ~
hduser@ubuntu:~$ wget http://mirrors.ukfast.co.uk/sites/ftp.apache.org/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
hduser@ubuntu:~$ tar -xvzf hadoop-2.7.3.tar.gz
hduser@ubuntu:~$ sudo mv hadoop-2.7.3 /usr/local/hadoop
hduser@ubuntu:~$ sudo chown -R hduser:hadoop /usr/local/hadoop
Create a separate Hadoop configuration directory (this will be set as the HADOOP_CONF_DIR environment variable).
hduser@ubuntu:~$ sudo cp -R /usr/local/hadoop/etc/hadoop /usr/local/hadoop-conf
hduser@ubuntu:~$ sudo chown -R hduser:hadoop /usr/local/hadoop-conf
Edit the ~/.bashrc file and append the below content to the end of the file.
hduser@ubuntu:~$ vi ~/.bashrc
#HADOOP VARIABLES START
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=/usr/local/hadoop-conf
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
#HADOOP VARIABLES END
Save the file and execute the command below to reload the environment variables.
hduser@ubuntu:~$ source ~/.bashrc
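A quick way to confirm the exports took effect is to check that the Hadoop bin directory ended up on PATH. The sketch below simulates the two relevant exports inline so it is self-contained; on the real VM just run the `case` check in your sourced shell:

```shell
# Simulate the exports from ~/.bashrc above (real shell gets these via `source`)
HADOOP_HOME=/usr/local/hadoop
PATH=$PATH:$HADOOP_HOME/bin

# Wrap PATH in colons so the match works at either end of the list
case ":$PATH:" in
  *":$HADOOP_HOME/bin:"*) status=ok ;;
  *) status=missing ;;
esac
echo "$status"   # ok
```

Alternatively, `which hadoop` should print `/usr/local/hadoop/bin/hadoop` once the variables are loaded.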
Edit the hadoop-env.sh
hduser@ubuntu:~$ vi /usr/local/hadoop-conf/hadoop-env.sh
Append the line below at the end of the file:
export JAVA_HOME=/home/hduser/apps/jdk_for_hadoop
Edit the core-site.xml as below and add the configuration:
hduser@ubuntu:~$ vi /usr/local/hadoop-conf/core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost/</value>
    </property>
</configuration>
Create the directories below for the namenode and datanode:
$ sudo mkdir -p /usr/local/hadoop_store/hdfs/namenode
$ sudo mkdir -p /usr/local/hadoop_store/hdfs/datanode
$ sudo chown -R hduser:hadoop /usr/local/hadoop_store
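The `-p` flag creates the whole chain of parent directories in one call, which is why two commands are enough for the full layout. The sketch below recreates the same tree in a scratch location (the real setup uses /usr/local/hadoop_store and needs sudo):

```shell
# Recreate the HDFS storage layout in a throwaway directory
root=$(mktemp -d)
mkdir -p "$root/hdfs/namenode" "$root/hdfs/datanode"

# Both leaf directories and the shared hdfs parent now exist
find "$root" -type d | sort
```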
Edit hdfs-site.xml as below:
hduser@ubuntu:~$ vi /usr/local/hadoop-conf/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop_store/hdfs/namenode</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop_store/hdfs/datanode</value>
    </property>
</configuration>
Copy the mapred-site.xml.template into mapred-site.xml as below:
hduser@ubuntu:~$ cp /usr/local/hadoop-conf/mapred-site.xml.template /usr/local/hadoop-conf/mapred-site.xml
Edit mapred-site.xml as below:
hduser@ubuntu:~$ vi /usr/local/hadoop-conf/mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
Edit yarn-site.xml as below:
hduser@ubuntu:~$ vi /usr/local/hadoop-conf/yarn-site.xml
<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>localhost</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>127.0.0.1:8032</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
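All of the Hadoop config files edited above share the same flat name/value XML shape, so a tiny grep/sed helper is enough to double-check what a file actually sets. This is a hypothetical convenience script, not part of Hadoop; it relies on each name and value being on its own line, as in the files above, and is demonstrated on an inline copy of the core-site.xml content:

```shell
# Hypothetical helper: read one property value from a Hadoop-style XML config.
# Naive line-based parsing; fine for these flat files, not for general XML.
get_prop() {
  grep -A1 "<name>$2</name>" "$1" | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}

# Inline copy of the core-site.xml configured earlier, for demonstration
cat > /tmp/core-site-sample.xml <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost/</value>
  </property>
</configuration>
EOF

get_prop /tmp/core-site-sample.xml fs.defaultFS   # hdfs://localhost/
```

Pointing it at `/usr/local/hadoop-conf/hdfs-site.xml` on the VM lets you confirm, for example, that `dfs.replication` really is 1 before formatting HDFS.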
Execute the command below to format the HDFS file system.
Warning
This has to be done only once, as part of setup. Running it again will reformat HDFS and cause loss of data.
hduser@ubuntu:~$ hdfs namenode -format
Start the daemons as below.
hduser@ubuntu:~$ start-dfs.sh
hduser@ubuntu:~$ start-yarn.sh
hduser@ubuntu:~$ mr-jobhistory-daemon.sh start historyserver
Check the list of daemons running:
hduser@ubuntu:~$ jps
6048 NameNode
6753 NodeManager
6386 SecondaryNameNode
6630 ResourceManager
6202 DataNode
7099 JobHistoryServer
7182 Jps
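A missing daemon in the `jps` listing usually means a config error (check the logs under $HADOOP_HOME/logs). The sketch below checks all six expected daemons against the sample listing above; on the real VM you would replace the hard-coded string with `jps_out=$(jps)`:

```shell
# Sample jps output from the listing above (PIDs will differ on your VM)
jps_out='6048 NameNode
6753 NodeManager
6386 SecondaryNameNode
6630 ResourceManager
6202 DataNode
7099 JobHistoryServer'

# Every daemon started by the three start commands should be present
missing=0
for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager JobHistoryServer; do
  echo "$jps_out" | grep -q " $d$" || { echo "missing: $d"; missing=1; }
done
[ "$missing" -eq 0 ] && echo "all daemons running"
```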
Stop the daemons in the order below.
hduser@ubuntu:~$ mr-jobhistory-daemon.sh stop historyserver
hduser@ubuntu:~$ stop-yarn.sh
hduser@ubuntu:~$ stop-dfs.sh
The activity of the above daemons can be checked from the web UIs below.