Hadoop Installation on an Ubuntu Multinode Cluster | Hadoop Tutorial for Beginners



If you have set up Hadoop on a single node, then what next? Install it on a multinode cluster. Yes!

In short, follow the steps below and enjoy Hadoop on multiple machines:

Requirement: 2 or more Ubuntu systems (one as the master and the others as slaves)


In the previous post you learnt how to set up Hadoop on a single node, so first set up single-node Hadoop on each of your two or more systems.

Bonus: Set the system name of the master machine to 'master' and of the slave machines to 'slave1', 'slave2', ... This is a convention, not a must-follow rule, but following it helps you avoid unwanted errors and trace them easily when they occur.
Note: the master will also work as a slave.

How?

Open the file /etc/hostname and replace
YourPcName with master, slave1, slave2, etc.
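
For example, a minimal sketch of the change on the master (assumes sudo access; the hostname command applies the new name without a reboot):


sudo sh -c 'echo master > /etc/hostname'
sudo hostname master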

Network setting

So you have installed a single-node cluster on each system. Before continuing, stop each running single-node cluster by using 'bin/stop-all.sh'. To run Hadoop on multiple nodes, every system must be on the same network, so connect the systems via LAN or WAN. Then update /etc/hosts on every machine:


192.168.0.1 master


192.168.0.2 slave1


192.168.0.3 slave2


192.168.0.4 slave3


. . .


Note: ' . . . ' means repeat for all slaves (this notation is also used below).
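
Once /etc/hosts is updated everywhere, you can verify that the names resolve and the machines are reachable, for example by pinging each slave from the master (hostnames from the entries above):


hduser@master:~$ ping -c 3 slave1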

SSH Configuration

For the multinode cluster setup, hduser on the master must be able to connect to its own account on the master and to the hduser account on each slave without a password. hduser can already connect to its own account because we set up SSH during the single-node installation. To connect to the slaves, we have to copy the RSA public key to each of them. Copy it using the following command for each slave:


hduser@master:~$
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@slave1
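

If you have several slaves, a small shell loop (slave names assumed to match the /etc/hosts entries above) saves repeating the command:


for host in slave1 slave2 slave3; do
  ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@$host
done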


  • Connect from master to master


ssh master





hduser@master:~$ ssh master


The authenticity of host 'master (192.168.0.1)' can't be established.


RSA key fingerprint is 3b:21:b3:c0:21:5c:7c:54:2f:1e:2d:96:79:eb:7f:95.


Are you sure you want to continue connecting (yes/no)? yes


Warning: Permanently added 'master' (RSA) to the list of known hosts.


Linux master 2.6.20-16-386 #2 Thu Jun 7 20:16:13 UTC 2007 i686


...


hduser@master:~$




  • Connect from master to the slaves


ssh slave1





hduser@master:~$ ssh slave1


The authenticity of host 'slave1 (192.168.0.2)' can't be established.


RSA key fingerprint is 74:d7:61:86:db:86:8f:31:90:9c:68:b0:13:88:52:72.


Are you sure you want to continue connecting (yes/no)? yes


Warning: Permanently added 'slave1' (RSA) to the list of known hosts.


Ubuntu 10.04


...

hduser@slave1:~$

Note: the first time you access a slave (e.g., when running ssh-copy-id), you will be asked for that slave's hduser password, so you must know the hduser password of each slave.
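
To confirm the passwordless setup, the following one-liner should print each machine's hostname without any password prompt (host names as above):


hduser@master:~$ for host in master slave1 slave2 slave3; do ssh $host hostname; done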

Update Configuration files

After the SSH configuration you now have to update 5 conf files (path: /usr/local/hadoop/conf/): masters, slaves, and the three *-site.xml files.

  • Update the conf/masters file (on master only). Despite its name, this file lists the host(s) on which Hadoop starts the SecondaryNameNode; in our setup that is just the master.

conf/masters

master



  • Update the conf/slaves file (on master only). This file lists the hosts on which the DataNode and TaskTracker daemons run; note that the master also acts as a slave here.


    conf/slaves

    master


    slave1


    slave2


    slave3


    . . .


    Now update the three *-site.xml files on all machines (a quick way to copy them from the master to the slaves is sketched after the snippets below).


    • Update core-site.xml. In core-site.xml, set the value of 'fs.default.name' to point to 'master'; this property defines the NameNode host and port.
    core-site.xml


    <property>


    <name>fs.default.name</name>


    <value>hdfs://master:54310</value>


    <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>


    </property>



    • Update mapred-site.xml. Set the value of 'mapred.job.tracker' to point to 'master'; this property defines the JobTracker host and port.

    mapred-site.xml


    <property>


    <name>mapred.job.tracker</name>


    <value>master:54311</value>


    <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>


    </property>



    • Update hdfs-site.xml. The default value of dfs.replication is 3; since we have 4 nodes available, we set dfs.replication to 4.

    conf/hdfs-site.xml


    <property>


    <name>dfs.replication</name>


    <value>4</value>


    <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified at create time.</description>


    </property>
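

    Since these *-site.xml files must be identical on every machine, one simple approach (assuming scp is available and the Hadoop path matches this tutorial) is to edit them on the master and then copy them to each slave:


    hduser@master:/usr/local/hadoop$ for host in slave1 slave2 slave3; do scp conf/core-site.xml conf/mapred-site.xml conf/hdfs-site.xml hduser@$host:/usr/local/hadoop/conf/; done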


    Formatting HDFS via the NameNode

    Before starting the multinode cluster, we must format the Hadoop filesystem. Format it using the command below on the master. Never format a running cluster; stop it first to avoid data loss.


    hduser@master:/usr/local/hadoop$
    bin/hadoop namenode -format


    Start multinode cluster

    First start the HDFS daemons: the NameNode on the master and the DataNodes on the master and the slaves. Use this command on the master:


    bin/start-dfs.sh





    hduser@master:/usr/local/hadoop$ bin/start-dfs.sh


    starting namenode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-namenode-master.out


    slave: Ubuntu 10.04


    slave: starting datanode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-datanode-slave.out


    master: starting datanode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-datanode-master.out


    master: starting secondarynamenode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-secondarynamenode-master.out


    hduser@master:/usr/local/hadoop$



    • Test it via the jps command on the master and the slaves

    on master


    hduser@master:/usr/local/hadoop$ jps


    14799 NameNode


    15314 Jps


    14880 DataNode


    14977 SecondaryNameNode




    hduser@master:/usr/local/hadoop$


    on slaves


    hduser@slave1:/usr/local/hadoop$ jps


    15183 DataNode


    15616 Jps




    hduser@slave1:/usr/local/hadoop$
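

    Besides jps, you can also ask HDFS itself for a cluster-wide report from the master (a standard Hadoop command that lists live datanodes and capacity); the NameNode web UI at http://master:50070 (the default port in this Hadoop generation) shows the same information in a browser:


    hduser@master:/usr/local/hadoop$ bin/hadoop dfsadmin -report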



    • Second, start the MapReduce daemons: the JobTracker on the master and the TaskTrackers on the master and the slaves. Use the command below on the master:

    bin/start-mapred.sh
    hduser@master:/usr/local/hadoop$ bin/start-mapred.sh

    starting jobtracker, logging to /usr/local/hadoop/bin/../logs/hadoop-hadoop-jobtracker-master.out


    slave: Ubuntu 10.04


    slave: starting tasktracker, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-tasktracker-slave.out


    master: starting tasktracker, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-tasktracker-master.out


    hduser@master:/usr/local/hadoop$


    Now check it via the jps command.

    On master


    hduser@master:/usr/local/hadoop$ jps


    16017 Jps


    14799 NameNode


    15686 TaskTracker


    14880 DataNode


    15596 JobTracker


    14977 SecondaryNameNode




    hduser@master:/usr/local/hadoop$


    on slaves


    hduser@slave1:/usr/local/hadoop$ jps


    15183 DataNode


    15897 TaskTracker


    16284 Jps




    hduser@slave1:/usr/local/hadoop$
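

    As an optional smoke test of the whole cluster, you can run the bundled pi example (the examples jar name varies with the Hadoop version, hence the glob):


    hduser@master:/usr/local/hadoop$ bin/hadoop jar hadoop-examples-*.jar pi 2 10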


    Stopping multinode cluster


    • First stop the MapReduce daemons via the bin/stop-mapred.sh command on the master. It will stop the JobTracker on the master and the TaskTrackers on the master and the slaves.

    bin/stop-mapred.sh
    hduser@master:/usr/local/hadoop$ bin/stop-mapred.sh

    stopping jobtracker


    slave1: Ubuntu 10.04


    master: stopping tasktracker


    slave1: stopping tasktracker


    . . .


    hduser@master:/usr/local/hadoop$


    Check it via jps

    on master


    hduser@master:/usr/local/hadoop$ jps


    14799 NameNode


    18386 Jps


    14880 DataNode


    14977 SecondaryNameNode




    hduser@master:/usr/local/hadoop$


    on slaves


    hduser@slave:/usr/local/hadoop$ jps




    15183 DataNode


    18636 Jps


    hduser@slave:/usr/local/hadoop$



    • Second, stop the HDFS daemons via the bin/stop-dfs.sh command on the master. It will stop the NameNode and SecondaryNameNode on the master and the DataNodes on the master and the slaves.

    bin/stop-dfs.sh


    hduser@master:/usr/local/hadoop$ bin/stop-dfs.sh


    stopping namenode


    slave1: Ubuntu 10.04


    slave1: stopping datanode


    . . .


    master: stopping datanode


    master: stopping secondarynamenode


    hduser@master:/usr/local/hadoop$



    Check it via jps

    on master


    hduser@master:/usr/local/hadoop$ jps


    18670 Jps


    hduser@master:/usr/local/hadoop$


    on slaves


    hduser@slave1:/usr/local/hadoop$ jps




    18894 Jps


    hduser@slave1:/usr/local/hadoop$



    Now your Hadoop multinode cluster installation, startup, and shutdown are done. Your next task is to run a MapReduce job on the multinode cluster. Good luck, and stay connected with us; lots more to come. Till then, bye. :)
