Hadoop Installation on an Ubuntu Multinode Cluster | Hadoop Tutorial for Beginners



If you have set up Hadoop on a single node, then what next? Install it on a multinode cluster. Yes!

In short, follow the steps below and enjoy Hadoop on multiple machines:

Requirement: 2 or more Ubuntu systems (one as the master and the others as slaves)


In the previous post you learnt how to set up Hadoop on a single node, so first set up single-node Hadoop on each of your two or more systems.

Bonus: Set the system name of the master machine to 'master' and of the slave machines to 'slave1', 'slave2', ... This is a convention, not a must-follow rule, but following it helps you avoid unwanted errors and trace them easily when they occur.
Note: the master will also work as a slave.

How?

Open the file /etc/hostname and replace
YourPcName with master, slave1, slave2, etc.
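
For example, a minimal sketch of the change on the master (assumes sudo access; the hostname command applies the new name without a reboot):


sudo sh -c 'echo master > /etc/hostname'
sudo hostname master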

Network setting

So you have installed a single-node cluster on each system. Before continuing, stop each running single-node cluster by using 'bin/stop-all.sh'. To run Hadoop on multiple nodes, every system must be on the same network, so connect the systems via LAN or WAN. Then update /etc/hosts on every machine:


192.168.0.1 master


192.168.0.2 slave1


192.168.0.3 slave2


192.168.0.4 slave3


. . .


Note: ' . . . ' means repeat for all slaves (this notation is also used below).
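
Once /etc/hosts is updated everywhere, you can verify that the names resolve and the machines are reachable, for example by pinging each slave from the master (hostnames from the entries above):


hduser@master:~$ ping -c 3 slave1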

SSH Configuration

For the multinode cluster setup, hduser on the master must be able to connect to its own account on the master and to the hduser account on each slave without a password. hduser can already connect to its own account because we set up SSH during the single-node installation. To connect to the slaves, we have to copy the RSA public key to each of them. Copy it using the following command for each slave:


hduser@master:~$
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@slave1
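

If you have several slaves, a small shell loop (slave names assumed to match the /etc/hosts entries above) saves repeating the command:


for host in slave1 slave2 slave3; do
  ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@$host
done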


  • Connect from master to master


ssh master





hduser@master:~$ ssh master


The authenticity of host 'master (192.168.0.1)' can't be established.


RSA key fingerprint is 3b:21:b3:c0:21:5c:7c:54:2f:1e:2d:96:79:eb:7f:95.


Are you sure you want to continue connecting (yes/no)? yes


Warning: Permanently added 'master' (RSA) to the list of known hosts.


Linux master 2.6.20-16-386 #2 Thu Jun 7 20:16:13 UTC 2007 i686


...


hduser@master:~$




  • Connect from master to the slaves


ssh slave1





hduser@master:~$ ssh slave1


The authenticity of host 'slave1 (192.168.0.2)' can't be established.


RSA key fingerprint is 74:d7:61:86:db:86:8f:31:90:9c:68:b0:13:88:52:72.


Are you sure you want to continue connecting (yes/no)? yes


Warning: Permanently added 'slave1' (RSA) to the list of known hosts.


Ubuntu 10.04


...

hduser@slave1:~$

Note: the first time you access a slave (e.g., when running ssh-copy-id), you will be asked for that slave's hduser password, so you must know the hduser password of each slave.
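
To confirm the passwordless setup, the following one-liner should print each machine's hostname without any password prompt (host names as above):


hduser@master:~$ for host in master slave1 slave2 slave3; do ssh $host hostname; done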

Update Configuration files

After the SSH configuration you now have to update 5 conf files (path: /usr/local/hadoop/conf/): masters, slaves, and the three *-site.xml files.

  • Update the conf/masters file (on master only). Despite its name, this file lists the host(s) on which Hadoop starts the SecondaryNameNode; in our setup that is just the master.

conf/masters

master



  • Update the conf/slaves file (on master only). This file lists the hosts on which the DataNode and TaskTracker daemons run; note that the master also acts as a slave here.


    conf/slaves

    master


    slave1


    slave2


    slave3


    . . .


    Now update the three *-site.xml files on all machines (a quick way to copy them from the master to the slaves is sketched after the snippets below).


    • Update core-site.xml. In core-site.xml, set the value of 'fs.default.name' to point to 'master'; this property defines the NameNode host and port.
    core-site.xml


    <property>


    <name>fs.default.name</name>


    <value>hdfs://master:54310</value>


    <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>


    </property>



    • Update mapred-site.xml. Set the value of 'mapred.job.tracker' to point to 'master'; this property defines the JobTracker host and port.

    mapred-site.xml


    <property>


    <name>mapred.job.tracker</name>


    <value>master:54311</value>


    <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>


    </property>



    • Update hdfs-site.xml. The default value of dfs.replication is 3; since we have 4 nodes available, we set dfs.replication to 4.

    conf/hdfs-site.xml


    <property>


    <name>dfs.replication</name>


    <value>4</value>


    <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified at create time.</description>


    </property>
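

    Since these *-site.xml files must be identical on every machine, one simple approach (assuming scp is available and the Hadoop path matches this tutorial) is to edit them on the master and then copy them to each slave:


    hduser@master:/usr/local/hadoop$ for host in slave1 slave2 slave3; do scp conf/core-site.xml conf/mapred-site.xml conf/hdfs-site.xml hduser@$host:/usr/local/hadoop/conf/; done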


    Formatting HDFS via the NameNode

    Before starting the multinode cluster, we must format the Hadoop filesystem. Format it using the command below on the master. Never format a running cluster; stop it first to avoid data loss.


    hduser@master:/usr/local/hadoop$
    bin/hadoop namenode -format


    Start multinode cluster

    First start the HDFS daemons: the NameNode on the master and the DataNodes on the master and the slaves. Use this command on the master:


    bin/start-dfs.sh





    hduser@master:/usr/local/hadoop$ bin/start-dfs.sh


    starting namenode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-namenode-master.out


    slave: Ubuntu 10.04


    slave: starting datanode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-datanode-slave.out


    master: starting datanode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-datanode-master.out


    master: starting secondarynamenode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-secondarynamenode-master.out


    hduser@master:/usr/local/hadoop$



    • Test it via the jps command on the master and the slaves

    on master


    hduser@master:/usr/local/hadoop$ jps


    14799 NameNode


    15314 Jps


    14880 DataNode


    14977 SecondaryNameNode




    hduser@master:/usr/local/hadoop$


    on slaves


    hduser@slave1:/usr/local/hadoop$ jps


    15183 DataNode


    15616 Jps




    hduser@slave1:/usr/local/hadoop$
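

    Besides jps, you can also ask HDFS itself for a cluster-wide report from the master (a standard Hadoop command that lists live datanodes and capacity); the NameNode web UI at http://master:50070 (the default port in this Hadoop generation) shows the same information in a browser:


    hduser@master:/usr/local/hadoop$ bin/hadoop dfsadmin -report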



    • Second, start the MapReduce daemons: the JobTracker on the master and the TaskTrackers on the master and the slaves. Use the command below on the master:

    bin/start-mapred.sh
    hduser@master:/usr/local/hadoop$ bin/start-mapred.sh

    starting jobtracker, logging to /usr/local/hadoop/bin/../logs/hadoop-hadoop-jobtracker-master.out


    slave: Ubuntu 10.04


    slave: starting tasktracker, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-tasktracker-slave.out


    master: starting tasktracker, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-tasktracker-master.out


    hduser@master:/usr/local/hadoop$


    Now check it via the jps command.

    On master


    hduser@master:/usr/local/hadoop$ jps


    16017 Jps


    14799 NameNode


    15686 TaskTracker


    14880 DataNode


    15596 JobTracker


    14977 SecondaryNameNode




    hduser@master:/usr/local/hadoop$


    on slaves


    hduser@slave1:/usr/local/hadoop$ jps


    15183 DataNode


    15897 TaskTracker


    16284 Jps




    hduser@slave1:/usr/local/hadoop$
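

    As an optional smoke test of the whole cluster, you can run the bundled pi example (the examples jar name varies with the Hadoop version, hence the glob):


    hduser@master:/usr/local/hadoop$ bin/hadoop jar hadoop-examples-*.jar pi 2 10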


    Stopping multinode cluster


    • First stop the MapReduce daemons via the bin/stop-mapred.sh command on the master. It will stop the JobTracker on the master and the TaskTrackers on the master and the slaves.

    bin/stop-mapred.sh
    hduser@master:/usr/local/hadoop$ bin/stop-mapred.sh

    stopping jobtracker


    slave1: Ubuntu 10.04


    master: stopping tasktracker


    slave1: stopping tasktracker


    . . .


    hduser@master:/usr/local/hadoop$


    Check it via jps

    on master


    hduser@master:/usr/local/hadoop$ jps


    14799 NameNode


    18386 Jps


    14880 DataNode


    14977 SecondaryNameNode




    hduser@master:/usr/local/hadoop$


    on slaves


    hduser@slave:/usr/local/hadoop$ jps




    15183 DataNode


    18636 Jps


    hduser@slave:/usr/local/hadoop$



    • Second, stop the HDFS daemons via the bin/stop-dfs.sh command on the master. It will stop the NameNode and SecondaryNameNode on the master and the DataNodes on the master and the slaves.

    bin/stop-dfs.sh


    hduser@master:/usr/local/hadoop$ bin/stop-dfs.sh


    stopping namenode


    slave1: Ubuntu 10.04


    slave1: stopping datanode


    . . .


    master: stopping datanode


    master: stopping secondarynamenode


    hduser@master:/usr/local/hadoop$



    Check it via jps

    on master


    hduser@master:/usr/local/hadoop$ jps


    18670 Jps


    hduser@master:/usr/local/hadoop$


    on slaves


    hduser@slave1:/usr/local/hadoop$ jps




    18894 Jps


    hduser@slave1:/usr/local/hadoop$



    Now your Hadoop multinode cluster installation, startup, and shutdown are done. Your next task is to run a MapReduce job on the multinode cluster. Good luck, and stay connected with us; lots more to come. Till then, bye. :)
