Hadoop Installation on Ubuntu Multinode Clusters

You have set up Hadoop on a single node, so what next? Install it on multiple nodes. Yes!

In short, follow the steps below and enjoy Hadoop on multiple machines:

Requirement: two or more Ubuntu systems (one for the master and the others for the slaves)

In the previous post you learnt how to set up Hadoop on a single node, so first set up single-node Hadoop on each of the two or more systems.

Bonus: Name the master system 'master' and the slave systems 'slave1', 'slave2', ... This is a convention, not a mandatory rule, but following it helps you avoid unwanted errors and trace them more easily.
Note: the master will also work as a slave.

How?

Open the file /etc/hostname and replace YourPcName with master (on the master system) or slave1, slave2, ... (on the slave systems).
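If you prefer doing this from a terminal, here is a minimal sketch that writes the new name into a hostname file. The helper name set_hostname_file is hypothetical, and the file path is a parameter so you can try it on a scratch file first; on a real machine you would target /etc/hostname from a root shell. The file is read at boot, so the new name applies after a reboot (running `sudo hostname master` also applies it to the running system).

```shell
#!/usr/bin/env bash
# Sketch: write a node name into a hostname file.
# set_hostname_file is a hypothetical helper, not part of Hadoop or Ubuntu.
# Usage: set_hostname_file NAME [FILE]   (FILE defaults to /etc/hostname)
set_hostname_file() {
  local name="$1"
  local file="${2:-/etc/hostname}"
  # /etc/hostname holds just the name, on a single line.
  printf '%s\n' "$name" > "$file"
}

# On the master, from a root shell:      set_hostname_file master
# On the first slave, from a root shell: set_hostname_file slave1
```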

Network settings

Once single-node Hadoop is installed, first stop each single-node cluster with 'stop-all.sh'. To install Hadoop on multiple nodes, every system must be on the same network, so connect the systems over a LAN or WAN. Then update /etc/hosts on every machine so that it maps each hostname to that machine's IP address: master, slave1, slave2, slave3

. . .

Note: ' . . . ' means repeat for all slaves (it is used below as well).
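As an illustration, each /etc/hosts line pairs a node's IP address with its hostname. The 192.168.0.x addresses below are hypothetical placeholders; use the real LAN addresses of your machines and add the same lines on every node.

```shell
#!/usr/bin/env bash
# Sketch: print the /etc/hosts entries for the cluster.
# The IP addresses are hypothetical; replace them with your machines' real ones.
cluster_hosts() {
  # printf reuses its format for each remaining pair of arguments.
  printf '%s\t%s\n' \
    192.168.0.1 master \
    192.168.0.2 slave1 \
    192.168.0.3 slave2 \
    192.168.0.4 slave3
}

cluster_hosts
# To append them on a node (as a user with sudo rights):
#   cluster_hosts | sudo tee -a /etc/hosts
```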

SSH Configuration

For a multinode cluster setup, hduser on the master must be able to connect without a password both to its own account on the master and to the hduser account on each slave. hduser can already connect to its own account because we configured SSH in the single-node setup. To connect to the slaves, we have to copy hduser's RSA public key to each slave with the following command:

ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@slave1
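Since the command has to be repeated for every slave, a small loop can help. This is a sketch under assumptions: the SLAVES list and the copy_keys name are hypothetical, and the loop only prints the commands unless DRY_RUN=0, so you can check the host list before anything is copied.

```shell
#!/usr/bin/env bash
# Sketch: copy hduser's public key from the master to every slave.
# SLAVES is an assumption; list your own slave hostnames here.
SLAVES="slave1 slave2 slave3"

# copy_keys (hypothetical helper) prints each command when DRY_RUN=1,
# the default here, and runs it when DRY_RUN=0. Each real run asks
# for that slave's hduser password.
copy_keys() {
  local dry_run="${DRY_RUN:-1}"
  local host
  for host in $SLAVES; do
    if [ "$dry_run" = "1" ]; then
      echo "ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@$host"
    else
      ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "hduser@$host"
    fi
  done
}

copy_keys              # preview the commands
# DRY_RUN=0 copy_keys  # actually copy the key to each slave
```

Defaulting to a dry run is a deliberate choice: a typo in a hostname is visible before you are prompted for any password.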

  • Connect from master to master

ssh master

hduser@master:~$ ssh master

The authenticity of host 'master (' can't be established.

RSA key fingerprint is 3b:21:b3:c0:21:5c:7c:54:2f:1e:2d:96:79:eb:7f:95.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'master' (RSA) to the list of known hosts.

Linux master 2.6.20-16-386 #2 Thu Jun 7 20:16:13 UTC 2007 i686



  • Connect from master to slaves

ssh slave1

hduser@master:~$ ssh slave

The authenticity of host 'slave (' can't be established.

RSA key fingerprint is 74:d7:61:86:db:86:8f:31:90:9c:68:b0:13:88:52:72.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'slave' (RSA) to the list of known hosts.

Ubuntu 10.04



Note: the first time you access a slave via SSH (for example, when running ssh-copy-id), you will be asked for that slave's hduser password, so you must know the hduser password of each slave.

Update Configuration files

After the SSH configuration, you have to update five configuration files in /usr/local/hadoop/conf/: masters, slaves, and the three *-site.xml files.

  • Update the conf/masters file (on master only)

    master

  • Update the conf/slaves file (on master only)

    master
    slave1
    slave2
    . . .
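A minimal sketch for writing both files from one list of hostnames, so they cannot drift apart. The helper write_node_files and the MASTER/SLAVES values are hypothetical; the conf directory is a parameter so you can try it on a scratch directory first. The master is listed in conf/slaves because it also works as a slave, and in Hadoop 1.x start-dfs.sh starts the SecondaryNameNode on the hosts named in conf/masters.

```shell
#!/usr/bin/env bash
# Sketch: write conf/masters and conf/slaves from one list of hostnames.
# MASTER and SLAVES are assumptions; adjust them to your cluster.
MASTER="master"
SLAVES="slave1 slave2 slave3"

# write_node_files is a hypothetical helper.
# Usage: write_node_files [CONF_DIR]   (defaults to /usr/local/hadoop/conf)
write_node_files() {
  local conf_dir="${1:-/usr/local/hadoop/conf}"
  local host
  # conf/masters: host(s) where start-dfs.sh starts the SecondaryNameNode.
  printf '%s\n' "$MASTER" > "$conf_dir/masters"
  # conf/slaves: every host that runs a DataNode and a TaskTracker,
  # including the master, since it also works as a slave.
  {
    printf '%s\n' "$MASTER"
    for host in $SLAVES; do
      printf '%s\n' "$host"
    done
  } > "$conf_dir/slaves"
}

# On the master: write_node_files /usr/local/hadoop/conf
```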

    Now update the three *-site.xml files (on all machines):

    • Update core-site.xml: change the value of 'fs.default.name' so that it uses 'master' as the host. This property defines the NameNode host and port.

    <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>


    • Update mapred-site.xml: change the value of 'mapred.job.tracker' so that it uses 'master' as the host. This property defines the JobTracker host and port.

    <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>



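    • Update hdfs-site.xml. The default value of dfs.replication is 3. Set it to the number of copies you want of each block, which should not exceed the number of DataNodes; for example, if we have 4 nodes available, we can set dfs.replication to 4.
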
    <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
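For reference, each of these settings is a <property> element inside the file's <configuration> element. The sketch below prints such blocks so you can compare them with your files. The helper property_xml is hypothetical, and the ports 54310 and 54311 are assumptions (reuse whatever ports your single-node configuration already has). Edit the existing entries rather than overwriting whole files, since your single-node files may hold other settings.

```shell
#!/usr/bin/env bash
# Sketch: print one Hadoop <property> block to paste inside <configuration>.
# property_xml is a hypothetical helper. Usage: property_xml NAME VALUE
property_xml() {
  cat <<EOF
<property>
  <name>$1</name>
  <value>$2</value>
</property>
EOF
}

# Hypothetical values: NameNode and JobTracker on "master" (ports assumed),
# and 4 DataNodes (master plus slave1..slave3).
property_xml fs.default.name hdfs://master:54310   # core-site.xml
property_xml mapred.job.tracker master:54311       # mapred-site.xml
property_xml dfs.replication 4                     # hdfs-site.xml
```

Note that mapred.job.tracker takes a plain host:port, while fs.default.name is a URI with the hdfs:// scheme.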



    Formatting HDFS via namenode

    Before starting the multinode Hadoop cluster, we must first format the Hadoop filesystem (HDFS). Run the command below on the master. Formatting wipes the existing HDFS filesystem, so never format a running cluster; stop it first to avoid data loss.

    bin/hadoop namenode -format
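To make the "do not format a running cluster" rule harder to break, the format command can be guarded by a jps check. This is a sketch; namenode_running and safe_format are hypothetical helpers, and safe_format assumes it is run from /usr/local/hadoop on the master.

```shell
#!/usr/bin/env bash
# Sketch: refuse to format HDFS while a NameNode is still running.

# Reads `jps` output on stdin; returns 0 if a NameNode process is listed.
# Comparing the whole second column avoids matching SecondaryNameNode.
namenode_running() {
  awk '$2 == "NameNode" { found = 1 } END { exit !found }'
}

# Run from /usr/local/hadoop on the master.
safe_format() {
  if ! command -v jps > /dev/null; then
    echo "jps not found; cannot check for a running NameNode." >&2
    return 1
  fi
  if jps | namenode_running; then
    echo "A NameNode is running; stop the cluster before formatting." >&2
    return 1
  fi
  bin/hadoop namenode -format
}
```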

    Starting multinode cluster

    First, start the HDFS daemons by running this command on the master. It starts the NameNode and SecondaryNameNode on the master and a DataNode on the master and on each slave.


    hduser@master:/usr/local/hadoop$ bin/start-dfs.sh

    starting namenode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-namenode-master.out

    slave: Ubuntu 10.04

    slave: starting datanode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-datanode-slave.out

    master: starting datanode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-datanode-master.out

    master: starting secondarynamenode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-secondarynamenode-master.out


    • Check with the jps command on the master and the slaves

    on master

    hduser@master:/usr/local/hadoop$ jps

    14799 NameNode

    15314 Jps

    14880 DataNode

    14977 SecondaryNameNode


    on slaves

    hduser@slave1:/usr/local/hadoop$ jps

    15183 DataNode

    15616 Jps


    • Second, start the MapReduce daemons by running the command below on the master. It starts the JobTracker on the master and a TaskTracker on the master and on each slave.

    hduser@master:/usr/local/hadoop$ bin/start-mapred.sh

    starting jobtracker, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-jobtracker-master.out

    slave: Ubuntu 10.04

    slave: starting tasktracker, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-tasktracker-slave.out

    master: starting tasktracker, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-tasktracker-master.out


    Now check it with the jps command.

    On master

    hduser@master:/usr/local/hadoop$ jps

    16017 Jps

    14799 NameNode

    15686 TaskTracker

    14880 DataNode

    15596 JobTracker

    14977 SecondaryNameNode


    on slaves

    hduser@slave1:/usr/local/hadoop$ jps

    15183 DataNode

    15897 TaskTracker

    16284 Jps
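Reading jps output by eye works, but a check can also be scripted. The sketch below uses a hypothetical helper, check_daemons, that reads jps output on stdin and reports any expected daemon that is missing; the expected names come from the jps listings shown above.

```shell
#!/usr/bin/env bash
# Sketch: check that the expected Hadoop daemons appear in `jps` output.
# check_daemons is a hypothetical helper. Usage: jps | check_daemons NAME...
# Returns 0 if every NAME is listed, 1 otherwise.
check_daemons() {
  local listing missing=0 name
  listing=$(awk '{ print $2 }')   # keep only the process names from stdin
  for name in "$@"; do
    if ! printf '%s\n' "$listing" | grep -qx "$name"; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# On the master: jps | check_daemons NameNode SecondaryNameNode DataNode JobTracker TaskTracker
# On a slave:    jps | check_daemons DataNode TaskTracker
```

grep -x matches whole lines only, so SecondaryNameNode is never mistaken for NameNode.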


    Stopping multinode cluster

    • First, stop the MapReduce daemons with the bin/stop-mapred.sh command on the master. It stops the JobTracker on the master and the TaskTrackers on the master and the slaves.

    hduser@master:/usr/local/hadoop$ bin/stop-mapred.sh

    stopping jobtracker

    slave1: Ubuntu 10.04

    master: stopping tasktracker

    slave1: stopping tasktracker

    . . .


    Check it with jps

    on master

    hduser@master:/usr/local/hadoop$ jps

    14799 NameNode

    18386 Jps

    14880 DataNode

    14977 SecondaryNameNode


    on slaves

    hduser@slave:/usr/local/hadoop$ jps

    15183 DataNode

    18636 Jps


    • Second, stop the HDFS daemons with the bin/stop-dfs.sh command on the master. It stops the NameNode and SecondaryNameNode on the master and the DataNodes on the master and the slaves.


    hduser@master:/usr/local/hadoop$ bin/stop-dfs.sh

    stopping namenode

    slave1: Ubuntu 10.04

    slave1: stopping datanode

    . . .

    master: stopping datanode

    master: stopping secondarynamenode

    hduser@master:/usr/local/hadoop$

    Check it with jps

    on master

    hduser@master:/usr/local/hadoop$ jps

    18670 Jps


    on slaves

    hduser@slave1:/usr/local/hadoop$ jps

    18894 Jps
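The order matters: HDFS starts before MapReduce and stops after it. Hadoop 1.x already ships bin/start-all.sh and bin/stop-all.sh for this; the sketch below just makes the order explicit. cluster_ctl is a hypothetical wrapper, and HADOOP_HOME defaults to the install path used in this post.

```shell
#!/usr/bin/env bash
# Sketch: start or stop the whole cluster from the master in the order
# used in this post. cluster_ctl is a hypothetical wrapper.
# Usage: cluster_ctl start|stop
cluster_ctl() {
  local home="${HADOOP_HOME:-/usr/local/hadoop}"
  case "$1" in
    start)
      # HDFS first; MapReduce is started only if start-dfs.sh succeeds.
      "$home/bin/start-dfs.sh" && "$home/bin/start-mapred.sh"
      ;;
    stop)
      # Reverse order: MapReduce first, then HDFS.
      "$home/bin/stop-mapred.sh" && "$home/bin/stop-dfs.sh"
      ;;
    *)
      echo "usage: cluster_ctl start|stop" >&2
      return 2
      ;;
  esac
}
```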


    Now your Hadoop multinode cluster is installed, and you know how to start and stop it. Your next step is to run a MapReduce job on the multinode cluster. Good luck, and stay connected with us: lots more is coming. Till then, bye! :)


