Hadoop Installation on an Ubuntu Multinode Cluster | Hadoop Tutorial for Beginners
You have set up Hadoop on a single node, so what next? Install it on multiple nodes. Yes!
In short, follow the steps below and enjoy Hadoop on a multinode cluster:
Requirement: 2 or more Ubuntu systems (one as master and the others as slaves).
In the previous post you learnt how to set up Hadoop on a single node, so first complete that single-node setup on two or more systems.
Bonus - rename the master system to 'master' and the slave systems to 'slave1', 'slave2', and so on. This is a convention, not a must-follow rule, but following it avoids unwanted errors and makes them easier to trace.
Note - the master will also work as a slave.
How?
Open the file /etc/hostname and replace
YourPcName with master or slave1, slave2, ...
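A minimal sketch of the rename, assuming you want the new name to take effect without a reboot (run on each machine with its own name):
# on the master (use slave1, slave2, ... on the slaves)
sudo sh -c 'echo master > /etc/hostname'
sudo hostname master   # apply the new name to the running session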
Network settings
So you have installed the single-node setup? Then continue: first stop each single-node cluster by running 'stop-all.sh'. To install Hadoop on multiple nodes, every system must be on the same network, so connect the systems via LAN or WAN. Then update /etc/hosts on every machine:
192.168.0.1 master
192.168.0.2 slave1
192.168.0.3 slave2
192.168.0.4 slave3
. . .
Note - ' . . . ' means repeat for all slaves (used below as well).
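A quick sketch of the edit plus a sanity check, assuming the example IPs above (replace them with your own):
# run on every node (master and all slaves)
sudo nano /etc/hosts    # add the master/slaveN lines shown above
ping -c 2 master        # name resolution should now work
ping -c 2 slave1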
SSH Configuration
For a multinode cluster setup, the hduser account on the master must be able to connect to its own account on the master and to the hduser account on every slave without a password. hduser can already connect to its own account because we configured SSH in the single-node setup. To connect to the slaves, we have to copy the RSA public key to each of them, using the following command for each slave (a loop version is sketched at the end of this section):
hduser@master:~$
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@slave1
- connect from master to master
ssh master
hduser@master:~$ ssh master
The authenticity of host 'master (192.168.0.1)' can't be established.
RSA key fingerprint is 3b:21:b3:c0:21:5c:7c:54:2f:1e:2d:96:79:eb:7f:95.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'master' (RSA) to the list of known hosts.
Linux master 2.6.20-16-386 #2 Thu Jun 7 20:16:13 UTC 2007 i686
...
hduser@master:~$
- connect from master to slaves
ssh slaves
hduser@master:~$ ssh slave
The authenticity of host 'slave (192.168.0.2)' can't be established.
RSA key fingerprint is 74:d7:61:86:db:86:8f:31:90:9c:68:b0:13:88:52:72.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'slave' (RSA) to the list of known hosts.
Ubuntu 10.04
...
hduser@slave:~$
Note - the first time you access a slave via ssh (and when you run ssh-copy-id), it will ask for the slave's hduser password, so you must know the hduser password of each slave.
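If you have many slaves, a simple loop saves typing. This is just a convenience sketch using the slave names assumed in this tutorial:
# run as hduser on the master; add or remove slave names as needed
for host in slave1 slave2 slave3; do
  ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@$host
done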
Update Configuration files
After the SSH configuration you have to update 5 conf files (path: /usr/local/hadoop/conf/): masters, slaves, and the three *-site.xml files.
- update conf/masters file (on master only). This file lists the host that runs the SecondaryNameNode daemon; in our setup that is the master itself, so it contains a single line.
conf/masters
master
- update conf/slaves file (on master only). This file lists the hosts that run the DataNode and TaskTracker daemons; since the master also works as a slave, it is listed too.
conf/slaves
master
slave1
slave2
slave3
. . .
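A minimal sketch of writing both files from the shell, assuming the /usr/local/hadoop install path used throughout this tutorial:
# on the master only
echo "master" > /usr/local/hadoop/conf/masters
printf "master\nslave1\nslave2\nslave3\n" > /usr/local/hadoop/conf/slaves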
Now update the *-site.xml files (on all machines). You can either edit them on every machine or edit them on the master and copy them to the slaves, as sketched after the three files below.
- Update core-site.xml. In core-site.xml change the value of 'fs.default.name' to point to 'master', which defines the NameNode host and port.
core-site.xml
<property>
  <name>fs.default.name</name>
  <value>hdfs://master:54310</value>
  <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
</property>
- Update mapred-site.xml. Change the value of 'mapred.job.tracker' to point to 'master', which defines the JobTracker host and port.
mapred-site.xml
<property>
  <name>mapred.job.tracker</name>
  <value>master:54311</value>
  <description>The host and port that the MapReduce job tracker
  runs at. If "local", then jobs are run in-process as a single
  map and reduce task.
  </description>
</property>
- Update hdfs-site.xml. The default value of dfs.replication is 3; since we have 4 DataNodes available, we set dfs.replication to 4.
conf/hdfs-site.xml
<property>
  <name>dfs.replication</name>
  <value>4</value>
  <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.
  </description>
</property>
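Instead of editing every machine by hand, you can push the edited files from the master to the slaves. A convenience sketch, assuming the same /usr/local/hadoop path on every node:
# run as hduser on the master
for host in slave1 slave2 slave3; do
  scp /usr/local/hadoop/conf/*-site.xml hduser@$host:/usr/local/hadoop/conf/
done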
Formatting HDFS via the NameNode
Before starting the multinode Hadoop cluster we must format the Hadoop filesystem. Format it with the command below on the master. Do not format a running cluster; stop it first to avoid data loss.
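A quick way to confirm that no Hadoop daemons are still running before you format is jps, which should list only itself:
hduser@master:/usr/local/hadoop$ jps
# if a NameNode, DataNode, JobTracker or TaskTracker line appears, run bin/stop-all.sh first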
hduser@master:/usr/local/hadoop$
bin/hadoop namenode -format
Starting the multinode cluster
First start the HDFS daemons: the NameNode on the master and the DataNodes on the master and the slaves, using this command on the master.
bin/start-dfs.sh
hduser@master:/usr/local/hadoop$ bin/start-dfs.sh
starting namenode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-namenode-master.out
slave: Ubuntu 10.04
slave: starting datanode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-datanode-slave.out
master: starting datanode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-datanode-master.out
master: starting secondarynamenode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-secondarynamenode-master.out
hduser@master:/usr/local/hadoop$
- Test it via the jps command on the master and the slaves
on master
hduser@master:/usr/local/hadoop$ jps
14799 NameNode
15314 Jps
14880 DataNode
14977 SecondaryNameNode
hduser@master:/usr/local/hadoop$
on slaves
hduser@slave1:/usr/local/hadoop$ jps
15183 DataNode
15616 Jps
hduser@slave1:/usr/local/hadoop$
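Another way to confirm that every DataNode has registered with the NameNode is the dfsadmin report, run on the master; it lists the live nodes and their capacity:
hduser@master:/usr/local/hadoop$ bin/hadoop dfsadmin -report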
- Second, start the MapReduce daemons: the JobTracker on the master and the TaskTrackers on the master and the slaves, using the command below on the master.
bin/start-mapred.sh
hduser@master:/usr/local/hadoop$ bin/start-mapred.sh
starting jobtracker, logging to /usr/local/hadoop/bin/../logs/hadoop-hadoop-jobtracker-master.out
slave: Ubuntu 10.04
slave: starting tasktracker, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-tasktracker-slave.out
master: starting tasktracker, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-tasktracker-master.out
hduser@master:/usr/local/hadoop$
Now check it via the jps command.
On master
hduser@master:/usr/local/hadoop$ jps
16017 Jps
14799 NameNode
15686 TaskTracker
14880 DataNode
15596 JobTracker
14977 SecondaryNameNode
hduser@master:/usr/local/hadoop$
on slaves
hduser@slave1:/usr/local/hadoop$ jps
15183 DataNode
15897 TaskTracker
16284 Jps
hduser@slave1:/usr/local/hadoop$
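Besides jps, you can also open the built-in web interfaces in a browser. The ports below are the Hadoop 1.x defaults; adjust them if you have changed them in your configuration:
http://master:50070/   # NameNode web UI - HDFS health and live DataNodes
http://master:50030/   # JobTracker web UI - running and completed jobs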
Stopping the multinode cluster
- First stop the MapReduce daemons via the bin/stop-mapred.sh command on the master. It will stop the JobTracker on the master and the TaskTrackers on the master and the slaves.
bin/stop-mapred.sh
hduser@master:/usr/local/hadoop$ bin/stop-mapred.sh
stopping jobtracker
slave1: Ubuntu 10.04
master: stopping tasktracker
slave1: stopping tasktracker
. . .
hduser@master:/usr/local/hadoop$
Check it via jps
on master
hduser@master:/usr/local/hadoop$ jps
14799 NameNode
18386 Jps
14880 DataNode
14977 SecondaryNameNode
hduser@master:/usr/local/hadoop$
on slaves
hduser@slave:/usr/local/hadoop$ jps
15183 DataNode
18636 Jps
hduser@slave:/usr/local/hadoop$
- Second, stop the HDFS daemons via the bin/stop-dfs.sh command on the master. It will stop the NameNode on the master and the DataNodes on the master and the slaves.
bin/stop-dfs.sh
hduser@master:/usr/local/hadoop$ bin/stop-dfs.sh
stopping namenode
slave1: Ubuntu 10.04
slave1: stopping datanode
. . .
master: stopping datanode
master: stopping secondarynamenode
hduser@master:/usr/local/hadoop$
Check it via jps
on master
hduser@master:/usr/local/hadoop$ jps
18670 Jps
hduser@master:/usr/local/hadoop$
on slaves
hduser@slave1:/usr/local/hadoop$ jps
18894 Jps
hduser@slave1:/usr/local/hadoop$
Now your Hadoop multinode cluster installation, starting and stopping are done. Your next task is to run a MapReduce job on the multinode cluster. Good luck, and stay connected with us - lots more to come. Till then, bye! :)