Hadoop Installation on an Ubuntu Single Node Cluster | Hadoop Tutorial for Beginners


Installing Hadoop on a single node is quite simple if you proceed through the steps in order; it hardly takes ten minutes to set up the environment on a single node. The steps are as follows; just note the headers for now, as each step is described in detail below. So let's start with Hadoop.

1. Prerequisites

JAVA - Hadoop is built on the Java platform, so it needs a Java environment to execute its tasks. Thus anyone who wants to run Hadoop must have Java installed on their machine. Hadoop supports any version of Java higher than Java 1.5 (aka Java 5).
To install Java:
sudo apt-get update

This updates the package index on your machine. It needs to be done on a freshly installed Ubuntu system; it makes the latest package lists from the Ubuntu repositories available.
sudo apt-get upgrade

This command upgrades the installed packages to their newest available versions.
sudo apt-get install openjdk-7-jdk

Now check whether the Java installation completed correctly:
java -version

If the steps above succeeded, this will print the version of Java that was installed; otherwise it will show something like "command not found".
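For example, with OpenJDK 7 on a 64-bit machine the output looks roughly like this (the exact update and build numbers will differ on your machine):

java version "1.7.0_65"
OpenJDK Runtime Environment (IcedTea 2.5.1)
OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)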

2. Making a dedicated user for Hadoop

This is not a mandatory step, but it is always preferable to make a dedicated user for Hadoop; that way the main user's environment is not affected by the Hadoop environment.
$ sudo addgroup hadoop

It will add a dedicated group named hadoop.
$ sudo adduser --ingroup hadoop hduser

This command will add a dedicated user, hduser, to the group hadoop. Now add hduser to the list of sudo users, allowing it to use the machine's root permissions temporarily.
$ sudo gedit /etc/sudoers

This command will open the configuration file for sudo users.
Now add the following line to the file:
hduser ALL=(ALL) ALL

Save the file and exit.
This defines the access permissions hduser gets when it is used with sudo.
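Note that a stray syntax error in /etc/sudoers can lock you out of sudo entirely, so the safer route is sudo visudo, which validates the file before saving; either works for this tutorial. You can verify the entry by switching to hduser and running a harmless command with root permissions:

$ su - hduser
$ sudo whoami

If the entry is correct, the second command prints root.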

3. SSH configuration

Hadoop requires SSH access to manage its nodes, i.e. remote machines plus your local machine if you want to use Hadoop on it (which is what we want to do in this short tutorial). For our single-node setup of Hadoop, we therefore need to configure SSH access to localhost for the hduser user we created in the previous section.
First you need to install ssh services on your system.
$ su - hduser

This will switch the user to "hduser".
hduser@BigData:~$ sudo apt-get install openssh-server

This will install the SSH server on your Ubuntu machine.
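Optionally, you can confirm the SSH daemon is running before moving on (the exact output wording varies by Ubuntu release):

hduser@BigData:~$ sudo service ssh status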
hduser@BigData:~$ ssh-keygen -t rsa -P ""

The output of this command looks as follows (just press Enter wherever it asks for input):

Generating public/private rsa key pair.
Enter file in which to save the key (/home/hduser/.ssh/id_rsa):
Created directory '/home/hduser/.ssh'.
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.
The key fingerprint is:
9b:82:ea:58:b4:e0:35:d7:ff:19:66:a6:ef:ae:0e:d2 hduser@ubuntu
The key's randomart image is:
[...snipp...]
hduser@BigData:~$

The command shown above generates an SSH key with an empty password, enabling the user to connect to the machine without typing a password.
After generating the SSH key, you have to enable SSH access to your local machine with this newly created key.
hduser@BigData:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

Now we are all done with the SSH configuration; check whether everything is alright:
hduser@BigData:~$ ssh localhost

The output should look like this:

Welcome to Ubuntu 14.04 LTS (GNU/Linux 3.13.0-24-generic x86_64)
* Documentation: https://help.ubuntu.com/
Last login: Thu Jun 12 23:35:33 2014
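If ssh prompts for a password here instead, overly permissive file modes are a common cause, since SSH ignores key files that other users can read or write. Tightening the permissions usually fixes it:

hduser@BigData:~$ chmod 700 $HOME/.ssh
hduser@BigData:~$ chmod 600 $HOME/.ssh/authorized_keys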


4. Disabling IPv6

One problem with IPv6 on Ubuntu is that using 0.0.0.0 for the various networking-related Hadoop configuration options will result in Hadoop binding to the IPv6 addresses of my Ubuntu box. In my case, I realised that there's no practical point in enabling IPv6 on a box when you are not connected to any IPv6 network. Hence, I simply disabled IPv6 on my Ubuntu machine. Your mileage may vary. To disable IPv6 on Ubuntu 14.04 LTS, open /etc/sysctl.conf in the editor of your choice and add the following lines to the end of the file:

# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

Now restart your system for the changes to take effect.
Check whether IPv6 is disabled by typing the following command:
hduser@BigData:~$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6

If the output is 1, IPv6 has been successfully disabled; if it is 0, IPv6 is still enabled, and in that case you should recheck the changes you made above.
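If you would rather not reboot, the same settings can usually be applied immediately by reloading /etc/sysctl.conf, then re-running the cat check above:

hduser@BigData:~$ sudo sysctl -p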

5. Setting up Hadoop on the local system

Download any stable version of Hadoop. Hadoop 2.x is the upgraded line and Hadoop 1.x the older one; this tutorial uses Hadoop 1.2.1.
Copy the tar file from the download location (in my case it is in Downloads) to /usr/local:
$ sudo cp $HOME/Downloads/hadoop-1.2.1.tar.gz /usr/local

Now extract the archive, rename the extracted directory to hadoop so that /usr/local/hadoop becomes the Hadoop home, and give hduser ownership of it:

$ cd /usr/local
$ sudo tar xzf hadoop-1.2.1.tar.gz
$ sudo mv hadoop-1.2.1 hadoop
$ sudo chown -R hduser:hadoop hadoop
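As a quick sanity check, the new Hadoop home should now contain the familiar top-level directories such as bin, conf and lib:

$ ls /usr/local/hadoop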


6. Configuring the Ubuntu environment

In this phase we will set up the Ubuntu environment so that Hadoop can run on it.
To do so we need to open hduser's .bashrc file, which is responsible for this.
hduser@BigData:~$ sudo gedit /home/hduser/.bashrc

This will open the .bashrc file. We will set JAVA_HOME, HADOOP_HOME, and the corresponding PATH in this file. You just need to paste the content given below at the end of the file.

# Set Hadoop-related environment variables
export HADOOP_HOME=/usr/local/hadoop

# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

# Some convenient aliases and functions for running Hadoop-related commands
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"

# If you have LZO compression enabled in your Hadoop cluster and
# compress job outputs with LZOP (not covered in this tutorial):
# Conveniently inspect an LZOP compressed file from the command
# line; run via:
#
# $ lzohead /hdfs/path/to/lzop/compressed/file.lzo
#
# Requires installed 'lzop' command.
#
lzohead () {
    hadoop fs -cat $1 | lzop -dc | head -1000 | less
}

# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin


See the JAVA_HOME line in the code above:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

This is for a 64-bit system. If your system is 32-bit, then change it as follows:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386
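If you are unsure which JVM directory your machine actually has, list the installed ones and pick whichever is present:

$ ls /usr/lib/jvm/

Also note that the new variables only take effect in a new shell; to use them in the current one, reload the file (or log out and back in):

$ source $HOME/.bashrc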

7. Configuring Hadoop for a single node

Before starting Hadoop we have to configure it for a single node. There are basically three files where one needs to set the configuration:

core-site.xml, mapred-site.xml and hdfs-site.xml

These are located in the /usr/local/hadoop/conf/ directory.
You can leave the settings below "as is" with the exception of the hadoop.tmp.dir parameter; this parameter you must change to a directory of your choice. We will use the directory /app/hadoop/tmp in this tutorial. Hadoop's default configurations use hadoop.tmp.dir as the base temporary directory both for the local file system and HDFS, so don't be surprised if you see Hadoop creating the specified directory automatically on HDFS at some later point.
Now we create the directory and set the required ownerships and permissions:

$ sudo mkdir -p /app/hadoop/tmp
$ sudo chown hduser:hadoop /app/hadoop/tmp
# ...and if you want to tighten up security, chmod from 755 to 750...
$ sudo chmod 750 /app/hadoop/tmp

Let's start with the configuration.

  • Configuring core-site.xml

hduser@BigData:~$ sudo gedit /usr/local/hadoop/conf/core-site.xml

Add the following content between the <configuration> and </configuration> tags:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation. The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>

Save and close it.

  • Configuring mapred-site.xml


hduser@BigData:~$ sudo gedit /usr/local/hadoop/conf/mapred-site.xml

Again, add the following content between the <configuration> tags:


<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the Map Reduce job tracker runs
  at. If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>


  • Configuring hdfs-site.xml

hduser@BigData:~$ sudo gedit /usr/local/hadoop/conf/hdfs-site.xml

Again, add the following content between the <configuration> tags:

<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property>


Now you are all set to start Hadoop on a single node. See Getting Started with Hadoop and the documentation in Hadoop's API Overview if you have any questions about Hadoop's configuration options.

8. Formatting the NameNode

Before starting Hadoop for the first time, you must format the HDFS filesystem via the NameNode. Do this only once: formatting an HDFS that is already in use erases all data in it. Use the following command:
hduser@BigData:~$ /usr/local/hadoop/bin/hadoop namenode -format

The output will look like this (the sample below comes from an older Hadoop build, so details such as the version string will differ on 1.2.1):

hduser@BigData:/usr/local/hadoop$ bin/hadoop namenode -format
10/05/08 16:59:56 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = ubuntu/127.0.1.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.2
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
10/05/08 16:59:56 INFO namenode.FSNamesystem: fsOwner=hduser,hadoop
10/05/08 16:59:56 INFO namenode.FSNamesystem: supergroup=supergroup
10/05/08 16:59:56 INFO namenode.FSNamesystem: isPermissionEnabled=true
10/05/08 16:59:56 INFO common.Storage: Image file of size 96 saved in 0 seconds.
10/05/08 16:59:57 INFO common.Storage: Storage directory .../hadoop-hduser/dfs/name has been successfully formatted.
10/05/08 16:59:57 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.1.1
************************************************************/
hduser@BigData:/usr/local/hadoop$



9. Starting the services

Now we are all set to start the Hadoop services.
hduser@BigData:~$ /usr/local/hadoop/bin/start-all.sh

To check that things are right, type:
$ jps

It will show all running services. There should be five Hadoop-related services (jps itself also appears in the list). If all of them are running, you have succeeded; otherwise debug the problem(s). The output showing all five services looks like this:

hduser@BigData:/usr/local/hadoop$ jps
2287 TaskTracker
2149 JobTracker
1938 DataNode
2085 SecondaryNameNode
2349 Jps
1788 NameNode
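You can also sanity-check the cluster through the web interfaces that Hadoop 1.x serves by default: the NameNode UI at http://localhost:50070 and the JobTracker UI at http://localhost:50030. As an optional smoke test (assuming the examples jar name matches your 1.2.1 download), run the bundled pi estimator:

hduser@BigData:/usr/local/hadoop$ bin/hadoop jar hadoop-examples-1.2.1.jar pi 2 5

If it finishes by printing an estimated value of Pi, HDFS and MapReduce are working end to end.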



10. Stopping the services

When you are all done with the Hadoop single node cluster, you should stop its services. To stop the Hadoop services, type:
$ /usr/local/hadoop/bin/stop-all.sh

This will stop all five running services. You can check by typing:
$ jps

The output should look like this:
2349 Jps

That is all from us. Stay connected with us; lots more to come. Till then, bye. :)
