Hadoop Installation on Ubuntu Single Node Cluster | hadoop tutorial for beginners
Installing Hadoop on a single node is quite simple if you proceed through the steps in order. There are a few specific steps to install Hadoop, and going through them takes hardly ten minutes to set up the environment on a single node. The steps are as follows; just remember the headers, as each step is described in detail below. So let's start with Hadoop.
1. Prerequisite
JAVA - Hadoop is built on the Java platform, so it needs a Java environment to execute its tasks. Anyone who wants to run Hadoop must have Java installed on their machine. Hadoop supports any Java version higher than 1.5 (Java 5).
To install java:
sudo apt-get update
This will refresh the package lists on your machine. It needs to be done in a freshly installed Ubuntu environment; it simply makes the latest packages from the Ubuntu repositories available to apt.
sudo apt-get upgrade
This command will upgrade the installed packages to their newest versions.
sudo apt-get install openjdk-7-jdk
Now check whether the Java installation was done correctly:
java -version
It will show the version of Java that has been installed if the steps above worked successfully; otherwise it will show something like "command not found".
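If you are unsure which JDK path to use later for JAVA_HOME, you can resolve it from the java binary; on a typical 64-bit OpenJDK 7 install the output looks something like the path shown below (yours may differ):
$ readlink -f $(which java)
/usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java
The JAVA_HOME directory is the part before /jre/bin/java.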
2. Making a dedicated user for hadoop
This is not a mandatory step, but it is always preferable to create a dedicated user for Hadoop, so that the main user environment is not affected by the Hadoop environment.
$ sudo addgroup hadoop
It will add a dedicated group named hadoop.
$ sudo adduser --ingroup hadoop hduser
This command will add a dedicated user to the group hadoop. Now add "hduser" to the list of sudo users, allowing it to temporarily use the root permissions of the machine via sudo.
$ sudo gedit /etc/sudoers
This command will open the sudoers configuration file (editing it with sudo visudo is safer, since visudo checks for syntax errors before saving).
Now add following content in the file.
hduser ALL=(ALL) ALL
Save the file and exit.
This will give hduser permission to run commands with root privileges when used with "sudo".
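As a quick optional sanity check, you can confirm the group membership of the new user (the output should list the hadoop group):
$ groups hduser
hduser : hadoop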
3. SSH configurations
Hadoop requires SSH access to manage its nodes, i.e. remote machines plus your local machine if you want to use Hadoop on it (which is what we want to do in this short tutorial). For our single-node setup of Hadoop, we therefore need to configure SSH access to localhost for the hduser user we created in the previous section.
First you need to install ssh services on your system.
$ su - hduser
This will switch the current user to "hduser".
hduser@BigData:~$ sudo apt-get install openssh-server
This will install ssh services on your ubuntu based machine.
hduser@BigData:~$ ssh-keygen -t rsa -P ""
The output of this command looks as follows (just press Enter wherever it asks for input):
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hduser/.ssh/id_rsa):
Created directory '/home/hduser/.ssh'.
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.
The key fingerprint is:
9b:82:ea:58:b4:e0:35:d7:ff:19:66:a6:ef:ae:0e:d2 hduser@ubuntu
The key's randomart image is:
[...snipp...]
hduser@BigData:~$
The command shown above generates an SSH key pair, enabling the user to connect to the machine without a password.
After generating the SSH key, you have to enable SSH access to your local machine with the newly created key:
hduser@BigData:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
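If the ssh login in the next step fails, one common cause (worth checking, though often not necessary) is overly open permissions on the key files:
hduser@BigData:~$ chmod 700 $HOME/.ssh
hduser@BigData:~$ chmod 600 $HOME/.ssh/authorized_keys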
Now we are all done with the SSH configuration; let's check whether everything is working:
hduser@BigData:~$ ssh localhost
The output should look like this:
Welcome to Ubuntu 14.04 LTS (GNU/Linux 3.13.0-24-generic x86_64)
* Documentation: https://help.ubuntu.com/
Last login: Thu Jun 12 23:35:33 2014
4. Disabling IPV6
One problem with IPv6 on Ubuntu is that using 0.0.0.0 for the various networking-related Hadoop configuration options will result in Hadoop binding to the IPv6 addresses of my Ubuntu box. In my case, I realised that there's no practical point in enabling IPv6 on a box when you are not connected to any IPv6 network. Hence, I simply disabled IPv6 on my Ubuntu machine. Your mileage may vary. To disable IPv6 on Ubuntu 14.04 LTS, open /etc/sysctl.conf in the editor of your choice and add the following lines to the end of the file:
# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
Now restart your system for the changes to take effect.
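Alternatively, if you would rather not reboot right away, the settings from /etc/sysctl.conf can usually be applied immediately with sysctl (a reboot still gives the cleanest result):
$ sudo sysctl -p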
Check whether IPv6 is disabled by typing the following command:
hduser@BigData:~$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6
If the output is 1, IPv6 has been successfully disabled. If it is 0, IPv6 is still enabled; in that case, recheck the changes you made above.
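As an alternative to disabling IPv6 system-wide, you can make only Hadoop prefer IPv4 by adding the following line to conf/hadoop-env.sh once Hadoop is installed (step 5); this is optional if you already disabled IPv6 above:
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true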
5. Setup Hadoop in local system
Download any stable version of Hadoop; this tutorial uses Hadoop 1.2.1. (Hadoop 2.x is the newer release line, while 1.x is the older one.)
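If you prefer downloading from the command line, the 1.2.1 tarball is available from the Apache release archive (the URL below is one official location; any Apache mirror works as well):
$ wget https://archive.apache.org/dist/hadoop/common/hadoop-1.2.1/hadoop-1.2.1.tar.gz -P $HOME/Downloads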
Copy the tar file from the download location (in my case it is in Downloads) to /usr/local:
$ sudo cp $HOME/Downloads/hadoop-1.2.1.tar.gz /usr/local
Extract the archive, rename the extracted directory to "hadoop" (this becomes the Hadoop home), and give ownership to hduser:
$ cd /usr/local
$ sudo tar xzf hadoop-1.2.1.tar.gz
$ sudo mv hadoop-1.2.1 hadoop
$ sudo chown -R hduser:hadoop hadoop
6. Configuring ubuntu environment
In this phase we will set up the Ubuntu environment so that Hadoop can work on it.
To do so we need to edit the .bashrc file, which is where these environment settings live.
hduser@BigData:~$ sudo gedit /home/hduser/.bashrc
This will open the .bashrc file. We will set JAVA_HOME, HADOOP_HOME and their corresponding PATH entries in this file. You just need to paste the content given below at the end of the file.
# Set Hadoop-related environment variables
export HADOOP_HOME=/usr/local/hadoop
# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
# Some convenient aliases and functions for running Hadoop-related commands
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"
# If you have LZO compression enabled in your Hadoop cluster and
# compress job outputs with LZOP (not covered in this tutorial):
# Conveniently inspect an LZOP compressed file from the command
# line; run via:
#
# $ lzohead /hdfs/path/to/lzop/compressed/file.lzo
#
# Requires installed 'lzop' command.
#
lzohead () {
hadoop fs -cat $1 | lzop -dc | head -1000 | less
}
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
Note the JAVA_HOME line in the code above:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
This is for a 64-bit system. If your system is 32-bit then change it as follows:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386
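After saving .bashrc, reload it so the new variables take effect in your current shell (or simply open a new terminal):
hduser@BigData:~$ source ~/.bashrc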
7. Configure hadoop for single node.
Before starting Hadoop we have to configure it for a single node. There are basically three files where you need to set the configuration:
core-site.xml, mapred-site.xml and hdfs-site.xml
These are located in the hadoop/conf/ directory (here, /usr/local/hadoop/conf/).
You can leave the settings below "as is" with the exception of the hadoop.tmp.dir parameter, which you must change to a directory of your choice. We will use the directory /app/hadoop/tmp in this tutorial. Hadoop's default configurations use hadoop.tmp.dir as the base temporary directory both for the local file system and for HDFS, so don't be surprised if you see Hadoop creating the specified directory automatically on HDFS at some later point.
Now we create the directory and set the required ownerships and permissions:
$ sudo mkdir -p /app/hadoop/tmp
$ sudo chown hduser:hadoop /app/hadoop/tmp
#...and if you want to tighten up security, chmod from 755 to 750...
$ sudo chmod 750 /app/hadoop/tmp
Let's start with the configuration.
- Configuring core-site.xml
hduser@BigData:~$ sudo gedit /usr/local/hadoop/conf/core-site.xml
Add the following content between the <configuration> and </configuration> tags:
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
save and close it.
- Configuring mapred-site.xml
hduser@BigData:~$ sudo gedit /usr/local/hadoop/conf/mapred-site.xml
Again, add the following content between the <configuration> tags:
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the Map Reduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
- Configuring hdfs-site.xml
hduser@BigData:~$ sudo gedit /usr/local/hadoop/conf/hdfs-site.xml
And add the following between the <configuration> tags:
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
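One more setting is worth doing here: the .bashrc comment earlier mentioned configuring JAVA_HOME directly for Hadoop as well. In Hadoop 1.x this goes into conf/hadoop-env.sh; if the start scripts later complain that JAVA_HOME is not set, open that file and set the line below (use the i386 path on a 32-bit system):
hduser@BigData:~$ sudo gedit /usr/local/hadoop/conf/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64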
Now you are all set to start Hadoop on a single node. See Getting Started with Hadoop and the documentation in Hadoop's API Overview if you have any questions about Hadoop's configuration options.
8. Formatting namenode.
Before starting Hadoop for the first time, you should format the HDFS filesystem via the NameNode using the following command (do this only once; re-formatting will erase all data stored in HDFS):
hduser@BigData:~$ /usr/local/hadoop/bin/hadoop namenode -format
The output will look similar to this (the sample below was captured on an older Hadoop release, so the version strings will differ on your machine):
hduser@BigData:/usr/local/hadoop$ bin/hadoop namenode -format
10/05/08 16:59:56 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = ubuntu/127.0.1.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.2
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
10/05/08 16:59:56 INFO namenode.FSNamesystem: fsOwner=hduser,hadoop
10/05/08 16:59:56 INFO namenode.FSNamesystem: supergroup=supergroup
10/05/08 16:59:56 INFO namenode.FSNamesystem: isPermissionEnabled=true
10/05/08 16:59:56 INFO common.Storage: Image file of size 96 saved in 0 seconds.
10/05/08 16:59:57 INFO common.Storage: Storage directory .../hadoop-hduser/dfs/name has been successfully formatted.
10/05/08 16:59:57 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.1.1
************************************************************/
hduser@BigData:/usr/local/hadoop$
9. Start services.
Now we are all set to start the Hadoop services.
hduser@BigData:~$ /usr/local/hadoop/bin/start-all.sh
To check that things are running correctly, type:
$ jps
It will show all running Java processes. There should be 5 services related to Hadoop. If all of them are running then you have succeeded; otherwise, debug the problem(s). The output showing all 5 services looks as follows:
hduser@BigData:/usr/local/hadoop$ jps
2287 TaskTracker
2149 JobTracker
1938 DataNode
2085 SecondaryNameNode
2349 Jps
1788 NameNode
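Optionally, once all five daemons are up, you can run a quick HDFS smoke test; the directory name below is just an example:
hduser@BigData:~$ hadoop fs -mkdir /tmp/smoketest
hduser@BigData:~$ hadoop fs -ls /tmp
hduser@BigData:~$ hadoop fs -rmr /tmp/smoketest
You can also open the NameNode web interface at http://localhost:50070/ and the JobTracker at http://localhost:50030/ (the default Hadoop 1.x ports) in a browser.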
10. Stop Services
When you are all done with the Hadoop single node cluster, you should stop its services. To stop the Hadoop services, type:
$ stop-all.sh
This will stop all 5 running services. You can check by typing:
$ jps
The output will look as follows:
2349 Jps
That's all from us. Stay connected with us; lots more things to come. Till then, bye. :)