After understanding what Hadoop is, let's deploy Hadoop on a single machine:
This section contains instructions for installing Hadoop on Ubuntu. It is a quickstart tutorial that takes you through the installation step by step: you will find all the commands, with descriptions, required to install Hadoop in standalone mode (a single-node cluster). The main goal of this tutorial is to get a "simple" Hadoop installation up and running so that you can play around with the software and learn more about it.
This tutorial has been tested on:
Prerequisites:
Install Java:
Java 1.6.x (either Sun Java or OpenJDK) is recommended for Hadoop.
1. Add the Canonical Partner Repository to your apt repositories (if you are using an Ubuntu version other than 10.04, add the repository corresponding to that version):
$ sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"
2. Update the source list:
$ sudo apt-get update
3. Install sun-java6-jdk:
$ sudo apt-get install sun-java6-jdk
4. After installation, make a quick check whether Sun’s JDK is correctly set up:
user@ubuntu:~$ java -version
java version "1.6.0_20"
Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
Java HotSpot(TM) Client VM (build 16.3-b01, mixed mode, sharing)
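If more than one JVM is installed, Sun's JDK may not be the system default. On Ubuntu you can select it explicitly; this is optional and assumes the java-6-sun package from the partner repository above:

# make Sun's JDK the default java/javac (only needed with multiple JVMs)
$ sudo update-java-alternatives -s java-6-sun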
Adding a dedicated Hadoop system user:
We will use a dedicated Hadoop user account for running Hadoop. While that's not required, it is recommended because it helps to separate the Hadoop installation from other software applications and user accounts running on the same machine (think: security, permissions, backups, etc.).
$ sudo adduser hadoop_admin
Log in as the hadoop_admin user:
user@ubuntu:~$ su - hadoop_admin
Hadoop Installation:
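The steps below assume the hadoop-0.20.2.tar.gz release tarball has already been downloaded to /usr/local. If you still need it, old releases are kept on the Apache archive (the URL below follows the usual archive layout; you can also pick a mirror):

# download the 0.20.2 release tarball into /usr/local
$ sudo wget -P /usr/local https://archive.apache.org/dist/hadoop/core/hadoop-0.20.2/hadoop-0.20.2.tar.gz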
$ cd /usr/local
$ sudo tar xzf hadoop-0.20.2.tar.gz
$ sudo chown -R hadoop_admin /usr/local/hadoop-0.20.2
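Optionally, create a version-independent symlink so scripts and paths need not hard-code the release number (purely a convenience; the rest of this tutorial keeps the versioned path):

# optional: lets you refer to /usr/local/hadoop instead of the versioned directory
$ sudo ln -s /usr/local/hadoop-0.20.2 /usr/local/hadoop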
Define JAVA_HOME:
Edit the configuration file /usr/local/hadoop-0.20.2/conf/hadoop-env.sh and set JAVA_HOME to the root of your Java installation (e.g. /usr/lib/jvm/java-6-sun):
$ vi conf/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-6-sun
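A quick sanity check that the setting was saved (the uncommented export line should point at your JDK):

$ grep JAVA_HOME conf/hadoop-env.sh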
Go to your Hadoop installation directory (HADOOP_HOME, i.e. /usr/local/hadoop-0.20.2/) and run the hadoop script without arguments:
$ bin/hadoop
It will generate the following output:
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
  namenode -format     format the DFS filesystem
  secondarynamenode    run the DFS secondary namenode
  namenode             run the DFS namenode
  datanode             run a DFS datanode
  dfsadmin             run a DFS admin client
  mradmin              run a Map-Reduce admin client
  fsck                 run a DFS filesystem checking utility
  fs                   run a generic filesystem user client
  balancer             run a cluster balancing utility
  jobtracker           run the MapReduce job Tracker node
  pipes                run a Pipes job
  tasktracker          run a MapReduce task Tracker node
  job                  manipulate MapReduce jobs
  queue                get information regarding JobQueues
  version              print the version
  jar <jar>            run a jar file
  distcp <srcurl> <desturl>   copy file or directories recursively
  archive -archiveName NAME <src>* <dest>   create a hadoop archive
  daemonlog            get/set the log level for each daemon
 or
  CLASSNAME            run the class named CLASSNAME
Most commands print help when invoked w/o parameters
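As one more sanity check, you can print the version of the build you just unpacked:

# should report the 0.20.2 release
$ bin/hadoop version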
Hadoop setup in standalone mode is complete!
Now let's run some examples:
1. Run the classic Pi example:
$ bin/hadoop jar hadoop-*-examples.jar pi 10 100
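Here the first argument (10) is the number of map tasks and the second (100) the number of samples per map; the job estimates Pi by random sampling, so more samples give a closer estimate. For instance (the numbers below are arbitrary, just an illustration):

# 16 maps with 1000 samples each: slower, but a better estimate of Pi
$ bin/hadoop jar hadoop-*-examples.jar pi 16 1000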
2. Run the grep example:
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
$ cat output/*
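Note that Hadoop creates the output directory itself and aborts with an error if it already exists, so remove the old results before rerunning the job (in standalone mode the output is just a local directory):

# clear previous results before rerunning
$ rm -rf output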
3. Run the word count example:
$ mkdir inputwords
$ cp conf/*.xml inputwords
$ bin/hadoop jar hadoop-*-examples.jar wordcount inputwords outputwords
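As with the grep example, the results land in the output directory; each part file lists every word from the input along with its count:

# inspect the word counts produced by the job
$ cat outputwords/*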
If you run into any errors, visit Hadoop Troubleshooting.