Friday, March 4, 2011

Hadoop in Standalone Mode

After understanding what Hadoop is, let's deploy Hadoop on a single machine:

This section contains instructions for installing Hadoop on Ubuntu. It is a short, step-by-step quickstart tutorial: here you will get all the commands, and their descriptions, required to install Hadoop in standalone mode (a single-node cluster). The main goal of this tutorial is to get a simple Hadoop installation up and running so that you can play around with the software and learn more about it.

This tutorial has been tested on:
  • Ubuntu Linux (10.04 LTS)
  • Hadoop 0.20.2

Install Java:
Java 1.6.x (either Sun Java or OpenJDK) is recommended for Hadoop.

1. Add the Canonical Partner repository to your apt repositories (if you are using an Ubuntu version other than 10.04, add the repository corresponding to that version):

    $ sudo add-apt-repository "deb lucid partner"   

2. Update the source list

    $ sudo apt-get update   

3. Install sun-java6-jdk

    $ sudo apt-get install sun-java6-jdk   

4. After installation, quickly check that Sun's JDK is set up correctly:

    user@ubuntu:~# java -version
    java version "1.6.0_20"
    Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
    Java HotSpot(TM) Client VM (build 16.3-b01, mixed mode, sharing)   
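
The version check above can also be scripted. This is only a minimal sketch, assuming the `java -version` output contains a line of the form shown above; `is_java6` is a hypothetical helper name I made up for illustration.

```shell
#!/bin/sh
# is_java6 (hypothetical helper): succeeds when the given
# "java -version" output reports a 1.6.x runtime.
is_java6() {
    case "$1" in
        *'version "1.6.'*) return 0 ;;
        *) return 1 ;;
    esac
}

# java -version writes to stderr, so redirect it to capture the text.
if command -v java >/dev/null 2>&1; then
    if is_java6 "$(java -version 2>&1)"; then
        echo "Java 1.6.x found"
    else
        echo "Java found, but it is not 1.6.x"
    fi
else
    echo "java is not on the PATH"
fi
```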

Adding a dedicated Hadoop system user:
We will use a dedicated Hadoop user account for running Hadoop. While that's not required, it is recommended because it helps to separate the Hadoop installation from other software applications and user accounts running on the same machine (think: security, permissions, backups, etc.).

    $ sudo adduser hadoop_admin   

Log in as the hadoop_admin user:

    user@ubuntu:~$ su - hadoop_admin   

Hadoop Installation:
The commands below assume that hadoop-0.20.2.tar.gz has already been downloaded to /usr/local.

    $ cd /usr/local
    $ sudo tar xzf hadoop-0.20.2.tar.gz
    $ sudo chown -R hadoop_admin /usr/local/hadoop-0.20.2   


Edit the configuration file /usr/local/hadoop-0.20.2/conf/ and set JAVA_HOME to the root of your Java installation (e.g. export JAVA_HOME=/usr/lib/jvm/java-6-sun):

    $ vi conf/   
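
Before writing the JAVA_HOME line, it can help to confirm that the directory really is the root of a Java installation. Here is a minimal sketch; `check_java_home` is a hypothetical helper, and /usr/lib/jvm/java-6-sun is just the example path from above, so adjust it for your machine.

```shell
#!/bin/sh
# check_java_home (hypothetical helper): succeeds when the given
# directory is non-empty and contains an executable bin/java.
check_java_home() {
    [ -n "$1" ] && [ -x "$1/bin/java" ]
}

# Example path from the tutorial; yours may differ.
candidate=/usr/lib/jvm/java-6-sun
if check_java_home "$candidate"; then
    # This is the line to put into the Hadoop configuration file.
    echo "export JAVA_HOME=$candidate"
else
    echo "no executable bin/java under $candidate" >&2
fi
```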

Go to your Hadoop installation directory (HADOOP_HOME, i.e. /usr/local/hadoop-0.20.2/) and run the hadoop script without arguments:

    $ bin/hadoop   

It will generate the following output:

    Usage: hadoop [--config confdir] COMMAND
    where COMMAND is one of:
      namenode -format     format the DFS filesystem
      secondarynamenode    run the DFS secondary namenode
      namenode             run the DFS namenode
      datanode             run a DFS datanode
      dfsadmin             run a DFS admin client
      mradmin              run a Map-Reduce admin client
      fsck                 run a DFS filesystem checking utility
      fs                   run a generic filesystem user client
      balancer             run a cluster balancing utility
      jobtracker           run the MapReduce job Tracker node
      pipes                run a Pipes job
      tasktracker          run a MapReduce task Tracker node
      job                  manipulate MapReduce jobs
      queue                get information regarding JobQueues
      version              print the version
      jar <jar>            run a jar file
      distcp <srcurl> <desturl> copy file or directories recursively
      archive -archiveName NAME <src>* <dest> create a hadoop archive
      daemonlog            get/set the log level for each daemon
      CLASSNAME            run the class named CLASSNAME
    Most commands print help when invoked w/o parameters

Hadoop setup in standalone mode is complete!

Now let's run some examples:
1. Run the classic Pi example:

    $ bin/hadoop jar hadoop-*-examples.jar pi 10 100   
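
In the command above, the two arguments are the number of map tasks (10) and the number of samples per map (100). To get a feel for what is being estimated, here is a rough local sketch using awk. Note that it uses plain pseudo-random Monte Carlo sampling, which is not the same as the quasi-Monte Carlo (Halton sequence) method the Hadoop example uses, as far as I know. `estimate_pi` and its optional seed argument are hypothetical names of mine.

```shell
#!/bin/sh
# estimate_pi (hypothetical helper):
#   $1 = number of batches (the "maps" in the Hadoop command)
#   $2 = samples per batch
#   $3 = optional random seed (defaults to 1)
# Throws random points into the unit square and counts how many
# fall inside the quarter circle of radius 1.
estimate_pi() {
    awk -v maps="$1" -v samples="$2" -v seed="${3:-1}" 'BEGIN {
        srand(seed)
        total = maps * samples
        inside = 0
        for (i = 0; i < total; i++) {
            x = rand(); y = rand()
            if (x * x + y * y <= 1) inside++
        }
        # Area ratio (quarter circle / square) is pi/4.
        printf "%.4f\n", 4 * inside / total
    }'
}

estimate_pi 10 100
```

With only 1000 samples the estimate is coarse; increasing either argument should bring it closer to 3.14.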

2. Run the grep example:

    $ mkdir input
    $ cp conf/*.xml input
    $ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
    $ cat output/*
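
To see what the grep example computes, the same kind of count can be approximated locally with standard tools. As far as I understand it, the Hadoop job counts how often each string matching the regular expression occurs and lists the most frequent first. The sketch below assumes a grep that supports -o (GNU and BSD grep do); `count_matches` is a hypothetical helper, not part of Hadoop.

```shell
#!/bin/sh
# count_matches (hypothetical helper):
#   $1 = extended regular expression, remaining args = files.
# Prints "count match" lines, most frequent match first.
count_matches() {
    pattern=$1
    shift
    # -o prints only the matched text, -h omits file names,
    # -E enables extended syntax such as '+'.
    grep -ohE "$pattern" "$@" | sort | uniq -c | sort -rn
}

# Only run against the tutorial's input directory if it exists.
if [ -d input ]; then
    count_matches 'dfs[a-z.]+' input/*
fi
```

If you rerun the Hadoop job itself, remove the output directory first (rm -r output), because the job fails when the output directory already exists.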

3. Run the word count example:

    $ mkdir inputwords
    $ cp conf/*.xml inputwords
    $ bin/hadoop jar hadoop-*-examples.jar wordcount inputwords outputwords   
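
The result of the job can be viewed with cat outputwords/*, just like in the grep example. For comparison, here is a minimal local sketch of the same idea: split the text on whitespace and count each distinct word. `count_words` is a hypothetical helper of mine, and the tab-separated "word, count" layout is my assumption about how the Hadoop output looks.

```shell
#!/bin/sh
# count_words (hypothetical helper): args = files.
# Prints one "word<TAB>count" line per distinct word, sorted by word.
count_words() {
    cat "$@" |
        tr -s '[:space:]' '\n' |    # one word per line
        sed '/^$/d' |               # drop a possible empty first line
        LC_ALL=C sort |             # byte-wise order, so equal words are adjacent
        uniq -c |                   # prefix each word with its count
        awk '{ print $2 "\t" $1 }'  # reorder to word, then count
}

# Only run against the tutorial's input directory if it exists.
if [ -d inputwords ]; then
    count_words inputwords/*
fi
```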

If you run into any errors, see Hadoop troubleshooting.


  1. It should be "su - hadoop_admin" instead of "su - hadoopadmin".

  2. Thanks,
    It was just a typing mistake; I will correct it.