Friday, March 4, 2011

Running Cloudera in Pseudo Distributed Mode

This section walks through installing the Cloudera Distribution for Hadoop (CDH3) on Ubuntu. It is a quick-start tutorial for getting CDH3 running on Debian-based systems: all the commands you need, each with a short description, to install Cloudera in pseudo-distributed mode (a single-node cluster).

The following steps were tested on:
Hadoop: CDH (Cloudera Distribution of Apache Hadoop)
OS: Ubuntu

Deploy Cloudera (CDH3) in Pseudo Distributed mode:
If you are using Ubuntu 10.04 LTS, add the Canonical partner repository:
$ sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"
Install Java:
$ sudo apt-get install sun-java6-jdk
Find the codename of your distribution (call it DISTRO; e.g. lucid, hardy or jaunty):
$ lsb_release -c
Create a Cloudera repository file so your package manager can install Cloudera packages:
$ sudo vi /etc/apt/sources.list.d/cloudera.list
Then add the following two lines, replacing DISTRO with the name of your distribution:
deb http://archive.cloudera.com/debian DISTRO-cdh3 contrib
deb-src http://archive.cloudera.com/debian DISTRO-cdh3 contrib
Install curl:
$ sudo apt-get -y install curl
Add the Cloudera public GPG key to your repository:
$ curl -s http://archive.cloudera.com/debian/archive.key | sudo apt-key add -
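If you prefer not to edit the file by hand, the two repository lines can be composed with a small helper. This is only a sketch: `cloudera_sources` is a hypothetical function for illustration, not something CDH ships.

```shell
# Sketch: compose the two apt source lines for a given distribution codename.
# cloudera_sources is a hypothetical helper, not part of CDH.
cloudera_sources() {
  printf 'deb http://archive.cloudera.com/debian %s-cdh3 contrib\n' "$1"
  printf 'deb-src http://archive.cloudera.com/debian %s-cdh3 contrib\n' "$1"
}
# On a real machine:
#   cloudera_sources "$(lsb_release -cs)" | sudo tee /etc/apt/sources.list.d/cloudera.list
```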
Update the APT package index:
$ sudo apt-get update
Install Hadoop in pseudo-distributed mode:
$ sudo apt-get -y install hadoop-0.20-conf-pseudo
A pseudo-distributed Hadoop installation is composed of one node running all five Hadoop daemons: namenode, jobtracker, secondarynamenode, datanode, and tasktracker
Start the Cloudera daemons:
$ for service in /etc/init.d/hadoop-0.20-*; do sudo $service start; done
To view the files the package installed (on Debian systems):
$ dpkg -L hadoop-0.20-conf-pseudo
Verify that the daemons are running:
$ jps
It should give output like this:
14799 NameNode
14977 SecondaryNameNode
15183 DataNode
15596 JobTracker
15897 TaskTracker
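A quick way to confirm that all five daemons came up is to scan the jps listing for each expected name. A minimal sketch, assuming `check_daemons` as a hypothetical helper (it is not part of CDH):

```shell
# Hypothetical helper: verify that every expected Hadoop daemon name
# appears in a given `jps` listing (passed in as the first argument).
check_daemons() {
  for d in NameNode SecondaryNameNode DataNode JobTracker TaskTracker; do
    echo "$1" | grep -qw "$d" || { echo "missing: $d"; return 1; }
  done
  echo "all daemons running"
}
# On a live node: check_daemons "$(jps)"
```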
Congratulations, the Cloudera setup is complete. Now let's run some examples.
Run the pi example:
$ hadoop jar /usr/lib/hadoop/hadoop-*-examples.jar pi 10 100
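The two arguments are the number of map tasks (10) and the number of samples per map (100). The underlying idea is Monte Carlo estimation, which can be illustrated locally with plain awk; this is only an illustration of the idea, not what the Hadoop job actually runs:

```shell
# Illustration of the Monte Carlo idea behind the pi example (not the
# Hadoop job itself): sample random points in the unit square and count
# how many fall inside the quarter circle; that fraction approximates pi/4.
awk 'BEGIN {
  srand(1); n = 100000; inside = 0
  for (i = 0; i < n; i++) {
    x = rand(); y = rand()
    if (x*x + y*y <= 1) inside++
  }
  printf "%.2f\n", 4 * inside / n
}'
```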
Run the grep example:
$ hadoop fs -mkdir input
$ hadoop fs -put /etc/hadoop-0.20/conf/*.xml input
$ hadoop fs -ls input
$ hadoop jar /usr/lib/hadoop-0.20/hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
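The 'dfs[a-z.]+' pattern pulls dfs.* property names out of the XML configuration files. The same pattern can be sanity-checked locally with plain grep before waiting on a MapReduce job:

```shell
# Try the example job's regex against a sample config line;
# grep -oE prints only the matching portion of each line.
echo "<name>dfs.replication</name>" | grep -oE 'dfs[a-z.]+'
# → dfs.replication
```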
Run the word count example:
$ hadoop fs -mkdir inputwords
$ hadoop fs -put /etc/hadoop-0.20/conf/*.xml inputwords
$ hadoop fs -ls inputwords
$ hadoop jar /usr/lib/hadoop-0.20/hadoop-*-examples.jar wordcount inputwords outputwords
Web-based interface for the NameNode: http://localhost:50070/
Web-based interface for the JobTracker: http://localhost:50030/
Shut down the CDH3 Hadoop services:
$ for x in /etc/init.d/hadoop-* ; do sudo $x stop ; done

