Monday, April 13, 2015

Install Cloudera Hadoop CDH5 with YARN on Ubuntu


This tutorial describes how to install and configure a single-node Hadoop cluster on Ubuntu. A single-node Hadoop cluster is also known as pseudo-distributed mode. The steps are kept short and to the point, so the installation should take about 10 minutes. Once it is done, you can run Hadoop Distributed File System (HDFS) and Hadoop MapReduce operations.

Recommended Platform:

  • OS: Linux is supported as a development and production platform. You can use Ubuntu 14.04 or later (other Linux distributions such as CentOS or Red Hat also work).
  • Hadoop: Cloudera Distribution for Apache Hadoop CDH5.x (you can also use Apache Hadoop 2.x).


Install Java 7 (Oracle Java recommended)

Install Python Software Properties

$ sudo apt-get install python-software-properties

Add Repository

$ sudo add-apt-repository ppa:webupd8team/java

Update the source list

$ sudo apt-get update

Install Java

$ sudo apt-get install oracle-java7-installer
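
Before moving on, it can help to confirm that a Java runtime is actually on the PATH. The following is a minimal sketch, not part of the original steps; the variable name java_status is only illustrative.

```shell
# Report whether a Java runtime is on the PATH and, if so, its version line.
if command -v java >/dev/null 2>&1; then
    java_status="found"
    # java -version writes to stderr, so redirect it before filtering.
    java -version 2>&1 | head -n 1
else
    java_status="missing"
    echo "java not found on PATH; repeat the installer step above"
fi
echo "Java status: $java_status"
```

If the status is "missing", the Hadoop scripts used later will not be able to start.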

Configure SSH

Install OpenSSH Server and Client

$ sudo apt-get install openssh-server openssh-client

Generate Key Pairs

$ ssh-keygen -t rsa -P ""

Configure password-less SSH

$ cat $HOME/.ssh/ >> $HOME/.ssh/authorized_keys
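
The idea in this step is to append your public key to authorized_keys. Here is a hedged sketch of that step as a small helper. The function name authorize_key is hypothetical, and the example call assumes the key was saved under ssh-keygen's default name for RSA keys, id_rsa.pub; adjust the path if you chose a different file.

```shell
# authorize_key PUBKEY AUTHFILE  (hypothetical helper, for illustration only)
# Appends PUBKEY to AUTHFILE unless the same line is already present,
# then tightens permissions on AUTHFILE.
authorize_key() {
    pubkey=$1
    authfile=$2
    [ -f "$pubkey" ] || { echo "no public key at $pubkey" >&2; return 1; }
    mkdir -p "$(dirname "$authfile")"
    touch "$authfile"
    # -q quiet, -x match whole line, -F treat the key as a fixed string
    grep -qxF -- "$(cat "$pubkey")" "$authfile" || cat "$pubkey" >> "$authfile"
    chmod 600 "$authfile"
}

# Example call (assumes the default key name id_rsa.pub):
# authorize_key "$HOME/.ssh/id_rsa.pub" "$HOME/.ssh/authorized_keys"
```

Because the helper checks for an existing entry first, running it twice does not duplicate the key.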

Verify by connecting to localhost over SSH

$ ssh localhost

Install Hadoop

Download Hadoop

Extract the tarball

$ tar xzf hadoop-2.5.0-cdh5.3.2.tar.gz

Note: All the required JARs, scripts, configuration files, etc. are available in the HADOOP_HOME directory (hadoop-2.5.0-cdh5.3.2).
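
Since the note refers to HADOOP_HOME, one way to make that name usable in the shell is to export it. This is a sketch under an assumption the tutorial does not state: that the tarball was extracted in your home directory. Change the path if you extracted it elsewhere.

```shell
# Assumption: the tarball was extracted in the home directory.
export HADOOP_HOME="$HOME/hadoop-2.5.0-cdh5.3.2"
# Put the Hadoop command-line tools and start/stop scripts on the PATH.
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"

if [ -d "$HADOOP_HOME" ]; then
    ls "$HADOOP_HOME"
else
    echo "HADOOP_HOME does not exist yet: $HADOOP_HOME"
fi
```

To keep these settings across sessions, the two export lines can be appended to $HOME/.bashrc.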