Steps to install Hadoop 2.x (YARN) on a single-node cluster (pseudo-distributed mode)
The Hadoop 2.x release involves many changes to Hadoop and MapReduce. The centralized JobTracker service is replaced with a ResourceManager that manages the resources in the cluster and a per-application ApplicationMaster that manages the application lifecycle. These architectural changes enable Hadoop to scale to much larger clusters. For more details on the architectural changes in next-generation Hadoop (a.k.a. YARN), watch this video or visit this blog.
This post explains how to install Hadoop 2.x (a.k.a. YARN) on a single-node cluster.
Prerequisites:
- Java 7 installed
- Dedicated user for hadoop (not mandatory)
- SSH configured
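The items listed above can be checked quickly from a terminal before going further. The following is a minimal sketch, not part of the original steps: the helper names have_cmd and check_prereqs are made up for illustration, and the script only reports whether the java and ssh commands are on the PATH. It does not verify the Java version or that passwordless SSH works, so the two manual checks are left as comments.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the given command is on the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: print one status line per prerequisite, never abort.
check_prereqs() {
    for cmd in java ssh; do
        if have_cmd "$cmd"; then
            echo "found:   $cmd"
        else
            echo "missing: $cmd"
        fi
    done
}

check_prereqs

# Manual follow-up checks (run these yourself):
#   java -version    # should report a 1.7.x version for Java 7
#   ssh localhost    # should log in; a single-node setup usually wants this without a password
```

If either line reports "missing", install that package before continuing with the steps.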
Steps to install Hadoop 2.x:
1. Download tarball
You can download the tarball for Hadoop 2.x (hadoop-2.4.0.tar.gz) from http://apache.cs.utah.edu/hadoop/common/current2/.
Extract it to a folder in your home directory, say $HOME/yarn.
$ cd $HOME/yarn (Optional)
$ sudo chown -R hduser:hadoop hadoop-2.4.0
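The download and extraction described in this step could be scripted roughly as follows. This is a sketch under assumptions: the function names extract_tarball and install_hadoop and the INSTALL_DIR variable are illustrative, wget is assumed to be installed, and the mirror path from the text may no longer host this release. The download and the chown are left commented out so that nothing touches the network or file ownership until you choose to run them.

```shell
#!/bin/sh
# Names taken from the text; adjust if you downloaded a different release.
HADOOP_VERSION="2.4.0"
TARBALL="hadoop-${HADOOP_VERSION}.tar.gz"
MIRROR="http://apache.cs.utah.edu/hadoop/common/current2"
# Assumed variable: target folder, defaulting to $HOME/yarn as in the text.
INSTALL_DIR="${INSTALL_DIR:-$HOME/yarn}"

# Unpack a tarball ($1) into a directory ($2), creating the directory if needed.
extract_tarball() {
    mkdir -p "$2" && tar -xzf "$1" -C "$2"
}

# Download the tarball only when it is not already present, then unpack it.
install_hadoop() {
    [ -f "$TARBALL" ] || wget "${MIRROR}/${TARBALL}" || return 1
    extract_tarball "$TARBALL" "$INSTALL_DIR"
}

# Uncomment to run the download and extraction:
# install_hadoop

# Optional, only when using the dedicated user from the prerequisites:
# sudo chown -R hduser:hadoop "$INSTALL_DIR/hadoop-${HADOOP_VERSION}"
```

After install_hadoop completes, the release should sit under $HOME/yarn/hadoop-2.4.0, which matches the directory that the chown command above operates on.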