Monday, March 28, 2011

Create an AMI


This blog will guide you through creating an Ubuntu AMI (Amazon Machine Image) from a launched instance. In this tutorial we will create an S3-backed AMI from a running Ubuntu instance. Before getting down to creating the actual AMI, let's go over some basic terminology:

Understand what an AMI is: An Amazon Machine Image (AMI) is a special type of virtual appliance used to instantiate (create) a virtual machine within the Amazon Elastic Compute Cloud. It serves as the basic unit of deployment for services delivered using EC2. In short, an AMI is an image from which an instance can boot.

What is Amazon EC2: Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides resizable compute capacity in the cloud. It is designed to make web-scale computing easier for developers.

Create your own AMI so that you can boot new custom instances that have all the required software preinstalled. Your AMI becomes your basic unit of deployment; it saves you the time of installing the required software again and again.
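For an S3-backed AMI, the bundling is typically done with the EC2 AMI tools from inside the running instance. A rough sketch of the sequence (the bucket name, key files, account ID, and access keys below are placeholders you must replace with your own):

```
$ sudo ec2-bundle-vol -d /mnt -k pk-XXXX.pem -c cert-XXXX.pem -u <account-id> -r x86_64
$ ec2-upload-bundle -b my-ami-bucket -m /mnt/image.manifest.xml -a <access-key> -s <secret-key>
$ ec2-register my-ami-bucket/image.manifest.xml
```

ec2-register returns the new AMI ID (ami-xxxxxxxx), which you can then use to launch your custom instances.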

Wednesday, March 16, 2011

Hue Features

This Hue tutorial describes the features of Cloudera Hue.

Cloudera Hue is a web-based UI for Hadoop. It started as Cloudera Desktop and was later renamed Hue. Hue provides the following features:
  • Beeswax
  • File Browser
  • Job Designer
  • Job Browser
  • User Admin

Hue Installation and Configuration

This section provides instructions for installing Cloudera Hue and changing its default configuration, such as configuring a different database with Hue and sending a notification email on job completion.


Installing Hue on one machine with CDH in pseudo-distributed mode:


To install Hue:
  • Hue can be installed with this single command:

$ sudo apt-get install hue
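To point Hue at a different database (one of the configuration changes mentioned above), you edit Hue's ini file and restart the service. A hypothetical fragment assuming a MySQL database named hue; the exact section layout can differ between Hue releases, so check your installed /etc/hue/hue.ini before editing:

```
[desktop]
  [[database]]
    engine=mysql
    host=localhost
    port=3306
    user=hue
    password=secret
    name=hue
```

Then restart Hue, e.g. with `$ sudo /etc/init.d/hue restart`.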

Friday, March 4, 2011

Running Cloudera in Distributed Mode

This section contains instructions for installing the Cloudera Distribution for Hadoop (CDH3) on Ubuntu. It is a CDH quickstart tutorial for setting up CDH3 quickly on Debian-based systems: a short, command-by-command walkthrough, with descriptions, of installing Cloudera in distributed mode (a multi-node cluster).

Prerequisite: Before running Cloudera in distributed mode you must set up Cloudera in pseudo-distributed mode, and you need at least two machines, one for the master and another for the slave (you can create more than one virtual machine on a single physical machine).
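In distributed mode the slaves must know where the master's NameNode and JobTracker run. A minimal sketch of the relevant properties, assuming the master's hostname is simply master (replace with yours) and CDH3's default ports:

```
<!-- /etc/hadoop/conf/core-site.xml -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:8020</value>
  </property>
</configuration>

<!-- /etc/hadoop/conf/mapred-site.xml -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:8021</value>
  </property>
</configuration>
```

The same configuration files go onto every node in the cluster.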


Running Cloudera in Pseudo Distributed Mode

This section contains instructions for installing the Cloudera Distribution for Hadoop (CDH3) on Ubuntu. It is a CDH quickstart tutorial for setting up CDH3 quickly on Debian-based systems: a short, command-by-command walkthrough, with descriptions, of installing Cloudera in pseudo-distributed mode (a single-node cluster).

Following steps tested on
Hadoop: CDH (Cloudera Distribution of Apache Hadoop)
OS: Ubuntu
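CDH3 ships a convenience package that installs Hadoop together with a ready-made pseudo-distributed configuration. A sketch, assuming Cloudera's apt repository is already configured on the machine:

```
$ sudo apt-get install hadoop-0.20-conf-pseudo
$ for service in /etc/init.d/hadoop-0.20-*; do sudo $service start; done
```

The loop starts all the Hadoop daemons (namenode, datanode, jobtracker, tasktracker, secondarynamenode) on the single machine.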

Running Cloudera in Standalone Mode

This section contains instructions for installing the Cloudera Distribution for Hadoop (CDH3) on Ubuntu. It is a CDH quickstart tutorial for setting up CDH3 quickly on Debian-based systems: a short, command-by-command walkthrough, with descriptions, of installing Cloudera in standalone mode (a single-node cluster).

Following steps tested on
Hadoop: CDH (Cloudera Distribution of Apache Hadoop)
OS: Ubuntu
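In standalone mode only the base package is needed, and jobs run as ordinary local processes. A sketch, again assuming Cloudera's apt repository is configured (the examples jar name may vary between CDH releases):

```
$ sudo apt-get install hadoop-0.20
$ hadoop jar /usr/lib/hadoop-0.20/hadoop-examples-*.jar pi 2 1000
```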

Upgrade Hadoop

This section contains instructions to upgrade Hadoop from one version to another. This tutorial explains how to upgrade Hadoop running in distributed mode (i.e. on a cluster) without loss of data. Before starting the upgrade procedure, please ensure that no job is running.

COMMAND                                  DESCRIPTION
bin/stop-mapred.sh                       Stop the map-reduce cluster(s) and all client applications running on the DFS cluster
bin/stop-dfs.sh                          Stop DFS using the shutdown command
(install the new version of Hadoop on all the nodes in the cluster)
bin/start-dfs.sh -upgrade                Start the DFS cluster with the -upgrade option
bin/start-mapred.sh                      Start the map-reduce cluster
bin/hadoop dfsadmin -finalizeUpgrade     Verify the components run properly, then finalize the upgrade when convinced

If you get any error, visit Hadoop Troubleshooting.
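Put together, the upgrade sequence looks roughly like this when run from the Hadoop installation directory (the -upgradeProgress check is optional but useful for watching the upgrade):

```
$ bin/stop-mapred.sh
$ bin/stop-dfs.sh
# install the new Hadoop version on all nodes, then:
$ bin/start-dfs.sh -upgrade
$ bin/hadoop dfsadmin -upgradeProgress status
$ bin/start-mapred.sh
# once satisfied that everything works:
$ bin/hadoop dfsadmin -finalizeUpgrade
```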


Hadoop in Distributed Mode

This section contains instructions for Hadoop installation on Ubuntu. It is a Hadoop quickstart tutorial: a short, command-by-command walkthrough, with descriptions, of installing Hadoop in distributed mode (a multi-node cluster).

Prerequisite: Before running Hadoop in distributed mode you must set up Hadoop in pseudo-distributed mode, and you need at least two machines, one for the master and another for the slave (you can create more than one virtual machine on a single physical machine).




Following steps tested on:
OS: Ubuntu
Hadoop: Apache Hadoop 0.20.X
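The master tells Hadoop who its workers are through two plain-text files in the conf directory. A sketch with example hostnames (replace them with your machines'):

```
# conf/masters  -- host running the secondary namenode
master

# conf/slaves   -- one datanode/tasktracker host per line
slave1
slave2
```

These files live on the master; each listed host must be reachable from the master over passwordless SSH.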

Hadoop in Pseudo Distributed Mode

After running Hadoop in standalone mode, let's deploy Hadoop on a single machine:


This section contains instructions for Hadoop installation on Ubuntu. It is a Hadoop quickstart tutorial: a short, command-by-command walkthrough, with descriptions, of installing Hadoop in pseudo-distributed mode (a single-node cluster). In this tutorial, I will describe the steps required for deploying Hadoop. The main goal is to get a "simple" Hadoop installation up and running so that you can play around with the software and learn more about it.


This Tutorial has been tested on:
  • Ubuntu Linux (10.04 LTS)
  • Hadoop 0.20.2
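A pseudo-distributed setup needs only a couple of properties in the conf directory. This is the minimal configuration from the Hadoop 0.20 quickstart, with every daemon on localhost and HDFS replication set to 1:

```
<!-- conf/core-site.xml -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- conf/hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```

After that, format the namenode once with `bin/hadoop namenode -format` and start the daemons with `bin/start-all.sh`.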

Hadoop in Standalone Mode

After understanding what Hadoop is, let's deploy Hadoop on a single machine:


This section contains instructions for Hadoop installation on Ubuntu. It is a Hadoop quickstart tutorial: a short, step-by-step walkthrough, with descriptions, of installing Hadoop in standalone mode (a single-node cluster). In this tutorial, I will describe the steps required for deploying Hadoop. The main goal is to get a "simple" Hadoop installation up and running so that you can play around with the software and learn more about it.


This Tutorial has been tested on:
  • Ubuntu Linux (10.04 LTS)
  • Hadoop 0.20.2
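A quick way to verify a standalone installation is the grep example that ships with the Hadoop distribution, which runs entirely as a local process. Run from the unpacked Hadoop directory:

```
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-0.20.2-examples.jar grep input output 'dfs[a-z.]+'
$ cat output/*
```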

Understanding What is Hadoop


What is Hadoop:
Hadoop is a framework written in Java for running applications on large clusters of commodity hardware; it incorporates features similar to those of the Google File System and of MapReduce. HDFS is a highly fault-tolerant distributed file system that, like the rest of Hadoop, is designed to be deployed on low-cost hardware. It provides high-throughput access to application data and is suitable for applications that have large data sets (in the range of terabytes to zettabytes).
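The MapReduce model that Hadoop implements can be illustrated in miniature with ordinary Unix tools, as a toy stand-in for a real cluster job: tr acts as the mapper (emitting one word per line), sort as the shuffle, and uniq -c as the reducer.

```shell
# Map: split each line into one word per line.
# Shuffle: sort brings equal words together.
# Reduce: uniq -c counts each group of identical words.
printf 'the quick brown fox\nthe lazy dog\n' \
  | tr -s ' ' '\n' \
  | sort \
  | uniq -c
```

The word "the" appears twice in the input, so it comes out with a count of 2; a real Hadoop job applies the same map-sort-reduce pattern in parallel across many machines.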