Sunday, April 29, 2012

Save Data in EBS Volume

This tutorial will guide you through creating an Amazon EBS (Elastic Block Store) volume, attaching it to your running instance, and saving your data to Amazon EBS. Before that, let's understand what Amazon EBS volumes are and what features they provide.

Amazon Elastic Block Store (EBS) provides block-level storage volumes for use with Amazon EC2 instances. You can think of it as attaching an external hard drive to your system to store data. Multiple EBS volumes can be attached to an instance, but a volume can be attached to only one instance at a time. Data remains saved in the volume even after your instance is terminated.

Some Features of Amazon EBS Volumes
  • Amazon EBS allows you to create storage volumes from 1 GB to 1 TB.
  • An Amazon EBS volume is created in a specific Availability Zone and can only be attached to instances in that same Availability Zone.
  • Each storage volume is automatically replicated within its Availability Zone. This prevents data loss due to failure of any single hardware component.
  • Amazon EBS also provides the ability to create point-in-time snapshots of volumes, which are persisted to Amazon S3. These snapshots can be used as the starting point for new Amazon EBS volumes.
  • AWS also enables you to create new volumes from AWS-hosted public data sets.
  • Amazon CloudWatch exposes performance metrics for EBS volumes, giving you insight into bandwidth, throughput, latency, and queue depth.
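To make the workflow concrete, here is a minimal sketch using the Amazon EC2 API command line tools (the same steps can be performed from the AWS Management Console). The volume ID, instance ID, Availability Zone, and device names below are placeholders; on recent Ubuntu kernels a volume attached as /dev/sdf typically appears inside the instance as /dev/xvdf.

# create a 10 GB volume in the instance's Availability Zone (placeholder zone)
$ ec2-create-volume --size 10 --availability-zone us-east-1a
# attach the new volume to the running instance (placeholder IDs)
$ ec2-attach-volume vol-xxxxxxxx -i i-xxxxxxxx -d /dev/sdf
# on the instance: format the volume (first use only), then mount it
$ sudo mkfs -t ext4 /dev/xvdf
$ sudo mkdir /mnt/ebs
$ sudo mount /dev/xvdf /mnt/ebs
# optional: take a point-in-time snapshot of the volume
$ ec2-create-snapshot vol-xxxxxxxx

Anything written under /mnt/ebs now lives on the EBS volume, so it survives instance termination and can be recovered by attaching the volume (or a new volume created from its snapshot) to another instance in the same Availability Zone.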

Saturday, April 28, 2012

Deploy Hadoop Cluster

Step-by-Step Tutorial to Deploy a Hadoop Cluster (fully distributed mode):
Setting up Hadoop in fully distributed mode requires multiple machines/nodes: one node acts as the master and the rest act as slaves.
If you want a quick introduction to Hadoop, please click here.
If you want to set up Hadoop in pseudo-distributed mode, please click here.

In this tutorial:
  • I am using 3 nodes: 1 master and 2 slaves
  • I am using Cloudera's Distribution for Apache Hadoop, CDH3u3 (you can also use Apache Hadoop 0.20.x)
  • I am deploying Hadoop on Ubuntu (you can use another OS, such as CentOS, Red Hat, etc.)

Install / Set Up Hadoop on the Cluster

Install Hadoop on master:

1. Add entries for the master and slaves in the hosts file:
Edit the hosts file and add the following entries:
$ sudo pico /etc/hosts
MASTER-IP    master
SLAVE01-IP   slave01
SLAVE02-IP   slave02
(In place of MASTER-IP, SLAVE01-IP, and SLAVE02-IP, put the corresponding IP address of each machine.)
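For example, with hypothetical private IP addresses the file would contain:

192.168.0.1    master
192.168.0.2    slave01
192.168.0.3    slave02

Add the same three entries to /etc/hosts on every node (master and slaves) so that all machines can resolve each other by hostname.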