Tuesday, November 27, 2012

HBase: A Soft Introduction & Quickstart

Having covered what Hadoop is and how to deploy it, let's move on to HBase. This tutorial explains the basics of HBase and its features: the functionality it provides, the situations where it is useful, and a quick start for beginners.

[Image: Apache HBase. Source: Apache]
Understanding What is HBase
HBase is an open source, distributed, versioned, column-oriented, NoSQL (non-relational) database management system that runs on top of Hadoop. It adds random, real-time read/write access to Hadoop, allowing users to update individual data records; atomicity is guaranteed per row, not across rows or tables. Hadoop is designed for batch processing of large datasets, but with HBase on top of Hadoop we can read and write data in real time.

In HBase, a master node manages the cluster, and region servers store portions of the tables and perform the work on the data. An HBase system comprises a set of tables. Each table contains rows and columns, much like a traditional database. Each row is identified by a row key, which plays the role of a primary key, and all access to HBase tables goes through this row key. An HBase column represents an attribute of an object.

List of the benefits / functionalities HBase offers:
      ·         Open source project (Apache)
      ·         A sparse, three-dimensional array of cells, indexed by: RowKey, ColumnKey, Timestamp/Version
      ·         Distributed, Reliable, large-scale data store
      ·         Efficient at random reads/writes
      ·         Sharded into regions along an ordered RowKey space
      ·         Within each region: Data is grouped into column families
      ·         Sort order within each column family: Row Key (asc), Column Key (asc), Timestamp (desc)
      ·         Store large amounts of data
      ·         High write throughput
      ·         Efficient random access within large data sets
      ·         Scale gracefully with data
      ·         For structured and semi-structured data
      ·         Does not provide full RDBMS capabilities (cross-table transactions, joins, etc.)
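
To make the sort order from the list above concrete, here is a small sketch that runs the standard Unix sort command over a few made-up cells. It only illustrates the ordering and is not how HBase stores data internally; the rowkey,columnkey,timestamp,value layout and the sample values are assumptions for this example.

```shell
# Hypothetical cells, one per line: rowkey,columnkey,timestamp,value
cells='row2,cf:a,100,x
row1,cf:b,200,y
row1,cf:a,100,old
row1,cf:a,300,new'

# Row key ascending, column key ascending, timestamp descending (numeric).
sorted=$(printf '%s\n' "$cells" | LC_ALL=C sort -t, -k1,1 -k2,2 -k3,3nr)
printf '%s\n' "$sorted"
```

The newest version of row1 / cf:a (timestamp 300) is listed ahead of the older one (timestamp 100), which matches the idea that the latest version of a cell is the first one found.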

Now that the basics of HBase are covered, let's deploy HBase on a single node (in pseudo-distributed mode).
      ·         Install Hadoop on a single machine first; you can refer to the how-to on setting up Hadoop on a single node

Install / Setup HBase on Ubuntu
1. Download HBase
Download a stable version of HBase either from Apache or Cloudera

2. Untar the tarball
tar xzf hbase-*.tar.gz

3. Set JAVA_HOME in hbase-env.sh
pico conf/hbase-env.sh
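
The line to set inside conf/hbase-env.sh looks roughly like the following. The JDK path is an assumption (a typical OpenJDK 6 location on 64-bit Ubuntu); use the path your JDK is actually installed under.

```shell
# In conf/hbase-env.sh: tell HBase which JDK to use.
# The path below is an assumed example, not taken from the original setup.
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64
```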

4. Add the following entries to conf/hbase-site.xml
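
As a hedged sketch only, a minimal pseudo-distributed configuration could be written as shown here; it is not necessarily the exact set of entries this tutorial was tested with. The HDFS address hdfs://localhost:9000 is an assumption and has to match fs.default.name in your Hadoop core-site.xml. Run it from the HBase directory, and note that it overwrites conf/hbase-site.xml.

```shell
# Write a minimal conf/hbase-site.xml (this overwrites any existing file).
mkdir -p conf
cat > conf/hbase-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <!-- Where HBase keeps its data in HDFS; host and port are assumptions. -->
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>
  <property>
    <!-- Single node, so keep one copy of each block. -->
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF
```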


5. Comment out the hostname entry in /etc/hosts
#           hostname
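
On Ubuntu, the entry in question is usually the 127.0.1.1 line that maps the machine's hostname; that address is an assumption here, since the step above shows only the hostname part. The sketch below comments such a line out with sed, working on a sample copy so the real /etc/hosts is not touched. The hostname myhost is a made-up placeholder.

```shell
# Work on a sample copy so nothing on the real system is changed.
hosts_copy=$(mktemp)
printf '127.0.0.1 localhost\n127.0.1.1 myhost\n' > "$hosts_copy"

# Prefix the 127.0.1.1 line with '# ' (assumed to be the entry meant above).
sed 's/^127\.0\.1\.1/# &/' "$hosts_copy" > "$hosts_copy.new"
cat "$hosts_copy.new"
```

Once the result looks right, the same sed expression can be applied to /etc/hosts itself with root privileges, ideally after making a backup.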

6. Start HBase
bin/start-hbase.sh

7. Start HBase Shell
bin/hbase shell

Congratulations, HBase has been installed on your machine.

Run the jps command to check that the required daemons are running:
3669  HMaster
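
If you would rather script this check than read the listing by eye, something along these lines should work. The helper name has_daemon is made up for this sketch, and the sample text is the jps line shown above; on a real machine, replace it with the live output of jps.

```shell
# Succeeds if the given jps output contains a process with the given name.
has_daemon() {
  printf '%s\n' "$1" | awk -v name="$2" '$2 == name { found = 1 } END { exit !found }'
}

# Sample output copied from the listing above; on a real machine use: jps_output=$(jps)
jps_output='3669  HMaster'

if has_daemon "$jps_output" HMaster; then
  echo "HMaster is running"
else
  echo "HMaster is NOT running"
fi
```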

Now let's run some basic HBase commands
1. Create a table named 'test' with column family 'cf'
create 'test', 'cf'

2. List all the tables
list

3. Add data to the 'test' table
put 'test', 'row1', 'cf:a', 'value_1'
put 'test', 'row2', 'cf:b', 'value_2'

4. Read the data from the 'test' table
scan 'test'
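
Besides scan, a single row can be fetched with get, and the test table can be removed again with disable followed by drop. The sketch below feeds these commands to the HBase shell from a script. It assumes you run it from the HBase directory and that the shell of your HBase version reads commands from standard input; if bin/hbase is not found, it only prints the commands.

```shell
# HBase shell commands to run, kept in a variable so they can be reviewed first.
hbase_cmds="get 'test', 'row1'
disable 'test'
drop 'test'"

if [ -x bin/hbase ]; then
  # Pipe the commands into the HBase shell (assumes it accepts input on stdin).
  printf '%s\n' "$hbase_cmds" | bin/hbase shell
else
  # HBase not found in the current directory: just show what would be run.
  printf '%s\n' "$hbase_cmds"
fi
```

A table has to be disabled before it can be dropped, which is why the two commands appear in that order.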

This tutorial has been tested on
Hadoop 0.20.x
HBase 0.90.x
Ubuntu 12.04
