Now that you understand what Hadoop is and have deployed it, let's start understanding HBase. This tutorial explains
the basics of HBase and its features: the functionality HBase provides, a quick start, and a few basic commands for beginners. You will also get to
know where to use HBase and in which situations it can be useful.
This tutorial has been tested on
Hadoop 0.20.x
HBase 0.90.x
Ubuntu 12.04
Source: Apache
Understanding What is HBase
HBase is an open source, distributed, versioned, column-oriented, NoSQL (non-relational) database management system that runs on top of Hadoop. It adds record-level read and write capability to Hadoop, with atomic updates within a single row, so users can update individual data records. Hadoop is designed for batch processing of large datasets, but with HBase on top of Hadoop we can read and write data in real time.
In HBase, a master node manages the cluster, and region servers store portions of the tables and perform the
work on the data. An HBase system comprises a set of tables. Each table
contains rows and columns, much like a traditional database. Each row is
identified by a row key, which plays the role of a primary key, and all access to HBase
tables goes through this row key (either a direct lookup or a scan over a range of row keys).
An HBase column represents an attribute of an object.
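The access pattern described above can be pictured with a small Python sketch. This is only a toy model under stated assumptions, not how HBase stores data: the table contents, the row keys such as `user#1001`, and the helper names `get_row` and `find_rows_by_value` are all hypothetical.

```python
# Toy model of an HBase table: row key -> {"family:qualifier": value}.
# Illustration of the access pattern only, not of HBase internals.
table = {
    "user#1001": {"info:name": "Asha", "info:city": "Pune"},
    "user#1002": {"info:name": "Ben"},  # no info:city cell; rows may differ
}


def get_row(table, row_key):
    """Fetch one row directly by its row key (the fast path)."""
    return table.get(row_key, {})


def find_rows_by_value(table, column, value):
    """Without the row key, every row has to be inspected (a full scan)."""
    return sorted(key for key, cells in table.items() if cells.get(column) == value)


print(get_row(table, "user#1001"))
print(find_rows_by_value(table, "info:city", "Pune"))
```

The point of the sketch is the difference between the two helpers: a lookup by row key touches one entry, while a lookup by any other attribute has to walk the whole table, which is why row key design matters in HBase.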
List of the benefits / functionalities HBase offers:
· Open source project (Apache)
· A sparse, three-dimensional array of cells, indexed by RowKey, ColumnKey, Timestamp/Version
· Distributed, reliable, large-scale data store
· Efficient at random reads/writes
· Sharded into regions along an ordered RowKey space
· Within each region, data is grouped into column families
· Sort order within each column family: Row Key (asc), Column Key (asc), Timestamp (desc)
· Stores large amounts of data
· High write throughput
· Efficient random access within large data sets
· Scales gracefully with data
· For structured and semi-structured data
· Does not provide full RDBMS capabilities (cross-table transactions, joins, etc.)
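Two items in the list, the sort order of cells and the sharding into regions along the row key space, are easier to see in code. The following is a minimal Python sketch, assuming made-up cells and made-up region split points; `sort_key`, `region_for` and `region_start_keys` are hypothetical names used only for illustration.

```python
from bisect import bisect_right

# Each cell is addressed by (row key, column key, timestamp).
cells = {
    ("row2", "cf:b", 100): "value_2",
    ("row1", "cf:a", 100): "old",
    ("row1", "cf:a", 200): "new",
    ("row1", "cf:b", 150): "x",
}


def sort_key(cell_key):
    row, column, timestamp = cell_key
    # Row key ascending, column key ascending, timestamp descending.
    return (row, column, -timestamp)


# Newest version of a cell sorts before its older versions.
ordered = sorted(cells, key=sort_key)

# Regions cover contiguous row key ranges; each is identified by its start key.
# These split points are hypothetical.
region_start_keys = ["", "row2", "row5"]


def region_for(row_key):
    """Return the index of the region whose key range contains row_key."""
    return bisect_right(region_start_keys, row_key) - 1


for key in ordered:
    print(key, "-> region", region_for(key[0]))
```

Because the row key space is ordered, finding the region for a key is a binary search over the region start keys, and rows with neighbouring keys end up in the same region.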
Now that we understand the basics of HBase,
let’s deploy HBase on a single node (in pseudo-distributed mode).
Pre-requisites:
Install / Setup HBase on Ubuntu
1. Download HBase
2. Untar the tarball
tar xzf hbase-*.tar.gz
3. Set JAVA_HOME in conf/hbase-env.sh
pico conf/hbase-env.sh
4. Add the following entries to conf/hbase-site.xml
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:8020/hbase</value>
  </property>
  <property>
    <name>hbase.master</name>
    <value>localhost:60000</value>
  </property>
</configuration>
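To double-check the name/value pairs in this file, a short script can list them. This is a minimal sketch, assuming Python 3 is available; the function name `read_site_properties` is hypothetical, and the XML is inlined as a string here, whereas in practice you would read the contents of conf/hbase-site.xml.

```python
import xml.etree.ElementTree as ET


def read_site_properties(xml_text):
    """Return the <property> entries of a Hadoop-style site file as a dict."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name"): prop.findtext("value")
        for prop in root.findall("property")
    }


# Same content as the configuration shown above.
sample = """<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:8020/hbase</value>
  </property>
  <property>
    <name>hbase.master</name>
    <value>localhost:60000</value>
  </property>
</configuration>"""

props = read_site_properties(sample)
for name, value in sorted(props.items()):
    print(name, "=", value)
```

The host and port in hbase.rootdir should match the HDFS address your Hadoop installation is configured with, so a listing like this is a convenient thing to compare against the Hadoop configuration.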
5. In /etc/hosts, comment out the line that maps your machine's hostname to the loopback address
#127.0.0.1 hostname
6. Start HBase
bin/start-hbase.sh
7. Start HBase Shell
bin/hbase shell
Congratulations, HBase has been
installed on your machine.
Run the jps command to check that the required
daemons are running:
3669 HMaster
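If you want to script this check, the jps output is easy to parse, since each line has the form "<pid> <class name>". A minimal sketch, assuming Python 3; the helper names `running_daemons` and `missing_daemons` are hypothetical, and the list of required daemons is passed in by the caller rather than assumed.

```python
def running_daemons(jps_output):
    """Parse jps output lines of the form '<pid> <class name>'."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) == 2:  # skip blank lines and lines without a class name
            names.add(parts[1])
    return names


def missing_daemons(jps_output, required):
    """Return the required daemon names that do not appear in the jps output."""
    return sorted(set(required) - running_daemons(jps_output))


# Sample text in the same format as the jps output above.
sample = "3669 HMaster\n4012 Jps\n"
print(missing_daemons(sample, ["HMaster"]))
```

To run it against a live machine, replace `sample` with the real output, for example `subprocess.run(["jps"], capture_output=True, text=True).stdout`.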
Now let’s run some basic HBase commands
in the HBase shell.
1. Create a table named ‘test’ with
column family ‘cf’
create 'test', 'cf'
2. List all the tables
list
3. Add data to the ‘test’ table
put 'test', 'row1', 'cf:a', 'value_1'
put 'test', 'row2', 'cf:b', 'value_2'
4. Read the data from the ‘test’ table
scan 'test'
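To make the semantics of put and scan concrete, here is a toy in-memory model in Python. It is a sketch under stated assumptions, not the HBase implementation: the `put` and `scan` functions only mimic the shell commands of the same name, a simple counter stands in for timestamps, and no limit on the number of stored versions is modelled.

```python
import itertools

# Stand-in for timestamps; HBase uses the server's wall-clock time by default.
_clock = itertools.count(1)


def put(table, row, column, value):
    """Store a new version of a cell instead of overwriting the old one."""
    table.setdefault(row, {}).setdefault(column, []).append((next(_clock), value))


def scan(table):
    """Yield (row, column, value) in row key order, newest version per cell."""
    for row in sorted(table):
        for column in sorted(table[row]):
            _timestamp, value = max(table[row][column])
            yield row, column, value


test = {}
put(test, "row2", "cf:b", "value_2")
put(test, "row1", "cf:a", "value_1")
put(test, "row1", "cf:a", "value_1_updated")  # second version of the same cell

for cell in scan(test):
    print(cell)
```

Note that the rows come back sorted by row key regardless of insertion order, and that a second put to the same cell adds a newer version rather than replacing the stored value in place.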