Showing posts with label HBase. Show all posts
Showing posts with label HBase. Show all posts

Hbase at Facebook

After understanding basics of HBase, let’s try to understand How Facebook uses HBase, I have got very good tutorial from Facebook, how they are using HBase for messeging. This tutorial includes Introduction to Hbase, Why Hbase, MySQL to HbaseMigration at Facebook


 

Hbase-A Soft Introduction & Quickstart

After understanding whats is Hadoop, and after deploying hadoop , lets’ start understanding HBase. This tutorial explains basics of HBase, and its features. Here I tried to explain functionality HBase provides and a quick start about HBase, a Basic tutorial for beginners. You will get to know where to use HBase, in which situation HBase can be useful.


Apache HBase
Source: Apache
Understanding What is HBase
HBase is an open source, distributed, versioned, column-oriented, No-SQL / Non-relational database management system that runs on the top of Hadoop. It adds transactional capability to hadoop, allowing users to update data records. Hadoop is designed for batch processing of large dataset, but with HBase on the top of Hadoop we can process real time dataset.

Deploy Apache Flume NG (1.x.x)

In this tutorial I have explained how to install / deploy / configure Flume NG on single system, and how to configure Flume NG to copy data to HDFS, then configuration for copying data to HBase

Before going to configurations let’s understand what Flume NG (1.x.x) is:
Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. The system is centrally managed and allows for intelligent dynamic management. It uses a simple extensible data model that allows for online analytic application.





Understanding What is Hadoop


What is Hadoop:
Hadoop is a framework written in Java for running applications on large clusters of commodity hardware and incorporates features similar to those of the Google File System and of MapReduce. HDFS is a highly fault-tolerant distributed file system and like Hadoop designed to be deployed on low-cost hardware. It provides high throughput access to application data and is suitable for applications that have large data sets (In the range of terabytes to zetabytes).