Showing posts with label Linux.

Deploy Apache Flume NG (1.x.x)

In this tutorial I explain how to install, deploy and configure Flume NG on a single system. I then show how to configure Flume NG to copy data into HDFS, and finally how to configure it to copy data into HBase.

Before moving on to the configuration, let's understand what Flume NG (1.x.x) is:
Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant, with tunable reliability mechanisms and many failover and recovery mechanisms. Unlike the older Flume OG (0.9.x), which was centrally managed through master nodes, Flume NG has no master: each agent is configured through its own local configuration file. It uses a simple extensible data model that allows for online analytic applications.
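To make the streaming-data-flow model concrete: a Flume NG agent wires a source (which receives events), a channel (which buffers them) and a sink (which delivers them) together in a Java properties file. The Python sketch below only renders such a file as text. The agent name `agent1`, the tailed log `/var/log/syslog`, the HDFS URL `hdfs://namenode:8020/flume/events`, the HBase table `flume_events` and column family `cf` are placeholder assumptions, not values from this tutorial. The property keys (`type`, `channels`, `channel`, `capacity`, `hdfs.path`, `hdfs.fileType`, `table`, `columnFamily`, `serializer`) follow the Flume NG user guide, but check them against the guide for your Flume version.

```python
"""Hypothetical helper that renders a Flume NG agent configuration.

Placeholders (assumptions, not from the post): agent name, tailed log file,
HDFS path, HBase table and column family. Adjust them for your cluster.
"""
from typing import Dict


def flume_agent_config(agent: str, sink_props: Dict[str, str],
                       log_file: str = "/var/log/syslog") -> str:
    props = {
        # Every agent lists the names of its sources, channels and sinks.
        f"{agent}.sources": "r1",
        f"{agent}.channels": "c1",
        f"{agent}.sinks": "k1",
        # Source: run `tail -F` and turn each new line into an event.
        f"{agent}.sources.r1.type": "exec",
        f"{agent}.sources.r1.command": f"tail -F {log_file}",
        f"{agent}.sources.r1.channels": "c1",  # plural: a source may feed several channels
        # Channel: buffer up to 10000 events in memory.
        f"{agent}.channels.c1.type": "memory",
        f"{agent}.channels.c1.capacity": "10000",
        # Sink: drains exactly one channel (singular key).
        f"{agent}.sinks.k1.channel": "c1",
    }
    # Sink-specific settings are prefixed with the sink's name.
    for key, value in sink_props.items():
        props[f"{agent}.sinks.k1.{key}"] = value
    return "\n".join(f"{k} = {v}" for k, v in props.items()) + "\n"


HDFS_SINK = {
    "type": "hdfs",
    "hdfs.path": "hdfs://namenode:8020/flume/events",
    "hdfs.fileType": "DataStream",  # write plain text instead of SequenceFiles
}

HBASE_SINK = {
    "type": "hbase",
    "table": "flume_events",
    "columnFamily": "cf",
    "serializer": "org.apache.flume.sink.hbase.SimpleHbaseEventSerializer",
}

if __name__ == "__main__":
    print(flume_agent_config("agent1", HDFS_SINK), end="")
```

Save the output as, for example, `agent1.conf` and start the agent with `flume-ng agent --conf conf --conf-file agent1.conf --name agent1`. Passing `HBASE_SINK` instead of `HDFS_SINK` points the same source and channel at HBase; create the HBase table and column family first.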





Create Ubuntu AMI from Scratch on local machine

This guide explains how to create an AMI from scratch on a local system. The main benefit of building the AMI locally is cost: you do not need to launch (and pay for) an instance while you configure your applications. Instead, you configure the OS, install and configure the required software, create the AMI on your own machine, and then upload it to S3. Whenever you need an instance, you launch it from this AMI and get a pre-configured machine. In this tutorial we create an Ubuntu AMI from scratch. You can also follow the same procedure in the cloud (i.e. on a running instance).

In this tutorial we will create the AMI from scratch on the local system, bundle the image, upload the bundled AMI to S3, and run an instance based on this AMI.
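The create, bundle, upload and run steps can be sketched as an ordered list of shell commands. The Python below only assembles and prints them; nothing is executed. It assumes `debootstrap` for installing a base Ubuntu system, the classic EC2 AMI tools (`ec2-bundle-image`, `ec2-upload-bundle`) and API tools (`ec2-register`, `ec2-run-instances`) for an instance-store AMI. Every file name, the bucket, the key and certificate files, the account id, the release codename and the AMI id are hypothetical placeholders. An uploaded bundle also has to be registered with `ec2-register` before an instance can be launched from it, and `mount` and `debootstrap` need root.

```python
"""Hypothetical command plan for building an instance-store Ubuntu AMI locally.

All names (image file, mount point, bucket, key files, account id, release,
AMI id) are placeholders. The commands are printed, never executed.
"""
import shlex
from typing import List

IMAGE = "ubuntu.img"        # raw disk image built on the local machine
MOUNT = "/mnt/ami"          # loop-mount point for the image
PREFIX = "ubuntu"           # file-name prefix for the bundle parts
BUCKET = "my-ami-bucket"    # hypothetical S3 bucket
RELEASE = "precise"         # hypothetical Ubuntu release codename


def ami_build_plan(size_mb: int = 2048) -> List[List[str]]:
    manifest = f"{PREFIX}.manifest.xml"
    return [
        # 1. Create: blank image file, ext4 filesystem, base Ubuntu via debootstrap.
        ["dd", "if=/dev/zero", f"of={IMAGE}", "bs=1M", f"count={size_mb}"],
        ["mkfs.ext4", "-F", IMAGE],  # -F: allow mkfs on a regular file
        ["mkdir", "-p", MOUNT],
        ["mount", "-o", "loop", IMAGE, MOUNT],
        ["debootstrap", "--arch", "amd64", RELEASE, MOUNT,
         "http://archive.ubuntu.com/ubuntu"],
        # (chroot into MOUNT here to install and configure your software)
        ["umount", MOUNT],
        # 2. Bundle: encrypt and split the image into parts plus a manifest.
        ["mkdir", "-p", "bundle"],
        ["ec2-bundle-image", "-i", IMAGE, "-k", "pk.pem", "-c", "cert.pem",
         "-u", "123456789012", "-r", "x86_64", "-p", PREFIX, "-d", "bundle"],
        # 3. Upload: copy the parts and the manifest to S3.
        ["ec2-upload-bundle", "-b", BUCKET, "-m", f"bundle/{manifest}",
         "-a", "ACCESS_KEY", "-s", "SECRET_KEY"],
        # Register the uploaded manifest; EC2 replies with a new AMI id.
        ["ec2-register", f"{BUCKET}/{manifest}", "-n", "my-ubuntu-ami"],
        # 4. Run: launch an instance from the AMI id returned by ec2-register.
        ["ec2-run-instances", "ami-xxxxxxxx", "-t", "m1.small", "-k", "my-keypair"],
    ]


if __name__ == "__main__":
    for cmd in ami_build_plan():
        print(shlex.join(cmd))
```

Keeping the plan as data rather than a shell script makes it easy to review before running each step by hand, which matters here because several steps need root and real AWS credentials.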

What an AMI is: An Amazon Machine Image (AMI) is a special type of virtual appliance used to instantiate (create) a virtual machine within the Amazon Elastic Compute Cloud (EC2). It serves as the basic unit of deployment for services delivered using EC2. In short, an AMI is an image from which an instance can boot.

Understanding What Hadoop Is


What is Hadoop:
Hadoop is a framework written in Java for running applications on large clusters of commodity hardware; it incorporates features similar to those of the Google File System and of MapReduce. HDFS (the Hadoop Distributed File System) is a highly fault-tolerant distributed file system and, like Hadoop, is designed to be deployed on low-cost hardware. It provides high-throughput access to application data and is suitable for applications that have large data sets (in the range of terabytes to petabytes).
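To illustrate the MapReduce model mentioned above, here is a minimal in-memory word count in Python. It is not Hadoop's API: a real job would implement the mapper and reducer as Java classes (or as Hadoop Streaming scripts), and Hadoop would run the map, shuffle and reduce phases in parallel across the cluster, reading its input from HDFS. The function names are hypothetical.

```python
"""In-memory illustration of the MapReduce model (word count).

Not Hadoop's Java API; it only mimics the three phases Hadoop runs across
a cluster: map, shuffle (group and sort by key), and reduce.
"""
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group all values by key and sort by key, as Hadoop does between map and reduce.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Combine all counts emitted for one word.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


if __name__ == "__main__":
    docs = ["Hadoop stores data in HDFS", "Hadoop processes data with MapReduce"]
    print(word_count(docs))
```

Because each map call sees only one record and each reduce call sees only one key, both phases can be spread over many cheap machines, which is what lets Hadoop scale out on commodity hardware.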