Monday, July 11, 2016

Cloudera Hue - A soft Introduction

This Hue tutorial describes the features of Cloudera Hue. Hue is a set of web applications that let you interact with a CDH cluster. Hue applications let you browse HDFS and work with Hive and Cloudera Impala queries, MapReduce jobs, and Oozie workflows. Cloudera Hue is a handy tool for Windows-based users, as it provides a good UI for interacting with Hadoop and its sub-projects. We can manage users, jobs, Hive, HBase, Impala tables, Oozie workflows, Sqoop, YARN, etc.
Cloudera Hue
Cloudera Hue is a web-based UI for Hadoop. It started as Cloudera Desktop and was later renamed Hue. Hue provides the following features:
  • Beeswax
  • File Browser
  • Job Designer
  • Job Browser
  • User Admin
Beeswax:
  • Beeswax is essentially a UI for Hive
  • It provides the following features: create Hive tables; load data; run and manage Hive queries; download results in Excel format
File Browser:
  • It is essentially a browser for the distributed file system
  • It provides the following features: browse HDFS; change permissions and ownership; upload, download, view, and edit files
Job Designer:
  • We can design MapReduce jobs, which can be templates that prompt for parameters when they are submitted. The Job Designer offers the following options:
  1. Streaming: create Map/Reduce functions in any non-Java language
  2. Jars: create a Map/Reduce job in Java
  3. Install samples: run samples that are already installed
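To illustrate the Streaming option, here is a minimal word-count sketch in Python. It assumes the usual Hadoop Streaming convention: the mapper and reducer exchange plain text lines of the form key<TAB>value, and the reducer receives its input sorted by key. The function names (map_lines, reduce_lines, run_locally) are hypothetical and are not part of Hue or Hadoop.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' line for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(sorted_lines):
    """Reducer: sum the counts per word; input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines):
    """Simulate map -> sort -> reduce in a single process for quick testing."""
    return list(reduce_lines(sorted(map_lines(lines))))


sample_counts = run_locally(["Hue runs Hive", "hue runs"])
```

In a real streaming job, each function would live in its own script that reads from standard input and prints to standard output; run_locally only imitates the sort step that Hadoop performs between the two phases.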
Job Browser:
  • Job Browser provides information about submitted jobs, such as their state (running/completed/failed), user, submission time, completion time, etc.
  • It provides the following features: view jobs, tasks, counters, logs, etc.
  • This feature is quite similar to the JobTracker UI provided by Hadoop
User Admin:
  • Account management for Hue users
  • Create and delete users, and update their passwords and e-mail addresses
In the latest version of Hue, integration with the following products has been added:
  • Oozie
  • Impala
  • HBase
  • Sqoop
  • YARN

Thursday, June 4, 2015

Is Big Data Just Hype? A Deep Dive into Big Data

Watch the exclusive recording of the Big Data session conducted by DataFlair

Are you ready to migrate your career to Big Data, the latest upcoming technology?

Understand how Big Data is the Biggest Buzz Word of the Industry

What is Big Data?

Big data is a buzzword, or catch-phrase, used to describe a massive volume of both structured and unstructured data that is so large it is difficult to process using traditional database and software techniques.

What leaders say About Big Data:

Big Data is the new Oil
 - Gartner

Hadoop will grow at a CAGR of 58% and will reach $50 billion by 2020
 - Experfy

Big Data market will be growing 6 times faster than the overall IT market
 - IDC

Why learn Big Data?

It is no secret that the data content in the world is growing exponentially. For the last two decades, IT communities have been grappling with the issue of managing the glut of data. Google was at the forefront of this problem, and its papers on the Google File System and MapReduce inspired the open-source framework now widely known as Hadoop. This framework fundamentally changes the traditional approaches, which can no longer cope with the volumes of data being generated today.

This video tutorial covers the following topics:

 - Why Big Data is the biggest buzzword
 - Basics of Big Data & Hadoop
 - Essence of Big Data (volume, velocity, variety, veracity of Big Data)
 - Problems with conventional systems like RDBMS and OS file-system
 - Introduction of Hadoop & Hadoop ecosystem
 - Real time Hadoop use cases
 - Future of Hadoop & Careers in Hadoop
 - Job Roles in Hadoop like: Hadoop Analyst, Hadoop Developer, Hadoop Admin, Hadoop Architect etc.
 - How DataFlair will help you in making your career in Big Data.

Big Data Hadoop Tutorial For Beginners

Why learn Big Data Hadoop?

We create 2.5 quintillion bytes of data every day, so much that 90% of the data in the world today has been created in the last two years alone (Source: IBM). These extremely large datasets are hard to handle with legacy systems such as RDBMS, because the data exceeds the storage and processing capacity of the database. The legacy systems are becoming obsolete.
According to Gartner: “Big Data is the new Oil”. Big Data is all about finding the needle of value in a haystack of structured, semi-structured, and unstructured data. Hadoop (the solution to all Big Data problems) has become the most important component in the data stack, enabling rapid processing of data at petabyte scale. Hadoop is expected to be at the core of more than half of all analytics software within the next two years.

Watch the exclusive recording of the Big Data live session

In this tutorial, you will learn about:
- Basics of Big Data & Hadoop
- Introduction to Big Data
- Why Big Data
- Essence of Big Data: volume, velocity, variety, veracity
- Problems with conventional systems like RDBMS and OS file-system
- Introduction of Hadoop
- Introduction of Map-Reduce and HDFS
- Real time Hadoop use cases
- Introduction of Hadoop ecosystem
- Future of Hadoop
- Careers in Hadoop
- Job Roles in Hadoop like: Hadoop Analyst, Hadoop Developer, Hadoop Admin, etc.

Sunday, May 17, 2015

Install Hadoop in Distributed mode - Setup Hadoop Cluster on Cloud

This tutorial explains how to set up and configure Hadoop on multiple machines, i.e. the installation of Hadoop in distributed mode. The cluster is configured with one master and two slaves. All the prerequisites are installed during the deployment. The Hadoop installation is done on the Amazon cloud (AWS).
Follow this video tutorial for the installation and configuration of Hadoop 1 in distributed mode (real cluster mode) on the Amazon cloud:

This video covers the following topics:
 - Installation and configuration of Hadoop 1.x or Cloudera CDH3Ux in distributed mode (on a multi-node cluster).
 - Launch 3 instances on AWS (Amazon cloud), on which we will set up the real cluster. One instance will act as the master and the remaining instances will act as slaves.
 - Prerequisites for Hadoop installation.
   -- Installation of Java.
   -- Setup of password-less SSH.
 - Important configuration properties.
 - Set up the configuration in core-site.xml, hdfs-site.xml, mapred-site.xml.
 - Format the NameNode.
 - Start Hadoop services: NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker.
 - Set up environment variables for Hadoop.
 - Submit a MapReduce job.
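
As a rough illustration of the configuration step, the sketch below generates minimal contents for the three site files. The property names (fs.default.name, dfs.replication, mapred.job.tracker) are the standard Hadoop 1.x ones; the hostname "master", the ports 9000 and 9001, and the replication factor of 2 are assumptions chosen for a one-master, two-slave cluster and are not taken from the video.

```python
import xml.etree.ElementTree as ET

# Assumption: the master instance is reachable under this hostname.
MASTER_HOST = "master"

# One minimal property set per Hadoop 1.x site file (values are assumptions).
SITE_FILES = {
    "core-site.xml": {"fs.default.name": f"hdfs://{MASTER_HOST}:9000"},
    "hdfs-site.xml": {"dfs.replication": "2"},
    "mapred-site.xml": {"mapred.job.tracker": f"{MASTER_HOST}:9001"},
}


def render_site_xml(properties):
    """Build a <configuration> document with one <property> per entry."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


rendered = {filename: render_site_xml(props) for filename, props in SITE_FILES.items()}

for filename, xml_text in rendered.items():
    print(filename)
    print(xml_text)
```

The rendered text would be saved into the Hadoop conf directory on every node, so that the slaves can find the NameNode and the JobTracker on the master.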