Showing posts with label High Performance Computing. Show all posts

Hadoop Troubleshooting-002

How to resolve java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FsShell error.


bash-3.00$ hadoop dfs -ls
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FsShell

Solution:
Run the following command and check whether the jar hadoop-common-2.0.2-alpha-gphd-2.0.1.0.jar is on the classpath:
$ hadoop classpath


If the jar is not on the classpath, add the following entry to hadoop-env.sh:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:YOUR-PATH/hadoop-2.0.2-alpha-gphd-2.0.1.0/share/hadoop/common/hadoop-common-2.0.2-alpha-gphd-2.0.1.0.jar
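The classpath check can also be scripted rather than read by eye. The following is a minimal sketch, not part of the original fix: `classpath_has_jar` is a hypothetical helper name, and it only finds jars that are listed by their full file name. `hadoop classpath` may print wildcard directory entries (for example a path ending in `common/*`), in which case the jar name will not appear literally and the directory has to be inspected instead.

```shell
# Hypothetical helper: succeeds if the colon-separated classpath in $1
# contains an entry whose path includes the jar file name given in $2.
classpath_has_jar() {
  printf '%s\n' "$1" | tr ':' '\n' | grep -F -q -- "/$2"
}

JAR="hadoop-common-2.0.2-alpha-gphd-2.0.1.0.jar"

# Run the check only when the hadoop command is available on this machine.
if command -v hadoop >/dev/null 2>&1; then
  if classpath_has_jar "$(hadoop classpath)" "$JAR"; then
    echo "$JAR is on the classpath"
  else
    echo "$JAR is missing; add it to HADOOP_CLASSPATH in hadoop-env.sh"
  fi
fi
```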


Hbase at Facebook

After understanding the basics of HBase, let’s look at how Facebook uses it. I found a very good tutorial from Facebook on how they use HBase for messaging. It covers an introduction to HBase, why HBase was chosen, and the MySQL-to-HBase migration at Facebook.


 

Hbase-A Soft Introduction & Quickstart

After understanding what Hadoop is and deploying it, let’s start understanding HBase. This tutorial explains the basics of HBase and its features. I have tried to explain the functionality HBase provides and to give a quick start: a basic tutorial for beginners. You will learn where, and in which situations, HBase can be useful.


Apache HBase
Source: Apache
Understanding What is HBase
HBase is an open-source, distributed, versioned, column-oriented, NoSQL (non-relational) database management system that runs on top of Hadoop. It adds random, real-time read/write access to Hadoop, allowing users to update individual data records. Hadoop is designed for batch processing of large datasets, but with HBase on top of Hadoop we can also work with data in real time.
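As a rough illustration of updating a record, the sketch below prints a few HBase shell commands and pipes them into `hbase shell` when it is installed. The table name `demo_table`, the column family `cf`, the column `cf:status`, and the helper name `hbase_demo_commands` are all made up for this example; only the `create`, `put`, and `get` shell commands come from HBase itself.

```shell
# Print a short HBase shell session; table and column names are hypothetical.
hbase_demo_commands() {
  cat <<'EOF'
create 'demo_table', 'cf'
put 'demo_table', 'row1', 'cf:status', 'new'
put 'demo_table', 'row1', 'cf:status', 'updated'
get 'demo_table', 'row1'
EOF
}

# Feed the commands to the HBase shell only when it is available.
if command -v hbase >/dev/null 2>&1; then
  hbase_demo_commands | hbase shell
fi
```

The second `put` writes a newer version of the same cell, so the final `get` should return the updated value; this is what "versioned" refers to in the description above.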

Optimize Map Reduce Job Performance

Optimize Hadoop performance. To improve Hadoop performance, you need to change various configuration parameters in core-site.xml, hdfs-site.xml, and mapred-site.xml. Which parameters to tune depends on the type of processing and varies from case to case; there is no hard and fast rule.

To install Hadoop on an Ubuntu cluster, you can refer to this post.

We can change the block size, the number of mappers and reducers, the sort factor, JVM reuse, and the memory for the Java processes; enable compression, including map output compression; use a combiner; and so on.
I found a very nice description given by Cloudera.
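Some of these settings can be tried per job before they are written into the XML files. The sketch below is an illustration under assumptions: it uses the Hadoop 2.x property names (older releases use different names, such as `mapred.reduce.tasks`), the values are placeholders rather than recommendations, and `tuning_opts`, `my-job.jar`, and `MyDriver` are hypothetical. The `-D` options are only picked up when the job driver parses generic options (for example through `ToolRunner`).

```shell
# Hypothetical helper: print generic -D options for one job run.
# $1 is the number of reducers; the other values are illustrative only.
tuning_opts() {
  printf '%s %s %s %s\n' \
    "-D mapreduce.job.reduces=$1" \
    "-D mapreduce.map.output.compress=true" \
    "-D mapreduce.task.io.sort.factor=50" \
    "-D mapreduce.map.java.opts=-Xmx1024m"
}

# Show the options that would be passed to the job.
tuning_opts 8

# Hypothetical invocation (my-job.jar and MyDriver are placeholders):
# hadoop jar my-job.jar MyDriver $(tuning_opts 8) /input /output
```

Once a combination works well for a given workload, the same properties can be moved into mapred-site.xml as cluster defaults.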



Create an AMI


This blog will guide you through creating an Ubuntu AMI (Amazon Machine Image) from a launched instance. In this tutorial we will create an S3-backed AMI from a running Ubuntu instance. Before creating an actual AMI, let’s go over some basic terminology:

Understand what an AMI is: An Amazon Machine Image (AMI) is a special type of virtual appliance that is used to instantiate (create) a virtual machine within the Amazon Elastic Compute Cloud. It serves as the basic unit of deployment for services delivered using EC2. In short, an AMI is an image from which an instance can boot.

What is Amazon EC2: Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides resizable compute capacity in the cloud. It is designed to make web-scale computing easier for developers.
Create your own AMI so that you can boot new custom instances that have all the required software preinstalled. Your AMI becomes the basic unit of deployment; it saves you the time of installing the required software again and again.

Hue Installation and Configuration

This section gives instructions for installing Cloudera Hue and changing its default configuration, such as configuring a different database for Hue and sending notification emails on job completion.


Installing Hue on one machine with CDH in pseudo-distributed mode:


To install Hue:
  • Hue can be installed with this single command:

$ sudo apt-get install hue