Friday, March 4, 2011

Understanding What is Hadoop

What is Hadoop:
Hadoop is a framework written in Java for running applications on large clusters of commodity hardware. It incorporates features similar to those of the Google File System and of MapReduce. Its file system, HDFS, is a highly fault-tolerant distributed file system that, like Hadoop itself, is designed to be deployed on low-cost hardware. HDFS provides high-throughput access to application data and is suitable for applications that have large data sets (in the range of terabytes to petabytes).

Who uses Hadoop:
Hadoop is mainly used by companies that deal with large amounts of data. They may need to:
  • Process the data
  • Perform analysis
  • Generate reports
Many leading organizations, including Facebook, Yahoo, Amazon, IBM, Joost, Powerset, the New York Times, and Veoh, use Hadoop.

Why Hadoop:
MapReduce is Google's secret weapon: a way of breaking complicated problems apart and spreading them across many computers. Hadoop is an open-source implementation of MapReduce, together with its own file system, HDFS (Hadoop Distributed File System).
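
To make the idea of "breaking a problem apart" concrete, here is a minimal single-process sketch of the MapReduce pattern applied to word counting. It is not Hadoop's API: the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and each input string merely stands in for a split that a real cluster would hand to a separate machine.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every whitespace-separated word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # On a cluster each document would be mapped on a different node;
    # here the map calls simply run one after another.
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Calling word_count(["big data", "big cluster"]) returns {'big': 2, 'data': 1, 'cluster': 1}. Because every map call and every reduce call is independent of the others, the same work can in principle be spread over many machines.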

Hadoop has defeated a supercomputer in TeraSort:
A Hadoop cluster sorted 1 terabyte of data in 209 seconds, beating the previous record of 297 seconds in the annual general-purpose (Daytona) terabyte sort benchmark. The sort benchmark, created in 1998 by Jim Gray, specifies the input data (10 billion 100-byte records), which must be completely sorted and written to disk. This was the first time that either a Java program or an open-source program had won.
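
The figures above can be checked with a little arithmetic. The sketch below uses only the numbers quoted in this post (10 billion records of 100 bytes, 209 and 297 seconds); the variable and function names are illustrative, and "GB" is taken to mean 10^9 bytes.

```python
RECORDS = 10_000_000_000      # 10 billion records, as specified by the benchmark
RECORD_SIZE_BYTES = 100       # each record is 100 bytes

total_bytes = RECORDS * RECORD_SIZE_BYTES   # 10**12 bytes, i.e. 1 terabyte


def throughput_gb_per_s(num_bytes, seconds):
    """Average sort throughput in gigabytes (10**9 bytes) per second."""
    return num_bytes / seconds / 1e9


hadoop_rate = throughput_gb_per_s(total_bytes, 209)     # roughly 4.78 GB/s
previous_rate = throughput_gb_per_s(total_bytes, 297)   # roughly 3.37 GB/s
speedup = 297 / 209                                     # roughly 1.42x faster
```

So the input really is one terabyte, and the Hadoop run was about 1.4 times faster than the previous record.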

Europe’s Largest Ad Targeting Platform Uses Hadoop:
Europe’s largest ad company receives over 100 GB of data daily. Using a classical solution such as an RDBMS, it needed 5 days to analyze the data and generate reports, so it was always running about a week behind. After a lot of research, the company started using Hadoop. The interesting fact is that it is now able to process the data and generate reports within 1 hour. That's the beauty of Hadoop.

Leading Distributions of Hadoop:

1. Apache Hadoop:
The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing.
Apache Hadoop Offers:

  • Hadoop Common: The common utilities that support the other Hadoop subprojects.
  • HDFS: A distributed file system that provides high throughput access to application data.
  • MapReduce: A software framework for distributed processing of large data sets on compute clusters.
  • Avro: A data serialization system.
  • Chukwa: A data collection system for managing large distributed systems.
  • HBase: A scalable, distributed database that supports structured data storage for large tables.
  • Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying.
  • Mahout: A scalable machine learning and data mining library.
  • Pig: A high-level data-flow language and execution framework for parallel computation.
  • ZooKeeper: A high-performance coordination service for distributed applications.

2. Cloudera Hadoop:
Cloudera’s Distribution for Apache Hadoop (CDH) is a Hadoop-based data management platform that Cloudera describes as the most comprehensive available, intended to significantly accelerate deployment of Apache Hadoop in an organization. CDH is based on the most recent stable version of Apache Hadoop. It also includes some useful patches backported from future releases, as well as improvements Cloudera has developed for its customers.

Cloudera Hadoop Offers:
  • HDFS – Self healing distributed file system
  • MapReduce – Powerful, parallel data processing framework
  • Hadoop Common – a set of utilities that support the Hadoop subprojects
  • HBase – Hadoop database for random read/write access
  • Hive – SQL-like queries and tables on large datasets
  • Pig – Dataflow language and compiler
  • Oozie – Workflow for interdependent Hadoop jobs
  • Sqoop – Integrate databases and data warehouses with Hadoop
  • Flume – Highly reliable, configurable streaming data collection
  • ZooKeeper – Coordination service for distributed applications
  • Hue – User interface framework and SDK for visual Hadoop applications

Architecture of Hadoop:
The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems, but the differences are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. It provides high-throughput access to application data and is suitable for applications that have large data sets. HDFS relaxes a few POSIX requirements to enable streaming access to file system data.

Source: Apache

Name Node:
The NameNode manages the namespace, file system metadata, and access control. There is exactly one NameNode in each cluster. The NameNode acts as the master and the DataNodes act as slaves. It holds all the information about the data (i.e., the metadata), not the data itself.

Data Node:
A DataNode holds the actual file system data. Each DataNode manages its own locally attached storage (i.e., the node's hard disk) and stores a copy of some or all blocks in the file system. There are one or more DataNodes in each cluster.
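
The split between metadata (NameNode) and block contents (DataNodes) can be sketched with a toy in-memory model. This is only an illustration, not HDFS code: the class and function names, the 4-byte block size, the replication factor of 2, and the round-robin placement are all assumptions chosen to keep the example small. Real HDFS uses much larger blocks and a more elaborate placement policy.

```python
class DataNode:
    """Holds actual block contents on its own local storage (a dict here)."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}          # block_id -> bytes


class NameNode:
    """Holds only metadata: which blocks form a file and where the copies live."""

    def __init__(self, datanode_names, replication=2):
        self.datanode_names = list(datanode_names)
        self.replication = replication
        self.files = {}           # filename -> [(block_id, [datanode names])]
        self._next_block_id = 0

    def allocate_block(self, filename):
        block_id = self._next_block_id
        self._next_block_id += 1
        count = len(self.datanode_names)
        # Round-robin placement: a simplification, not HDFS's real policy.
        targets = [self.datanode_names[(block_id + i) % count]
                   for i in range(min(self.replication, count))]
        self.files.setdefault(filename, []).append((block_id, targets))
        return block_id, targets


def put(namenode, datanodes, filename, data, block_size=4):
    """Client side: split data into blocks and store each copy on its DataNodes."""
    for offset in range(0, len(data), block_size):
        block_id, targets = namenode.allocate_block(filename)
        for name in targets:
            datanodes[name].blocks[block_id] = data[offset:offset + block_size]


def get(namenode, datanodes, filename):
    """Client side: look up block locations, then read from any node holding a copy."""
    parts = []
    for block_id, locations in namenode.files[filename]:
        for name in locations:
            if block_id in datanodes[name].blocks:
                parts.append(datanodes[name].blocks[block_id])
                break
        else:
            raise IOError(f"block {block_id} is lost on all DataNodes")
    return b"".join(parts)
```

For example, after creating three DataNode objects, storing b"hello hadoop" with put, and then emptying one DataNode's blocks dict, get still returns the full content, because every block has a second copy on another node. That is the fault tolerance described above, in miniature.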

Install / Deploy Hadoop:
Hadoop can be installed in 3 modes:
1. Standalone mode:
To deploy Hadoop in standalone mode, we just need to set JAVA_HOME. In this mode there is no need to start the daemons or to format the NameNode, because data is saved on the local disk.
2. Pseudo-distributed mode:
In this mode all the daemons (NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker) run on a single machine.
3. Distributed mode:
In this mode the NameNode, JobTracker, and (optionally) SecondaryNameNode daemons run on the master node, and the DataNode and TaskTracker daemons run on the slave nodes.
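
The three modes can be summarized as a small lookup table. The sketch below simply restates the daemon placement described above as data. The names DAEMONS_BY_MODE and daemons_on, and the role labels "localhost", "master", and "slave", are illustrative choices rather than anything defined by Hadoop.

```python
# Which daemons run on which kind of machine in each deployment mode.
DAEMONS_BY_MODE = {
    # Standalone: no daemons are started at all.
    "standalone": {},
    # Pseudo-distributed: every daemon runs on the one machine.
    "pseudo-distributed": {
        "localhost": ["NameNode", "DataNode", "SecondaryNameNode",
                      "JobTracker", "TaskTracker"],
    },
    # Distributed: master daemons and slave daemons run on different machines.
    # The SecondaryNameNode on the master is optional.
    "distributed": {
        "master": ["NameNode", "JobTracker", "SecondaryNameNode"],
        "slave": ["DataNode", "TaskTracker"],
    },
}


def daemons_on(mode, role):
    """Return the daemons expected on a machine with the given role in the given mode."""
    return DAEMONS_BY_MODE[mode].get(role, [])
```

For instance, daemons_on("distributed", "slave") returns ['DataNode', 'TaskTracker'], while daemons_on("standalone", "localhost") returns an empty list.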



  1. Like it, even though most of the content is copied from Hadoop KB. You could have given credit to the original site. But it's a cool compilation.
