Python Tic Tac Toe Game with Source Code

Develop Classic Tic Tac Toe Game in Python 

Tic-Tac-Toe may be the shortest competitive mind game ever made: draw a 3x3 grid and you can play anywhere, anytime. Now it's possible to build this game with Python. Let's find out how!

About Tic Tac Toe Project:

The objective of this project is to build the Tic Tac Toe game with Python. Python has some very good graphics libraries and APIs, such as Tkinter, that give our game a user-friendly interface. Defining a few conditions based on the rules of the game is enough to implement it in this project.


Project Prerequisites:

The tic tac toe project requires a good knowledge of Python and of GUI (Graphical User Interface) programming. Tkinter is a powerful tool for rendering graphics in Python, and defining functions that the GUI calls back into forms the chain that implements the game. Tkinter ships with most Python installations, but if your environment lacks it, install it beforehand. To install libraries in PyCharm:
[ File > Settings > Project > Project Interpreter > + > install the libraries ].

Download Tic Tac Toe Python Source Code

Before proceeding ahead, please download the tic-tac-toe project code: Tic Tac Toe Project

Tic Tac Toe Game in Python

Let’s start the development

1. Initializing Game Components:

from tkinter import *
import tkinter.messagebox
tk = Tk()
tk.title("Tic Tac Toe")


a. The message box acts as a notification dialog that we will use to notify the user.
b. Tk() initializes Tkinter and creates the root window we use throughout the rest of the project.

2. Initializing variables:

#The variables we require to mention the players:
userA = StringVar()
userB = StringVar()
play1 = StringVar()
play2 = StringVar()

#Information of Names to get from the user:
player1_name = Entry(tk, textvariable=play1, bd=7)
player1_name.grid(row=1, column=1, columnspan=9)
player2_name = Entry(tk, textvariable=play2, bd=7)
player2_name.grid(row=2, column=1, columnspan=9)

#Initializing some conditions:
click = True
initial = 0

a. The player names entered by the user are held in StringVar() variables.
b. Using the values above, we add input fields to our window for the player names.

3. Defining Functions

#Initial state of the buttons before clicking:
def disabled():
    for btn in (gameBtn1, gameBtn2, gameBtn3, gameBtn4, gameBtn5,
                gameBtn6, gameBtn7, gameBtn8, gameBtn9):
        btn.configure(state=DISABLED)


configure() assigns an attribute or a state to a button. Here the state is DISABLED, which means the grid buttons stop responding to clicks.

#Defining message box for a particular input:
def btnClick(buttons):
    global click, initial, userA, userB
    if buttons["text"] == " " and click == True:
        buttons["text"] = "X"
        click = False
        userA = play1.get() + " Wins!"
        userB = play2.get() + " Wins!"
        initial += 1
        winCondition()

    elif buttons["text"] == " " and click == False:
        buttons["text"] = "O"
        click = True
        initial += 1
        winCondition()

    else:
        tkinter.messagebox.showinfo("Tic-Tac-Toe", "Button already Pressed!")

a. We declare click, initial, userA, and userB as global variables so that this function updates the same variables the rest of the program uses.
b. click is initialized to True outside this function; while it is True, clicking an empty button writes "X" into it and flips click to False for the next player's input.
c. Similarly, while click is False, clicking an empty button writes "O" and flips click back to True.
d. initial counts the total number of "X" and "O" moves played so far.
e. After each click, we check for a winning player with the winCondition() function defined below and display the message on the screen; pressing an already-filled button triggers the "Button already Pressed!" message instead.
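The toggling described in (b)-(d) can be seen in isolation, outside Tkinter. This is only an illustrative sketch of the turn-taking logic, not part of the game code:

```python
# Illustrative sketch of the turn-taking logic: `click` flips between
# True (X's turn) and False (O's turn), and `initial` counts the moves.
click = True
initial = 0
moves = []
for _ in range(4):
    moves.append("X" if click else "O")
    click = not click
    initial += 1
```

After four simulated clicks, moves alternates "X", "O", "X", "O" and initial is 4.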

#Defining conditions for winning the game:
def winCondition():
    if (gameBtn1['text'] == 'X' and gameBtn2['text'] == 'X' and gameBtn3['text'] == 'X' or
        gameBtn4['text'] == 'X' and gameBtn5['text'] == 'X' and gameBtn6['text'] == 'X' or
        gameBtn7['text'] == 'X' and gameBtn8['text'] == 'X' and gameBtn9['text'] == 'X' or
        gameBtn1['text'] == 'X' and gameBtn4['text'] == 'X' and gameBtn7['text'] == 'X' or
        gameBtn2['text'] == 'X' and gameBtn5['text'] == 'X' and gameBtn8['text'] == 'X' or
        gameBtn3['text'] == 'X' and gameBtn6['text'] == 'X' and gameBtn9['text'] == 'X' or
        gameBtn1['text'] == 'X' and gameBtn5['text'] == 'X' and gameBtn9['text'] == 'X' or
        gameBtn3['text'] == 'X' and gameBtn5['text'] == 'X' and gameBtn7['text'] == 'X'):
        tkinter.messagebox.showinfo("Tic-Tac-Toe", userA)

    elif (gameBtn1['text'] == 'O' and gameBtn2['text'] == 'O' and gameBtn3['text'] == 'O' or
          gameBtn4['text'] == 'O' and gameBtn5['text'] == 'O' and gameBtn6['text'] == 'O' or
          gameBtn7['text'] == 'O' and gameBtn8['text'] == 'O' and gameBtn9['text'] == 'O' or
          gameBtn1['text'] == 'O' and gameBtn4['text'] == 'O' and gameBtn7['text'] == 'O' or
          gameBtn2['text'] == 'O' and gameBtn5['text'] == 'O' and gameBtn8['text'] == 'O' or
          gameBtn3['text'] == 'O' and gameBtn6['text'] == 'O' and gameBtn9['text'] == 'O' or
          gameBtn1['text'] == 'O' and gameBtn5['text'] == 'O' and gameBtn9['text'] == 'O' or
          gameBtn3['text'] == 'O' and gameBtn5['text'] == 'O' and gameBtn7['text'] == 'O'):
        tkinter.messagebox.showinfo("Tic-Tac-Toe", userB)

    elif initial == 9:
        tkinter.messagebox.showinfo("Tic-Tac-Toe", "It's a Tie")

The winning conditions are set according to the rules of Tic Tac Toe, i.e. three matching marks along a row, a column, or a diagonal; if all nine moves are played with no winner, the game is a tie.
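As a side note, the same eight winning lines can be expressed more compactly with index triples. This is just a sketch that assumes the nine cell values are collected in a list called board; it is not part of the project code:

```python
# Sketch: compact win check over the eight winning lines of a 3x3 board.
# `board` is assumed to be a list of nine cells holding ' ', 'X', or 'O'.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    for a, b, c in WIN_LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None
```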

#Name of the players:

label = Label( tk, text="Player 1:", font='Arial', bg='white', fg='black', height=1, width=8)
label.grid(row=1, column=0)

label = Label( tk, text="Player 2:", font='Arial', bg='white', fg='black', height=1, width=8)
label.grid(row=2, column=0)

# Styling the Game Buttons:

gameBtn1 = Button(tk, text=" ", font='Arial', bg='black', fg='white', height=5, width=10, command=lambda: btnClick(gameBtn1))
gameBtn1.grid(row=3, column=0)

gameBtn2 = Button(tk, text=' ', font='Arial', bg='black', fg='white', height=5, width=10, command=lambda: btnClick(gameBtn2))
gameBtn2.grid(row=3, column=1)

gameBtn3 = Button(tk, text=' ',font='Arial', bg='black', fg='white', height=5, width=10, command=lambda: btnClick(gameBtn3))
gameBtn3.grid(row=3, column=2)

gameBtn4 = Button(tk, text=' ', font='Arial', bg='black', fg='white', height=5, width=10, command=lambda: btnClick(gameBtn4))
gameBtn4.grid(row=4, column=0)

gameBtn5 = Button(tk, text=' ', font='Arial', bg='black', fg='white', height=5, width=10, command=lambda: btnClick(gameBtn5))
gameBtn5.grid(row=4, column=1)

gameBtn6 = Button(tk, text=' ', font='Arial', bg='black', fg='white', height=5, width=10,command=lambda: btnClick(gameBtn6))
gameBtn6.grid(row=4, column=2)

gameBtn7 = Button(tk, text=' ', font='Arial', bg='black', fg='white', height=5, width=10, command=lambda: btnClick(gameBtn7))
gameBtn7.grid(row=5, column=0)

gameBtn8 = Button(tk, text=' ', font='Arial', bg='black', fg='white', height=5, width=10, command=lambda: btnClick(gameBtn8))
gameBtn8.grid(row=5, column=1)

gameBtn9 = Button(tk, text=' ', font='Arial', bg='black', fg='white', height=5, width=10, command=lambda: btnClick(gameBtn9))
gameBtn9.grid(row=5, column=2)

#To display the created graphic window:

tk.mainloop()

We require a grid of 9 buttons to form the tic tac toe board. Each button is wired to btnClick() through its command option, which makes it active when pressed.
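The nine buttons occupy grid rows 3 to 5 and columns 0 to 2. As a hedged aside, the mapping from a cell index 0-8 to its grid position is simple integer arithmetic, which would also allow the buttons to be created in a loop:

```python
# Sketch: cell i of the 3x3 board maps to Tkinter grid coordinates.
# Rows start at 3 because the earlier rows hold the player-name widgets.
def grid_position(i):
    return 3 + i // 3, i % 3   # (row, column)

positions = [grid_position(i) for i in range(9)]
```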


In this article, we have implemented the Tic Tac Toe Game in Python. Tkinter GUI helped in providing a quick and friendly way to make use of the game features.

Python Project - Text Editor with python and Tkinter

Python text editor is a standalone deployable application used as an alternative to Notepad. It uses Python and the Tkinter GUI library to provide the basic tools and functions needed to keep notes.


Setting up the necessary dependencies

The text editor works on Python 3, so it requires Python 3 to be installed. To install it, visit the official Python website, download the latest stable release, and proceed with the installer.

Once Python 3 is installed and its path is added to the environment variables, we can verify the installation by running python -V on the command line. A successful installation prints a version number, while an unsuccessful one triggers an error.

Before getting started, make sure the version reported is 3.x and not 2.x: the project is built on Python 3 and the code will not work with Python 2.
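A quick programmatic guard for this, shown as a sketch, checks sys.version_info at the top of the script:

```python
import sys

# Sketch: refuse to run under Python 2, since the editor uses Python 3 APIs.
if sys.version_info[0] < 3:
    raise SystemExit("This project requires Python 3")

major = sys.version_info[0]
```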

The other library required for the Python text editor project is Tkinter. Tkinter is bundled with most Python installations; if it is missing, open the command line and type the following command:

pip install tk

Start the development of text editor project in Python

Download Python Text Editor Source Code

Before proceeding ahead, please download the source code of Python Text Editor Project: Text Editor Project in Python

1. Import dependencies

from tkinter import *
import tkinter.scrolledtext as ScrolledText
import tkinter.filedialog as FileDialog
import tkinter.messagebox as MessageBox
import tkinter.simpledialog as SimpleDialog
import os
from tkinter import font

2. Set up working window

#Creating base for the main working window
base = Tk(className = ' TechnologyMania Text Editor')
textArea = ScrolledText.ScrolledText(base, width = 160, height = 50)
textArea.pack()

3. Code for menu bar

#Creating cascade menu and menu options
menu = Menu(base)
base.config(menu = menu)

fileMenu = Menu(menu)
menu.add_cascade(label = "File", menu = fileMenu)
fileMenu.add_command(label = 'New', command = newFile)
fileMenu.add_command(label = 'Open', command = openFile)
fileMenu.add_command(label = 'Save', command = saveFile)
fileMenu.add_command(label = "Exit", command = exiteditor)

editmenu = Menu(menu)
menu.add_cascade(label = 'Edit', menu = editmenu)
editmenu.add_command(label = 'Change Font', command = changeFont)


#Making the window remain open
base.mainloop()

4. New file option

#New File Function
def newFile():

    #if content is present, offer to save it first
    if len(textArea.get('1.0', END + '-1c')) > 0:
        if MessageBox.askyesno("Save?", "Do you wish to save the file"):
            saveFile()

    textArea.delete('1.0', END)
    base.title("TechnologyMania Text Editor")

5. Open file option

#Open File Function
def openFile():

    file = FileDialog.askopenfile(parent = base, title = 'Select a file', filetypes = (("Text file", "*.txt"), ("All files", "*.*")))

    if file != None:
        base.title(os.path.basename(file.name) + " - TechnologyMania Text Editor")
        contents = file.read()
        textArea.delete('1.0', END)
        textArea.insert('1.0', contents)
        file.close()

6. Save file option

#Save File Function
def saveFile():

    file = FileDialog.asksaveasfile(mode = 'w', defaultextension = ".txt", filetypes = (("Text file", "*.txt"), ("All files", "*.*")))

    if file != None:
        data = textArea.get('1.0', END)
        file.write(data)
        file.close()
        base.title(os.path.basename(file.name) + " - TechnologyMania Text Editor")

7. Exit Option

#Exit function        
def exiteditor():

    if len(textArea.get('1.0', END + '-1c')) > 0:
        if MessageBox.askyesno("Save?", "Do you wish to save the file before closing"):
            saveFile()

    if MessageBox.askyesno("Quit", "Are you sure you want to quit ?"):
        base.destroy()

8. Change Font Option

def changeFont():

    #creating new window
    fontbase = Tk(className = 'Choose your Font')
    fontbase.minsize(height = 600, width = 600)

    #defining font selection dropdown and label
    font_label = Label(fontbase, text = "Choose Font", font = ("Helvetica", 14))
    font_options = list(font.families())
    font_variable = StringVar(fontbase)
    font_menu = OptionMenu(fontbase, font_variable, *font_options)
    font_menu.place(x = 50, y = 100)
    font_label.place(x = 35, y = 70)

    #defining style selection dropdown and label
    style_label = Label(fontbase, text = "Choose Style", font = ("Helvetica", 14))
    style_options = ['Normal', 'Bold', "Italic", "Bold and Italic"]
    style_variable = StringVar(fontbase)
    style_menu = OptionMenu(fontbase, style_variable, *style_options)
    style_menu.place(x = 250, y = 100)
    style_label.place(x = 230, y = 70)

    #defining size selection dropdown and label
    size_label = Label(fontbase, text = "Choose Size", font = ("Helvetica", 14))
    size_options = ['6','8','10','12','14','16','20','24','30','32','36','40','46','52','60','72','80']
    size_variable = StringVar(fontbase)
    size_menu = OptionMenu(fontbase, size_variable, *size_options)
    size_menu.place(x = 450, y = 100)
    size_label.place(x = 425, y = 70)

    #confirm function
    def confirm():
        f_C = font_variable.get()
        s_C = style_variable.get()
        si_C = int(size_variable.get())

        if s_C == 'Normal':
            textArea.configure(font = (f_C, si_C, "normal"))

        if s_C == 'Bold':
            textArea.configure(font = (f_C, si_C, "bold"))

        if s_C == 'Italic':
            textArea.configure(font = (f_C, si_C, "italic"))

        if s_C == "Bold and Italic":
            textArea.configure(font = (f_C, si_C, "bold italic"))

        fontbase.destroy()

    #cancel function
    def cancel():
        fontbase.destroy()

    #creating and placing the buttons
    confirm_btn = Button(fontbase, text = "Confirm", command = confirm)
    cancel_btn = Button(fontbase, text = "Cancel", command = cancel)
    confirm_btn.place(x = 400, y = 530)
    cancel_btn.place(x = 500, y = 530)


In this Python project, we created a text editor, or notepad, with Tkinter. This is a good project for beginners to brush up their Python skills.

If you like this project, please rate Technology Mania on Facebook

Hadoop Ecosystem Components

Hadoop - Most popular big data tool on the planet

Let us talk about the Hadoop ecosystem and its various components.

After reading this article, you will know what the Hadoop ecosystem is and which components make it up, along with their features.

Hadoop Ecosystem

Hadoop Ecosystem comprises various components such as HDFS, YARN, MapReduce, HBase, Hive, Pig, Zookeeper, Flume, Sqoop, Oozie, and some more.

The Hadoop ecosystem is a platform, or framework, comprising a suite of components and services that solve the problems that arise while dealing with big data. It consists of Apache open source projects and various commercial tools. The Hadoop ecosystem covers services such as ingesting, storing, analyzing, and maintaining data.

Now let us understand each Hadoop ecosystem component in detail:

Components of Hadoop Ecosystem

1. HDFS

Hadoop is known for its distributed storage, HDFS. The Hadoop Distributed File System is a core component of the Hadoop ecosystem and serves as the backbone of the Hadoop framework. HDFS enables Hadoop to store huge amounts of data from heterogeneous sources, in any format: structured, unstructured, or semi-structured. It is a Java-based distributed file system that provides distributed, fault-tolerant, reliable, cost-effective, and scalable storage.

HDFS consists of two daemons, that is, NameNode and DataNode.

a. NameNode: NameNode is the master node in HDFS architecture. It keeps the meta-data about the data blocks like locations, permissions, etc. It does not store the actual data. It manages and monitors the DataNode.

b. DataNode: There are multiple DataNodes in the Hadoop cluster, and the actual data is stored on them. They are inexpensive commodity hardware machines responsible for storing the data blocks and serving read and write requests.
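To make the storage model concrete: HDFS splits each file into fixed-size blocks and replicates every block across DataNodes. The sketch below assumes the standard Hadoop 2.x defaults of a 128 MB block size and a replication factor of 3:

```python
import math

# Sketch: how many blocks and block replicas a file occupies in HDFS,
# assuming the Hadoop 2.x defaults (128 MB blocks, replication factor 3).
def hdfs_blocks(file_mb, block_mb=128, replication=3):
    blocks = math.ceil(file_mb / block_mb)
    return blocks, blocks * replication

blocks, replicas = hdfs_blocks(500)   # a 500 MB file
```

A 500 MB file occupies 4 blocks, hence 12 block replicas spread across the DataNodes.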


2. YARN

Yet Another Resource Negotiator (YARN) manages resources and schedules jobs in the Hadoop cluster. It was introduced in Hadoop 2.0.

It is designed to split the functionality of job scheduling and resource management into separate daemons. YARN sits in between the HDFS and MapReduce. YARN consists of ResourceManager, NodeManager, and per-application ApplicationMaster.

The ResourceManager is the central master daemon responsible for managing all processing requests, and it interacts with the multiple NodeManagers: each slave DataNode runs its own NodeManager for executing tasks. The ApplicationMaster negotiates resources from the ResourceManager and works with the NodeManager(s) to execute and monitor the tasks.


Features of YARN

  • Better resource management.
  • Scalability.
  • Dynamic allocation of cluster resources.

3. MapReduce

MapReduce is the heart of the Hadoop framework. It is the core component in the Hadoop ecosystem for processing data, and it provides the logic of that processing. In simple words, MapReduce is a programming model for writing applications that process huge amounts of data using distributed and parallel algorithms inside a Hadoop environment.

The MapReduce program consists of two functions that are Map() and Reduce(). The Map function performs filtering, grouping, and sorting. On the other hand, the Reduce function performs aggregation and summarization of the result which are produced by the map function. The input and output of the Map and Reduce function are key-value pairs. The output of the Map function is the input for the Reduce function.
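The map-then-reduce flow described above can be sketched in plain Python with the classic word-count example. This is an illustration of the programming model only, not Hadoop API code:

```python
from collections import defaultdict

# Map phase: emit a (word, 1) key-value pair for every word in the input.
def map_phase(text):
    return [(word, 1) for word in text.split()]

# Reduce phase: group the pairs by key and aggregate their values,
# mimicking the shuffle-and-reduce step.
def reduce_phase(pairs):
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

result = reduce_phase(map_phase("big data big deal"))
```

Here the output of the map phase (key-value pairs) is exactly the input of the reduce phase, as in the real framework.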

Features of MapReduce

  • Simplicity – MapReduce jobs are easy to run. We can write MapReduce applications in many languages, such as C++, Java, and Python.
  • Scalability – Hadoop MapReduce can process petabytes of data.
  • Speed – MapReduce processes data in a distributed manner, so processing completes in less time.
  • Fault Tolerance – If one copy of the data is unavailable, another machine holding a replica of the same data can process the same subtask.

4. Apache Spark

Apache Spark was developed by the Apache Software Foundation to perform both real-time and batch processing at higher speed. It was built to meet the growing demand for processing real-time data that MapReduce jobs cannot handle. Its in-memory processing capabilities increase processing speed and enable better optimization.

Apache Spark can easily handle tasks like batch processing, iterative or interactive real-time processing, graph conversions, and visualization.

Features of Apache Spark

  • Speed: Spark can be up to 100x faster than Hadoop MapReduce for large-scale data processing, thanks to its in-memory computing and optimization.
  • Ease of Use: It contains many easy to use APIs for operating on large datasets.
  • Generality: It is a unified engine that comes packaged with higher-level libraries, that include support for SQL querying, machine learning, streaming data, and graph processing.
  • Runs Everywhere: Apache Spark can run on Hadoop, Apache Mesos, Kubernetes, standalone, or in the cloud.

5. Pig

Apache Pig is an abstraction over Hadoop MapReduce. Pig is a tool used for analyzing large sets of data. It is generally used with Apache Hadoop. Pig enables us to perform all the data manipulation operations in Hadoop. Pig provides Pig Latin which is a high-level language for writing data analysis programs.

Pig Latin provides various operators that programmers can use to develop their own functions for processing, reading, and writing data. To analyze data with Pig, programmers write scripts in Pig Latin; internally, these scripts are converted into MapReduce tasks. The Pig Engine is the component of Apache Pig that accepts Pig Latin scripts as input and converts them into Hadoop MapReduce jobs.

Apache Pig enables programmers to perform complex MapReduce tasks without writing complex MapReduce code in Java.

Features of Apache Pig

  • Rich set of operators: It offers programmers a rich set of operators for performing operations like sort, join, filter, etc.
  • Ease of programming: Pig Latin is very similar to SQL, so a developer familiar with SQL can easily write a Pig script.
  • Optimization opportunities: Tasks in Pig automatically optimize their execution, so programmers only have to focus on the language semantics.
  • UDFs: Pig lets programmers create User-Defined Functions in other programming languages and invoke them in Pig scripts.
  • Handles all kinds of data: We can analyze data of any format using Apache Pig, and Pig stores its results in HDFS.

6. Hive

Hive was developed by Facebook to reduce the work of writing MapReduce programs. Apache Hive is an open-source data warehouse system used for distributed processing and data analysis.

It uses the Hive Query Language (HQL), a declarative language similar to SQL.

Apache Hive translates all the Hive queries into MapReduce programs. Hive lets developers process and analyze huge volumes of data by replacing complex Java MapReduce programs with Hive queries, and anyone familiar with SQL commands can easily write them. Hive performs three functions, summarization, query, and analysis, and is mainly used for data analytics.

Some major components of Hive are:

a. Hive clients: Apache Hive supports applications written in many programming languages, such as Java, Python, and Ruby.

Beeline shell: the command-line shell from which users can submit their queries to the system.

b. HiveServer2: It enables clients to execute their queries against Hive.

c. Hive compiler: It parses the Hive query. Hive compiler performs type checking and semantic analysis on the different query blocks.

d. Metastore: It is the central repository that stores metadata.

Features of Hive

  • Scalable
  • Support all primitive data types of SQL
  • Support for user-defined function
  • Hive provides a tool for ETL operations and adds SQL like capabilities to the Hadoop environment

7. HBase

HBase is an open-source distributed NoSQL database that stores sparse data in tables consisting of billions of rows and columns. It is modeled after Google's Bigtable and is written in Java. HBase provides support for all kinds of data and is built on top of Hadoop. We use HBase when we have to search for or retrieve a small amount of data from large volumes of data.

For example, consider a case in which we have billions of customer emails and have to find the names of the customers who used the word "cancel" in their emails. The request needs to be processed quickly; HBase was designed for such cases.

The components of HBase are:

a. HBase Master: The HBase Master is not part of the actual data storage. It is responsible for negotiating load balancing across all the RegionServers, monitors and maintains the Hadoop cluster, and controls failover. The HMaster also handles DDL operations.

b. RegionServer: The RegionServer is the worker node. It handles read, write, delete, and update requests from clients. A RegionServer process runs on every node in the Hadoop cluster, on the HDFS DataNode.

Features of HBase

  • Scalable storage
  • Support fault-tolerant feature
  • Support for real-time search on sparse data

8. HCatalog

The Hadoop ecosystem provides a table and storage management layer for Hadoop called HCatalog. Users of different data processing tools like Hive, Pig, and MapReduce can easily read and write data on the grid using HCatalog. It exposes the metadata stored in Hive's metastore to all other applications. HCatalog allows users to store data in any format and structure, so users do not have to worry about how the data is stored.
By default, HCatalog supports the RCFile, CSV, JSON, SequenceFile, and ORC file formats.

Features of HCatalog

  • It enables notifications of data availability.
  • HCatalog frees the user from the overhead of data storage and format with table abstraction.
  • HCatalog can provide visibility for data cleaning and archiving tools.

9. Thrift

Apache Thrift is a software framework from the Apache Software Foundation for scalable cross-language services development. It was developed at Facebook. Apache Thrift combines a software stack with a code generation engine for building cross-language services, and it provides an interface definition language for Remote Procedure Call communication. For performance reasons, Apache Thrift is used in the Hadoop ecosystem, since Hadoop makes a lot of RPC calls.

10. Apache Flume

Apache Flume is an open-source tool for ingesting data from multiple sources into HDFS, HBase, or another central repository. It is a distributed system designed to move data from various applications to the Hadoop Distributed File System. Using Flume, we can collect, aggregate, and move streaming data (for example, log files and events) from web servers to centralized stores. Apache Flume acts as a courier between various data sources and HDFS, transferring data generated by sources such as social media platforms and e-commerce sites into Hadoop storage.

Features of Apache Flume

  • Apache Flume is a scalable, extensible, fault-tolerant, and distributed service.
  • Apache Flume has a simple and flexible architecture.
  • Apache Flume is horizontally scalable.
  • Apache Flume has the flexibility of collecting data in batch or real-time mode.

11. Apache Sqoop

Apache Sqoop is another data ingestion tool, designed for transferring data between relational databases and Hadoop. It imports data from and exports data to relational databases. Since most enterprises store their data in an RDBMS, Sqoop is used to import that data into Hadoop's distributed storage for analysis.

Database admins and developers can use its command-line interface to import and export data. Apache Sqoop converts these commands into MapReduce jobs and sends them to the Hadoop Distributed File System via YARN. The Sqoop import tool imports individual tables from relational databases into HDFS, and the Sqoop export tool exports sets of files from HDFS back to an RDBMS.

Features of Sqoop

  • It supports compression.
  • Sqoop is fault-tolerant.
  • Sqoop can perform concurrent operations like Apache Flume.
  • Sqoop supports Kerberos authentication.

12. Oozie

Oozie is a scheduler system that runs and manages Hadoop jobs in a distributed environment. Oozie allows combining multiple complex jobs to run in sequential order to achieve a bigger task. It is a Java web application, open source and available under the Apache License 2.0. Apache Oozie is tightly integrated with the Hadoop stack and supports Hadoop jobs such as Pig, Sqoop, and Hive, as well as system-specific jobs such as Shell and Java.

Oozie triggers workflow actions, which in turn use the Hadoop execution engine for actually executing the task.

There are two kinds of Oozie jobs:

a. Oozie workflow: The Oozie workflow is the sequential set of actions that are to be executed. We can assume this as a relay race.

b. Oozie Coordinator: Oozie Coordinators are Oozie jobs that are triggered when data becomes available to them. We can think of this as the stimulus-response system in our body: an Oozie Coordinator responds to the availability of data and rests otherwise.

Features of Oozie

  • Oozie can leverage existing Hadoop systems for fail-over, load balancing, etc.
  • It detects task completion via callback and polling.
  • It is extensible, scalable, and reliable.

13. Avro

Avro is an open-source project that provides data exchange and data serialization services to Apache Hadoop. The two services can be used independently or together. Avro makes it possible to exchange big data between programs written in any language, and with its serialization service, programs efficiently serialize data into files or messages.

It stores data definitions as well as data together in one file or message. The data definition stored by Avro is in JSON format. This makes it easy to read and interpret. The data stored by Avro is in a binary format that makes it compact and efficient.
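Since an Avro schema is itself JSON, a minimal record definition looks like the sketch below (the record and field names here are illustrative, not from any particular dataset):

```python
import json

# Sketch: a minimal Avro record schema, expressed as JSON.
# Avro stores this definition alongside the binary-encoded records.
schema = json.loads("""
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"}
  ]
}
""")
```

Storing the readable JSON definition with the compact binary data is what keeps Avro files both self-describing and efficient.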

Features of Avro

  • Rich data structures.
  • Remote procedure call.
  • Compact, fast, binary data format.
  • A container file, to store persistent data.

14. Apache Ambari

Apache Ambari is an open-source project that aims at making management of Hadoop simpler by developing software for managing, monitoring, and provisioning Hadoop clusters. It is an administration tool that is deployed on the top of Hadoop clusters. Ambari keeps track of the running applications and their status. It provides an easy-to-use Hadoop cluster management web User Interface backed by its RESTful APIs.

It allows a wide range of tools such as Hive, MapReduce, Pig, etc. to be installed on the Hadoop cluster and manages and monitors their performance.

Features of Apache Ambari

  • It is flexible.
  • Adaptive technology thus fits well in the enterprise environment.
  • Provide authentication, authorization, and auditing through Kerberos.
  • User-friendly configuration.

15. Apache Drill

Apache Drill is another important Hadoop ecosystem component. Its main purpose is large-scale processing of structured as well as semi-structured data. Apache Drill is a low-latency distributed query engine that can scale to several thousand nodes and query petabytes of data. It has a schema-free model.

Features of Apache Drill

  • It has a specialized memory management system for eliminating garbage collection and optimizing memory usage.
  • It allows the reuse of existing Hive deployment to the developers.
  • Apache Drill provides an extensible and flexible architecture at all layers including query optimization, query layer, and client API.
  • Apache Drill provides a hierarchical columnar data model for representing highly dynamic, complex data.
  • It allows for efficient processing.

16. Apache Zookeeper

Apache Zookeeper is a Hadoop Ecosystem component for managing configuration information, providing distributed synchronization, naming, and group services. Zookeeper is used by groups of nodes for coordination amongst themselves and for maintaining shared data through robust synchronization techniques. ZooKeeper is a distributed application providing services for writing a distributed application.

Before the development of Zookeeper, it was really very difficult and time consuming for maintaining coordination between various services in the Hadoop Ecosystem. Zookeeper makes coordination easier and saves a lot of time through synchronization, grouping and naming, configuration maintenance.

Features of Zookeeper

  • Zookeeper is fast, particularly with read-dominant workloads.
  • It maintains a record of all transactions.
  • It offers atomicity: a transaction either completes or fails; transactions are never left partially done.

17. Solr & Lucene

Apache Solr and Apache Lucene are two services in the Hadoop ecosystem used for searching and indexing. Lucene is a Java library that also helps with spell checking. If Apache Lucene is the engine, then Apache Solr is the car built around that engine: Solr is a complete application built around Lucene that uses the Lucene Java library for searching and indexing.

18. Apache Mahout

It is an open-source top-level project at Apache, used for building scalable machine learning algorithms. Apache Mahout implements popular machine learning techniques such as clustering, classification, collaborative filtering, and recommendation. Apache Mahout performs:

a. Collaborative filtering: Apache Mahout mines user behaviors, patterns, and characteristics, and on that basis predicts and provides recommendations to users. E-commerce websites are a typical use case.

b. Clustering: Apache Mahout organizes all similar groups of data together.

c. Classification: Classification means classifying and categorizing data into several sub-departments. For example, Apache Mahout can be used for categorizing articles into blogs, essays, news, research papers, etc.

d. Frequent itemset mining: Here Apache Mahout checks which objects are likely to appear together and makes suggestions if one of them is missing. For example, if we search for a mobile phone it will also recommend a mobile cover, because phones and covers are generally bought together.
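The "bought together" idea behind (d) can be sketched with simple co-occurrence counting. This toy example is illustrative only and does not use Mahout's actual APIs:

```python
from collections import Counter

# Toy sketch of co-occurrence counting: recommend the items that most
# often appear in the same basket as the queried item.
baskets = [
    {"mobile", "mobile cover"},
    {"mobile", "mobile cover", "charger"},
    {"laptop", "mouse"},
]

def recommend(item):
    co = Counter()
    for basket in baskets:
        if item in basket:
            co.update(basket - {item})   # count every co-purchased item
    return [other for other, _ in co.most_common()]

top = recommend("mobile")
```

Here "mobile cover" ranks first because it co-occurs with "mobile" in two baskets, while "charger" co-occurs in only one.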

Features of Apache Mahout

  • It works well in a distributed environment.
  • It scales effectively in the cloud infrastructure.
  • Apache Mahout offers a ready-to-use framework to its coder for doing data mining tasks.
  • It lets applications analyze huge data sets effectively in a quick time.


I hope that after reading this article you clearly understand what the Hadoop ecosystem is and what its different components are. The Hadoop ecosystem comprises many open-source projects for analyzing data in both batch and real-time mode; its many tools offer different services, and together these components empower Hadoop's functionality.