This section contains instructions for installing Hadoop on Ubuntu. It is a quickstart tutorial: the shortest path to a working Hadoop installation in fully distributed mode (a multi-node cluster), with every required command and a description of what it does.
Prerequisite: before starting Hadoop in distributed mode you must first set up Hadoop in pseudo-distributed mode, and you need at least two machines, one master and one slave (you can create more than one virtual machine on a single physical machine).
Following steps tested on:
OS: Ubuntu
Hadoop: Apache Hadoop 0.20.X
Deploy Hadoop in Distributed Mode:
COMMAND | DESCRIPTION |
---|---|
$ bin/stop-all.sh | stop any running Hadoop daemons; run this cmd on all machines in the cluster (master and slave) |
$ vi /etc/hosts <br> then add: <br> 192.168.0.1 master <br> 192.168.0.2 slave | map each hostname to its IP address (use your machines' actual IPs); run this cmd on all machines in the cluster (master and slave) |
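The hosts entries above can be sketched as follows. The 192.168.0.x addresses are the tutorial's samples, and a scratch file stands in for /etc/hosts so the snippet is safe to run anywhere; on a real node you would append these lines to /etc/hosts as root.

```shell
# Sample /etc/hosts entries (example IPs -- substitute your machines' real
# addresses). Written to a scratch file; on a real node, append to /etc/hosts.
cat > /tmp/hosts.demo <<'EOF'
192.168.0.1 master
192.168.0.2 slave
EOF
# Each line maps an IP address to the hostname the Hadoop configs refer to.
cut -d' ' -f2 /tmp/hosts.demo
```

After editing the real /etc/hosts, `ping master` and `ping slave` from each machine is a quick way to confirm the names resolve.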
$ ssh-keygen -t rsa -P "" <br> $ ssh-copy-id user@slave | set up passwordless ssh (you must log in with the same user name on all machines); run this cmd on master |
or $ cat ~/.ssh/id_rsa.pub \| ssh user@slave 'cat >> ~/.ssh/authorized_keys' | we can also set up passwordless ssh manually |
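A sketch of the whole passwordless-ssh step, assuming a shared user name across machines. The real commands against a slave are shown as comments, since they need a live second machine ("user" and "slave" are placeholders); the key-append mechanics are simulated locally so the snippet runs anywhere.

```shell
# On a real cluster (run on master; "user" and "slave" are placeholders):
#   ssh-keygen -t rsa -P ""                 # generate a key pair, no passphrase
#   ssh-copy-id user@slave                  # or, manually:
#   cat ~/.ssh/id_rsa.pub | ssh user@slave 'cat >> ~/.ssh/authorized_keys'
# Local simulation of the manual append, so the sketch is safe to run:
demo=$(mktemp -d)
echo "ssh-rsa AAAAB3NzaC1...demo master-key" > "$demo/id_rsa.pub"
cat "$demo/id_rsa.pub" >> "$demo/authorized_keys"
grep -c '^ssh-rsa' "$demo/authorized_keys"   # prints 1
```

On the real cluster, `ssh slave` from the master should then log in without a password prompt.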
$ vi conf/masters <br> then type: master | run this cmd on master |
$ vi conf/slaves <br> then type: slave | run this cmd on all machines in cluster (master and slave) |
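The masters and slaves files are plain lists, one hostname per line; with more slaves you simply add more lines. A sketch (slave1/slave2 are hypothetical names, and a scratch path stands in for conf/slaves):

```shell
# One hostname per line; a two-slave cluster's conf/slaves would look like
# this. /tmp is used so the sketch is runnable without a Hadoop checkout.
printf 'slave1\nslave2\n' > /tmp/slaves.demo
wc -l < /tmp/slaves.demo   # counts the slaves listed
```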
$ vi conf/core-site.xml <br> then type: <br> `<property>` <br> `<name>fs.default.name</name>` <br> `<value>hdfs://master:54310</value>` <br> `</property>` | edit configuration file core-site.xml; run this cmd on all machines in cluster (master and slave) |
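One detail the table leaves implicit: each `<property>` snippet must sit inside the file's `<configuration>` root element. A minimal complete core-site.xml, written to a scratch path for illustration:

```shell
# A complete core-site.xml; the <property> block from the table goes inside
# <configuration>. Written to /tmp so the sketch is safe to run.
cat > /tmp/core-site.xml.demo <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:54310</value>
  </property>
</configuration>
EOF
grep -c '<name>fs.default.name</name>' /tmp/core-site.xml.demo   # prints 1
```

The same wrapping applies to mapred-site.xml and hdfs-site.xml below.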
$ vi conf/mapred-site.xml <br> then type: <br> `<property>` <br> `<name>mapred.job.tracker</name>` <br> `<value>master:54311</value>` <br> `</property>` | edit configuration file mapred-site.xml; run this cmd on all machines in cluster (master and slave) |
$ vi conf/hdfs-site.xml <br> then type: <br> `<property>` <br> `<name>dfs.replication</name>` <br> `<value>2</value>` <br> `</property>` | edit configuration file hdfs-site.xml; run this cmd on all machines in cluster (master and slave) |
$ vi conf/mapred-site.xml <br> then type: <br> `<property>` <br> `<name>mapred.local.dir</name>` <br> `<value>${hadoop.tmp.dir}/mapred/local</value>` <br> `</property>` <br> `<property>` <br> `<name>mapred.map.tasks</name>` <br> `<value>20</value>` <br> `</property>` <br> `<property>` <br> `<name>mapred.reduce.tasks</name>` <br> `<value>2</value>` <br> `</property>` | edit configuration file mapred-site.xml; run this cmd on master |
$ bin/start-dfs.sh | run this cmd on master |
$ jps | it should give output like this: <br> 14799 NameNode <br> 15314 Jps <br> 16977 SecondaryNameNode <br> run this cmd on master |
$ jps | it should give output like this: <br> 15183 DataNode <br> 15616 Jps <br> run this cmd on all slaves |
$ bin/start-mapred.sh | run this cmd on master |
$ jps | it should give output like this: <br> 16017 Jps <br> 14799 NameNode <br> 15596 JobTracker <br> 14977 SecondaryNameNode <br> run this cmd on master |
$ jps | it should give output like this: <br> 15183 DataNode <br> 15897 TaskTracker <br> 16284 Jps <br> run this cmd on all slaves |
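The jps checks above can be scripted. Below is a hedged helper (check_daemons is my name, not a Hadoop tool) that scans jps-style output for required daemon names; a captured sample is used so the sketch runs without a cluster, and on a live node you would pass "$(jps)" instead.

```shell
# Check that each required daemon name appears in jps-style output
# (each jps line is "<pid> <DaemonName>", so we match on the line's end).
check_daemons() {
  out=$1; shift
  for d in "$@"; do
    echo "$out" | grep -q " $d$" || { echo "missing: $d"; return 1; }
  done
  echo "all daemons running"
}
# Sample master output; on a live master: check_daemons "$(jps)" NameNode ...
SAMPLE='14799 NameNode
15596 JobTracker
14977 SecondaryNameNode'
check_daemons "$SAMPLE" NameNode JobTracker SecondaryNameNode
# prints: all daemons running
```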
Congratulations, your Hadoop setup is complete | |
http://localhost:50070/ | web-based interface for the NameNode (open on the master) |
http://localhost:50030/ | web-based interface for the JobTracker (open on the master) |
Now let's run some examples | |
$ bin/hadoop jar hadoop-*-examples.jar pi 10 100 | run pi example |
$ bin/hadoop dfs -mkdir input <br> $ bin/hadoop dfs -put conf input <br> $ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+' <br> $ bin/hadoop dfs -cat output/* | run grep example |
$ bin/hadoop dfs -mkdir inputwords <br> $ bin/hadoop dfs -put conf inputwords <br> $ bin/hadoop jar hadoop-*-examples.jar wordcount inputwords outputwords <br> $ bin/hadoop dfs -cat outputwords/* | run wordcount example |
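What the wordcount example computes can be sketched with coreutils on a tiny in-line input. The pipeline below is only an illustration of the map/reduce result (split into words, then count per word), not how Hadoop actually runs the job:

```shell
# Split into words (map), sort, and count duplicates (reduce) -- the same
# word->count result the wordcount example produces, on a toy input.
printf 'hadoop is fun\nhadoop scales\n' \
  | tr ' ' '\n' | sort | uniq -c | sort -rn
# the top line shows the most frequent word: "2 hadoop"
```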
$ bin/stop-mapred.sh <br> $ bin/stop-dfs.sh | run these cmds on master to stop the cluster |