Saturday, June 23, 2012

Deploy Apache Flume NG (1.x.x)

In this tutorial I explain how to install, deploy, and configure Flume NG on a single system, how to configure Flume NG to copy data to HDFS, and then how to configure it to copy data to HBase.

Before getting into the configuration, let's understand what Flume NG (1.x.x) is:
Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple extensible data model that allows for online analytic application.

In this tutorial I have used:
Hadoop 0.20.X (Apache or Cloudera)
Flume 1.1.0 
HBase 0.90.4
Ubuntu 11.10

I am assuming that you have a Hadoop cluster ready; if you don't have Hadoop installed, you can refer to:
  • For Hadoop in pseudo distributed mode please click here
  • For Hadoop in distributed mode please click here
Let’s Deploy Flume NG:
Download Flume NG (this tutorial uses the CDH packaging of Flume 1.1.0) and extract it:

tar xzf flume-1.1.0-cdh4.0.0.tar.gz
Copy the configuration file templates shipped with the distribution:
cp conf/flume-conf.properties.template conf/flume.conf
cp conf/flume-env.sh.template conf/flume-env.sh

The following configuration copies data from a file on the local file system to HDFS.
Edit flume.conf and add the following entries:

agent1.sources = tail
agent1.channels = Channel-2
agent1.sinks = HDFS

agent1.sources.tail.type = exec
agent1.sources.tail.command = tail -F /var/log/apache2/access.log
agent1.sources.tail.channels = Channel-2
agent1.sinks.HDFS.channel = Channel-2
agent1.sinks.HDFS.type = hdfs
agent1.sinks.HDFS.hdfs.path = hdfs://localhost:9000/flume
agent1.sinks.HDFS.hdfs.fileType = DataStream

agent1.channels.Channel-2.type = memory
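
By default the HDFS sink rolls to a new file very aggressively (in recent versions: every 30 seconds, every 10 events, or every ~1 KB, whichever comes first), which can leave many tiny files under /flume. These optional properties control the rolling behaviour; the values below are only examples, not part of the original configuration:

```properties
# rollInterval: seconds before rolling to a new file (0 = never roll on time)
agent1.sinks.HDFS.hdfs.rollInterval = 600
# rollSize: bytes written before rolling (0 = never roll on size)
agent1.sinks.HDFS.hdfs.rollSize = 67108864
# rollCount: events written before rolling (0 = never roll on count)
agent1.sinks.HDFS.hdfs.rollCount = 0
# batchSize: events written to HDFS per flush
agent1.sinks.HDFS.hdfs.batchSize = 100
```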

Start Flume to copy data to HDFS:
bin/flume-ng agent --conf ./conf/ -f conf/flume.conf -n agent1

Note that the agent name is specified by -n agent1 and must match an agent name given in -f conf/flume.conf
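
Because the agent name prefixes every property, a single properties file can hold several independent agents, and -n selects which one to start. A minimal sketch (component names are illustrative):

```properties
# Two independent agents in one flume.conf; start either one with
#   bin/flume-ng agent -f conf/flume.conf -n agent1   (or -n agent2)
agent1.sources = s1
agent1.channels = c1
agent1.sinks = k1

agent2.sources = s2
agent2.channels = c2
agent2.sinks = k2
```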

You can check your data in HDFS either from the web console (http://localhost:50070) or from the command prompt (e.g. hadoop fs -ls /flume)

Now let's configure Flume NG to copy data to HBase.
Here I am assuming that you have HBase installed and running.

Create the table in HBase into which you want to copy the data (from the HBase shell):

create 'myTab', 'cf'

Edit flume.conf and add the following entries (the table and column family must match the ones created above):

hbase-agent.sources = tail
hbase-agent.sinks = sink1
hbase-agent.channels = ch1

hbase-agent.sources.tail.type = exec
hbase-agent.sources.tail.command = tail -F /tmp/test05
hbase-agent.sources.tail.channels = ch1

hbase-agent.sinks.sink1.type = org.apache.flume.sink.hbase.HBaseSink
hbase-agent.sinks.sink1.channel = ch1
hbase-agent.sinks.sink1.table = myTab
hbase-agent.sinks.sink1.columnFamily = cf
hbase-agent.sinks.sink1.serializer = org.apache.flume.sink.hbase.SimpleHbaseEventSerializer

hbase-agent.channels.ch1.type = memory

Set the following variables in .bashrc:
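
The exact variables depend on your installation layout; a typical set for a Flume-to-HBase setup looks like the following (all paths are examples, adjust them to your machine). The important point is that the HBase jars and hbase-site.xml must end up on Flume's classpath, otherwise the HBase sink cannot find its dependencies:

```shell
# Example ~/.bashrc entries (paths are illustrative, not from the original post)
export JAVA_HOME=/usr/lib/jvm/java-6-sun
export HADOOP_HOME=/usr/local/hadoop
export HBASE_HOME=/usr/local/hbase
export FLUME_HOME=/usr/local/flume
export FLUME_CONF_DIR=$FLUME_HOME/conf
# put HBase config and jars on Flume's classpath for the HBase sink
export FLUME_CLASSPATH=$HBASE_HOME/conf:$HBASE_HOME/hbase-0.90.4.jar:$HBASE_HOME/lib/*
export PATH=$PATH:$FLUME_HOME/bin:$HBASE_HOME/bin
```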


Now start Flume to copy data to HBase:

bin/flume-ng agent --conf ./conf/ -f conf/flume.conf -n hbase-agent

Check whether the data got copied into HBase (from the HBase shell):
scan 'myTab'

Note that SimpleHbaseEventSerializer, specified in the configuration above, is just an example of an event serializer; you should write your own class depending on your requirements.
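
For reference, SimpleHbaseEventSerializer itself accepts a few optional sub-properties on the sink (shown with example values; check the Flume documentation for your version):

```properties
# Column (within the table's column family) that receives the raw event body
hbase-agent.sinks.sink1.serializer.payloadColumn = col1
# Optional counter column, incremented once per event
hbase-agent.sinks.sink1.serializer.incrementColumn = icol
```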


  1. I have 2 nodes: agent1 and a collector. When I configure agent1 as text("XXXX") agentSink("localhost", 35853) and the collector as
    collectorSource(35853) console, it fails on the agent side... is there any additional setting needed?

    Below are the error details:

    12/06/27 11:23:37 INFO debug.TextFileSource: File /home/hadoop/as/ash opened
    12/06/27 11:23:37 INFO agent.LogicalNode: Node config successfully set to FlumeConfigData: {srcVer:'Wed Jun 27 11:23:33 IST 2012' snkVer:'Wed Jun 27 11:23:33 IST 2012' ts='Wed Jun 27 11:23:33 IST 2012' flowId:'default-flow' source:'text( "/home/hadoop/as/ash" )' sink:'agentSink( "localhost", 35853 )' }
    12/06/27 11:23:37 INFO durability.NaiveFileWALManager: NaiveFileWALManager is now open
    12/06/27 11:23:37 INFO rolling.RollSink: Created RollSink: trigger=[TimeTrigger: maxAge=10000 tagger=com.cloudera.flume.handlers.rolling.ProcessTagger@1a1c42f] checkPeriodMs = 250 spec='ackingWal'
    12/06/27 11:23:37 INFO rolling.RollSink: opening RollSink 'ackingWal'
    12/06/27 11:23:37 INFO hdfs.SeqfileEventSink: constructed new seqfile event sink: file=/tmp/flume-hadoop/agent/localhost/writing/20120627-112337056+0530.1716541877172.00000019
    12/06/27 11:23:37 WARN rolling.RollSink: Failure when attempting to open initial sink failure to login

  2. Hi Jeet,
    I think you are getting confused between Flume OG and Flume NG.

    In Flume NG the architecture has been completely changed:
    the concept of a collector has been removed, and the concept of an agent has changed; an agent is now a JVM process.

    I think you should refer

  3. In Flume NG, how do I start a master?
    I did all the settings from
    Now, what command should I run to start my source and sink?

    1. Hi Jeet,
      Again, you are getting confused between Flume NG and Flume OG.

      There is no master in NG.
      If you look at the architecture diagram shown above, there is simply a source, a sink, and a channel connecting the source and sink.

      All the details about the commands you should run are mentioned in the tutorial above.
      To start Flume you need to run:
      "bin/flume-ng agent --conf ./conf/ -f conf/flume.conf -n agent1"

  4. Hi Rahul... I am done with the steps. When I run the command below //// I changed agent1.sources.tail.command = tail -F /var/log/apache2/access.log to

    agent1.sources.tail.command = text -F /home/hadoop/as/ash (it is in my local fs)

    now with the command below I am getting this error:

    bin/flume-ng agent --conf ./conf/ -f conf/flume.conf -n agent1

    Unknown or unspecified command 'agent'
    usage: bin/flume-ng [COMMAND] [OPTION]...

    So what should I write in place of agent?
    ////please reply..


    1. In your command, what is "text"?
      agent1.sources.tail.command = text -F /home/hadoop/as/ash

      that should be tail or some other Unix command;
      the rest all looks fine

  5. Let me change it and try.

  6. I did it: tail -F /home/hadoop/as/ash

    but I still get the same error:

    bin/flume-ng agent --conf ./conf/ -f conf/flume.conf -n agent1

    Unknown or unspecified command 'agent'
    usage: bin/flume-ng [COMMAND] [OPTION]...

    1. Please post the contents of your configuration file "flume.conf".

  7. # The channel can be defined as follows.
    foo.sources.seqGenSrc.channels = memoryChannel

    # Each sink's type must be defined
    foo.sinks.loggerSink.type = logger

    # Specify the channel the sink should use
    foo.sinks.loggerSink.channel = memoryChannel

    # Each channel's type is defined.
    foo.channels.memoryChannel.type = memory

    # Other config values specific to each type of channel(sink or source)
    # can be defined as well
    # In this case, it specifies the capacity of the memory channel
    foo.channels.memoryChannel.capacity = 100
    # Define a memory channel called ch1 on agent1
    agent1.channels.ch1.type = memory

    # Define an Avro source called avro-source1 on agent1 and tell it
    # to bind to a host/port. Connect it to channel ch1.
    agent1.sources.avro-source1.channels = ch1
    agent1.sources.avro-source1.type = avro
    agent1.sources.avro-source1.bind =
    agent1.sources.avro-source1.port = 41414

    # Define a logger sink that simply logs all events it receives
    # and connect it to the other end of the same channel.
    agent1.sinks.log-sink1.channel = ch1
    agent1.sinks.log-sink1.type = logger

    # Finally, now that we've defined all of our components, tell
    # agent1 which ones we want to activate.
    agent1.channels = ch1
    agent1.sources = avro-source1
    agent1.sinks = log-sink1

  8. This is my new flume.conf file... please tell me which command I should run?

  9. Hi,
    I am really very new to this technology.
    I tried to configure Flume using your configuration, with HBase as the sink,
    but I got: ERROR properties.PropertiesFileConfigurationProvider: Failed to start agent because dependencies were not found in classpath.

    I have included the Flume conf path in FLUME_CLASSPATH
    and have also set the JAVA_HOME path.
    What else do I need to include?
    Please guide me.

  10. Dear Rahul,
    Thanks for the example. I simplified your .conf file to write to console with logger (since I was not able to get tail to write to hdfs), but nothing appears on the console. Even if I add new lines to the log file, there is no output on the console. My tail2logger.conf file is as follows:
    # list sources, sinks and channels in the source agent
    agent1.sources = tail1
    agent1.channels = memoryChannel
    agent1.sinks = sink1

    #Describe the source
    agent1.sources.tail1.type = exec
    agent1.sources.tail1.command = tail -F /home/hduser/flume/exlog.txt

    #Describe the sink
    agent1.sinks.sink1.type = logger

    #Describe the channel
    agent1.channels.memoryChannel.type = memory

    # Bind the source and sink to the channel
    agent1.sources.tail1.channels = memoryChannel
    agent1.sinks.sink1.channel = memoryChannel

    and I use the command-line:
    flume-ng agent --conf-file tail2logger.conf --name agent1 -Dflume.root.logger=INFO,console

    Appreciate your help