
elasticflume

Elasticflume is a project mainly written in ..., and it's free.

Integration between Cloudera's Flume and ElasticSearch

Using ElasticSearch Flume integration

Pre-Conditions:

  • Have Flume installed, or at least cloned from the Flume git repo and built. If not, get it from http://github.com/cloudera/flume and build it (currently with 'ant', but follow their docs).

    From here on, this Flume directory will be referred to as FLUME_HOME

  • Have ElasticSearch installed. For getting started, we'll assume you have an ElasticSearch server running locally; if not, get it from http://github.com/elasticsearch/elasticsearch

Getting Started with elasticflume

  1. First, set up some environment variables pointing to your local paths, to make the following steps simpler:

    export FLUME_HOME=<path to where you have Flume checked out/installed>
    export ELASTICSEARCH_HOME=
    export ELASTICFLUME_HOME=<path to where you have elasticflume checked out>

    (Be careful with the last 2 environment variables, because their names are deceptively similar.)
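    Before going further, it can help to confirm these variables actually point at directories. This is a minimal sketch, assuming a POSIX shell such as bash or dash; `check_home_dirs` is a hypothetical helper, not part of Flume or elasticflume:

```shell
# Hypothetical helper (not part of Flume or elasticflume): check that each
# environment variable from step 1 is set and names an existing directory.
check_home_dirs() {
  local status=0 var dir
  for var in FLUME_HOME ELASTICSEARCH_HOME ELASTICFLUME_HOME; do
    # Portable indirect lookup: read the value of the variable named by $var.
    eval "dir=\${$var:-}"
    if [ -z "$dir" ]; then
      echo "$var is not set" >&2
      status=1
    elif [ ! -d "$dir" ]; then
      echo "$var=$dir is not a directory" >&2
      status=1
    fi
  done
  return "$status"
}
```

    Run `check_home_dirs` right after the exports; it prints one line per missing or wrong variable and returns non-zero if any check failed.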
  2. Build it using Maven:

    2.1 Install the Flume library into your local Maven repository (it's not available in Maven Central). Note: the command below assumes you have done a 'git clone' of the Flume source and built it.

    mvn install:install-file -DgroupId=com.cloudera -DartifactId=flume -Dversion=0.9.1-dev -Dclassifier=core -Dfile=$FLUME_HOME/build/flume-0.9.1-dev-core.jar -Dpackaging=jar

    2.2 Build elasticflume:

    cd $ELASTICFLUME_HOME
    mvn package
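    If you rebuild often, the two Maven commands of step 2 can be wrapped in a small script. This is a minimal sketch, assuming a POSIX shell and the 0.9.1-dev version string used above; the function names `flume_core_jar` and `build_elasticflume` are hypothetical, not part of Flume or elasticflume:

```shell
# Version string from the mvn install:install-file command above;
# change it if your Flume build produces a differently named jar.
FLUME_VERSION=0.9.1-dev

# Hypothetical helper: path of the core jar produced by building Flume from source.
flume_core_jar() {
  echo "$FLUME_HOME/build/flume-$FLUME_VERSION-core.jar"
}

# Hypothetical helper: run steps 2.1 and 2.2 in order, stopping at the first failure.
build_elasticflume() {
  local jar
  jar=$(flume_core_jar)
  # Fail early with a clear message instead of letting Maven report a missing file.
  if [ ! -f "$jar" ]; then
    echo "Flume core jar not found at $jar; build Flume first" >&2
    return 1
  fi
  # 2.1: install the Flume jar into the local Maven repository.
  mvn install:install-file -DgroupId=com.cloudera -DartifactId=flume \
    -Dversion="$FLUME_VERSION" -Dclassifier=core \
    -Dfile="$jar" -Dpackaging=jar || return 1
  # 2.2: build elasticflume in a subshell, so the caller's directory is unchanged.
  (cd "$ELASTICFLUME_HOME" && mvn package)
}
```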

  3. Now add the elasticflume jar to Flume's classpath too. I personally use a symlink for testing, but copying is probably a better idea :)

    ln -s $ELASTICFLUME_HOME/target/elasticflume-1.0.0-SNAPSHOT-jar-with-dependencies.jar $FLUME_HOME/lib/
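    Since copying is probably the better option, here is a sketch of that variant, assuming the jar name that `mvn package` produced above; `install_plugin_jar` is a hypothetical helper name:

```shell
# Hypothetical helper (not part of elasticflume): copy the jar-with-dependencies
# built by 'mvn package' into Flume's lib directory.
ELASTICFLUME_JAR=elasticflume-1.0.0-SNAPSHOT-jar-with-dependencies.jar

install_plugin_jar() {
  local src="$ELASTICFLUME_HOME/target/$ELASTICFLUME_JAR"
  local dest="$FLUME_HOME/lib/$ELASTICFLUME_JAR"
  if [ ! -f "$src" ]; then
    echo "missing $src; run 'mvn package' first" >&2
    return 1
  fi
  # Remove any earlier copy or symlink first, so cp never writes through a
  # symlink that points back at the source jar.
  rm -f "$dest"
  cp "$src" "$dest"
}
```

    Removing the destination first means that re-running it after a rebuild replaces an old testing symlink with a real copy.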

  4. Ensure your Flume config is correct: check that $FLUME_HOME/conf/flume-conf.xml correctly identifies your local master. You may have to copy the template file in that directory to 'flume-conf.xml' and then add the following:

    <property>
      <name>flume.master.servers</name>
      <value>localhost</value>
      <description>A comma-separated list of hostnames, one for each machine in the Flume Master.</description>
    </property>

    (This may not be necessary, because it's the default, but I had to do it for some reason.)

    You will also need to register the elasticflume plugin by creating a new property block:

    <property>
      <name>flume.plugin.classes</name>
      <value>org.elasticsearch.flume.ElasticSearchSink</value>
      <description>Comma separated list of plugins</description>
    </property>

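    A rough way to confirm the plugin registration made it into the file. This is a sketch, assuming the config lives at $FLUME_HOME/conf/flume-conf.xml; `check_flume_conf` is a hypothetical helper that only does a text search, so it will not catch malformed XML or a commented-out block:

```shell
# Hypothetical helper (not part of Flume): textual check that a Flume config
# file mentions the plugin property and the elasticflume sink class.
check_flume_conf() {
  local conf="${1:-$FLUME_HOME/conf/flume-conf.xml}" needle
  if [ ! -f "$conf" ]; then
    echo "no config file at $conf" >&2
    return 1
  fi
  for needle in flume.plugin.classes org.elasticsearch.flume.ElasticSearchSink; do
    # -F: match the dots literally instead of as regex wildcards.
    if ! grep -qF "$needle" "$conf"; then
      echo "$conf does not mention $needle" >&2
      return 1
    fi
  done
}
```
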
  5. Start up the Flume Master and a Flume node; you will need 2 different shells here. In the first shell, start the master:

    cd $FLUME_HOME
    bin/flume master

    VERIFY that the master's startup log contains the following line. If you don't see it, you've probably missed Step 3 (or the plugin registration in Step 4):
    
    2010-09-14 14:20:53,861 [main] INFO conf.SinkFactoryImpl: Found sink builder elasticSearchSink in org.elasticsearch.flume.ElasticSearchSink

    Then, in the second shell (also in $FLUME_HOME), start a node:

    bin/flume node_nowatch
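    If you capture the master's output to a file, the log-line check above can be scripted. A sketch, assuming the log goes to the console and you redirect it yourself (for example `bin/flume master 2>&1 | tee master.log`), or that you point it at whichever log file your installation writes; `sink_registered` is a hypothetical helper:

```shell
# Hypothetical helper (not part of Flume): look for the line the master prints
# when it picks up the elasticflume sink (see the example log line above).
sink_registered() {
  if grep -q 'Found sink builder elasticSearchSink' "$1"; then
    echo "elasticSearchSink is registered"
  else
    echo "elasticSearchSink not found in $1; recheck Steps 3 and 4" >&2
    return 1
  fi
}
```
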

  6. Set up a basic console-based source so you can type in data manually and have it indexed (pretending to be a log message):

    cd $FLUME_HOME
    bin/flume shell -c localhost -e "exec config localhost 'console' 'elasticSearchSink'"

    NOTE: For some reason, my local test Flume installation used my IP address as the default node name, rather than 'localhost' as is often the case. If things are not working properly, check the node status with:

    bin/flume shell -c localhost -e "getnodestatus"

    If you see a node listed under an IP address, you may need to map it to the logical name 'localhost' inside Flume by doing this:

    bin/flume shell -c localhost -e "map localhost"
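    The Flume shell calls in this step all follow the same `bin/flume shell -c localhost -e "..."` pattern, so they can be wrapped once. A sketch, assuming FLUME_HOME is set as in step 1; `flume_cmd` and `setup_console_flow` are hypothetical names, and the `map` call is left out because its exact arguments depend on the node name you see:

```shell
# Hypothetical helper (not part of Flume): run one Flume shell command
# against the master on localhost, using the same form as the commands above.
flume_cmd() {
  "$FLUME_HOME/bin/flume" shell -c localhost -e "$1"
}

# Configure the logical node 'localhost' to read from the console and write
# to elasticSearchSink, then print the node status so you can see whether the
# node is listed as 'localhost' or under an IP address.
setup_console_flow() {
  flume_cmd "exec config localhost 'console' 'elasticSearchSink'" &&
    flume_cmd "getnodestatus"
}
```
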

  7. NOW FOR THE TEST! :) In the console window where you started "node_nowatch" above, type the following (and yes, straight after all those log messages, just start typing, trust me):

    hello world
    hello there good sir

    (That is, type the 2 lines, pressing Return after each.)

  8. Verify that you can search for your "hello world" log. In another console, use curl to search your local ElasticSearch node:

    curl -XGET 'http://localhost:9200/flume/_search?pretty=true' -d '
    {
      "query" : { "term" : { "message" : "hello" } }
    }'

    You should get pretty-printed JSON search results, something like:

    {
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
      },
      "hits" : {
        "total" : 2,
        "max_score" : 1.1976817,
        "hits" : [ {
          "_index" : "flume",
          "_type" : "LOG",
          "_id" : "4e5a6f5b-1dd3-4bb6-9fd9-c8d785f39680",
          "_score" : 1.1976817,
          "_source" : {"message":"hello world","timestamp":"2010-09-14T03:19:36.857Z","host":"192.168.1.170","priority":"INFO"}
        }, {
          "_index" : "flume",
          "_type" : "LOG",
          "_id" : "c77c18cc-af40-4362-b20b-193e5a3f6ff5",
          "_score" : 0.8465736,
          "_source" : {"message":"hello there good sir","timestamp":"2010-09-14T03:28:04.168Z","host":"192.168.1.170","priority":"INFO"}
        } ]
      }
    }
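    To repeat the check for other words, the curl call can be wrapped and its result counted. A sketch, assuming ElasticSearch listens on localhost:9200 and elasticflume writes to the 'flume' index as in the example above; `search_flume` and `count_returned_hits` are hypothetical helpers. Term queries are not analyzed, so with the default analyzer you should search for lowercase words (e.g. 'hello', not 'Hello'):

```shell
# Hypothetical helpers (not part of elasticflume). ES_URL can be overridden.
ES_URL=${ES_URL:-http://localhost:9200}

# Term query on the 'message' field of the 'flume' index, as in the curl
# example above. Assumes $1 is a plain word with no quotes in it.
search_flume() {
  curl -s -XGET "$ES_URL/flume/_search?pretty=true" -d "{
    \"query\" : { \"term\" : { \"message\" : \"$1\" } }
  }"
}

# Count the hits returned in a search response read from stdin by counting
# "_id" fields. grep -o prints one line per match, so this works for both
# pretty-printed and single-line JSON. It counts the hits on the returned
# page, not hits.total.
count_returned_hits() {
  grep -o '"_id"' | wc -l | tr -d ' '
}
```

    For example, `search_flume hello | count_returned_hits` should report 2 if both lines from Step 7 were indexed, matching the sample output above.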

  9. Go to the ElasticSearch website and learn all about the REST and other APIs for searching an ElasticSearch index.
