Integration between Cloudera's Flume and ElasticSearch
Using ElasticSearch Flume integration
Pre-Conditions:
Have Flume installed, or at least cloned from the Flume git repo and built. If not, get it from http://github.com/cloudera/flume and build it (currently with 'ant', but follow their docs).
From here on, this Flume directory will be referred to as FLUME_HOME.
Have ElasticSearch installed. For this Getting Started guide we assume you have an ElasticSearch server running locally; if not, get it from http://github.com/elasticsearch/elasticsearch
Getting Started with elasticflume
First, set up some environment variables pointing to your local paths, to make the following steps simpler:
export FLUME_HOME=<path to where you have Flume checked out/installed>
export ELASTICSEARCH_HOME=<path to where you have ElasticSearch installed>
export ELASTICFLUME_HOME=<path to where you have elasticflume checked out>
(Be careful with these last 2 env vars because they are deceptively similar.)
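Because ELASTICSEARCH_HOME and ELASTICFLUME_HOME are so easy to mix up, it can help to sanity-check all three variables before going further. This is a minimal sketch; check_home is a hypothetical helper, not part of Flume or elasticflume, and it only checks that each variable is set and names an existing directory.

```shell
# check_home NAME: hypothetical helper (not part of Flume or elasticflume).
# Succeeds and prints NAME=VALUE if the variable called NAME is set and is a directory.
check_home() {
  name="$1"
  # Indirect lookup that works in plain POSIX sh, e.g. value=${FLUME_HOME:-}
  eval "value=\${$name:-}"
  if [ -z "$value" ]; then
    echo "$name is not set" >&2
    return 1
  fi
  if [ ! -d "$value" ]; then
    echo "$name=$value is not a directory" >&2
    return 1
  fi
  echo "$name=$value"
}
```

Usage: for v in FLUME_HOME ELASTICSEARCH_HOME ELASTICFLUME_HOME; do check_home "$v"; done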
Build it using Maven:
1.1 Install the Flume library into your local Maven repo (it is not available in Maven Central). Note: the command below assumes you have done a 'git clone' of the Flume source and built it.
mvn install:install-file -DgroupId=com.cloudera -DartifactId=flume -Dversion=0.9.1-dev -Dclassifier=core -Dfile=$FLUME_HOME/build/flume-0.9.1-dev-core.jar -Dpackaging=jar
1.2 Build elasticflume:
cd $ELASTICFLUME_HOME
mvn package
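Steps 1.1 and 1.2 can be wrapped in one function that fails early, with a readable message, if the Flume core jar has not been built yet. This is a sketch under the same assumptions as the commands above (Flume 0.9.1-dev built with ant, FLUME_HOME and ELASTICFLUME_HOME set); build_elasticflume is a hypothetical name.

```shell
# build_elasticflume: hypothetical wrapper around steps 1.1 and 1.2.
# Assumes FLUME_HOME and ELASTICFLUME_HOME are set and Flume 0.9.1-dev is already built.
build_elasticflume() {
  flume_jar="$FLUME_HOME/build/flume-0.9.1-dev-core.jar"
  # Stop with a clear message instead of letting Maven complain about a missing file.
  if [ ! -f "$flume_jar" ]; then
    echo "missing $flume_jar: build Flume first" >&2
    return 1
  fi
  # 1.1 Install the Flume core jar into the local Maven repository.
  mvn install:install-file -DgroupId=com.cloudera -DartifactId=flume \
    -Dversion=0.9.1-dev -Dclassifier=core -Dfile="$flume_jar" -Dpackaging=jar || return 1
  # 1.2 Build elasticflume; the subshell leaves the caller's working directory unchanged.
  (cd "$ELASTICFLUME_HOME" && mvn package)
}
```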
Now add the elasticflume jar to Flume's classpath too. I personally use a symlink for testing, but copying is probably a better idea :)
ln -s $ELASTICFLUME_HOME/target/elasticflume-1.0.0-SNAPSHOT-jar-with-dependencies.jar $FLUME_HOME/lib/
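If you prefer copying over the symlink, a small guard avoids silently copying nothing when 'mvn package' did not produce the jar. This is a minimal sketch; install_jar is a hypothetical helper, and the jar name in the usage line is the one from the symlink command above.

```shell
# install_jar SRC DEST_DIR: hypothetical helper that copies (rather than symlinks)
# the elasticflume jar into Flume's lib directory.
install_jar() {
  src="$1"
  dest_dir="$2"
  if [ ! -f "$src" ]; then
    echo "jar not found: $src (did 'mvn package' succeed?)" >&2
    return 1
  fi
  mkdir -p "$dest_dir" || return 1
  cp "$src" "$dest_dir/"
}
```

Usage: install_jar "$ELASTICFLUME_HOME/target/elasticflume-1.0.0-SNAPSHOT-jar-with-dependencies.jar" "$FLUME_HOME/lib"
The trade-off: a copy must be redone after every rebuild, whereas the symlink always points at the latest build, which is why it is convenient for testing.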
Ensure your Flume config is correct: check that $FLUME_HOME/conf/flume-conf.xml correctly identifies your local master. You may have to copy the template file in that directory to 'flume-conf.xml' and then add the following:
... (the above may not be necessary, because it's the default, but I had to do it for some reason).
You will also need to register the elasticflume plugin by creating a new property block:
Start up the Flume master and a Flume node; you will need 2 different shells for this. In the first shell, start the master:
cd $FLUME_HOME
bin/flume master
VERIFY that the master's startup log contains the following line. If you don't see it, you've missed at least Step 3:
2010-09-14 14:20:53,861 [main] INFO conf.SinkFactoryImpl: Found sink builder elasticSearchSink in org.elasticsearch.flume.ElasticSearchSink
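Rather than scrolling through the startup output by eye, you can capture it to a file and search for that line. This sketch assumes the master's log output goes to the console (as the line above suggests) and is captured with something like bin/flume master 2>&1 | tee master.log; both master.log and has_es_sink are hypothetical names.

```shell
# has_es_sink LOGFILE: hypothetical helper; exit status 0 if LOGFILE contains
# the master's "Found sink builder elasticSearchSink" line, non-zero otherwise.
has_es_sink() {
  grep -q 'Found sink builder elasticSearchSink' "$1"
}
```

Usage: has_es_sink master.log && echo "elasticflume sink registered"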
Then, in the second shell (also from $FLUME_HOME), start a node:
bin/flume node_nowatch
Set up a basic console-based source so you can type in data manually and have it indexed (pretending to be a log message):
cd $FLUME_HOME
bin/flume shell -c localhost -e "exec config localhost 'console' 'elasticSearchSink'"
NOTE: For some reason my local test Flume installation used my IP address as the default node name rather than 'localhost', which it often is. If things are not working properly, check the node status with:
bin/flume shell -c localhost -e "getnodestatus"
If you see a node listed by IP address, you may need to map it to the logical name 'localhost' inside Flume by doing this:
bin/flume shell -c localhost -e "map
NOW FOR THE TEST! :) In the console window where you started "node_nowatch" above, type the following (and yes, right after all those log messages; just start typing, trust me):
hello world
hello there good sir
(i.e. type the 2 lines, pressing return after each)
Verify that you can find your "hello world" log: in another console, use curl to search your local ElasticSearch node:
curl -XGET 'http://localhost:9200/flume/_search?pretty=true' -d ' { "query" : { "term" : { "message" : "hello" } } } '
You should get pretty-printed JSON search results, something like:
{
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.1976817,
    "hits" : [ {
      "_index" : "flume",
      "_type" : "LOG",
      "_id" : "4e5a6f5b-1dd3-4bb6-9fd9-c8d785f39680",
      "_score" : 1.1976817,
      "_source" : {"message":"hello world","timestamp":"2010-09-14T03:19:36.857Z","host":"192.168.1.170","priority":"INFO"}
    }, {
      "_index" : "flume",
      "_type" : "LOG",
      "_id" : "c77c18cc-af40-4362-b20b-193e5a3f6ff5",
      "_score" : 0.8465736,
      "_source" : {"message":"hello there good sir","timestamp":"2010-09-14T03:28:04.168Z","host":"192.168.1.170","priority":"INFO"}
    } ]
  }
}
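To try other words or fields without retyping the whole curl command, the same term query can be parameterised. This is a sketch with hypothetical helper names (term_query, search_flume); it uses the same URL and query shape as the curl command above, and it does not JSON-escape its arguments, so it assumes plain words without quotes. Note that a term query is not analyzed: with a default mapping the message text is lowercased at index time, so search for "hello", not "Hello".

```shell
# term_query FIELD VALUE: print a term-query body like the one used above.
# Hypothetical helper; FIELD and VALUE are inserted as-is (no JSON escaping).
term_query() {
  printf '{ "query" : { "term" : { "%s" : "%s" } } }\n' "$1" "$2"
}

# search_flume FIELD VALUE: run that query against the "flume" index on localhost:9200.
search_flume() {
  curl -s -XGET 'http://localhost:9200/flume/_search?pretty=true' -d "$(term_query "$1" "$2")"
}
```

Usage: search_flume message sir should return only the "hello there good sir" document from the example above.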
Go to the ElasticSearch website and learn all about the REST and other APIs for searching an ElasticSearch index.