jruby-on-hadoop

jruby-on-hadoop is a free project written mainly in Ruby and Java.

It lets you use Hadoop from Ruby scripts through JRuby. It does not use Hadoop Streaming.

= JRuby on Hadoop

JRuby on Hadoop is a thin JRuby wrapper around the Hadoop Mapper / Reducer. We recommend using it together with hadoop-papyrus, which is available on GitHub and GemCutter.

== Description

== Install

All required gems are available on GemCutter.

== Usage

1. Run a Hadoop cluster on your machines and set the HADOOP_HOME environment variable.
2. Put your input files into HDFS, e.g. test/inputs/file1.
3. Run 'joh' as follows:

     $ joh examples/wordcount.rb test/inputs test/outputs

The Hadoop job results are written to HDFS under test/outputs/part-*.
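The steps above can be wrapped in a small pre-flight script. This is only a sketch: +hadoop_home+ and +joh_command+ are hypothetical helper names, not part of jruby-on-hadoop. It assumes nothing beyond the HADOOP_HOME variable and the three 'joh' arguments shown in the usage line.

```ruby
# Hypothetical helper (not part of jruby-on-hadoop):
# returns HADOOP_HOME from the given environment, or raises if it is missing.
def hadoop_home(env = ENV)
  home = env["HADOOP_HOME"]
  raise "HADOOP_HOME is not set" if home.nil? || home.empty?
  home
end

# Hypothetical helper: builds the argument list for the 'joh' command,
# in the same order as the usage line (script, input dir, output dir).
def joh_command(script, input_dir, output_dir)
  ["joh", script, input_dir, output_dir]
end

cmd = joh_command("examples/wordcount.rb", "test/inputs", "test/outputs")
puts cmd.join(" ")

# To launch the job for real, check the environment first:
#   hadoop_home
#   system(*cmd)
```

Passing the arguments to +system+ as a list, rather than as one string, avoids shell quoting problems when paths contain spaces.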

== Example

See also examples/wordcount.rb.

  def setup(conf)
    # setup jobconf
  end

  def map(key, value, output, reporter)
    # emit (word, 1) for every whitespace-separated word in the line
    value.split.each do |word|
      output.collect(word, 1)
    end
  end

  def reduce(key, values, output, reporter)
    # sum the counts collected for this word
    sum = 0
    values.each {|v| sum += v }
    output.collect(key, sum)
  end
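To see what the wordcount example above computes without a cluster, its +map+ and +reduce+ methods can be driven by hand in plain Ruby. The following is a minimal sketch, not how jruby-on-hadoop actually invokes them: +FakeCollector+ and +run_locally+ are hypothetical names, the collector only mimics the +collect(key, value)+ call used in the example, and the reporter is passed as +nil+ because the example never touches it.

```ruby
# Hypothetical stand-in for the collector Hadoop passes in:
# it simply records every (key, value) pair it is given.
class FakeCollector
  attr_reader :pairs

  def initialize
    @pairs = []
  end

  def collect(key, value)
    @pairs << [key, value]
  end
end

# Same map/reduce bodies as the wordcount example.
def map(key, value, output, reporter)
  value.split.each do |word|
    output.collect(word, 1)
  end
end

def reduce(key, values, output, reporter)
  sum = 0
  values.each { |v| sum += v }
  output.collect(key, sum)
end

# Simulates one job in memory: map every line, group the pairs by key,
# then reduce each group in sorted key order.
def run_locally(lines)
  mapped = FakeCollector.new
  lines.each_with_index { |line, i| map(i, line, mapped, nil) }

  grouped = Hash.new { |h, k| h[k] = [] }
  mapped.pairs.each { |word, count| grouped[word] << count }

  reduced = FakeCollector.new
  grouped.keys.sort.each { |word| reduce(word, grouped[word], reduced, nil) }
  reduced.pairs
end

p run_locally(["hello hadoop", "hello jruby"])
# prints [["hadoop", 1], ["hello", 2], ["jruby", 1]]
```

On a real cluster the grouping and sorting are done by Hadoop between the map and reduce phases; the in-memory hash here only approximates that step.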

== Build

You can build hadoop-ruby.jar with Ant:

  $ ant

The HADOOP_HOME environment variable must be set for your system. The build assumes Hadoop version 0.19.2.

== Author

Koichi Fujikawa

== Copyright

License: Apache License