Home > lorkbong

lorkbong

Lorkbong is a project mainly written in RUBY and SHELL, it's free.

Throwaway demo of heroku + wukong + emr

h1. Lorkbong: Very stupid example for Wukong / Elastic Map Reduce integration

Lorkbong (named after the staff carried by Sun Wukong) is a very very simple example Heroku app that lets you trigger showing job status or launching a new job, either by visiting a special URL or by triggering a rake task.

h2. Setup

Create the app:

heroku create lorkbong-example

Edit the obvious files in @config/@. You can make life much more dangerous but slightly simpler by adding your keys to the config. If you do this, ake sure you DON'T check this app into a public repo or we'll switch all our infrastructure to run off your account.

You should probably be more responsible and use environment variables for your credentials. Use the heroku commandline tool to run the following command:

heroku config:add AWS_ACCESS_KEY_ID= heroku config:add AWS_SECRET_ACCESS_KEY= heroku config:add EMR_KEYPAIR=cat /path/to/your/keypair.pem

Now visit http://lorkbong-example.heroku.com (or whatever you called it). Follow the link for 'list jobs'. You should see a listing of any jobs in your queue.

h2. Debugging

Right now the job info is hardcoded into the file @app/helpers/emr_script.rb@ (sorry). Open that file and edit the block so that it runs (for the first time) with 'alive' set to true.

  #
  # You need to edit the following things:
  #
  EMR_OPTS = {
    # # Path to the runner.
    :emr_runner    => "#{::ROOT_DIR}/vendor/elastic-mapreduce/elastic-mapreduce",
    # # Temp storage for the keypair file (elastic-mapreduce script demands it be a static file).
    :keypair_file  => ::ROOT_DIR+'/tmp/emr_keypair.pem',
    # # If you're debugging:
    # # first run with alive set to true, and launch the job.
    :alive => true,
    # # After the job has been created and run for the first time, fill your
    # # jobflow into the following and set alive back to nil.
    # :jobflow       => "j-18OUFBXJ0Z01W",
  }
  # Path to the input files. Note the 's3n' prefix.
  EMR_INPUT  = "s3n://emr.yourdomain.com/wukong/data/examples/links-simple-sorted-10k.txt"
  # Path to the output files. This directory must not exist. Note the 's3n' prefix.
  EMR_OUTPUT = "s3n://emr.yourdomain.com/wukong/data/examples/wp-link-degree-4"

Make note of the jobflow ID (or check the list jobs path at /emr/list ) and hack that into the file above for debugging.

Check the "AWS console":http://bit.ly/awsconsole for a closer look at the job progress. You can also find the public IP of the master node from the console, and log in to the machine directly:

  ssh -i /path/to/your/keypair.pem [email protected]

h2. Credits

  • The frontend app is based on Monk / Cartilage, a skeleton for building Sinatra apps.
Previous:FrontPage