Ad network (ish) demonstration code from Cassandra London talk May 2011
This code was written to provide a demonstration of DataStax's Brisk at the May 2011 Cassandra London meetup. It uses the phpcassa Cassandra client library for PHP.
You can view a podcast of the talk here and browse the slides here.
It was then updated in October 2011 for a presentation at the NoSQL Exchange 2011. This time we added extra features for tracking impressions and clicks, and for actually recommending ads!
http://wehaveyourkidneys.com/add.php?segment=<segmentCode>&expires=<numberOfSeconds>
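Where <segmentCode> is the code of the segment to add the user to, and <numberOfSeconds> is the number of seconds before that membership expires.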
There is also a pixel version for use in img tags.
http://pixel.wehaveyourkidneys.com/add.php?segment=<segmentCode>&expires=<numberOfSeconds>
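As a rough illustration, the two URLs above could be assembled like this. It is a Python sketch rather than part of the project's PHP code; the helper name build_add_url and the sample segment and expiry values are made up for the example, and only the URL shape comes from this README.

```python
from urllib.parse import urlencode

BASE_URL = "http://wehaveyourkidneys.com/add.php"
PIXEL_URL = "http://pixel.wehaveyourkidneys.com/add.php"


def build_add_url(segment, expires, pixel=False):
    """Return the add.php URL for a segment code and an expiry in seconds.

    Hypothetical helper: the function name and arguments are assumptions.
    """
    base = PIXEL_URL if pixel else BASE_URL
    # urlencode keeps the dict's insertion order: segment first, then expires.
    query = urlencode({"segment": segment, "expires": int(expires)})
    return f"{base}?{query}"


if __name__ == "__main__":
    # Plain version, e.g. for a redirect or script include.
    print(build_add_url("sports-fans", 3600))
    # Pixel version, suitable for the src attribute of an img tag.
    print(build_add_url("sports-fans", 3600, pixel=True))
```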
http://wehaveyourkidneys.com/show.php
One of the things that excites me about Brisk* is the ease with which you can analyse data in Cassandra. Brisk provides Hive support for Cassandra (an SQL-like interface for MapReduce jobs), and lets you both read data from and write data to Cassandra. During the talk I demonstrated the following queries.
To run my queries, I created a Hive external table for my user ColumnFamily. This used to be a requirement, but the latest version of Brisk automatically hooks up to any Cassandra CFs. What it won't do automatically is map to nice column names (via the "mapping" parameter below), so it is sometimes handy to create an external table with a new name.
USE whyk;
CREATE EXTERNAL TABLE tempUsers
(userUuid string, segmentId string, value string)
STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
WITH SERDEPROPERTIES (
"cassandra.columns.mapping" = ":key,:column,:value",
"cassandra.cf.name" = "users"
);
http://www.datastax.com/docs/1.0/datastax_enterprise/about_hive#reference-serdeproperties-and-tblproperties
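To make the ":key,:column,:value" mapping more concrete: with this mapping, each column of a Cassandra row shows up as its own Hive row of (userUuid, segmentId, value). The Python sketch below mimics that flattening on an in-memory dict. The sample keys, column names and values are invented for illustration and are not taken from the real users ColumnFamily.

```python
def flatten_column_family(rows):
    """Yield one (key, column, value) tuple per Cassandra column.

    `rows` maps a row key to a dict of {column name: value}, which is
    a simplified stand-in for a ColumnFamily.
    """
    for row_key, columns in rows.items():
        for column_name, value in columns.items():
            yield (row_key, column_name, value)


# Made-up sample data: two users, three segment memberships in total.
users = {
    "user-1": {"seg:sports": "1", "seg:travel": "1"},
    "user-2": {"seg:sports": "1"},
}

# Each tuple corresponds to one row of the tempUsers table.
temp_users = list(flatten_column_family(users))

if __name__ == "__main__":
    for row in temp_users:
        print(row)
```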
I could then count up the number of users in each segment:
SELECT segmentId, count(1) AS total
FROM tempUsers
GROUP BY segmentId
ORDER BY total DESC;
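For readers less familiar with Hive, here is roughly what that query computes, sketched in Python over a handful of made-up (userUuid, segmentId, value) rows. The function name and the sample data are assumptions for the example; the real work is done by Hive as a MapReduce job.

```python
from collections import Counter


def users_per_segment(rows):
    """Count rows per segmentId, highest count first."""
    counts = Counter(segment_id for _user, segment_id, _value in rows)
    # most_common() orders by count descending, like ORDER BY total DESC.
    return counts.most_common()


# Invented sample rows in the shape of the tempUsers table.
sample_rows = [
    ("user-1", "seg:sports", "1"),
    ("user-1", "seg:travel", "1"),
    ("user-2", "seg:sports", "1"),
    ("user-3", "seg:sports", "1"),
]

if __name__ == "__main__":
    for segment_id, total in users_per_segment(sample_rows):
        print(segment_id, total)
```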
I could also calculate the mean and the sample standard deviation of the number of segments that each user belongs to:
SELECT avg(num), stddev_samp(num)
FROM (
SELECT count(1) AS num
FROM tempUsers
GROUP BY userUuid
) tmp;
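The same two-step calculation (count segments per user, then summarise those counts) can be sketched in Python. The sample rows are invented. statistics.stdev uses the n - 1 denominator, which matches what stddev_samp computes.

```python
from collections import Counter
from statistics import mean, stdev


def segment_count_stats(rows):
    """Return (mean, sample standard deviation) of segments per user."""
    # Inner query: GROUP BY userUuid, count(1) AS num
    per_user = Counter(user for user, _segment, _value in rows)
    counts = list(per_user.values())
    # Outer query: avg(num), stddev_samp(num). stdev needs >= 2 users.
    return mean(counts), stdev(counts)


# Invented sample rows: user-1 has 3 segments, user-2 has 1, user-3 has 2.
sample_rows = [
    ("user-1", "seg:sports", "1"),
    ("user-1", "seg:travel", "1"),
    ("user-1", "seg:music", "1"),
    ("user-2", "seg:sports", "1"),
    ("user-3", "seg:travel", "1"),
    ("user-3", "seg:music", "1"),
]

if __name__ == "__main__":
    average, deviation = segment_count_stats(sample_rows)
    print(average, deviation)
```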
Finally, you can just go with DataStax Enterprise. There is a version available specifically for startups, and a free trial for 30 days.
http://www.datastax.com/download
I went for an Ubuntu Lucid box on Amazon, with Brisk compiled from source.
https://github.com/steeve/brisk
sudo apt-get update
sudo apt-get install git-core ant openjdk-6-jdk libmaven-compiler-plugin-java
git clone git://github.com/steeve/brisk.git
cd brisk
ant
./bin/brisk cassandra -t
sudo apt-get install apache2 php5 php-pear php5-dev uuid-dev
cd /var/www
git clone git://github.com/davegardnerisme/we-have-your-kidneys.git
cd we-have-your-kidneys
sudo ln -s /var/www/we-have-your-kidneys/vhost/wehaveyourkidneys.com.vhost \
/etc/apache2/sites-available/wehaveyourkidneys.com.vhost
sudo a2ensite wehaveyourkidneys.com.vhost
git submodule init
git submodule update
sudo pecl install uuid
echo 'extension=uuid.so' | sudo tee /etc/php5/conf.d/uuid.ini
sudo service apache2 reload
This project was written to make it obvious which Cassandra commands are being executed. Principles like DRY (Don't Repeat Yourself) have been deliberately ignored: the idea is that any given file should be easy to read purely in terms of how it reads from or writes to Cassandra.
I chose PHP because I am most familiar with it. I am not suggesting it is the most suitable language for this kind of application.
Segments are limited to alphanumeric characters and minus signs, which leaves other special characters free for constructing composite keys (for row and column names). Specifically, we prepend segment names with seg: and we use | to split up composite column names.
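The naming rule can be sketched as follows. This is an illustrative Python version, not the project's PHP code: the function names are hypothetical, "alphanumeric" is assumed to mean ASCII letters and digits, and the parts joined into a composite column name are made up, since the actual column layout is not described here.

```python
import re

# Assumption: "alphanumeric" means ASCII letters and digits only.
SEGMENT_PATTERN = re.compile(r"[A-Za-z0-9-]+")
SEGMENT_PREFIX = "seg:"
COLUMN_SEPARATOR = "|"


def is_valid_segment(name):
    """True if the name uses only alphanumerics and minus signs."""
    return SEGMENT_PATTERN.fullmatch(name) is not None


def segment_key(name):
    """Prefix a validated segment name with 'seg:'."""
    if not is_valid_segment(name):
        raise ValueError(f"invalid segment name: {name!r}")
    return SEGMENT_PREFIX + name


def composite_column(*parts):
    """Join parts into one composite column name using '|'."""
    return COLUMN_SEPARATOR.join(parts)


def split_composite_column(column):
    """Split a composite column name back into its parts."""
    return column.split(COLUMN_SEPARATOR)


if __name__ == "__main__":
    key = segment_key("sports-fans")
    # The second part is a made-up example component.
    column = composite_column(key, "20111001")
    print(key, column, split_composite_column(column))
```

Because ":" and "|" can never appear inside a valid segment name, splitting on "|" always recovers the original parts unambiguously.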