Home > pyrandomprojection

pyrandomprojection

Pyrandomprojection is a project mainly written in Python, it's free.

Random projection library for Python, converting a dictionary to low-dimensional numpy matrix

pyrandomprojection

by Joseph Turian

Random projection library for Python, converting a dictionary to low-dimensional numpy vector

"Random projection is a simple geometric technique for reducing the dimensionality of a set of points in Euclidean space while preserving pairwise distances approximately" -Santosh Vempala http://www-math.mit.edu/~vempala/rp.html

See also: http://github.com/turian/random-indexing-wordrepresentations A more end-to-end package (rather than drop-in function) for specifically inducing word representations over a corpus, using random indexing (a specific application of random projection).

WARNING:

  • Read warnings in common.gaussian and common.deterministicrandom: http://github.com/turian/common/blob/master/gaussian.py http://github.com/turian/common/blob/master/deterministicrandom.py that indicate potential issues with the RNG and hash function.

NOTE:

  • The runtime is typically bounded by the number of calls to randomrow. If you have a single instance, simply call project() on it. If you have a list of instances, you shouldn't just call project on each instance, because that will be #nonzeros * randomrow calls. Instead, iterate over the columns (i.e. iterate feature-major order, not instance-major order), so you only do one randomrow op per feature type. There is example code that does this here: http://github.com/glorotxa/DeepANN/blob/master/exp_scripts/randomprojection.py

import math

import murmur

TODO: Need test suite to make sure that values are consistent across architectures.

REQUIREMENTS:

  • numpy Output is put in a (dense) numpy vector

  • My python common library: http://github.com/turian/common

    which in turn will require:
    
    * Murmur:
        easy_install murmur
        http://pypi.python.org/pypi/Murmur/
        Provides fast murmur hashes for strings, files, and ziped files.