Search_in_memory_php is a free project written mainly in PHP.
SearchInMemory is research into full-text (fault-tolerant) search
SearchInMemory is my first take on how full-text search engines and fault-tolerant searches (FTS) work, and how they should work. It began as research into how things work.
It was also a test: is it hard to write your own full-text search engine?
You may consider it an experiment to uncover how full-text search engines work.
I also tried a BinaryTree implementation of the index, but it did not work as well for me as HashIndex.
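To illustrate the hash-based approach, here is a minimal sketch of an inverted index built on a plain PHP array, which is itself a hash map. The function names (tokenize, buildIndex, lookup) and the sample documents are assumptions made for this example; they are not taken from the SearchInMemory source, and the real HashIndex may be organised differently.

```php
<?php
// Illustrative sketch only: these helper names are hypothetical and do not
// come from the SearchInMemory code base.

/** Split text into lowercase word tokens (ASCII letters and digits only). */
function tokenize(string $text): array
{
    return preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

/** Build an inverted index: word => list of ids of documents containing it. */
function buildIndex(array $documents): array
{
    $index = [];
    foreach ($documents as $id => $text) {
        // array_unique() keeps each document id at most once per word.
        foreach (array_unique(tokenize($text)) as $word) {
            $index[$word][] = $id;
        }
    }
    return $index;
}

/** Exact lookup: a single hash access per query word. */
function lookup(array $index, string $word): array
{
    return $index[strtolower($word)] ?? [];
}

// Sample documents, made up for this example.
$index = buildIndex([
    1 => 'Full text search in memory',
    2 => 'Fault-tolerant search engines',
]);

print_r(lookup($index, 'search'));
```

With a hash, an exact word lookup costs one array access, which may be part of why it felt better than a tree here. A tree keeps keys ordered, which only pays off for prefix or range queries.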
But even though I finished a really large part of the code, there is still room for improvement (you can use this as a roadmap for your own FTS), like:
Cheers, Rafal "RaVbaker" Piekarski
Contact: web: http://about.me/ravbaker twitter: ravbaker github: https://github.com/RaVbaker
A great starting point for your own research:
http://en.wikipedia.org/wiki/Levenshtein_distance - the minimum you need to know about comparing similar words
http://en.wikipedia.org/wiki/Inverted_index - a good start for building indexes, especially full inverted indexes
http://en.wikipedia.org/wiki/N-gram - N-grams: what they are and why
http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html - a fairly old post from the Google Research department about n-grams in practice, with a large dataset available.
http://ngrams.googlelabs.com/ - a practical use of n-grams: the Books Ngram Viewer from Google.
http://today.java.net/pub/a/today/2005/08/09/didyoumean.html - a great article about how "Did you mean" works with Lucene. A very inspiring post, though mainly about Java.
http://framework.zend.com/code/filedetails.php?repname=Zend+Framework&path=%2Ftrunk%2Flibrary%2FZend%2FSearch%2FLucene.php - source code from Zend Framework with their PHP implementation of Lucene. A nice source of ideas.
http://www.ir.uwaterloo.ca/book/ - A book for when you think BIG. It is about building your own full-text search service that scales almost like Google/Bing. Lots of theory, C/C++ code, and algorithms. To begin with, I suggest reading the excerpt from chapter 4, Static Inverted Indices - http://www.ir.uwaterloo.ca/book/04-static-inverted-indices.pdf
Helpful PHP functions: http://www.php.net/manual/en/function.levenshtein.php, http://www.php.net/manual/en/function.metaphone.php, http://php.net/manual/en/function.soundex.php, http://docs.php.net/manual/en/language.types.array.php :)
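As a rough idea of how these built-in functions can support fault-tolerant matching, here is a small sketch. The helper names (suggest, soundsAlike), the sample vocabulary, and the default limit of 2 edits are assumptions made for illustration; this is not how SearchInMemory itself does it. soundex() is used for the phonetic check, and metaphone() could be swapped in the same way.

```php
<?php
// Illustrative sketch only: suggest() and soundsAlike() are hypothetical
// helpers, and the distance limit of 2 is an arbitrary assumption.

/**
 * Return the vocabulary word closest to $query by Levenshtein distance,
 * or null when no word is within $maxDistance edits.
 */
function suggest(string $query, array $vocabulary, int $maxDistance = 2): ?string
{
    $query = strtolower($query);
    $best = null;
    $bestDistance = $maxDistance + 1; // anything at or above this is rejected

    foreach ($vocabulary as $word) {
        $distance = levenshtein($query, strtolower($word));
        if ($distance < $bestDistance) {
            $best = $word;
            $bestDistance = $distance;
        }
    }
    return $best;
}

/** Phonetic comparison: true when both words share the same soundex key. */
function soundsAlike(string $a, string $b): bool
{
    return soundex($a) === soundex($b);
}

// Sample vocabulary, made up for this example.
$vocabulary = ['search', 'memory', 'index', 'engine'];

var_dump(suggest('serach', $vocabulary));
var_dump(soundsAlike('Robert', 'Rupert'));
```

Scanning the whole vocabulary for every query is fine for an experiment. A larger index would probably narrow the candidates first, for example with n-grams, before computing edit distances.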