search_in_memory_php


search_in_memory_php is a free project written mainly in PHP.

SearchInMemory is research into full-text (fault-tolerant) search


SearchInMemory is my first take on how full-text search engines, or fault-tolerant searches (FTS), work and how they should work. It began as research into how this stuff works.

It was also a test: is it hard to write your own full-text search engine?

You may consider it an experiment to uncover how full-text search engines work.

I also tried a BinaryTree implementation of the index, but it did not work as well for me as HashIndex.
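
To illustrate the hash-based idea, here is a minimal sketch of an inverted index kept in a plain PHP array (which is a hash map). It is not the project's HashIndex code: the function names `tokenize`, `buildIndex` and `searchIndex`, the tokenization rule and the sample documents are all assumptions made up for this example.

```php
<?php
// Hypothetical sketch of a hash-based inverted index; not the project's HashIndex class.

/** Split text into lowercase alphanumeric tokens (an assumed, very simple tokenizer). */
function tokenize(string $text): array
{
    $words = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    return $words === false ? [] : $words;
}

/** Build the inverted index: token => list of ids of documents containing it. */
function buildIndex(array $documents): array
{
    $index = [];
    foreach ($documents as $id => $text) {
        // array_unique() keeps each document id at most once per token
        foreach (array_unique(tokenize($text)) as $token) {
            $index[$token][] = $id;
        }
    }
    return $index;
}

/** Return the ids of documents that contain every token of the query (AND search). */
function searchIndex(array $index, string $query): array
{
    $result = null;
    foreach (tokenize($query) as $token) {
        $ids = $index[$token] ?? [];
        $result = $result === null ? $ids : array_values(array_intersect($result, $ids));
    }
    return $result ?? [];
}

$docs = [
    1 => 'Billy Bob plays guitar',
    2 => 'Bob likes full text search',
    3 => 'Search engines index text',
];
$index = buildIndex($docs);
print_r(searchIndex($index, 'text search')); // ids of documents containing both words
```

A lookup is a single hash access per query token, which is probably why a hash map feels simpler here than a tree; a tree would mainly pay off for prefix or range queries.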

Even though I have finished a really large part of the code, there is still room for improvement (you can use this list as a roadmap for your own FTS), such as:

n-gram indexing and searching with Levenshtein sorting
wildcard searching
excluding some phrases from results
caching generated indexes in some memory-based structure on disk
stemming words to other forms
improved and more complex faceting
a socket connector for the searcher at the Unix level
index updating and deleting particular records
whole phrases in " " signs, such as "billy bob", to match exactly this phrase (requires improving the inverted indexes in HashIndex)
import/export of index data in JSON/XML/CSV formats
tweaking results based on special criteria or queries
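
As a starting point for the first roadmap item, n-gram indexing with Levenshtein sorting could look roughly like the sketch below. This is only one possible approach, not something implemented in the project: the names `ngrams`, `buildNgramIndex` and `fuzzySearch`, the trigram size and the `_` padding are assumptions chosen for the example. The only built-in doing the heavy lifting is PHP's `levenshtein()`.

```php
<?php
// Hypothetical sketch of n-gram candidate lookup plus Levenshtein ranking.
// None of these function names come from the project.

/** Split a word into overlapping n-grams, padded with "_" so word edges also produce grams. */
function ngrams(string $word, int $n = 3): array
{
    $padded = str_repeat('_', $n - 1) . strtolower($word) . str_repeat('_', $n - 1);
    $grams = [];
    for ($i = 0; $i <= strlen($padded) - $n; $i++) {
        $grams[] = substr($padded, $i, $n);
    }
    return array_unique($grams);
}

/** Map each n-gram to the list of vocabulary words that contain it. */
function buildNgramIndex(array $words, int $n = 3): array
{
    $index = [];
    foreach ($words as $word) {
        foreach (ngrams($word, $n) as $gram) {
            $index[$gram][] = $word;
        }
    }
    return $index;
}

/** Collect words sharing at least one n-gram with the query, closest Levenshtein distance first. */
function fuzzySearch(array $index, string $query, int $n = 3): array
{
    $candidates = [];
    foreach (ngrams($query, $n) as $gram) {
        foreach ($index[$gram] ?? [] as $word) {
            $candidates[$word] = levenshtein(strtolower($query), strtolower($word));
        }
    }
    asort($candidates); // smallest edit distance first
    return array_keys($candidates);
}

$ngramIndex = buildNgramIndex(['search', 'research', 'memory', 'starch']);
print_r(fuzzySearch($ngramIndex, 'serch')); // the misspelled query still finds similar words
```

The n-gram lookup only narrows down the candidates; the edit distance is then computed for that small set instead of the whole vocabulary.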

Cheers, Rafal "RaVbaker" Piekarski

Contact: web: http://about.me/ravbaker twitter: ravbaker github: https://github.com/RaVbaker

Good starting points for your own research:

http://en.wikipedia.org/wiki/Levenshtein_distance - minimal background on comparing similar words
http://en.wikipedia.org/wiki/Inverted_index - a good start for building indexes, especially full inverted indexes
http://en.wikipedia.org/wiki/N-gram - N-grams: what they are and why they matter
http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html - a fairly old post from the Google Research department about n-grams in practice, with a large available dataset.
http://ngrams.googlelabs.com/ - a practical use of n-grams: the Books Ngram Viewer from Google.
http://today.java.net/pub/a/today/2005/08/09/didyoumean.html - a great article about how "Did you mean" works with Lucene. Very inspiring, but mainly about Java.
http://framework.zend.com/code/filedetails.php?repname=Zend+Framework&path=%2Ftrunk%2Flibrary%2FZend%2FSearch%2FLucene.php - source code from Zend Framework with their PHP implementation of Lucene. A nice source of ideas.
http://www.ir.uwaterloo.ca/book/ - A book for when you think BIG. It is about building your own full-text search service that scales almost like Google/Bing. Lots of theory, C/C++ code, and algorithms. For the very beginning I suggest reading the excerpt from chapter 4, Static Inverted Indices - http://www.ir.uwaterloo.ca/book/04-static-inverted-indices.pdf
Helpful PHP functions: http://www.php.net/manual/en/function.levenshtein.php, http://www.php.net/manual/en/function.metaphone.php, http://php.net/manual/en/function.soundex.php, http://docs.php.net/manual/en/language.types.array.php :)
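
A quick way to get a feel for the helper functions listed above is to run them on a few words. The snippet below is just a small demonstration; the sample words and variable names are my own picks, not data from the project.

```php
<?php
// Small demo of PHP's built-in string similarity helpers; sample words are arbitrary.

// levenshtein(): number of single-character edits needed to turn one string into the other
$distance = levenshtein('kitten', 'sitting');
echo "levenshtein(kitten, sitting) = $distance\n";

// soundex(): words that sound alike in English get the same four-character key
$soundexA = soundex('Robert');
$soundexB = soundex('Rupert');
echo "soundex(Robert) = $soundexA, soundex(Rupert) = $soundexB\n";

// metaphone(): a more detailed phonetic key, also aimed at English pronunciation
echo 'metaphone(Thompson) = ' . metaphone('Thompson') . "\n";

// PHP arrays are ordered hash maps, so they can group words by phonetic key directly
$byKey = [];
foreach (['Robert', 'Rupert', 'Rafal'] as $name) {
    $byKey[soundex($name)][] = $name;
}
print_r($byKey);
```

Grouping by a phonetic key like this is one cheap way to find "sounds like" candidates before ranking them with `levenshtein()`.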
