Home > bloom-filter

bloom-filter

Bloom-filter is a project mainly written in Scala, it's free.

This is an implementation of Bloom Filter, especially CountingBloomFilter.

This ia an implementation of Bloom Filter, especially Counting Bloom Filter.

What is Bloom Filter?: see http://en.wikipedia.org/wiki/Bloom_filter.

How to build and run the sample:

  1. set up Simple Build Tool(http://code.google.com/p/simple-build-tool/)
  2. run sbt
  3. update
  4. test
  5. doc (optional)
  6. run

Sample Scenario: 0.initialize a CountingBloomFilter with input max false positive probability and input expected number of items. 1.add random distinct 2n(twice of your input) strings to the filter 2.discard n strings in added strings from the filter 3.check that the filter contains each discarded string(i.e. measure false positive prob.). 4.outputs false positive results and average execution times.

Sample Scenario Outputs: [FilterInfo] filter-bit-size:2019773, number of hash:14, max false positive prob.: 0.0001000 [Adding 200000 distinct random strings].......... done. [Discarding 100000 strings].......... done. [Detected false positive results] hfG Thq4 zdfR zJcz 6Gww iHoc ovam jumy wvzw [mesured false positive prob.] 0.0000900 (9 false positives found in 100000 trials) [average time for add] 0.007[ms] (average for 200000 times trials) [average time for discard] 0.006[ms] (average for 100000 times trials) [average time for check] 0.006[ms] (average for 100000 times trials)

Sample SBT Outputs: $sbt [info] Building project bloom-filter 1.0 against Scala 2.8.1 [info] using BloomFilterProject with sbt 0.7.5 and Scala 2.7.7

test [info] [info] == compile == [info] Source analysis: 2 new/modified, 0 indirectly invalidated, 0 removed. [info] Compiling main sources... [info] Compilation successful. [info] Post-analysis: 37 classes. [info] == compile == [info] [info] == copy-resources == [info] == copy-resources == [info] [info] == copy-test-resources == [info] == copy-test-resources == [info] [info] == test-compile == [info] Source analysis: 1 new/modified, 0 indirectly invalidated, 0 removed. [info] Compiling test sources... [info] Compilation successful. [info] Post-analysis: 22 classes. [info] == test-compile == [info] [info] == test-start == [info] == test-start == [info] [info] == org.everpeace.util.CountingBloomFilterTest == [info] + Counts does not change after add and discard. [info] + No false negative results. [FilterInfo] filter-bit-size:2019773, number of hash:14, max false positive prob.: 0.0001000 [mesured false positive prob.] 0.0000900 (9 false positives found in 100000 trials) [info] + False positive prob. is less than max false positive prob. set. [info] == org.everpeace.util.CountingBloomFilterTest == [info] [info] == test-complete == [info] == test-complete == [info] [info] == test-finish == [info] Passed: : Total 3, Failed 0, Errors 0, Passed 3, Skipped 0 [info]
[info] All tests PASSED. [info] == test-finish == [info] [info] == Test cleanup 1 == [info] Deleting directory /var/folders/LA/LAiTnWCkECC26KLkxKXrIE+++TI/-Tmp-/sbt_39625890 [info] == Test cleanup 1 == [info] [info] == test-cleanup == [info] == test-cleanup == [info] [info] == test == [info] == test == [success] Successful. [info] [info] Total time: 19 s, completed 2011/05/04 22:46:21 run 0.0001 100000 [info] [info] == copy-resources == [info] == copy-resources == [info] [info] == compile == [info] Source analysis: 0 new/modified, 0 indirectly invalidated, 0 removed. [info] Compiling main sources... [info] Nothing to compile. [info] Post-analysis: 37 classes. [info] == compile == [info] [info] == run == [info] Running org.everpeace.util.CountingBloomFilterSample 0.0001 100000

Sample Senario: 0.initialize a CountingBloomFilter with max false positive probability is 0.0001000 and expected number of items is 100000 1.add distinct random 200000 strings the filter (twice of your input) 2.discard 100000 strings of added (your input) 3.check that the filter contains each filter. (outputs only false positive results)

[FilterInfo] filter-bit-size:2019773, number of hash:14, max false positive prob.: 0.0001000 [Adding 200000 distinct random strings].......... done. [Discarding 100000 strings].......... done. [Detected false positive results] hfG Thq4 zdfR zJcz 6Gww iHoc ovam jumy wvzw [mesured false positive prob.] 0.0000900 (9 false positives found in 100000 trials) [average time for add] 0.007[ms] (average for 200000 times trials) [average time for discard] 0.006[ms] (average for 100000 times trials) [average time for check] 0.006[ms] (average for 100000 times trials)

[info] == run == [success] Successful. [info] [info] Total time: 5 s, completed 2011/05/04 22:46:36

Previous:halleck