Skip to content

Latest commit

 

History

History
47 lines (38 loc) · 1.31 KB

README.md

File metadata and controls

47 lines (38 loc) · 1.31 KB

Fastbloom

A lightweight but fast Bloomfilter written in Python(2.7).

Introduction

Based on mmap and MurMur, Spooky hash functions as the base hashes. Use double hashing to reduce the number of hashes to two.

Requirements

Install boost and boost-python

  • Ubuntu

sudo apt-get install libboost-all-dev

  • Centos

sudo yum install boost-devel

  • OSX

brew install boost boost-python

Install pyhash

sudo pip install pyhash

Install fastbloom

sudo pip install fastbloom

Examples

>>> from fastbloom import BloomFilter
>>> filter_ = BloomFilter(10000, 0.001) # set size of input 1000, error rate 0.1%
>>> filter_.add('www.google.com')
>>> 'www.google.com' in filter_
True
>>> 'github.com' in filter_
False

Benchmark

When input_size set to 1000000, accepted_error_rate set to 0.01%

  • Memory consumption: 4 MB
  • Add operation: 7322.42 ops
  • Check operation: 24788.79 ops
  • Actual fault positive rate: 0.0000

You can just run the benchmark.py to see the actual benchmarks

Todo

  • add a faster bloomfilter
  • add save and restore functions(as files)
  • write a scalable bloomfilter