This repo contains my attempts to re-create some classical probabilistic algorithms. The main idea of probabilistic algorithms is to sacrifice a tiny amount of precision so that the algorithms can process a large amount of data.
- Bloom Filter: An algorithm that can quickly identify whether a data is present in a huge database. A common use case is the identification of malicious url with web browser.
- Count sketch: An algorithm to summarize some key features of an on-going data stream (such as the number of the same Twitter's tweet clicks in the last hour)
- Locality Sensitivity Hashing (LSH): An algorithm to quickly retrieve similar elements / data to the user's query.