Demonstration app to show how Dedupe might be used as a geocoder
Part of the Dedupe.io cloud service and open source toolset for de-duplicating and finding fuzzy matches in your data.
Install OS level dependencies:
- Python 3.4
- PostgreSQL 9.4+
Install app requirements
We recommend using virtualenv and virtualenvwrapper for working in a virtualized development environment. Read how to set up virtualenv.
Once you have virtualenvwrapper set up, run:
mkvirtualenv dedupe-geocoder
git clone https://github.com/datamade/dedupe-geocoder.git
cd dedupe-geocoder
pip install -r requirements.txt
cp geocoder/app_config.py.example geocoder/app_config.py
In app_config.py, put your Postgres user in DB_USER and your password in DB_PW.
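As a rough illustration of how these two settings might be used, the sketch below builds a PostgreSQL connection URL from them. Only DB_USER and DB_PW come from the instructions above; DB_HOST, DB_PORT, DB_NAME, DB_CONN, and the build_db_url helper are hypothetical names chosen for this example, and the real app_config.py may be organized differently.

```python
from urllib.parse import quote_plus

# DB_USER and DB_PW are the two settings named in this README.
# Everything else here is a hypothetical placeholder for illustration.
DB_USER = "postgres"
DB_PW = "s3cret/pw"
DB_HOST = "localhost"   # assumed: database runs on the same machine
DB_PORT = 5432          # assumed: PostgreSQL's default port
DB_NAME = "geocoder"    # the database name used with createdb in this README


def build_db_url(user, password, host, port, name):
    """Return a PostgreSQL connection URL, escaping special characters."""
    return "postgresql://{}:{}@{}:{}/{}".format(
        quote_plus(user), quote_plus(password), host, port, name
    )


DB_CONN = build_db_url(DB_USER, DB_PW, DB_HOST, DB_PORT, DB_NAME)
print(DB_CONN)
```

Escaping the user name and password with quote_plus keeps characters such as "/" or "@" in a password from breaking the URL.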
Afterwards, whenever you want to work on dedupe-geocoder, run:
workon dedupe-geocoder
Before we can run the website, we need to create a database.
createdb geocoder
Then, we run the loadAddresses.py script to download our data from the Cook County data portal.
python loadAddresses.py --download --load_data
This command takes between 15 and 45 minutes, depending on your internet connection.
You can run loadAddresses.py again to get the latest data from Cook County, add
more training data, or create a table of block keys for dedupe to use to match
new records. Useful flags are:
--download Download fresh address data.
--load_data Load downloaded address data into database.
--train Add more training data and save settings file.
--block After training, create the block table used by dedupe for matching.
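To make the flag behavior concrete, here is a minimal sketch of how a script could expose these four switches with argparse. It is not the actual loadAddresses.py; the build_parser and planned_steps helpers are hypothetical, and the step ordering is an assumption based on the descriptions above (blocking happens after training).

```python
import argparse


def build_parser():
    """Build a parser exposing the four flags described in this README."""
    parser = argparse.ArgumentParser(
        description="Sketch of a loader CLI (not the real loadAddresses.py)."
    )
    parser.add_argument("--download", action="store_true",
                        help="Download fresh address data.")
    parser.add_argument("--load_data", action="store_true",
                        help="Load downloaded address data into database.")
    parser.add_argument("--train", action="store_true",
                        help="Add more training data and save settings file.")
    parser.add_argument("--block", action="store_true",
                        help="After training, create the block table.")
    return parser


def planned_steps(args):
    """Return the requested steps in the order assumed for this sketch."""
    order = ["download", "load_data", "train", "block"]
    return [step for step in order if getattr(args, step)]


# Equivalent to: python loadAddresses.py --download --load_data
args = build_parser().parse_args(["--download", "--load_data"])
print(planned_steps(args))  # ['download', 'load_data']
```

Because each flag is an independent boolean switch, any combination can be passed in a single invocation.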
To run locally:
workon dedupe-geocoder
python runserver.py
Then navigate to http://localhost:5000/
- Eric van Zanten - developer
- Derek Eder - developer
- Forest Gregg - developer
- Cathy Deng - developer
If something is not behaving intuitively, it is a bug, and should be reported. Report it here: https://github.com/datamade/dedupe-geocoder/issues
- Fork the project.
- Make your feature addition or bug fix.
- Commit, do not mess with rakefile, version, or history.
- Send a pull request. Bonus points for topic branches.
Copyright (c) 2015 DataMade. Released under the MIT License.