Create reusable Scraping and Parsing classes #18

Open
jamesshannon opened this issue Nov 2, 2020 · 1 comment · May be fixed by #21

Comments

@jamesshannon

There is a lot of repeated code across the counties' scraper and parser scripts, and a lot of code that should be repeated but isn't (like retrying on transient network failures). Additionally, a handful of counties use shared systems (e.g., https://www.mptsweb.net/).
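Retrying is a good example of logic that could live in one shared helper instead of being copied (or forgotten) in each county script. Below is a minimal sketch using only the standard library. It assumes that connection errors, timeouts, and HTTP 5xx/429 responses count as transient, while other HTTP errors (such as 404) should fail immediately. `is_transient`, `with_retries`, and `fetch` are hypothetical names, not existing project code.

```python
import time
import urllib.error
import urllib.request


def is_transient(exc):
    """Assumed policy: retry network-level failures and HTTP 5xx/429 only."""
    # HTTPError subclasses URLError, so it must be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        return exc.code >= 500 or exc.code == 429
    return isinstance(exc, (urllib.error.URLError, TimeoutError, ConnectionError))


def with_retries(func, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func() up to `attempts` times, backing off exponentially
    (base_delay, 2*base_delay, 4*base_delay, ...) between transient failures."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception as exc:
            # Give up on the last attempt or on a non-transient error.
            if attempt == attempts - 1 or not is_transient(exc):
                raise
            sleep(base_delay * 2 ** attempt)


def fetch(url, timeout=30):
    """Download a page and return it as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

A county script would then call something like `with_retries(lambda: fetch(url))`. The `sleep` parameter is injectable only so the backoff can be tested without actually waiting.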

Ideally, a county script would instantiate a class with a few variables (CSV file location, URL template, etc.), define a parse_html() function, and call a method that takes care of everything else.
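A rough sketch of that shape follows, under some assumptions: the base class name `CountyScraper`, the `run()` method, the `{parcel_id}` placeholder in the URL template, and the injectable `fetch` callable are all hypothetical, and `ExampleCounty` parses a made-up page format. The county-specific part shrinks to the constructor arguments plus `parse_html()`.

```python
import abc
import csv
import re
import urllib.request


class CountyScraper(abc.ABC):
    """Hypothetical base class: a county script supplies a CSV path and a URL
    template, implements parse_html(), and calls run()."""

    def __init__(self, csv_path, url_template, fetch=None):
        self.csv_path = csv_path          # where the output CSV is written
        self.url_template = url_template  # must contain a {parcel_id} placeholder
        # fetch(url) -> str; injectable so tests can avoid the network.
        self.fetch = fetch or self.fetch_url

    @staticmethod
    def fetch_url(url):
        # Plain download; shared retry/backoff logic would be added here.
        with urllib.request.urlopen(url, timeout=30) as resp:
            return resp.read().decode("utf-8", errors="replace")

    @abc.abstractmethod
    def parse_html(self, parcel_id, html):
        """Return a dict of output columns for one parcel."""

    def run(self, parcel_ids):
        """Fetch and parse each parcel, then write all rows to csv_path."""
        rows = []
        for parcel_id in parcel_ids:
            html = self.fetch(self.url_template.format(parcel_id=parcel_id))
            rows.append(self.parse_html(parcel_id, html))
        if rows:
            with open(self.csv_path, "w", newline="") as f:
                writer = csv.DictWriter(f, fieldnames=list(rows[0]))
                writer.writeheader()
                writer.writerows(rows)
        return rows


class ExampleCounty(CountyScraper):
    """Hypothetical county: the page format and regex are made up."""

    def parse_html(self, parcel_id, html):
        match = re.search(r"Total Tax: \$([\d,.]+)", html)
        tax = match.group(1).replace(",", "") if match else ""
        return {"parcel_id": parcel_id, "tax": tax}
```

Counties that use a shared system such as mptsweb could share a single subclass like `ExampleCounty` and differ only in the arguments they pass to its constructor.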

I'm working on this as part of Placer (#17). I'm creating this issue to track and discuss the work.

@typpo One question I have so far relates to my Placer work. You recommend the geojson script step. It seems like it would be easier to do this in Python (with, e.g., pyshp) to minimize the number of steps someone has to follow. Have you found that the geojson script is better for some reason?

@typpo
Owner

typpo commented Nov 2, 2020

Reusable classes would be very useful! Thanks for getting this started.

Using pyshp would be nicer and cleaner than the geojson conversion. I'm in the habit of converting to geojson first just so I can see what kind of data is in the shapefile (for example: Does it have all the required address info? Is there zoning info? Does it use lat/lng or XY coordinates?).
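The same first look at a shapefile could be done directly in Python with pyshp, skipping the geojson step. This is a minimal sketch written against the pyshp 2.x `Reader` API (`fields`, `shapeTypeName`, `bbox`, `len()`). `summarize_shapefile` and `looks_like_lat_lng` are hypothetical helpers. Matching "ZON" in field names is an assumed naming convention, and the lat/lng check is only a bounding-box heuristic; when a `.prj` file is present, it is the authoritative record of the coordinate system.

```python
def looks_like_lat_lng(bbox):
    """Heuristic: True if the bounding box (xmin, ymin, xmax, ymax) fits within
    longitude/latitude degree ranges. Projected XY coordinates (e.g. State
    Plane feet or UTM meters) are normally far outside these ranges."""
    xmin, ymin, xmax, ymax = bbox
    return -180 <= xmin <= xmax <= 180 and -90 <= ymin <= ymax <= 90


def summarize_shapefile(path):
    """Return a dict describing a shapefile's geometry, size, and attributes."""
    import shapefile  # pyshp; imported lazily so looks_like_lat_lng needs no deps

    with shapefile.Reader(path) as sf:
        # sf.fields[0] is pyshp's internal DeletionFlag field, so skip it.
        field_names = [field[0] for field in sf.fields[1:]]
        return {
            "shape_type": sf.shapeTypeName,
            "num_records": len(sf),
            "fields": field_names,  # inspect for address columns
            # Assumed naming convention: zoning columns usually contain "ZON".
            "zoning_fields": [n for n in field_names if "ZON" in n.upper()],
            "looks_like_lat_lng": looks_like_lat_lng(sf.bbox),
        }
```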

jamesshannon linked a pull request on Nov 4, 2020 that will close this issue