Functions that extract all Darwin cut notes based on image dimensions and discard full-page (non-cut) notes. Each image's dimensions are compared with the mean image dimensions within its folder. Written in PySpark for efficient parallel processing, since the dataset is about 350 GB and about 60k images.
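The folder-mean comparison described above can be sketched in plain Python. This is only an illustration of the idea, not the repository's PySpark code: the function name `find_cut_notes`, the record layout `(folder, filename, width, height)`, the `ratio` parameter, and the rule that a cut note is smaller than the folder mean in at least one dimension are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def find_cut_notes(records, ratio=1.0):
    """Return (folder, filename) pairs for images flagged as cut notes.

    records: iterable of (folder, filename, width, height) tuples.
    ratio:   hypothetical tuning knob; an image is flagged when its width
             or height is below ratio * the folder mean for that dimension.
    """
    # Group images by the folder they live in.
    by_folder = defaultdict(list)
    for folder, name, width, height in records:
        by_folder[folder].append((name, width, height))

    cut_notes = []
    for folder, images in by_folder.items():
        # Mean dimensions are computed per folder, not over the whole dataset.
        mean_w = mean(w for _, w, _ in images)
        mean_h = mean(h for _, _, h in images)
        for name, w, h in images:
            # Assumption: a cut note is smaller than the folder mean
            # in at least one dimension; full-page notes are not.
            if w < ratio * mean_w or h < ratio * mean_h:
                cut_notes.append((folder, name))
    return cut_notes


records = [
    ("folder_a", "page_1.jpg", 1000, 1400),
    ("folder_a", "page_2.jpg", 1000, 1400),
    ("folder_a", "cut_1.jpg", 1000, 400),
    ("folder_b", "page_3.jpg", 800, 1200),
]
result = find_cut_notes(records)
```

In a Spark setting the same logic would presumably map onto a per-folder aggregation followed by a filter, but the exact threshold and grouping used by the repository are not stated here.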
HackTheStacks/darwin-image-preprocessing