Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Improved performace and correctness for raster union #48

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

marthinwurer
Copy link

Moved some stuff around and made some performance improvements to the merge functionality. There were a lot of extra array copies and instances where larger data types than needed were used.

The main improvements were:

  • selecting the correct numpy dtype for the arrays that were created
  • selecting a default no data value if there was none
  • creating a pre-filled array with the no data value instead of multiplying it with an array of ones
  • making the mask start out as an array of bools.

Testing done: Just on my own code, merging 24 rasters into a 43201x86401 array. It just barely works now instead of crashing because it was not able to allocate the arrays.

@marthinwurer
Copy link
Author

CI might be broken, it's erroring because it can't find libspatialindex_c, which is something that I noted in #27.

@ozak
Copy link
Owner

ozak commented Oct 10, 2019

Looks so much nicer. Thanks!

Would be good to test and make sure it still works "as expected". It is on the to-do list in #14. If you feel like contributing some tests that's be really great!

I'll try to see where that libspatialindex_c is coming from. I do not think the package uses it explicitly. It may be from one of the other GIS packages, like geopandas or pysal.

@marthinwurer
Copy link
Author

It looks like rtree from geopandas is where it's coming from. I can mock up some tests with some fake data and see if the outputs are the same. I'm pretty sure types are going to be different because they weren't handled before, but I could do something like a mse threshold between outputs to make sure that nothing wonky is happening.

@ozak
Copy link
Owner

ozak commented Oct 10, 2019

It seems in the CI with py3.5 is failing due to some matplotlib incompatibility with scikits-image. On py36 it is rtree which is needed by geopandas. Using conda install -c conda-forge georasters has been working so far without any of these issues, since all the versions of the packages are coherent. I guess pip is having problems due to the compatibility of packages.


# get parameters for union

datatype = None # this is the gdal datatype
Copy link
Owner

@ozak ozak Oct 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems the if condition could fail and then no datatype is assigned. Not sure whether having datatype=None will cause issues, especially when exporting using to_tiff. Did you check?

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants