-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
TRT-571 - Update net2cog to process multiple variables. #38
base: develop
Are you sure you want to change the base?
Conversation
net2cog/netcdf_convert.py
Outdated
raise Net2CogError(error) from error | ||
except LookupError as err: | ||
logger.info("Variable %s cannot be converted to tif: %s", variable_name, err) | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that return None
is the best thing for each of these cases, but it was more of an equivalent to the previous continue
when there was a loop in this function. I'd probably prefer to raise an exception instead in a few of these return None
instances. What do you think @frankinspace?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I think it makes sense to raise an exception here. Perhaps create a custom FailedConversion
exception or similar so that we could in the future add custom error handling in netcdf_convert_harmony
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently the Net2CogError
was only being raised within this function, so I opted to use that. Happy to make a child of that exception type, though, if that feels better.
if not var_list: | ||
# Empty list means "all" variables, so get all variables in | ||
# the `xarray.Dataset`. | ||
var_list = list(xds.data_vars.keys()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is something I wanted to call out.
I think using xarray.Dataset.data_vars
will essentially prevent an "all" request from trying to convert coordinate variables, so it feels like the right thing to choose, instead of all varibles.
That said, if a user specifies the coordinate variables by name, we'll still hit the code that checks against EXCLUDED_VARIABLES
. Maybe it would be more scaleable to check the xarray.Dataset.coordinates
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a reasonable step forward. The EXCLUDED_VARIABLES
was a pretty simplistic way of trying to deal with not converting coordinate variables so this does improve that IMO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to confirm, what we have in this PR is good for now, and perhaps swapping EXCLUDED_VARIABLES
for xarray.Dataset.coordinates
is something for future work? Happy either way, just making sure I'm not missing some work to do here.
net2cog/netcdf_convert_harmony.py
Outdated
# Rename the downloaded output from a SHA256 hash to the basename | ||
# of the source data file | ||
source_asset_basename = os.path.basename(asset.href) | ||
os.rename(input_filename, os.path.join(output_dir, source_asset_basename)) | ||
input_filename = os.path.join(output_dir, source_asset_basename) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was a bit on the fence with this change. It feels a bit clunky.
When Harmony downloads a file, it assigns a SHA256 hash as the filename, based on the URL it got the file from. But, we want to be passing around the base name of the original granule, so the output file names can be generated for every variable. I was considering two options:
- This: rename the downloaded file to have a base name that matches the source file, instead of the SHA 256 hash.
- Pass around the source granule base name an additional variable.
I opted for a bit more complexity here, to avoid having another string floating around and being confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May want to reconsider this approach given my previous comment about removing harmony dependency from netcdf_convert
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay - if we aren't using the generate_output_filename
in netcdf_convert.py
, then I think we can avoid having to rename things. So this can be cleaned up to just use the SHA256-derived name.
I think the other part of it, though, is making sure that there is a mapping in the output from variable name to file path, so that we can generate the correct file name in netcdf_convert_harmony.py
. If you are fine with tweaking the output of netcdf_convert.py::netcdf_converter
to something like a dictionary, then I can work up something like that and move the renaming out to netcdf_convert_harmony.py
.
Update: A simpler alternative - the default (non-Harmony) file name could just be <variable name>.tif
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I like <variable name>.tif
. Do we need to worry about special characters there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to worry about special characters there?
Great catch!
I had just run with using <variable_name>.tif
in 279641, but hadn't accounted for the possibility of a slash (which will come up with hierarchical collections). I will at least do something to swap out slash characters for underscores and push that up. Should be a quick tweak, and I think that's compatible with what harmony-service-lib
does to those characters as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There we go: 656f49
I just do a simple string replace for slashes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work, thank you for the contribution! Just a few comments to consider
net2cog/netcdf_convert.py
Outdated
xarray dataset loaded from NetCDF file | ||
variable_name: str | ||
Name of the variable as passed in to Harmony. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit picky but shouldn't mention Harmony in the non-harmony module. Intent is that this netcdf_convert
module could be used independently of Harmony.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in 279641
net2cog/netcdf_convert.py
Outdated
raise Net2CogError(error) from error | ||
except LookupError as err: | ||
logger.info("Variable %s cannot be converted to tif: %s", variable_name, err) | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I think it makes sense to raise an exception here. Perhaps create a custom FailedConversion
exception or similar so that we could in the future add custom error handling in netcdf_convert_harmony
if not var_list: | ||
# Empty list means "all" variables, so get all variables in | ||
# the `xarray.Dataset`. | ||
var_list = list(xds.data_vars.keys()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a reasonable step forward. The EXCLUDED_VARIABLES
was a pretty simplistic way of trying to deal with not converting coordinate variables so this does improve that IMO
net2cog/netcdf_convert_harmony.py
Outdated
# Rename the downloaded output from a SHA256 hash to the basename | ||
# of the source data file | ||
source_asset_basename = os.path.basename(asset.href) | ||
os.rename(input_filename, os.path.join(output_dir, source_asset_basename)) | ||
input_filename = os.path.join(output_dir, source_asset_basename) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May want to reconsider this approach given my previous comment about removing harmony dependency from netcdf_convert
Github Issue: #32 (also one of the items mentioned in #35).
Description
This PR adds support for requests asking for multiple variables, either explicitly, or when a Harmony request specifies "all".
Overview of work done
There's a fair bit of shuffling around, but essentially the code changes are:
_write_cogtiff
function has been rescoped to just write a single COG. The iteration now takes place innetcdf_converter
.generate_output_filename
from theharmony-service-lib-py
module, which now has areformatted
suffix (note, this is a minor difference fromconverted
). I removed a bunch of custom code that was geared around generating the output filename, as that is no longer needed.conftest.py
.Overview of verification done
tests/conftest.py
). There are now tests that run to confirm multiple variables can be processed, either via explicitly requesting them or by using "all" in a Harmony request. These tests exist at the overall service level, and thenetcdf_converter
function level.Overview of integration done
PR checklist:
See Pull Request Review Checklist for pointers on reviewing this pull request