Enable download of large (spatial extent) cutouts from ERA5 via cdsapi. #236
How does that interact with queuing at CDSAPI? Does that increase the chances of getting stuck in the request in month 9 or so? |
I don't know. The downloads for the larger cutouts worked relatively smoothly (1-2 hours), but the number of requests is 12x higher for a normal year, so the chances might be higher. On the other hand, since the downloaded slices are smaller I would not expect major performance changes. Probably acceptable, since you're not downloading cutouts on an everyday basis. I don't know enough about the internals of the ERA5 climate store and I don't think we should optimise our retrieval routines for it as long as we haven't received any complaints for bad performance. |
Alright. I did not encounter any issues downloading large datasets. Seems to work nicely @FabianHofmann. It would be helpful to have a message indicating which month/year combination is currently being downloaded. Do you have an idea how to implement this easily, @FabianHofmann? Then I'd suggest @davide-f tries to download his cutout as well, and if that works without issues we can merge. |
@euronion Super! Thank you very much. Unfortunately, I am currently a bit busy with other stuff and cannot keep the machine running while waiting a long time on Copernicus for the analysis. As soon as I have free resources, I'll test that. |
Great. For the logging I would suggest going with e.g. "2013-01" instead of "2013" only. See atlite/atlite/datasets/era5.py Line 309 in 3c7b4b8,
which could be changed into timestr = f"{request['year']}-{request['month']}" and replaced accordingly in atlite/atlite/datasets/era5.py Line 311 in 3c7b4b8
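A minimal sketch of the suggested label, assuming the request is a dict with scalar "year" and "month" entries (the helper name `format_request_period` and the log message are hypothetical and not taken from atlite):

```python
import logging

logger = logging.getLogger(__name__)


def format_request_period(request: dict) -> str:
    """Return a 'YYYY-MM' label for a request with 'year' and 'month' keys."""
    # int() accepts both 2013 and "2013"; :02d zero-pads the month.
    # Assumption: both entries are scalars, not lists.
    return f"{int(request['year'])}-{int(request['month']):02d}"


# Hypothetical request, for illustration only.
request = {"year": 2013, "month": 1}
timestr = format_request_period(request)
logger.info("Requesting data for %s", timestr)
```

Single quotes inside the f-string keep the expression valid on Python versions before 3.12, where reusing the outer quote character inside the braces is a syntax error.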
|
As discussed with @euronion, I'll wait for his latest updates (estimated by the end of the week) and then run the model for the entire world. As a comment, the "number of slices", currently one per month, could be a parameter as well. |
@davide-f You're good to give it a try! Regarding your comment: If it works for you @davide-f and the time it takes is acceptable (please report it as well if you can) then I'd stay away from overoptimising this aspect and just keep the monthly retrieval. |
@euronion the branch is running :) I'll track it and update you as soon as I have news. I totally agree on checking whether the monthly retrieval works fine and how long it takes. I fear that it may take a very long time, though. |
I confirm that the first 1-month chunk has been downloaded. I'll be waiting for the entire procedure to end and let you know :) |
@euronion The procedure for the world (±180° lat/lon) completed successfully in 5 to 12 hours (I ran it twice) and produced an output file of 380 GB (large, but we are speaking of a lot of data); see the settings below.
atlite:
  nprocesses: 4
  cutouts:
    # geographical bounds automatically determined from countries input
    world-2013-era5:
      module: era5
      dx: 0.3 # cutout resolution
      dy: 0.3 # cutout resolution
      # Below customization options are dealt in an automated way depending on
      # the snapshots and the selected countries. See 'build_cutout.py'
      time: ["2013-01-01", "2014-01-01"] # specify different weather year (~40 years available)
      x: [-180., 180.] # manual set cutout range
      y: [-180., 180.] # manual set cutout range
As a recommendation, if you are interested in silencing some warnings, the following message was raised:
The output also makes sense; however, it has some weird white bands. I don't think this is related to this PR, though. What do you think? |
As discussed, for efficiency purposes, it may be interesting to decide the number of chunks to divide the output. |
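A rough sketch of what a configurable number of chunks could look like, assuming the twelve months of a year are split into contiguous groups of near-equal size. The function `split_months` and its parameter `n_chunks` are hypothetical and not part of atlite:

```python
def split_months(n_chunks: int) -> list[list[int]]:
    """Split the months 1..12 into n_chunks contiguous, near-equal groups."""
    if not 1 <= n_chunks <= 12:
        raise ValueError("n_chunks must be between 1 and 12")
    months = list(range(1, 13))
    # The first `extra` chunks get one month more than the rest.
    base, extra = divmod(len(months), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)
        chunks.append(months[start:start + size])
        start += size
    return chunks


quarterly = split_months(4)   # four chunks of three months each
monthly = split_months(12)    # one chunk per month, as in this PR
```

With `n_chunks=12` this reproduces the monthly retrieval of this PR, and with `n_chunks=1` the previous annual retrieval.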
I attempted to compress cutouts during/after creation but without much success. using I would have preferred a solution where compression is done by |
Codecov Report
Patch coverage:
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #236      +/-   ##
==========================================
- Coverage   72.83%   72.74%   -0.09%
==========================================
  Files          19       19
  Lines        1590     1596       +6
  Branches      227      270      +43
==========================================
+ Hits         1158     1161       +3
- Misses        362      363       +1
- Partials       70       72       +2
|
@davide-f If you wish to reduce the file size you can follow the instructions in the updated doc: Should save ~50% :) |
Month indicator has been added, e.g. info prompt during creation looks like this to indicate the month currently being retrieved
|
I suggest we offload the heuristic into a separate issue and tackle it if necessary. ATM I think it would be a nice but unnecessary feature. |
RTR @FabianHofmann would you? |
Tested by @nworbmot
No idea why the CI keeps failing (no issues locally) and why it is continuing the old CI.yaml with Python 3.8 instead of 3.11 |
Closes #221.
Change proposed in this Pull Request
Split download of ERA5 into monthly downloads (currently: annual downloads) to prevent too-large downloads from ERA5 CDSAPI.
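The idea of splitting one annual download into monthly ones can be sketched as follows. This is an illustration only, not the code in this PR; the function name `monthly_requests` and the request layout (a dict with "year" and "month") are assumptions:

```python
from datetime import date


def monthly_requests(start: date, end: date) -> list[dict]:
    """Return one {'year', 'month'} dict per calendar month touched by [start, end]."""
    if end < start:
        raise ValueError("end must not be before start")
    requests = []
    year, month = start.year, start.month
    # Walk month by month until the month containing `end` has been added.
    while (year, month) <= (end.year, end.month):
        requests.append({"year": year, "month": month})
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return requests


# A full year yields 12 requests instead of a single annual one.
requests_2013 = monthly_requests(date(2013, 1, 1), date(2013, 12, 31))
```

Each request covers a smaller slice of data, at the cost of 12 times as many requests per year.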
TODO
Description
Motivation and Context
See #221.
How Has This Been Tested?
Locally by downloading a large cutout.
Type of change
Checklist
… pytest inside the repository and no unexpected problems came up.
… doc/.
… environment.yaml file.
… doc/release_notes.rst.
… pre-commit run --all to lint/format/check my contribution