Store the NW2 baseline simulations on the cloud #8

Open
gustavo-marques opened this issue Mar 18, 2022 · 17 comments

Comments

@gustavo-marques

Our Cloud Data Guide states that the CPT has access to a 10 TB allocation on OSN. To store all the baselines we will need ~17.5 TB of storage (assuming that compression during conversion to Zarr does not change the size much). We only saved the "full output" over the last 2,000 days of simulation, so there is not much data cleaning to be done.
Any ideas on how we should proceed?
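
For reference, the size of the gap can be worked out from the two figures quoted above. This is only a back-of-the-envelope sketch in Python; the variable names are illustrative, and the "needed ratio" assumes the 17.5 TB estimate refers to the uncompressed NetCDF size.

```python
# Figures quoted in this comment
ALLOCATION_TB = 10.0  # current OSN allocation
REQUIRED_TB = 17.5    # estimated size of all NW2 baselines

# Extra space needed if Zarr compression gives no savings at all
shortfall_tb = REQUIRED_TB - ALLOCATION_TB

# Compression ratio (original size / stored size) that would be needed
# for everything to fit in the existing allocation (assumption: the
# 17.5 TB estimate is the uncompressed size)
needed_ratio = REQUIRED_TB / ALLOCATION_TB

print(f"shortfall: {shortfall_tb:.1f} TB; needed compression ratio: {needed_ratio:.2f}x")
```

So either roughly 7.5 TB of additional space or an overall compression ratio of about 1.75x would be required.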

cc'ing @rabernat @adcroft @NoraLoose

@rabernat
Contributor

I think we can ask OSN for more space. I will do that and follow up here when I have a response.

@NoraLoose
Member

Thanks for checking with OSN @rabernat! Have you heard back from them?

@NoraLoose
Member

It would be good if we could make progress on this issue soon, since the revisions of several manuscripts depend on having the data openly available.

If OSN is not an option, what could be alternatives?

@LaureZanna

@NoraLoose Thanks for that! I agree we need to find a solution since we have to submit the revisions of NW2 (soonish).
@rabernat: is OSN an option for us (it looks like you asked them for more space)? Thanks!

@rabernat
Contributor

rabernat commented Jun 15, 2022

Sorry for not replying sooner. Yes, we can store the data in OSN via Pangeo Forge!

To ingest the data, the next step is to open an issue here: https://github.com/pangeo-forge/staged-recipes/issues

@NoraLoose - if you can get that started, I will chime in and help move it forward.

@NoraLoose
Member

Thanks @rabernat! I am happy to help get the NeverWorld2 data into the cloud, and can open a PR on pangeo.

Just for clarification: Are the instructions here not up-to-date anymore?

@rabernat
Contributor

Things have evolved considerably since I wrote that. Pangeo Forge didn't even exist yet. 🙃 The instructions are not wrong per se, but we have better systems now. This explains what Pangeo Forge is and why we are building it: https://pangeo-forge.readthedocs.io/en/latest/what_is_pangeo_forge.html

@NoraLoose
Member

@gustavo-marques, @adcroft, @LaureZanna and others:

Before we start the process over at pangeo-forge, let's agree on which of the NeverWorld2 output files we want to make available on the cloud.

We have 8 experiments (4 different resolutions, with hmix=5 and hmix=20). Here are the relevant output files:

  • averages_*.nc (5-day averages)
  • snapshots_*.nc (snapshots at 5-day frequency)
  • longmean_*.nc (500-day averages, but over the full spin-up)
  • multiple ocean.stats.nc files per run due to restart (time series of domain-integrated metrics like APE, KE over full spin-up)
  • static.nc (holding the grid information)
  • Restart files
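
As a rough aid for the discussion, the proposed inventory can be enumerated programmatically. This is a sketch only: the resolution labels are placeholders (the thread says only that there are four resolutions), the experiment naming scheme is hypothetical, and restart files are left out because no file pattern is given for them here.

```python
from itertools import product

# Placeholder labels: the actual resolution names are not listed in this thread
RESOLUTIONS = ["res_a", "res_b", "res_c", "res_d"]
HMIX_VALUES = [5, 20]  # the two hmix settings named above

# File patterns taken from the list above (restart files omitted)
FILE_PATTERNS = [
    "averages_*.nc",
    "snapshots_*.nc",
    "longmean_*.nc",
    "ocean.stats.nc",
    "static.nc",
]


def build_manifest():
    """Return one row per (experiment, file pattern) combination."""
    return [
        {"experiment": f"{res}_hmix{hmix}", "pattern": pattern}
        for res, hmix in product(RESOLUTIONS, HMIX_VALUES)
        for pattern in FILE_PATTERNS
    ]


manifest = build_manifest()
experiments = sorted({row["experiment"] for row in manifest})
print(len(experiments), "experiments,", len(manifest), "manifest rows")
```

A table like this could be used to check that every experiment directory contains each agreed file type before ingestion starts.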

Do we want to make all of these available? Am I missing some?

@gustavo-marques
Author

@NoraLoose, thank you very much for working on this!

The only file you missed is Vertical_coordinate.nc, but I do not know if anyone is using this file in their analysis. We do not use it in the NW2 description paper. @ElizabethYankovsky, please let us know if you use this file in your analysis.

We might want to combine ocean.stats.* into single files since these files are small. The 0.25 deg runs have a lot of restart files. Perhaps we only need to upload a few of them (every 10 years?).

@ElizabethYankovsky

Thanks @NoraLoose and @gustavo-marques! No, I'm not using the Vertical_coordinate.nc file in my analysis.

@LaureZanna

Looks good to me, @NoraLoose!

@NoraLoose
Member

Yes, I think we can skip the Vertical_coordinate.nc file because the Layer Potential Densities are also contained in the averages and snapshot files.

@gustavo-marques:

Could you work on combining the ocean.stats.* into a single file? (Simply concatenating the files will result in double time stamps.) Then I will get the process going at pangeo-forge in the meantime.
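
One possible way to avoid the repeated time stamps is to concatenate the segments and then keep only the first occurrence of each time value. The sketch below uses xarray with small synthetic datasets standing in for the real files; the dimension name `Time`, the variable name `KE`, and the helper `combine_stats` are assumptions for illustration, not the procedure actually used for NW2.

```python
import numpy as np
import xarray as xr


def combine_stats(segments, dim="Time"):
    """Concatenate restart segments along `dim` and drop repeated time stamps."""
    combined = xr.concat(segments, dim=dim)
    # np.unique returns, for each distinct time value, the index of its
    # first occurrence; sorting those indices preserves the original order
    _, first_idx = np.unique(combined[dim].values, return_index=True)
    return combined.isel({dim: np.sort(first_idx)})


# Synthetic stand-ins for two ocean.stats.nc segments that overlap at day 10
seg_a = xr.Dataset(
    {"KE": ("Time", [1.0, 2.0, 3.0])}, coords={"Time": [0.0, 5.0, 10.0]}
)
seg_b = xr.Dataset(
    {"KE": ("Time", [3.0, 4.0])}, coords={"Time": [10.0, 15.0]}
)

stats = combine_stats([seg_a, seg_b])
print(stats["Time"].values)
```

With real output, the segments would be opened with `xr.open_dataset` in run order before being passed to the helper. Keeping the first occurrence assumes the overlapping records are identical across restarts, which should be checked.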

As for the restart files: Maybe we only need the last (and possibly first) restart file, so people can extend the NW2 time series for each resolution?

@gustavo-marques
Author

@NoraLoose: yes, I will combine the ocean.stats.* for each experiment. Thanks!

@gustavo-marques
Author

@NoraLoose: I've combined the ocean.stats.* files for each experiment. The combined file is called ocean_stats.nc and is located at the same directory level as the other files (averages, snapshots, etc.).

@rabernat
Contributor

Great to see this proposal!

The main blocker here is getting the data off of glade. In order for Pangeo Forge to access the data, CISL needs to upgrade their Globus installation to V5. Fortunately, according to CISL, that should be very soon (days or weeks).

@gustavo-marques and @NoraLoose - could you let me know your UCAR usernames so I can have you added to the Globus trial?

@NoraLoose
Member

My username is noraloose.

@gustavo-marques
Author

Mine is gmarques.
