Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Determine standards for saving ensemble #35

Open
hscannell opened this issue Nov 19, 2021 · 10 comments
Open

Determine standards for saving ensemble #35

hscannell opened this issue Nov 19, 2021 · 10 comments

Comments

@hscannell
Copy link
Member

The ensemble will adapt the standards used in pyqg to stay consistent.

  1. Pickup file generated from the spin up simulation. Contains everything available from the model.to_dataset() method in pyqg.

  2. Each member in the ensemble will contain daily snapshots of 4 upper layer properties:

  • potential vorticity anomaly
  • strain magnitude
  • x particle position
  • y particle position

The dimensions of all 4 variables are (time, y0, x0).

Other things to consider:

  • how many ensemble members should be generated?
  • How long should each member be run? (depends on decoration timescales)
  • What metadata should be preserved?
  • File format (netCDF vs. Zarr)?
    **You cannot append data to an existing variable without re-creating it with netCDF. However, you can append data to an existing variable with Zarr.
@hscannell
Copy link
Member Author

I've copied meta data and attributes from the pyqg pickup file. I've also included both long_name and units for the new variable (strain, vort, x, and y). The dimensions of all data variables are expanded to include ensemble member.

ds_particles = xr.Dataset({
'x': (('time', 'y0', 'x0'), np.reshape(lpa.x.copy(), shape),{'long_name': 'particle position in the x direction', 'units': 'grid point'}),
'y': (('time', 'y0', 'x0'), np.reshape(lpa.y.copy(), shape),{'long_name': 'particle position in the y direction', 'units': 'grid point'}),
'vort': (('time', 'y0', 'x0'), np.reshape(particle_vorticity, shape),{'long_name': 'particle relative vorticity', 'units': 'second -1'}),
'strain': (('time', 'y0', 'x0'), np.reshape(particle_strain, shape),{'long_name': 'particle strain magnitude', 'units': 'unitless'}),
},
coords = {
'x0': (('x0'), np.reshape(x0, shape[1:])[0,:],{'long_name': 'real space grid points in the x direction', 'units': 'grid point'}),
'y0': (('y0'), np.reshape(y0, shape[1:])[:,0],{'long_name': 'real space grid points in the y direction', 'units': 'grid point'}),
'time': (('time'), np.array([m.t]),{'long_name': 'model time', 'units': 'seconds',})
},
attrs = ds_initial.attrs,
)
ds_particles = ds_particles.expand_dims(dim='n') # ensemble member
ds_particles['n'] = [int(n)]
ds_particles['n'].attrs = {'long_name': 'ensemble member', 'units': 'none',}

A single zarr store will be created for each ensemble member. This gets appended to at daily increments during run_with_snapshots.

# Save as Zarr
fn = '/burg/abernathey/users/hillary/lcs/pyqg_ensemble/'+'%03d'%int(n)+'.zarr'
ds_particles = ds_particles.chunk() #this uses a global chunk
if m.t == Tsave:
ds_particles.to_zarr(fn, mode='a', consolidated=True)
else:
ds_particles.to_zarr(fn, mode='a', append_dim='time', consolidated=True)

@rabernat
Copy link
Contributor

rabernat commented Nov 30, 2021

Thanks for this great work Hillary!

A few comments:

  • The units for x, y, x0, and y0 should be meters
  • Don't do 'units': 'none', just leave the units attribute out if you don't have units

Could you call print(ds) and print(ds.info()) on one of the datasets and share the full output here on this issue?

@hscannell
Copy link
Member Author

ds.info()
xarray.Dataset {
dimensions:
	n = 1 ;
	time = 77 ;
	y0 = 1024 ;
	x0 = 1024 ;

variables:
	int64 n(n) ;
		n:long_name = ensemble member ;
		n:units = none ;
	float64 strain(n, time, y0, x0) ;
		strain:long_name = particle strain magnitude ;
		strain:units = unitless ;
	timedelta64[ns] time(time) ;
		time:long_name = model time ;
	float64 vort(n, time, y0, x0) ;
		vort:long_name = particle relative vorticity ;
		vort:units = second -1 ;
	float64 x(n, time, y0, x0) ;
		x:long_name = particle position in the x direction ;
		x:units = grid point ;
	float64 x0(x0) ;
		x0:long_name = real space grid points in the x direction ;
		x0:units = grid point ;
	float64 y(n, time, y0, x0) ;
		y:long_name = particle position in the y direction ;
		y:units = grid point ;
	float64 y0(y0) ;
		y0:long_name = real space grid points in the y direction ;
		y0:units = grid point ;

// global attributes:
	:pyqg:L = 1200000 ;
	:pyqg:M = 262144 ;
	:pyqg:W = 1200000 ;
	:pyqg:beta = 1.3e-11 ;
	:pyqg:del2 = 0.8 ;
	:pyqg:delta = 0.25 ;
	:pyqg:dt = 600.0 ;
	:pyqg:filterfac = 23.6 ;
	:pyqg:nk = 257 ;
	:pyqg:nl = 512 ;
	:pyqg:ntd = 6 ;
	:pyqg:nx = 512 ;
	:pyqg:ny = 512 ;
	:pyqg:nz = 2 ;
	:pyqg:rd = 15000 ;
	:pyqg:rek = 5.787037037037037e-07 ;
	:pyqg:taveint = 86400.0 ;
	:pyqg:tavestart = 86400 ;
	:pyqg:tc = 141120 ;
	:pyqg:tmax = 311040000 ;
	:pyqg:twrite = 50000 ;
	:reference = https://pyqg.readthedocs.io/en/latest/index.html ;
	:title = pyqg: Python Quasigeostrophic Model ;
}
print(ds)

Screen Shot 2021-12-01 at 1 03 16 PM

@hscannell
Copy link
Member Author

I'll fix the units for x,y,x0,y0 and remove the units=none.

Do we leave time as timedelta64[ns] and change units to nanoseconds?? Time prints out as nanoseconds even if I save timedelta64[D].

print(ds.time.values[0])
numpy.timedelta64(86400000000000,'ns')

@rabernat
Copy link
Contributor

rabernat commented Dec 1, 2021

Thanks!

Strain is not unitless. Its units are the same as vorticity.

Don't worry about the timedelta64[ns] stuff. It's fine. That's just how xarray decodes all time variables. You can turn it off by opening the file with decode_times=False.

@hscannell
Copy link
Member Author

Oh cool, I didn't realize that decode_times=False essentially assigns date time units depending on the input.

@rabernat
Copy link
Contributor

rabernat commented Dec 1, 2021

When xarray opens any dataset, by default it does decoding of CF attributes. So the values you see in the opened dataset are not necessarily the same ones you see on disk.

If you do decode_times=False, you bypass this decoding and just get the raw values that are stored on disk. Note that your decoded time variable has no units, but the encoded one does. Look at ds.time.encoding on the decoded dataset to see these hidden parameters.

@hscannell
Copy link
Member Author

Very cool, thanks! I see that units show up on the decoded dataset below, but not on the dataset with raw values.

ds = xr.open_zarr('/burg/abernathey/users/hillary/lcs/pyqg_ensemble/001.zarr', decode_times=False)
print(ds.time.encoding)

{'chunks': (1,),
'preferred_chunks': {'time': 1},
'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0),
'filters': None,
'dtype': dtype('int64')}

ds = xr.open_zarr('/burg/abernathey/users/hillary/lcs/pyqg_ensemble/001.zarr', decode_times=True)
print(ds.time.encoding)

{'chunks': (1,),
'preferred_chunks': {'time': 1},
'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0),
'filters': None,
'units': 'days',
'dtype': dtype('int64')}

@hscannell
Copy link
Member Author

The output of the ensemble now includes the stream function p(n, time, y, x), where x and y are model grid points (length 512).

# Eulerian Stream Function
p = m.ifft(m.ph)

I see that p was added to pyqg/xarray_output.py in this PR 🎉 , but because I never call m.to_dataset() here, I have to manually add it.

I renamed the x and y particle positions xpos(n, time, y0, x0) and ypos(n, time, y0, x0) respectively, where y0 and x0 are the particle grid points (length 1024). Does this make sense?

xarray.Dataset {
dimensions:
	n = 1 ;
	time = 1 ;
	y = 512 ;
	x = 512 ;
	y0 = 1024 ;
	x0 = 1024 ;

variables:
	int64 n(n) ;
		n:long_name = ensemble member ;
	float64 p(n, time, y, x) ;
		p:long_name = streamfunction in real space ;
		p:units = meters squared second ^-1 ;
	float64 strain(n, time, y0, x0) ;
		strain:long_name = particle strain magnitude ;
		strain:units = second ^-1 ;
	timedelta64[ns] time(time) ;
		time:long_name = model time ;
	float64 vort(n, time, y0, x0) ;
		vort:long_name = particle relative vorticity ;
		vort:units = second ^-1 ;
	float64 x(x) ;
		x:long_name = model grid points in the x direction ;
		x:units = meters ;
	float64 x0(x0) ;
		x0:long_name = particle grid points in the x direction ;
		x0:units = meters ;
	float64 xpos(n, time, y0, x0) ;
		xpos:long_name = particle position in the x direction ;
		xpos:units = meters ;
	float64 y(y) ;
		y:long_name = model grid points in the y direction ;
		y:units = meters ;
	float64 y0(y0) ;
		y0:long_name = particle grid points in the y direction ;
		y0:units = meters ;
	float64 ypos(n, time, y0, x0) ;
		ypos:long_name = particle position in the y direction ;
		ypos:units = meters ;

// global attributes:
	:pyqg:L = 1200000 ;
	:pyqg:M = 262144 ;
	:pyqg:W = 1200000 ;
	:pyqg:beta = 1.3e-11 ;
	:pyqg:del2 = 0.8 ;
	:pyqg:delta = 0.25 ;
	:pyqg:dt = 600.0 ;
	:pyqg:filterfac = 23.6 ;
	:pyqg:nk = 257 ;
	:pyqg:nl = 512 ;
	:pyqg:ntd = 6 ;
	:pyqg:nx = 512 ;
	:pyqg:ny = 512 ;
	:pyqg:nz = 2 ;
	:pyqg:rd = 15000 ;
	:pyqg:rek = 5.787037037037037e-07 ;
	:pyqg:taveint = 86400 ;
	:pyqg:tavestart = 86400 ;
	:pyqg:tc = 777600 ;
	:pyqg:tmax = 466560000 ;
	:pyqg:twrite = 50000 ;
	:reference = https://pyqg.readthedocs.io/en/latest/index.html ;
	:title = pyqg: Python Quasigeostrophic Model ;
}

@hscannell
Copy link
Member Author

Screen Shot 2021-12-03 at 3 40 54 PM

Should we call vorticity and strain something else besides vort and strain? Like q and s? Although there is already a q in pyqg, so that might get confusing.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants