Determine standards for saving ensemble #35

hscannell · 2021-11-19T21:27:34Z

The ensemble will adapt the standards used in pyqg to stay consistent.

Pickup file generated from the spin up simulation. Contains everything available from the model.to_dataset() method in pyqg.
Each member in the ensemble will contain daily snapshots of 4 upper layer properties:

potential vorticity anomaly
strain magnitude
x particle position
y particle position

The dimensions of all 4 variables are (time, y0, x0).

Other things to consider:

how many ensemble members should be generated?
How long should each member be run? (depends on decoration timescales)
What metadata should be preserved?
File format (netCDF vs. Zarr)?
**You cannot append data to an existing variable without re-creating it with netCDF. However, you can append data to an existing variable with Zarr.

The text was updated successfully, but these errors were encountered:

hscannell · 2021-11-22T19:26:17Z

I've copied meta data and attributes from the pyqg pickup file. I've also included both long_name and units for the new variable (strain, vort, x, and y). The dimensions of all data variables are expanded to include ensemble member.

lcs-ml/ensemble_particle_generator/ensemble_generator.py

Lines 94 to 110 in 3281517

    
           ds_particles = xr.Dataset({ 
        
               'x': (('time', 'y0', 'x0'), np.reshape(lpa.x.copy(), shape),{'long_name': 'particle position in the x direction', 'units': 'grid point'}), 
        
               'y': (('time', 'y0', 'x0'), np.reshape(lpa.y.copy(), shape),{'long_name': 'particle position in the y direction', 'units': 'grid point'}), 
        
               'vort': (('time', 'y0', 'x0'), np.reshape(particle_vorticity, shape),{'long_name': 'particle relative vorticity', 'units': 'second -1'}), 
        
               'strain': (('time', 'y0', 'x0'), np.reshape(particle_strain, shape),{'long_name': 'particle strain magnitude', 'units': 'unitless'}), 
        
           }, 
        
               coords = { 
        
                   'x0': (('x0'), np.reshape(x0, shape[1:])[0,:],{'long_name': 'real space grid points in the x direction', 'units': 'grid point'}), 
        
                   'y0': (('y0'), np.reshape(y0, shape[1:])[:,0],{'long_name': 'real space grid points in the y direction', 'units': 'grid point'}), 
        
                   'time': (('time'), np.array([m.t]),{'long_name': 'model time', 'units': 'seconds',}) 
        
               }, 
        
               attrs = ds_initial.attrs, 
        
           ) 
        
           ds_particles = ds_particles.expand_dims(dim='n') # ensemble member 
        
           ds_particles['n'] = [int(n)] 
        
           ds_particles['n'].attrs = {'long_name': 'ensemble member', 'units': 'none',}

A single zarr store will be created for each ensemble member. This gets appended to at daily increments during run_with_snapshots.

lcs-ml/ensemble_particle_generator/ensemble_generator.py

Lines 113 to 119 in 3281517

    
           # Save as Zarr 
        
           fn = '/burg/abernathey/users/hillary/lcs/pyqg_ensemble/'+'%03d'%int(n)+'.zarr'  
        
           ds_particles = ds_particles.chunk() #this uses a global chunk 
        
           if m.t == Tsave: 
        
               ds_particles.to_zarr(fn, mode='a', consolidated=True) 
        
           else: 
        
               ds_particles.to_zarr(fn, mode='a', append_dim='time', consolidated=True)

rabernat · 2021-11-30T21:55:00Z

Thanks for this great work Hillary!

A few comments:

The units for x, y, x0, and y0 should be meters
Don't do 'units': 'none', just leave the units attribute out if you don't have units

Could you call print(ds) and print(ds.info()) on one of the datasets and share the full output here on this issue?

hscannell · 2021-12-01T18:03:36Z

ds.info()

xarray.Dataset {
dimensions:
	n = 1 ;
	time = 77 ;
	y0 = 1024 ;
	x0 = 1024 ;

variables:
	int64 n(n) ;
		n:long_name = ensemble member ;
		n:units = none ;
	float64 strain(n, time, y0, x0) ;
		strain:long_name = particle strain magnitude ;
		strain:units = unitless ;
	timedelta64[ns] time(time) ;
		time:long_name = model time ;
	float64 vort(n, time, y0, x0) ;
		vort:long_name = particle relative vorticity ;
		vort:units = second -1 ;
	float64 x(n, time, y0, x0) ;
		x:long_name = particle position in the x direction ;
		x:units = grid point ;
	float64 x0(x0) ;
		x0:long_name = real space grid points in the x direction ;
		x0:units = grid point ;
	float64 y(n, time, y0, x0) ;
		y:long_name = particle position in the y direction ;
		y:units = grid point ;
	float64 y0(y0) ;
		y0:long_name = real space grid points in the y direction ;
		y0:units = grid point ;

// global attributes:
	:pyqg:L = 1200000 ;
	:pyqg:M = 262144 ;
	:pyqg:W = 1200000 ;
	:pyqg:beta = 1.3e-11 ;
	:pyqg:del2 = 0.8 ;
	:pyqg:delta = 0.25 ;
	:pyqg:dt = 600.0 ;
	:pyqg:filterfac = 23.6 ;
	:pyqg:nk = 257 ;
	:pyqg:nl = 512 ;
	:pyqg:ntd = 6 ;
	:pyqg:nx = 512 ;
	:pyqg:ny = 512 ;
	:pyqg:nz = 2 ;
	:pyqg:rd = 15000 ;
	:pyqg:rek = 5.787037037037037e-07 ;
	:pyqg:taveint = 86400.0 ;
	:pyqg:tavestart = 86400 ;
	:pyqg:tc = 141120 ;
	:pyqg:tmax = 311040000 ;
	:pyqg:twrite = 50000 ;
	:reference = https://pyqg.readthedocs.io/en/latest/index.html ;
	:title = pyqg: Python Quasigeostrophic Model ;
}

print(ds)

hscannell · 2021-12-01T18:06:52Z

I'll fix the units for x,y,x0,y0 and remove the units=none.

Do we leave time as timedelta64[ns] and change units to nanoseconds?? Time prints out as nanoseconds even if I save timedelta64[D].

print(ds.time.values[0])

numpy.timedelta64(86400000000000,'ns')

rabernat · 2021-12-01T18:13:47Z

Thanks!

Strain is not unitless. Its units are the same as vorticity.

Don't worry about the timedelta64[ns] stuff. It's fine. That's just how xarray decodes all time variables. You can turn it off by opening the file with decode_times=False.

hscannell · 2021-12-01T18:56:38Z

Oh cool, I didn't realize that decode_times=False essentially assigns date time units depending on the input.

rabernat · 2021-12-01T18:59:19Z

When xarray opens any dataset, by default it does decoding of CF attributes. So the values you see in the opened dataset are not necessarily the same ones you see on disk.

If you do decode_times=False, you bypass this decoding and just get the raw values that are stored on disk. Note that your decoded time variable has no units, but the encoded one does. Look at ds.time.encoding on the decoded dataset to see these hidden parameters.

hscannell · 2021-12-01T19:06:33Z

Very cool, thanks! I see that units show up on the decoded dataset below, but not on the dataset with raw values.

ds = xr.open_zarr('/burg/abernathey/users/hillary/lcs/pyqg_ensemble/001.zarr', decode_times=False)
print(ds.time.encoding)

{'chunks': (1,),
'preferred_chunks': {'time': 1},
'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0),
'filters': None,
'dtype': dtype('int64')}

ds = xr.open_zarr('/burg/abernathey/users/hillary/lcs/pyqg_ensemble/001.zarr', decode_times=True)
print(ds.time.encoding)

{'chunks': (1,),
'preferred_chunks': {'time': 1},
'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0),
'filters': None,
'units': 'days',
'dtype': dtype('int64')}

hscannell · 2021-12-03T20:36:31Z

The output of the ensemble now includes the stream function p(n, time, y, x), where x and y are model grid points (length 512).

lcs-ml/ensemble_particle_generator/ensemble_generator.py

Lines 83 to 84 in 5580d9e

    
           # Eulerian Stream Function 
        
           p = m.ifft(m.ph)

I see that p was added to pyqg/xarray_output.py in this PR 🎉 , but because I never call m.to_dataset() here, I have to manually add it.

I renamed the x and y particle positions xpos(n, time, y0, x0) and ypos(n, time, y0, x0) respectively, where y0 and x0 are the particle grid points (length 1024). Does this make sense?

xarray.Dataset {
dimensions:
	n = 1 ;
	time = 1 ;
	y = 512 ;
	x = 512 ;
	y0 = 1024 ;
	x0 = 1024 ;

variables:
	int64 n(n) ;
		n:long_name = ensemble member ;
	float64 p(n, time, y, x) ;
		p:long_name = streamfunction in real space ;
		p:units = meters squared second ^-1 ;
	float64 strain(n, time, y0, x0) ;
		strain:long_name = particle strain magnitude ;
		strain:units = second ^-1 ;
	timedelta64[ns] time(time) ;
		time:long_name = model time ;
	float64 vort(n, time, y0, x0) ;
		vort:long_name = particle relative vorticity ;
		vort:units = second ^-1 ;
	float64 x(x) ;
		x:long_name = model grid points in the x direction ;
		x:units = meters ;
	float64 x0(x0) ;
		x0:long_name = particle grid points in the x direction ;
		x0:units = meters ;
	float64 xpos(n, time, y0, x0) ;
		xpos:long_name = particle position in the x direction ;
		xpos:units = meters ;
	float64 y(y) ;
		y:long_name = model grid points in the y direction ;
		y:units = meters ;
	float64 y0(y0) ;
		y0:long_name = particle grid points in the y direction ;
		y0:units = meters ;
	float64 ypos(n, time, y0, x0) ;
		ypos:long_name = particle position in the y direction ;
		ypos:units = meters ;

// global attributes:
	:pyqg:L = 1200000 ;
	:pyqg:M = 262144 ;
	:pyqg:W = 1200000 ;
	:pyqg:beta = 1.3e-11 ;
	:pyqg:del2 = 0.8 ;
	:pyqg:delta = 0.25 ;
	:pyqg:dt = 600.0 ;
	:pyqg:filterfac = 23.6 ;
	:pyqg:nk = 257 ;
	:pyqg:nl = 512 ;
	:pyqg:ntd = 6 ;
	:pyqg:nx = 512 ;
	:pyqg:ny = 512 ;
	:pyqg:nz = 2 ;
	:pyqg:rd = 15000 ;
	:pyqg:rek = 5.787037037037037e-07 ;
	:pyqg:taveint = 86400 ;
	:pyqg:tavestart = 86400 ;
	:pyqg:tc = 777600 ;
	:pyqg:tmax = 466560000 ;
	:pyqg:twrite = 50000 ;
	:reference = https://pyqg.readthedocs.io/en/latest/index.html ;
	:title = pyqg: Python Quasigeostrophic Model ;
}

hscannell · 2021-12-03T20:42:54Z

Should we call vorticity and strain something else besides vort and strain? Like q and s? Although there is already a q in pyqg, so that might get confusing.

hscannell added this to the 4. Create ensemble of QG models milestone Nov 19, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Determine standards for saving ensemble #35

Determine standards for saving ensemble #35

hscannell commented Nov 19, 2021

hscannell commented Nov 22, 2021

rabernat commented Nov 30, 2021 •

edited

Loading

hscannell commented Dec 1, 2021

hscannell commented Dec 1, 2021

rabernat commented Dec 1, 2021

hscannell commented Dec 1, 2021

rabernat commented Dec 1, 2021

hscannell commented Dec 1, 2021

hscannell commented Dec 3, 2021

hscannell commented Dec 3, 2021

Determine standards for saving ensemble #35

Determine standards for saving ensemble #35

Comments

hscannell commented Nov 19, 2021

hscannell commented Nov 22, 2021

rabernat commented Nov 30, 2021 • edited Loading

hscannell commented Dec 1, 2021

hscannell commented Dec 1, 2021

rabernat commented Dec 1, 2021

hscannell commented Dec 1, 2021

rabernat commented Dec 1, 2021

hscannell commented Dec 1, 2021

hscannell commented Dec 3, 2021

hscannell commented Dec 3, 2021

rabernat commented Nov 30, 2021 •

edited

Loading