
Trim FES2022 Loading Areas #559

Open
phillipjws opened this issue Jan 6, 2025 · 4 comments

Comments

@phillipjws

Hello,
I have been using CoastSat for quite some time now, and the main bottleneck in my automation has been loading the FES2022 config files. I was digging around to see if there were any optimizations to reduce RAM usage in this part of the toolkit, and I found that in CoastSeg they trim the area, allowing the user to load only what they need. I tried to write a short script to automate this based on an input polygon, and this was my outcome:

import os
import xarray as xr
import json
import numpy as np
from glob import glob

def get_geometry_from_geojson(geojson_file):
    """
    Load the geometry (polygon) from a GeoJSON file.
    """
    with open(geojson_file) as f:
        geojson = json.load(f)
    return geojson["features"][0]["geometry"]

def clip_to_region(nc_files, geometry, output_dir):
    """
    Clips NetCDF files to the specified region and saves the clipped files.
    Longitudes are kept in the 0 to 360 range, and filenames are not altered.
    """
    coords = np.array(geometry["coordinates"][0])
    lon_min, lon_max = coords[:, 0].min(), coords[:, 0].max()
    lat_min, lat_max = coords[:, 1].min(), coords[:, 1].max()

    # Shift bounds from [-180, 180] into the [0, 360] range used by the grids.
    # Note: if the polygon straddles the prime meridian, lon_min can end up
    # greater than lon_max and the slice below will select nothing.
    lon_min %= 360
    lon_max %= 360

    for file_path in nc_files:
        print(f"Processing: {file_path}")
        ds = xr.open_dataset(file_path, engine="netcdf4")

        # Ensure longitude is in 0° to 360° format
        if ds.lon.min() < 0:
            ds = ds.assign_coords({"lon": (ds.lon % 360)}).sortby("lon")

        # Select the region
        clipped_ds = ds.sel(
            lon=slice(lon_min, lon_max),
            lat=slice(lat_min, lat_max)
        )

        # Preserve metadata
        clipped_ds.attrs = ds.attrs
        for var in clipped_ds.data_vars:
            clipped_ds[var].attrs = ds[var].attrs

        # Save the clipped file to the same name in the output directory
        filename = os.path.basename(file_path)
        output_path = os.path.join(output_dir, filename)
        os.makedirs(output_dir, exist_ok=True)
        clipped_ds.to_netcdf(output_path)
        ds.close()  # release the file handle before moving on
        print(f"Saved clipped file to: {output_path}")

def main():
    # Path to the GeoJSON file
    geojson_file = r"C:\...\region.geojson"

    # Directories containing NetCDF files
    load_tide_dir = r"C:\...\load_tide"
    ocean_tide_dir = r"C:\...\ocean_tide"

    # Output directory for clipped files
    output_dir = r"C:\...\clipped_files"

    # Load the geometry
    geometry = get_geometry_from_geojson(geojson_file)

    # Find NetCDF files
    load_tide_files = glob(os.path.join(load_tide_dir, "*.nc"))
    ocean_tide_files = glob(os.path.join(ocean_tide_dir, "*.nc"))

    # Clip and save files
    print("Clipping load_tide files...")
    clip_to_region(load_tide_files, geometry, os.path.join(output_dir, "load_tide"))

    print("Clipping ocean_tide files...")
    clip_to_region(ocean_tide_files, geometry, os.path.join(output_dir, "ocean_tide"))

if __name__ == "__main__":
    main()
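One caveat with the 0-360 shift in the script: a polygon that straddles the prime meridian (e.g. longitudes -5 to 5) ends up with lon_min = 355 and lon_max = 5, and `slice(355, 5)` selects nothing in xarray. A small stdlib sketch of the conversion with a guard for that case (the function name is illustrative, not part of the script above):

```python
def normalize_lon_bounds(lon_min, lon_max):
    """Shift a [-180, 180] longitude range into [0, 360].

    Returns a list of (lo, hi) ranges: one range normally, two when the
    shifted range wraps past 360 (because a single slice(lo, hi) with
    lo > hi selects nothing on a monotonic longitude coordinate).
    """
    lo = lon_min % 360
    hi = lon_max % 360
    if lo <= hi:
        return [(lo, hi)]
    # Wrapped range: split into two slices covering both sides of 0/360.
    return [(lo, 360.0), (0.0, hi)]

print(normalize_lon_bounds(10.0, 20.0))  # → [(10.0, 20.0)]
print(normalize_lon_bounds(-5.0, 5.0))   # → [(355.0, 360.0), (0.0, 5.0)]
```

Each returned range would then be selected separately and concatenated along `lon`.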

This might only be useful on a larger scale, as I'm not sure how worthwhile it would be to trim individually for each AOI. You get the same results with the clipped files as with the originals, at a fraction of the resource usage.

Something like this being incorporated into the workflow of CoastSat would be very helpful, although it may be worth investigating whether the user should trim for each AOI or for a bigger polygon containing all of their AOIs, as I have done.

Let me know your thoughts on incorporating this as a feature, as load time and resource usage are greatly reduced through this method.

@kvos
Owner

kvos commented Jan 13, 2025

hey @phillipjws , I think it's a great feature!!! I have noticed the RAM usage too with FES2022. It was not the case with FES2014 where it wouldn't load everything in memory.
I like your solution very much. If you want to go ahead and make a pull request, that would be great. It could be integrated by making a copy of the clipped NetCDF files in the AOI folder (data/sitename) and then loading only from there for that given AOI.
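The per-AOI copy step described here could be sketched as follows; the data/sitename layout comes from the comment above, but the "fes2022" subfolder name is an assumption, not an existing CoastSat convention:

```python
import os
import shutil

def copy_clipped_to_aoi(clipped_dir, sitename, data_dir="data"):
    """Copy clipped NetCDF files into the AOI folder so later runs for
    that AOI load only the trimmed files.

    The data/<sitename>/fes2022 layout is illustrative.
    """
    dest = os.path.join(data_dir, sitename, "fes2022")
    os.makedirs(dest, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(clipped_dir)):
        if name.endswith(".nc"):
            shutil.copy2(os.path.join(clipped_dir, name),
                         os.path.join(dest, name))
            copied.append(name)
    return copied
```

Copying (rather than moving) keeps the global files intact, so the user can always fall back to them.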

@phillipjws
Author

Awesome, I will work on a PR throughout this week @kvos. I will likely do a bit of refactoring of the original script I posted so it better matches the rest of the toolkit. I'm not sure that clipping per AOI is the best methodology, as I don't think it will save time/resources except when the same AOI is being rerun. For my purposes I have been focusing initially on the western coast of Canada, and so have been using a region that contains all of Canada, which has significantly reduced the time needed. As long as the region contains all of the AOIs the user may be interested in, I think this is a good approach, although I'm open to anything; let me know your thoughts. I can write some test cases on an AOI scale to see how much time is saved as well. Let me know whether I should put the new code in SDS_slope.py (since it relates to tidal modelling) or if it would fit better under SDS_tools.py.

Thanks @kvos !
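A minimal timing harness for those AOI-scale test cases might look like this; the `load_fn` callable is a stand-in for whatever loads the FES2022 files, not an existing CoastSat function:

```python
import time

def time_load(load_fn, repeats=3):
    """Run load_fn a few times and return the best wall-clock time in
    seconds, so clipped vs. original load times can be compared fairly."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        load_fn()
        best = min(best, time.perf_counter() - start)
    return best

# Usage sketch (load_fes_config is hypothetical):
# t_full = time_load(lambda: load_fes_config("fes2022.yaml"))
# t_clip = time_load(lambda: load_fes_config("fes2022_clipped.yaml"))
# print(f"speedup: {t_full / t_clip:.1f}x")
```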

@kvos
Owner

kvos commented Jan 13, 2025

Ok, makes sense for a larger region. Just make sure not to overwrite the original files, so the user always has the choice to go global. Yes, good suggestion, it can go in SDS_slope.py.

@phillipjws
Author

Sounds good. I will likely create a new subfolder in the root of the package to hold these files, and will create a new yaml in the same spot. This will be easy to automate, as we will already know where all of the NetCDF files are. Thanks!
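Generating that new yaml could be as simple as rewriting the path entries of the existing config; a minimal sketch using plain text substitution (the directory names below are placeholders, and this assumes the config stores the NetCDF paths as plain strings):

```python
def rewrite_config_paths(config_text, old_dir, new_dir):
    """Return the YAML config text with every occurrence of the original
    FES2022 data directory replaced by the clipped-files directory.

    Plain string substitution leaves everything else in the config
    byte-identical (key order, comments), so no YAML parser is needed.
    """
    return config_text.replace(old_dir, new_dir)

# Hypothetical config line, redirected to the clipped copies:
line = "M2: /fes2022b/ocean_tide/m2_fes2022.nc"
print(rewrite_config_paths(line, "/fes2022b", "/fes2022b_clipped"))
# → M2: /fes2022b_clipped/ocean_tide/m2_fes2022.nc
```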

This was referenced Jan 21, 2025