
Batch performant discover_nhdplus_id #417

Open
mhweber opened this issue Dec 16, 2024 · 9 comments


@mhweber
Contributor

mhweber commented Dec 16, 2024

Currently the StreamCatTools sc_get_comid() function calls discover_nhdplus_id() to derive NHDPlus COMIDs for sets of latitude and longitude values. A number of users have recently been trying to speed this up by parallelizing or by sending batch requests that exceed server limits in the underlying NLDI service.

StreamCatTools has a similar function, lc_get_comid(), which calls the nhdplusTools get_waterbodies() function and pulls NHDPlus waterbody COMIDs from the subset features.

Would calling the NHDPlus subset service directly, or via nhdplusTools, be more performant and robust than discover_nhdplus_id() for deriving COMIDs for a large set of latitude and longitude values?
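
For reference, the pattern that runs into these limits is roughly one NLDI request per site. A minimal sketch (illustrative only; sc_get_comid()'s internals may differ):

library(sf)
library(nhdplusTools)

lons <- c(-122.802489389074, -122.691787093599)
lats <- c(43.85780225517, 43.9239837521485)

# One web request per point -- this is what becomes slow for long lists.
comids <- sapply(seq_along(lons), function(i) {
  pt <- st_sfc(st_point(c(lons[i], lats[i])), crs = 4269)
  discover_nhdplus_id(pt)
})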

@dblodgett-usgs
Collaborator

Thanks for prompting this, @mhweber -- I've run into this use case a few times where people have long lists and end up using patterns that don't scale well. I'll look at an alternate discover_nhdplus_id() implementation and put some thought into whether there is a faster way to do it via GeoServer services.

@dblodgett-usgs
Collaborator

I just merged a change that will help a bit. I'll leave this open and think about whether there's a more significant update where we could do a spatial join remotely to retrieve COMIDs.

@DEQathomps

This is timely. I'm really interested in using nhdplusTools for watershed delineations. Could you please explain how to batch process? Following the code from the vignette, this function works great when dealing with a single station (only the first lines of code are presented for simplicity):

start_point <- st_sfc(st_point(c(-122.802489389074, 43.85780225517)), crs = 4269)
start_comid <- discover_nhdplus_id(start_point)

However, processing multiple stations at once results in errors, server timeouts, etc. Example code below.

lon2 <- c(-122.802489389074, -122.691787093599)
lat2 <- c(43.85780225517, 43.9239837521485)
start_points <- st_sfc(st_point(c(lon2, lat2)), crs = 4269)
start_comids <- discover_nhdplus_id(start_points)

I run into similar complications with other steps (e.g., flowlines, catchments). I've tried multiple approaches and have the most recent version of the package installed.

Has anyone processed multiple stations simultaneously or been able to create batch watershed delineations? Any tips would be much appreciated!

@dblodgett-usgs
Collaborator

Under the hood, discover_nhdplus_id() for a point is doing a point-in-polygon query against the NHDPlusV2 catchments.

In a previous version, it called a web service with a little more overhead than the current implementation, but it's still basically just dropping your point into a catchment.

For batches of points, downloading the NHDPlusV2 catchments and using sf::st_join() to get the COMID for each point is going to be best.
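
A minimal sketch of that approach, assuming the catchments are fetched with nhdplusTools::get_nhdplus() and that the COMID column is named featureid (the column name can vary by catchment source):

library(sf)
library(nhdplusTools)

pts <- st_as_sf(
  data.frame(lon = c(-122.802489389074, -122.691787093599),
             lat = c(43.85780225517, 43.9239837521485)),
  coords = c("lon", "lat"), crs = 4269
)

# One request for all catchments covering the points, rather than one per point.
catchments <- get_nhdplus(AOI = st_as_sfc(st_bbox(pts)),
                          realization = "catchment")

# Point-in-polygon join: each point picks up the featureid (COMID) of the
# catchment that contains it.
joined <- st_join(st_transform(pts, st_crs(catchments)),
                  catchments, join = st_within)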

If you need to iterate against the web service, it is best to use httr::RETRY() or httr2::req_retry() with exponential backoff to ensure you don't overload the server. Any degree of parallelism is probably not advised, as the back-end systems are not architected for it.
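
For example, with httr2 (the NLDI URL and response structure below are illustrative of the position-based lookup, not necessarily the exact current route):

library(httr2)

resp <- request("https://api.water.usgs.gov/nldi/linked-data/comid/position") |>
  req_url_query(coords = "POINT(-122.802489 43.857802)") |>
  req_retry(max_tries = 5, backoff = function(i) 2^i) |>  # exponential backoff
  req_perform()

# Response structure assumed: GeoJSON with a comid property on the first feature.
comid <- resp_body_json(resp)$features[[1]]$properties$comid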

@mhweber
Contributor Author

mhweber commented Dec 27, 2024

Thanks @dblodgett-usgs! @DEQathomps I've put together geoparquet files for all NHDPlusV2 lake watersheds, am interested in publishing them in an S3 bucket, and am curious about potentially doing the same for NHDPlusV2 reach COMIDs. We have a method tied to StreamCat and LakeCat that uses staged numpy arrays to (fairly) quickly generate watersheds on the fly, but publishing as geoparquet seems like it would be a useful product. The intent would be to add functionality to StreamCatTools to request watersheds for given lakes or reaches. I'm looking into the feasibility of this at the moment.
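
For what it's worth, the consumer side of such a product could look something like this sketch, assuming sfarrow handles the geoparquet and using a hypothetical file name:

library(sfarrow)

# Read pre-staged lake watershed polygons (file name is hypothetical; an
# S3-hosted version could be read via arrow's S3 filesystem support).
lake_watersheds <- st_read_parquet("nhdplusv2_lake_watersheds.parquet")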

@dblodgett-usgs
Collaborator

I guess I'm missing something -- you are talking about "watershed delineation" and "generating watersheds," but discover_nhdplus_id() just returns a COMID for a place in the network. What does the rest of the workflow look like here?

@mhweber
Contributor Author

mhweber commented Dec 27, 2024

Sorry @dblodgett-usgs, not really related to this issue, my bad. I was just following up on @DEQathomps' question above ("has anyone been able to create batch watershed delineations?") and wanted to point out that I'm working toward potentially sharing staged watershed delineations for lakes via geoparquet in S3. That may or may not be a viable approach, but I have them all and want to make them easily accessible.

@mhweber
Contributor Author

mhweber commented Jan 27, 2025

@dblodgett-usgs I'm guessing you can close this with the recommendation to use httr::RETRY() or httr2::req_retry(), or alternatively to download the NHDPlusV2 catchments and use sf::st_join() to get the COMIDs, unless you want to leave it open for further developments you're thinking of?

@dblodgett-usgs
Collaborator

I'll leave it open. I want to think some more about whether there is a more scalable way to do discovery.
