Batch performant discover_nhdplus_id #417
Comments
Thanks for prompting this, @mhweber -- I've run into this use case a few times where people have long lists and end up using patterns that don't scale well. I'll look at an alternate `discover_nhdplus_id()` implementation and put some thought into whether there is a faster way to do it via geoserver services.
I just merged a change that will help a bit. Will leave this open and think about whether there's a more significant update where we could do a spatial join remotely to retrieve COMIDs.
This is timely. I'm really interested in using nhdplusTools for watershed delineations. Will you please explain how to batch process? Following the code from the vignette, this function works great when dealing with a single station (only the first lines of code are shown for simplicity):

```r
start_point <- st_sfc(st_point(c(-122.802489389074, 43.85780225517)), crs = 4269)
```

However, processing multiple stations at once results in errors, server timeouts, etc. Example code below:

```r
lon2 <- c(-122.802489389074, -122.691787093599)
```

I run into similar complications with other steps (e.g., flowlines, catchments). I have tried multiple approaches and have the most recent package installed. Has anyone processed multiple stations simultaneously or been able to create batch watershed delineations? Any tips would be much appreciated!
Under the hood, `discover_nhdplus_id()` for a point is doing a point-in-polygon query against the NHDPlusV2. A previous version called a web service with a little more overhead than the current implementation, but it's still basically just dropping your point into a catchment. For batches of points, downloading the NHDPlusV2 catchments and using `sf::st_join()` to get the COMID for each point is going to be best. If you need to iterate against the web service, it is best to use …
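The local-join approach described above can be sketched roughly as follows. This is a sketch under assumptions, not a tested recipe: the second point's latitude is illustrative, `get_nhdplus()` is the nhdplusTools helper for pulling features for an area of interest, and the COMID column on the returned catchments is assumed to be named `featureid` (check the actual column names on the object you get back).

```r
library(sf)
library(nhdplusTools)

# Points of interest as an sf object in NAD83 (EPSG:4269).
# The second latitude is illustrative only.
pts <- st_as_sf(
  data.frame(lon = c(-122.802489389074, -122.691787093599),
             lat = c(43.85780225517, 43.9)),
  coords = c("lon", "lat"), crs = 4269
)

# One request for catchment polygons covering all the points...
catchments <- get_nhdplus(AOI = st_as_sfc(st_bbox(pts)),
                          realization = "catchment")

# ...then a purely local point-in-polygon join. The catchment COMID
# is assumed to be in the "featureid" column.
pts_with_comid <- st_join(pts, catchments["featureid"])
```

For very large point sets, the full NHDPlusV2 catchment layer could instead be downloaded once (e.g. with `download_nhdplusv2()`) and the same `st_join()` run against it, avoiding the web service entirely.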
Thanks @dblodgett-usgs! @DEQathomps I've put together geoparquet files for all NHDPlusV2 lake watersheds and am interested in publishing them in an S3 bucket; I was also curious about potentially doing the same for NHDPlusV2 reach COMIDs. We have a method tied to StreamCat and LakeCat that uses staged numpy arrays to (fairly) quickly generate watersheds on the fly, but publishing as geoparquet seems like it would be a useful product. The intent would be to add functionality to StreamCatTools to request watersheds for given lakes or reaches. I'm looking into the feasibility of this at the moment.
I guess I'm missing something -- you are talking about "watershed delineation" and "generating watersheds" but …
Sorry @dblodgett-usgs -- not really related to this issue, my bad. I was just following up on @DEQathomps's question above ("has anyone been able to create batch watershed delineations?") and wanted to point out that I'm working toward potentially sharing out staged watershed delineations for lakes via geoparquet in S3. That may or may not be a viable way, but I have them all and want to make them easily accessible.
@dblodgett-usgs I'm guessing you can close this with the recommendation for using …
I'll leave it open. I want to think some more about whether there is a more scalable way to do discovery. |
Currently the StreamCatTools `sc_get_comid()` function calls `discover_nhdplus_id()` to derive NHDPlus COMIDs for sets of lat/lon values. A number of users have recently been trying to speed this up by parallelizing or by sending batch requests that exceed the server limit in the underlying NLDI service. StreamCatTools has a similar function called `lc_get_comid()`, which calls nhdplusTools `get_waterbodies()` and pulls NHDPlus waterbody COMIDs from the subset features. Would calling the NHDPlus subset service directly, or via nhdplusTools, be more performant and robust than `discover_nhdplus_id()` for deriving COMIDs for a large set of lat/lon values?