
STAC requests failing #386

Open
waltersdan opened this issue Jan 9, 2025 · 10 comments

Comments

@waltersdan

Hello,
We've been seeing STAC requests fail fairly frequently for the last few days. The requests fail with various errors and tend to keep failing on retries for anywhere from a few minutes to several hours.

Specifically, we have been querying https://cmr.earthdata.nasa.gov/stac/LPCLOUD for the HLSL30 and HLSS30 collections. We have also tried the cloudstac endpoint with similar results. Some examples of the errors have been:

{"errors":["GraphQL Error (Code: 504): {\\"response\\":{\\"message\\":\\"Endpoint request timed out\\",\\"status\\":504,\\"headers\\":{}},\\"request\\":{\\"query\\":\\"\\\\n query getCollectionsIds($params: CollectionsInput!) {\\\\n collections(params: $params) {\\\\n count\\\\n cursor\\\\n items {\\\\n conceptId\\\\n entryId\\\\n title\\\\n provider\\\\n }\\\\n }\\\\n }\\\\n\\",\\"variables\\":{\\"params\\":{\\"provider\\":\\"LPCLOUD\\",\\"limit\\":100}}}}"]}

{"errors":["An Internal Error has occurred.: {\\"response\\":{\\"errors\\":[{\\"message\\":\\"An Internal Error has occurred.\\",\\"locations\\":[{\\"line\\":3,\\"column\\":5}],\\"path\\":[\\"collections\\"],\\"extensions\\":{\\"code\\":\\"CMR_ERROR\\",\\"stacktrace\\":[\\"GraphQLError: An Internal Error has occurred.\\",\\" at _n (/var/task/src/graphql/handler.js:3968:9410)\\",\\" at Mc.parse (/var/task/src/graphql/handler.js:3968:17828)\\",\\" at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\\",\\" at async Object.k0e [as collectionSourceFetch] (/var/task/src/graphql/handler.js:4044:16429)\\",\\" at async r (/var/task/src/graphql/handler.js:4044:34423)\\"]}}],\\"data\\":null,\\"status\\":200,\\"headers\\":{}},\\"request\\":{\\"query\\":\\"\\\\n query getCollectionsIds($params: CollectionsInput!) {\\\\n collections(params: $params) {\\\\n count\\\\n cursor\\\\n items {\\\\n conceptId\\\\n entryId\\\\n title\\\\n provider\\\\n }\\\\n }\\\\n }\\\\n\\",\\"variables\\":{\\"params\\":{\\"provider\\":\\"LPCLOUD\\",\\"limit\\":100}}}}"]}

{"errors":["Oops! Something has gone wrong. We have been alerted and are working to resolve the problem. Please try your request again later."]}

<html>
  <head>
    <title>Request Limit Exceeded</title>
  </head>
  <body>
    <p>CMR Search rate exceeded. Please refer to the following for guidance: https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#request-moderation</p>
  </body>
</html>

@matus-hodul

We are seeing the same. There was significant downtime when querying HLS throughout the day on Thursday and Friday of last week (9 and 10 Jan):

[screenshot: downtime chart]

And still downtime on Monday, though less than the previous days:

[screenshot: downtime chart]

There also appears to be a consistent spike of downtime around 1 PM Pacific, even on weekends:

[screenshot: downtime chart]

We are seeing the same errors as @waltersdan above.

@eudoroolivares2016
Contributor

Going to investigate and determine whether the signature of the data harvesting has changed. If so, we will readjust our limiting rules and look for another solution that fixes this permanently. Thank you, we appreciate your patience.

@waltersdan
Author

I'm not sure if this helps, but as an example today, after a period of inactivity, I tried a single STAC query and on the very first attempt it gave me the 'CMR Search rate exceeded' message.

@eudoroolivares2016
Contributor

Hello, wanted to reach out and mention that we discussed this yesterday. We are going to reassess our current throttling rules around cmr-stac. These rules are in place to protect our upstream CMR API from activity we've seen where data harvesters were causing degraded service for all CMR API users. Incidentally, some of those times do overlap with a known CMR outage unrelated to cmr-stac (the Jan 9 and 10 dates).

@waltersdan That is helpful thank you

@eudoroolivares2016
Contributor

@waltersdan Are you still running into issues? I have further increased the quantity of requests that we allow through cmr-stac. Can you provide a code snippet showing how you are querying /stac? We know that pystac, a common client, can page over the STAC API in a way that sends CMR itself significant traffic, which is why the rate limiting was implemented. With some code snippets we may be able to advise on ways to reduce the number of calls per minute while still meeting your use case.

@waltersdan
Author

Thanks for the update! At first glance this morning things seem to be running smoothly - we'll keep an eye on it for the next few days and let you know.

We are using pystac_client, which I understand builds on pystac. A minimal example of our current code:

from pystac_client import Client

url = "https://cmr.earthdata.nasa.gov/stac/LPCLOUD"
params = {
    "intersects": {
        "type": "Polygon",
        "coordinates": [[
            [-120.5, 49.3],
            [-120.4, 49.3],
            [-120.4, 49.4],
            [-120.5, 49.4],
            [-120.5, 49.3],
        ]],
    },
    "collections": ["HLSL30_2.0"],
    "datetime": "2013-01-01/2025-01-01",
}

cat = Client.open(url)
search = cat.search(**params)
items = search.item_collection()

Is there any preference for using the 'stac' vs 'cloudstac' url to access HLS specifically? If you have any adjustments or best practices that would help on the server end, would certainly appreciate it!
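In case it helps with the per-minute limits, here is a minimal client-side pacer we could wrap around each search call. This is just a sketch: CMR's actual thresholds are not published, so the max_calls/period numbers below are made up.

```python
import time


class RateLimiter:
    """Allow at most `max_calls` calls per `period` seconds (client-side pacing)."""

    def __init__(self, max_calls: int, period: float):
        self.min_interval = period / max_calls
        self._last = 0.0

    def wait(self) -> None:
        # Sleep just long enough to keep calls at least min_interval apart.
        now = time.monotonic()
        sleep_for = self.min_interval - (now - self._last)
        if sleep_for > 0:
            time.sleep(sleep_for)
        self._last = time.monotonic()
```

Usage would be something like `limiter = RateLimiter(max_calls=10, period=60)` and then `limiter.wait()` before each search.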

@YazidZidane

Hi, I'm also facing this error this morning, here's the error log:

[2025-01-28 14:53:24,829: ERROR/ForkPoolWorker-3] Task create_task[02d09c2a-2382-4477-b256-dcba7c01c814] raised unexpected: APIError('{"errors":["Oops! Something has gone wrong. We have been alerted and are working to resolve the problem. Please try your request again later."]}')
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/celery/app/trace.py", line 412, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/celery/app/trace.py", line 704, in __protected_call__
    return self.run(*args, **kwargs)
  File "/usr/src/app/worker.py", line 40, in create_task
    nr_images, dates, bbox, epsg, data, mask = infc.ndvi_images(task_geometry, date_range=date_range, max_cloud_cover=max_cloud_cover, update_func=self.update_state)
  File "/usr/src/app/hlstools/scripts/interface.py", line 45, in ndvi_images
    colls = self.cat.search_tiles(geometry, date_range=dates)
  File "/usr/src/app/hlstools/scripts/catalog.py", line 96, in search_tiles
    all_items = results.get_all_items()
  File "/usr/local/lib/python3.9/site-packages/pystac_client/item_search.py", line 711, in get_all_items
    feature_collection = self.get_all_items_as_dict()
  File "/usr/local/lib/python3.9/site-packages/pystac_client/item_search.py", line 688, in get_all_items_as_dict
    for page in self._stac_io.get_pages(
  File "/usr/local/lib/python3.9/site-packages/pystac_client/stac_api_io.py", line 220, in get_pages
    page = self.read_json(url, method=method, parameters=parameters)
  File "/usr/local/lib/python3.9/site-packages/pystac/stac_io.py", line 205, in read_json
    txt = self.read_text(source, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/pystac_client/stac_api_io.py", line 97, in read_text
    return self.request(href, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/pystac_client/stac_api_io.py", line 144, in request
    raise APIError.from_response(resp)
pystac_client.exceptions.APIError: {"errors":["Oops! Something has gone wrong. We have been alerted and are working to resolve the problem. Please try your request again later."]}

Similarly, this seems to happen randomly over time. The URL I'm using is also https://cmr.earthdata.nasa.gov/stac/LPCLOUD, and collections is ["HLSS30.v2.0"].

@ZZMitch

ZZMitch commented Jan 28, 2025

Just to add on, I have also seen the 'CMR Search rate exceeded' error come up from time to time when accessing HLS L30/S30 from https://cmr.earthdata.nasa.gov/stac/LPCLOUD.

I have only recently returned to accessing HLS from STAC - and have not been keeping a close eye on updates over the last couple of months. But will watch this space and the Earthdata forums! I have found that restarting the kernel sometimes allows me to continue my processing (I iterate through spatial tiles, so can pick up where I left off pretty easily), though other times the error comes up again right away.

Tracks back to:

catalog.search(bbox, datetime = f'{start}/{end}', collections = ['HLSL30_2.0'], limit = 100).item_collection()

Where bbox is the current tile I am processing (usually 60 × 60 km²), along with a given datetime range (usually I do a year at a time; currently grabbing 2024 data). catalog comes from pc.Client.open('https://cmr.earthdata.nasa.gov/stac/LPCLOUD'), where pc is pystac_client.

The useful part of the error:

pystac_client.exceptions.APIError: <html>
  <head>
    <title>Request Limit Exceeded</title>
  </head>
  <body>
   <p>CMR Search rate exceeded. Please refer to the following for guidance: https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#request-moderation
</p>
  </body>
</html>

Is there a way to see how close we are to hitting the throttling rules mentioned in https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#request-moderation?
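In the meantime, a small retry wrapper around each tile search might avoid the manual kernel restarts. A sketch only: catching bare Exception is for illustration; in practice I would catch pystac_client.exceptions.APIError.

```python
import time


def with_retries(fn, attempts: int = 5, base_delay: float = 30.0):
    """Call fn(), retrying with exponential backoff on failure.

    Sketch: in real code, catch pystac_client.exceptions.APIError
    instead of bare Exception, and tune attempts/base_delay.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries, re-raise the last error
            time.sleep(base_delay * (2 ** attempt))
```

Then each tile becomes `with_retries(lambda: catalog.search(...).item_collection())`.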

@ircwaves

@ZZMitch -- That document mentions a retry-after header value being returned from CMR. I wonder if cmr-stac is forwarding that over? With that, you might be able to use the urllib3 Retry behavior (as in this example) in pystac-client, to avoid having to manually restart your code.

Again, that's assuming the value is propagated.
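Something like this might work (an untested sketch: it assumes cmr-stac actually forwards Retry-After, and that pystac-client's StacApiIO still exposes its underlying requests.Session as .session):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry throttled (429) and transient 5xx responses with exponential backoff,
# honoring a Retry-After header if the server sends one.
retry = Retry(
    total=5,
    backoff_factor=2,  # sleep ~2s, 4s, 8s, ... between attempts
    status_forcelist=[429, 500, 502, 503, 504],
    allowed_methods=["GET", "POST"],  # STAC item searches may use POST
    respect_retry_after_header=True,
)
session = requests.Session()
session.mount("https://", HTTPAdapter(max_retries=retry))
```

To wire it into pystac-client, mount the same adapter on the client's transport: create a `StacApiIO()`, call `stac_io.session.mount("https://", HTTPAdapter(max_retries=retry))`, and pass it via `Client.open(url, stac_io=stac_io)`. Again, that relies on an internal attribute that could move between releases.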

@eudoroolivares2016
Contributor

@waltersdan There is no preference between cloudstac and stac; the former limits the assets returned to those that are cloud-hosted (see https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#c-cloud-hosted for more information). I would suggest just using the stac endpoint, though.

@ZZMitch CMR does not have an API for querying how close you are to hitting the throttling rules. I asked about it; it is not on their roadmap to implement. One note: not all the rules are per user. Some are based on sheer volume, so unfortunately a minority of users can cause the service to be throttled for other users. (Again, this is a consequence of needing to protect the CMR API more broadly.)
