Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Clarification of how to handle errors that occur in a subscription event stream #995

Open
Yogu opened this issue Oct 19, 2022 · 2 comments

Comments

@Yogu
Copy link
Contributor

Yogu commented Oct 19, 2022

The GraphQL specification is pretty explicit in regards to handling of errors that occur within a field resolver. It does not say a lot about errors that occur while generating the events of a subscription field. Is this an intentional omission with the intention that implementations come up with their own strategies for error handling, or should this be specified in more detail?

I asked this question two days ago in the Discord server in #graphql-spec, and @benjie suggested to create an issue, so here it is.

Event stream errors in the spec

Section 6.2.3 defines "event streams" and mentions the possibility of errors:

Event streams may complete in response to an error or simply because no more events will occur.

I did not find any other reference to errors in regards to event streams or their usage in subscriptions. This makes me wonder why the specification mentions these two cases (normal completion vs. completion due ton an error). It could be interpreted as "if an error occurs, the event stream should be completed", or it could mean "if an error occurs, the event stream may be completed", or it could just be an non-normative hint.

Section 5.2.3.2 defines the operation defines an operation to convert a source stream to a response stream. (by the way, why is it called MapSourceToResponseEvent and not MapSourceStreamToResponseStream?) This operation does not mention errors, but it defines that the response stream should be completed when the source stream completes. Section 5.2.3 defines Subscribe which uses MapSourceToResponseEvent, but just returns it. This then bubbles up to the introduction of section 6 where it states that the result should be "formatted according to the Response section below". Section 7 (the "below") does not mention streams or subscriptions, so it does not further define how a completion would be formatted.

Field resolver errors in the spec

In contrast, for errors that occur in resolvers, there is a whole subsection in the specification: 6.4.4 "Handling Field Errors". It defines where an error can occur, it defines what should happen, and 7.1.2 Errors defines how the errors should be serialized.

Comparing field resolver errors and event stream errors

Field resolvers and field streams are very similar in the specification. ResolveFieldValue "calls" the "internal function provided by objectType for determining the resolved value of a field named fieldName". ResolveFieldEventStream "calls" the "internal function provided by subscriptionType for determining the resolved event stream of a subscription field named fieldName". Both are intended to be implemented by users of the graphql implementation. This is application code, so so they can produce all kinds of unforseen errors.

The "internal function provided by subscriptionType for determining the resolved event stream of a subscription field" could produce errors in two ways:

a. The function itself could raise an error
b. The generated "resolved event stream" could experience an error while it is generating events.

I think both of these error scenarios are comparable to an error thrown by the "internal function provided by objectType for determining the resolved value of a field". However, only errors of field resolvers are "caught" in the specification (please correct me if I'm wrong). Especially handling of errors generated like described in b. is not described.

What implementations and graphql servers can do

Errors raised while calling the "internal function provided by subscriptionType for determining the resolved event stream of a subscription field" could be handled like describe din 6.4.4 Handling Field Errors. I think this is pretty reasonable, and I guess there are implementations that already do this. (Maybe it's already specced like this and I just don't understand it)

Errors that occur during the execution of an event stream are a bit more complicated.

  • Servers could just ignore the errors and retry forever. However, this would take up resources, and the clients would not know what is going on
  • Servers could complete the event stream. This would free up resources and the client knows that something is going on. However, it can no longer distinguish an error state from the case that there are simply no more events to come. So it would not know whether it should retry.
  • The type of the subscription field could be changed into an union type to include status updates aside from the regular data. This is probably a good choice for expected errors. However, extending each and every subscription field with a union only to cover error cases clutters up the schema. Note that the server might decide against communicating the kind of error, because it could be some internal error reaching origin servers, so the clients would not even benefit from an error definition in the schema. Also, there might be intermediate GraphQL servers that just proxy the subscription to another graphql server without changing data or schema. The union approach would not work for them.
  • The server could close the underlying transport that is used for the subscription (e.g. a WebSocket). However, this would also affect other operations that are using the same transport.
  • Wire protocols for graphql subscriptions could implement a way to signal errors of a single operation. If this is expected from wire protocols, I think it would be good to mention it in the spec that this part is intentionally left to wire protocols.

Summary

I think it would be great if the specification specified error handling. Alternatively, I it could also clarify that it is up to implementations and wire protocols to provide this. Currently, these might omit error handling and say "working as specified, we can't do anything".

@aseemk
Copy link

aseemk commented Dec 20, 2023

Hi there. I want to +1 this issue, as we're running into a difficult challenge here.

We're using graphql-js and serving subscriptions over WebSocket via graphql-ws, and graphql-js's implementation (and interpretation) of the spec is to simply re-throw any errors that occur iterating over the source event stream:

https://github.com/graphql/graphql-js/blob/8a95335f545024c09abfa0f07cc326f73a0e466f/src/execution/__tests__/subscribe-test.ts#L1043

(Notice in this example how this isn't an error creating the source event stream. The source event stream is successfully created, and a first source event is even yielded. An error is thrown trying to pull the second source event.)

graphql-ws (by design) doesn't try/catch/handle thrown errors from graphql-js, so the entire underlying WebSocket connection is closed on these errors: enisdenjo/graphql-ws#333

As @Yogu has noted above, this is obviously really bad for us. This interrupts every other subscription the client may be subscribed to at that moment, adds reconnection overhead, drops events, etc. And if we're experiencing some downtime on a specific subscription/source stream, this'll result in repeat disconnect-reconnect thrash, because the client also has no signal on which subscription has failed!!

We can change our code to:

  • Have our source event stream iterator never throw (i.e. try/catch every iteration ourselves)
  • Have it instead always return a wrapper type, mimicking {data, errors}
  • Unwrap this wrapper type in our response event stream resolver (mapper)
  • And have this resolver throw any errors or return data if no errors

graphql-js does try/catch thrown errors like this from response event stream resolvers:

https://github.com/graphql/graphql-js/blob/8a95335f545024c09abfa0f07cc326f73a0e466f/src/execution/__tests__/subscribe-test.ts#L976

And translates them into a {data: null, errors: ...} response event (which is what we want).

Doing this would obviously be pretty manual, though, and we'd have to do it for every subscription we have.

As noted in @Yogu's comment above, the GraphQL spec is explicit about handling errors when resolving source events to response events (as graphql-js implements above).

graphql-js also does try/catch and gracefully handle errors thrown when creating source event streams (even though it's unclear to me how explicit the spec is here too):

https://github.com/graphql/graphql-js/blob/8a95335f545024c09abfa0f07cc326f73a0e466f/src/execution/__tests__/subscribe-test.ts#L514

So it's only iterating over source event streams — the "middle" step of the algorithm — where graphql-js doesn't handle errors gracefully.

I'm unclear if this is intentional in the spec or not, but it doesn't feel like good/desirable behavior to me.

Would it be possible to clarify this in the spec, and ideally, gracefully handle and return {data: null, errors: ...} response events whenever the source stream throws an error too?

Thank you for the consideration!

@benjie
Copy link
Member

benjie commented Mar 4, 2024

For this issue to get more attention, I suggest you bring it to a GraphQL Working Group; the next one is on Thursday: https://github.com/graphql/graphql-wg/blob/main/agendas/2024/03-Mar/07-wg-primary.md

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants