
Update grpc-js to 1.10.x #1388

Merged
merged 9 commits into main on Apr 9, 2024
Conversation

mjameswh
Contributor

@mjameswh mjameswh commented Apr 5, 2024

What was changed

  • Remove the version pin on @grpc/grpc-js and update it to 1.10.6.
  • Add tests asserting that the bugs and concerns regarding grpc-js 1.8+ are no longer pertinent.

Why?

grpc-js had been pinned to 1.7.3 in #1073, as 1.8.x introduced multiple bugs, as well as a built-in retry feature that could possibly conflict with our own retry interceptor.

Recently, there have been security concerns regarding older releases of grpc-js, as well as a report of a bug that has presumably been resolved in recent grpc-js releases.

After more testing, it has been demonstrated that the built-in retry mechanism introduced in grpc-js 1.8 does not conflict with ours. Their retry mechanism is enabled by default, but only performs so-called "transparent retries" unless configured otherwise.

Transparent retries are cases where the gRPC library can determine that a request has not been received by the remote end (e.g. if the underlying HTTP/2 channel is being closed following a GOAWAY notification). Those cases are inherently safe to retry, but identifying them requires connection-level knowledge that is not exposed to our retry interceptor. Consequently, our own retry logic would simply fail to retry in those specific cases.
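To make the "safe to retry" condition concrete, here is an illustrative sketch (these type and function names are invented, not grpc-js internals) of how a transport can prove the server never processed a stream, per the HTTP/2 signals described in the gRPC retry design document:

```typescript
// Illustrative sketch only — invented names, not grpc-js internals.
// A transparent retry is safe exactly when the transport can prove the
// remote end never processed the stream.
type StreamEnd =
  | { kind: 'RST_STREAM'; errorCode: string }
  | { kind: 'GOAWAY'; lastStreamId: number };

function isTransparentlyRetriable(end: StreamEnd, streamId: number): boolean {
  if (end.kind === 'RST_STREAM') {
    // REFUSED_STREAM means the server rejected the stream before handing
    // it to application logic.
    return end.errorCode === 'REFUSED_STREAM';
  }
  // A GOAWAY whose lastStreamId is lower than ours means our stream was
  // never processed by the remote end.
  return end.lastStreamId < streamId;
}
```

An application-level interceptor never sees these frames, which is exactly why our retry interceptor cannot cover these cases itself.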

Resolves #1026.

@mjameswh mjameswh requested a review from a team as a code owner April 5, 2024 21:52
@antlai-temporal
Contributor

antlai-temporal commented Apr 5, 2024

Do we have any control over the retry timeout this library is using? It may be less surprising if we could set it to be the same as our retry. Alternatively, I saw that it would retry 5 times (not configurable); it may make sense if 5*their_timeout < our_timeout ...

@mjameswh
Contributor Author

mjameswh commented Apr 8, 2024

Do we have any control over the retry timeout this library is using? It may be less surprising if we could set it to be the same as our retry. Alternatively, I saw that it would retry 5 times (not configurable); it may make sense if 5*their_timeout < our_timeout ...

No. AFAICS, there's no knob we can tune for transparent retries. It is possible to completely disable that feature, but we really do want it, as that is required to avoid issues in Cloud when the proxy layer sends HTTP2 GOAWAY (we have had a few reports of that lately). Another option would be to set a default deadline, but that would have other undesirable impacts (e.g. deadline would also affect the server's processing of requests).

In fact, there's no backoff/throttling involved in transparent retries, and no timeout either.
Transparent retry is mostly about local failures, so in many cases, the failure is detected "immediately" (i.e. without network delay), and retrying the request poses no risk of extra load on the network or server. In cases where the failure is at the network- or server-level, the request will be transparently retried only once, and without artificial/extra delay.

In this regard, see the following statements from the gRPC Retry Policy design document (you may want to read this section first for context):

If the RPC reaches the gRPC server library, but has never been seen by the server application logic (the third case), the client library will immediately retry it once. If this fails, then the RPC will be handled by the configured retry policy. This extra caution is needed because this case involves extra load on the wire. In the gRPC HTTP/2 transport, the client library knows it is in this case if the stream ends with an RST_STREAM frame with the error code REFUSED_STREAM or if the HTTP/2 connection closes with a GOAWAY frame with the last stream identifier less than the stream's ID.

Since retry throttling is designed to prevent server application overload, and these transparent retries do not make it to the server application layer, they do not count as failures when deciding whether to throttle retry attempts.
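For contrast, the "configured retry policy" that the quote distinguishes from transparent retries is the service-config retryPolicy. A hedged sketch follows — the service name and values are examples only, and the field names follow the gRPC service config specification (grpc-js accepts such a config, JSON-stringified, through the 'grpc.service_config' channel option):

```typescript
// Illustrative service config with a retryPolicy — the configurable retry
// mechanism the design document contrasts with transparent retries.
// Service name and numeric values are examples, not our actual settings.
const serviceConfig = {
  methodConfig: [
    {
      name: [{ service: 'temporal.api.workflowservice.v1.WorkflowService' }],
      retryPolicy: {
        maxAttempts: 5,
        initialBackoff: '0.1s',
        maxBackoff: '1s',
        backoffMultiplier: 2,
        retryableStatusCodes: ['UNAVAILABLE'],
      },
    },
  ],
};

// Passed (JSON-stringified) as a channel option when building the client.
const channelOptions = {
  'grpc.service_config': JSON.stringify(serviceConfig),
};
```

Unlike transparent retries, this mechanism is opt-in, fully configurable, and subject to retry throttling, since its attempts do reach the server application layer.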

One last thing. The official implementations of gRPC for Java and Go both have had support for transparent retries since 2017; the Node implementation is just catching up with those. And yet, I can't find any trace of issues that would relate to transparent retries in our Java and Go SDKs.

Member

@Sushisource Sushisource left a comment


Nice test!

@mjameswh
Contributor Author

mjameswh commented Apr 9, 2024

Ignoring CI fail on job "Continuous Integration / build-docs", as deployment to Vercel is known to be broken ATM.

@mjameswh mjameswh merged commit 3b6c925 into main Apr 9, 2024
28 of 29 checks passed
@mjameswh mjameswh deleted the grpcjs-1.10 branch April 9, 2024 12:27