Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Non-deterministic deadlock on concurrent RPC calls #189

Closed
lthibault opened this issue Nov 15, 2021 · 0 comments · Fixed by #225
Closed

Non-deterministic deadlock on concurrent RPC calls #189

lthibault opened this issue Nov 15, 2021 · 0 comments · Fixed by #225
Assignees
Labels
Milestone

Comments

@lthibault
Copy link
Collaborator

As discussed in the Matrix channel, concurrent RPC calls seem to provoke an intermittent deadlock.

I am attaching scratch.zip, which contains a working example to reproduce this behavior. The example was built on commit e33fd9381f266af736f49ab7366c49f0397e3c16 of go-capnproto2 and version 0.9.1 of the capnp tool.

Summarizing a few key points from the Matrix discussion:

  1. This bug also manifests using out-of-process transports, notably TCP to localhost.
  2. The deadlock appears to occur when n>2 concurrent calls are made
    • This suggests a full buffer/channel somewhere
    • The synchronous semantics of net.Pipe likely cause this to happen sooner

@zenhack IFF you have the time, I would be super interested in pair-programming to debug this. This seems like a good opportunity to get more involved in the internals.

@lthibault lthibault added the bug label Nov 15, 2021
@lthibault lthibault added this to the 3.0 milestone Nov 15, 2021
zenhack added a commit to zenhack/go-capnp that referenced this issue Feb 20, 2022
This should fix capnproto#189, though right now the tests are deadlocking and
I don't have the energy to debug, so I'm putting it down.

This also obviates MaxConcurrentCalls; in the current version of the
patch it is still present, but unused. Once this is working, a
subsequent commit will remove the Policy struct.

This significantly restructures the internals of *Server, and I
think the result is easier to understand: the methods on *Server
that start calls just stick the calls in a queue, and there is a
goroutine that pulls them out and handles them. This even gets rid
of Server.mu entirely, since nothing actually needed the lock
anymore!

Hopefully it will stay simple when I actually get it working.
zenhack added a commit to zenhack/go-capnp that referenced this issue Apr 16, 2022
This should fix capnproto#189, though right now the tests are deadlocking and
I don't have the energy to debug, so I'm putting it down.

This also obviates MaxConcurrentCalls; in the current version of the
patch it is still present, but unused. Once this is working, a
subsequent commit will remove the Policy struct.

This significantly restructures the internals of *Server, and I
think the result is easier to understand: the methods on *Server
that start calls just stick the calls in a queue, and there is a
goroutine that pulls them out and handles them. This even gets rid
of Server.mu entirely, since nothing actually needed the lock
anymore!

Hopefully it will stay simple when I actually get it working.
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants