fix(server): fix deadlock caused by session close/delete #607

Merged: merlimat merged 2 commits into main from deadlock on Feb 21, 2025

mattisonchao (Member) commented on Feb 17, 2025

Motivation

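Avoid calling the session `delete` while holding locks, because doing so can deadlock. In the goroutine dump below, `(*session).delete`, called from `waitForHeartbeats`, is blocked in `leaderController.Write` waiting for a quorum ack (`quorumAckTracker.WaitForCommitOffset`), while `leaderController.NewTerm` is blocked acquiring a mutex in `sessionManager.Close`. As a result, `NewTerm` on the leader controller stays blocked.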

```
1 @ 0x4773ae 0x452dc5 0x1bb794f 0x1bb08a7 0x1bb0653 0x1bbbbb7 0x1bbbff4 0xad5153 0xac212c 0xad50cd 0x47f661
# labels: {"client-identity":"1472eea4-5cb2-42e5-886b-7057b62033f9", "namespace":"broker", "oxia":"session", "session-id":"23995237", "shard":"2"}
#	0x1bb794e	github.com/streamnative/oxia/server.(*quorumAckTracker).WaitForCommitOffset+0x10e	/src/oxia/server/quorum_ack_tracker.go:181
#	0x1bb08a6	github.com/streamnative/oxia/server.(*leaderController).write+0x1e6			/src/oxia/server/leader_controller.go:799
#	0x1bb0652	github.com/streamnative/oxia/server.(*leaderController).Write+0x32			/src/oxia/server/leader_controller.go:782
#	0x1bbbbb6	github.com/streamnative/oxia/server.(*session).delete+0x7f6				/src/oxia/server/session.go:131
#	0x1bbbff3	github.com/streamnative/oxia/server.(*session).waitForHeartbeats+0x1d3			/src/oxia/server/session.go:176
#	0xad5152	github.com/streamnative/oxia/common.DoWithLabels.func1+0x12				/src/oxia/common/pprof.go:46
#	0xac212b	runtime/pprof.Do+0x8b									/usr/local/go/src/runtime/pprof/runtime.go:51
#	0xad50cc	github.com/streamnative/oxia/common.DoWithLabels+0x34c					/src/oxia/common/pprof.go:42

1 @ 0x4773ae 0x453e65 0x453e34 0x478925 0x48c75d 0x1bbe325 0x1bbe2ff 0x1babd6d 0x1ba87f3 0x9db5eb 0xa448fd 0x9db443 0x97482b 0x9797eb 0x97253f 0x47f661
# labels: {"bind":"[::]:6649", "oxia":"internal"}
#	0x478924	sync.runtime_SemacquireMutex+0x24									/usr/local/go/src/runtime/sema.go:95
#	0x48c75c	sync.(*Mutex).lockSlow+0x15c										/usr/local/go/src/sync/mutex.go:173
#	0x1bbe324	sync.(*Mutex).Lock+0xe4											/usr/local/go/src/sync/mutex.go:92
#	0x1bbe2fe	github.com/streamnative/oxia/server.(*sessionManager).Close+0xbe					/src/oxia/server/session_manager.go:298
#	0x1babd6c	github.com/streamnative/oxia/server.(*leaderController).NewTerm+0x6cc					/src/oxia/server/leader_controller.go:268
#	0x1ba87f2	github.com/streamnative/oxia/server.(*internalRpcServer).NewTerm+0x4b2					/src/oxia/server/internal_rpc_server.go:143
#	0x9db5ea	github.com/streamnative/oxia/proto._OxiaCoordination_NewTerm_Handler.func1+0xca				/src/oxia/proto/replication_grpc.pb.go:207
#	0xa448fc	github.com/grpc-ecosystem/go-grpc-prometheus.init.(*ServerMetrics).UnaryServerInterceptor.func3+0x7c	/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-prometheus@v1.2.0/server_metrics.go:107
#	0x9db442	github.com/streamnative/oxia/proto._OxiaCoordination_NewTerm_Handler+0x142				/src/oxia/proto/replication_grpc.pb.go:209
#	0x97482a	google.golang.org/grpc.(*Server).processUnaryRPC+0xe2a							/go/pkg/mod/google.golang.org/grpc@v1.68.0/server.go:1394
#	0x9797ea	google.golang.org/grpc.(*Server).handleStream+0xe8a							/go/pkg/mod/google.golang.org/grpc@v1.68.0/server.go:1805
#	0x97253e	google.golang.org/grpc.(*Server).serveStreams.func2.1+0x7e						/go/pkg/mod/google.golang.org/grpc@v1.68.0/server.go:1029
```

Full goroutine dump: oxia-0.txt
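The lock discipline described in the Motivation can be illustrated with a minimal sketch. It uses hypothetical, simplified types (`session`, `sessionManager`, `newSessionManager`, `expireSession`, and a `deleteFn` field standing in for the blocking replicated write that `(*session).delete` performs); it is not the code from this PR. The idea is to mutate shared state while holding the mutex, release it, and only then make the call that may wait on a quorum ack, so `Close` (and therefore `NewTerm`) never queues behind that wait.

```go
package main

import "sync"

// session is a hypothetical stand-in for Oxia's session type. deleteFn models
// the blocking part of a session delete (a replicated write that waits for a
// quorum ack); it is an illustrative assumption, not Oxia's API.
type session struct {
	id       int64
	deleteFn func()
}

// sessionManager is a hypothetical, simplified session registry.
type sessionManager struct {
	mu       sync.Mutex
	sessions map[int64]*session
}

func newSessionManager() *sessionManager {
	return &sessionManager{sessions: make(map[int64]*session)}
}

// expireSession detaches the session under the lock, then runs the blocking
// delete after the lock is released.
//
// The deadlock-prone variant would call s.deleteFn() before sm.mu.Unlock(),
// so any goroutine that needs sm.mu (e.g. Close during a new term) would wait
// for as long as the write waits for its quorum ack.
func (sm *sessionManager) expireSession(id int64) {
	sm.mu.Lock()
	s, ok := sm.sessions[id]
	if ok {
		delete(sm.sessions, id)
	}
	sm.mu.Unlock()

	if ok {
		s.deleteFn() // no lock held: other goroutines can still take sm.mu
	}
}

// Close holds the lock only long enough to detach the remaining sessions and
// returns how many were detached.
func (sm *sessionManager) Close() int {
	sm.mu.Lock()
	defer sm.mu.Unlock()
	n := len(sm.sessions)
	sm.sessions = make(map[int64]*session)
	return n
}
```

With this ordering, `Close` returns promptly even while another goroutine is still stuck inside `deleteFn`, because that goroutine no longer owns `sm.mu`.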

merlimat merged commit d9a4781 into main on Feb 21, 2025
8 checks passed
merlimat deleted the deadlock branch on February 21, 2025 02:24
mattisonchao added a commit that referenced this pull request on Feb 21, 2025
### Motivation

This is a follow-up to #607: pending requests should fail when
`quorumAckTracker` closes.

### Modification

- Introduce a callback.
- Fail pending requests when `quorumAckTracker` closes.