Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

[fix][fn] Fix Deadlock in Functions Worker LeaderService #21711

Merged
merged 2 commits into from
Dec 12, 2023

Conversation

Technoboy-
Copy link
Contributor

Fixes #21501

Motivation

No need to synchronized the method isLeader in LeaderService

See the deadlock stack :

"pulsar-external-listener-44525-1":
	at org.apache.pulsar.functions.worker.FunctionMetaDataManager.giveupLeadership(FunctionMetaDataManager.java)
	- waiting to lock <0x0000100013535c90> (a org.apache.pulsar.functions.worker.FunctionMetaDataManager)
	at org.apache.pulsar.functions.worker.LeaderService.becameInactive(LeaderService.java:167)
	- locked <0x000010001344c6d8> (a org.apache.pulsar.functions.worker.LeaderService)
	at org.apache.pulsar.client.impl.ConsumerImpl.lambda$activeConsumerChanged$27(ConsumerImpl.java:1136)
	at org.apache.pulsar.client.impl.ConsumerImpl$$Lambda$2606/0x00007f854ce9cb10.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@17.0.8.1/ThreadPoolExecutor.java:1136)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@17.0.8.1/ThreadPoolExecutor.java:635)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(java.base@17.0.8.1/Thread.java:833)
"pulsar-web-44514-6":
	at org.apache.pulsar.functions.worker.LeaderService.isLeader(LeaderService.java)
	- waiting to lock <0x000010001344c6d8> (a org.apache.pulsar.functions.worker.LeaderService)
	at org.apache.pulsar.functions.worker.SchedulerManager.scheduleInternal(SchedulerManager.java:200)
	at org.apache.pulsar.functions.worker.SchedulerManager.schedule(SchedulerManager.java:229)
	at org.apache.pulsar.functions.worker.FunctionMetaDataManager.updateFunctionOnLeader(FunctionMetaDataManager.java:251)
	- locked <0x0000100013535c90> (a org.apache.pulsar.functions.worker.FunctionMetaDataManager)
	at org.apache.pulsar.functions.worker.rest.api.ComponentImpl.internalProcessFunctionRequest(ComponentImpl.java:1775)
	at org.apache.pulsar.functions.worker.rest.api.ComponentImpl.updateRequest(ComponentImpl.java:996)
	at org.apache.pulsar.functions.worker.rest.api.FunctionsImpl.registerFunction(FunctionsImpl.java:222)
	at org.apache.pulsar.broker.admin.impl.FunctionsBase.registerFunction(FunctionsBase.java:196)

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

@Technoboy- Technoboy- self-assigned this Dec 12, 2023
@Technoboy- Technoboy- added this to the 3.2.0 milestone Dec 12, 2023
@Technoboy- Technoboy- changed the title [fix][function] Fix Deadlock in Functions Worker LeaderService [fix][fn] Fix Deadlock in Functions Worker LeaderService Dec 12, 2023
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Dec 12, 2023
@Technoboy- Technoboy- requested a review from freeznet December 12, 2023 02:49
@codecov-commenter
Copy link

Codecov Report

Merging #21711 (b0f8772) into master (495b141) will increase coverage by 36.63%.
Report is 2 commits behind head on master.
The diff coverage is 100.00%.

Additional details and impacted files

Impacted file tree graph

@@              Coverage Diff              @@
##             master   #21711       +/-   ##
=============================================
+ Coverage     36.75%   73.39%   +36.63%     
- Complexity    12271    32761    +20490     
=============================================
  Files          1717     1893      +176     
  Lines        131197   140714     +9517     
  Branches      14339    15502     +1163     
=============================================
+ Hits          48220   103270    +55050     
+ Misses        76596    29335    -47261     
- Partials       6381     8109     +1728     
Flag Coverage Δ
inttests 24.10% <100.00%> (-0.01%) ⬇️
systests 24.79% <100.00%> (+0.01%) ⬆️
unittests 72.67% <100.00%> (+40.80%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
.../apache/pulsar/functions/worker/LeaderService.java 82.66% <100.00%> (+8.00%) ⬆️

... and 1463 files with indirect coverage changes

@liangyepianzhou liangyepianzhou merged commit 3396065 into apache:master Dec 12, 2023
Technoboy- added a commit that referenced this pull request Dec 14, 2023
Fixes #21501

### Motivation

No need to `synchronized` the method `isLeader` in LeaderService

See the deadlock stack :
```
"pulsar-external-listener-44525-1":
	at org.apache.pulsar.functions.worker.FunctionMetaDataManager.giveupLeadership(FunctionMetaDataManager.java)
	- waiting to lock <0x0000100013535c90> (a org.apache.pulsar.functions.worker.FunctionMetaDataManager)
	at org.apache.pulsar.functions.worker.LeaderService.becameInactive(LeaderService.java:167)
	- locked <0x000010001344c6d8> (a org.apache.pulsar.functions.worker.LeaderService)
	at org.apache.pulsar.client.impl.ConsumerImpl.lambda$activeConsumerChanged$27(ConsumerImpl.java:1136)
	at org.apache.pulsar.client.impl.ConsumerImpl$$Lambda$2606/0x00007f854ce9cb10.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@17.0.8.1/ThreadPoolExecutor.java:1136)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@17.0.8.1/ThreadPoolExecutor.java:635)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(java.base@17.0.8.1/Thread.java:833)
"pulsar-web-44514-6":
	at org.apache.pulsar.functions.worker.LeaderService.isLeader(LeaderService.java)
	- waiting to lock <0x000010001344c6d8> (a org.apache.pulsar.functions.worker.LeaderService)
	at org.apache.pulsar.functions.worker.SchedulerManager.scheduleInternal(SchedulerManager.java:200)
	at org.apache.pulsar.functions.worker.SchedulerManager.schedule(SchedulerManager.java:229)
	at org.apache.pulsar.functions.worker.FunctionMetaDataManager.updateFunctionOnLeader(FunctionMetaDataManager.java:251)
	- locked <0x0000100013535c90> (a org.apache.pulsar.functions.worker.FunctionMetaDataManager)
	at org.apache.pulsar.functions.worker.rest.api.ComponentImpl.internalProcessFunctionRequest(ComponentImpl.java:1775)
	at org.apache.pulsar.functions.worker.rest.api.ComponentImpl.updateRequest(ComponentImpl.java:996)
	at org.apache.pulsar.functions.worker.rest.api.FunctionsImpl.registerFunction(FunctionsImpl.java:222)
	at org.apache.pulsar.broker.admin.impl.FunctionsBase.registerFunction(FunctionsBase.java:196)
```
Technoboy- added a commit that referenced this pull request Jan 3, 2024
Fixes #21501

### Motivation

No need to `synchronized` the method `isLeader` in LeaderService

See the deadlock stack :
```
"pulsar-external-listener-44525-1":
	at org.apache.pulsar.functions.worker.FunctionMetaDataManager.giveupLeadership(FunctionMetaDataManager.java)
	- waiting to lock <0x0000100013535c90> (a org.apache.pulsar.functions.worker.FunctionMetaDataManager)
	at org.apache.pulsar.functions.worker.LeaderService.becameInactive(LeaderService.java:167)
	- locked <0x000010001344c6d8> (a org.apache.pulsar.functions.worker.LeaderService)
	at org.apache.pulsar.client.impl.ConsumerImpl.lambda$activeConsumerChanged$27(ConsumerImpl.java:1136)
	at org.apache.pulsar.client.impl.ConsumerImpl$$Lambda$2606/0x00007f854ce9cb10.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@17.0.8.1/ThreadPoolExecutor.java:1136)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@17.0.8.1/ThreadPoolExecutor.java:635)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(java.base@17.0.8.1/Thread.java:833)
"pulsar-web-44514-6":
	at org.apache.pulsar.functions.worker.LeaderService.isLeader(LeaderService.java)
	- waiting to lock <0x000010001344c6d8> (a org.apache.pulsar.functions.worker.LeaderService)
	at org.apache.pulsar.functions.worker.SchedulerManager.scheduleInternal(SchedulerManager.java:200)
	at org.apache.pulsar.functions.worker.SchedulerManager.schedule(SchedulerManager.java:229)
	at org.apache.pulsar.functions.worker.FunctionMetaDataManager.updateFunctionOnLeader(FunctionMetaDataManager.java:251)
	- locked <0x0000100013535c90> (a org.apache.pulsar.functions.worker.FunctionMetaDataManager)
	at org.apache.pulsar.functions.worker.rest.api.ComponentImpl.internalProcessFunctionRequest(ComponentImpl.java:1775)
	at org.apache.pulsar.functions.worker.rest.api.ComponentImpl.updateRequest(ComponentImpl.java:996)
	at org.apache.pulsar.functions.worker.rest.api.FunctionsImpl.registerFunction(FunctionsImpl.java:222)
	at org.apache.pulsar.broker.admin.impl.FunctionsBase.registerFunction(FunctionsBase.java:196)
```
Technoboy- added a commit that referenced this pull request Jan 3, 2024
Fixes #21501

### Motivation

No need to `synchronized` the method `isLeader` in LeaderService

See the deadlock stack :
```
"pulsar-external-listener-44525-1":
	at org.apache.pulsar.functions.worker.FunctionMetaDataManager.giveupLeadership(FunctionMetaDataManager.java)
	- waiting to lock <0x0000100013535c90> (a org.apache.pulsar.functions.worker.FunctionMetaDataManager)
	at org.apache.pulsar.functions.worker.LeaderService.becameInactive(LeaderService.java:167)
	- locked <0x000010001344c6d8> (a org.apache.pulsar.functions.worker.LeaderService)
	at org.apache.pulsar.client.impl.ConsumerImpl.lambda$activeConsumerChanged$27(ConsumerImpl.java:1136)
	at org.apache.pulsar.client.impl.ConsumerImpl$$Lambda$2606/0x00007f854ce9cb10.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@17.0.8.1/ThreadPoolExecutor.java:1136)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@17.0.8.1/ThreadPoolExecutor.java:635)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(java.base@17.0.8.1/Thread.java:833)
"pulsar-web-44514-6":
	at org.apache.pulsar.functions.worker.LeaderService.isLeader(LeaderService.java)
	- waiting to lock <0x000010001344c6d8> (a org.apache.pulsar.functions.worker.LeaderService)
	at org.apache.pulsar.functions.worker.SchedulerManager.scheduleInternal(SchedulerManager.java:200)
	at org.apache.pulsar.functions.worker.SchedulerManager.schedule(SchedulerManager.java:229)
	at org.apache.pulsar.functions.worker.FunctionMetaDataManager.updateFunctionOnLeader(FunctionMetaDataManager.java:251)
	- locked <0x0000100013535c90> (a org.apache.pulsar.functions.worker.FunctionMetaDataManager)
	at org.apache.pulsar.functions.worker.rest.api.ComponentImpl.internalProcessFunctionRequest(ComponentImpl.java:1775)
	at org.apache.pulsar.functions.worker.rest.api.ComponentImpl.updateRequest(ComponentImpl.java:996)
	at org.apache.pulsar.functions.worker.rest.api.FunctionsImpl.registerFunction(FunctionsImpl.java:222)
	at org.apache.pulsar.broker.admin.impl.FunctionsBase.registerFunction(FunctionsBase.java:196)
```
nikhil-ctds pushed a commit to datastax/pulsar that referenced this pull request Jan 4, 2024
Fixes apache#21501

### Motivation

No need to `synchronized` the method `isLeader` in LeaderService

See the deadlock stack :
```
"pulsar-external-listener-44525-1":
	at org.apache.pulsar.functions.worker.FunctionMetaDataManager.giveupLeadership(FunctionMetaDataManager.java)
	- waiting to lock <0x0000100013535c90> (a org.apache.pulsar.functions.worker.FunctionMetaDataManager)
	at org.apache.pulsar.functions.worker.LeaderService.becameInactive(LeaderService.java:167)
	- locked <0x000010001344c6d8> (a org.apache.pulsar.functions.worker.LeaderService)
	at org.apache.pulsar.client.impl.ConsumerImpl.lambda$activeConsumerChanged$27(ConsumerImpl.java:1136)
	at org.apache.pulsar.client.impl.ConsumerImpl$$Lambda$2606/0x00007f854ce9cb10.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@17.0.8.1/ThreadPoolExecutor.java:1136)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@17.0.8.1/ThreadPoolExecutor.java:635)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(java.base@17.0.8.1/Thread.java:833)
"pulsar-web-44514-6":
	at org.apache.pulsar.functions.worker.LeaderService.isLeader(LeaderService.java)
	- waiting to lock <0x000010001344c6d8> (a org.apache.pulsar.functions.worker.LeaderService)
	at org.apache.pulsar.functions.worker.SchedulerManager.scheduleInternal(SchedulerManager.java:200)
	at org.apache.pulsar.functions.worker.SchedulerManager.schedule(SchedulerManager.java:229)
	at org.apache.pulsar.functions.worker.FunctionMetaDataManager.updateFunctionOnLeader(FunctionMetaDataManager.java:251)
	- locked <0x0000100013535c90> (a org.apache.pulsar.functions.worker.FunctionMetaDataManager)
	at org.apache.pulsar.functions.worker.rest.api.ComponentImpl.internalProcessFunctionRequest(ComponentImpl.java:1775)
	at org.apache.pulsar.functions.worker.rest.api.ComponentImpl.updateRequest(ComponentImpl.java:996)
	at org.apache.pulsar.functions.worker.rest.api.FunctionsImpl.registerFunction(FunctionsImpl.java:222)
	at org.apache.pulsar.broker.admin.impl.FunctionsBase.registerFunction(FunctionsBase.java:196)
```

(cherry picked from commit ac11655)
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Jan 8, 2024
Fixes apache#21501

### Motivation

No need to `synchronized` the method `isLeader` in LeaderService

See the deadlock stack :
```
"pulsar-external-listener-44525-1":
	at org.apache.pulsar.functions.worker.FunctionMetaDataManager.giveupLeadership(FunctionMetaDataManager.java)
	- waiting to lock <0x0000100013535c90> (a org.apache.pulsar.functions.worker.FunctionMetaDataManager)
	at org.apache.pulsar.functions.worker.LeaderService.becameInactive(LeaderService.java:167)
	- locked <0x000010001344c6d8> (a org.apache.pulsar.functions.worker.LeaderService)
	at org.apache.pulsar.client.impl.ConsumerImpl.lambda$activeConsumerChanged$27(ConsumerImpl.java:1136)
	at org.apache.pulsar.client.impl.ConsumerImpl$$Lambda$2606/0x00007f854ce9cb10.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@17.0.8.1/ThreadPoolExecutor.java:1136)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@17.0.8.1/ThreadPoolExecutor.java:635)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(java.base@17.0.8.1/Thread.java:833)
"pulsar-web-44514-6":
	at org.apache.pulsar.functions.worker.LeaderService.isLeader(LeaderService.java)
	- waiting to lock <0x000010001344c6d8> (a org.apache.pulsar.functions.worker.LeaderService)
	at org.apache.pulsar.functions.worker.SchedulerManager.scheduleInternal(SchedulerManager.java:200)
	at org.apache.pulsar.functions.worker.SchedulerManager.schedule(SchedulerManager.java:229)
	at org.apache.pulsar.functions.worker.FunctionMetaDataManager.updateFunctionOnLeader(FunctionMetaDataManager.java:251)
	- locked <0x0000100013535c90> (a org.apache.pulsar.functions.worker.FunctionMetaDataManager)
	at org.apache.pulsar.functions.worker.rest.api.ComponentImpl.internalProcessFunctionRequest(ComponentImpl.java:1775)
	at org.apache.pulsar.functions.worker.rest.api.ComponentImpl.updateRequest(ComponentImpl.java:996)
	at org.apache.pulsar.functions.worker.rest.api.FunctionsImpl.registerFunction(FunctionsImpl.java:222)
	at org.apache.pulsar.broker.admin.impl.FunctionsBase.registerFunction(FunctionsBase.java:196)
```

(cherry picked from commit ac11655)
# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] Deadlock in Functions Worker service at shutdown in tests
6 participants