Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Seeing DeadLock when the join() on the Deferred times-out at approximately same time as runcallbacks and doCall for Signal callback #4

Closed
dvdreddy opened this issue May 20, 2015 · 6 comments
Assignees

Comments

@dvdreddy
Copy link

Hi,

We are seeing frequent deadlocks happening because the Deferred tries to acquire lock (code-link) due to synchronize(this) when it tries to run the callback chain and Signal (the callback object inside Deferred) also needs to acquire lock (code-link) on itself when it tries to run its callback method to notify the waiter. But inside doJoin method the lock is already acquired on the Signal (code-link)and when you are trying to throw the TimeOutException it will try to convert the Deferred to String which leads to acquiring the lock(code-link) on (this) Deferred. Hope you follow the logic I am also attaching the jstack output so it can make more sense.

Found one Java-level deadlock:
=============================
"pool-88-thread-1":
  waiting to lock monitor 0x00007fa088011518 (object 0x00007fa62e0442f0, a com.stumbleupon.async.Deferred$Signal),
  which is held by "pool-1-thread-520"
"pool-1-thread-520":
  waiting to lock monitor 0x00007fa088011468 (object 0x00007fa62e044240, a com.stumbleupon.async.Deferred),
  which is held by "pool-88-thread-1"

Java stack information for the threads listed above:
===================================================
"pool-88-thread-1":
    at com.stumbleupon.async.Deferred$Signal.call(Deferred.java:1215)
    - waiting to lock <0x00007fa62e0442f0> (a com.stumbleupon.async.Deferred$Signal)
    at com.stumbleupon.async.Deferred.doCall(Deferred.java:1278)
    at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1257)
    - locked <0x00007fa62e044240> (a com.stumbleupon.async.Deferred)
    at com.stumbleupon.async.Deferred.callback(Deferred.java:1005)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:744)
"pool-1-thread-520":
    at com.stumbleupon.async.Deferred.toString(Deferred.java:1416)
    - waiting to lock <0x00007fa62e044240> (a com.stumbleupon.async.Deferred)
    at java.lang.String.valueOf(String.java:2979)
    at java.lang.StringBuilder.append(StringBuilder.java:131)
    at com.stumbleupon.async.TimeoutException.<init>(TimeoutException.java:43)
    at com.stumbleupon.async.Deferred.doJoin(Deferred.java:1177)
    - locked <0x00007fa62e0442f0> (a com.stumbleupon.async.Deferred$Signal)
    at com.stumbleupon.async.Deferred.join(Deferred.java:1045)

I am thinking of a couple of ways to fix this, I want your suggestions

  • Any Better way of fixing it than the below two
  • Should we move to a better way for toString in Deferred
  • Should we throw the Exception only outside the synchronized (signal_cb) { (line no : 1133 Deferred.java). This may require to set a bool that timeout happen and come out of synchronize to check status and then throw exception if it is still pending or something like that.
@tsuna tsuna self-assigned this May 20, 2015
tsuna added a commit to tsuna/async that referenced this issue May 20, 2015
@tsuna
Copy link
Member

tsuna commented May 20, 2015

Sigh, toString() acquiring the lock on this has already caused another issue like this. It's embarrassing.

What do you think about this fix: tsuna/async@d5e34d9 – can you confirm that fixes the problem for you?

@tsuna tsuna closed this as completed in 3552d39 May 20, 2015
@tsuna
Copy link
Member

tsuna commented May 20, 2015

Please confirm that the problem is now resolved and I'll push a new version to Maven.

@dvdreddy
Copy link
Author

Will run in prod and confirm tomorrow, Also thanks for the quick fix

@dvdreddy
Copy link
Author

Looks like this resolved the problem, did not see any issues until now. Again thanks for the quick turn around

@dvdreddy
Copy link
Author

Can confirm no issues in high traffic also, this resolved the problem

@tsuna
Copy link
Member

tsuna commented May 20, 2015

I pushed v1.4.1 to Maven. Thanks for the bug report with the excellent analysis.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants