Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Take ACCESS EXCLUSIVE LOCK carefully #8

Merged
merged 3 commits into from
Aug 18, 2021
Merged

Conversation

za-arthur
Copy link

Try to get the lock in infinite loop with timeouts. This allows
to not stuck lock queue for readers if pg_repack cannot take the
lock fast enough.

Links:

Try to get the lock in infinite loop with timeouts. This allows
to not stuck lock queue for readers if pg_repack cannot take the
lock fast enough.

Links: https://adjustcom.atlassian.net/browse/PAD-234
@za-arthur za-arthur added the enhancement New feature or request label Aug 3, 2021
@za-arthur za-arthur self-assigned this Aug 3, 2021
@nick-adjust
Copy link

Would it perhaps make sense to first acquire an EXCLUSIVE lock and only then escalate it to ACCESS EXCLUSIVE?
The idea is to first wait (preferably without timeouts) until all the writers are out (and prevent new ones from starting if there is no timeouts).
And then one can try and acquire ACCESS EXCLUSIVE lock in a subtransaction with the same logic as above.

This way a long-running merge wont make readers wait unnecessarily and we will keep all the benefits of the variant in this PR

@za-arthur
Copy link
Author

za-arthur commented Aug 4, 2021

@nick-adjust I think I understand what you propose. But what problem will it solve?

I think the current PR also won't make readers and writers wait unnecessary. And the current problem is that long running query blocks pg_repack from getting ACCESS EXCLUSIVE lock, which blocks locking queue.

If we get EXCLUSIVE lock first it won't solve the issue. Because eventually we need to get ACCESS EXCLUSIVE lock.

PS: Actually lock_exclusive_wait is implemented in a bit wrong way, I need to fix it.

@nick-adjust
Copy link

@nick-adjust I think I understand what you propose. But what problem will it solve?

I mean a case where we have a long-running MERGE and we're trying to acquire an ACCESS EXCLUSIVE lock right away: we keep it for 1s and release it, making every single read query that comes after use wait for 1s. This can be eliminated by first trying to acquire an EXCLUSIVE lock, which won't block readers at all, but will conflict with writers. So the effect is:

  1. If there is a long running writer, we won't hinder readers unnecessarily
  2. If we've acquired EXCLUSIVE lock, there is a guarantee that another (potentially long-running) writer won't acquire a conflicting lock while we're trying to escalate the lock to ACCESS EXCLUSIVE, so the acquisition of AE-lock is more deterministic
  3. If there is no writers at all, this has little to no effect on the overall state of the system (with the exception that there is a possibly long running open transaction)

I hope above points make sense. Otherwise questions are welcome.

Also I don't insist on this being implemented in the PR, but this is something that is definitely worth keeping in mind and its worth observing the kpi-service response times once this patch is in production.

@za-arthur
Copy link
Author

za-arthur commented Aug 6, 2021

Thanks for the clarification! Indeed it makes sense to call EXCLUSIVE first.

There is a problem with implementing it because pg_repack doesn't know what lock query it executes. I'm not sure why it is implemented in this way, probably it is done to be able to configure queries by clients. See the line:
https://github.com/adjust/pg_repack/blob/master/bin/pg_repack.c#L1821
And these queries are defined in the view repack.tables:
https://github.com/adjust/pg_repack/blob/master/lib/pg_repack.sql.in#L261

It can take longer time to implement than fixing the initial issue because it will be necessary to consider repack.tables view. And probably it will make sense to have additional PR for this.

We use `lock_exclusive()` to take an exclusive lock. It tries to
get the lock in a short time.
@za-arthur
Copy link
Author

I pushed different commit. Now pg_repack tries to get ACCESS EXCLUSIVE lock by lock_exclusive() function. The functions tries to get the lock in a loop by short queries. If wait-timeout is set then it can fail, in that case pg_repack won't cleanup leftovers.

Copy link

@mrimbault mrimbault left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.
Thank you!

@scorpevans
Copy link

I pushed different commit. Now pg_repack tries to get ACCESS EXCLUSIVE lock by lock_exclusive() function. The functions tries to get the lock in a loop by short queries. If wait-timeout is set then it can fail, in that case pg_repack won't cleanup leftovers.

I agree with @nick-adjust that we should try to get Exclusive lock first, then escalate afterwards.
I didn't follow your comment on why we can't easily inject this: #8 (comment) .

@za-arthur
Copy link
Author

I didn't follow your comment on why we can't easily inject this: #8 (comment) .

I updated the second link which was wrong.

@za-arthur
Copy link
Author

I'll merge this and create separate PR for escalating EXCLUSIVE lock.

@za-arthur za-arthur merged commit 7513317 into master Aug 18, 2021
@za-arthur za-arthur deleted the pad-234-lock-carefully branch August 18, 2021 12:03
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants