Fixed bug: Lost DB connection causes jobs to not be processed anymore. #163

Open: wants to merge 1 commit into base: master

thorsteneckel

Hi there lovely maintainers,

first of all: thanks for this great gem! It does a great job over at Zammad. However, we ran into an issue in one of our customers' installations. Long story short: the DB connection socket is closed (due to a restarting DB) while Delayed::Job reserves a job. Delayed::Job rescues the raised exception, logs it as INFO(!), calls the recover_from method on the backend, and then either tries to re-run the job or skips it. An example log entry looks like this:

I, [2018-11-12T16:56:33.875406 #4559] INFO -- : 2018-11-12T16:56:33+0100: [Worker(host:some_name pid:1337)] Error while reserving job: PG::ConnectionBad: PQsocket() can't get socket descriptor: UPDATE "delayed_jobs" SET locked_at = '2018-11-12 15:56:33.873572', locked_by = 'host:some_name pid:1337' WHERE id IN (SELECT "delayed_jobs"."id" FROM "delayed_jobs" WHERE ((run_at <= '2018-11-12 15:56:33.872595' AND (locked_at IS NULL OR locked_at < '2018-11-12 11:56:33.872665') OR locked_by = 'host:some_name pid:1337') AND failed_at IS NULL) ORDER BY priority ASC, run_at ASC LIMIT 1 FOR UPDATE) RETURNING *

This gem does not implement the Delayed::Job recover_from callback yet. This PR changes that to make sure the DB connection is still present after any exception is raised while processing a job. If the DB connection is lost and can't be reestablished, a new exception will be raised and has to be handled accordingly.
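To illustrate the idea, here is a minimal, self-contained sketch. It is not the code from this PR. `FakeConnection`, `SketchBackend`, and `reserve_with_recovery` are hypothetical names made up for the example. The fake `verify!` only imitates the behaviour I would expect from a real connection check (reconnect if the connection is stale, raise if that fails); in a Rails app, ActiveRecord's connection `verify!` would presumably play that role.

```ruby
# Hypothetical stand-in for a database connection, so the sketch runs
# without a database. It is NOT part of delayed_job or ActiveRecord.
class FakeConnection
  attr_reader :reconnects

  def initialize(reachable: true)
    @active = true
    @reachable = reachable # whether a reconnect attempt can succeed
    @reconnects = 0
  end

  # Simulates the socket being closed, e.g. by a restarting DB.
  def drop!
    @active = false
  end

  def active?
    @active
  end

  # Assumed contract: do nothing if the connection is alive, otherwise
  # reconnect, and raise if the database cannot be reached.
  def verify!
    return if active?
    raise IOError, "database unreachable" unless @reachable

    @reconnects += 1
    @active = true
  end
end

# Hypothetical backend that implements the recover_from hook.
class SketchBackend
  class << self
    attr_accessor :connection

    # Called with the rescued error; makes sure the connection is usable
    # again and lets a failed reconnect surface as a new exception.
    def recover_from(_error)
      connection.verify!
    end
  end
end

# Hypothetical, simplified reservation step: rescue the error, log it,
# and hand it to the backend hook, as outlined in the description above.
def reserve_with_recovery(backend)
  yield
rescue StandardError => e
  warn "Error while reserving job: #{e.message}"
  backend.recover_from(e)
  nil # no job reserved this time; the caller can try again later
end
```

A possible usage: set `SketchBackend.connection = FakeConnection.new`, call `drop!` on it, and run `reserve_with_recovery(SketchBackend) { raise IOError, "socket closed" }`. The call returns `nil` and the connection is active again. With `FakeConnection.new(reachable: false)` the reconnect fails and the `IOError` from `verify!` propagates to the caller.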

Sadly I have no clue how to provide tests for this. I tried my best but haven't found out how. Please let me know how you would approach it and I'm happy to add those.

Greetings from Germany 👋

@thorsteneckel
Author

The failing TravisCI jobs have a different cause than the changes I introduced. Let me know if/how I can help to get this merged. Thanks!

sauy7 added a commit to fishbrain/delayed_job_active_record that referenced this pull request Sep 11, 2021