Custom timeout per task and retry doesn't seem possible #1281

Open
saro2-a opened this issue Jan 10, 2025 · 2 comments

Comments

@saro2-a
saro2-a commented Jan 10, 2025

I am trying to restart stalled jobs using custom, per-job timeouts.

We have several jobs that, depending on the input, can last anywhere between 1 minute and 3 hours, with a uniform distribution. At job submission time we know roughly how long a job is going to take. However, when I call "get_stalled_jobs", the "started_at" value taken from the events does not seem to be retained when the job object is created:

It is fetched:
SELECT job.id, status, task_name, priority, lock, queueing_lock, args, scheduled_at, queue_name, attempts, max(event.at) started_at

but it is not retained on the resulting job:
https://github.com/procrastinate-org/procrastinate/blob/main/procrastinate/manager.py#L175
https://github.com/procrastinate-org/procrastinate/blob/main/procrastinate/jobs.py#L77

which seems to make the following task impossible to write:

        @self.app.periodic(cron="*/10 * * * *")
        @self.app.task(queueing_lock="retry_stalled_jobs", pass_context=True)
        async def retry_stalled_jobs(context, timestamp):
            stalled_jobs = await self.app.job_manager.get_stalled_jobs(
                nb_seconds=RUNNING_JOBS_MAX_TIME_SECONDS
            )
            # TODO it is currently not possible to have some jobs with custom duration.
            # it needs to be solved at lib level
            for job in stalled_jobs:
                proc_task_max_run_time = job.task_kwargs.get("proc_task_max_run_time")
                if not proc_task_max_run_time or proc_task_max_run_time < now()- {{{ job.started_at ??where to get the start time of the event??}}}:
                    await self.app.job_manager.retry_job(job)

Could we either:

  • support proc_task_max_run_time as a first class parameter (probably preferred)
  • or pass the started_at?
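
To make the second option concrete, here is a minimal sketch of the check I would like to run if "started_at" were available on the job. The helper name `should_retry` and its parameters are hypothetical and not part of procrastinate; it only mirrors the condition in the snippet above (retry when no per-job limit is set, or when the job has been running longer than its limit).

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def should_retry(
    started_at: datetime,
    max_run_time_seconds: Optional[float],
    now: Optional[datetime] = None,
) -> bool:
    """Decide whether a stalled job has exceeded its own time budget.

    Hypothetical helper, not a procrastinate API. `started_at` is assumed
    to be a timezone-aware datetime for the job's last start.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # No per-job limit: the job already exceeded the global threshold
    # used by get_stalled_jobs, so retry it (same as the snippet above).
    if not max_run_time_seconds:
        return True
    # Retry only when the elapsed time is greater than the per-job limit.
    return now - started_at > timedelta(seconds=max_run_time_seconds)
```

With something like this, the loop body would reduce to calling `should_retry(job_started_at, proc_task_max_run_time)` before `retry_job(job)`; the missing piece is still where `job_started_at` comes from.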

Thank you

@ewjoachim
Member

ewjoachim commented Jan 10, 2025

This looks similar to #702 which we wanted to tackle in #740 with heartbeats

EDIT: well, no, timeouts and retrying are different. It's close but not the same. I'll try looking in more details.

@ewjoachim
Member

I think you're right in that the manager doesn't get access to the "Events" table. I think what would make the most sense is the ability to inspect the events of a job.
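
As a rough illustration of how inspecting events could cover this use case: if a job's events were exposed as records with a type and a timestamp, the start time could be derived on the caller's side. The record shape (`"type"` and `"at"` keys) and the `"started"` type name are assumptions made for this sketch, and `latest_started_at` is a hypothetical helper, not an existing procrastinate function.

```python
from datetime import datetime, timezone
from typing import Iterable, Mapping, Optional


def latest_started_at(events: Iterable[Mapping]) -> Optional[datetime]:
    """Return the timestamp of the most recent "started" event, if any.

    Assumes each event is a mapping with a "type" string and an "at"
    datetime; this shape is hypothetical.
    """
    started = [event["at"] for event in events if event["type"] == "started"]
    # max() with default=None handles jobs that never started.
    return max(started, default=None)
```

Taking the most recent start rather than the first one matters for jobs that have already been retried, since each attempt would record its own start.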
