Allow for other campuses to claim expired jobs. #44

Merged 2 commits on Apr 28, 2023

Foxcapades
Member

Description

Allow for other campuses to claim expired jobs.

Tested locally by

  • running a job
  • expiring that job
  • deleting the local DB so that the job in MinIO is "unowned"
  • attempting to rerun the now unowned job

Changes

  • When looking up a job, if the status in MinIO does not match the last known status in the internal DB, remove the job from the internal DB, since the mismatch means another campus has claimed the job and is operating on it.
  • When submitting a job, if the job already exists and is owned by another campus but is expired, claim it and run it anyway.
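
The lookup rule in the first bullet can be sketched roughly as follows. This is a minimal illustration only: `Reconciler`, `JobStatus`, and the two maps standing in for the internal DB and MinIO are hypothetical names, not the project's actual API, and treating a job that is missing from MinIO as a mismatch is an assumption.

```kotlin
// Hypothetical stand-ins for the internal DB and MinIO; not the project's real API.
enum class JobStatus { Queued, InProgress, Complete, Failed, Expired }

class Reconciler(
    private val localDb: MutableMap<String, JobStatus>, // last known status per job ID
    private val objectStore: Map<String, JobStatus>,    // status as recorded in MinIO
) {
    /** Returns true if this campus still owns the job after reconciliation. */
    fun stillOwned(jobID: String): Boolean {
        // No local record means this campus never owned the job.
        val local = localDb[jobID] ?: return false
        val remote = objectStore[jobID]

        if (remote != local) {
            // Statuses disagree (assumption: a missing remote status also counts),
            // so treat the job as claimed elsewhere and drop the local record.
            localDb.remove(jobID)
            return false
        }
        return true
    }
}
```

After a mismatch the job looks the same as one this campus never owned, which is what lets the submit path decide whether to claim it.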

PR Checklist

  • Updated relevant source docs
  • Updated readme / docs
  • Updated dependencies

@Foxcapades added the enhancement label Mar 17, 2023
@Foxcapades requested a review from dmgaldi March 17, 2023 13:51
@Foxcapades self-assigned this Mar 17, 2023
@Foxcapades linked an issue Mar 17, 2023 that may be closed by this pull request
@Foxcapades requested review from ryanrdoherty and removed request for dmgaldi April 27, 2023 13:21
  // Throw an exception
- throw IllegalStateException("Attempted to submit a job that would overwrite an existing job owned by another campus (${job.jobID})")
+ throw IllegalStateException("Attempted to submit a job that would overwrite an existing, non-expired job owned by another campus (${job.jobID})")
Member

So, this is an exception here because we should already have returned a non-new, non-expired status in the caller, correct?

Member Author

I'm not sure I understand the question. This is here to prevent us from starting a non-expired job that is owned by another campus. The caller should verify that submitting a job is a sane action before calling this method, likely by calling getJob() and at least ensuring job ownership (or non-existence).

Maybe this logic should be pulled out to another method like canSubmitJob(jobID): Boolean?

Right now it would be something like:

val job = AsyncPlatform.getJob(...)

if (job == null || job.isOwned)
  AsyncPlatform.submitJob(...)
else
  throw ForbiddenException("refusing to overwrite job owned by another campus")
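
One possible shape for the suggested `canSubmitJob` check, extended to cover the expired case this PR adds. This is a sketch under assumptions: `JobInfo` and its `owned` and `expired` fields are hypothetical, and the function takes an already looked-up job rather than a job ID to keep the example self-contained.

```kotlin
// Hypothetical view of a job as returned by a lookup; field names are assumptions.
data class JobInfo(val owned: Boolean, val expired: Boolean)

/**
 * True when submitting would not overwrite a live job owned by another campus:
 * the job does not exist, this campus owns it, or it has expired.
 */
fun canSubmitJob(job: JobInfo?): Boolean =
    job == null || job.owned || job.expired
```

A caller would then throw the `ForbiddenException` only when `canSubmitJob` returns false.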


  // If the job already exists
- if (exists)
+ if (existingJob != null && existingJob.owned)
    // Reset the job status to queued and update the queue name
    QueueDB.markJobAsQueued(job.jobID, queue)
Member

Curious about this line. Why would we mark as queued here?

Member

Oh, so this covers our expired and failed cases where we want to restart?

Member

Uh oh. I think the else below this is no longer doing what we expect because of the added conditional. If the job exists but is not owned, we don't want to add it, right? Maybe only if it is expired? Or is that case handled above this call?

Member Author

If it exists, and was previously owned by this campus, then there is no need to re-create the job records so we just update the existing record to say queued.

If it doesn't exist, or was not previously owned by this campus, we have to actually create the job record.
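
The two branches described in this reply can be sketched as below. `FakeQueueDB`, `JobRecord`, `insertJob`, and `recordSubmission` are hypothetical stand-ins; only the name `markJobAsQueued` comes from the diff, and its signature here is assumed.

```kotlin
// Hypothetical in-memory stand-in for the internal queue DB.
data class JobRecord(val jobID: String, var queue: String, var status: String)

class FakeQueueDB {
    val records = mutableMapOf<String, JobRecord>()

    // Reset an existing record to queued and update its queue name.
    fun markJobAsQueued(jobID: String, queue: String) {
        val rec = records.getValue(jobID)
        rec.queue = queue
        rec.status = "queued"
    }

    // Create a fresh record for a job this campus has not tracked before.
    fun insertJob(jobID: String, queue: String) {
        records[jobID] = JobRecord(jobID, queue, "queued")
    }
}

fun recordSubmission(db: FakeQueueDB, jobID: String, queue: String, previouslyOwned: Boolean) {
    if (previouslyOwned && jobID in db.records)
        db.markJobAsQueued(jobID, queue) // existing, owned: reuse the record
    else
        db.insertJob(jobID, queue)       // new, or claimed from another campus
}
```

The non-expired, not-owned case never reaches this branch in the sketch, on the assumption that it was already rejected by the exception earlier in the submit path.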

@ryanrdoherty self-requested a review April 28, 2023 17:50
@Foxcapades merged commit af3a71e into main Apr 28, 2023
@Foxcapades deleted the issue-33 branch April 28, 2023 19:57
Development

Successfully merging this pull request may close these issues.

When rerun request comes in for expired job
2 participants