
Update the routing logic based on recent changes #9307



Merged: 12 commits merged on Sep 3, 2021


@hross (Contributor) commented Aug 25, 2021

Why:

We have updated the routing logic for runners and want to make it clear in the docs.

What's being changed:

Documentation of the routing logic for self-hosted runners.

Check off the following:

  • I have reviewed my changes in staging (look for the latest deployment event in your pull request's timeline, then click View deployment).
  • For content changes, I have completed the self-review checklist.

Writer impact (This section is for GitHub staff members only):

  • This pull request impacts the contribution experience
    • I have added the 'writer impact' label
    • I have added a description and/or a video demo of the changes below (e.g. a "before and after video")

@welcome bot commented Aug 25, 2021

Thanks for opening this pull request! A GitHub docs team member should be by to give feedback soon. In the meantime, please check out the contributing guidelines.

@hross hross requested a review from TingluoHuang August 25, 2021 10:28
@github-actions github-actions bot added the triage label (Do not begin working on this issue until triaged by the team) Aug 25, 2021
@hross hross requested a review from martin389 August 25, 2021 10:30
@ramyaparimi ramyaparimi added the content (This issue or pull request belongs to the Docs Content team) and waiting for review (Issue/PR is waiting for a writer's review) labels and removed the triage label (Do not begin working on this issue until triaged by the team) Aug 25, 2021
@ramyaparimi (Contributor) commented:

@hross Thanks so much for opening a PR! I'll get this triaged for review ⚡

@@ -74,5 +74,5 @@ When routing a job to a self-hosted runner, {% data variables.product.prodname_d
2. The job is then sent to the first matching runner that is online and idle.
Member

{% data variables.product.prodname_dotcom %} first searches for an online and enabled runner at the repository level, then at the organization level{% ifversion ghes or ghae %}, then at the enterprise level{% endif %}.

  • If we don't find an online and enabled runner at any level, the job is queued to all levels and waits for any runner from any level to come online and pick up the job.
    • If the job remains queued for more than 24 hours, the job will fail.
  • If we find an online and enabled runner (preferred runner) at a certain level, the job is then sent to the preferred runner.
    • If the preferred runner doesn't pick up the job within 60 seconds of it being sent, we will try to send the same job to all other levels as well.
    • If the job remains queued for more than 24 hours, the job will fail.
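The routing rules suggested above can be sketched in Python. This is a minimal, hypothetical model, not GitHub's implementation: `Runner`, `find_preferred_runner`, `route`, and the timeout constants are illustrative names. The subset check reflects the diff context's note that labels operate cumulatively, so a runner must carry every label in the job's `runs-on`.

```python
from dataclasses import dataclass

# Hypothetical model of the routing rules suggested above; these names are
# illustrative and are not part of any GitHub API.
LEVELS = ("repository", "organization", "enterprise")
PREFERRED_RUNNER_TIMEOUT_S = 60        # then the job is queued to all levels
QUEUE_TIMEOUT_S = 24 * 60 * 60         # queued jobs fail after 24 hours


@dataclass(frozen=True)
class Runner:
    name: str
    level: str
    labels: frozenset
    online: bool = True
    enabled: bool = True


def find_preferred_runner(job_labels, runners):
    """Search the repository level, then organization, then enterprise."""
    for level in LEVELS:
        for runner in runners:
            # Labels are cumulative: the runner needs every `runs-on` label.
            if (runner.level == level and runner.online and runner.enabled
                    and set(job_labels) <= runner.labels):
                return runner
    return None


def route(job_labels, runners, seconds_waiting):
    """State of a job that no runner has picked up after `seconds_waiting`."""
    if seconds_waiting > QUEUE_TIMEOUT_S:
        return "failed"
    preferred = find_preferred_runner(job_labels, runners)
    if preferred is not None and seconds_waiting < PREFERRED_RUNNER_TIMEOUT_S:
        return f"sent to preferred runner {preferred.name}"
    # No preferred runner, or it did not take the job within 60 seconds:
    # wait for any matching runner from any level to come online.
    return "queued at all levels"
```

In this sketch the search is not repeated once the 60-second window passes; the job simply becomes available to a matching runner at any level, which matches how the behavior is clarified later in the thread.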

Contributor

Thanks @TingluoHuang, I've updated the draft accordingly ⚡

@@ -71,8 +71,7 @@ These labels operate cumulatively, so a self-hosted runner’s labels must match
When routing a job to a self-hosted runner, {% data variables.product.prodname_dotcom %} looks for a runner that matches the job's `runs-on` labels:

1. {% data variables.product.prodname_dotcom %} first searches for a runner at the repository level, then at the organization level{% ifversion ghes or ghae %}, then at the enterprise level{% endif %}.
- If no online runner is found, the job will be queued to all levels and whichever level first has an online and available runner will pick up the job.
@TingluoHuang (Member) Aug 25, 2021

  • If no online and enabled runner is found, the job will be queued to all levels and whichever level first has an online and enabled runner will pick up the job.

Contributor

Added to draft 👍

@martin389 martin389 self-assigned this Aug 26, 2021
@martin389 martin389 requested a review from TingluoHuang August 27, 2021 01:01
@martin389 (Contributor) commented:

Thanks @TingluoHuang -- I've updated the draft with your comments, and this is ready for another review 👍

- If the job remains queued for more than 24 hours, the job will fail.
- If {% data variables.product.prodname_dotcom %} finds an online and enabled runner (preferred runner) at a certain level, the job is then sent to the preferred runner.
- If the job is not picked up by the preferred runner within 60 seconds of being sent, {% data variables.product.prodname_dotcom %} will try to send the same job to all other levels as well.
Member

not sure we need to add any more detail after we send the job to all levels.

If the job is not picked up by the preferred runner within 60 seconds of being sent, {% data variables.product.prodname_dotcom %} will try to send the same job to all other levels as well and wait for any runner from any level to come online and pick up the job.

Contributor

What does "send the same job to all other levels as well" mean? Is it the same behavior as the earlier description, "the job is queued to all levels and waits for any runner from any level to come online and pick up the job"?

Or, is this the same as saying something like: "If the runner doesn't pick up the assigned job within 60 seconds, GitHub starts searching again for an online and enabled runner at all levels."?

Member

We don't search again for an online and enabled runner after 60 seconds. Instead, we queue the job to all levels and wait for a label-matched runner from any of the levels to come online or become enabled and pick up the job.

TingluoHuang
TingluoHuang previously approved these changes Aug 27, 2021
@lucascosti (Contributor) left a comment

I asked some questions and made a suggestion to make this a little clearer.

@TingluoHuang When did this change come into effect? I assume it won't be included in GHES 3.2?

@martin389 We'll probably need to keep the old description for the GHES versions the new one doesn't apply to yet.


- If all matching runners are offline, the job will queue at the level with the highest number of matching offline runners.
- If there are no matching runners at any level, the job will fail.
- {% data variables.product.prodname_dotcom %} first searches for an online and enabled runner at the repository level, then at the organization level{% ifversion ghes or ghae %}, then at the enterprise level{% endif %}.
- If {% data variables.product.prodname_dotcom %} doesn't find an online and enabled runner at any level, the job is queued to all levels and waits for any runner from any level to come online and pick up the job.
Contributor

@TingluoHuang In the previous description, we said that "If there are no matching runners at any level, the job will fail." With this new behavior, if there are no runners configured at any level that match the specified labels for the job, will the job be queued and wait 24 hours before failing?

Member

The job will be queued and wait for 24 hours before failing. Within those 24 hours, any label-matched runner from any level (repo/org/enterprise) that comes online can pick up the job.
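The difference discussed here, between the previous rules and the new ones, can be sketched as two small functions. The names are hypothetical and this is not GitHub's implementation. `legacy_route` assumes every online matching runner is idle, because the busy-runner case is not quoted in this thread, and how ties between levels were broken is also an assumption.

```python
from collections import Counter

# Hypothetical sketch contrasting the two behaviors discussed in this thread.
LEVELS = ("repository", "organization", "enterprise")
QUEUE_TIMEOUT_HOURS = 24


def legacy_route(matching_runners):
    """Previous behavior (GHES releases before 3.3).

    `matching_runners` is a list of (level, name, online) tuples for runners
    whose labels match the job. Assumption: every online runner is idle,
    so the busy-runner case is not modelled.
    """
    if not matching_runners:
        return "failed"  # no matching runners at any level
    # First matching runner that is online, searching level by level.
    for level in LEVELS:
        for runner_level, name, online in matching_runners:
            if runner_level == level and online:
                return f"sent to {name}"
    # Every matching runner is offline: queue at the level with the most of them.
    counts = Counter(runner_level for runner_level, _, _ in matching_runners)
    # max() keeps the first maximum, so ties go to the earlier level (assumption).
    most_offline = max(LEVELS, key=lambda level: counts[level])
    return f"queued at {most_offline} level"


def new_route_without_online_runner(hours_queued):
    """New behavior when no online and enabled runner matches the job.

    The job is queued at all levels; any matching runner that comes online
    within 24 hours can pick it up, otherwise the job fails.
    """
    if hours_queued > QUEUE_TIMEOUT_HOURS:
        return "failed"
    return "queued at all levels"
```

With no matching runners at all, `legacy_route([])` fails immediately, while the new behavior keeps the job queued for up to 24 hours.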

@TingluoHuang (Member) commented:

@lucascosti the change is NOT in GHES 3.2

@lucascosti lucascosti self-assigned this Sep 1, 2021
@lucascosti (Contributor) commented:

OK, the wording is ready for review:

https://docs-9307--hross-update-assign.herokuapp.com/en/actions/hosting-your-own-runners/using-self-hosted-runners-in-a-workflow#routing-precedence-for-self-hosted-runners

@TingluoHuang / @hross could you please confirm its accuracy?

I've opened a docs-engineering issue internally to look at the check that is failing.

@@ -70,9 +70,17 @@ These labels operate cumulatively, so a self-hosted runner’s labels must match

When routing a job to a self-hosted runner, {% data variables.product.prodname_dotcom %} looks for a runner that matches the job's `runs-on` labels:

1. {% data variables.product.prodname_dotcom %} first searches for a runner at the repository level, then at the organization level{% ifversion ghes or ghae %}, then at the enterprise level{% endif %}.
{% ifversion fpt or ghes > 3.2 or ghae %}
Member

This behavior is not enabled for GHAE M1; not sure whether that matters for the docs.

Contributor

🤔 Hmm, ok; I'll edit this for -next

- If the runner doesn't pick up the assigned job within 60 seconds, the job is queued at all levels and waits for a matching runner from any level to come online and pick up the job.
- If {% data variables.product.prodname_dotcom %} doesn't find an online and idle runner at any level, the job is queued to all levels and waits for a matching runner from any level to come online and pick up the job.
- If the job remains queued for more than 24 hours, the job will fail.
{% elsif ghes < 3.3 %}
Member

<= 3.2 ? 😆 I saw the linter error.

Contributor

Haha, unfortunately we can't use <= or >= in our Liquid helper 🙁
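Because the helper lacks `<=`, the diff uses `ghes < 3.3` for the older description. Assuming GHES versions are compared as (major, minor) pairs, that condition selects the same releases as `<= 3.2`. The hypothetical Python helper below mirrors the two Liquid conditions as written in the diff; it is an illustration, not the docs site's actual version-matching code, and it does not reflect the GHAE M1 caveat raised above.

```python
# Hypothetical helper mirroring the Liquid conditions in this diff:
#   {% ifversion fpt or ghes > 3.2 or ghae %} ... {% elsif ghes < 3.3 %}
# GHES versions are (major, minor) tuples, e.g. (3, 2) for GHES 3.2.

def routing_docs_variant(product, version=None):
    """Return which routing description a docs build would render.

    `version` is required when `product` is "ghes".
    """
    if product in ("fpt", "ghae"):
        return "new"
    if product == "ghes":
        if version > (3, 2):
            return "new"
        if version < (3, 3):  # same set of (major, minor) releases as "<= 3.2"
            return "legacy"
    return None
```

For example, `routing_docs_variant("ghes", (3, 2))` returns `"legacy"` and `routing_docs_variant("ghes", (3, 3))` returns `"new"`, so the two branches never overlap.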

TingluoHuang
TingluoHuang previously approved these changes Sep 1, 2021
@lucascosti lucascosti enabled auto-merge (squash) September 3, 2021 06:50
@lucascosti lucascosti merged commit 49a9224 into main Sep 3, 2021
@lucascosti lucascosti deleted the hross-update-assign-logic branch September 3, 2021 06:59
@github-actions bot commented Sep 3, 2021

Thanks very much for contributing! Your pull request has been merged 🎉 You should see your changes appear on the site in approximately 24 hours. If you're looking for your next contribution, check out our help wanted issues

Labels: content (This issue or pull request belongs to the Docs Content team), waiting for review (Issue/PR is waiting for a writer's review)

5 participants