Update the routing logic based on recent changes #9307
Thanks for opening this pull request! A GitHub docs team member should be by to give feedback soon. In the meantime, please check out the contributing guidelines.
@hross
@@ -74,5 +74,5 @@ When routing a job to a self-hosted runner, {% data variables.product.prodname_d
2. The job is then sent to the first matching runner that is online and idle.
{% data variables.product.prodname_dotcom %} first searches for an online and enabled runner at the repository level, then at the organization level{% ifversion ghes or ghae %}, then at the enterprise level{% endif %}.
- If we don't find an online and enabled runner at any level, the job is queued to all levels and waits for any runner from any level to come online and pick up the job.
- If the job remains queued for more than 24 hours, the job will fail.
- If we find an online and enabled runner (preferred runner) at a certain level, the job is then sent to the preferred runner.
- 60 seconds after sending the job, if the job is not picked up by the preferred runner, we will try to send the same job to all other levels as well.
- If the job remains queued for more than 24 hours, the job will fail.
Thanks @TingluoHuang, I've updated the draft accordingly ⚡
@@ -71,8 +71,7 @@ These labels operate cumulatively, so a self-hosted runner’s labels must match
When routing a job to a self-hosted runner, {% data variables.product.prodname_dotcom %} looks for a runner that matches the job's `runs-on` labels:

1. {% data variables.product.prodname_dotcom %} first searches for a runner at the repository level, then at the organization level{% ifversion ghes or ghae %}, then at the enterprise level{% endif %}.
- If no online runner is found, the job will be queued to all levels and whichever level first has an online and available runner will pick up the job.
- If no online and enabled runner is found, the job will be queued to all levels and whichever level first has an online and enabled runner will pick up the job.
Added to draft 👍
Thanks @TingluoHuang -- I've updated the draft with your comments, and this is ready for another review 👍
- If the job remains queued for more than 24 hours, the job will fail.
- If {% data variables.product.prodname_dotcom %} finds an online and enabled runner (preferred runner) at a certain level, the job is then sent to the preferred runner.
- If the job is not picked up by the preferred runner within 60 seconds after sending the job, {% data variables.product.prodname_dotcom %} will try to send the same job to all other levels as well.
not sure we need to add any more detail after we send the job to all levels.
If the job is not picked up by the preferred runner within 60 seconds after sending the job, {% data variables.product.prodname_dotcom %} will try to send the same job to all other levels as well and wait for any runner from any level to come online and pick up the job.
What does it mean to "send the same job to all other levels as well"? Is it the same behavior as the earlier description, "the job is queued to all levels and waits for any runner from any level to come online and pick up the job"?
Or, is this the same as saying something like: "If the runner doesn't pick up the assigned job within 60 seconds, GitHub starts searching again for an online and enabled runner at all levels."?
We don't search for an online and enabled runner after 60 seconds. We queue the job to all levels and wait for a label-matched runner, from any level, that comes online or is enabled to pick up the job.
I asked some questions and made a suggestion to make this a little clearer.
@TingluoHuang When did these changes come into effect? I assume they won't be included in GHES 3.2?
@martin389 We'll probably need to keep the old description for the GHES versions the new one doesn't apply to yet.
- If all matching runners are offline, the job will queue at the level with the highest number of matching offline runners.
- If there are no matching runners at any level, the job will fail.
- {% data variables.product.prodname_dotcom %} first searches for an online and enabled runner at the repository level, then at the organization level{% ifversion ghes or ghae %}, then at the enterprise level{% endif %}.
- If {% data variables.product.prodname_dotcom %} doesn't find an online and enabled runner at any level, the job is queued to all levels and waits for any runner from any level to come online and pick up the job.
@TingluoHuang In the previous description, we said that "If there are no matching runners at any level, the job will fail." With this new behavior, if there are no runners configured at any level that match the specified labels for the job, will the job be queued and wait 24 hours before failing?
The job will be queued and wait for 24 hours before failing. Within 24 hours, any label matched runner from any level (repo/org/enterprise) that comes online can pick up the job
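The behavior confirmed in this thread can be sketched as a small Python model. This is only an illustration of the wording under review, not GitHub's actual routing code: the `Runner` class, `find_preferred_runner`, `route`, and the `picked_up_in_time` flag are all hypothetical names, and the model assumes a runner matches when its labels cover every `runs-on` label of the job.

```python
from dataclasses import dataclass

# Search order from the thread. The enterprise level only applies on
# versions where the docs include it.
LEVELS = ("repository", "organization", "enterprise")
QUEUE_TIMEOUT_HOURS = 24  # a job still queued after this long fails


@dataclass
class Runner:
    """Hypothetical stand-in for a self-hosted runner."""
    name: str
    level: str
    labels: frozenset
    online: bool = True
    idle: bool = True


def find_preferred_runner(runners, job_labels):
    """Return the first online, idle, label-matching runner, level by level."""
    wanted = set(job_labels)
    for level in LEVELS:
        for runner in runners:
            if (runner.level == level and runner.online and runner.idle
                    and wanted <= runner.labels):
                return runner
    return None


def route(runners, job_labels, picked_up_in_time=True):
    """Describe where a job goes.

    picked_up_in_time models whether the preferred runner accepted the job
    within 60 seconds of it being sent.
    """
    preferred = find_preferred_runner(runners, job_labels)
    if preferred is not None and picked_up_in_time:
        return f"sent to {preferred.name} at the {preferred.level} level"
    # No preferred runner, or it did not pick up the job in time: the job is
    # queued at every level and waits for any matching runner to come online.
    return f"queued at all levels; fails after {QUEUE_TIMEOUT_HOURS} hours"
```

Note that in this model a job with no matching runner at any level is queued rather than failed immediately, which is the change from the previous description that the reply above confirms.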
content/actions/hosting-your-own-runners/using-self-hosted-runners-in-a-workflow.md
@lucascosti the change is NOT in GHES 3.2
Ok wording is ready for review: @TingluoHuang / @hross could you please confirm its accuracy? I've opened a docs-engineering issue internally to look at the check that is failing.
@@ -70,9 +70,17 @@ These labels operate cumulatively, so a self-hosted runner’s labels must match

When routing a job to a self-hosted runner, {% data variables.product.prodname_dotcom %} looks for a runner that matches the job's `runs-on` labels:

1. {% data variables.product.prodname_dotcom %} first searches for a runner at the repository level, then at the organization level{% ifversion ghes or ghae %}, then at the enterprise level{% endif %}.
{% ifversion fpt or ghes > 3.2 or ghae %}
this behavior is not on for GHAE M1, not sure whether that matters to the doc.
🤔 Hmm, ok; I'll edit this for -next
- If the runner doesn't pick up the assigned job within 60 seconds, the job is queued at all levels and waits for a matching runner from any level to come online and pick up the job.
- If {% data variables.product.prodname_dotcom %} doesn't find an online and idle runner at any level, the job is queued to all levels and waits for a matching runner from any level to come online and pick up the job.
- If the job remains queued for more than 24 hours, the job will fail.
{% elsif ghes < 3.3 %}
`<= 3.2`? 😆 I saw the linter error.
haha, unfortunately, we can't use `<=` or `>=` in our liquid helper 🙁
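As a rough illustration of why `ghes < 3.3` can stand in for `<= 3.2`, the Python sketch below compares releases as `(major, minor)` tuples. It does not model the docs site's real Liquid helper: the release list and the `parse`, `older_than`, `at_most`, and `newer_than` helpers are hypothetical, and the sketch assumes release numbers are discrete so that nothing falls between 3.2 and 3.3.

```python
# Hypothetical list of GHES releases, for illustration only.
GHES_RELEASES = [(3, 0), (3, 1), (3, 2), (3, 3), (3, 4)]


def parse(version):
    """Turn a string such as '3.2' into a comparable (major, minor) tuple."""
    major, minor = version.split(".")
    return int(major), int(minor)


def older_than(bound):
    """Releases selected by a strict 'ghes < bound' comparison."""
    return [release for release in GHES_RELEASES if release < parse(bound)]


def at_most(bound):
    """Releases an inclusive 'ghes <= bound' comparison would select."""
    return [release for release in GHES_RELEASES if release <= parse(bound)]


def newer_than(bound):
    """Releases selected by a strict 'ghes > bound' comparison."""
    return [release for release in GHES_RELEASES if release > parse(bound)]


# With discrete release numbers, '< 3.3' and '<= 3.2' pick the same releases.
assert older_than("3.3") == at_most("3.2")
# The two branches in the diff ('> 3.2' and '< 3.3') never overlap and
# together cover every release in the list.
assert sorted(newer_than("3.2") + older_than("3.3")) == GHES_RELEASES
```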
Thanks very much for contributing! Your pull request has been merged 🎉 You should see your changes appear on the site in approximately 24 hours. If you're looking for your next contribution, check out our help wanted issues ⚡
Why:
We have updated the routing logic for runners and want to make it clear in the docs.
What's being changed:
Routing logic for self hosted runners documentation.