feat: filter GitHub workflows via query parameter for better queue count accuracy #6519
Conversation
I've run into the same issue where I have a workflow that has >30 jobs, but looking at the GitHub REST API, I don't see any filters for status. Maybe a better fix would be to support pagination and just fetch enough pages until everything pending has been counted.
@vogonistic Actually, there is a status filter on the /actions/runs API endpoint. This is the call I'm trying to change in this PR, which should solve the issue. The API you're referring to is /actions/runs/{run_id}/jobs. That one doesn't have a filter, but we don't need it anyway since it's unlikely that a single run has more than 30 jobs. I think adding this query parameter should work, but I'll need some time to understand and fix the failing tests.
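For illustration, a minimal sketch of the two endpoints being discussed, using a plain net/http client; OWNER, REPO, and RUN_ID are placeholders and this is not the scaler's actual code:

```go
package main

import (
	"fmt"
	"net/http"
)

func main() {
	client := &http.Client{}

	// /actions/runs supports a `status` query parameter, so queued runs
	// can be filtered on the server side.
	runsURL := "https://api.github.com/repos/OWNER/REPO/actions/runs?status=queued"

	// /actions/runs/{run_id}/jobs has no status filter, but a single run
	// rarely has more than 30 jobs, so the default page size usually suffices.
	jobsURL := "https://api.github.com/repos/OWNER/REPO/actions/runs/RUN_ID/jobs"

	for _, u := range []string{runsURL, jobsURL} {
		req, err := http.NewRequest(http.MethodGet, u, nil)
		if err != nil {
			fmt.Println("building request failed:", err)
			continue
		}
		req.Header.Set("Accept", "application/vnd.github+json")
		// req.Header.Set("Authorization", "Bearer "+token) // auth omitted in this sketch
		resp, err := client.Do(req)
		if err != nil {
			fmt.Println("request failed:", err)
			continue
		}
		resp.Body.Close()
		fmt.Println(u, "->", resp.Status)
	}
}
```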
@silviu-dinu Nice! I missed that you switched APIs. While this solved my problem, it does add a known limitation that others might run into, so it should probably either be documented or solved with pagination as well.
@vogonistic Not sure I understand. Which known limitation is the change in this PR adding?
I’m talking about this part:
I'm guessing someone thought the same in the initial implementation, but I've spent several workdays trying to figure out where the problem stems from. So if it'll never return more than 30 queued + 30 in_progress, it's worth documenting in my opinion. Am I misunderstanding how it'll work?
@vogonistic You're right, it will only scale up to 30 + 30 runners maximum, but that is per call cycle. KEDA keeps polling GitHub (e.g., every minute) and should spin up more runners if it still finds pending jobs. So I don't see an issue with this, except maybe a slight delay when there are lots of jobs pending, depending on the configured polling interval.
I also increased the job page size query parameter to the maximum value (100).
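A rough sketch of what one polling cycle could look like under this scheme; fetchRuns and fetchJobs are hypothetical stand-ins for the real HTTP calls, not KEDA's actual internals:

```go
package main

import "fmt"

// Minimal stand-ins for the GitHub API objects used below; the field
// names are illustrative, not the full API schema.
type workflowRun struct{ ID int64 }
type workflowJob struct{ Status string }

// fetchRuns and fetchJobs are hypothetical helpers representing
// GET /actions/runs?status=... and GET /actions/runs/{run_id}/jobs?per_page=...
func fetchRuns(status string) []workflowRun            { return nil }
func fetchJobs(runID int64, perPage int) []workflowJob { return nil }

// countPendingJobs sketches one polling cycle: at most 30 queued plus
// 30 in_progress runs are visible per cycle (the default page size),
// and the next cycle picks up whatever is left over.
func countPendingJobs() int {
	pending := 0
	for _, status := range []string{"queued", "in_progress"} {
		for _, run := range fetchRuns(status) {
			// per_page=100 (the API maximum) so a run's jobs fit in one page.
			for _, job := range fetchJobs(run.ID, 100) {
				if job.Status == "queued" {
					pending++
				}
			}
		}
	}
	return pending
}

func main() {
	fmt.Println("pending jobs this cycle:", countPendingJobs())
}
```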
If someone is using KEDA at scale, this could be a significant limitation :( Since the API appears to be paginated, what about browsing the pages?
@JorTurFer Thanks for your input! I understand the concern, but the main intent behind this PR is improving the count accuracy and fixing the scenario where jobs are not picked up at all because they fall outside the last 30 workflow runs. We currently have an integration between GitHub and Microsoft Azure Container Apps which uses KEDA under the hood, and we frequently see jobs getting stuck due to this issue. So maybe this fix can make it into Microsoft's product eventually. The proposed improvement that should fix jobs getting stuck forever consists of the following:
The scaling speed can currently be tuned by setting the pollingInterval when configuring this particular scaler. In theory, setting this interval to 1 second would mean spinning up new runners as often as every second (still capped at 30 + 30 runs per cycle, as noted above). Pagination may improve this speed slightly, but I see other challenges with it. For example:
At the moment, I don't really see a viable way to overcome this limit for this particular scaler until GitHub introduces a new API or enhances the existing one to fetch pending jobs filtered by labels in a single call.
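To make the trade-off concrete, a back-of-the-envelope sketch; the 30-runs-per-status cap comes from the discussion above, while the backlog size and polling interval are arbitrary example numbers:

```go
package main

import "fmt"

func main() {
	const perCycleCap = 30       // queued runs visible per polling cycle (default page size)
	backlog := 200               // hypothetical: 200 queued workflow runs waiting
	pollingIntervalSeconds := 30 // hypothetical KEDA pollingInterval

	cycles := 0
	for backlog > 0 {
		picked := perCycleCap
		if backlog < picked {
			picked = backlog
		}
		backlog -= picked
		cycles++
	}
	fmt.Printf("backlog fully visible to the scaler after %d cycles (~%d seconds)\n",
		cycles, cycles*pollingIntervalSeconds)
}
```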
@silviu-dinu thanks for the summary, it makes sense to me and I think it is a great fix for the moment. What we can do is document this behavior. @silviu-dinu could you please open a docs PR to improve the documentation of the scaler and discuss this specific issue (basically what you proposed with the pollingInterval)? @JorTurFer WDYT?
yeah, thanks for the explanation! Let's just document the current situation and go ahead
@zroubalik Thanks for reviewing!
Sure, no problem. I raised this PR to update the docs as suggested: kedacore/keda-docs#1535
/run-e2e github |
The current GitHub runner scaler implementation tries to determine the workflow queue length by fetching the latest 30 workflow runs via an API request to GitHub and then filtering the results by specific statuses (queued, in_progress) on the client side. This can be inaccurate when there are queued workflows older than the latest 30 items (the GitHub API's default page size), which results in queued jobs not being picked up. This issue usually manifests when re-running older jobs.
The proposed solution is to filter workflows on the server side by using the ?status query parameter (queued, in_progress) on the /actions/runs API call. Additionally, the PR sets ?per_page=100 (the maximum value) when calling the /actions/runs/{run_id}/jobs API, instead of relying on the default limit of 30.
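As a hedged illustration of the request shapes this description implies (OWNER, REPO, and RUN_ID are placeholders; not the scaler's actual code):

```go
package main

import (
	"fmt"
	"net/url"
)

func main() {
	// Server-side filtering of workflow runs by status, one request per status.
	for _, status := range []string{"queued", "in_progress"} {
		q := url.Values{}
		q.Set("status", status)
		fmt.Println("/repos/OWNER/REPO/actions/runs?" + q.Encode())
	}

	// Jobs of a single run, requesting the maximum page size (100)
	// instead of the default 30.
	q := url.Values{}
	q.Set("per_page", "100")
	fmt.Println("/repos/OWNER/REPO/actions/runs/RUN_ID/jobs?" + q.Encode())
}
```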
Fixes #6519