Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

[WIP] Feature Autoscaling: Enable organization runner autoscaler #213

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
38 changes: 36 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -205,7 +205,9 @@ example-runnerdeploy2475h595fr mumoshu/actions-runner-controller-ci Running
example-runnerdeploy2475ht2qbr mumoshu/actions-runner-controller-ci Running
```

#### Autoscaling
### Autoscaling

#### Repository runners Autoscaling

`RunnerDeployment` can scale the number of runners between `minReplicas` and `maxReplicas` fields, depending on pending workflow runs.

Expand Down Expand Up @@ -241,6 +243,8 @@ The scale out performance is controlled via the manager containers startup `--sy
Additionally, the autoscaling feature has an anti-flapping option that prevents periodic loop of scaling up and down.
By default, it doesn't scale down until the grace period of 10 minutes passes after a scale up. The grace period can be configured by setting `scaleDownDelaySecondsAfterScaleUp`:

Please note that if your `RunnerDeployment` has the `Repository` key set, then do not use the `<owner/repository_name>` notation, only provide the `<repository_name>`.

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
Expand All @@ -267,6 +271,36 @@ spec:
- summerwind/actions-runner-controller
```

#### Organization runners Autoscaling
To autoscale on an organizational level, you need to remove the `repositoryNames` mapping and leave it empty. The Github Actions API doesn’t offer an endpoint to check the currently queued workflows on an organizational level. The way how the controller tries to get around this is by - after each `sync-period` - select the repositories with the latest `pushed` time and check the Actions workflow queue of those repositories. At the moment, the controller checks the last 10 repositories.

Please note, in case you want to autoscale your organization runners, that you should modify your Github organization permissions accordingly; for instance, if you are using an organization PAT (Personal Access Token), update the permissions of your PAT to allow the controller to list all the repositories under your organization.

An example of scaling your organization runners is shown below:

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
name: example-runnerdeploy
spec:
template:
spec:
organization: "your-organization-name"
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
name: example-runnerdeploy-autoscaler
spec:
scaleTargetRef:
name: example-runnerdeploy
minReplicas: 1
maxReplicas: 3
metrics:
- type: TotalNumberOfQueuedAndInProgressWorkflowRuns
```

## Runner with DinD

When using default runner, runner pod starts up 2 containers: runner and DinD (Docker-in-Docker). This might create issues if there's `LimitRange` set to namespace.
Expand Down Expand Up @@ -321,7 +355,7 @@ spec:
requests:
cpu: "2.0"
memory: "4Gi"
# If set to false, there are no privileged container and you cannot use docker.
# If set to false, there are no privileged container and you cannot use docker.
dockerEnabled: false
# If set to true, runner pod container only 1 container that's expected to be able to run docker, too.
# image summerwind/actions-runner-dind or custom one should be used with true -value
Expand Down
70 changes: 51 additions & 19 deletions controllers/autoscaling.go
Original file line number Diff line number Diff line change
Expand Up @@ -2,9 +2,10 @@ package controllers

import (
"context"
"errors"
"fmt"
"log"
"strings"
"time"

"github.com/summerwind/actions-runner-controller/api/v1alpha1"
)
Expand All @@ -18,30 +19,61 @@ func (r *HorizontalRunnerAutoscalerReconciler) determineDesiredReplicas(rd v1alp

var repos [][]string

repoID := rd.Spec.Template.Spec.Repository
if repoID == "" {
orgName := rd.Spec.Template.Spec.Organization
if orgName == "" {
return nil, fmt.Errorf("asserting runner deployment spec to detect bug: spec.template.organization should not be empty on this code path")
}
orgName := rd.Spec.Template.Spec.Organization
if orgName == "" {
return nil, fmt.Errorf("asserting runner deployment spec to detect bug: spec.template.organization should not be empty on this code path")
}

metrics := hra.Spec.Metrics
metrics := hra.Spec.Metrics
if len(metrics) == 0 {
return nil, fmt.Errorf("validating autoscaling metrics: one or more metrics is required")
} else if tpe := metrics[0].Type; tpe != v1alpha1.AutoscalingMetricTypeTotalNumberOfQueuedAndInProgressWorkflowRuns {
return nil, fmt.Errorf("validting autoscaling metrics: unsupported metric type %q: only supported value is %s", tpe, v1alpha1.AutoscalingMetricTypeTotalNumberOfQueuedAndInProgressWorkflowRuns)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would actually add another metrics type here instead of overwriting the previous implementation. What do you think about what I've written in https://github.com/summerwind/actions-runner-controller/pull/223/files#diff-2fb5cc41bbf098ed839a501baea824efe104844213ce3969a63523439a132600R31

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@erikkn erikkn Dec 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work you did on your PR!

So when I look at the new calculateReplicasByPercentageRunnersBusy method you introduce, you are essentially fetching all the runners and subsequently iterate over that list by checking which one is busy. Next, you have a couple of conditionals that instructs you to either scale up or down.

I like it because this also allows us to get completely rid of checking queues on an individual repository level. What do you reckon to do next here? I haven't given your PR a very thorough review yet, but with yours merged I guess we can close this one?
I am not sure if it makes sense of cherry-picking your changes, refactor this PR accordingly and keeping the logic, including the ugly
if len(metrics[0].RepositoryNames) < 1 && r.GitHubClient.GithubEnterprise conditional, specifically just for the calculateReplicasByQueuedAndInProgressWorkflowRuns metric.

Copy link

@ghost ghost Dec 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your hpa scheme is also very much needed. I recommend you create another hpa scheme and add it into this if else block and pull out your changes into a separate func. Something like v1aplha1.AutoscalingMetricTypeNumQueuedAndInProgressForEnabledRepositories and then leave the original scheme as is.

I don't mind keeping these as separate PRs and if yours goes in first, merge your changes into mine afterwards. What are your thoughts?


// Github Enterprise API doesn't offer an endpoint for checking the Actions queue on org level, neither does it offer an endpoint for checking what repos Actions is enabled. This means that, in case of GH Enterprise, you must pass a slice of repositories.
if len(metrics[0].RepositoryNames) < 1 && r.GitHubClient.GithubEnterprise {
return nil, fmt.Errorf("[ERROR] user didn't pass any repository! Please pass a list of repositories the controller has to monitor")
}

if len(metrics) == 0 {
return nil, fmt.Errorf("validating autoscaling metrics: one or more metrics is required")
} else if tpe := metrics[0].Type; tpe != v1alpha1.AutoscalingMetricTypeTotalNumberOfQueuedAndInProgressWorkflowRuns {
return nil, fmt.Errorf("validting autoscaling metrics: unsupported metric type %q: only supported value is %s", tpe, v1alpha1.AutoscalingMetricTypeTotalNumberOfQueuedAndInProgressWorkflowRuns)
} else if len(metrics[0].RepositoryNames) == 0 {
return nil, errors.New("validating autoscaling metrics: spec.autoscaling.metrics[].repositoryNames is required and must have one more more entries for organizational runner deployment")
// If the above conditionally didn't return an error, we automatically assume, in case on an empty repository slice, that an organization is used.
if len(metrics[0].RepositoryNames) == 0 {
enabledRepos, _, err := r.GitHubClient.Actions.ListEnabledReposInOrg(context.Background(), orgName, nil)
if err != nil {
return nil, fmt.Errorf("[ERROR] 'ListEnabledReposInOrg' failed with error message: %s", err)
}

for _, repoName := range metrics[0].RepositoryNames {
repos = append(repos, []string{orgName, repoName})
if len(enabledRepos.Repositories) < 1 {
return nil, fmt.Errorf("[ERROR] 'ListEnabledReposInOrg' returned an empty slice of repositories, check your permissions. Error message: %s", err)
}
} else {
repo := strings.Split(repoID, "/")

repos = append(repos, repo)
for _, v := range enabledRepos.Repositories {
repoName := fmt.Sprint(*v.Name)

if *v.Archived || *v.Disabled {
continue
}

lastChange := (int(time.Now().UTC().Sub(v.PushedAt.Time).Minutes()))
// We need a conditional here, since the `ListEnabledReposInOrg(ListOptions)` doesn't allow us to filter on `pushedAt`: https://docs.github.com/en/free-pro-team@latest/rest/reference/actions#list-selected-repositories-enabled-for-github-actions-in-an-organization
if lastChange > int(r.SyncPeriod.Minutes()) {
continue
} else if len(repos) < 20 {
repos = append(repos, []string{orgName, repoName})
}
}
log.Printf("[INFO] watching the following organizational repositories: %s", repos)

} else {
repoID := rd.Spec.Template.Spec.Repository
if repoID == "" {
for _, repoName := range metrics[0].RepositoryNames {
repos = append(repos, []string{orgName, repoName})
}
} else {
repo := strings.Split(repoID, "/")
repos = append(repos, repo)
}
}

var total, inProgress, queued, completed, unknown int
Expand Down
1 change: 1 addition & 0 deletions controllers/horizontalrunnerautoscaler_controller.go
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,7 @@ type HorizontalRunnerAutoscalerReconciler struct {
Log logr.Logger
Recorder record.EventRecorder
Scheme *runtime.Scheme
SyncPeriod *time.Duration
}

// +kubebuilder:rbac:groups=actions.summerwind.dev,resources=runnerdeployments,verbs=get;list;watch;update;patch
Expand Down
21 changes: 14 additions & 7 deletions github/github.go
Original file line number Diff line number Diff line change
Expand Up @@ -29,16 +29,19 @@ type Client struct {
regTokens map[string]*github.RegistrationToken
mu sync.Mutex
// GithubBaseURL to Github without API suffix.
GithubBaseURL string
GithubBaseURL string
GithubEnterprise bool
}

// NewClient creates a Github Client
func (c *Config) NewClient() (*Client, error) {
var (
httpClient *http.Client
client *github.Client
httpClient *http.Client
client *github.Client
githubEnterprise bool
)
githubBaseURL := "https://github.com/"

if len(c.Token) > 0 {
httpClient = oauth2.NewClient(context.Background(), oauth2.StaticTokenSource(
&oauth2.Token{AccessToken: c.Token},
Expand All @@ -60,20 +63,24 @@ func (c *Config) NewClient() (*Client, error) {

if len(c.EnterpriseURL) > 0 {
var err error
githubEnterprise = true

client, err = github.NewEnterpriseClient(c.EnterpriseURL, c.EnterpriseURL, httpClient)
if err != nil {
return nil, fmt.Errorf("enterprise client creation failed: %v", err)
}
githubBaseURL = fmt.Sprintf("%s://%s%s", client.BaseURL.Scheme, client.BaseURL.Host, strings.TrimSuffix(client.BaseURL.Path, "api/v3/"))
} else {
client = github.NewClient(httpClient)
githubEnterprise = false
}

return &Client{
Client: client,
regTokens: map[string]*github.RegistrationToken{},
mu: sync.Mutex{},
GithubBaseURL: githubBaseURL,
Client: client,
regTokens: map[string]*github.RegistrationToken{},
mu: sync.Mutex{},
GithubBaseURL: githubBaseURL,
GithubEnterprise: githubEnterprise,
}, nil
}

Expand Down
2 changes: 0 additions & 2 deletions go.mod
Original file line number Diff line number Diff line change
Expand Up @@ -6,8 +6,6 @@ require (
github.com/bradleyfalzon/ghinstallation v1.1.1
github.com/davecgh/go-spew v1.1.1
github.com/go-logr/logr v0.1.0
github.com/google/go-github v17.0.0+incompatible // indirect
github.com/google/go-github/v32 v32.1.1-0.20200822031813-d57a3a84ba04
github.com/google/go-github/v33 v33.0.0
github.com/google/go-querystring v1.0.0
github.com/gorilla/mux v1.8.0
Expand Down
4 changes: 0 additions & 4 deletions go.sum
Original file line number Diff line number Diff line change
Expand Up @@ -116,12 +116,8 @@ github.com/google/go-cmp v0.3.0 h1:crn/baboCvb5fXaQ0IJ1SGTsTVrWpDsCWC8EGETZijY=
github.com/google/go-cmp v0.3.0/go.mod h1:8QqcDgzrUqlUb/G2PQTWiueGozuR1884gddMywk6iLU=
github.com/google/go-cmp v0.3.1 h1:Xye71clBPdm5HgqGwUkwhbynsUJZhDbS20FvLhQ2izg=
github.com/google/go-cmp v0.3.1/go.mod h1:8QqcDgzrUqlUb/G2PQTWiueGozuR1884gddMywk6iLU=
github.com/google/go-github v17.0.0+incompatible h1:N0LgJ1j65A7kfXrZnUDaYCs/Sf4rEjNlfyDHW9dolSY=
github.com/google/go-github v17.0.0+incompatible/go.mod h1:zLgOLi98H3fifZn+44m+umXrS52loVEgC2AApnigrVQ=
github.com/google/go-github/v29 v29.0.2 h1:opYN6Wc7DOz7Ku3Oh4l7prmkOMwEcQxpFtxdU8N8Pts=
github.com/google/go-github/v29 v29.0.2/go.mod h1:CHKiKKPHJ0REzfwc14QMklvtHwCveD0PxlMjLlzAM5E=
github.com/google/go-github/v32 v32.1.1-0.20200822031813-d57a3a84ba04 h1:wEYk2h/GwOhImcVjiTIceP88WxVbXw2F+ARYUQMEsfg=
github.com/google/go-github/v32 v32.1.1-0.20200822031813-d57a3a84ba04/go.mod h1:rIEpZD9CTDQwDK9GDrtMTycQNA4JU3qBsCizh3q2WCI=
github.com/google/go-github/v33 v33.0.0 h1:qAf9yP0qc54ufQxzwv+u9H0tiVOnPJxo0lI/JXqw3ZM=
github.com/google/go-github/v33 v33.0.0/go.mod h1:GMdDnVZY/2TsWgp/lkYnpSAh6TrzhANBBwm6k6TTEXg=
github.com/google/go-querystring v1.0.0 h1:Xkwi/a1rcvNg1PPYe5vI8GbeBY/jrVuDX5ASuANWTrk=
Expand Down
1 change: 1 addition & 0 deletions main.go
Original file line number Diff line number Diff line change
Expand Up @@ -148,6 +148,7 @@ func main() {
Log: ctrl.Log.WithName("controllers").WithName("HorizontalRunnerAutoscaler"),
Scheme: mgr.GetScheme(),
GitHubClient: ghClient,
SyncPeriod: &syncPeriod,
}

if err = horizontalRunnerAutoscaler.SetupWithManager(mgr); err != nil {
Expand Down