
Azure CLI Login Method Does Not Work Properly with AAD Enabled AKS Clusters and Multiple Deployment Service Principals #104

Closed
kkizer opened this issue Jul 17, 2022 · 19 comments

Comments

@kkizer

kkizer commented Jul 17, 2022

Azure CLI Login Method Does Not Work Properly with AAD Enabled AKS Clusters and Multiple Deployment Service Principals

Overview

We are deploying applications to Azure Active Directory (AAD) enabled clusters and are required to use kubelogin. Since these are automated deployment pipelines we cannot have interactive logins. We deploy using Azure service principals so we need to use the Azure CLI login method in kubelogin.

The Azure CLI login method does not work properly with AAD enabled AKS clusters when more than one service principal is used by Azure DevOps deployment pipelines on the same build agent. Each pipeline produces the same token cache file name even though the pipelines use different service principals. This causes helm and kubectl commands to fail with authorization errors.

This problem became an issue when some deployment pipelines were using kubelogin v0.0.14 and others were using v0.0.10.

Scenario

  1. There are two different Azure Active Directory (AAD) enabled AKS clusters.
  2. Each AAD enabled cluster has its own service principal to deploy applications to the cluster. This is required since the clusters are for different applications and permissions must be assigned appropriately.
  3. For Azure DevOps deployment pipelines, the cluster configuration file is retrieved for the cluster using the az aks get-credentials Azure CLI command. The configuration needs to be converted to use Azure CLI logins (non-interactive) as documented here.
  4. An Azure DevOps deployment pipeline is executed for cluster 1 on a given build agent. This runs using its deployment service principal (SP) SP 1 and produces a token cache file in ~/.kube/cache/kubelogin with a file name like AzurePublicCloud-6dae42f8-4368-4678-94ff-3960e28e3630--.json. NOTE that the last two tokens in the name are BLANK.
  5. An Azure DevOps deployment pipeline is executed for cluster 2 on the same build agent soon after the first pipeline completes. This runs using its deployment service principal (SP) SP 2 and produces a token cache file in ~/.kube/cache/kubelogin with the same file name as the one generated for the first pipeline.

This causes pipeline 2 to fail with authorization errors since the access token in the cache file is for the wrong service principal.

Source and cluster config info

Source code to generate token file

Cache token file name is generated here:

cacheFile := getCacheFileName(o.Environment, o.ServerID, o.ClientID, o.TenantID, o.IsLegacy)

And you see the name is formed as follows:

func getCacheFileName(environment, serverID, clientID, tenantID string, legacy bool) string {
	// format: ${environment}-${server-id}-${client-id}-${tenant-id}[_legacy].json
	cacheFileNameFormat := "%s-%s-%s-%s.json"
	if legacy {
		cacheFileNameFormat = "%s-%s-%s-%s_legacy.json"
	}
	return fmt.Sprintf(cacheFileNameFormat, environment, serverID, clientID, tenantID)
}
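
For illustration, here is a minimal standalone sketch (not kubelogin code, just the same format string) showing what the name resolves to when clientID and tenantID are empty:

package main

import "fmt"

func main() {
	// format: ${environment}-${server-id}-${client-id}-${tenant-id}.json
	environment := "AzurePublicCloud"
	serverID := "6dae42f8-4368-4678-94ff-3960e28e3630"
	clientID := "" // not present in the azurecli-converted kubeconfig, so it stays empty
	tenantID := "" // same here
	fmt.Printf("%s-%s-%s-%s.json\n", environment, serverID, clientID, tenantID)
	// Output: AzurePublicCloud-6dae42f8-4368-4678-94ff-3960e28e3630--.json
}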

The last two tokens in the generated file name are blank, so the ClientID and TenantID must not be set. The user information in the original cluster configuration is:

users:
- name: clusterUser_RG_KUB11614NP01_AZAMERES01KUB11614NP01
  user:
    auth-provider:
      config:
        apiserver-id: 6dae42f8-4368-4678-94ff-3960e28e3630
        client-id: 80faf920-1908-4b52-b5ef-a8e7bedfc67a
        config-mode: '1'
        environment: AzurePublicCloud
        tenant-id: ca56a4a5-e300-406a-98ff-7e36a0baac5b
      name: azure

And after kubelogin convert-kubeconfig -l azurecli it is (Azure DevOps is masking the server-id value using ***):

- name: clusterUser_RG_KUB11614NP01_AZAMERES01KUB11614NP01
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      args:
      - get-token
      - --server-id
      - ***
      - --login
      - azurecli
      command: kubelogin
      env: null
      provideClusterInfo: false

This would explain why the clientID and tenantID are blank in the generated token cache file name: neither is passed to kubelogin as an argument.

Changes Between kubelogin v0.0.10 and v0.0.14

It doesn't look like the code that generates the token cache file name has changed between v0.0.10 and v0.0.14, so I'm not sure how this ever worked for us. Perhaps we were lucky and builds ran on different build agents, so the name collision wasn't an issue. More likely the token expired and a new one was created (and the cache file updated), so the issue was hidden.

However, the contents of the file DID change.

In v0.0.10, the resource property in the token file is blank:

"resource":""

While in v0.0.14, it has a value:

"resource":"6dae42f8-4368-4678-94ff-3960e28e3630"

It looks like v0.0.14 puts the AAD server ID (the apiserver-id) in the resource property. Either way, the access token cached in the file belongs to whichever service principal acquired it, so two different pipelines using different service principals cannot share the same kubelogin token cache file!

We need to update to the latest version of kubelogin, especially since Kubernetes 1.24 requires the use of kubelogin with AAD enabled clusters. The current version of kubelogin (v0.0.14) is also required since it handles the new cluster configuration file generated starting with Kubernetes 1.24. We cannot stay on older versions of kubelogin (nor should we, so that we keep up with bug and security fixes).

Possible Workaround?

The kubelogin docs do not mention this (though kubelogin convert-kubeconfig --help does), but the kubelogin executable allows you to specify --client-id <client ID>. In my test YAML pipeline, the convert-kubeconfig -l azurecli command runs in an Azure CLI task. I made the following changes:

  1. Added addSpnToEnvironment: true to the task properties.
  2. Changed the command to: kubelogin convert-kubeconfig -l azurecli --client-id $env:servicePrincipalId
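
For reference, here is a sketch of how that Azure CLI task might look (the service connection name and the $(resourceGroup)/$(clusterName) pipeline variables are placeholders, not the exact values from our pipeline):

- task: AzureCLI@2
  inputs:
    azureSubscription: my-service-connection   # placeholder service connection name
    scriptType: pscore
    scriptLocation: inlineScript
    addSpnToEnvironment: true                  # exposes $env:servicePrincipalId to the script
    inlineScript: |
      az aks get-credentials --resource-group $(resourceGroup) --name $(clusterName)
      kubelogin convert-kubeconfig -l azurecli --client-id $env:servicePrincipalId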

This results in the following in the converted cluster config file (Azure DevOps masks the service principal id in log output. The *** is the service principal id):

users:
- name: clusterUser_RG_KUB11614NP01_AZAMERES01KUB11614NP01
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      args:
      - get-token
      - --server-id
      - 6dae42f8-4368-4678-94ff-3960e28e3630
      - --client-id
      - ***
      - --login
      - azurecli
      command: kubelogin
      env: null
      provideClusterInfo: false

And I get file names like AzurePublicCloud-6dae42f8-4368-4678-94ff-3960e28e3630-***-.json (*** is the service principal ID; Azure DevOps masks it in log output). So the clientID value is now set in the file name.

Kubelogin also supports -t <tenant ID> but that did not get written to the converted cluster config file.

Is this an appropriate solution to this issue?

kubelogin v0.0.14 Does Not Honor the --client-id Flag with Kubernetes 1.24 Cluster Config Files

With Kubernetes 1.24, kubelogin v0.0.14 is also broken. The cluster config file you get is:

users:
- name: clusterUser_RG_KUB11614NP01_AZAMERES01KUB11614NP01
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      args:
      - get-token
      - --environment
      - AzurePublicCloud
      - --server-id
      - 6dae42f8-4368-4678-94ff-3960e28e3630
      - --client-id
      - 80faf920-1908-4b52-b5ef-a8e7bedfc67a
      - --tenant-id
      - ca56a4a5-e300-406a-98ff-7e36a0baac5b
      - --login
      - devicecode
      command: kubelogin
      env: null

and if you convert it to use the Azure CLI login with our workaround from before:
kubelogin convert-kubeconfig -l azurecli --client-id $env:servicePrincipalId

The converted file no longer contains the client-id argument:

- name: clusterUser_RG_KUB11614NP01_AZAMERES01KUB11614NP01
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      args:
      - get-token
      - --server-id
      - 6dae42f8-4368-4678-94ff-3960e28e3630
      - --login
      - azurecli
      command: kubelogin
      env: null
      provideClusterInfo: false

AND YOU GET DUPLICATE FILE NAMES AGAIN!

Conclusion

The kubelogin code needs to generate unique token cache file names when using the Azure CLI login method. It cannot assume that the same service principal is always being used.

Kevin Kizer
Lead Engineer
MetLife

@kkizer
Author

kkizer commented Jul 18, 2022

It looks like the --token-cache-dir option should do what we need, but kubelogin doesn't accept it as a valid option. In our Azure DevOps pipelines, we could set it to $(Agent.TempDirectory).

@kkizer
Author

kkizer commented Jul 18, 2022

It looks like the --token-cache-dir option IS on the kubelogin get-token command: kubelogin get-token --token-cache-dir=/mnt/src/test/

So we would have to edit the config file that kubelogin convert-kubeconfig generates to include the token-cache-dir value.
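
For example (a sketch of that manual edit, using the cache path from the command above), the exec section in the converted kubeconfig would need the extra flag added by hand:

      args:
      - get-token
      - --server-id
      - 6dae42f8-4368-4678-94ff-3960e28e3630
      - --login
      - azurecli
      - --token-cache-dir
      - /mnt/src/test/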

@weinong
Contributor

weinong commented Jul 18, 2022

@kkizer looks like I missed this arg when I refactored the conversion code from exec to exec... will fix it today

Oh interesting, we never had --token-cache-dir in the convert-kubeconfig sub-command. I can still add it, though.

@kkizer
Author

kkizer commented Jul 18, 2022

Yes please! Then we can specify that in our pipelines and not have to manually edit the converted cluster configuration file. When would a new kubelogin be available for me to try out? Asking because I may need to do a temporary hack and update the converted cluster configuration file. We have a lot of deployment pipelines stepping on each other right now.

@weinong
Contributor

weinong commented Jul 18, 2022

@kkizer working on it. should have a PR out soon

weinong added a commit that referenced this issue Jul 18, 2022
* added --token-cache-dir in convert-kubeconfig sub-command. Addresses #104
@weinong
Contributor

weinong commented Jul 18, 2022

@kkizer do you want to give it a try before I cut the release?

@kkizer
Author

kkizer commented Jul 19, 2022

Thank you very much for your quick support on this issue. I am working on getting it tested in our environment today. I will post a comment with test results.

@kkizer
Author

kkizer commented Jul 19, 2022

The fix does exactly what we need.

In my Azure DevOps pipeline I get the cluster configuration using az aks get-credentials and convert it using:
kubelogin convert-kubeconfig -l azurecli --token-cache-dir $(Agent.TempDirectory)

and the cluster configuration file contains:

- name: clusterUser_RG_KUB11614NP01_AZAMERES01KUB11614NP01
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      args:
      - get-token
      - --login
      - azurecli
      - --server-id
      - 6dae42f8-4368-4678-94ff-3960e28e3630
      - --token-cache-dir
      - /AzDOAgents/Agent1/_work/_temp
      command: kubelogin
      env: null
      provideClusterInfo: false

It has the token-cache-dir argument!

We still need to test the change with multiple builds on the same build machine where each build uses a different service principal. I don't expect any issues since each build has its own agent temp directory.

@weinong weinong closed this as completed Jul 20, 2022
@weinong weinong reopened this Jul 20, 2022
@weinong
Contributor

weinong commented Jul 20, 2022

actually v0.0.16 introduced a bug such that --token-cache-dir passed to get-token would not be effective as reported in #108

@weinong
Contributor

weinong commented Jul 23, 2022

@kkizer you should upgrade to v0.0.17 or higher. the --token-cache-dir wasn't really being used in actual token acquisition. I think it worked out for you as your build workspace is different

@weinong weinong closed this as completed Jul 23, 2022
@kkizer
Author

kkizer commented Oct 11, 2022 via email

@kkizer
Author

kkizer commented Oct 11, 2022 via email

@weinong
Contributor

weinong commented Oct 11, 2022

@kkizer the latest v0.0.20 has this change

@mbaykara

mbaykara commented Mar 5, 2023

Hey @kkizer what is the --server-id? Where can I get this?

@weinong
Contributor

weinong commented Mar 6, 2023

@mbaykara https://github.com/Azure/kubelogin#exec-plugin-format

Note: The AAD server app ID of AKS Managed AAD is always 6dae42f8-4368-4678-94ff-3960e28e3630 in any environment.

@mbaykara

mbaykara commented Mar 7, 2023

The AAD server app ID of AKS Managed AAD is always 6dae42f8-4368-4678-94ff-3960e28e3630 in any environment.

I am not using an Azure managed cluster, I am using another provider actually. Will it not work then? @weinong

@weinong
Contributor

weinong commented Mar 7, 2023

@mbaykara what do you mean by another provider? If you know what's being set up, what exactly are you asking for?

Hey @kkizer what is the --server-id? Where can I get this?

@mbaykara

mbaykara commented Mar 7, 2023

Another public cloud provider, i.e. GKE @weinong

@weinong
Contributor

weinong commented Mar 22, 2023

@mbaykara the --server-id should be the AAD application ID you configure your Kubernetes cluster with. You can refer to https://github.com/Azure/kubelogin#setup-for-kubernetes-oidc-provider-using-azure-ad
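
For a non-AKS cluster, that generally maps to the API server's OIDC flags. A rough sketch, assuming an AAD app registration and the Azure AD v1 issuer URL (your issuer and claims may differ):

# the AAD server application ID below is the same value you would pass to kubelogin as --server-id
kube-apiserver \
  --oidc-issuer-url=https://sts.windows.net/<tenant-id>/ \
  --oidc-client-id=<AAD server application ID> \
  --oidc-username-claim=upn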
