Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Pod-level identity improvements #237

Merged
merged 6 commits into from
Aug 8, 2024

Conversation

unexge
Copy link
Contributor

@unexge unexge commented Aug 7, 2024

This PR adds the following improvements:

  • Ensure we don't log sensitive information from csi.NodePublishVolumeRequest
  • Disable caching with pod-level credentials
  • Ensure to disable all other credential providers except STS provider
  • Add a test case to make sure we're not using secrets if pod-level identity is enabled

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

killCSIDriverPods(ctx, f)
}

updateDriverLevelCredentials := func(ctx context.Context, policyARN string) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we call this something like updateDriverLevelK8sSecret? The current name doesn't make it obvious

It was getting skipped if we `Skip` some tests and it was causing
cluster to stay in an invalid state.
@unexge unexge merged commit 598d260 into feature/pod-level-identity Aug 8, 2024
8 checks passed
@unexge unexge deleted the unexge/pod-level-identity branch August 8, 2024 14:42
muddyfish pushed a commit that referenced this pull request Aug 23, 2024
* Add a test case to make sure we're not using secrets if pod-level
identity is enabled

* Ensure to disable all other credential providers except STS provider

* Add an utility function to extract arguments from Mountpoint args

* Disable caching with pod-level credentials

* Ensure we don't log sensitive information from `csi.NodePublishVolumeRequest`

* Move cluster-wide cleanup to `AfterEach` to ensure its always called

It was getting skipped if we `Skip` some tests and it was causing
cluster to stay in an invalid state.
unexge added a commit that referenced this pull request Sep 11, 2024
* Add a test case to make sure we're not using secrets if pod-level
identity is enabled

* Ensure to disable all other credential providers except STS provider

* Add an utility function to extract arguments from Mountpoint args

* Disable caching with pod-level credentials

* Ensure we don't log sensitive information from `csi.NodePublishVolumeRequest`

* Move cluster-wide cleanup to `AfterEach` to ensure its always called

It was getting skipped if we `Skip` some tests and it was causing
cluster to stay in an invalid state.

Signed-off-by: Burak Varlı <burakvar@amazon.co.uk>
muddyfish added a commit that referenced this pull request Sep 23, 2024
* Initial implementation of multicredential support (#223)

Implements a proof of concept of pod level identity

Co-authored-by: Simon Beal <simobeal@amazon.com>
Signed-off-by: Burak Varlı <burakvar@amazon.co.uk>

* Add cluster role to helm service account definition (#226)

Co-authored-by: Simon Beal <simobeal@amazon.com>
Signed-off-by: Burak Varlı <burakvar@amazon.co.uk>

* Add initial set of end-to-end credential tests (#231)

* Initial e2e test for pod level identity

* Temporarily remove other e2e tests

In WIP state where configuration options for pod level identity arent yet complete.

* Add some initial set of credential tests

* Revert `tests/e2e-kubernetes/scripts/eksctl-patch.yaml` to original

* Pass created files seed to `expectReadToSucceed` function

* Use updated SA object while restoring the override

* Improve assertions

---------

Co-authored-by: Simon Beal <simobeal@amazon.com>
Signed-off-by: Burak Varlı <burakvar@amazon.co.uk>

* Create credential provider and refactor pod-level e2e tests (#235)

* Create credential provider and refactor pod-level e2e tests

* Enable all end-to-end tests

* Fix sanity tests

* Update CI scripts

* Clean resources in reverse order similar to how `defer` works

* Delete pod with `gracePeriod=0` in `expectFailToMount`

* Increase end-to-end test timeout to 30 minutes

* Implement STS region detection

* Cleanup service account tokens before and after mount

* Separate pod-level tests for ease of review

* Add pod-level tests back

Signed-off-by: Burak Varlı <burakvar@amazon.co.uk>

* Ensure service account token files are unique per mount (#236)

* Add `WriteFileAtomic` function

This is copied from Tailscale codebase.

* Ensure service account token files are unique per mount

We're using Pod ID + Volume ID to ensure uniqueness.

* Fix race condition in unit tests

* Use `github.com/google/renameio` for atomic writes to a file

* Fix test coverage report in CI

* Replace spaces with tab in `Makefile`
Signed-off-by: Burak Varlı <burakvar@amazon.co.uk>

* Pod-level identity improvements (#237)

* Add a test case to make sure we're not using secrets if pod-level
identity is enabled

* Ensure to disable all other credential providers except STS provider

* Add an utility function to extract arguments from Mountpoint args

* Disable caching with pod-level credentials

* Ensure we don't log sensitive information from `csi.NodePublishVolumeRequest`

* Move cluster-wide cleanup to `AfterEach` to ensure its always called

It was getting skipped if we `Skip` some tests and it was causing
cluster to stay in an invalid state.

Signed-off-by: Burak Varlı <burakvar@amazon.co.uk>

* Feature/pod level identity cache key (#245)

* Support `UNSTABLE_MOUNTPOINT_CACHE_KEY` when using pod level identity

When using pod level identity, use send a cache key to MP, which separates
caches even if the cache location is the same. This is frequent when using the
PLI feature.

* Enable cache with PLI

Signed-off-by: Burak Varlı <burakvar@amazon.co.uk>

* Clean up tokens by parsing target path in `NodeUnpublishVolume` (#251)

* Add `ParseTargetPath` function

Signed-off-by: Burak Varlı <burakvar@amazon.co.uk>

* Clean up tokens by parsing target path in `NodeUnpublishVolume`

We need Pod ID and Volume ID to clean up any previously created
tokens. `NodeUnpublishVolume` call receives Volume ID but not Pod ID.
We were keeping the mapping from target path to Pod ID in memory
before but this is not ideal as it's not persistent and we were losing
this data when CSI driver restarts. With this change, we're parsing
target path to get Pod ID and use that to clean up tokens without
storing any state in the CSI driver.

Signed-off-by: Burak Varlı <burakvar@amazon.co.uk>

* Use a RegExp to parse target path

Signed-off-by: Burak Varlı <burakvar@amazon.co.uk>

* Update target path RegExp

Signed-off-by: Burak Varlı <burakvar@amazon.co.uk>

---------

Signed-off-by: Burak Varlı <burakvar@amazon.co.uk>

* Add authentication source to user-agent (#253)

* Add authentication source to user-agent

Signed-off-by: Burak Varlı <burakvar@amazon.co.uk>

* Add authentication source to user-agent unconditionally

Signed-off-by: Burak Varlı <burakvar@amazon.co.uk>

---------

Signed-off-by: Burak Varlı <burakvar@amazon.co.uk>

* Pod-level identity improvements (#257)

* Update Go to `1.22.7`

Signed-off-by: Burak Varlı <burakvar@amazon.co.uk>

* Escape Pod ID in `tokenFilename` function

Signed-off-by: Burak Varlı <burakvar@amazon.co.uk>

* Set permissions of Unix socket explicitly

Signed-off-by: Burak Varlı <burakvar@amazon.co.uk>

---------

Signed-off-by: Burak Varlı <burakvar@amazon.co.uk>

* Pod-level identity improvements (#258)

* Set max receive message size to 2MB for the gRPC server

Signed-off-by: Burak Varlı <burakvar@amazon.co.uk>

* Set `readOnlyRootFilesystem: true` and `allowPrivilegeEscalation:
false` for containers

Signed-off-by: Burak Varlı <burakvar@amazon.co.uk>

* Check device type before unmounting the mount point

Signed-off-by: Burak Varlı <burakvar@amazon.co.uk>

* Ensure to escape newlines from log entries

Signed-off-by: Burak Varlı <burakvar@amazon.co.uk>

* Return pointer of `NodePublishVolumeRequest` to use nice formatting

Signed-off-by: Burak Varlı <burakvar@amazon.co.uk>

* Add tests for `IsMountPoint` method

Signed-off-by: Burak Varlı <burakvar@amazon.co.uk>

---------

Signed-off-by: Burak Varlı <burakvar@amazon.co.uk>

* Feature/pod level identity docs (#255)

PLI docs

- Add diagrams for the various approaches
- Add descriptions for the driver and pod level approaches
- Migrate configuration instructions from install.md to new configuration file

* Remove unneeded nodes in docs diagrams (#259)

Add notes to say that only IRSA is supported with PLI, and also that driver level credential sources will be ignored.

* Update Kustomization manifest (#261)

* Update `deploy` directory for Pod Level Identity

* Move resources around in `deploy` directory

Rendered with `helm template --debug charts/aws-mountpoint-s3-csi-driver`

---------

Co-authored-by: Simon Beal <simobeal@amazon.com>

* Enable `podInfoOnMount` when using k8s>=1.30 (#262)

* Enable `podInfoOnMount` when using k8s>=1.30

* Add warning about needing to pass in special config for PLI on clusters <1.30

Add instructions on configuring STS regions

* Use `.Capabilities.KubeVersion.Version` rather than GitVersion

* Add `podInfoOnMount: true` to kustomize install

---------

Signed-off-by: Burak Varlı <burakvar@amazon.co.uk>
Co-authored-by: Burak <burakvar@amazon.co.uk>
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants