Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Stop deleting underlying services when federation service is deleted #37353

Merged
merged 1 commit into from
Nov 30, 2016

Conversation

nikhiljindal
Copy link
Contributor

@nikhiljindal nikhiljindal commented Nov 23, 2016

Fixes #36799

Fixing federation service controller to not delete services from underlying clusters when federated service is deleted.
None of the federation controller should do this unless explicitly asked by the user using DeleteOptions. This is the only federation controller that does that.

cc @kubernetes/sig-cluster-federation @madhusudancs

federation service controller: stop deleting services from underlying clusters when federated service is deleted.

This change is Reviewable

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 23, 2016
@nikhiljindal nikhiljindal added area/cluster-federation kind/bug Categorizes issue or PR as related to a bug. labels Nov 23, 2016
@nikhiljindal nikhiljindal added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Nov 23, 2016
@nikhiljindal
Copy link
Contributor Author

cc @saad-ali as FYI for 1.5.

This is a bug fix for 1.5. Required in 1.5 since we are introducing cascading deletion based on DeleteOptions.OrphanDependents for federation resource. Federation service controller was doing it even without that option which is unexpected.

@nikhiljindal nikhiljindal added this to the v1.5 milestone Nov 23, 2016
@k8s-github-robot k8s-github-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Nov 23, 2016
@nikhiljindal
Copy link
Contributor Author

Brought up a cluster and verified that services are not deleted in underlying clusters and DNS records are not touched.

Copy link
Contributor

@madhusudancs madhusudancs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than that, a few minor nits. Please feel free to apply the LGTM label after addressing them.

Could you please also audit the federated service e2e tests to ensure that we don't accidentally leak services in the underlying clusters after this change?

// or we do nothing for service deletion
// TODO: Should uncomment this?
// We should delete the load balancer when service is deleted
// Look at the same method in kube service controller.
//err := s.dns.balancer.EnsureLoadBalancerDeleted(service)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this as well. We are not going to do this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

umm. Why not? kube service controller calls this.
Leaving this as is. Can remove it in a separate PR if we want to.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nikhiljindal by "this" I meant the TODO comment and commented code. What was your "this"?

What loadbalancer do you think we are going to delete here? Also, the commented code is wrong anyway. s.dns is an interface that doesn't contain a field called balancer. My gut feeling is that, this was a copy-paste from cluster-level service controller and doesn't apply here.

Please remove it. The comment change is making things worse. And the right fix for that is just removing it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah you are right. Removed it. Thanks for clarifying

err := s.deleteClusterService(clusterName, cachedService, cluster.clientset)
if err != nil {
hasErr = true
} else if err := s.ensureDnsRecords(clusterName, cachedService); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should ensure that this behavior is carried over to the cascading deletion. Could you please add a TODO somewhere? Or open an issue may be? One of the two, either one is fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a PR for it: #36390.
Also tracked in #33612.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the effect of s.ensureDnsRecords() in PR #36390. May be I am missing something?

for clusterName, clusterClientset := range clusters {
_, err := clusterClientset.Core().Services(service.Namespace).Get(service.Name)
if err != nil {
framework.Failf("Unexpected error in fetching service %s in cluster %s, %s", service.Name, clusterName, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not entirely sure about failing here. It should log an error and continue?

Also, I would just put this in a defer block.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aah the comment "Cleanup" was wrong. This is the test. We want to verify that services are not deleted from underlying clusters. Removed the comment.

AfterEach deletes these services.

@nikhiljindal
Copy link
Contributor Author

Thanks @madhusudancs
Updated as per comments.
Also verified that services are not being leaked.
DNS records are being leaked tough. Tracked in #29335.

Adding LGTM label

@nikhiljindal nikhiljindal added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 23, 2016
@madhusudancs
Copy link
Contributor

@nikhiljindal please look at the comment.

@nikhiljindal nikhiljindal removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 23, 2016
@nikhiljindal
Copy link
Contributor Author

Thanks @madhusudancs
Updated as per comment.

@nikhiljindal nikhiljindal added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 23, 2016
@dims
Copy link
Member

dims commented Nov 28, 2016

@k8s-bot test this

@nikhiljindal
Copy link
Contributor Author

@k8s-bot cvm gce e2e test this issue: #IGNORE (stuck on Waiting for status to be reported)

@nikhiljindal
Copy link
Contributor Author

@k8s-bot cvm gce e2e test this issue: #IGNORE (failed with "Error starting build")

@nikhiljindal
Copy link
Contributor Author

Removing requires release czar attention as per offline discussion with saad last week.

@k8s-ci-robot
Copy link
Contributor

Jenkins GCE e2e failed for commit 34eae22. Full PR test history.

The magic incantation to run this job again is @k8s-bot cvm gce e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@nikhiljindal
Copy link
Contributor Author

k8s-bot cvm gce e2e test this (build failed)

@nikhiljindal
Copy link
Contributor Author

@k8s-bot cvm gce e2e test this (build failed)

@nikhiljindal
Copy link
Contributor Author

@k8s-bot cvm gce e2e test this

@dims
Copy link
Member

dims commented Nov 29, 2016

@k8s-bot gci gke e2e test this

@saad-ali
Copy link
Member

cc @saad-ali as FYI for 1.5.

This is a bug fix for 1.5. Required in 1.5 since we are introducing cascading deletion based on DeleteOptions.OrphanDependents for federation resource. Federation service controller was doing it even without that option which is unexpected.

Ack. Thanks

@nikhiljindal
Copy link
Contributor Author

Unit tests seem to have been stuck for more than 11 hours.
@k8s-bot unit test this

@k8s-github-robot
Copy link

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot
Copy link

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit d51f07b into kubernetes:master Nov 30, 2016
@saad-ali saad-ali added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Dec 1, 2016
saad-ali added a commit that referenced this pull request Dec 1, 2016
@k8s-cherrypick-bot
Copy link

Commit found in the "release-1.5" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants