-
Notifications
You must be signed in to change notification settings - Fork 192
Conversation
* Fewer concurrent reconciles for packageinstallstatus_controller * Higher ratelimiting for packageinstallstatus_controller * Bubble up errors in reconcile method so that retries back off * Remove timestamp addition in status * Allow enabling pprof Signed-off-by: Vijay Katam <vkatam@vmware.com>
Codecov Report
@@ Coverage Diff @@
## main #3004 +/- ##
==========================================
+ Coverage 44.00% 44.13% +0.12%
==========================================
Files 416 416
Lines 41852 42108 +256
==========================================
+ Hits 18417 18583 +166
- Misses 21714 21799 +85
- Partials 1721 1726 +5
Continue to review full report at Codecov.
|
@@ -378,11 +396,17 @@ func enableWebhooks(ctx context.Context, mgr ctrl.Manager, flags *addonFlags) { | |||
} | |||
|
|||
func enablePackageInstallStatusController(ctx context.Context, mgr ctrl.Manager, flags *addonFlags) { | |||
// set up a ClusterCacheTracker to provide to PackageInstallStatus controller which requires a connection to remote clusters | |||
// the informers/caches are created only for objects accessed through Get/List in the code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the first two lines of this comment are valid still. Why removing those in that case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added back with updated comment. Note that by default remote tracker excludes configmap and resource.
addons/main.go
Outdated
&corev1.ConfigMap{}, | ||
&corev1.Secret{}, | ||
&kapppkg.PackageInstall{}, | ||
&kappdatapkg.Package{}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we add ConfigMap
and Secret
to the ClientUncachedObjects
? We never read any of those types in PackageInstallStatus
Controller code.
addons/main.go
Outdated
ClientUncachedObjects: []client.Object{ | ||
&corev1.ConfigMap{}, | ||
&corev1.Secret{}, | ||
&kapppkg.PackageInstall{}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't adding PackageInstall
& Package
to ClientUncachedObjects
, mean that we'll not allow objects of those types to be cached anymore? Does that mean that we will be reading from api server for those now and that we'll be not using the cache functionality of remote watch anymore?
Status: pkgiCondition.Status, | ||
Message: util.GetKappUsefulErrorMessage(pkgi.Status.UsefulErrorMessage), | ||
Reason: pkgiCondition.Reason, | ||
LastTransitionTime: metav1.NewTime(time.Now().UTC().Truncate(time.Second)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's sad to see timestamp go, it was adding more insight to the reconciliation status. But I can imagine it can add to the churn a lot as well.
@@ -294,9 +294,6 @@ func (r *VSphereCSIConfigReconciler) reconcileVSphereCSIConfigNormal(ctx context | |||
} | |||
} | |||
|
|||
logger.Info(fmt.Sprintf("'%s' the secret '%s'", opResult, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was causing lot of unnecessary logs
} | ||
|
||
// isPackageManaged checks if the provided PackageInstall is among the list of managed(core/additional) packages | ||
func (r *PackageInstallStatusReconciler) isPackageManaged(clusterObjKey client.ObjectKey, pkgiName string) (bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Until we can find another way to address checking if a package is managed, I am removing this because it requires a client and putting a stateful function (one that has a ref to reconciler) is contributing towards memory leaks.
@@ -338,43 +335,43 @@ func (r *PackageInstallStatusReconciler) removeConditionIfExistsForPkgName(clust | |||
} | |||
|
|||
// watchPackageInstalls sets a remote watch on the provided cluster on the Kind resource | |||
func (r *PackageInstallStatusReconciler) watchPackageInstalls(cluster *clusterapiv1beta1.Cluster, log logr.Logger) error { | |||
func watchPackageInstalls(ctx context.Context, watcher remote.Watcher, tracker *remote.ClusterCacheTracker, cluster *clusterapiv1beta1.Cluster, log logr.Logger) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made these functions not receivers of the PackageInstallStatusReconciler because these are passed to Watch in the remote tracker and is contributing to memory leaks.
installPackage(wlcCluster.Name, "pkg.test.carvel.dev.1.0.0", wlcCluster.Namespace) | ||
wlcClusterBootstrap := clusterBootstrapGet(client.ObjectKeyFromObject(wlcCluster)) | ||
Expect(len(wlcClusterBootstrap.Status.Conditions)).Should(Equal(0)) | ||
//By("verifying un-managed packages do not update the 'Status.Conditions' for ClusterBootstrap") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can restore this test after we figure out a better way to check managed packages.
1. Reassigning variables remote tracker client variable which is a pointer causes the object to be tracked in memory 2. Remote tracker seems to be very sensitive with respect to how watch is called. Change watch to only include stateless functions. Signed-off-by: Vijay Katam <vkatam@vmware.com>
2165ed0
to
796e9f4
Compare
affedbc
to
f178b2b
Compare
f178b2b
to
8fc4a78
Compare
patchHelper, err := clusterapipatchutil.NewHelper(clusterBootstrap, r.Client) | ||
if err != nil { | ||
errorList = append(errorList, errors.Wrap(err, "error patching ClusterBootstrapStatus")) | ||
retErr = kerrors.NewAggregate(errorList) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vijaykatam This block needed to be outside of the defer, the tests were failing sue to this change.
* Fix memory leaks in packageinstallstatus_controller Signed-off-by: Vijay Katam <vkatam@vmware.com> Update capabilities.go Detect TKGS environment for any version of TKC API
Signed-off-by: Vijay Katam vkatam@vmware.com
What this PR does / why we need it
Address OOM kills in addons-manager due to memory leaks
Which issue(s) this PR fixes
Fixes #2963
Describe testing done for PR
Tested on a scale test cluster with 158 clusters. The pod has been running stable for over 7 hours without OOM.
The memory stays around ~400 mb
Release note
Additional information
Before fix - notice packageinstallstatusreconciler
After fix Note that packageinstallstatusreconciler does not show up anymore
Special notes for your reviewer