CPU Spikes on Openshift with unusual operator behaviour #3441

Mohid-A · 2022-07-12T13:54:35Z

Hi Community,

We are running into an issue where the camel-k operator restarts when we have more than four integrations running on OCP (version mentioned below). Upon restart, the operator keeps reconciling the integrations continuously, which causes CPU spikes on the master node and also adds latency on the kube-apiserver. The logs for the issue are included below.

VERSION

Camel-k-operator 1.9.1
Camel K Client 1.9.1
OCP 4.9.37 (using Kubernetes 1.22)

Command to reproduce the issue

kamel --kube-config=$QA_KUBECONFIG run $APP(integration file variable) --trait container.enabled=$ENABLED --trait container.request-cpu=$REQUESTCPU --trait container.request-memory=$REQUESTMEMORY --trait container.limit-cpu=$LIMITCPU --trait container.limit-memory=$LIMITMEMORY --trait jvm.options=-Doracle.jdbc.timezoneAsRegion=false --pod-template $PVC2 --config secret:$SECRET --config configmap:$CONFIGMAP -t logging.level=DEBUG

Error Log

{"level":"info","ts":1657633698.0446534,"logger":"camel-k.controller.integration","msg":"Reconciling Integration","request-namespace":"esb-jpoller-qa","request-name":"send-email-notification"}
{"level":"info","ts":1657633698.0447812,"logger":"camel-k.controller.integration","msg":"Invoking action monitor","request-namespace":"esb-jpoller-qa","request-name":"send-email-notification","api-version":"camel.apache.org/v1","kind":"Integration","ns":"esb-jpoller-qa","name":"send-email-notification"}
{"level":"info","ts":1657633698.3031335,"logger":"camel-k.controller.integration","msg":"Reconciling Integration","request-namespace":"esb-jpoller-qa","request-name":"appointments3plcn-wh42"}
{"level":"info","ts":1657633698.3032014,"logger":"camel-k.controller.integration","msg":"Invoking action monitor","request-namespace":"esb-jpoller-qa","request-name":"appointments3plcn-wh42","api-version":"camel.apache.org/v1","kind":"Integration","ns":"esb-jpoller-qa","name":"appointments3plcn-wh42"}
{"level":"info","ts":1657633698.5536115,"logger":"camel-k.controller.integration","msg":"Reconciling Integration","request-namespace":"esb-jpoller-qa","request-name":"send-email-notification"}
{"level":"info","ts":1657633698.5536752,"logger":"camel-k.controller.integration","msg":"Invoking action monitor","request-namespace":"esb-jpoller-qa","request-name":"send-email-notification","api-version":"camel.apache.org/v1","kind":"Integration","ns":"esb-jpoller-qa","name":"send-email-notification"}
{"level":"info","ts":1657633698.9342558,"logger":"camel-k.controller.integration","msg":"Reconciling Integration","request-namespace":"esb-jpoller-qa","request-name":"appointments3plcn-wh42"}
{"level":"info","ts":1657633698.9343414,"logger":"camel-k.controller.integration","msg":"Invoking action monitor","request-namespace":"esb-jpoller-qa","request-name":"appointments3plcn-wh42","api-version":"camel.apache.org/v1","kind":"Integration","ns":"esb-jpoller-qa","name":"appointments3plcn-wh42"}
{"level":"info","ts":1657633699.1299353,"logger":"camel-k.controller.integration","msg":"Reconciling Integration","request-namespace":"esb-jpoller-qa","request-name":"appointments3plcn-wh42"}
{"level":"info","ts":1657633699.1300168,"logger":"camel-k.controller.integration","msg":"Invoking action monitor","request-namespace":"esb-jpoller-qa","request-name":"appointments3plcn-wh42","api-version":"camel.apache.org/v1","kind":"Integration","ns":"esb-jpoller-qa","name":"appointments3plcn-wh42"}

Expected Behavior

We want the operator to be stable upon restart, as restarting has an impact on the platform and other workloads.

Thanks

heiko-braun · 2022-07-12T14:04:58Z

@christophd Have you encountered this before?

squakez · 2022-07-12T14:59:05Z

Thanks for reporting the problem. I managed to replicate the issue in a local environment as well. Strangely, this happens whenever there are more than a few integrations running (I tried with 5). If you stop the operator pod, then as soon as it restarts it repeatedly tries to reconcile all the running integrations for a few seconds. In my case it stops after less than a minute, but it is worth investigating to see how to fix it.

Mohid-A · 2022-07-12T15:48:36Z

In our case, when the operator pod restarts, the reconciling does not stop. The only fix we found was to delete running integrations and bring the count down to 4, which stops this operator behavior.

gtata007 · 2022-07-12T17:16:37Z

Is this a Camel K operator issue or an environment (OpenShift) issue?
If this is an operator issue, is there another Camel K version (1.6.0 or 1.6.3) that might be stable on the OpenShift environment?

heiko-braun · 2022-07-12T17:54:06Z

If I remember correctly, @christophd and @astefanutti talked about this recently

astefanutti · 2022-07-13T08:21:51Z

If I understand the issue correctly, it is twofold:

1. All the Integration resources are reconciled upon the operator startup:

This is the standard operator behavior: all the managed resources are reconciled once, so that any changes to their state that occurred while the operator was down are taken into account and the system can achieve eventual consistency.
That may indeed cause a spike in compute resources and API server requests. We could look into further tuning the client-side QPS and Burst parameters that control API request throttling. These were increased as part of #2814, but we could make them configurable.

2. The reconciliation goes on indefinitely:

This may be an occurrence of the issue fixed by #3285. That fix has not been released yet and will ship in the upcoming 1.9.3 version.
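To illustrate how the QPS and Burst parameters mentioned in point 1 shape a startup spike, here is a minimal sketch of a token-bucket limiter, the scheme client-go uses for client-side throttling. It is a simulation only, not the operator's code: the `TokenBucket` class, the `time_to_send` helper, and the numbers used are all hypothetical, and every request is assumed to be queued at startup (time 0).

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Simulated token bucket: holds up to `burst` tokens, refilled at `qps` per second."""
    qps: float
    burst: int
    tokens: float = 0.0
    now: float = 0.0  # simulated clock, in seconds

    def __post_init__(self):
        self.tokens = float(self.burst)  # the bucket starts full

    def acquire(self) -> float:
        """Take one token, advancing the simulated clock if the bucket is empty.

        Returns the simulated time at which the request may be sent.
        """
        if self.tokens < 1.0:
            # Wait just long enough for the refill to produce one whole token.
            self.now += (1.0 - self.tokens) / self.qps
            self.tokens = 1.0
        self.tokens -= 1.0
        return self.now


def time_to_send(requests: int, qps: float, burst: int) -> float:
    """Time needed to send `requests` calls that are all queued at time 0."""
    bucket = TokenBucket(qps=qps, burst=burst)
    last = 0.0
    for _ in range(requests):
        last = bucket.acquire()
    return last


if __name__ == "__main__":
    # Hypothetical values, chosen only to show the trade-off.
    for qps, burst in [(5, 10), (20, 30)]:
        print(f"qps={qps} burst={burst}: {time_to_send(50, qps, burst):.2f}s for 50 requests")
```

Under these assumptions, the first `burst` requests go out immediately and the rest are spaced `1/qps` seconds apart. Higher values shorten the reconciliation pass after a restart but push more load onto the API server in a shorter window, which is the trade-off that making them configurable would expose.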

squakez · 2022-07-13T08:53:35Z

Thanks for the feedback @astefanutti. I've just tested with 1.10-nightly and I confirm the indefinite reconciliation loop has been fixed:

camel-k-operator-74d899c876-ns2zx camel-k-operator {"level":"info","ts":1657702227.6277504,"logger":"camel-k.controller.integration","msg":"Reconciling Integration","request-namespace":"default","request-name":"it2"}
camel-k-operator-74d899c876-ns2zx camel-k-operator {"level":"info","ts":1657702227.955836,"logger":"camel-k.controller.integration","msg":"Reconciling Integration","request-namespace":"default","request-name":"it3"}
camel-k-operator-74d899c876-ns2zx camel-k-operator {"level":"info","ts":1657702228.074853,"logger":"camel-k.controller.integration","msg":"Reconciling Integration","request-namespace":"default","request-name":"it4"}
camel-k-operator-74d899c876-ns2zx camel-k-operator {"level":"info","ts":1657702228.6067405,"logger":"camel-k.controller.integration","msg":"Reconciling Integration","request-namespace":"default","request-name":"it5"}
camel-k-operator-74d899c876-ns2zx camel-k-operator {"level":"info","ts":1657702228.981502,"logger":"camel-k.controller.integration","msg":"Reconciling Integration","request-namespace":"default","request-name":"it1"}

I am keeping this open until we officially release both 1.10 and 1.9.3.
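One rough way to tell a single startup pass from a reconciliation loop is to count the "Reconciling Integration" entries per integration in the operator log. The sketch below assumes the JSON log format shown in this thread (optionally prefixed by pod and container names); the `count_reconciles` helper and the threshold of one event per integration are illustrative assumptions, not part of Camel K.

```python
import json
from collections import Counter


def count_reconciles(lines):
    """Count "Reconciling Integration" events per (namespace, name)."""
    counts = Counter()
    for line in lines:
        start = line.find("{")
        if start < 0:
            continue  # not a JSON log line
        try:
            entry = json.loads(line[start:])
        except json.JSONDecodeError:
            continue  # truncated or non-JSON content
        if entry.get("msg") != "Reconciling Integration":
            continue
        key = (entry.get("request-namespace"), entry.get("request-name"))
        counts[key] += 1
    return counts


# Log lines copied from this thread: two from 1.10-nightly, three from 1.9.1.
sample = [
    'camel-k-operator-74d899c876-ns2zx camel-k-operator {"level":"info","ts":1657702227.6277504,"logger":"camel-k.controller.integration","msg":"Reconciling Integration","request-namespace":"default","request-name":"it2"}',
    'camel-k-operator-74d899c876-ns2zx camel-k-operator {"level":"info","ts":1657702227.955836,"logger":"camel-k.controller.integration","msg":"Reconciling Integration","request-namespace":"default","request-name":"it3"}',
    '{"level":"info","ts":1657633698.0446534,"logger":"camel-k.controller.integration","msg":"Reconciling Integration","request-namespace":"esb-jpoller-qa","request-name":"send-email-notification"}',
    '{"level":"info","ts":1657633698.0447812,"logger":"camel-k.controller.integration","msg":"Invoking action monitor","request-namespace":"esb-jpoller-qa","request-name":"send-email-notification","api-version":"camel.apache.org/v1","kind":"Integration","ns":"esb-jpoller-qa","name":"send-email-notification"}',
    '{"level":"info","ts":1657633698.5536115,"logger":"camel-k.controller.integration","msg":"Reconciling Integration","request-namespace":"esb-jpoller-qa","request-name":"send-email-notification"}',
]

if __name__ == "__main__":
    for (namespace, name), n in sorted(count_reconciles(sample).items()):
        flag = "  <- reconciled more than once" if n > 1 else ""
        print(f"{namespace}/{name}: {n}{flag}")
```

In practice the input would be the captured operator log, restricted to the window just after a restart. An integration that keeps accumulating events in that window is a candidate for the loop described above.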

tadayosi · 2022-08-25T04:30:28Z

I mistakenly moved it to the 1.11.0 milestone. Moving it back to 1.10.0, as it can be closed once we release 1.10.0.

squakez added the kind/bug label Jul 12, 2022

squakez added the area/operator label Jul 12, 2022

squakez modified the milestones: 1.10.0, 1.9.3 Jul 13, 2022

tadayosi modified the milestones: 1.10.0, 1.11.0 Aug 25, 2022

oscerd modified the milestones: 1.10.0, 1.11.0 Sep 5, 2022

oscerd closed this as completed Sep 5, 2022
