Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Endless loop of reconciliation #782

Open
rassie opened this issue Feb 21, 2025 · 0 comments
Open

Endless loop of reconciliation #782

rassie opened this issue Feb 21, 2025 · 0 comments
Labels
bug Something isn't working

Comments

@rassie
Copy link

rassie commented Feb 21, 2025

Describe the bug

After getting a cluster running at long last (OpenShift woes), it seems to be stuck in some kind of endless reconciliation loop:

{
  "level": "info",
  "ts": "2025-02-20T19:41:42Z",
  "logger": "cluster-manager.myapp/myapp",
  "msg": "start operation",
  "operationId": "op-wbdjt",
  "origin": "81293e20-0258-4ddf-83e0-d81e2c3dd4ed"
}
{
  "level": "info",
  "ts": "2025-02-20T19:41:42Z",
  "msg": "Updated partition",
  "controller": "statefulset",
  "controllerGroup": "apps",
  "controllerKind": "StatefulSet",
  "StatefulSet": {
    "name": "moco-myapp",
    "namespace": "myapp"
  },
  "namespace": "myapp",
  "name": "moco-myapp",
  "reconcileID": "32721860-8378-4a82-a05d-24e19fe78c18",
  "newPartition": 2,
  "oldPartition": 3
}
{
  "level": "info",
  "ts": "2025-02-20T19:41:42Z",
  "msg": "partition is updated",
  "controller": "statefulset",
  "controllerGroup": "apps",
  "controllerKind": "StatefulSet",
  "StatefulSet": {
    "name": "moco-myapp",
    "namespace": "myapp"
  },
  "namespace": "myapp",
  "name": "moco-myapp",
  "reconcileID": "32721860-8378-4a82-a05d-24e19fe78c18"
}
{
  "level": "info",
  "ts": "2025-02-20T19:41:42Z",
  "msg": "Updated partition",
  "controller": "statefulset",
  "controllerGroup": "apps",
  "controllerKind": "StatefulSet",
  "StatefulSet": {
    "name": "moco-myapp",
    "namespace": "myapp"
  },
  "namespace": "myapp",
  "name": "moco-myapp",
  "reconcileID": "6b9c0377-f70e-456d-82f4-22e7e414de73",
  "newPartition": 1,
  "oldPartition": 2
}
{
  "level": "info",
  "ts": "2025-02-20T19:41:42Z",
  "msg": "partition is updated",
  "controller": "statefulset",
  "controllerGroup": "apps",
  "controllerKind": "StatefulSet",
  "StatefulSet": {
    "name": "moco-myapp",
    "namespace": "myapp"
  },
  "namespace": "myapp",
  "name": "moco-myapp",
  "reconcileID": "6b9c0377-f70e-456d-82f4-22e7e414de73"
}
{
  "level": "info",
  "ts": "2025-02-20T19:41:42Z",
  "msg": "Updated partition",
  "controller": "statefulset",
  "controllerGroup": "apps",
  "controllerKind": "StatefulSet",
  "StatefulSet": {
    "name": "moco-myapp",
    "namespace": "myapp"
  },
  "namespace": "myapp",
  "name": "moco-myapp",
  "reconcileID": "a41c7399-b9f8-4a53-9e62-e030e9d1ea4e",
  "newPartition": 0,
  "oldPartition": 1
}
{
  "level": "info",
  "ts": "2025-02-20T19:41:42Z",
  "msg": "partition is updated",
  "controller": "statefulset",
  "controllerGroup": "apps",
  "controllerKind": "StatefulSet",
  "StatefulSet": {
    "name": "moco-myapp",
    "namespace": "myapp"
  },
  "namespace": "myapp",
  "name": "moco-myapp",
  "reconcileID": "a41c7399-b9f8-4a53-9e62-e030e9d1ea4e"
}
{
  "level": "info",
  "ts": "2025-02-20T19:41:42Z",
  "logger": "cluster-manager.myapp/myapp",
  "msg": "cluster state is Healthy",
  "operationId": "op-wbdjt"
}
{
  "level": "info",
  "ts": "2025-02-20T19:41:42Z",
  "logger": "cluster-manager.myapp/myapp",
  "msg": "finish",
  "operationId": "op-wbdjt",
  "duration": 0.114975796
}
{
  "level": "info",
  "ts": "2025-02-20T19:41:42Z",
  "msg": "reconciled StatefulSet",
  "controller": "mysqlcluster",
  "controllerGroup": "moco.cybozu.com",
  "controllerKind": "MySQLCluster",
  "MySQLCluster": {
    "name": "myapp",
    "namespace": "myapp"
  },
  "namespace": "myapp",
  "name": "myapp",
  "reconcileID": "d3640c7a-d4bf-4408-9a8c-e0aad62e7758",
  "statefulSetName": "moco-myapp"
}
{
  "level": "info",
  "ts": "2025-02-20T19:41:42Z",
  "msg": "reconciled CronJob for backup",
  "controller": "mysqlcluster",
  "controllerGroup": "moco.cybozu.com",
  "controllerKind": "MySQLCluster",
  "MySQLCluster": {
    "name": "myapp",
    "namespace": "myapp"
  },
  "namespace": "myapp",
  "name": "myapp",
  "reconcileID": "d3640c7a-d4bf-4408-9a8c-e0aad62e7758",
  "cronJobName": "moco-backup-myapp"
}
{
  "level": "info",
  "ts": "2025-02-20T19:41:42Z",
  "logger": "cluster-manager.myapp/myapp",
  "msg": "start operation",
  "operationId": "op-fn89n",
  "origin": "d3640c7a-d4bf-4408-9a8c-e0aad62e7758"
}
{
  "level": "info",
  "ts": "2025-02-20T19:41:42Z",
  "msg": "update status successfully",
  "controller": "mysqlcluster",
  "controllerGroup": "moco.cybozu.com",
  "controllerKind": "MySQLCluster",
  "MySQLCluster": {
    "name": "myapp",
    "namespace": "myapp"
  },
  "namespace": "myapp",
  "name": "myapp",
  "reconcileID": "d3640c7a-d4bf-4408-9a8c-e0aad62e7758"
}
{
  "level": "info",
  "ts": "2025-02-20T19:41:42Z",
  "msg": "Updated partition",
  "controller": "statefulset",
  "controllerGroup": "apps",
  "controllerKind": "StatefulSet",
  "StatefulSet": {
    "name": "moco-myapp",
    "namespace": "myapp"
  },
  "namespace": "myapp",
  "name": "moco-myapp",
  "reconcileID": "bbab67e9-2011-4481-bd7c-ecc0a433d2e8",
  "newPartition": 2,
  "oldPartition": 3
}
{
  "level": "info",
  "ts": "2025-02-20T19:41:42Z",
  "msg": "partition is updated",
  "controller": "statefulset",
  "controllerGroup": "apps",
  "controllerKind": "StatefulSet",
  "StatefulSet": {
    "name": "moco-myapp",
    "namespace": "myapp"
  },
  "namespace": "myapp",
  "name": "moco-myapp",
  "reconcileID": "bbab67e9-2011-4481-bd7c-ecc0a433d2e8"
}
{
  "level": "info",
  "ts": "2025-02-20T19:41:42Z",
  "msg": "Updated partition",
  "controller": "statefulset",
  "controllerGroup": "apps",
  "controllerKind": "StatefulSet",
  "StatefulSet": {
    "name": "moco-myapp",
    "namespace": "myapp"
  },
  "namespace": "myapp",
  "name": "moco-myapp",
  "reconcileID": "ff21a7d5-0c90-4ddf-b98e-0b3996c3c5a2",
  "newPartition": 1,
  "oldPartition": 2
}
{
  "level": "info",
  "ts": "2025-02-20T19:41:42Z",
  "msg": "partition is updated",
  "controller": "statefulset",
  "controllerGroup": "apps",
  "controllerKind": "StatefulSet",
  "StatefulSet": {
    "name": "moco-myapp",
    "namespace": "myapp"
  },
  "namespace": "myapp",
  "name": "moco-myapp",
  "reconcileID": "ff21a7d5-0c90-4ddf-b98e-0b3996c3c5a2"
}
{
  "level": "info",
  "ts": "2025-02-20T19:41:43Z",
  "msg": "Updated partition",
  "controller": "statefulset",
  "controllerGroup": "apps",
  "controllerKind": "StatefulSet",
  "StatefulSet": {
    "name": "moco-myapp",
    "namespace": "myapp"
  },
  "namespace": "myapp",
  "name": "moco-myapp",
  "reconcileID": "487bdf78-2a05-489c-9473-578ea26ee907",
  "newPartition": 0,
  "oldPartition": 1
}
{
  "level": "info",
  "ts": "2025-02-20T19:41:43Z",
  "msg": "partition is updated",
  "controller": "statefulset",
  "controllerGroup": "apps",
  "controllerKind": "StatefulSet",
  "StatefulSet": {
    "name": "moco-myapp",
    "namespace": "myapp"
  },
  "namespace": "myapp",
  "name": "moco-myapp",
  "reconcileID": "487bdf78-2a05-489c-9473-578ea26ee907"
}
{
  "level": "info",
  "ts": "2025-02-20T19:41:43Z",
  "logger": "cluster-manager.myapp/myapp",
  "msg": "cluster state is Healthy",
  "operationId": "op-fn89n"
}
{
  "level": "info",
  "ts": "2025-02-20T19:41:43Z",
  "logger": "cluster-manager.myapp/myapp",
  "msg": "finish",
  "operationId": "op-fn89n",
  "duration": 0.113314412
}

I have no idea (yet) what could be causing this. An extremely wild hunch is that maybe #781 (comment) is a fix for this, but I have currently no way to check that.

Every now and then the logs also contain this:

{
  "level": "error",
  "ts": "2025-02-20T19:41:43Z",
  "msg": "failed to update status",
  "controller": "mysqlcluster",
  "controllerGroup": "moco.cybozu.com",
  "controllerKind": "MySQLCluster",
  "MySQLCluster": {
    "name": "myapp",
    "namespace": "myapp"
  },
  "namespace": "myapp",
  "name": "myapp",
  "reconcileID": "e6f09dd0-b294-4a46-b12a-81879497244a",
  "error": "Operation cannot be fulfilled on mysqlclusters.moco.cybozu.com \"myapp\": the object has been modified; please apply your changes to the latest version and try again",
  "stacktrace": "github.com/cybozu-go/moco/controllers.(*MySQLClusterReconciler).reconcileV1.func1\n\t/work/controllers/mysqlcluster_controller.go:260\ngithub.heygears.com/cybozu-go/moco/controllers.(*MySQLClusterReconciler).reconcileV1\n\t/work/controllers/mysqlcluster_controller.go:328\ngithub.heygears.com/cybozu-go/moco/controllers.(*MySQLClusterReconciler).Reconcile\n\t/work/controllers/mysqlcluster_controller.go:208\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:116\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:303\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:224"
}
{
  "level": "error",
  "ts": "2025-02-20T19:41:43Z",
  "msg": "Reconciler error",
  "controller": "mysqlcluster",
  "controllerGroup": "moco.cybozu.com",
  "controllerKind": "MySQLCluster",
  "MySQLCluster": {
    "name": "myapp",
    "namespace": "myapp"
  },
  "namespace": "myapp",
  "name": "myapp",
  "reconcileID": "e6f09dd0-b294-4a46-b12a-81879497244a",
  "error": "Operation cannot be fulfilled on mysqlclusters.moco.cybozu.com \"myapp\": the object has been modified; please apply your changes to the latest version and try again",
  "stacktrace": "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:316\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:224"
}

but then again, I'm pretty sure I didn't change the object (fully synced with ArgoCD).

Environments

  • Version: 0.25.1

To Reproduce
Steps to reproduce the behavior:

  1. Create a new cluster
  2. Wait for the cluster to finish setting up
  3. Look into moco-controller's logs and see the above logs repeating

Expected behavior
After initial reconciliation, it should happen again only when something has changed.

@rassie rassie added the bug Something isn't working label Feb 21, 2025
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant