[Bug]: Etcd Pods Entering CrashLoopBackOff After Restart, Leading to Milvus Cluster Degradation #40575
Comments
@aashishsingh2803 Hello there! I'm here to assist you with any bugs, questions, or contribution-related matters. Let's work together to resolve the issue you're facing.

A similar issue was previously reported, where etcd pods entered a CrashLoopBackOff state after a forced restart, leading to Milvus cluster degradation. Unfortunately, no solution was found in the comments of that issue [1]. To resolve this issue, you can follow these steps:
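A rough sketch of such a recovery for a member that logs "the member has been permanently removed from the cluster" is shown below. The namespace, PVC name, and peer URL are assumptions rather than values from this deployment; the pod names follow the attached logs.

```shell
# Sketch only: namespace "milvus", the PVC name, and the peer URL are assumptions;
# pod names follow the attached logs.

# 1. From a member that is still part of the cluster, confirm which members remain
kubectl exec -n milvus etcd-milvusetcdupgrade1-0 -- etcdctl member list -w table

# 2. Re-add the removed member from a healthy node so the recreated pod can join
#    the existing cluster (the peer URL is an assumption)
kubectl exec -n milvus etcd-milvusetcdupgrade1-0 -- etcdctl member add etcd-milvusetcdupgrade1-3 \
  --peer-urls=http://etcd-milvusetcdupgrade1-3.etcd-milvusetcdupgrade1-headless:2380

# 3. Discard the pod's stale data directory by deleting its PersistentVolumeClaim,
#    then delete the pod so the StatefulSet recreates it with an empty data dir
kubectl delete pvc data-etcd-milvusetcdupgrade1-3 -n milvus
kubectl delete pod etcd-milvusetcdupgrade1-3 -n milvus
```

Whether the recreated pod rejoins automatically depends on the chart's startup logic (the new member has to start with `--initial-cluster-state=existing`); if the data of all members was lost, restoring from an etcd snapshot is the safer route.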
Additionally, ensure that you follow the proper shutdown sequence in the future: stop the Milvus standalone instance first, and then stop the etcd pods. This allows Milvus to properly persist its final state to etcd before etcd is shut down [2][3].
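As a rough illustration of that ordering on Kubernetes (the release name, namespace, label selector, and StatefulSet name below are assumptions inferred from the pod names, not confirmed values from this deployment):

```shell
# 1. Stop Milvus first so it can flush its final state to etcd
kubectl scale deployment my-release-milvus-standalone --replicas=0 -n milvus
kubectl wait --for=delete pod -l app.kubernetes.io/instance=my-release -n milvus --timeout=300s

# 2. Only after Milvus is fully down, scale down the etcd StatefulSet
kubectl scale statefulset etcd-milvusetcdupgrade1 --replicas=0 -n milvus
```

When starting back up, the order is reversed: bring etcd up first and wait for it to be healthy, then start Milvus.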
"level":"warn","ts":"2025-03-11T12:17:08.160747Z","caller":"etcdserver/server.go:1148","msg":"server error","error":"the member has been permanently removed from the cluster"} |
@aashishsingh2803 |
/assign @aashishsingh2803 |
Is there an existing issue for this?
Environment
Current Behavior
While testing the reliability of the Milvus cluster, I forcefully terminated all five etcd pods at the same time. After the termination, new pods were automatically recreated; however, most of them entered a continuous CrashLoopBackOff state, resulting in cluster degradation.
Upon investigating the etcd cluster state, we found that four of the five members were no longer part of the cluster, leaving it significantly unstable.
Please investigate this issue and provide guidance on resolving it. If additional details are required, let us know.
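For reference, a minimal way to confirm this state from a surviving member (the namespace below is an assumption; pod names follow the attached logs):

```shell
# Show the crash-looping etcd pods (namespace "milvus" is an assumption)
kubectl get pods -n milvus | grep etcd-milvusetcdupgrade1

# From a pod that is still healthy, check endpoint health
kubectl exec -n milvus etcd-milvusetcdupgrade1-0 -- etcdctl endpoint health
```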
Expected Behavior
No response
Steps To Reproduce
Milvus Log
Attached logs of the etcd cluster
etcd-milvusetcdupgrade1-0.log
etcd-milvusetcdupgrade1-1.log
etcd-milvusetcdupgrade1-2.log
etcd-milvusetcdupgrade1-3.log
etcd-milvusetcdupgrade1-4.log
Anything else?
No response