Describe the bug: A clear and concise description of what the bug is.
Sometimes, when a Pod that mounts an NFS PV is deleted together with its NFS PVC/PV, both the Pod/PVC/PV and the backend NFS Deployment/Service/PVC/PV are cleaned up so quickly that the kubelet on the worker node where the Pod was running cannot unmount the NFS volume in time. The leftover NFS mount on that node goes stale and is never unmounted unless this is done manually, and any I/O process touching it blocks forever until the node is rebooted.
Oddly, the Pod object is removed from the cluster successfully even though the kubelet never finished cleaning up the mount on the node.
Expected behaviour: A concise description of what you expected to happen
The NFS volume mounted on the worker node is cleaned up.
Steps to reproduce the bug:
Steps to reproduce the bug should be clear and easily reproducible to help people gain an understanding of the problem
Define a PVC and a Pod that mounts the NFS PV
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: openebs-nfs
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 2Gi
  storageClassName: network-file # This is the SC name related to the openebs-nfs-provisioner
---
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: sleep
  name: sleep
spec:
  containers:
  - image: nginx
    name: sleep
    resources: {}
    volumeMounts:
    - name: openebs-nfs
      mountPath: /mnt
  dnsPolicy: ClusterFirst
  terminationGracePeriodSeconds: 0 # intentionally set this to 0
  restartPolicy: Always
  volumes:
  - name: openebs-nfs
    persistentVolumeClaim:
      claimName: openebs-nfs
status: {}
Set terminationGracePeriodSeconds to 0 so the Pod is removed quickly when it is deleted.
Deploy the above manifests and wait until everything is up, including the backend NFS Deployment/Service/PVC/PV
Use kubectl get po -o wide to get the node where the Pod is running
kubectl get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
sleep 1/1 Running 0 2m11s 192.168.171.133 node-10-158-36-65 <none> <none>
Delete the above resources at the same time, e.g. via kubectl delete -f <path_file_of_above_content>
kubectl delete -f pod.yml
persistentvolumeclaim "openebs-nfs" deleted
pod "sleep" deleted
From kubectl's point of view, everything is removed successfully
kubectl get po
No resources found in default namespace.
kubectl -n kube-system get all | grep nfs-pvc
Go to the node where the Pod ran and run df -h, which will get stuck. Then mount shows the NFS volume is left over
# ssh to the node
mount | grep nfs
10.105.148.166:/ on /var/lib/kubelet/pods/947b2765-78f0-4908-8856-5fe09269999e/volumes/kubernetes.io~nfs/pvc-9226622c-10b0-4b1d-8d4d-5661c6fec8e3 type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.158.36.65,local_lock=none,addr=10.105.148.166)
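For reference, the leftover mount can only be cleared manually; a minimal cleanup sketch (the path is taken from the mount output above, and a plain umount typically fails or hangs once the NFS server is gone, so a force + lazy unmount is usually needed) looks like this:
# On the affected worker node; the mount path comes from the mount output above.
# -f (force) and -l (lazy) detach the stale NFS mount; processes already blocked
# in NFS I/O may still be stuck until the node is rebooted.
sudo umount -f -l /var/lib/kubelet/pods/947b2765-78f0-4908-8856-5fe09269999e/volumes/kubernetes.io~nfs/pvc-9226622c-10b0-4b1d-8d4d-5661c6fec8e3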
The output of the following commands will help us better understand what's going on:
kubectl get pods -n <openebs_namespace> --show-labels
kubectl get pvc -n <openebs_namespace>
kubectl get pvc -n <application_namespace>
Anything else we need to know?:
Add any other context about the problem here.
Environment details:
OpenEBS version (use kubectl get po -n openebs --show-labels): v0.9.0
Kubernetes version (use kubectl version):
OS (e.g: cat /etc/os-release):
kernel (e.g: uname -a):
The backend storage is Ceph CSI RBD.
StorageClass: (not included in the report; an illustrative sketch follows below)
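Purely for illustration, a dynamic-nfs-provisioner StorageClass backed by a Ceph RBD class might look like the sketch below; the names network-file and csi-rbd-sc are assumptions, not the reporter's actual configuration.
# Illustrative sketch only -- not the reporter's actual StorageClass.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: network-file
  annotations:
    openebs.io/cas-type: nfsrwx
    cas.openebs.io/config: |
      - name: NFSServerType
        value: "kernel"
      - name: BackendStorageClass
        value: "csi-rbd-sc"   # assumed name of the Ceph CSI RBD backend class
provisioner: openebs.io/nfsrwx
reclaimPolicy: Delete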
Got the same issue, with iSCSI backend storage. k8s attempts the unmount only once, and when it times out it just forgets about it. k8s version is 1.21. @jiuchen1986 did you solve this problem?
This sounds more like a problem or inconvenience in the k8s behaviour. I'm not sure if k8s does a lazy unmount when the pod goes away, or if there is a way to specify that.
Taking a look...
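Until the root cause is addressed, one possible workaround (a sketch, not a verified fix) is to avoid deleting the Pod and its PVC in one shot, giving the kubelet time to unmount before the backend NFS server is torn down:
# Sketch of a safer deletion order, assuming the manifests from the report.
kubectl delete pod sleep          # delete the Pod first and wait for it to go away
# optionally verify on the node that the NFS mount is gone, e.g. mount | grep nfs
kubectl delete pvc openebs-nfs    # only then delete the claim, which tears down the backend NFS server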