
failed to unmount due to "host is down" #64

Closed

andyzhangx opened this issue Jul 11, 2020 · 9 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@andyzhangx
Member

What happened:
This issue happened on k8s v1.15.11; need to check whether the latest k8s version has fixed it:

Jul 11 07:53:08 aks-agentpool-60632172-vmss000007 kubelet[4580]: I0711 07:53:08.728754    4580 controlbuf.go:382] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
Jul 11 07:53:18 aks-agentpool-60632172-vmss000007 kubelet[4580]: E0711 07:53:18.968266    4580 csi_mounter.go:428] kubernetes.io/csi: isDirMounted IsLikelyNotMountPoint test failed for dir [/var/lib/kubelet/pods/aebb8a69-b5fe-4b36-b8cc-59800e5e6fa6/volumes/kubernetes.io~csi/pvc-255b2e00-87d3-4d33-b02d-6cdc2d6394b1/mount]
Jul 11 07:53:18 aks-agentpool-60632172-vmss000007 kubelet[4580]: E0711 07:53:18.968312    4580 csi_mounter.go:378] kubernetes.io/csi: mounter.TearDownAt failed to clean mount dir [/var/lib/kubelet/pods/aebb8a69-b5fe-4b36-b8cc-59800e5e6fa6/volumes/kubernetes.io~csi/pvc-255b2e00-87d3-4d33-b02d-6cdc2d6394b1/mount]: stat /var/lib/kubelet/pods/aebb8a69-b5fe-4b36-b8cc-59800e5e6fa6/volumes/kubernetes.io~csi/pvc-255b2e00-87d3-4d33-b02d-6cdc2d6394b1/mount: host is down
Jul 11 07:53:18 aks-agentpool-60632172-vmss000007 kubelet[4580]: E0711 07:53:18.968397    4580 nestedpendingoperations.go:270] Operation for "\"kubernetes.io/csi/smb.csi.k8s.io^pvc-255b2e00-87d3-4d33-b02d-6cdc2d6394b1\" (\"aebb8a69-b5fe-4b36-b8cc-59800e5e6fa6\")" failed. No retries permitted until 2020-07-11 07:53:19.968350377 +0000 UTC m=+174446.119043970 (durationBeforeRetry 1s). Error: "UnmountVolume.TearDown failed for volume \"smb\" (UniqueName: \"kubernetes.io/csi/smb.csi.k8s.io^pvc-255b2e00-87d3-4d33-b02d-6cdc2d6394b1\") pod \"aebb8a69-b5fe-4b36-b8cc-59800e5e6fa6\" (UID: \"aebb8a69-b5fe-4b36-b8cc-59800e5e6fa6\") : stat /var/lib/kubelet/pods/aebb8a69-b5fe-4b36-b8cc-59800e5e6fa6/volumes/kubernetes.io~csi/pvc-255b2e00-87d3-4d33-b02d-6cdc2d6394b1/mount: host is down"
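For reference, the "host is down" text in these log lines is the EHOSTDOWN errno that stat returns once the SMB server backing the mount has become unreachable; the kubelet's mount-point check boils down to a stat of the mount directory, which is the call that fails here. A minimal Go sketch (assuming a Linux node and a hypothetical mount path, not code from the driver or kubelet) that surfaces the same errno:

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"syscall"
)

func main() {
	// Hypothetical path; substitute the CSI mount dir from the kubelet log above.
	dir := "/var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~csi/<pv-name>/mount"

	// On a CIFS/SMB mount whose server is unreachable, the stat fails with
	// EHOSTDOWN, which the kernel renders as "host is down".
	if _, err := os.Stat(dir); err != nil {
		var errno syscall.Errno
		if errors.As(err, &errno) && errno == syscall.EHOSTDOWN {
			fmt.Printf("mount %s is backed by an unreachable host: %v\n", dir, err)
		}
	}
}
```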

What you expected to happen:

How to reproduce it:

Anything else we need to know?:

Environment:

  • CSI Driver version:
  • Kubernetes version (use kubectl version): 1.15.11
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:
@andyzhangx andyzhangx added the kind/bug Categorizes issue or PR as related to a bug. label Jul 11, 2020
@boddumanohar
Contributor

@andyzhangx Can you please provide more instructions on how to reproduce the issue? I would like to see if I can fix this.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 6, 2020
@andyzhangx andyzhangx removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 6, 2020
@andyzhangx
Member Author

andyzhangx commented Feb 23, 2021

  • Similar error messages:
-- Logs begin at Sat 2021-02-20 16:54:54 GMT, end at Mon 2021-02-22 22:04:28 GMT. --
Feb 20 16:54:56 agent-node000002 kubelet[1917]: E0220 16:54:56.193070    1917 kubelet_volumes.go:65] pod "2517cdab-e91e-4006-b5a0-e824bb25f83c" found, but error stat /var/lib/kubelet/pods/2517cdab-e91e-4006-b5a0-e824bb25f83c/volumes/kubernetes.io~azure-file/app-name-persistent-volume: host is down occurred during checking mounted volumes from disk
Feb 20 16:54:56 agent-node000002 kubelet[1917]: E0220 16:54:56.193972    1917 kubelet_volumes.go:65] pod "a4a19c0b-d254-467e-9d26-86e84f6b85ed" found, but error stat /var/lib/kubelet/pods/a4a19c0b-d254-467e-9d26-86e84f6b85ed/volumes/kubernetes.io~azure-file/app-name-test-persistent-volume: host is down occurred during checking mounted volumes from disk
Feb 20 16:54:56 agent-node000002 kubelet[1917]: E0220 16:54:56.194774    1917 kubelet_volumes.go:65] pod "d3806c88-a022-46a7-bfb8-a4e20fa992fe" found, but error stat /var/lib/kubelet/pods/d3806c88-a022-46a7-bfb8-a4e20fa992fe/volumes/kubernetes.io~azure-file/dxf-persistent-volume: host is down occurred during checking mounted volumes from disk
Feb 20 16:54:56 agent-node000002 kubelet[1917]: E0220 16:54:56.195421    1917 kubelet_volumes.go:65] pod "2517cdab-e91e-4006-b5a0-e824bb25f83c" found, but error stat /var/lib/kubelet/pods/2517cdab-e91e-4006-b5a0-e824bb25f83c/volumes/kubernetes.io~azure-file/app-name-persistent-volume: host is down occurred during checking mounted volumes from disk
Feb 20 16:54:56 agent-node000002 kubelet[1917]: E0220 16:54:56.196135    1917 kubelet_volumes.go:65] pod "a4a19c0b-d254-467e-9d26-86e84f6b85ed" found, but error stat /var/lib/kubelet/pods/a4a19c0b-d254-467e-9d26-86e84f6b85ed/volumes/kubernetes.io~azure-file/app-name-test-persistent-volume: host is down occurred during checking mounted volumes from disk
Feb 20 16:54:56 agent-node000002 kubelet[1917]: E0220 16:54:56.196903    1917 kubelet_volumes.go:65] pod "d3806c88-a022-46a7-bfb8-a4e20fa992fe" found, but error stat /var/lib/kubelet/pods/d3806c88-a022-46a7-bfb8-a4e20fa992fe/volumes/kubernetes.io~azure-file/dxf-persistent-volume: host is down occurred during checking mounted volumes from disk

@andyzhangx
Member Author

I finally spent some time trying to fix this issue. There should be at least two PRs; the first is kubernetes/utils#203.
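As I understand it, that utils PR teaches the shared corrupted-mount helper in k8s.io/utils to treat "host is down" (EHOSTDOWN) the same way as ENOTCONN/ESTALE, so the kubelet can tear the mount down instead of failing on the stat. A rough, hypothetical sketch of that classification and the lazy unmount it enables (not the actual k8s.io/utils code; the helper name and path below are illustrative):

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"syscall"

	"golang.org/x/sys/unix"
)

// isCorruptedMnt is a hypothetical stand-in for the corrupted-mount check in
// k8s.io/utils: if stat on a mount point fails with one of these errnos, the
// mount is unusable and should be unmounted rather than reported as a fatal error.
func isCorruptedMnt(err error) bool {
	var errno syscall.Errno
	if !errors.As(err, &errno) {
		return false
	}
	switch errno {
	case syscall.EHOSTDOWN, // "host is down" — the case hit in this issue
		syscall.ENOTCONN, syscall.ESTALE, syscall.EIO:
		return true
	}
	return false
}

func main() {
	// Hypothetical mount dir; substitute the path from the kubelet error.
	dir := "/var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~csi/<pv-name>/mount"

	if _, err := os.Stat(dir); err != nil && isCorruptedMnt(err) {
		// Lazy-detach the dead mount so the directory can be removed afterwards,
		// equivalent to running `umount -l <dir>` on the node.
		if err := unix.Unmount(dir, unix.MNT_DETACH); err != nil {
			fmt.Printf("unmount failed: %v\n", err)
		}
	}
}
```

Until a fix lands, the usual manual workaround on an affected node is the same lazy unmount (`umount -l` on the stuck directory) followed by recreating the pod.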

@andyzhangx
Member Author

This would be fixed by kubernetes/kubernetes#101305.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 20, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 19, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
