
added writecache causes topolvm to be unable to find an LV it has created #1003

Closed
guppy0130 opened this issue Jan 4, 2025 · 2 comments
Labels
bug Something isn't working

Comments

@guppy0130
Contributor

Describe the bug

Created a raid1 LV via topolvm for a PVC. Later on, manually added a writecache to that LV outside of topolvm. After adding the writecache, topolvm can no longer find the LV. After removing the writecache, topolvm can find the LV again.

Environments

  • Version: ghcr.io/topolvm/topolvm-with-sidecar:0.36.0
  • OS: PRETTY_NAME="Ubuntu 24.04.1 LTS"

To Reproduce
Steps to reproduce the behavior:

  1. configure LVs to be created with --type raid1 (e.g., https://github.com/guppy0130/kubernetes/blob/99063376b1232b90164c31137f5db627838dd7be/topolvm/values.yaml#L138)
  2. create the PVC, which in turn creates a new LV
  3. systemctl stop k3s
  4. add the writecache to the LV outside of topolvm
  5. systemctl start k3s
  6. attempt to mount the now-writecache'd LV into a pod (topolvm says it can't find the LV at this point)
  7. stop the kubelet
  8. remove the writecache
  9. start the kubelet
  10. attempt to mount the LV into a pod (this now succeeds)

Expected behavior
Since the name of the LV didn't change, topolvm should still be able to find the LV and mount it at step 6.

Additional context

this is what I did for step 4:

root@k3s:~# vgextend hdd-vg /dev/nvme1n1p1
  Volume group "hdd-vg" successfully extended
root@k3s:~# lvcreate -n nvme-cache-for-hdd-vg -L 64G hdd-vg /dev/nvme1n1p1
  Logical volume "nvme-cache-for-hdd-vg" created.
root@k3s:~# lvchange -an /dev/hdd-vg/nvme-cache-for-hdd-vg
root@k3s:~# lvconvert --type writecache --cachevol nvme-cache-for-hdd-vg hdd-vg/5bd73b55-6ee4-4f0a-9af0-0d29c4cabc4d
Erase all existing data on hdd-vg/nvme-cache-for-hdd-vg? [y/n]: y
  Logical volume hdd-vg/5bd73b55-6ee4-4f0a-9af0-0d29c4cabc4d now has writecache.

this is after step 4:

root@k3s:~# lvs -a
  LV                                                     VG        Attr       LSize    Pool                         Origin                                        Data%  Meta%  Move Log Cpy%Sync Convert
  5bd73b55-6ee4-4f0a-9af0-0d29c4cabc4d                   hdd-vg    Cwi-aoC---   13.00t [nvme-cache-for-hdd-vg_cvol] [5bd73b55-6ee4-4f0a-9af0-0d29c4cabc4d_wcorig] 0.00
  [5bd73b55-6ee4-4f0a-9af0-0d29c4cabc4d_wcorig]          hdd-vg    rwi-aoC---   13.00t                                                                                                   100.00
  [5bd73b55-6ee4-4f0a-9af0-0d29c4cabc4d_wcorig_rimage_0] hdd-vg    iwi-aor---   13.00t
  [5bd73b55-6ee4-4f0a-9af0-0d29c4cabc4d_wcorig_rimage_1] hdd-vg    iwi-aor---   13.00t
  [5bd73b55-6ee4-4f0a-9af0-0d29c4cabc4d_wcorig_rmeta_0]  hdd-vg    ewi-aor---    4.00m
  [5bd73b55-6ee4-4f0a-9af0-0d29c4cabc4d_wcorig_rmeta_1]  hdd-vg    ewi-aor---    4.00m
  [nvme-cache-for-hdd-vg_cvol]                           hdd-vg    Cwi-aoC---   64.00g

topolvm should still be able to see it, though, assuming topolvm runs something like this to get all the LVs:

root@k3s:~# lvm fullreport --reportformat json --units b --nosuffix --configreport vg -o vg_name,vg_uuid,vg_size,vg_free --configreport lv -o lv_uuid,lv_name,lv_full_name,lv_path,lv_size,lv_kernel_major,lv_kernel_minor,origin,origin_size,pool_lv,lv_tags,lv_attr,vg_name,data_percent,metadata_percent,pool_lv --configreport pv -o, --configreport pvseg -o, --configreport seg -o, | jq .

this JSON blob is inside that report (I've truncated it because the other LVs aren't relevant):

{
  "lv_uuid": "Hfpar5-AE3Q-NBDD-h7Jk-ITzm-S6hc-HR4xvc",
  "lv_name": "5bd73b55-6ee4-4f0a-9af0-0d29c4cabc4d",
  "lv_full_name": "hdd-vg/5bd73b55-6ee4-4f0a-9af0-0d29c4cabc4d",
  "lv_path": "/dev/hdd-vg/5bd73b55-6ee4-4f0a-9af0-0d29c4cabc4d",
  "lv_size": "14293651161088",
  "lv_kernel_major": "252",
  "lv_kernel_minor": "16",
  "origin": "[5bd73b55-6ee4-4f0a-9af0-0d29c4cabc4d_wcorig]",
  "origin_size": "",
  "pool_lv": "[nvme-cache-for-hdd-vg_cvol]",
  "lv_tags": "",
  "lv_attr": "Cwi-aoC---",
  "vg_name": "hdd-vg",
  "data_percent": "0.00",
  "metadata_percent": ""
}
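
For reference, this is roughly the shape of that JSON report and how a consumer might pull out the fields quoted above; a minimal sketch only, not topolvm's actual code or types:

package main

import (
	"encoding/json"
	"fmt"
	"os/exec"
)

// Minimal shapes for the parts of `lvm fullreport --reportformat json`
// used here; the real report has more sections (vg, pv, seg, ...) and fields.
type fullReport struct {
	Report []struct {
		LV []struct {
			Name string `json:"lv_name"`
			Attr string `json:"lv_attr"`
			Pool string `json:"pool_lv"`
		} `json:"lv"`
	} `json:"report"`
}

func main() {
	out, err := exec.Command("lvm", "fullreport", "--reportformat", "json",
		"--units", "b", "--nosuffix").Output()
	if err != nil {
		panic(err)
	}
	var r fullReport
	if err := json.Unmarshal(out, &r); err != nil {
		panic(err)
	}
	for _, rep := range r.Report {
		for _, lv := range rep.LV {
			// e.g. "5bd73b55-... Cwi-aoC--- [nvme-cache-for-hdd-vg_cvol]"
			fmt.Println(lv.Name, lv.Attr, lv.Pool)
		}
	}
}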

the log from my topolvm-node-* container:

{
  "level": "error",
  "ts": "2025-01-02T04:00:42Z",
  "msg": "error on grpc call",
  "method": "/csi.v1.Node/NodePublishVolume",
  "error": "rpc error: code = NotFound desc = failed to find LV: 5bd73b55-6ee4-4f0a-9af0-0d29c4cabc4d",
  "stacktrace": ""  // extracted into next code block
}

the stacktrace, extracted from the JSON blob and formatted a little better:

github.com/topolvm/topolvm/cmd/topolvm-node/app.ErrorLoggingInterceptor
  /workdir/cmd/topolvm-node/app/run.go:202
github.com/container-storage-interface/spec/lib/go/csi._Node_NodePublishVolume_Handler
  /go/pkg/mod/github.com/container-storage-interface/spec@v1.9.0/lib/go/csi/csi.pb.go:7261
google.golang.org/grpc.(*Server).processUnaryRPC
  /go/pkg/mod/google.golang.org/grpc@v1.58.3/server.go:1374
google.golang.org/grpc.(*Server).handleStream
  /go/pkg/mod/google.golang.org/grpc@v1.58.3/server.go:1751
google.golang.org/grpc.(*Server).serveStreams.func1.1
  /go/pkg/mod/google.golang.org/grpc@v1.58.3/server.go:986

and some entries from the journal (which say more of the same: the LV can't be found):

Jan 03 21:28:23 k3s k3s[452759]: E0103 21:28:23.704752  452759 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/csi/topolvm.io^5bd73b55-6ee4-4f0a-9af0-0d29c4cabc4d podName: nodeName:}" failed. No retries permitted until 2025-01-03 21:28:27.70471428 -0800 PST m=+11.818499352 (durationBeforeRetry 4s). Error: MountVolume.SetUp failed for volume "pvc-1d9b46ff-ac68-4a0a-90bb-cf1844653762" (UniqueName: "kubernetes.io/csi/topolvm.io^5bd73b55-6ee4-4f0a-9af0-0d29c4cabc4d") pod "radarr-6b577679d6-pspcp" (UID: "b45cd810-d9c9-4e16-9168-9a533e1cfc0a") : rpc error: code = NotFound desc = failed to find LV: 5bd73b55-6ee4-4f0a-9af0-0d29c4cabc4d

after I remove the writecache from the LV, the same entry from the previous lvm fullreport (similarly truncated and formatted):

{
  "lv_uuid": "Hfpar5-AE3Q-NBDD-h7Jk-ITzm-S6hc-HR4xvc",
  "lv_name": "5bd73b55-6ee4-4f0a-9af0-0d29c4cabc4d",
  "lv_full_name": "hdd-vg/5bd73b55-6ee4-4f0a-9af0-0d29c4cabc4d",
  "lv_path": "/dev/hdd-vg/5bd73b55-6ee4-4f0a-9af0-0d29c4cabc4d",
  "lv_size": "14293651161088",
  "lv_kernel_major": "252",
  "lv_kernel_minor": "16",
  "origin": "",
  "origin_size": "",
  "pool_lv": "",
  "lv_tags": "",
  "lv_attr": "rwi-aor---",
  "vg_name": "hdd-vg",
  "data_percent": "",
  "metadata_percent": ""
}

I suspect it's due to 'C' missing from this set of VolumeType runes:

type VolumeType rune

const (
	VolumeTypeMirrored                   VolumeType = 'm'
	VolumeTypeMirroredNoInitialSync      VolumeType = 'M'
	VolumeTypeOrigin                     VolumeType = 'o'
	VolumeTypeOriginWithMergingSnapshot  VolumeType = 'O'
	VolumeTypeRAID                       VolumeType = 'r'
	VolumeTypeRAIDNoInitialSync          VolumeType = 'R'
	VolumeTypeSnapshot                   VolumeType = 's'
	VolumeTypeMergingSnapshot            VolumeType = 'S'
	VolumeTypePVMove                     VolumeType = 'p'
	VolumeTypeVirtual                    VolumeType = 'v'
	VolumeTypeMirrorOrRAIDImage          VolumeType = 'i'
	VolumeTypeMirrorOrRAIDImageOutOfSync VolumeType = 'I'
	VolumeTypeMirrorLogDevice            VolumeType = 'l'
	VolumeTypeUnderConversion            VolumeType = 'c'
	VolumeTypeThinVolume                 VolumeType = 'V'
	VolumeTypeThinPool                   VolumeType = 't'
	VolumeTypeThinPoolData               VolumeType = 'T'
	VolumeTypeThinPoolMetadata           VolumeType = 'e'
	VolumeTypeDefault                    VolumeType = '-'
)

related docs: https://github.com/lvmteam/lvm2/blob/main/man/lvs.8_end#L6
related source: https://github.com/lvmteam/lvm2/blob/5ef958704c82c45a6bd8215d920e4366c0c5e1bd/lib/metadata/lv.c#L1333-L1334
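
Something along these lines might cover it; the constant name below is just a guess on my part, not necessarily what a patch should use:

const (
	// 'C': LV has a cache or writecache attached; lvs reports this as the
	// first lv_attr character (e.g. "Cwi-aoC---" in the report above)
	VolumeTypeCached VolumeType = 'C'
)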

happy to file a PR to add the missing value plus related tests; just let me know if that's desirable

@guppy0130 guppy0130 added the bug Something isn't working label Jan 4, 2025
@guppy0130 guppy0130 changed the title added writecache causes topolvm to be unable to find an LV it's created added writecache causes topolvm to be unable to find an LV it has created Jan 4, 2025
@pluser pluser moved this from To do to In progress in Development Jan 6, 2025
@toshipp
Contributor

toshipp commented Jan 6, 2025

Thank you for your report. I'm looking into it.

@toshipp
Contributor

toshipp commented Jan 9, 2025

I can reproduce this with this commit: 144c37e.
The root cause is that the hdd device class is (implicitly) defined as thick, but once the writecache is attached, topolvm mistakenly treats the volume created from it as a thin volume.
We assumed that a non-empty pool meant a volume was thin, so the LogicalVolume.IsThin() method returns true whenever the pool field is set. This broken assumption causes vgService.GetLVList to return an empty volume list, and topolvm finally raises a NotFound error.

if dc.Type == lvmdTypes.TypeThick && lv.IsThin() {
	// do not send thin lvs if request is on TypeThick
	continue
}

My quick fix is to use the attr field instead of the pool field. 7b15db5
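
For illustration, the attr-based check might look roughly like this; the receiver and field names are guesses, not the code in 7b15db5:

// Sketch only: classify thin-ness from the first lv_attr character
// ('V' = thin volume) instead of from pool_lv being non-empty, so a
// writecached LV ("Cwi-aoC---") is no longer mistaken for a thin one.
// l.attr stands in for however the parsed lv_attr string is stored.
func (l *LogicalVolume) IsThin() bool {
	return len(l.attr) > 0 && VolumeType(l.attr[0]) == VolumeTypeThinVolume
}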

If you'd like to fix this, feel free to send a PR. To test it, add a test file in internal/lvmd/command, then create volumes and check the IsThin method on them.
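
A rough shape for such a test, using the attr strings from this issue; newTestLV is a hypothetical helper, and the real test should create actual volumes as described above:

package command

import "testing"

// Sketch only: the real test creates LVs (lvcreate/lvconvert) and checks
// IsThin on them; newTestLV(attr) is a hypothetical constructor.
func TestIsThinOnCachedLV(t *testing.T) {
	cases := []struct {
		attr string
		thin bool
	}{
		{"Cwi-aoC---", false}, // raid1 LV with a writecache attached (this issue)
		{"rwi-aor---", false}, // the same LV after the writecache is removed
		{"Vwi-a-tz--", true},  // an ordinary thin volume
	}
	for _, c := range cases {
		lv := newTestLV(c.attr) // hypothetical helper
		if got := lv.IsThin(); got != c.thin {
			t.Errorf("attr %q: IsThin() = %v, want %v", c.attr, got, c.thin)
		}
	}
}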

guppy0130 added a commit to guppy0130/topolvm that referenced this issue Jan 10, 2025
* add `C` to VolumeType and OpenTarget consts
* `lv.IsThin` should check LV attrs for thin (e.g., cached LVs have a
  pool but are not necessarily thin)
* add tests for cached LVs

Signed-off-by: guppy0130 <guppy0130@yahoo.com>
Co-authored-by: Yuji Ito <llamerada-jp@users.noreply.github.com>
llamerada-jp added a commit that referenced this issue Jan 14, 2025
@github-project-automation github-project-automation bot moved this from In progress to Done in Development Jan 14, 2025