ebsnvme-id creates broken sd* symlinks #37

Open
martinpitt opened this issue May 29, 2024 · 7 comments

Comments

@martinpitt

martinpitt commented May 29, 2024

We spent quite some time debugging a storage test regression in Fedora rawhide which breaks scsi_debug and other devices, but only on Red Hat's/Fedora's Testing Farm infrastructure -- which is essentially AWS EC2 machines with an API.

Latest Fedora rawhide instances now have amazon-ec2-utils-2.2.0-2.fc41.noarch (which got introduced into Fedora very recently), which ships /usr/lib/udev/rules.d/70-ec2-nvme-devices.rules with

KERNEL=="nvme[0-9]*n[0-9]*",        ENV{DEVTYPE}=="disk",      ATTRS{model}=="Amazon Elastic Block Store", PROGRAM="/usr/sbin/ebsnvme-id -u /dev/%k", SYMLINK+="%c"
KERNEL=="nvme[0-9]*n[0-9]*p[0-9]*", ENV{DEVTYPE}=="partition", ATTRS{model}=="Amazon Elastic Block Store", PROGRAM="/usr/sbin/ebsnvme-id -u /dev/%k", SYMLINK+="%c%n"

These instances have an NVMe block device, and these rules cause the following symlinks to be created:

lrwxrwxrwx. 1 root root 7 May 29 03:52 /dev/sda1 -> nvme0n1
lrwxrwxrwx. 1 root root 9 May 29 03:52 /dev/sda11 -> nvme0n1p1
lrwxrwxrwx. 1 root root 9 May 29 03:52 /dev/sda12 -> nvme0n1p2
lrwxrwxrwx. 1 root root 9 May 29 03:52 /dev/sda13 -> nvme0n1p3
lrwxrwxrwx. 1 root root 9 May 29 03:52 /dev/sda14 -> nvme0n1p4

This is problematic in multiple ways:

  • Pretending that these are SCSI drives tramples on the kernel's namespace. udev symlinks should never use names which the kernel itself uses.
  • "nvme0n1" is the raw block device, not a partition, so naming it "sda1" is very confusing; it should be "sda". Likewise, the first partition should be "sda1", not "sda11".

If a real sda then comes along (e.g. with modprobe scsi_debug), the kernel creates an actual /dev/sda, but it is then impossible to create or see partitions on it, as the sda1 etc. names are already taken.
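To check whether a machine is affected, something like the following could help. It is only a rough sketch: the function name `colliding_symlinks` and the regular expression for kernel-style SCSI names are my own assumptions, not part of amazon-ec2-utils or udev. It lists symlinks directly under a device directory whose names look like kernel `sd*` nodes.

```python
import os
import re

# Assumed pattern for kernel-style SCSI disk/partition names: sda, sdb1, sdaa12, ...
SD_NAME = re.compile(r"sd[a-z]+[0-9]*")


def colliding_symlinks(dev_dir="/dev"):
    """Return {name: target} for symlinks in dev_dir that are named like kernel sd* nodes.

    Real SCSI disks and partitions are device nodes, not symlinks, so a symlink
    with such a name is occupying a name the kernel may want later.
    """
    found = {}
    for name in sorted(os.listdir(dev_dir)):
        path = os.path.join(dev_dir, name)
        if SD_NAME.fullmatch(name) and os.path.islink(path):
            found[name] = os.readlink(path)
    return found
```

Calling `colliding_symlinks()` on an affected instance should report entries such as `sda1` pointing at `nvme0n1`; an empty result means no symlink currently squats on an `sd*` name.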

This is most easily reproduced with

# /usr/sbin/ebsnvme-id -u /dev/nvme0n1
sda1

Curiously, it also does that for a partition:

# /usr/sbin/ebsnvme-id -u /dev/nvme0n1p2
sda1

That explains how the second udev rule can even work -- but this is really hackish!
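To make the resulting names easier to follow, here is a small simulation of how the two rules assemble them. It is a sketch under assumptions: the helper names are made up, the regular expression only approximates the rules' glob patterns, and it relies on my understanding that udev's `%n` expands to the trailing digits of the kernel name and `%c` to the program's output.

```python
import re


def kernel_number(kernel_name):
    """Trailing digits of a kernel device name (what %n is assumed to expand to)."""
    match = re.search(r"[0-9]+$", kernel_name)
    return match.group(0) if match else ""


def symlink_name(kernel_name, program_output):
    """Mimic SYMLINK+="%c" for whole disks and SYMLINK+="%c%n" for partitions."""
    if re.fullmatch(r"nvme[0-9]+n[0-9]+p[0-9]+", kernel_name):
        # Partition rule: program output followed by the partition number.
        return program_output + kernel_number(kernel_name)
    # Disk rule: program output used as-is.
    return program_output


# In the report, ebsnvme-id -u printed "sda1" for both the disk and a partition.
for dev in ["nvme0n1", "nvme0n1p1", "nvme0n1p2"]:
    print(dev, "->", symlink_name(dev, "sda1"))
```

With "sda1" as the program output, the disk gets the link name sda1 and partition 1 gets sda11, which matches the listing above.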

My recommendation as former udev co-upstream is to remove these rules entirely. They are unhelpful and confusing, and they break things. You can of course create symlinks in subdirs of /dev all you like, but please don't collide with kernel names.

Thanks!

@martinpitt

@mvollmer @major FYI -- @major, do you want me to file this as a Fedora bz, too? This could very well affect other rawhide users/tests, and it has already cost us about 10 hours of our lives..

@martinpitt

Note: This only affects Fedora rawhide because Testing Farm Fedora 40 instances don't install amazon-ec2-utils by default. When I install it manually, the issue happens there as well.

martinpitt added a commit to martinpitt/cockpit that referenced this issue May 29, 2024
Rawhide Testing Farm machines started to get a set of symlinks like
/dev/sda1 -> nvme0n1 (but *no* /dev/sda), via amazon-ec2-utils
(amazonlinux/amazon-ec2-utils#37).

They break `scsi_debug`, as that creates /dev/sda -- but then trying to
create partitions on it doesn't have any namespace room for /dev/sda1
etc., as that is already taken. This breaks all storage tests which use
a RAM disk.

That package isn't yet installed in Fedora 39/40, only rawhide. We don't
need it and it only causes trouble, so it can go.

Fixes cockpit-project#20520
@mvollmer

@martinpitt, thanks for filing this! I have a hard time understanding what problem these symlinks are trying to solve. They only seem to create chaos.

If they are supposed to help with giving stable names to NVMe drives, I think that problem is already solved by ID_SERIAL, ID_WWN, and filesystem UUIDs.
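For reference, the existing stable names can be looked up without any extra rules. The sketch below assumes the usual udev layout in which `/dev/disk/by-id` holds symlinks derived from properties such as ID_SERIAL and ID_WWN; the function name `stable_names` is hypothetical.

```python
import os


def stable_names(device, by_dir="/dev/disk/by-id"):
    """Return the symlink names in by_dir that resolve to the given device node."""
    wanted = os.path.realpath(device)
    names = []
    if not os.path.isdir(by_dir):
        return names
    for name in sorted(os.listdir(by_dir)):
        link = os.path.join(by_dir, name)
        # Compare fully resolved paths so relative links like ../../nvme0n1 match.
        if os.path.islink(link) and os.path.realpath(link) == wanted:
            names.append(name)
    return names
```

For example, `stable_names("/dev/nvme0n1")` would list the by-id links for the root disk, and passing `by_dir="/dev/disk/by-uuid"` together with a partition does the same lookup for filesystem UUIDs.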

@martinpitt

https://gitlab.com/testing-farm/infrastructure doesn't actually install that package -- I figure it's now part of the official Fedora rawhide AMIs?

martinpitt added a commit to cockpit-project/cockpit that referenced this issue May 29, 2024
@major

major commented May 29, 2024

> @mvollmer @major FYI -- @major, do you want me to file this as a Fedora bz, too? This could very well affect other rawhide users/tests, and it has already cost us about 10 hours of our lives..

@martinpitt That would be helpful. Thanks for detailing the problems you found. I missed these during testing!

@martinpitt

@major OK, I filed https://bugzilla.redhat.com/show_bug.cgi?id=2284397 . Thanks!

@tbzatek

tbzatek commented Sep 23, 2024

> @mvollmer @major FYI -- @major, do you want me to file this as a Fedora bz, too? This could very well affect other rawhide users/tests, and it has already cost us about 10 hours of our lives..

This has cost me and @vojtechtrefny an hour or two of our lives as well: https://bugzilla.redhat.com/show_bug.cgi?id=2313526

cowboyox pushed a commit to cowboyox/cockpit that referenced this issue Oct 8, 2024