-
Notifications
You must be signed in to change notification settings - Fork 45
[12/n] [sled-agent] don't start install dataset zones on zone manifest error #8237
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
[12/n] [sled-agent] don't start install dataset zones on zone manifest error #8237
Conversation
Created using spr 1.3.6-beta.1
Created using spr 1.3.6-beta.1 [skip ci]
We need to figure out what to do if the mupdate override is not in place (i.e. if it was removed by sled-agent). Presumably the install dataset will still be around in that case. |
Created using spr 1.3.6-beta.1
Created using spr 1.3.6-beta.1 [skip ci]
I've split up the zone manifest into its own file in #8190. |
Created using spr 1.3.6-beta.1
Created using spr 1.3.6-beta.1 [skip ci]
From madrid:
|
…8155) Part of RFD 556. In upcoming work, sled-agent will check these hashes at boot time, and mark an error if there's a mismatch. The stack was [tested on a racklette](#8237 (comment)).
Created using spr 1.3.6-beta.1 [skip ci]
Created using spr 1.3.6-beta.1
Created using spr 1.3.6-beta.1 [skip ci]
// | ||
// Some zones are distributed from the host OS image and are never | ||
// placed in the install dataset; the Ramdisk enum variant more | ||
// accurately reflects that we are only search `/opt/oxide` for those |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// accurately reflects that we are only search `/opt/oxide` for those | |
// accurately reflects that we are only searching `/opt/oxide` for those |
// Any zones not part of the RAM disk are managed via the | ||
// zone manifest. | ||
// | ||
// XXX: we ask for the boot zpool to be passed in here. But |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we actually need to pass in the boot_zpool
here? Can we remove it and use the cached version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hah, I was going to ask a question in the other direction - could we loop over all the current internal disks, and append paths for any disk that is (a) present and (b) has a successful zone manifest result?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we actually need to pass in the boot_zpool here? Can we remove it and use the cached version?
We could, yeah.
Hah, I was going to ask a question in the other direction - could we loop over all the current internal disks, and append paths for any disk that is (a) present and (b) has a successful zone manifest result?
We could also do this but it's a bit more complex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would love to discuss this in a followup to avoid blocking the rest of the PR set from landing.
// Any zones not part of the RAM disk are managed via the | ||
// zone manifest. | ||
// | ||
// XXX: we ask for the boot zpool to be passed in here. But |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hah, I was going to ask a question in the other direction - could we loop over all the current internal disks, and append paths for any disk that is (a) present and (b) has a successful zone manifest result?
// TODO: implement mupdate override here. This will return an | ||
// error if the override isn't found. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite follow this TODO
- if there's no mupdate override, doesn't that mean we want to run the artifacts by hash as requested?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I got this wrong. I meant to say that this would return an error if there is an issue retrieving the override -- but it's not worth saying this in this comment. Fixed.
Add these two new zpool kinds that have stricter semantics, and allow upcasts from them to `ZpoolKind`. Depends on #8237.
For mupdate overrides, in order to be safe, we must know that the data stored in the JSON is consistent with the images stored on disk. We currently apply this logic to install dataset hashes, and will apply it to artifact hashes in the future.
TODO before landing:
Questions:
Should we apply this logic to more critical zones like the switch zone? They're always going to be part of the ramdisk maybe?We unconditionally succeed for RAMdisk zones, and in particular the switch zone, now.Depends on: