-
Notifications
You must be signed in to change notification settings - Fork 95
Corrupted metadata file should be handled gracefully and not terminate operations #662
Comments
Taking a step back, the current behavior isn't really bad. First of all how did the meta-data get corrupted? Unless there is a code path or some sort of intentional tampering of the KV store there is no way to corrupt the KV store file. Prefer to identify the code path or workflow that can corrupt the KV and fix that. And if the KV store is corrupt then pretty much that volume isn't usable or shouldn't be usable thereafter. Or we can move the volume KV to the SQL DB. |
The volume is perfectly usable wiithout the metadata in the majority of cases. And if it not needed, it least it should be deleteable. Then KV file was not corrupted, it was just saying detached and nothing else, so moving to SQLite won't help, I suspect it's a specific path in code rather that external corruption. This is why I did not want KV on decision path. I agree its much better to underatand how the KV got that ways, but meanwhile, the volume should be deleteable at the very least, and I think we should optimize for more reliable volume working even in some bugs case, rather than less reliable and becoming unusable and undeleteable on a slightest bug in KV. On moving KV to SQLite. Itnqouls be good since it drops sidecar (and thus esx 6.0) dependency , but now the KV will be totally independent from vmdk and we'd need to understand the b and svmotion use cases. Maybe worth investigation as a separate issue. |
The only time that a code path preemptively resets the meta-data for a volume is when the disk isn't found in the VM configuration on a detach request or the VM itself isn't found. Which, while its an error path, the volume meta-data is reset to "detached state" in these scenarios. Looks like that has happened in this test for the KV to report the volume as detached. Although why the disk that was meant to be attached to the VM got removed from the VM config is unknown. But yes, the KV state need not restrict what a Get() call returns as meta-data for a volume. |
@msterin can I get the KV file contents. Looks like this is a storage issue, if even "created-by" isn;t there in the KV file then the FS is loosing data written to the file and unreliable. |
Great, so looks like we have an agreement that KV issues should not impact get() so at least 'rm' and 'inspect' and others would work. Do you want to get it fixed or do you want to use it as a training exercise for others ? |
@msterin - frankly this is a very very contrived way to get changes submitted into the code. Training or otherwise is left to the individual. Beats all reasoning how one KV pair alone got left inside the KV file while all other pairs got removed. I'd love to use this issue though to figure out how these stupid things happen!! Not surprising :-)) |
Modified the code to implement volume status the way Docker defines it. Status on a Get() call is optional and in the event that getting status fails from the server its perfectly OK to return a NULL map to Docker (on Get()). Once testing is complete I'll post the changes for review. |
Closed via #699 |
Today if metadata file is not complete (e.g. missing created-by, which may happen in diffent error paths) , getVMDK returns error. Since docker runs "get()" command before anything else, that blocks operations like
remove
or inspect. This was found when fixing #661I think it would be much better if
vol_info
simply returned "N/A" or "Unknown" or "Failed to get" for fields it cannot fill for any reason. This would still check for volume existence first in getVMDK , and all admin commands (inspect/ls) would keep working.Example of blocked operations:
Assigning to @govint to take a look first, and decide the best way to handle it.
The text was updated successfully, but these errors were encountered: