Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Add a version indicator to the various DMDs we create. #420

Closed
tomprince opened this issue Jun 14, 2021 · 28 comments · Fixed by #563
Closed

Add a version indicator to the various DMDs we create. #420

tomprince opened this issue Jun 14, 2021 · 28 comments · Fixed by #563

Comments

@tomprince
Copy link
Contributor

I can imagine that at some point in the future, we'll want to change the structure of DMDs, or add extra non-snapshot data to them. We should add version metadata to them (perhaps a @version file, which won't conflict with any of the snapshot entries, see #395).

We can then check that version before doing anything with a remote DMD.

It probably also makes sense to include the version in the snapshot metadata, for similar reasons.

@meejah
Copy link
Collaborator

meejah commented Jun 14, 2021

We already have a version in the snapshot metadata.

@meejah
Copy link
Collaborator

meejah commented Jun 18, 2021

I don't think we need an immediate version now. If we do need a second version in the future, we can add one using an "impossible" / invalid-snapshot-name like you suggest and that can be "version 2" essentially.

..or, we could just add additional information that way and "never" need a version for the directories themselves. Or we could use the Tahoe metadata for this.

So closing this in favour of doing #395 and nothing else.

@meejah meejah closed this as completed Jun 18, 2021
@tomprince tomprince reopened this Jun 18, 2021
@tomprince
Copy link
Contributor Author

We'll need to ensure that we notice a file with an "invalid" name, before doing any processing on the folder, so that we notice that we have an incompatible folder before doing anything with it.

@tomprince
Copy link
Contributor Author

To expand on that, if we depend on just #395, we need to somehow ensure that the entry we use gets processed before anything else. Looking at the current code, the downloader appears to use a dictionary for files, which at least on py2.7 will be in a random order, and I don't think the uploader ever looks at the DMD, so we have no way to reliably notice a newer version of the DMD in either of the processes.

@meejah
Copy link
Collaborator

meejah commented Jun 18, 2021

We'll need to ensure that we notice a file with an "invalid" name,

That is addressed by #395

To expand on that, if we depend on just #395, we need to somehow ensure that the entry we use gets processed before anything else.

Yes, if we introduce a second version or anything else, we'd need to add code (for that plus probably other things). We don't need that code now (nor do we need a version marker now).

@meejah meejah closed this as completed Jun 18, 2021
@tomprince
Copy link
Contributor Author

That is addressed by #395

We might only notice it after we made changes to either the DMD or the file system, which might cause issues.

@tomprince tomprince reopened this Jun 18, 2021
@meejah
Copy link
Collaborator

meejah commented Jun 19, 2021

Are you saying it's impossible to add @version or any other "special" thing later?

@tomprince
Copy link
Contributor Author

Are you saying it's impossible to add @version or any other "special" thing later?

I'm saying it is impossible[1] to add it in such a way that the current version will notice that it is dealing with a later version early enough to avoid doing something with the DMD that is inappropriate because we've changed the format in some incompatible way.

Now, I don't what incompatible changes we might make, or what could go wrong if an older client were to try to read or write such a version, but the cost of adding a version field now is low enough, that we should do it now, before we have an installed base to worry about.

One thing to consider, is that if we need to depend on the the behaviour of old clients not breaking a newer incompatible format because they error out early, that it may get harder to do, the longer it is before we introduce an incompatible change, since there will be more versions we'll need to ensure trigger an error early enough.

[1] Maybe not entirely impossible, but very much constrained by incidental implementation details. Have a look at the discussion here, on the hoops that were needed to be dealt with to be able to extend a protocol that didn't originally have it.

@tomprince
Copy link
Contributor Author

Thinking about this a bit more, in terms of implementation, I think it may make sense to also make sense to carve out some space that the initial version will ignore, where we can add options or metadata for newer versions, that are not incompatible. (That is, options or metadata that can safely be ignored by a client that doesn't understand them). That would give us some space for compatible extensibility as well.

@meejah
Copy link
Collaborator

meejah commented Jun 19, 2021

All we have to do now is ignore "impossible" names like @version or anything else. The if we do add, say, @version or whatever then existing clients will ignore it and new clients can do whatever fun stuff they want.

@tomprince
Copy link
Contributor Author

All we have to do now is ignore "impossible" names like @Version or anything else. The if we do add, say, @Version or whatever then existing clients will ignore it and new clients can do whatever fun stuff they want.

We'd need to error out on finding a name like @version before we did anything, not ignore them. Otherwise we could not make incompatible changes.

@tomprince
Copy link
Contributor Author

I'm specifically considering the possibility of us wanting to make a change in the future, where the current uploader or downloader code would do something bad with a new version of the DMD, even after ignoring impossible names. I don't have a specific example, but I suspect we will encounter one, if the project is successful.

@meejah
Copy link
Collaborator

meejah commented Jun 19, 2021

We'd need to error out on finding a name like @Version before we did anything, not ignore them.

Why? What's wrong with saying we ignore anything that starts with @ that is an invalid Snapshot name?

@tomprince
Copy link
Contributor Author

We'd need to error out on finding a name like @Version before we did anything, not ignore them.

Why? What's wrong with saying we ignore anything that starts with @ that is an invalid Snapshot name?

For a lot of changes, that may be enough.

I am presuming that at some point, we will want to make a change where just ignoring those files will cause a severe issue, such as data loss or something. I don't have a specific example, but experience (both mine and others) suggests that not having a such mechanism is something that we will eventually regret.

In some ways, ignoring things would be worse than erroring with something unexpected. At least you would get an error after some amount of time, rather than silently corrupting data or doing whatever bad thing that happens.

@exarkun
Copy link
Member

exarkun commented Jun 21, 2021

I think that having an explicit schema for persistent data is extremely important. It is certainly possible to play tricks where the schema is only partially defined (like "@Version is an illegal name so we can always encroach on that part of the namespace in the future") but these are much more difficult to reason about in general and it only gets harder to do so as the software continues to change.

Recording a version number in the data and defining the schema elsewhere (eg, in the implementation) is fine. Sometimes you get the first one for free by saying "if the [schema] version isn't explicitly defined, it's 0". Ideally the space in the persistent data for the version would still be protected by the implementation. I think @version is protected by virtue of the "magic path" name mangling (so, not specifically, but by falling into the category of strings that the encoder can never emit).

So the minimum necessary to make magic-folder (let's call it) 2 safe with respect to magic-folder >2 is to check for @version and decline to operate if it is found. It's not actually necessary to create it yet, just to stake the claim.

One thing we do want to consider right now is whether some particular versioning scheme lets us atomically retrieve DMD contents and know their version. A scheme that involves two independent reads probably does not allow this. If @version points at a short piece of data (like "7") then it ends up as a litcap and the value probably comes along with the read of the DMD - but determining whether this is the case or not involves violating a lot of layers of abstraction. Another option to encode the information might be to have @version be empty but to have it associated with metadata in the DMD containing the actual version. The metadata always comes with the DMD and without any layering violations.

I think this still means we can treat "lack of @Version" as meaning "[schema] version 0" and only do the implementation work to have magic-folder create a [schema] version 1 "@Version" when we eventually change the schema.

@tomprince
Copy link
Contributor Author

So the minimum necessary to make magic-folder (let's call it) 2 safe with respect to magic-folder >2 is to check for @Version and decline to operate if it is found. It's not actually necessary to create it yet, just to stake the claim.

[...]

I think this still means we can treat "lack of @Version" as meaning "[schema] version 0" and only do the implementation work to have magic-folder create a [schema] version 1 "@Version" when we eventually change the schema.

Yes, checking and erroring (before reading or writing the rest of the DMD) on a @version file is sufficient.

@tomprince
Copy link
Contributor Author

There is the question of if we want to notice old tahoe-based magic-folders, and error on them, since they aren't compatible. Adding a file now would allow us to do that. But that isn't a hard requirement for future compatibility.

@meejah
Copy link
Collaborator

meejah commented Jun 21, 2021

There is the question of if we want to notice old tahoe-based magic-folders, [..]

Under what scenario would we encounter something like that?

@tomprince
Copy link
Contributor Author

There is the question of if we want to notice old tahoe-based magic-folders, [..]

Under what scenario would we encounter something like that?

Somebody tries to migrate from tahoe-based magic-folder by adding the DMD as a participant or something?
That is definitely user error, but it would be nice to give a reasonable error, if we encounter that case. Handling the specific case is definitely a nice-to-have only.

@meejah
Copy link
Collaborator

meejah commented Jun 21, 2021

Somebody tries to migrate from tahoe-based magic-folder by adding the DMD as a participant or something?

If we aren't already, we should be explicit that if you screw up while using the low-level API -- which is what "add participant" is -- then you own all the pieces.

We don't need to guard against people doing weird stuff like that since it's pretty far outside what any user will be doing (actual users will add participants via "invites").

I think what @exarkun is suggesting (and I agree) is that we can say "a mutable directory with no @version is the first version" and thus don't have to spec out and implement any @version thing as of yet. If it happens that we want different versions in the future, then "version 2" is the first thing that has @version files in it.

If even that is insufficient in some weird pathological case nobody has even imagined as of yet then we can always fall back on the Snapshots versions to bootstrap something else.

@tomprince
Copy link
Contributor Author

Somebody tries to migrate from tahoe-based magic-folder by adding the DMD as a participant or something?

If we aren't already, we should be explicit that if you screw up while using the low-level API -- which is what "add participant" is -- then you own all the pieces. We don't need to guard against people doing weird stuff like that since it's pretty far outside what any user will be doing (actual users will add participants via "invites").

I agree that it isn't necessary, but it does seem like a nice quality-of-life feature, that we may not get to in time. I will note that, right now, the low-level API is what is documented in #381 as the way to recover, so, if that were to land, it doesn't seem too far outside what a user will be doing. Slightly more plausible, but with the same behaviour, is if gridsync doesn't differentiate between backups from old-magic-folders and new-magic-folders, it could mistakenly pass the DMD to the former to us.

Again, let me re-iterate, I'm not suggesting that handling this particularly case is necessary, but it does seem like a plausible thing that could happen, that it would be nice to guard against.

@meejah
Copy link
Collaborator

meejah commented Jun 21, 2021

the low-level API is what is documented in #381 as the way to recover

No, the user will be using GridSync to do the recovery flow. The document you're referencing is explicitly for "other developers integrating with magic-folder".

Anyway, we're in agreement that there's nothing to be spec'd or implemented here?

@tomprince
Copy link
Contributor Author

Anyway, we're in agreement that there's nothing to be spec'd or implemented here?

No. We need to implement checking for a @version file in DMD's, and erroring out if we find one.

@meejah
Copy link
Collaborator

meejah commented Jun 22, 2021

No. We need to implement checking for a @Version file in DMD's, and erroring out if we find one.

I thought we agreed that we want to error out on any invalid name (including @version of course), as per #395 ... so that means: implement #395, do nothing here.

@meejah meejah closed this as completed Jun 22, 2021
@tomprince
Copy link
Contributor Author

No. We need to implement checking for a @Version file in DMD's, and erroring out if we find one.

I thought we agreed that we want to error out on any invalid name (including @version of course), as per #395 ... so that means: implement #395, do nothing here.

I don't think that is sufficient, at least if we implement #395 by making the conversion functions raise an error. We would need to proactively check that there are no such files before reading or writing to a DMD.

In any case, I think having an explicit check, as @exarkun explained, is easier to reason about, so we should implement this directly.

@tomprince tomprince reopened this Jun 22, 2021
@meejah
Copy link
Collaborator

meejah commented Jul 5, 2021

I take @exarkun's comment to mean "nothing to do here now, because we can add a @version or anything else later". Can you please confirm, @exarkun?

This future-proofs current clients because they will error upon seeing the invalid name (if, say, a new magic-folder client created a new DMD with @version in it and somehow an old version got access to that DMD).

I can't even imagine how that might happen, though, because we have no user-visible way to add a new participant (and when we do, it'll be via the magic-wormhole invite process so we could always just stop there instead of messing about with DMD contents).

@tomprince
Copy link
Contributor Author

This future-proofs current clients because they will error upon seeing the invalid name (if, say, a new magic-folder client created a new DMD with @Version in it and somehow an old version got access to that DMD).

They don't error soon enough, with just #395 implemented, as demonstrated by the integration test in https://github.com/tp-la/magic-folder/commit/038fbdc4b56962bf33caf6d0e358edf59858d96a. We don't notice that there is a @version file until after we've already written 00sylvester to disk.

@tomprince
Copy link
Contributor Author

I can't even imagine how that might happen, though, because we have no user-visible way to add a new participant (and when we do, it'll be via the magic-wormhole invite process so we could always just stop there instead of messing about with DMD contents).

It could happen if somebody tries to recover a folder from a newer magic-folder with an old magic-folder. (That is, via the corresponding gridsync in both cases)

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants