Point-in-time history #318

paulgb · 2024-10-12T14:06:27Z

It would be good to be able to see past snapshots of the data on a point-in-time basis. Object storage is cheap, and we should take advantage of it.

I see three approaches to investigate:

1. Disable GC and store a history of state vectors with version metadata, then reconstruct from the state vector. ref
2. Store occasional snapshots of the data as whole separate .ysweet files.
3. Implement history at the key/value store level, by creating a Store implementation that is temporal.

Approach 1 is the best way if we always want to store the full document history, but (per discussion with @dmonad) it gets very complex to selectively prune some but not all past versions.

Approach 2 is a clean way to implement it without changing how we store the data, but it does require storing a lot of potentially redundant data between snapshots.

Approach 3 is something of a sweet spot. It would be almost as efficient as approach 1 for storing an entire point-in-time history, but pruning past versions would be much simpler, because it is a lot easier to reason about the semantics of a point-in-time key/value map than about a richer CRDT data structure. As far as Yrs is concerned, it would effectively be the same as approach 2 (it would not even use Yrs' snapshot functionality), except that instead of storing each snapshot as an individual file, we would colocate them in a temporal key-value store to take advantage of the large overlap in data between snapshots.
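To make the third approach more concrete, here is a minimal in-memory sketch of a temporal key/value map. It is only an illustration under assumptions: `TemporalStore`, `getAt`, `pruneBefore`, and the numeric timestamps are hypothetical names, not part of y-sweet's `Store` or of Yrs. The point is that a key which does not change between two points in time is stored once, and that pruning is a per-key operation on a sorted list.

```typescript
// Hypothetical sketch; none of these names come from y-sweet or Yrs.
// A null value is a tombstone recording that the key was deleted.
type Entry<V> = { ts: number; value: V | null };

class TemporalStore<V> {
  // Per key, every retained write, in ascending timestamp order.
  private history = new Map<string, Entry<V>[]>();

  set(key: string, value: V | null, ts: number): void {
    const entries = this.history.get(key) ?? [];
    entries.push({ ts, value });
    entries.sort((a, b) => a.ts - b.ts); // tolerate out-of-order writes
    this.history.set(key, entries);
  }

  // Value of `key` as of time `ts`, or undefined if it did not exist then.
  getAt(key: string, ts: number): V | undefined {
    let found: Entry<V> | undefined;
    for (const e of this.history.get(key) ?? []) {
      if (e.ts > ts) break;
      found = e;
    }
    return found?.value ?? undefined;
  }

  // Forget history older than `cutoff`; reads at or after `cutoff` are unchanged.
  pruneBefore(cutoff: number): void {
    this.history.forEach((entries, key) => {
      // The newest entry at or before the cutoff still answers reads at the
      // cutoff, so it is kept; everything older than it can go.
      let keepFrom = 0;
      entries.forEach((e, i) => {
        if (e.ts <= cutoff) keepFrom = i;
      });
      const kept = entries.slice(keepFrom);
      const onlyOldTombstone =
        kept.length === 1 && kept[0].value === null && kept[0].ts <= cutoff;
      if (onlyOldTombstone) this.history.delete(key);
      else this.history.set(key, kept);
    });
  }

  // Number of stored entries, to show that unchanged keys are not duplicated.
  size(): number {
    let n = 0;
    this.history.forEach((entries) => (n += entries.length));
    return n;
  }
}
```

With whole-file snapshots, a key that never changes would be copied into every snapshot; here it occupies a single entry no matter how many points in time are readable. A real implementation would need to persist this structure in object storage, which this sketch does not attempt.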

dmonad · 2024-10-12T14:39:14Z

You could also consider implementing the necessary APIs for customers to implement their own versioning approach. 1) and 2) are both valid approaches. However, I would strongly suggest going for 1), for most use-cases.

Some might want to store versions in a Yjs document. Some might want to store them in S3, some might want to store versions in their own database, tagged with additional information.

If I implemented a versions feature, I would probably store them in a Yjs document. But I wouldn't store the encoded document, I would store URIs (pointing to a file hosted in S3, or similar).

For that I would only need two API endpoints: createVersion(docid, versionName): Promise<URI>, which would call the backend, create a version from the current Yjs doc, and return a URI, and getVersion(uri), which I could then use to request that version.

The versionName is just a suggestion which would be helpful to implement the following endpoints:

listVersions(docid): Promise<Array<{ uri: string, versionName: string, creationTime: Date, creator: string }>>
deleteVersion(docid, uri)

It might make sense to support a meta field when creating a version. Users could use it to store arbitrary information that might be relevant for their application (version description, tags, custom resource identifier, kind of version (autosave, regular snapshot, published version), etc.).
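A rough sketch of how these four endpoints and the meta field might fit together, written as a TypeScript interface with an in-memory stand-in. Everything beyond the endpoint names and the listVersions fields is an assumption made for illustration: the `mem://` URI scheme, the `readState` hook, passing `creator` to the constructor, and returning the version as a `Uint8Array` are all hypothetical, and a real backend would write to S3 or similar instead of a `Map`.

```typescript
// Hypothetical shapes: the thread names the endpoints but not a full contract.
type Meta = Record<string, unknown>;

interface VersionInfo {
  uri: string;
  versionName: string;
  creationTime: Date;
  creator: string;
  meta?: Meta;
}

interface VersionApi {
  createVersion(docId: string, versionName: string, meta?: Meta): Promise<string>;
  getVersion(uri: string): Promise<Uint8Array>;
  listVersions(docId: string): Promise<VersionInfo[]>;
  deleteVersion(docId: string, uri: string): Promise<void>;
}

// In-memory stand-in for a backend that would write blobs to S3 or similar.
class InMemoryVersionApi implements VersionApi {
  private blobs = new Map<string, Uint8Array>();
  private index = new Map<string, VersionInfo[]>();
  private nextId = 1;

  constructor(
    // Assumed hook returning the encoded current state of a document.
    private readState: (docId: string) => Uint8Array,
    // Assumed: the creator is known from the session, not passed per call.
    private creator: string,
  ) {}

  async createVersion(docId: string, versionName: string, meta?: Meta): Promise<string> {
    const uri = `mem://${docId}/${this.nextId++}`;
    // Copy the bytes so later edits to the live document do not leak in.
    this.blobs.set(uri, this.readState(docId).slice());
    const info: VersionInfo = {
      uri,
      versionName,
      creationTime: new Date(),
      creator: this.creator,
      meta,
    };
    this.index.set(docId, [...(this.index.get(docId) ?? []), info]);
    return uri;
  }

  async getVersion(uri: string): Promise<Uint8Array> {
    const blob = this.blobs.get(uri);
    if (!blob) throw new Error(`unknown version: ${uri}`);
    return blob;
  }

  async listVersions(docId: string): Promise<VersionInfo[]> {
    return [...(this.index.get(docId) ?? [])];
  }

  async deleteVersion(docId: string, uri: string): Promise<void> {
    this.blobs.delete(uri);
    this.index.set(docId, (this.index.get(docId) ?? []).filter((v) => v.uri !== uri));
  }
}
```

Because callers only ever hold URIs, the list of versions itself could live in a Yjs document, a database, or anywhere else, while the encoded bytes stay in object storage.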

paulgb · 2024-10-12T14:50:49Z

Thanks for the input, this inclines me more towards the first option then. I like the idea of a createVersion endpoint for creating (and returning) a version from the current document state, and I could also imagine y-sweet itself creating time-based version snapshots.

The update endpoint (which takes a Yjs update over a POST request) could also optionally create and return a version.

paulgb mentioned this issue Oct 12, 2024

Project Roadmap #311

Open

paulgb added the roadmap label Oct 12, 2024
