Point-in-time history #318

Open
paulgb opened this issue Oct 12, 2024 · 2 comments

@paulgb
Member

paulgb commented Oct 12, 2024

It would be good to be able to see past snapshots of the data on a point-in-time basis. Object storage is cheap, so we should take advantage of it.

I see three approaches to investigate:

  1. Disable GC and store a history of state vectors with version metadata, then reconstruct past versions from those state vectors. (ref)
  2. Store occasional snapshots of the data as whole, separate .ysweet files.
  3. Implement history at the key/value store level, by creating a temporal Store implementation.

Approach 1 is the best option if we always want to store the full document history, but (per discussion with @dmonad) it gets very complex to selectively prune some, but not all, past versions.

Approach 2 is a clean way to implement it without changing how we store the data, but it requires storing a lot of potentially redundant data between snapshots.

Approach 3 is something of a sweet spot. It would be almost as efficient as approach 1 for storing an entire point-in-time history, but much simpler to prune, because the semantics of a point-in-time key/value map are a lot easier to reason about than those of a richer CRDT data structure. As far as Yrs is concerned, it would effectively be the same as approach 2 (it would not even use Yrs' snapshot functionality), except that instead of storing each snapshot as an individual file, we would colocate them in a temporal key/value store to take advantage of the large overlap in data between snapshots.
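To make approach 3 more concrete, here is a minimal in-memory sketch of a temporal key/value map, assuming versions are increasing integers and `null` marks a deleted key. `TemporalStore`, `set`, `getAt`, `snapshotAt`, and `retainOnly` are hypothetical names, not y-sweet's actual `Store` interface. The pruning rule is the point: a write can be dropped exactly when no retained version would read it.

```typescript
// Hypothetical sketch of a temporal key/value store (approach 3).
// Assumes integer versions that never decrease; null is a tombstone.

type Version = number;

interface Entry<V> {
  version: Version;
  value: V | null;
}

class TemporalStore<V> {
  // Per-key write history, kept in ascending version order.
  private history = new Map<string, Entry<V>[]>();
  private latest: Version = 0;

  // Records a write (or a deletion when `value` is null) at `version`.
  set(key: string, value: V | null, version: Version): void {
    if (version < this.latest) throw new Error("versions must not decrease");
    this.latest = version;
    const entries = this.history.get(key) ?? [];
    const last = entries[entries.length - 1];
    if (last !== undefined && last.version === version) {
      last.value = value; // a second write in the same version replaces the first
    } else {
      entries.push({ version, value });
    }
    this.history.set(key, entries);
  }

  // Value of `key` as of `version`: the newest write at or before it.
  getAt(key: string, version: Version): V | null {
    const entries = this.history.get(key) ?? [];
    let lo = 0;
    let hi = entries.length - 1;
    let found = -1;
    while (lo <= hi) {
      const mid = (lo + hi) >> 1;
      if (entries[mid].version <= version) {
        found = mid;
        lo = mid + 1;
      } else {
        hi = mid - 1;
      }
    }
    return found === -1 ? null : entries[found].value;
  }

  // The whole map as it looked at `version`, i.e. a point-in-time snapshot.
  snapshotAt(version: Version): Map<string, V> {
    const out = new Map<string, V>();
    this.history.forEach((_entries, key) => {
      const value = this.getAt(key, version);
      if (value !== null) out.set(key, value);
    });
    return out;
  }

  // Keeps only writes that some version in `keep` can still observe, plus the
  // newest write to each key (the current state). Reads at retained versions
  // return the same values before and after pruning.
  retainOnly(keep: Version[]): void {
    this.history.forEach((entries, key) => {
      const kept = entries.filter((entry, i) => {
        const next = entries[i + 1];
        if (next === undefined) return true;
        return keep.some((v) => entry.version <= v && v < next.version);
      });
      this.history.set(key, kept);
    });
  }
}
```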

@dmonad

dmonad commented Oct 12, 2024

You could also consider implementing the APIs customers would need to build their own versioning approach. 1) and 2) are both valid approaches. However, I would strongly suggest going for 1) for most use cases.

Some might want to store versions in a Yjs document, some in S3, and some in their own database, tagged with additional information.

If I implemented a versions feature, I would probably store them in a Yjs document. But I wouldn't store the encoded document, I would store URIs (pointing to a file hosted in S3, or similar).

For that, I would only need two API endpoints: createVersion(docid, versionName): Promise<URI>, which would call the backend, create a version from the current Yjs doc, and return a URI; and getVersion(uri), which I could use to request that version.

The versionName parameter is just a suggestion; it would be helpful for implementing the following endpoints:

listVersions(docid): Promise<Array<{ uri: string, versionName: string, creationTime: Date, creator: string }>>
deleteVersion(docid, uri)
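A minimal in-memory sketch of these four endpoints might look like the following. The method signatures follow the comment, but the `InMemoryVersionStore` class, the `versions://` URI scheme, the extra `creator` argument (a real server would presumably take it from the authenticated request), and the `readDoc` hook that returns the current encoded Yjs document are all assumptions for illustration; a real implementation would write to S3 or similar rather than a `Map`.

```typescript
type URI = string;

interface VersionInfo {
  uri: URI;
  versionName: string;
  creationTime: Date;
  creator: string;
}

// Hypothetical sketch: a Map stands in for object storage such as S3.
class InMemoryVersionStore {
  private blobs = new Map<URI, Uint8Array>();
  private index = new Map<string, VersionInfo[]>();
  private counter = 0;
  private readDoc: (docId: string) => Uint8Array;

  // `readDoc` is an assumed hook returning the current encoded Yjs doc.
  constructor(readDoc: (docId: string) => Uint8Array) {
    this.readDoc = readDoc;
  }

  async createVersion(docId: string, versionName: string, creator: string): Promise<URI> {
    this.counter += 1;
    const uri = `versions://${docId}/${this.counter}`;
    // Copy the bytes so later edits to the live document can't change the version.
    this.blobs.set(uri, this.readDoc(docId).slice());
    const list = this.index.get(docId) ?? [];
    list.push({ uri, versionName, creationTime: new Date(), creator });
    this.index.set(docId, list);
    return uri;
  }

  async getVersion(uri: URI): Promise<Uint8Array> {
    const blob = this.blobs.get(uri);
    if (blob === undefined) throw new Error(`unknown version: ${uri}`);
    return blob.slice(); // hand out a copy so callers can't mutate stored data
  }

  async listVersions(docId: string): Promise<VersionInfo[]> {
    return (this.index.get(docId) ?? []).slice();
  }

  async deleteVersion(docId: string, uri: URI): Promise<void> {
    const list = this.index.get(docId) ?? [];
    const remaining = list.filter((v) => v.uri !== uri);
    if (remaining.length === list.length) return; // not a version of this doc
    this.index.set(docId, remaining);
    this.blobs.delete(uri);
  }
}
```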

It might make sense to support a meta field when creating a version. Users could use it to store arbitrary information relevant to their application (version description, tags, custom resource identifier, kind of version (autosave, regular snapshot, published version), etc.).
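One possible shape for such a meta field is sketched below, assuming each version records a kind plus optional free-form data. `VersionKind`, `VersionMeta`, `VersionRecord`, and `expiredAutosaves` are hypothetical names; the helper only illustrates the kind of policy the field would enable, such as expiring old autosaves while keeping published versions.

```typescript
// Hypothetical shape for the optional `meta` field on a version.
type VersionKind = "autosave" | "snapshot" | "published";

interface VersionMeta {
  kind: VersionKind;
  description?: string;
  tags?: string[];
  resourceId?: string; // custom resource identifier
  extra?: Record<string, unknown>; // any other application-specific data
}

interface VersionRecord {
  uri: string;
  creationTime: Date;
  meta: VersionMeta;
}

// Example policy: list autosaves older than `maxAgeMs`. Snapshots and
// published versions are never returned, so they are kept indefinitely.
function expiredAutosaves(versions: VersionRecord[], now: Date, maxAgeMs: number): string[] {
  return versions
    .filter((v) => v.meta.kind === "autosave" && now.getTime() - v.creationTime.getTime() > maxAgeMs)
    .map((v) => v.uri);
}
```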

@paulgb
Member Author

paulgb commented Oct 12, 2024

Thanks for the input; this inclines me more toward the first option. I like the idea of a createVersion endpoint that creates (and returns) a version from the current document state, and I could also imagine y-sweet itself creating time-based version snapshots.

The update endpoint (which takes a Yjs update over a POST request) could also optionally create and return a version.
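A sketch of how the update endpoint could optionally return a version, assuming the server has internal `applyUpdate` and `createVersion` operations (both hypothetical, injected here as dependencies) and that the request carries a `createVersion` flag. Creating the version only after the update is applied means the returned version contains that update.

```typescript
// Hypothetical handler for "POST an update, optionally version the result".
interface UpdateDeps {
  applyUpdate: (docId: string, update: Uint8Array) => Promise<void>;
  createVersion: (docId: string, versionName: string) => Promise<string>;
}

interface UpdateOptions {
  createVersion?: boolean;
  versionName?: string;
}

async function handleUpdate(
  docId: string,
  update: Uint8Array,
  opts: UpdateOptions,
  deps: UpdateDeps,
): Promise<{ version?: string }> {
  // Apply first, so a version created below already contains this update.
  await deps.applyUpdate(docId, update);
  if (!opts.createVersion) return {};
  const name = opts.versionName ?? `update-${new Date().toISOString()}`;
  return { version: await deps.createVersion(docId, name) };
}
```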
