Lexicon Versioning #30

verdverm · 2025-01-27T08:25:04Z

Lexicon Versioning

Versioning is a useful mechanic for applications:
for the applications themselves, for their dependencies, and for the payloads they process.
In this regard, Lexicon are the schemas in ATProto,
and applications on the network could benefit from versioning them.

Note, examples are written in CUE for brevity.

Also, I don't know why the markdown rendering here preserves single newlines within a paragraph. This note is better formatted here: https://github.com/blebbit/lexicon/blob/main/notes/versioning.md That copy will also be the most up-to-date place as I work on this. (related: https://github.com/orgs/lexicon-community/discussions/32)

ATProto Today

The ATProto spec has the following to offer us:

Lexicon Files

from (https://atproto.com/specs/lexicon#lexicon-files):

lexicon (integer, required): indicates Lexicon language version. In this version, a fixed value of 1
id (string, required): the NSID of the Lexicon
revision (integer, optional): indicates the version of this Lexicon, if changes have occurred

The semantics of the revision field have not been worked out yet, but are intended to help third parties identify the most recent among multiple versions or copies of a Lexicon.

{
  lexicon:   1
  revision?: int
  id:!       string

  ...
}

Note, in practice, the revision field is not used. I'm not sure why.

Lexicon Evolution

from: (https://atproto.com/specs/lexicon#lexicon-evolution)

Lexicons are allowed to change over time, within some bounds to ensure both forwards and backwards compatibility. The basic principle is that all old data must still be valid under the updated Lexicon, and new data must be valid under the old Lexicon.

Any new fields must be optional
Non-optional fields can not be removed. A best practice is to retain all fields in the Lexicon and mark them as deprecated if they are no longer used.
Types can not change
Fields can not be renamed

If larger breaking changes are necessary, a new Lexicon name must be used.

It can be ambiguous when a Lexicon has been published and becomes "set in stone". At a minimum, public adoption and implementation by a third party, even without explicit permission, indicates that the Lexicon has been released and should not break compatibility. A best practice is to clearly indicate in the Lexicon type name any experimental or development status. Eg, com.corp.experimental.newRecord.
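The evolution rules quoted above are mechanical enough to check in code. The following is a minimal Python sketch, not an official validator: the function name compatibility_problems is hypothetical, it only inspects the top-level required and properties of a single object definition, and it sees a rename as a removal plus an addition, so a renamed optional field would go unnoticed.

```python
def compatibility_problems(old: dict, new: dict) -> list:
    """Compare two versions of one object definition against the evolution rules.

    Hypothetical helper for illustration; returns a sorted list of problems.
    """
    problems = []
    old_props = old.get("properties", {})
    new_props = new.get("properties", {})
    old_required = set(old.get("required", []))
    new_required = set(new.get("required", []))

    # Rule: any new fields must be optional.
    for name in new_props.keys() - old_props.keys():
        if name in new_required:
            problems.append(f"new field {name!r} must be optional")

    # Rule: non-optional fields can not be removed.
    for name in old_required - new_props.keys():
        problems.append(f"required field {name!r} was removed")

    # Rule: types can not change.
    for name in old_props.keys() & new_props.keys():
        if old_props[name].get("type") != new_props[name].get("type"):
            problems.append(f"type of {name!r} changed")

    return sorted(problems)


old = {"type": "object", "required": ["foo"],
       "properties": {"foo": {"type": "string"}}}
compatible = {"type": "object", "required": ["foo"],
              "properties": {"foo": {"type": "string"}, "bar": {"type": "boolean"}}}
breaking = {"type": "object", "required": ["bar"],
            "properties": {"foo": {"type": "integer"}, "bar": {"type": "boolean"}}}

print(compatibility_problems(old, compatible))
print(compatibility_problems(old, breaking))
```

A check like this could run in CI before publishing an updated Lexicon, though a real tool would also need to recurse into nested objects, arrays, and refs.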

Version Identifiers

There are various versioning schemes. Some examples,
in order of increasing flexibility:

ATProto revision, a monotonic int
Name with version suffix, used in Bluesky lexicon today
Kubernetes apiVersion, a vX with an optional {alpha,beta}Y
Semver, a widely used format with notions of sizing and compatibility

We can use or represent versioned lexicon in several ways today.

Monotonic Int (using the ATProto Lexicon.revision)

version 1:

{
  lexicon: 1
  revision: 1
  id: "app.blebbit.example"
  defs: {
    foo: { type: "string" }
  }
}

version 2:

{
  lexicon: 1
  revision: 2
  id: "app.blebbit.example"
  defs: {
    foo: { type: "string" }
    bar: { type: "boolean" }
  }
}

It is unclear to me how one refers to a specific revision of a lexicon today.
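For illustration, this is roughly what "identify the most recent among multiple copies" could look like if revision is treated as a plain monotonic integer. It is a sketch under assumptions: the spec has not defined these semantics, the function name most_recent is hypothetical, and treating a missing revision as 0 is a choice made only for this example.

```python
def most_recent(copies: list) -> dict:
    """Pick the copy of a Lexicon with the highest revision.

    Assumption: a copy without a revision field sorts lowest (treated as 0).
    """
    return max(copies, key=lambda lexicon: lexicon.get("revision", 0))


copies = [
    {"lexicon": 1, "revision": 2, "id": "app.blebbit.example",
     "defs": {"foo": {"type": "string"}, "bar": {"type": "boolean"}}},
    {"lexicon": 1, "revision": 1, "id": "app.blebbit.example",
     "defs": {"foo": {"type": "string"}}},
    {"lexicon": 1, "id": "app.blebbit.example",
     "defs": {"foo": {"type": "string"}}},
]

print(most_recent(copies)["revision"])
```

Note that this only selects among copies a consumer already holds; it does not answer how a ref would name a specific revision.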

Name with Version Suffix

Bluesky has the following pattern in their own Lexicon.

(atproto/lexicons/app/bsky/actor/defs.json)

(ref: "app.bsky.actor.defs#savedFeedsPrefV2")

"savedFeedsPrefV2": {
  "type": "object",
  "required": ["items"],
  "properties": {
    "items": {
      "type": "array",
      "items": {
        "type": "ref",
        "ref": "app.bsky.actor.defs#savedFeed"
      }
    }
  }
},
"savedFeedsPref": {
  "type": "object",
  "required": ["pinned", "saved"],
  "properties": {
    "pinned": {
      "type": "array",
      "items": {
        "type": "string",
        "format": "at-uri"
      }
    },
    "saved": {
      "type": "array",
      "items": {
        "type": "string",
        "format": "at-uri"
      }
    },
    "timelineIndex": {
      "type": "integer"
    }
  }
},

Kubernetes Style

Kubernetes uses version segments such as v1 and v2alpha2 in its apiVersion field.
This can be seen as an extension of what Bluesky has done themselves,
adding a maturity component after the major version.
Such names can already be used as suffixes in the scheme shown above.
Kubernetes also prefixes the version in apiVersion with an API group (for example apps/v1),
which plays a role similar to an NSID,
but I'm going to set that aside for this document because
we have similar information in the lexicon id.

We could also use the version as the defs field name itself
if we want an independent Lexicon per type instead of the shared defs pattern.

{
  lexicon: 1
  id: "app.blebbit.example"
  defs: {
    v1: {
      ...
      foo: { type: "string" }
    }
    v2alpha1: {
      ...
      foo: { type: "string" }
      bar: { type: "boolean" }
    }
  }
}

We can then refer to a specific version using a fragment,
which gives us some separation between name and version.

{
  type: "ref"
  ref: "app.blebbit.example#v2alpha1"
}

Using main could be the equivalent of "latest" (which isn't a version),
since a ref without a fragment refers to the main def.

{
  type: "ref"
  ref: "app.blebbit.example"
}
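As a rough sketch of how a consumer might take such a ref apart, the Python below splits a ref into NSID and def name (falling back to main when there is no fragment) and parses a Kubernetes-style version name. The helper names split_ref and parse_version, the Version tuple, and the exact pattern are assumptions for illustration; the only parts taken from ATProto are the # fragment and the main def.

```python
import re
from typing import NamedTuple, Optional


class Version(NamedTuple):
    major: int
    maturity: str   # "alpha", "beta", or "stable"
    iteration: int  # 0 for stable versions


# Hypothetical pattern for names like v1, v2alpha1, v3beta2.
_VERSION = re.compile(r"v(\d+)(?:(alpha|beta)(\d+))?")


def split_ref(ref: str) -> tuple:
    """Split 'nsid#fragment' into (nsid, def name); no fragment means main."""
    nsid, _, fragment = ref.partition("#")
    return nsid, fragment or "main"


def parse_version(def_name: str) -> Optional[Version]:
    """Parse a Kubernetes-style def name, or return None if it is not one."""
    match = _VERSION.fullmatch(def_name)
    if match is None:
        return None
    major, maturity, iteration = match.groups()
    return Version(int(major), maturity or "stable", int(iteration or 0))


nsid, name = split_ref("app.blebbit.example#v2alpha1")
print(nsid, name, parse_version(name))
print(split_ref("app.blebbit.example"))
```

Because the version is the entire fragment, name and version never have to be separated from one string, which is the separation the fragment approach buys.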

Semver Style

This would work like the previous examples,
but with semver def names or suffixes,
assuming the charset needed is valid in the ATProto spec.

{
  lexicon: 1
  id: "app.blebbit.example"
  defs: {
    // labels containing dots must be quoted in CUE
    "v1.2.4": { ... }
    // or
    "profileV1.2.4": { ... }
  }
}

Discussion

Today, no one is using revisions,
so we are essentially always using the "latest" version of a Lexicon.
If we publish a new version, consuming applications will start using it
and can break because of external changes beyond their control.
We could declare this the expected behavior and contract, but I think we can do better.
Application developers would benefit from having some amount of control
over the versions of the dependencies they use.
Even Bluesky has found versioning useful for their own Lexicon,
as is evident from app.bsky.actor.defs#savedFeedsPrefV2.

The ATProto spec says we should not ship backwards-incompatible changes,
but in practice this is unrealistic.
Indeed, Bluesky has effectively shipped a "breaking change" themselves,
between #savedFeedsPref and #savedFeedsPrefV2.
Doing this is valid and allowed within the Lexicon spec
because you are only "adding new fields".
Is the Bluesky application filling in both fields when a user updates
their preferences today? Are older app views that only understand v1 seeing those updates?

A monotonic int gives us the most basic versioning, applied to the full lexicon,
while a fieldVX suffix gives us versioning within a lexicon, but still on
whole defs, as is done with app.bsky.actor.defs#savedFeedsPrefV2.
When using def-level versioning, omitting the lexicon revision
is probably the correct thing to do, so that you always get the most up-to-date
list of available versions. We are essentially publishing every version forever.
Both options lack the ability to express maturity (alpha|beta) or the size of a change (major.minor.patch).

Kubernetes style is an extension of fieldVX that would give us maturity markers.
Semver is common and widely adopted, and offers the greatest flexibility,
with both maturity and breaking-change semantics (major.minor.patch-<extra>).

Where do we set the version?

We should also consider where the version is specified.
Ideally the version is separate from the record details,
as it is with the revision field on a Lexicon.
The methods we see in use merge the name and version into a single string.
This complicates both the construction and decomposition of a ref,
for example if you want to present a different view of a record depending on its version.
Without a clear delimiter between name and version, the decomposition is even more difficult.
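To make the decomposition problem concrete, here is a small Python sketch that tries to split a def name into a base name and a trailing VN suffix. The function name split_versioned_name and the regular expression are assumptions for illustration, and ipV4 is a hypothetical def name chosen to show the ambiguity, not something taken from a real Lexicon.

```python
import re

# Assumed convention: a trailing capital V followed by digits marks the version.
_SUFFIX = re.compile(r"^(?P<name>.+?)V(?P<version>\d+)$")


def split_versioned_name(def_name: str) -> tuple:
    """Return (base name, version), with version None when no suffix is found."""
    match = _SUFFIX.match(def_name)
    if match is None:
        return def_name, None
    return match.group("name"), int(match.group("version"))


print(split_versioned_name("savedFeedsPrefV2"))  # suffix detected
print(split_versioned_name("savedFeedsPref"))    # no suffix, implicit "v1"
print(split_versioned_name("ipV4"))              # misread: V4 is part of the name
```

The last call shows why a heuristic like this is fragile: without a delimiter or a separate field, a consumer cannot tell a version suffix from a name that merely ends in V and a digit.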

Having richer versioning as a standalone field
would require changing the spec, something I would support.
At this point, I prefer vXbetaY (Kubernetes style).

Another consideration for version location is the depth or scope of versioning.
Are we versioning the full lexicon or the definitions within it?
Should the practice of versioning individual defs, as Bluesky has done, be recommended against?
(with app.bsky.actor.defs#savedFeedsPrefV2 and "v1" intermixed with other defs)
Is the better practice to make them separate lexicon? (using the revision field,
which would be equivalent, at least in terms of information)

Other

@sdboyer also has some interesting ideas and insights about systems of many interacting components,
with lots of versioning of the objects and nested references.
Schemas should be able to evolve, and we should also be able to express
how we move between versions directly in the schema system.
This is pretty advanced stuff and a good vision to keep in mind.
Even without all of this, there are complexities in a system with lots
of records, each having its own version and referring to each other at various versions.
Sam can surely articulate these better than I can.

https://github.com/grafana/thema is the CUE project that implements these ideas.

verdverm · 2025-01-28T06:29:55Z

verdverm
Jan 28, 2025
Author

another place version could appear is in the lexicon NSID, will add this on next update

already added in working copy here: https://github.com/blebbit/lexicon/blob/main/notes/versioning.md

0 replies

ebwinters · 2025-01-28T14:32:01Z

ebwinters
Jan 28, 2025
Collaborator

I am personally in favor of using the revision field since it is in the lexicon spec and the least disruptive to NSID naming. I much prefer looking at x.y.bookmark vs x.y.bookmarkV2 because I have no idea what V2 means. I know what a bookmark is.

If you have an application listening to the firehose for specific events, you could add a filter for revision, and peg your code to some client package that validates records based on the schema for the revision you specify. I think adding a revision to edited lexicons is something we can/should enforce in lexicon-community.

1 reply

verdverm Jan 28, 2025
Author

I agree that having the version in the NSID is not a great idea. I mainly point it out as something that could be done, so we are covering all the possibilities. We should add some words to an eventual proposal about why we think it is a bad idea. For example, it is probably bad practice to listen only to specific versions of a lexicon. The version should instead be used to control handling, rendering, and responses.

Where I think the spec-defined revision falls short:

it cannot express the maturity of a specific version; betas are helpful in practice because they give consumers time to prepare ahead of stable versions
it cannot be used in a ref, so how does one lexicon indicate that it is using a specific version of another lexicon?

I'm leaning mostly towards the defs / fragment option, because it covers both of these points and can be done today.

ngerakines · 2025-01-28T15:17:52Z

ngerakines
Jan 28, 2025
Maintainer

I'm a fan of having a special "reserved" or "deprecated" property in the schema used to indicate that a field cannot be used currently. Used in conjunction with an optional "$rev" field, it checks all of the boxes.

Consider the following schema where in an alternative universe the "post" field was first required and then later changed to "subject".

{
  "lexicon": 1,
  "id": "community.lexicon.bookmarks.bookmark",
  "defs": {
    "main": {
      "type": "record",
      "revision": "2",
      "description": "Record bookmarking a link to come back to later.",
      "key": "tid",
      "record": {
        "type": "object",
        "reserved": [
          "post"
        ],
        "required": [
          "subject",
          "createdAt"
        ],
        "properties": {
          "subject": {
            "type": "string",
            "format": "uri"
          },
          "createdAt": {
            "type": "string",
            "format": "datetime"
          },
          "tags": {
            "type": "array",
            "description": "Tags for content the bookmark may be related to, for example 'news' or 'funny videos'",
            "items": {
              "type": "string"
            }
          }
        }
      }
    }
  }
}

In use:

{
  "$type": "community.lexicon.bookmarks.bookmark",
  "$rev": "2",
  "subject": "at://did:plc:xyz123/app.bsky.feed.post/abc456",
  "createdAt": "2024-09-13T08:00:00.000Z"
}

This approach is minimal, but intentionally so. I think that schema files don't need to convey the entire history, but instead should only show what fields are available, expected, and required. Version control is a solved problem and I think it would be over-prescriptive to stuff a lot of additional context of what was added, removed, or changed in a lexicon and why within the schema.

Having a minimal "$rev" field that defaults to "1" when not present, provides developers all of the "routing" information they need to handle backwards compatibility, if they want to handle it at all. Making that field a string and encouraging the use of semantic versioning can get us a long way in supporting development versions and future/backwards compatibility.
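A minimal sketch of what that "routing" could look like in a consumer, using the alternative-universe bookmark above where revision 1 used post and revision 2 uses subject. The handler names, the HANDLERS table, and the choice to return None for unsupported revisions are all assumptions for illustration, not part of any SDK.

```python
def handle_rev1(record: dict) -> dict:
    # Revision 1 of the hypothetical bookmark stored the link in "post".
    return {"link": record["post"]}


def handle_rev2(record: dict) -> dict:
    # Revision 2 renamed the field to "subject".
    return {"link": record["subject"]}


# Hypothetical routing table keyed by the string value of "$rev".
HANDLERS = {"1": handle_rev1, "2": handle_rev2}


def route(record: dict):
    """Dispatch on "$rev", defaulting to "1" when the field is absent."""
    rev = record.get("$rev", "1")
    handler = HANDLERS.get(rev)
    if handler is None:
        return None  # unsupported revision: discard the record
    return handler(record)


print(route({"$type": "community.lexicon.bookmarks.bookmark",
             "post": "at://did:plc:xyz123/app.bsky.feed.post/abc456"}))
print(route({"$type": "community.lexicon.bookmarks.bookmark", "$rev": "2",
             "subject": "at://did:plc:xyz123/app.bsky.feed.post/abc456"}))
print(route({"$type": "community.lexicon.bookmarks.bookmark", "$rev": "3"}))
```

The NSID stays the same across all three records; only the inspection of "$rev" decides how, or whether, a record is handled.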

3 replies

ngerakines Jan 28, 2025
Maintainer

What I want to avoid is a system or process that results in NSID or type changes. A bookmark type, for example, should be consistent regardless of the version of schema. A lot of things from storage structures in the PDS, to referencing collections, and even service features like Jetstream that allow for filtering by collection and type rely on consistent NSID values.

With an internal field, an SDK or application can discard a type that it doesn't support. It can do so by lightweight inspection of the "$rev" field.

Using $ prefixes for both type and rev also means that some tooling and libraries that normalize and sort object keys will stuff those values in front of the object.

$ jo '$type'='community.lexicon.bookmarks.bookmark' subject='foo' '$rev'="1" | jq -S .
{
  "$rev": 1,
  "$type": "community.lexicon.bookmarks.bookmark",
  "subject": "foo"
}
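The same ordering shows up outside of jq. As a small sketch, Python's standard json module with sort_keys=True also places the $-prefixed keys first, because "$" sorts before ASCII letters and digits. The record contents here are illustrative only.

```python
import json

record = {
    "subject": "foo",
    "$type": "community.lexicon.bookmarks.bookmark",
    "$rev": "1",
}

# sort_keys orders keys by code point, so "$rev" and "$type" come first.
encoded = json.dumps(record, sort_keys=True, indent=2)
print(encoded)
```

Unlike the jo example, "$rev" stays a string here because no type inference is applied to the value.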

verdverm Jan 28, 2025
Author

"deprecated" is a good call; we want some way to express this. I believe the CUE team is looking towards attestations in their module / dependency implementation, largely because you can go and mark something that has already been published, rather than forcing clients to find the latest version just to learn whether the version they are using is still valid. There is a debate to be had on whether this should be applied only at the lexicon level or also to specific fields. The spec as it is today shuns backwards-incompatible changes; however, this is unrealistic in practice imho.

verdverm Jan 28, 2025
Author

If not set, should $rev default to 1 (the oldest / "first") or to "latest"? I would imagine latest would be more sustainable in the long run; otherwise we are always going to have to support records in the first schema, even if they are newly created. If we make the defs/fragment versioning scheme the best practice, do we sidestep this issue? The main def could be the equivalent of latest, though it would in theory evolve, and that evolution could become a breaking change for the given lexicon.

verdverm · 2025-01-29T06:47:00Z

verdverm
Jan 29, 2025
Author

It occurs to me that people will likely want to say "these lexicon all go together at this version", which means we need to associate different lexicon files / records with each other at specific versions, as a group.

At some point it begins to look exactly like modules & dependencies from most language ecosystems.

0 replies
