Interface version canonicalization #536

Draft
wants to merge 1 commit into
base: main

lann
Contributor

@lann lann commented Jun 25, 2025

See #534

  • I stuck fullversion in the import/export productions rather than interfacename because I wanted it to be clear that it wouldn't be lowered into the core name.
  • The version canonicalization rules are adapted from #378 (Add BuildTargets.md). I'm still leaning toward omitting prerelease versions but I've only thought "medium hard" about it.
  • Still needs binary encoding; see comment below.
  • Not sure how best to capture the discussion about making canonicalization mandatory pre-1.0; the "Binary Warts" section doesn't seem quite right.

@lann lann force-pushed the truncated-versions branch 2 times, most recently from 79c15f7 to 7b6bd7d Compare June 25, 2025 19:54
@lann lann force-pushed the truncated-versions branch from 7b6bd7d to 2f8eda8 Compare June 25, 2025 20:46
@lann lann changed the title WIP: Truncated interface versions Interface version canonicalization Jun 25, 2025
@lann
Contributor Author

lann commented Jun 25, 2025

For the binary encoding the most straightforward option from a quick review would seem to be adding variants of importname' / exportname' along the lines of:

importname' ::= 0x00 len:<u32> in:<importname>                       => in  (if len = |in|)
              | 0x01 len:<u32> in:<importname> fullverlen:<u16> fullver:<valid semver>
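
As a rough illustration of the two importname' variants above, here is a minimal decoder sketch in Python. It assumes that both <u32> and <u16> are unsigned LEB128 integers and that strings are UTF-8; the helper names (read_uleb, read_string, decode_importname) are hypothetical and not taken from any existing implementation. It does not check that fullver is a valid semver.

```python
def read_uleb(data: bytes, pos: int, bits: int) -> tuple[int, int]:
    """Read an unsigned LEB128 integer of at most `bits` bits.

    Returns (value, next_pos). Assumption: <u16> uses the same LEB128
    scheme as <u32>, only with a narrower range.
    """
    result = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("unexpected end of input")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if (byte & 0x80) == 0:
            break
        if shift >= bits:
            raise ValueError("LEB128 value too long")
    if result >= (1 << bits):
        raise ValueError(f"value does not fit in u{bits}")
    return result, pos


def read_string(data: bytes, pos: int, bits: int) -> tuple[str, int]:
    """Read a length-prefixed UTF-8 string; the length is a `bits`-bit LEB128."""
    length, pos = read_uleb(data, pos, bits)
    end = pos + length
    if end > len(data):
        raise ValueError("string runs past end of input")
    return data[pos:end].decode("utf-8"), end


def decode_importname(data: bytes, pos: int = 0):
    """Return (importname, fullversion or None, next_pos)."""
    if pos >= len(data):
        raise ValueError("unexpected end of input")
    tag = data[pos]
    name, pos = read_string(data, pos + 1, 32)
    if tag == 0x00:
        return name, None, pos
    if tag == 0x01:
        # Hypothetical layout: fullverlen:<u16> followed by the version bytes.
        fullver, pos = read_string(data, pos, 16)
        return name, fullver, pos
    raise ValueError(f"unknown importname discriminant: {tag:#04x}")
```

The 16-bit case is the only place where read_uleb is called with a width other than 32, which is the extra decoder path a narrower length field would require.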

I suppose if we wanted to optimize the binary a bit this extra field could contain just the part of the original version that got lopped off by canonicalization.

On this field width:

fullverlen:<u16>

https://semver.org/#does-semver-have-a-size-limit-on-the-version-string

No, but use good judgment. A 255 character version string is probably overkill, for example. Also, specific systems may impose their own limits on the size of the string.

🤷

@lukewagner
Member

@lann Thanks for starting this! For the binary encoding question: yes, taking over the 0x00 byte and using it as a discriminant is a nice coincidence we can take advantage of (and could you update the corresponding bullet in the "Warts" section at the end)?

I suppose if we wanted to optimize the binary a bit this extra field could contain just the part of the original version that got lopped off by canonicalization.

Is there a simplicity argument here? Requiring that the concatenation of the version and the fullversion match <valid semver> may be simpler than allowing the fullversion to be any <valid semver> and then adding the validation requirement (which I assume we want) that the fullversion "match" the version. If so, that would be a second argument in favor, in addition to size.
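
To make the two validation shapes concrete, here is a small Python sketch. Everything in it is an assumption for illustration: the SemVer pattern is simplified (it does not reject leading zeros in numeric prerelease identifiers), "match" is read as "the full version starts with the shortened version followed by a dot", and the function names are hypothetical.

```python
import re

# Simplified SemVer 2.0.0 pattern (assumption: good enough for a sketch).
SEMVER = re.compile(
    r"(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)"
    r"(?:-[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*)?"
    r"(?:\+[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*)?"
)


def is_valid_semver(text: str) -> bool:
    return SEMVER.fullmatch(text) is not None


def check_by_concatenation(version: str, remainder: str) -> bool:
    """Option A: store only the lopped-off remainder, e.g. "0.2" + ".10".

    The leading-dot requirement is an assumption of this sketch; without it,
    "0.2" + "0.1" would concatenate to the valid but unrelated "0.20.1".
    """
    return remainder.startswith(".") and is_valid_semver(version + remainder)


def check_by_match(version: str, fullversion: str) -> bool:
    """Option B: store the whole version and separately check that it matches."""
    return is_valid_semver(fullversion) and fullversion.startswith(version + ".")
```

Neither function checks that the shortened version is the canonical one; they only compare how much validation each storage choice needs.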

Member

@lukewagner lukewagner left a comment

Looking good! A few drive-by comments:

-export ::= (export <id>? "<exportname>" <sortidx> <externdesc>?)
+import ::= (import "<importname>" <fullversion>? bind-id(<externdesc>))
+export ::= (export <id>? "<exportname>" <fullversion>? <sortidx> <externdesc>?)
+fullversion ::= (fullversion <valid semver>)
Member

Suggested change
-fullversion ::= (fullversion <valid semver>)
+fullversion ::= (fullversion "<valid semver>")

@@ -294,7 +294,7 @@ sort ::= core <core:sort>
| type
| component
| instance
-inlineexport ::= (export <exportname> <sortidx>)
+inlineexport ::= (export <exportname> <fullversion>? <sortidx>)
Member

Suggested change
-inlineexport ::= (export <exportname> <fullversion>? <sortidx>)
+inlineexport ::= (export "<exportname>" <fullversion>? <sortidx>)

(pre-existing, but since we're touching this line)

Comment on lines +577 to +578
importdecl ::= (import <importname> <fullversion>? bind-id(<externdesc>))
exportdecl ::= (export <exportname> <fullversion>? bind-id(<externdesc>))
Member

Suggested change
-importdecl ::= (import <importname> <fullversion>? bind-id(<externdesc>))
-exportdecl ::= (export <exportname> <fullversion>? bind-id(<externdesc>))
+importdecl ::= (import "<importname>" <fullversion>? bind-id(<externdesc>))
+exportdecl ::= (export "<exportname>" <fullversion>? bind-id(<externdesc>))

Member

(pre-existing)

@@ -2379,6 +2383,33 @@ interpreted with the same [semantics][SemVerRange]. (Mostly this
interpretation is the usual SemVer-spec-defined ordering, but note the
particular behavior of pre-release tags.)

The `version` production used in `interfacename`s accepts both `valid semver`
Member

Could you frame "canonicalized interface version" as a validation rule factored out into a new "Canonical Interface Name" section alongside and symmetric to "Name Uniqueness", and say that it is temporarily not enforced but will start issuing warnings and be enforced post-Preview-3?

@@ -2283,10 +2284,13 @@ words ::= <word>
| <words> '-' <word>
projection ::= '/' <label>
version ::= '@' <valid semver>
Member

Probably good to rename this to interfaceversion, since it's specific to interfacename (and to be symmetric with pkgversion). (I know that'll mess up the column alignment, and fixing it bloats the diff and obscures the change; maybe leave it unaligned and fix it right before merging.)

Member

@lukewagner lukewagner left a comment

(oops, meant to "comment" not approve before it's even ready to review 🙃 )

@alexcrichton
Collaborator

For the binary encoding, here's another possible encoding:

importname' ::= 0x00 len:<u32> in:<importname>                       => in  (if len = |in|)
              | 0x01 len:<u32> in:<importname>                       => "${in.name}@N"  (if len = |in|,  in.version = N.*)
              | 0x02 len:<u32> in:<importname>                       => "${in.name}@0.N"  (if len = |in|,  in.version = 0.N.*)
              | 0x03 len:<u32> in:<importname>                       => "${in.name}@0.0.N"  (if len = |in|,  in.version = 0.0.N.*)

Maybe with affordances for rc/etc.; I'm unsure. The basic idea, though, is that the actual import name would always be foo:bar/baz@0.1.2 in the binary format, but the semantic meaning (e.g. the text format) would be a subslice of that string. This codifies that the binary format always holds a valid semver, and the discriminant byte says how to shorten it. The goal is to keep the binary format clear about what it can contain without changing the meaning at the parsed layer.
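
A minimal Python sketch of the shortening described by the four variants above, assuming the stored name always carries a full major.minor.patch version. The function name canonicalize is hypothetical. Two choices here are assumptions of the sketch rather than anything settled in this thread: a nonzero major is tested first (so 0.N.* never falls under the N.* case), and names that are unversioned or carry prerelease/build metadata are left unchanged under discriminant 0x00.

```python
import re

# major.minor.patch only; prerelease and build metadata are handled separately.
_CORE = re.compile(r"(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)")


def canonicalize(name: str) -> tuple[int, str]:
    """Return (discriminant, shortened name) for a name like "foo:bar/baz@0.1.2"."""
    base, at, version = name.rpartition("@")
    match = _CORE.fullmatch(version) if at else None
    if match is None:
        # Assumption: unversioned names and prerelease/build versions are
        # kept verbatim (the 0x00 variant).
        return 0x00, name
    major, minor, patch = match.groups()
    if major != "0":
        return 0x01, f"{base}@{major}"        # N.*     => @N
    if minor != "0":
        return 0x02, f"{base}@0.{minor}"      # 0.N.*   => @0.N
    return 0x03, f"{base}@0.0.{patch}"        # 0.0.N   => @0.0.N
```

For example, canonicalize("foo:bar/baz@0.1.2") yields the discriminant 0x02 together with "foo:bar/baz@0.1", which is the subslice of the stored string that the text format would show.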

fullverlen:<u16>

https://semver.org/#does-semver-have-a-size-limit-on-the-version-string

No, but use good judgment. A 255 character version string is probably overkill, for example. Also, specific systems may impose their own limits on the size of the string.

For this I'd recommend using <u32> regardless. We already limit many strings far below the theoretical 4G limit with a 32-bit length and keeping <u32> makes it more consistent with the rest of the decoding process. Otherwise when implementing a decoder you'd have to implement a specific function for decoding a 16-bit LEB which is otherwise not required when parsing WebAssembly today. Basically while I agree that >255 characters for a version is silly, I'd say that for consistency with the rest of the binary format this'd want to be <u32> if we go with this variant.

3 participants