
Commit 087c613

authored Mar 20, 2025
[Strings] Make string a subtype of ext, not any (#7373)
StringLowering converts strings to externrefs, which makes sense as we lower stringrefs to imported JS strings. For the reverse transform (see #7370), it is convenient for strings to simply be subtypes of ext: that makes it easy to switch stringref to externref and vice versa. This also adds support for internalizing externref strings, which we represent as anyref literals (basically a hidden subtype of anyref).
1 parent 2639074 commit 087c613

18 files changed, with 186 lines added and 120 lines removed.

CHANGELOG.md

+3
@@ -18,6 +18,9 @@ Current Trunk
 - Add an option to preserve imports and exports in the fuzzer (for fuzzer
 harnesses where they only want Binaryen to modify their given testcases, not
 generate new things in them).
+- `string` is now a subtype of `ext` (rather than `any`). This allows better
+transformations for strings, like an inverse of StringLowering, but will
+error on codebases that depend on being able to pass strings into anyrefs.
 
 v122
 ----

README.md

+9
@@ -105,6 +105,13 @@ There are a few differences between Binaryen IR and the WebAssembly language:
 Binaryen IR is more structured than WebAssembly as noted earlier). Note
 that Binaryen does support unreachable code in .wat text files, since as we
 saw Binaryen only supports s-expressions there, which are structured.
+* Binaryen supports a `stringref` type. This is similar to the currently-
+frozen [stringref proposal], with the difference that the string type is a
+subtype of `externref` rather than `anyref`. Doing so allows toolchains to
+emit code in a form that uses [js string builtins] which Binaryen can then
+"lift" into stringref in its internal IR, optimize (for example, a
+concatenation of "a" and "b" can be optimized at compile time to "ab"), and
+then "lower" that into js string builtins once more.
 * Blocks
 * Binaryen IR has only one node that contains a variable-length list of
 operands: the block. WebAssembly on the other hand allows lists in loops,
@@ -1039,3 +1046,5 @@ Windows and OS X as most of the core devs are on Linux.
 [minification]: https://kripken.github.io/talks/binaryen.html#/2
 [unreachable]: https://github.com/WebAssembly/binaryen/issues/903
 [binaryen_ir]: https://github.com/WebAssembly/binaryen/issues/663
+[stringref proposal]: https://github.com/WebAssembly/stringref/blob/main/proposals/stringref/Overview.md
+[js string builtins]: https://github.com/WebAssembly/js-string-builtins/blob/main/proposals/js-string-builtins/Overview.md
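
The README's example of folding a concatenation of "a" and "b" into "ab" at compile time can be illustrated with a minimal sketch. It assumes a hypothetical three-node expression type (`Const`, `Param`, `Concat`) and a hypothetical `fold` helper; it is not Binaryen's IR or its actual optimization passes, only the idea of evaluating concatenations whose operands are both known constants.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <variant>

// Hypothetical miniature IR: a string constant, an opaque runtime value
// (e.g. a function parameter), or a concatenation of two sub-expressions.
struct Expr;
using ExprPtr = std::shared_ptr<Expr>;
struct Const { std::string value; };
struct Param { int index; };
struct Concat { ExprPtr left, right; };
struct Expr { std::variant<Const, Param, Concat> node; };

ExprPtr makeConst(std::string s) {
  return std::make_shared<Expr>(Expr{Const{std::move(s)}});
}
ExprPtr makeParam(int i) { return std::make_shared<Expr>(Expr{Param{i}}); }
ExprPtr makeConcat(ExprPtr l, ExprPtr r) {
  return std::make_shared<Expr>(Expr{Concat{std::move(l), std::move(r)}});
}

// Fold bottom-up: a concat whose operands both fold to constants becomes one
// constant, e.g. concat("a", "b") -> "ab". Otherwise the concat is rebuilt
// from its folded operands, and non-concat nodes are returned unchanged.
ExprPtr fold(const ExprPtr& e) {
  auto* concat = std::get_if<Concat>(&e->node);
  if (!concat) return e;
  ExprPtr l = fold(concat->left);
  ExprPtr r = fold(concat->right);
  auto* lc = std::get_if<Const>(&l->node);
  auto* rc = std::get_if<Const>(&r->node);
  if (lc && rc) return makeConst(lc->value + rc->value);
  return makeConcat(l, r);
}
```

Folding is bottom-up, so nested concatenations of constants collapse completely, while a concatenation with a non-constant operand (here `Param`) is kept for runtime.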

scripts/bundle_clusterfuzz.py

+1
@@ -107,6 +107,7 @@
 '--disable-shared-everything',
 '--disable-fp16',
 '--disable-custom-descriptors',
+'--disable-strings',
 ]
 
 with tarfile.open(output_file, "w:gz") as tar:

scripts/clusterfuzz/run.py

+1
@@ -88,6 +88,7 @@
 '--disable-shared-everything',
 '--disable-fp16',
 '--disable-custom-descriptors',
+'--disable-strings',
 ]
 
 
scripts/fuzz_opt.py

+8 -5
@@ -151,10 +151,12 @@ def randomize_feature_opts():
 
 # The shared-everything feature is new and we want to fuzz it, but it
 # also currently disables fuzzing V8, so disable it most of the time.
-# Same with custom descriptors.
+# Same with custom descriptors and strings - all these cannot be run in
+# V8 for now.
 if random.random() < 0.9:
 FEATURE_OPTS.append('--disable-shared-everything')
 FEATURE_OPTS.append('--disable-custom-descriptors')
+FEATURE_OPTS.append('--disable-strings')
 
 print('randomized feature opts:', '\n ' + '\n '.join(FEATURE_OPTS))
 
@@ -815,8 +817,9 @@ def run(self, wasm, extra_d8_flags=[]):
 def can_run(self, wasm):
 # V8 does not support shared memories when running with
 # shared-everything enabled, so do not fuzz shared-everything
-# for now. It also does not yet support custom descriptors.
-return all_disallowed(['shared-everything', 'custom-descriptors'])
+# for now. It also does not yet support custom descriptors, nor
+# strings.
+return all_disallowed(['shared-everything', 'custom-descriptors', 'strings'])
 
 def can_compare_to_self(self):
 # With nans, VM differences can confuse us, so only very simple VMs
@@ -1573,7 +1576,7 @@ def can_run_on_wasm(self, wasm):
 return False
 
 # see D8.can_run
-return all_disallowed(['shared-everything', 'custom-descriptors'])
+return all_disallowed(['shared-everything', 'custom-descriptors', 'strings'])
 
 
 # Check that the text format round-trips without error.
@@ -1758,7 +1761,7 @@ def can_run_on_wasm(self, wasm):
 return False
 if NANS:
 return False
-return all_disallowed(['shared-everything', 'custom-descriptors'])
+return all_disallowed(['shared-everything', 'custom-descriptors', 'strings'])
 
 
 # Test --fuzz-preserve-imports-exports, which never modifies imports or exports.

src/literal.h

+3
@@ -57,6 +57,9 @@ class Literal {
 // Externalized i31 references have a gcData containing the internal i31
 // reference as its sole value even though internal i31 references do not
 // have a gcData.
+//
+// Note that strings can be internalized, in which case they keep the same
+// gcData, but their type becomes anyref.
 std::shared_ptr<GCData> gcData;
 // A reference to Exn data.
 std::shared_ptr<ExnData> exnData;
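
The comment added to `Literal` says an internalized string keeps the same `gcData` and only its type changes to anyref. The sketch below models that idea with hypothetical `Lit`, `GCPayload`, `internalize`, and `externalize` names (not Binaryen's `Literal` class or its methods): both conversions copy the `std::shared_ptr`, so the payload is shared rather than duplicated.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical, heavily reduced model of a GC reference literal; not
// Binaryen's Literal. The payload stands in for GCData.
enum class LitType { String, Any };

struct GCPayload {
  std::string chars;
};

struct Lit {
  LitType type;
  std::shared_ptr<GCPayload> data;
};

// string -> anyref literal: same payload pointer, only the type changes.
Lit internalize(const Lit& l) {
  assert(l.type == LitType::String && l.data);
  return {LitType::Any, l.data};
}

// anyref literal -> string: valid only because every anyref literal in this
// model is an internalized string.
Lit externalize(const Lit& l) {
  assert(l.type == LitType::Any && l.data);
  return {LitType::String, l.data};
}
```

Because the payload is shared, a string -> anyref -> string round trip ends up pointing at the very same payload object, so no string data is copied along the way.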

src/tools/execution-results.h

+10 -2
@@ -303,11 +303,19 @@ struct ExecutionResults {
 }
 
 void printValue(Literal value) {
-// Unwrap an externalized value to get the actual value.
-if (Type::isSubType(value.type, Type(HeapType::ext, Nullable))) {
+// Unwrap an externalized GC value to get the actual value, but not strings,
+// which are normally a subtype of ext.
+if (Type::isSubType(value.type, Type(HeapType::ext, Nullable)) &&
+!value.type.isString()) {
 value = value.internalize();
 }
 
+// An anyref literal is a string.
+if (value.type.isRef() &&
+value.type.getHeapType().isMaybeShared(HeapType::any)) {
+value = value.externalize();
+}
+
 // Don't print most reference values, as e.g. funcref(N) contains an index,
 // which is not guaranteed to remain identical after optimizations. Do not
 // print the type in detail (as even that may change due to closed-world
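
The reworked `printValue` normalizes a value in two ordered steps before printing it. The following is a simplified sketch of that ordering, not the real code: `Ty`, `Value`, `internalize`, `externalize`, and `normalizeForPrint` are hypothetical stand-ins, and only types are tracked (payloads are ignored).

```cpp
#include <cassert>

// Hypothetical, reduced set of reference types; not Binaryen's Type/HeapType.
enum class Ty { Ext, String, Any, Struct, I31 };

// Toy value: its static type plus what the data really is. For an
// externalized reference (Ty::Ext), `inner` is the wrapped internal type; for
// an anyref literal it is Ty::String; otherwise it equals `type`.
struct Value {
  Ty type;
  Ty inner;
};

// After this commit, string is a subtype of ext alongside ext itself.
bool isSubTypeOfExt(Ty t) { return t == Ty::Ext || t == Ty::String; }

// Strings become anyref literals; other wrapped data gets its own type back.
Value internalize(Value v) {
  if (v.inner == Ty::String) return {Ty::Any, Ty::String};
  return {v.inner, v.inner};
}

// Only the anyref-literal case is modeled: it is an internalized string.
Value externalize(Value v) {
  assert(v.type == Ty::Any && v.inner == Ty::String);
  return {Ty::String, Ty::String};
}

// Same order as the diff: unwrap non-string ext values, then turn an anyref
// literal back into a string so it prints as one.
Value normalizeForPrint(Value v) {
  if (isSubTypeOfExt(v.type) && v.type != Ty::String) v = internalize(v);
  if (v.type == Ty::Any) v = externalize(v);
  return v;
}
```

In this model the order matters for an externalized reference that wraps a string: the first step turns it into an anyref literal and the second turns that back into a string, so every string prints the same way.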

src/tools/fuzzing/fuzzing.cpp

+14 -13
@@ -3399,6 +3399,10 @@ Expression* TranslateToFuzzReader::makeBasicRef(Type type) {
 auto share = heapType.getShared();
 switch (heapType.getBasic(Unshared)) {
 case HeapType::ext: {
+if (wasm.features.hasStrings() && share == Unshared && oneIn(2)) {
+// Shared strings not yet supported.
+return makeConst(Type(HeapType::string, NonNullable));
+}
 auto null = builder.makeRefNull(HeapTypes::ext.getBasic(share));
 // TODO: support actual non-nullable externrefs via imported globals or
 // similar.
@@ -3429,10 +3433,6 @@ Expression* TranslateToFuzzReader::makeBasicRef(Type type) {
 HeapType::i31,
 HeapType::struct_,
 HeapType::array);
-if (share == Unshared) {
-// Shared strings not yet supported.
-subtypeOpts.add(FeatureSet::Strings, HeapType::string);
-}
 auto subtype = pick(subtypeOpts).getBasic(share);
 return makeConst(Type(subtype, nullability));
 }
@@ -5376,11 +5376,16 @@ HeapType TranslateToFuzzReader::getSubType(HeapType type) {
 .getBasic(share);
 case HeapType::cont:
 return pick(HeapTypes::cont, HeapTypes::nocont).getBasic(share);
-case HeapType::ext:
-return pick(FeatureOptions<HeapType>()
-.add(FeatureSet::ReferenceTypes, HeapType::ext)
-.add(FeatureSet::GC, HeapType::noext))
-.getBasic(share);
+case HeapType::ext: {
+auto options = FeatureOptions<HeapType>()
+.add(FeatureSet::ReferenceTypes, HeapType::ext)
+.add(FeatureSet::GC, HeapType::noext);
+if (share == Unshared) {
+// Shared strings not yet supported.
+options.add(FeatureSet::Strings, HeapType::string);
+}
+return pick(options).getBasic(share);
+}
 case HeapType::any: {
 assert(wasm.features.hasReferenceTypes());
 assert(wasm.features.hasGC());
@@ -5391,10 +5396,6 @@ HeapType TranslateToFuzzReader::getSubType(HeapType type) {
 HeapType::struct_,
 HeapType::array,
 HeapType::none);
-if (share == Unshared) {
-// Shared strings not yet supported.
-options.add(FeatureSet::Strings, HeapType::string);
-}
 return pick(options).getBasic(share);
 }
 case HeapType::eq:
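
With this change, the fuzzer's `getSubType` offers `string` as a subtype of `ext` (when the strings feature is enabled and the type is unshared) instead of as a subtype of `any`. The sketch below reproduces that selection with plain `std::vector` and `<random>`. `HT`, `Features`, `extSubtypeOptions`, and `pickExtSubtype` are hypothetical, and it assumes that `FeatureOptions::add` only makes an option eligible when its feature is enabled.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-ins for the three heap types involved.
enum class HT { Ext, NoExt, String };

// Hypothetical feature flags relevant to subtypes of ext.
struct Features {
  bool referenceTypes;
  bool gc;
  bool strings;
};

// Candidate subtypes of ext, mirroring the new `case HeapType::ext` in
// getSubType: ext needs reference types, noext needs GC, and string needs the
// strings feature plus an unshared type (shared strings are not supported).
std::vector<HT> extSubtypeOptions(const Features& f, bool shared) {
  std::vector<HT> options;
  if (f.referenceTypes) options.push_back(HT::Ext);
  if (f.gc) options.push_back(HT::NoExt);
  if (f.strings && !shared) options.push_back(HT::String);
  return options;
}

// Pick one candidate uniformly at random.
HT pickExtSubtype(const Features& f, bool shared, std::mt19937& rng) {
  std::vector<HT> options = extSubtypeOptions(f, shared);
  assert(!options.empty());
  std::uniform_int_distribution<std::size_t> dist(0, options.size() - 1);
  return options[dist(rng)];
}
```

Building the candidate list first and then choosing uniformly keeps the shared-type restriction in one place, much as the diff moves the `share == Unshared` check from the `any` case into the `ext` case.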

src/tools/fuzzing/heap-types.cpp

+1 -1
@@ -510,7 +510,7 @@ struct HeapTypeGeneratorImpl {
 candidates.push_back(HeapTypes::any.getBasic(share));
 break;
 case HeapType::string:
-candidates.push_back(HeapTypes::any.getBasic(share));
+candidates.push_back(HeapTypes::ext.getBasic(share));
 break;
 case HeapType::none:
 return pickSubAny(share);

src/wasm/literal.cpp

+23 -4
@@ -76,10 +76,13 @@ Literal::Literal(std::shared_ptr<GCData> gcData, HeapType type)
 type(type, gcData ? NonNullable : Nullable, gcData ? Inexact : Exact) {
 // TODO: Use exact types for more than just nulls.
 // The type must be a proper type for GC data: either a struct, array, or
-// string; or an externalized version of the same; or a null.
+// string; or an externalized version of the same; or a null; or an
+// internalized string (which appears as an anyref).
 assert((isData() && gcData) ||
 (type.isMaybeShared(HeapType::ext) && gcData) ||
-(type.isBottom() && !gcData));
+(type.isBottom() && !gcData) ||
+(type.isMaybeShared(HeapType::any) && gcData &&
+gcData->type.isMaybeShared(HeapType::string)));
 }
 
 Literal::Literal(std::shared_ptr<ExnData> exnData)
@@ -153,6 +156,11 @@ Literal::Literal(const Literal& other) : type(other.type) {
 case HeapType::nocont:
 WASM_UNREACHABLE("null literals should already have been handled");
 case HeapType::any:
+// This must be an anyref literal, which is an internalized string.
+assert(other.gcData &&
+other.gcData->type.isMaybeShared(HeapType::string));
+new (&gcData) std::shared_ptr<GCData>(other.gcData);
+return;
 case HeapType::eq:
 case HeapType::func:
 case HeapType::cont:
@@ -169,7 +177,8 @@ Literal::~Literal() {
 if (type.isBasic()) {
 return;
 }
-if (isNull() || isData() || type.getHeapType().isMaybeShared(HeapType::ext)) {
+if (isNull() || isData() || type.getHeapType().isMaybeShared(HeapType::ext) ||
+type.getHeapType().isMaybeShared(HeapType::any)) {
 gcData.~shared_ptr();
 } else if (isExn()) {
 exnData.~shared_ptr();
@@ -652,13 +661,14 @@ std::ostream& operator<<(std::ostream& o, Literal literal) {
 case HeapType::exn:
 o << "exnref";
 break;
-case HeapType::any:
 case HeapType::eq:
 case HeapType::func:
 case HeapType::cont:
 case HeapType::struct_:
 case HeapType::array:
 WASM_UNREACHABLE("invalid type");
+case HeapType::any:
+// Anyref literals contain strings.
 case HeapType::string: {
 auto data = literal.getGCData();
 if (!data) {
@@ -2868,6 +2878,11 @@ Literal Literal::externalize() const {
 return Literal(std::make_shared<GCData>(heapType, Literals{*this}),
 extType);
 }
+if (heapType.isMaybeShared(HeapType::any)) {
+// Anyref literals turn into strings (if we add any other anyref literals,
+// we will need to be more careful here).
+return Literal(gcData, HeapTypes::string.getBasic(share));
+}
 return Literal(gcData, extType);
 }
 
@@ -2883,6 +2898,10 @@ Literal Literal::internalize() const {
 assert(gcData->values[0].type.getHeapType().isMaybeShared(HeapType::i31));
 return gcData->values[0];
 }
+if (gcData->type.isMaybeShared(HeapType::string)) {
+// Strings turn into anyref literals.
+return Literal(gcData, HeapTypes::any.getBasic(share));
+}
 return Literal(gcData, gcData->type);
 }
 
src/wasm/wasm-type.cpp

+17 -5
@@ -407,6 +407,11 @@ std::optional<HeapType> getBasicHeapTypeLUB(HeapType::BasicHeapType a,
 HeapType lubUnshared;
 switch (HeapType(a).getBasic(Unshared)) {
 case HeapType::ext:
+if (bUnshared != HeapType::string) {
+return std::nullopt;
+}
+lubUnshared = HeapType::ext;
+break;
 case HeapType::func:
 case HeapType::cont:
 case HeapType::exn:
@@ -437,9 +442,14 @@ std::optional<HeapType> getBasicHeapTypeLUB(HeapType::BasicHeapType a,
 }
 break;
 case HeapType::array:
-case HeapType::string:
 lubUnshared = HeapType::any;
 break;
+case HeapType::string:
+// String has already been handled: we sorted before in a way that ensures
+// the type the string is compared to is of a higher index, which means it
+// is a bottom type (string is the last type that is not a bottom type),
+// but we have handled the case of either a or b being a bottom type
+// earlier already.
 case HeapType::none:
 case HeapType::noext:
 case HeapType::nofunc:
@@ -953,8 +963,9 @@ std::optional<HeapType> HeapType::getSuperType() const {
 case none:
 case exn:
 case noexn:
-case string:
 return {};
+case string:
+return HeapType(ext).getBasic(share);
 case eq:
 return HeapType(any).getBasic(share);
 case i31:
@@ -1021,12 +1032,12 @@ size_t HeapType::getDepth() const {
 case HeapType::exn:
 break;
 case HeapType::eq:
+case HeapType::string:
 depth++;
 break;
 case HeapType::i31:
 case HeapType::struct_:
 case HeapType::array:
-case HeapType::string:
 depth += 2;
 break;
 case HeapType::none:
@@ -1070,9 +1081,9 @@ HeapType::BasicHeapType HeapType::getUnsharedBottom() const {
 case i31:
 case struct_:
 case array:
-case string:
 case none:
 return none;
+case string:
 case noext:
 return noext;
 case nofunc:
@@ -1530,8 +1541,9 @@ bool SubTyper::isSubType(HeapType a, HeapType b) {
 aUnshared == HeapType::struct_ || aUnshared == HeapType::array ||
 a.isStruct() || a.isArray();
 case HeapType::i31:
-case HeapType::string:
 return aUnshared == HeapType::none;
+case HeapType::string:
+return aUnshared == HeapType::noext;
 case HeapType::struct_:
 return aUnshared == HeapType::none || a.isStruct();
 case HeapType::array:
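
The wasm-type.cpp hunks move `string` into the ext hierarchy: `getSuperType` now returns `ext`, `getDepth` counts one level for it (like `eq`) rather than two, `getUnsharedBottom` maps it to `noext`, its LUB with `ext` is `ext`, and `SubTyper::isSubType` accepts only `noext` below it. The sketch below checks the same properties on a hypothetical `H` enum (not Binaryen's `HeapType`). It computes LUBs with a generic walk up the supertype chain rather than the sorted `switch` that `getBasicHeapTypeLUB` uses, and it treats types from different hierarchies as having no LUB.

```cpp
#include <cassert>
#include <optional>

// Hypothetical subset of unshared basic heap types; not Binaryen's HeapType.
enum class H { Any, Eq, Struct, Array, Ext, String, None, NoExt };

// Immediate supertype after this commit: string now sits under ext.
std::optional<H> superOf(H t) {
  switch (t) {
    case H::Eq:
      return H::Any;
    case H::Struct:
    case H::Array:
      return H::Eq;
    case H::String:
      return H::Ext;
    default:
      return std::nullopt;  // tops (any, ext) and bottoms (none, noext)
  }
}

bool isBottom(H t) { return t == H::None || t == H::NoExt; }

// Number of supertype steps from t up to its top (meaningful for non-bottoms).
int depth(H t) {
  int d = 0;
  while (auto s = superOf(t)) {
    t = *s;
    ++d;
  }
  return d;
}

// Bottom type of the hierarchy that t belongs to.
H bottomOf(H t) {
  if (isBottom(t)) return t;
  while (auto s = superOf(t)) t = *s;
  return t == H::Ext ? H::NoExt : H::None;
}

// Least upper bound, or nullopt when a and b are in different hierarchies.
std::optional<H> lub(H a, H b) {
  if (bottomOf(a) != bottomOf(b)) return std::nullopt;
  if (isBottom(a)) return b;
  if (isBottom(b)) return a;
  // The first ancestor of a (including a) that b also reaches is the LUB.
  for (std::optional<H> x = a; x; x = superOf(*x)) {
    for (std::optional<H> y = b; y; y = superOf(*y)) {
      if (*x == *y) return x;
    }
  }
  return std::nullopt;
}

// a <: b exactly when the LUB of a and b is b.
bool isSubType(H a, H b) {
  auto l = lub(a, b);
  return l && *l == b;
}
```

Expressing `isSubType` through `lub` keeps the model small; a real implementation would more likely test subtyping directly, as `SubTyper::isSubType` does in the diff.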
