Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Don't optimize string literals to string.const if they are not valid UTF-16 strings. #7324

Merged
merged 5 commits into from
Feb 25, 2025
Merged
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
64 changes: 27 additions & 37 deletions src/passes/Precompute.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -240,38 +240,6 @@ class PrecomputingExpressionRunner
// string.encode_wtf16_array anyhow.)
return Flow(NONCONSTANT_FLOW);
}

Flow visitStringSliceWTF(StringSliceWTF* curr) {
auto flow = Super::visitStringSliceWTF(curr);
if (flow.breaking()) {
return flow;
}

auto refData = flow.getSingleValue().getGCData();
if (!refData) {
return Flow(NONCONSTANT_FLOW);
}

auto& refValues = refData->values;
if (refValues.size() == 0) {
return flow;
}

// Check that the slice is valid; since we can assume that we have a valid
// UTF-16, we only need to check that it did not split surrogate pairs.
auto firstChar = refValues[0].getInteger();
if (firstChar >= 0xDC00 && firstChar <= 0xDFFF) {
// The first char cannot be a low surrogate.
return Flow(NONCONSTANT_FLOW);
}

auto lastChar = refValues[refValues.size() - 1].getInteger();
if (lastChar >= 0xD800 && lastChar <= 0xDBFF) {
// The last char cannot be a high surrogate.
return Flow(NONCONSTANT_FLOW);
}
return flow;
}
};

struct Precompute
Expand Down Expand Up @@ -967,18 +935,17 @@ struct Precompute
if (value.isNull()) {
return true;
}
return canEmitConstantFor(value.type);
}

bool canEmitConstantFor(Type type) {
auto type = value.type;
// A function is fine to emit a constant for - we'll emit a RefFunc, which
// is compact and immutable, so there can't be a problem.
if (type.isFunction()) {
return true;
}
// We can emit a StringConst for a string constant.
// We can emit a StringConst for a string constant if the string is a
// UTF-16 string.
if (type.isString()) {
return true;
return isValidUTF16Literal(value);
}
// All other reference types cannot be precomputed. Even an immutable GC
// reference is not currently something this pass can handle, as it will
Expand All @@ -991,6 +958,29 @@ struct Precompute
return true;
}

bool isValidUTF16Literal(const Literal& value) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the best place for this might be

src/support/string.h, cpp

alongside the existing isUTF8.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do I go about the Literal representation and std::string_view or std:string? I assume we don't want isValidUtf16Literal taking a Literal in string.cpp.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, we have related logic here:

case HeapType::string: {
auto data = literal.getGCData();
if (!data) {
o << "nullstring";
} else {
o << "string(";
// Convert WTF-16 literals to WTF-16 string.
std::stringstream wtf16;
for (auto c : data->values) {
auto u = c.getInteger();
assert(u < 0x10000);
wtf16 << uint8_t(u & 0xFF);
wtf16 << uint8_t(u >> 8);
}
// Escape to ensure we have valid unicode output and to make
// unprintable characters visible.
// TODO: Use wtf16.view() once we have C++20.
String::printEscapedJSON(o, wtf16.str());

Perhaps in this PR we can leave it as is and just add a TODO, something like

// TODO: move this logic to src/support/string, and refactor to share code with wasm/literal.cpp string printing's conversion from a Literal to a raw string

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sg. Added a TODO. Thanks.

bool expectLowSurrogate = false;
for (auto& v : value.getGCData()->values) {
auto c = v.getInteger();
if (c >= 0xDC00 && c <= 0xDFFF) {
if (expectLowSurrogate) {
expectLowSurrogate = false;
continue;
}
// We got a low surrogate but weren't expecting one.
return false;
}
if (expectLowSurrogate) {
// We are expecting a low surrogate but didn't get one.
return false;
}
if (c >= 0xD800 && c <= 0xDBFF) {
expectLowSurrogate = true;
}
}
return !expectLowSurrogate;
}

// Helpers for partial precomputing.

// Given a stack of expressions and the index of an expression in it, find
Expand Down
Loading