-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
VM handling of leading whitespace in multiline strings does not match the spec. #23020
Comments
The test was added in https://codereview.chromium.org/1042523004/. |
This comment was originally written by @mhausner I don't think this bug (and the test referenced in comment 1) is valid. In Dart string literals, a backslash followed by a character that is not a recognized escape sequence is identical to the character, i.e. the backslash is swallowed. "\h" is the same as "h". The multiline string """ Is the same as """bla""", but """\ is not the same. The first \ converts to a single , which is not a whitespace character. The single \ that results from this escape sequence is not used again to build escape sequences with the following characters. cc @gbracha. |
I'm not sure what you're referring to. Nowhere does the test assume that the result of applying an escape sequence would then be combined with the following character as another escape sequence. |
It appears that we need you to weigh in on whether the added test correctly matches the intent of the specification. If it does, please re-assign this issue to hausner@google.com. |
I commented in the review. What we do next depends less on the spec than on existing behavior of both VM and dart2js. |
Gilad-- I visited https://codereview.chromium.org/1042523004/ but I don't see any comments from you. Is it possible that your comments are still in draft form and haven't been sent yet? |
Yeah, I forgot to publish the comment. The more I think about it, the more I think that spec pedantry is not the point. We need to do what ordinary people can understand. I think the VM is ok in this case, and the test is flawed. Re-assigning to analyzer. Removed Area-VM label. |
You wrote:
I think the idea was (or should have been) to allow people to left-justify the first character of the string so that the alignment of characters in the resulting string would be obvious when looking at the source code. For example, consider the following: WindowManager.openModelDialog(title: "Error", message: """You obviously don't know what you're doing
and shouldn't be using a computer!"""); compared to: WindowManager.openModelDialog(title: "Error", message: """
You obviously don't know what you're doing
and shouldn't be using a computer!"""); The reason for allowing whitespace before the end-of-line marker to be removed is because users can't see it and therefore won't realize that it is there (and that the string they think they're getting isn't what they're actually getting). It seems to me that it would be much cleaner if the spec said: If the first line of a multiline string, before escape characters are processed and before interpolation occurs, consists solely of the whitespace characters defined by the production WHITESPACE, then that line is ignored, including the new line at its end. Honestly, how many people are going to say to themselves "I want my string to contain a leading tab character that will be stripped out, but if I don't use an escape sequence nobody will be able to tell that it's there, so I'll explicitly type '\t'"? :-) Obviously that's a silly example, but what's the use case for wanting some escape sequence to be ignored? Similarly, what's the use case for wanting white space to be stripped out after interpolation? Removed Area-Analyzer label. |
As I said, maybe we should have left it at that. But we didn't because someone had an issue. We don't change the spec as easily as in the past. So the question is how to interpret it, not whether to change it. I've interpreted the existing spec in the way that is most obvious for users. Removed Area-Language label. |
cc @gbracha. |
If the specification requires an external interpretation to decide what it means, then I do think the specification should be changed. Otherwise it's just ambiguous, which, for a spec, means it's broken. And I'd vote for changing it to ignore only whitespace, no backslashes. |
For the record, I strongly disagree. I don't think most users will expect character escapes and interpolation to be performed before leading whitespace is removed. |
The spec will be clarified. It won't be changed. The bar for spec changes is higher than in the early wild west days. |
If "clarifying" the spec will make one current implementation invalid, it's not just a clarification. The spec is (at best) ambiguous, and disambiguating in one direction isn't any worse than disambiguating it in the other direction. So I'm arguing that the disambiguation should pick the sanest interpretation, which is to not allow \t escapes as part of the "invisible initial whitespace" of a multiline string. It'll still just be a clarification, it's just clarifying that this was the intent. Interpreting escapes first also opens the can on interpreting interpolations first: """${' '}""" - should that space be removed as well? (It's a single compile-time constant, so it is possible). Also, if a multi-line string is on a single line, the spec also requires its content to be ignored if it's pure WHITESPACE, for example """ """. It's probably only intended to happen if the string literal actually contains a NEWLINE character (outside of interpolations). And while I appreciate that this might not be the most important part of the Dart language, anything the spec says must be implemented by the implementations as well, and explained to users. I can't explain that """\t""" is an empty string :) |
See also issue #14073 and issue #19514. For the record, I'd consider CR LF a NEWLINE for the tokenizer, but BACKSLASH CR BACKSLASH NL is two individual NEWLINE characters, each prefixed by backslash. The latter backslash should be part of the multiline string. |
And another curiosity. I think the "prefix by " should be changed to at least only apply to cooked strings. |
Where are we at with this? Is it still a useful issue to have open? |
It still not following the spec. If the first line of a multiline string consists solely of the whitespace characters defined by the production **WHITESPACE** possibly prefixed by `\`, then that line is ignored, including the line break at its end. The "line" is ended by the first occurrence of CR+LF, CR (not followed by LF) or LF. var x = """\
x"""; (That is: quotes-backslash-linefeed), then all characters before the line end are not whitespace potentially prefixed by backslash, because there is a backslash which is not followed by a whitespace on that line. Both analyzer and VM accepts this. We can then argue whether backslash escaping should happen before or after line splitting. On the "before' case, the backslash-linefeed reduces to linefeed, then you break before that linefeed, but the I'd still prefer to completely disallow backslashes on the first line - we ignore the first line of a multiline string if it contains nothing by whitespace characters. It's a tokenization issue to determine when the string contents start. Having to do string escape interpretation in order to figure out where the string starts leaves us with questions like the above, and we'd be better off just not worrying. |
Everything in the comments here makes me believe this is a bug for back-ends (or maybe CFE). I'm not sure why it's tagged as analyzer. Tentatively tagging with VM. |
Back-ends don't parse Dart sources. This looks like a parser / CFE issue. |
So @lrhn what do we actually want here? |
Honestly: I want to change the spec. It's not really worth spending much time on being spec compliant for something that is never used, so I'd mark it P3 or P4 and carry on, unless you really want to work on it. |
In section 16.5, the specification says:
If the first line of a multiline string consists solely of the whitespace characters defined by the production WHITESPACE 20.1, possibly prefixed by
\
, then that line is ignored, including the new line at its end.The specification doesn't state this explicitly, but I believe that this means that leading whitespace should be removed before any other processing has been done, such as processing escape sequences (otherwise, the whitespace characters would no longer be prefixed by backslashes). The VM appears to be processing escape characters and even performing interpolation before removing leading whitespace.
I have a test case (language/multiline_strings_test) that I will commit after this issue is created (I need the issue number for the status files).
The text was updated successfully, but these errors were encountered: