Skip to content

Commit 5ba5934

Browse files
jensjohaCommit Queue
authored and
Commit Queue
committed
[scanner] Specialized scanner recovery for missing end curly brace
*TL;DR* This improves scanner recovery for a missing `}` in certain situations, reducing the risk of an in-body change causing a (temporary) outline change (which in turn could result in the analyzer becoming unresponsive for "no reason"). *Details* The behavior of IntelliJ is that when typing `{` it only inserts a matching end brace `}` when hitting enter. Imagine you are typing an if: `if (1 + 1 == 2) {`, where you don't hit enter quickly enough and you trigger a re-analysis at this point. What happens then is that every method below where you are typing looks to be local function declarations and thus the outline change. When the outline change the analyzer has to do a lot of work: everything (transitively) depending on the file has to be recompiled, and every strongly connected component is compiled "in one go" where the analyzer can't respond to queries. So if you have one or more large strongly connected components depending on the file, or the file itself is part of such a chain, you will (or at least might) experience that the analyzer is slow to respond, and it will be extra puzzling because logically you're just doing an in-body change. For some code the user might not even naturally hit enter, e.g. `var foo = {"I'm", "a", "set"};`. The recovery in the scanner has always been that - upon reaching the end of the file - it sees that we're missing a `}` and it inserts it at the end. This CL instead tries to figure out a better place to insert it, and if successful, will rerun the scanner, instructing it to insert it at the better place and (hopefully) avoiding a subsequent outline change. It does this by looking at the indentation - which is new for recovery - and under the assumption that the indentation was correct before, will find the position where the start curly brace was inserted. Note that if it finds a position it will always be between the start curly brace (the one missing the end curly brace) and the end of file, and inserting the missing curly end brace there can't really be "more wrong" than inserting it at the end (if the new place is not correct it's just "still wrong"). In the benchmark added we see how quickly we can get completion after having typed `if (1+1==2) {`, then adding `\n ge\n}` and requesting completion on the `ge` part, i.e. a simulation of typing ``` if (1+1==2) { ge } ``` and asking for completion at the `ge`. The change in this CL - on cycles of size 1024 - caused the time to completion response to come in between ~5 times faster (going from ~10.2 to ~2.1 seconds) to ~18 times faster (going from ~10.3 seconds to ~0.56 seconds): `CodeType.ImportExportCycle` goes from: ``` +------+-----------+------------+ | Size | Initial | Completion | +------+-----------+------------+ | 16 | 2.019581 | 0.97504 | | 32 | 3.028976 | 1.031008 | | 64 | 4.422884 | 1.198383 | | 128 | 7.612125 | 1.597091 | | 256 | 12.860864 | 2.906553 | | 512 | 24.391894 | 5.017093 | | 1024 | 48.390993 | 10.243085 | +------+-----------+------------+ ``` to ``` +------+-----------+------------+ | Size | Initial | Completion | +------+-----------+------------+ | 16 | 2.107213 | 0.661066 | | 32 | 3.012952 | 0.70554 | | 64 | 4.682508 | 0.731176 | | 128 | 7.508434 | 0.745501 | | 256 | 13.105477 | 0.852413 | | 512 | 24.520184 | 1.278403 | | 1024 | 48.804348 | 2.11903 | +------+-----------+------------+ ``` and `CodeType.ImportExportChain` goes from: ``` +------+-----------+------------+ | Size | Initial | Completion | +------+-----------+------------+ | 16 | 2.059196 | 0.892082 | | 32 | 3.080717 | 0.93232 | | 64 | 4.647163 | 1.240303 | | 128 | 7.377035 | 1.674859 | | 256 | 12.939432 | 2.705483 | | 512 | 24.529501 | 5.02689 | | 1024 | 47.713553 | 10.385469 | +------+-----------+------------+ ``` to ``` +------+-----------+------------+ | Size | Initial | Completion | +------+-----------+------------+ | 16 | 2.020809 | 0.709643 | | 32 | 3.106856 | 0.648818 | | 64 | 4.503067 | 0.593152 | | 128 | 7.45692 | 0.622423 | | 256 | 13.140592 | 0.606948 | | 512 | 24.933216 | 0.612687 | | 1024 | 50.167541 | 0.567544 | +------+-----------+------------+ ``` Change-Id: I8dbefe215162d00a209206ae3db83b2b17505853 Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/415581 Commit-Queue: Jens Johansen <jensj@google.com> Reviewed-by: Brian Wilkerson <brianwilkerson@google.com> Reviewed-by: Phil Quitslund <pquitslund@google.com>
1 parent f276d3c commit 5ba5934

File tree

91 files changed

+6899
-100
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

91 files changed

+6899
-100
lines changed

pkg/_fe_analyzer_shared/lib/src/scanner/abstract_scanner.dart

+140
Original file line numberDiff line numberDiff line change
@@ -114,6 +114,10 @@ abstract class AbstractScanner implements Scanner {
114114
@override
115115
bool hasErrors = false;
116116

117+
Token? openBraceWithMissingEndForPossibleRecovery;
118+
119+
int? offsetForCurlyBracketRecoveryStart;
120+
117121
/**
118122
* A pointer to the stream of comment tokens created by this scanner
119123
* before they are assigned to the [Token] precedingComments field
@@ -352,9 +356,136 @@ abstract class AbstractScanner implements Scanner {
352356
appendToken(new KeywordToken(keyword, tokenStart, comments));
353357
}
354358

359+
/// Find the indentation of the logical line of [token].
360+
///
361+
/// By logical take this as an example:
362+
///
363+
/// ```
364+
/// if (a &&
365+
/// b) {
366+
/// }
367+
/// ```
368+
///
369+
/// the indentation of `{` should logically be the same as the indentation of
370+
/// `a` because it's the indentation of the `if`, even though the _line_ of a
371+
/// has a different indentation of the _line_ of `{`.
372+
int _spacesAtStartOfLogicalLineOf(Token token) {
373+
if (lineStarts.isEmpty) return -1;
374+
375+
// If the previous token is a `)`, e.g. if this is the start curly brace in
376+
// an if, we find the first token before the corresponding `(` (in this case
377+
// the if) in an attempt to find "the right" token to get the indentation
378+
// for - e.g. if the if is spread over several lines the given token itself
379+
// will - if formatted by the formatter - be indented more than the "if".
380+
if (token.isA(TokenType.OPEN_CURLY_BRACKET)) {
381+
if (token.previous == null) return -1;
382+
Token previous = token.previous!;
383+
bool foundWanted = false;
384+
if (previous.isA(TokenType.CLOSE_PAREN)) {
385+
Token closeParen = token.previous!;
386+
Token? candidate = closeParen.previous;
387+
while (candidate != null && candidate.endGroup != closeParen) {
388+
candidate = candidate.previous;
389+
}
390+
if (candidate?.endGroup == closeParen && candidate!.previous != null) {
391+
token = candidate.previous!;
392+
if (token.isA(Keyword.IF) ||
393+
token.isA(Keyword.FOR) ||
394+
token.isA(Keyword.WHILE) ||
395+
token.isA(Keyword.SWITCH) ||
396+
token.isA(Keyword.CATCH)) {
397+
foundWanted = true;
398+
}
399+
}
400+
} else if (previous.isA(Keyword.ELSE) ||
401+
previous.isA(Keyword.TRY) ||
402+
previous.isA(Keyword.FINALLY)) {
403+
foundWanted = true;
404+
} else if (previous.isA(TokenType.EQ) &&
405+
(previous.previous?.isA(TokenType.IDENTIFIER) ?? false)) {
406+
// `someIdentifier = {`
407+
foundWanted = true;
408+
} else if (previous.isA(Keyword.CONST) &&
409+
(previous.previous?.isA(TokenType.EQ) ?? false) &&
410+
(previous.previous?.previous?.isA(TokenType.IDENTIFIER) ?? false)) {
411+
// `someIdentifier = const {`
412+
foundWanted = true;
413+
}
414+
if (!foundWanted) return -1;
415+
}
416+
417+
// Now find the line of [token].
418+
final int offset = token.offset;
419+
int low = 0, high = lineStarts.length - 1;
420+
while (low < high) {
421+
int mid = high - ((high - low) >> 1); // Get middle, rounding up.
422+
int pivot = lineStarts[mid];
423+
if (pivot <= offset) {
424+
low = mid;
425+
} else {
426+
high = mid - 1;
427+
}
428+
}
429+
final int lineIndex = low;
430+
if (lineIndex == 0) {
431+
// On first line.
432+
return tokens.next?.charOffset ?? -1;
433+
}
434+
435+
// Find the first token of the line.
436+
int lineStartOfToken = lineStarts[lineIndex];
437+
Token? candidate = token.previous;
438+
while (candidate != null && candidate.offset >= lineStartOfToken) {
439+
candidate = candidate.previous;
440+
}
441+
if (candidate != null) {
442+
// candidate.next is the first token of the line.
443+
return candidate.next!.offset - lineStartOfToken;
444+
}
445+
446+
return -1;
447+
}
448+
449+
/// If there was a single missing `}` when tokenizing, try to find the actual
450+
/// `{` that is missing the `}`.
451+
///
452+
/// This is done by checking all `{` tokens between the one (currently)
453+
/// missing a `}` and the end of the token stream. For each we try to identify
454+
/// the indentation of the `{` and the indentation of the endGroup (i.e. the
455+
/// `}` it has been matched with). If the indentations mismatch we assume this
456+
/// is a bad match. There might be more bad matches because of how the curly
457+
/// braces are nested and we pick the last one, which should be the inner-most
458+
/// one and thus the one introducing the error: That is going to be the one
459+
/// that has "misaligned" the rest of the end-braces.
460+
int? getOffsetForCurlyBracketRecoveryStart() {
461+
if (openBraceWithMissingEndForPossibleRecovery == null) return null;
462+
Token? next = openBraceWithMissingEndForPossibleRecovery!.next;
463+
Token? lastMismatch;
464+
while (next != null && !next.isEof) {
465+
if (next.isA(TokenType.OPEN_CURLY_BRACKET)) {
466+
int indentOfNext = _spacesAtStartOfLogicalLineOf(next);
467+
if (indentOfNext >= 0 &&
468+
indentOfNext != _spacesAtStartOfLogicalLineOf(next.endGroup!)) {
469+
lastMismatch = next;
470+
}
471+
}
472+
next = next.next;
473+
}
474+
if (lastMismatch != null) {
475+
return lastMismatch.offset;
476+
}
477+
return null;
478+
}
479+
355480
void appendEofToken() {
356481
beginToken();
357482
discardOpenLt();
483+
if (!groupingStack.isEmpty &&
484+
groupingStack.head.isA(TokenType.OPEN_CURLY_BRACKET) &&
485+
groupingStack.tail!.isEmpty) {
486+
// We have a single `{` that's missing a `}`. Maybe the user is typing?
487+
openBraceWithMissingEndForPossibleRecovery = groupingStack.head;
488+
}
358489
while (!groupingStack.isEmpty) {
359490
unmatchedBeginGroup(groupingStack.head);
360491
groupingStack = groupingStack.tail!;
@@ -870,6 +1001,15 @@ abstract class AbstractScanner implements Scanner {
8701001
}
8711002

8721003
if (next == $CLOSE_CURLY_BRACKET) {
1004+
if (offsetForCurlyBracketRecoveryStart != null &&
1005+
!groupingStack.isEmpty &&
1006+
groupingStack.head.isA(TokenType.OPEN_CURLY_BRACKET) &&
1007+
groupingStack.head.offset == offsetForCurlyBracketRecoveryStart) {
1008+
// This instance of the scanner was instructed to recover this
1009+
// opening curly bracket.
1010+
unmatchedBeginGroup(groupingStack.head);
1011+
groupingStack = groupingStack.tail!;
1012+
}
8731013
return appendEndGroup(
8741014
TokenType.CLOSE_CURLY_BRACKET, OPEN_CURLY_BRACKET_TOKEN);
8751015
}

pkg/_fe_analyzer_shared/lib/src/scanner/recover.dart

-55
Original file line numberDiff line numberDiff line change
@@ -4,61 +4,6 @@
44

55
library _fe_analyzer_shared.scanner.recover;
66

7-
import 'token.dart' show Token, TokenType;
8-
9-
import 'token_impl.dart' show StringTokenImpl;
10-
11-
import 'error_token.dart' show ErrorToken;
12-
13-
/// Recover from errors in [tokens]. The original sources are provided as
14-
/// [bytes]. [lineStarts] are the beginning character offsets of lines, and
15-
/// must be updated if recovery is performed rewriting the original source
16-
/// code.
17-
Token scannerRecovery(List<int> bytes, Token tokens, List<int> lineStarts) {
18-
// Sanity check that all error tokens are prepended.
19-
20-
// TODO(danrubel): Remove this in a while after the dust has settled.
21-
22-
// Skip over prepended error tokens
23-
Token token = tokens;
24-
while (token is ErrorToken) {
25-
token = token.next!;
26-
}
27-
28-
// Assert no error tokens in the remaining tokens
29-
while (!token.isEof) {
30-
if (token is ErrorToken) {
31-
for (int count = 0; count < 3; ++count) {
32-
Token previous = token.previous!;
33-
if (previous.isEof) break;
34-
token = previous;
35-
}
36-
StringBuffer msg = new StringBuffer(
37-
"Internal error: All error tokens should have been prepended:");
38-
for (int count = 0; count < 7; ++count) {
39-
if (token.isEof) break;
40-
msg.write(' ${token.runtimeType},');
41-
token = token.next!;
42-
}
43-
throw msg.toString();
44-
}
45-
token = token.next!;
46-
}
47-
48-
return tokens;
49-
}
50-
51-
Token synthesizeToken(int charOffset, String value, TokenType type) {
52-
return new StringTokenImpl.fromString(type, value, charOffset);
53-
}
54-
55-
Token skipToEof(Token token) {
56-
while (!token.isEof) {
57-
token = token.next!;
58-
}
59-
return token;
60-
}
61-
627
String closeBraceFor(String openBrace) {
638
return const {
649
'(': ')',

pkg/_fe_analyzer_shared/lib/src/scanner/scanner.dart

+35-13
Original file line numberDiff line numberDiff line change
@@ -4,12 +4,11 @@
44

55
library _fe_analyzer_shared.scanner;
66

7-
import 'dart:convert' show Utf8Encoder, unicodeReplacementCharacterRune;
7+
import 'dart:convert' show unicodeReplacementCharacterRune;
88
import 'dart:typed_data' show Uint8List;
99

1010
import 'abstract_scanner.dart'
1111
show LanguageVersionChanged, ScannerConfiguration;
12-
import 'recover.dart' show scannerRecovery;
1312
import 'string_scanner.dart' show StringScanner;
1413
import 'token.dart' show Token;
1514
import 'utf8_bytes_scanner.dart' show Utf8BytesScanner;
@@ -33,8 +32,6 @@ export 'utf8_bytes_scanner.dart' show Utf8BytesScanner;
3332

3433
const int unicodeReplacementCharacter = unicodeReplacementCharacterRune;
3534

36-
typedef Token Recover(List<int> bytes, Token tokens, List<int> lineStarts);
37-
3835
abstract class Scanner {
3936
/// Returns true if an error occurred during [tokenize].
4037
bool get hasErrors;
@@ -61,12 +58,30 @@ ScannerResult scan(Uint8List bytes,
6158
bool includeComments = false,
6259
LanguageVersionChanged? languageVersionChanged,
6360
bool allowLazyStrings = true}) {
64-
Scanner scanner = new Utf8BytesScanner(bytes,
61+
Utf8BytesScanner scanner = new Utf8BytesScanner(bytes,
6562
configuration: configuration,
6663
includeComments: includeComments,
6764
languageVersionChanged: languageVersionChanged,
6865
allowLazyStrings: allowLazyStrings);
69-
return _tokenizeAndRecover(scanner, bytes: bytes);
66+
Token tokens = scanner.tokenize();
67+
if (scanner.hasErrors) {
68+
// If there was a single missing `}` and the scanner can identify a good
69+
// candidate for better recovery, create a new scanner and instruct it to
70+
// do that recovery.
71+
int? offsetForCurlyBracketRecoveryStart =
72+
scanner.getOffsetForCurlyBracketRecoveryStart();
73+
if (offsetForCurlyBracketRecoveryStart != null) {
74+
scanner = new Utf8BytesScanner(bytes,
75+
configuration: configuration,
76+
includeComments: includeComments,
77+
languageVersionChanged: languageVersionChanged,
78+
allowLazyStrings: allowLazyStrings);
79+
scanner.offsetForCurlyBracketRecoveryStart =
80+
offsetForCurlyBracketRecoveryStart;
81+
tokens = scanner.tokenize();
82+
}
83+
}
84+
return new ScannerResult(tokens, scanner.lineStarts, scanner.hasErrors);
7085
}
7186

7287
/// Scan/tokenize the given [source].
@@ -78,15 +93,22 @@ ScannerResult scanString(String source,
7893
configuration: configuration,
7994
includeComments: includeComments,
8095
languageVersionChanged: languageVersionChanged);
81-
return _tokenizeAndRecover(scanner, source: source);
82-
}
83-
84-
ScannerResult _tokenizeAndRecover(Scanner scanner,
85-
{List<int>? bytes, String? source}) {
8696
Token tokens = scanner.tokenize();
8797
if (scanner.hasErrors) {
88-
if (bytes == null) bytes = const Utf8Encoder().convert(source!);
89-
tokens = scannerRecovery(bytes, tokens, scanner.lineStarts);
98+
// If there was a single missing `}` and the scanner can identify a good
99+
// candidate for better recovery, create a new scanner and instruct it to
100+
// do that recovery.
101+
int? offsetForCurlyBracketRecoveryStart =
102+
scanner.getOffsetForCurlyBracketRecoveryStart();
103+
if (offsetForCurlyBracketRecoveryStart != null) {
104+
scanner = new StringScanner(source,
105+
configuration: configuration,
106+
includeComments: includeComments,
107+
languageVersionChanged: languageVersionChanged);
108+
scanner.offsetForCurlyBracketRecoveryStart =
109+
offsetForCurlyBracketRecoveryStart;
110+
tokens = scanner.tokenize();
111+
}
90112
}
91113
return new ScannerResult(tokens, scanner.lineStarts, scanner.hasErrors);
92114
}

0 commit comments

Comments
 (0)