-
Notifications
You must be signed in to change notification settings - Fork 7.6k
Make function regexp match JS identifier syntax exactly #1119
Conversation
Note that this also fixes the ternary operator case that Peter mentioned, but I couldn't quickly write a unit test for that since we don't have an easy way right now to verify that it doesn't find a match for a nonexistent name. Would be good if someone could add that. |
Reviewing |
@@ -53,7 +53,7 @@ define(function (require, exports, module) { | |||
* RegExp matches any sequence of characters that is not whitespace. | |||
* @type {RegExp} | |||
*/ | |||
var _functionRegExp = /(function\b\s+([^\s]+)(\([^)]*\)))|(([^\s\.]+)\s*[:=]\s*function\s*(\([^)]*\)))/g; | |||
var _functionRegExp = /(function\s+([$_A-Za-z][$_A-Za-z0-9]*)\s*(\([^)]*\)))|(([$_A-Za-z][$_A-Za-z0-9]*)\s*[:=]\s*function\s*(\([^)]*\)))/g; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I avoided this change originally since identifiers can include unicode letters http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf. I think this is good enough for now until we implement the CodeMirror parser here. I'll add a TODO and reference #1125.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't support unicode or upper-ascii. We may want to update the expression to exclude all ascii punctuation except $.
http://stackoverflow.com/questions/1661197/valid-characters-for-javascript-variable-names
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding high ascii and all unicode may be good enough for now. This works for the test cases I threw at it:
(function\s+([$A-Za-z\u007F-\uFFFF][$A-Za-z0-9\u007F-\uFFFF])\s(([^)])))|(([$A-Za-z\u007F-\uFFFF][$A-Za-z0-9\u007F-\uFFFF])\s[:=]\s_function\s(([^)]_)))
…s. This doesn't completely match the JS identifier spec, but is good enough for now.
RegExps look good. |
I'm working on some false positive tests now. I'll update this pull. |
Make function regexp match JS identifier syntax exactly
Fixes #1108.
In addition to changing the function name regexps to exactly match JS identifier syntax (first char is$, _, or alpha; later chars are $ , _, or alphanumeric), I removed a
\b
from the first arm of the regexp that seems unnecessary since it explicitly looks for a\s
afterwards anyway.Added unit tests for various whitespace cases, but we might want to add some more as well (including false positives).