Skip to content
This repository has been archived by the owner on Sep 6, 2021. It is now read-only.

Make function regexp match JS identifier syntax exactly #1119

Merged
merged 4 commits into from
Jun 22, 2012

Conversation

njx
Copy link

@njx njx commented Jun 22, 2012

Fixes #1108.

In addition to changing the function name regexps to exactly match JS identifier syntax (first char is $, _, or alpha; later chars are $, _, or alphanumeric), I removed a \b from the first arm of the regexp that seems unnecessary since it explicitly looks for a \s afterwards anyway.

Added unit tests for various whitespace cases, but we might want to add some more as well (including false positives).

@njx
Copy link
Author

njx commented Jun 22, 2012

Note that this also fixes the ternary operator case that Peter mentioned, but I couldn't quickly write a unit test for that since we don't have an easy way right now to verify that it doesn't find a match for a nonexistent name. Would be good if someone could add that.

@ghost ghost assigned jasonsanjose Jun 22, 2012
@jasonsanjose
Copy link
Member

Reviewing

@@ -53,7 +53,7 @@ define(function (require, exports, module) {
* RegExp matches any sequence of characters that is not whitespace.
* @type {RegExp}
*/
var _functionRegExp = /(function\b\s+([^\s]+)(\([^)]*\)))|(([^\s\.]+)\s*[:=]\s*function\s*(\([^)]*\)))/g;
var _functionRegExp = /(function\s+([$_A-Za-z][$_A-Za-z0-9]*)\s*(\([^)]*\)))|(([$_A-Za-z][$_A-Za-z0-9]*)\s*[:=]\s*function\s*(\([^)]*\)))/g;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I avoided this change originally since identifiers can include unicode letters http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf. I think this is good enough for now until we implement the CodeMirror parser here. I'll add a TODO and reference #1125.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't support unicode or upper-ascii. We may want to update the expression to exclude all ascii punctuation except $.

http://stackoverflow.com/questions/1661197/valid-characters-for-javascript-variable-names

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding high ascii and all unicode may be good enough for now. This works for the test cases I threw at it:

(function\s+([$A-Za-z\u007F-\uFFFF][$A-Za-z0-9\u007F-\uFFFF])\s(([^)])))|(([$A-Za-z\u007F-\uFFFF][$A-Za-z0-9\u007F-\uFFFF])\s[:=]\s_function\s(([^)]_)))

…s. This doesn't completely match the JS identifier spec, but is good enough for now.
@redmunds
Copy link
Contributor

RegExps look good.

@jasonsanjose
Copy link
Member

I'm working on some false positive tests now. I'll update this pull.

# for free to subscribe to this conversation on GitHub. Already have an account? #.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Quick Open JS returns spurious results in codemirror.js
4 participants