Skip to content

Autosemi after comments #134

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Open
wants to merge 5 commits into
base: new-ast
Choose a base branch
from

Conversation

flbulgarelli
Copy link
Contributor

@flbulgarelli flbulgarelli commented Mar 12, 2022

Hi! Currently, there is a subtle difference in the way language-javascript deals with auto semicolons when comments are present.

Let's have a look at this code sample:

function f1() {
  return // hello
    4
}

function f2() {
  return /* hello */
    4
}


function f3() {
  return /* hello
    */ 4
}

function f4() {
  return
    4
}

And how esprima parses it:

{
  "type": "Program",
  "body": [
    {
      "type": "FunctionDeclaration",
      "id": {
        "type": "Identifier",
        "name": "f1"
      },
      "params": [],
      "body": {
        "type": "BlockStatement",
        "body": [
          {
            "type": "ReturnStatement",
            "argument": null
          },
          {
            "type": "ExpressionStatement",
            "expression": {
              "type": "Literal",
              "value": 4,
              "raw": "4"
            }
          }
        ]
      },
      "generator": false,
      "expression": false,
      "async": false
    },
    {
      "type": "FunctionDeclaration",
      "id": {
        "type": "Identifier",
        "name": "f2"
      },
      "params": [],
      "body": {
        "type": "BlockStatement",
        "body": [
          {
            "type": "ReturnStatement",
            "argument": null
          },
          {
            "type": "ExpressionStatement",
            "expression": {
              "type": "Literal",
              "value": 4,
              "raw": "4"
            }
          }
        ]
      },
      "generator": false,
      "expression": false,
      "async": false
    },
    {
      "type": "FunctionDeclaration",
      "id": {
        "type": "Identifier",
        "name": "f3"
      },
      "params": [],
      "body": {
        "type": "BlockStatement",
        "body": [
          {
            "type": "ReturnStatement",
            "argument": null
          },
          {
            "type": "ExpressionStatement",
            "expression": {
              "type": "Literal",
              "value": 4,
              "raw": "4"
            }
          }
        ]
      },
      "generator": false,
      "expression": false,
      "async": false
    },
    {
      "type": "FunctionDeclaration",
      "id": {
        "type": "Identifier",
        "name": "f4"
      },
      "params": [],
      "body": {
        "type": "BlockStatement",
        "body": [
          {
            "type": "ReturnStatement",
            "argument": null
          },
          {
            "type": "ExpressionStatement",
            "expression": {
              "type": "Literal",
              "value": 4,
              "raw": "4"
            }
          }
        ]
      },
      "generator": false,
      "expression": false,
      "async": false
    }
  ],
  "sourceType": "script"
}

As you can see, the parser is introducing an automatic semicolon in all the four scenarios. This behavior is consistent with modern V8 and SpiderMonkey engines.

However, let's have a look at how language-javascript handles these scenarios. If you add the following tests to ProgramParser.hs...

describe "function with autosemi on return" $ do
  it "return with inline comment" $ do
      testProg "function f1(){ return //hello\n 4 }"  `shouldBe` "Right (JSAstProgram [JSFunction 'f1' () (JSBlock [JSReturn ,JSDecimal '4'])])"

  it "return with multiline comment" $ do
      testProg "function f2(){ return /* hello */\n  4 }"  `shouldBe` "Right (JSAstProgram [JSFunction 'f2' () (JSBlock [JSReturn ,JSDecimal '4'])])"

  it "return with multiline comment with newline" $ do
      testProg "function f3(){ return /* hello\n  */ 4 }"  `shouldBe` "Right (JSAstProgram [JSFunction 'f3' () (JSBlock [JSReturn ,JSDecimal '4'])])"

  it "return with newline" $ do
      testProg "function f4(){ return \n 4 }"  `shouldBe` "Right (JSAstProgram [JSFunction 'f4' () (JSBlock [JSReturn ,JSDecimal '4'])])"

...only the fourth and last test will actually produce the expected results. The other three will fail with the same error:

  test/Test/Language/Javascript/ProgramParser.hs:29:13:
  1) Program parser:, function with autosemi on return, return with inline comment
      expected: "Right (JSAstProgram [JSFunction 'f1' () (JSBlock [JSReturn ,JSDecimal '4'])])"
        but got: "Right (JSAstProgram [JSFunction 'f1' () (JSBlock [JSReturn JSDecimal '4' ])])"

  To rerun use: --match "/Program parser:/function with autosemi on return/return with inline comment/"

  test/Test/Language/Javascript/ProgramParser.hs:32:13:
  2) Program parser:, function with autosemi on return, return with multiline comment
      expected: "Right (JSAstProgram [JSFunction 'f2' () (JSBlock [JSReturn ,JSDecimal '4'])])"
        but got: "Right (JSAstProgram [JSFunction 'f2' () (JSBlock [JSReturn JSDecimal '4' ])])"

  To rerun use: --match "/Program parser:/function with autosemi on return/return with multiline comment/"

  test/Test/Language/Javascript/ProgramParser.hs:35:13:
  3) Program parser:, function with autosemi on return, return with multiline comment with newline
      expected: "Right (JSAstProgram [JSFunction 'f3' () (JSBlock [JSReturn ,JSDecimal '4'])])"
        but got: "Right (JSAstProgram [JSFunction 'f3' () (JSBlock [JSReturn JSDecimal '4' ])])"

  To rerun use: --match "/Program parser:/function with autosemi on return/return with multiline comment with newline/"

In other words, this parser is only inserting an automatic semicolon after a newline, but not after comments. This PR fixes this issue.

I have also refactored the Lexer tests, so that they don't just inspect the tokens produced by alex, but also those sent to happy.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant