JavaTokenParsers stringLiteral Unicode literals not fully correct #324

Open
scabug opened this issue Dec 4, 2008 · 2 comments

scabug commented Dec 4, 2008

The stringLiteral function in scala.util.parsing.combinator.JavaTokenParsers
tries to parse string literals that contain Unicode escape
sequences:

  def stringLiteral: Parser[String] = 
    ("\""+"""([^"\p{Cntrl}\\]|\\[\\/bfnrt]|\\u[a-fA-F0-9]{4})*"""+"\"").r

However, such Unicode escapes can occur elsewhere,
including in place of the first \ of an escape sequence,
or even in place of the " characters themselves (\u0022).

If you wish to support Unicode escapes, they should be
handled only in a lower-level Char => Char stream parser,
where they would apply uniformly to string literals, numbers,
identifiers, etc.
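
As a rough illustration of that lower-level pass, here is a minimal sketch of a decoder that runs before any token parser sees the input. The names UnicodeEscapes and decode are hypothetical and not part of the library. It assumes Java's rule that a backslash starts a Unicode escape only when preceded by an even number of contiguous backslashes, and it leaves malformed escapes untouched rather than reporting an error.

```scala
object UnicodeEscapes {
  // Strict ASCII hex check, so parseInt below never sees anything unexpected.
  private def isHex(c: Char): Boolean =
    (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F')

  /** Replaces Unicode escapes (backslash, one or more 'u', four hex digits)
    * with the characters they denote. Hypothetical helper, for illustration only.
    */
  def decode(in: String): String = {
    val out = new StringBuilder
    var i = 0
    var backslashes = 0 // contiguous raw backslashes immediately before position i
    while (i < in.length) {
      val c = in.charAt(i)
      val mayStartEscape =
        c == '\\' && backslashes % 2 == 0 && i + 1 < in.length && in.charAt(i + 1) == 'u'
      var consumed = false
      if (mayStartEscape) {
        var j = i + 1
        while (j < in.length && in.charAt(j) == 'u') j += 1 // skip the run of 'u'
        if (j + 4 <= in.length && in.substring(j, j + 4).forall(isHex)) {
          out.append(Integer.parseInt(in.substring(j, j + 4), 16).toChar)
          i = j + 4
          backslashes = 0 // the last raw character consumed was a hex digit
          consumed = true
        }
      }
      if (!consumed) {
        // Ordinary character, or a malformed escape that is passed through as-is.
        out.append(c)
        backslashes = if (c == '\\') backslashes + 1 else 0
        i += 1
      }
    }
    out.toString
  }

  def main(args: Array[String]): Unit = {
    // The second argument from this report, written with Scala string escapes.
    val raw = "\"A backslash \\u005c\\ character\\u0022"
    println(decode(raw))
  }
}
```

Run on the second argument from this report, the decoder yields a quoted string containing two consecutive backslashes and a closing quote, which a literal parser with no Unicode handling of its own could then accept.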

Compile and run the following program:

import scala.util.parsing.combinator.JavaTokenParsers
object StringLiteral extends JavaTokenParsers with Application {
  override def main(args: Array[String]) {
    args.foreach { a => println(a)
                        println(parseAll(stringLiteral, a)) }
  }
}

and pass it the strings:

  scala StringLiteral "\"trivial\"" "\"A backslash \u005c\ character\u0022"

It parses the first argument correctly but fails on the second:

"trivial"
[1.10] parsed: "trivial"
"A backslash \u005c\ character\u0022
[1.1] failure: string matching regex `"([^"\p{Cntrl}\\]|\\[\\/bfnrt]|\\u[a-fA-F0-9]{4})*"' expected but `"' found

"A backslash \u005c\ character\u0022
^

(Be careful about quoting arguments to the shell.)
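
To sidestep shell quoting entirely, the same check can be sketched without the parser combinator library by applying the regular expression from stringLiteral directly. The object name StringLiteralRegexCheck and the helper accepts are hypothetical; the pattern is copied from the definition quoted above, and a whole-input match is assumed to stand in for parseAll.

```scala
object StringLiteralRegexCheck {
  // Same pattern as stringLiteral in this report, built the same way.
  val literal =
    ("\"" + """([^"\p{Cntrl}\\]|\\[\\/bfnrt]|\\u[a-fA-F0-9]{4})*""" + "\"").r

  // True when the whole input is matched by the pattern.
  def accepts(s: String): Boolean = literal.pattern.matcher(s).matches()

  def main(args: Array[String]): Unit = {
    // Both inputs from the report, written with Scala string escapes.
    val trivial = "\"trivial\""
    val escaped = "\"A backslash \\u005c\\ character\\u0022"
    println(accepts(trivial)) // prints "true"
    println(accepts(escaped)) // prints "false"
  }
}
```

The second input is rejected because the backslash followed by a space matches none of the alternatives, and the escaped closing quote is never recognized as a quote.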

scabug commented Jan 25, 2012

@dcsobral said:
I'm siding on removing the unicode treatment with a migration warning.

@scabug scabug closed this as completed Jul 17, 2015
@SethTisue SethTisue transferred this issue from scala/bug Nov 19, 2020
@SethTisue SethTisue reopened this Nov 19, 2020
@scala scala deleted a comment from scabug Nov 19, 2020