Does not handle escaped strings #19

Open
cboscolo opened this issue Jul 3, 2016 · 5 comments

Comments

@cboscolo

cboscolo commented Jul 3, 2016

I am using csv-streamify in my log parsing project: https://github.com/cboscolo/elb2loggly.
Parsing lines with escaped strings does not work properly. For example, using var csvToJson = csv({objectMode: true, delimiter: ' '}); to parse this line:

2016-06-01T14:09:00.418027Z anaconda-org "GET https://pypi.anaconda.org:443/username/simple/virtualenv/ HTTP/1.1" "pip/8.1.2 {\"openssl_version\":\"OpenSSL 1.0.2d 9 Jul 2015\"}" TLSv1.2

Results in this:
[ "2016-06-01T14:09:00.418027Z", "anaconda-org", "GET https://pypi.anaconda.org:443/username/simple/virtualenv/ HTTP/1.1", "pip/8.1.2 {\\openssl_version\\:\\OpenSSL", "1.0.2d", "9", "Jul", "2015\\}", "TLSv1.2" ]

Instead of this:
[ "2016-06-01T14:09:00.418027Z", "anaconda-org", "GET https://pypi.anaconda.org:443/username/simple/virtualenv/ HTTP/1.1", "pip/8.1.2 {"openssl_version":"OpenSSL 1.0.2d 9 Jul 2015"}", "TLSv1.2" ]

Have you considered adding support for escaped strings? If not, do you know of any other csv parsing npm modules that do?
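
To make the requested behaviour concrete, here is a minimal sketch of a line splitter that treats \" inside a quoted field as a literal quote. The function name splitEscapedLine is hypothetical and is not part of csv-streamify; it only illustrates the expected result for a shortened version of the line above.

```javascript
// Hypothetical helper, not part of csv-streamify: split one line on a
// delimiter, treating "..." as a quoted field and \" as a literal quote.
function splitEscapedLine(line, delimiter = ' ') {
  const fields = [];
  let current = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes && ch === '\\' && i + 1 < line.length) {
      // Backslash inside quotes: keep the next character verbatim.
      current += line[i + 1];
      i++;
    } else if (ch === '"') {
      // An unescaped quote opens or closes a quoted field.
      inQuotes = !inQuotes;
    } else if (ch === delimiter && !inQuotes) {
      fields.push(current);
      current = '';
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

// String.raw keeps the backslashes, as they would appear in the log file.
const sample = String.raw`anaconda-org "pip/8.1.2 {\"openssl_version\":\"OpenSSL 1.0.2d 9 Jul 2015\"}" TLSv1.2`;
console.log(splitEscapedLine(sample));
```

This handles a single line only and ignores streaming and multi-line fields, so it is a reference for the desired output rather than a replacement parser.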

@stephenakearns

Adding my support for this item - we are downstream from elb2loggly and this is a blocker for getting reliable logging from our load-balancers due to this issue. Tagging the downstream issue: cboscolo/elb2loggly#22

@klaemo
Owner

klaemo commented Jul 25, 2016

Sorry for not responding. I'm looking into it right now!

@klaemo
Owner

klaemo commented Jul 25, 2016

So, as far as I understand it, we have the following problem:

'"' === '\"' // true

The escaped and the unescaped quotation marks are the same in JS, because \ is the escape character of JavaScript strings (or I just haven't found a way to make sense of this yet).

You can't choose another escape char, I assume? For example, in CSV "" is often used to indicate an escaped ". This is also supported by this parser.

2016-06-01T14:09:00.418027Z anaconda-org "GET https://pypi.anaconda.org:443/username/simple/virtualenv/ HTTP/1.1" "pip/8.1.2 {""openssl_version"":""OpenSSL 1.0.2d 9 Jul 2015""}" TLSv1.2

[
    '2016-06-01T14:09:00.418027Z',
    'anaconda-org',
    'GET https://pypi.anaconda.org:443/username/simple/virtualenv/ HTTP/1.1',
    'pip/8.1.2 {"openssl_version":"OpenSSL 1.0.2d 9 Jul 2015"}',
    'TLSv1.2' 
]
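
If the input format cannot be changed at the source, one possible workaround is to rewrite backslash-escaped quotes into the doubled form before the line reaches the parser. The sketch below assumes the parser accepts "" as described above; toDoubledQuotes is a hypothetical name, and the regex deliberately ignores the harder case of an escaped backslash sitting directly before a quote.

```javascript
// Hypothetical pre-processing step: rewrite backslash-escaped quotes (\")
// as doubled quotes ("") before the line reaches the CSV parser.
// Assumption: the input never contains an escaped backslash (\\) directly
// before a quote; that case would need a real tokenizer.
function toDoubledQuotes(line) {
  return line.replace(/\\"/g, '""');
}

// String.raw keeps the backslashes, as they would appear in the log file.
const raw = String.raw`"pip/8.1.2 {\"openssl_version\":\"OpenSSL 1.0.2d 9 Jul 2015\"}" TLSv1.2`;
console.log(toDoubledQuotes(raw));
// prints "pip/8.1.2 {""openssl_version"":""OpenSSL 1.0.2d 9 Jul 2015""}" TLSv1.2
```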

@cboscolo
Author

Unfortunately, the CSV files are generated by AWS, so I have no control over how they escape the double quote. I tried two double quotes "", but that did not work either. I think using \" to escape a " inside a string that is delimited with " is the proper approach.

@klaemo
Owner

klaemo commented Jul 26, 2016

I'm not saying \ isn't a thing, it's just that JavaScript interprets it as an escape character. Create two strings in your browser console or node REPL, one with backslash escaping and one without. You'll see they end up being identical. The backslash doesn't appear in the string anymore.
That being the case, I don't think I can add support for this.

But: I could also be totally wrong and missing the point.
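
The REPL experiment described above can be reproduced with the sketch below. The last part is an addition not taken from the thread: it assumes the log text arrives as data (for example, read from a file) rather than as a string literal, and uses String.raw to stand in for that case.

```javascript
// In source code, \" inside a string literal is just a quote character.
const plain = '"';
const escapedLiteral = '\"';
console.log(plain === escapedLiteral); // true
console.log(escapedLiteral.length);    // 1

// A backslash that is part of the data itself is a separate character.
// String.raw (or a doubled backslash in a literal) stands in here for
// text read from a file; this case is an assumption, not from the thread.
const fromData = String.raw`\"`;
console.log(fromData.length);          // 2
console.log(fromData === '\\"');       // true
```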


3 participants