Skip to content
This repository has been archived by the owner on Aug 17, 2019. It is now read-only.

fix: strip BOM when reading JSON files #8

Merged
merged 4 commits into from
Sep 6, 2017
Merged

fix: strip BOM when reading JSON files #8

merged 4 commits into from
Sep 6, 2017

Conversation

rmg
Copy link
Contributor

@rmg rmg commented Sep 2, 2017

fs.readFile* don't strip the BOM header, even when specifying 'utf8' as
the encoding, and JSON.parse() doesn't handle it either. There are
technically a bunch of BOM indicators, but this is the one seen most
commonly and actually appears in a number of package.json files in the
wild.

See nodejs/node#6924, nodejs/node#3040 for background.

@rmg
Copy link
Contributor Author

rmg commented Sep 2, 2017

Rebased after #6 was merged.

@rmg
Copy link
Contributor Author

rmg commented Sep 2, 2017

I don't have a large sample size, but it looks like among my local cache, the problem file is a singular occurrence:

$ find ~/.npm/_cacache/content-v2/sha1 -type file | while read f; do \
   tar -C ~/deleteme/bom --strip-components 1 -xUzf $f package/package.json \
    && file ~/deleteme/bom/package.json; \
  done | cut -d: -f2 | sort | uniq -c
2810  ASCII text
  56  ASCII text, with CRLF line terminators
   7  ASCII text, with very long lines
   1  ASCII text, with very long lines, with no line terminators
   1  HTML document text, ASCII text
   1  UTF-8 Unicode (with BOM) text
 170  UTF-8 Unicode text
   1  UTF-8 Unicode text, with CRLF line terminators
   2  UTF-8 Unicode text, with very long lines

@rmg
Copy link
Contributor Author

rmg commented Sep 3, 2017

Went hunting to see how npm does this, in case it has a clever way of handling it.. Turns out it is basically doing the same thing: https://github.com/npm/npm/blob/latest/lib/utils/parse-json.js

if (content.charCodeAt(0) === 0xFEFF) {
  content = content.slice(1)
}

@rmg
Copy link
Contributor Author

rmg commented Sep 3, 2017

@zkat if this feels like deja vu.. npm/npm#8724

@rmg rmg force-pushed the stripBOM branch 2 times, most recently from 66a3363 to 6fb3db0 Compare September 3, 2017 02:59
@zkat
Copy link
Owner

zkat commented Sep 5, 2017

This LGTM.

Can we get a test for it, too?

Copy link
Owner

@zkat zkat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but needs a test (for all of npm-shrinkwrap.json, package-lock.json, and package.json, imo)

@rmg
Copy link
Contributor Author

rmg commented Sep 5, 2017

Can we get a test for it, too?

Incoming. I skipped it on the first pass because every time I looked at failing file with vim it "helpfully" stripped the BOM for me.

Looks good, but needs a test (for all of npm-shrinkwrap.json, package-lock.json, and package.json, imo)

Is it really necessary to test 3 different filenames in a unit test for readJson?

rmg added 2 commits September 5, 2017 08:37
fs.readFile* don't strip the BOM header, even when specifying 'utf8' as
the encoding, and JSON.parse() doesn't handle it either. There are
technically a bunch of BOM indicators, but this is the one seen most
commonly and actually appears in a number of package.json files in the
wild.

See nodejs/node#6924, nodejs/node#3040 for background.
@rmg
Copy link
Contributor Author

rmg commented Sep 5, 2017

@zkat I've rebased and added a unit test for for readJson.

@rmg rmg mentioned this pull request Sep 5, 2017
@zkat zkat merged commit 2529149 into zkat:latest Sep 6, 2017
@rmg rmg deleted the stripBOM branch September 6, 2017 15:10
# for free to subscribe to this conversation on GitHub. Already have an account? #.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants