Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Account for bytes processed by encoding detection #534

Merged
merged 1 commit into from
May 29, 2019

Conversation

fabienrenaud
Copy link
Contributor

UTF8StreamJsonParser tracks read pointer (offset) and bytes processed
separately and uses those to generate JsonLocation. When the byte
payload starts with a UTF BOM, ByteSourceJsonBootstrapper processes a
few bytes ahead of the parser, moves/increases the offset and passes the
newly computed offset to the parser without telling it some bytes have
been pre-processed.
With this change, the number of bytes pre-processed for encoding
detection is passed to the parser. JsonLocation instances returned by
the parser now point to the correct byte offset when payload has a BOM.

Issue: #533

@cowtowncoder
Copy link
Member

cowtowncoder commented May 14, 2019

Ah! Sorry I somehow missed this part. Since we already got your CLA (I think?) I'll just need to quickly skim over this and should be able to merge. The only blocked is that there's a CVE that I need to handle first, but I should be able to get to this next!

Thank you again for contributing this!

@@ -252,7 +254,7 @@ public JsonParser constructParser(ObjectReadContext readCtxt,
ByteQuadsCanonicalizer can = rootByteSymbols.makeChild(factoryFeatures);
return new UTF8StreamJsonParser(readCtxt, _context,
streamReadFeatures, formatReadFeatures, _in, can,
_inputBuffer, _inputPtr, _inputEnd, _bufferRecyclable);
_inputBuffer, _inputPtr, _inputEnd, bytesProcessed, _bufferRecyclable);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmh. This is bit problematic wrt backwards-compatibility. Need to think about it, needs to go in 2.10.0 not 2.9.x.

Copy link
Contributor Author

@fabienrenaud fabienrenaud May 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Problematic because UTF8StreamJsonParser no longer has a constructor with 10 arguments (it has 11 now)? I could add a UTF8StreamJsonParser public constructor that defaults bytesProcessed to 0.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, only for that reason. So although technically one is supposed to always make sure jackson-core / jackson-databind minor versions align, we try to make sure that "adjacent" versions still have implementations of deprecated methods for at least one version.

So in this case keeping deprecated "old" version around would allow 2.9/2.10 combination to still work (2.9 databind, 2.10 core).

Copy link
Contributor Author

@fabienrenaud fabienrenaud May 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to add back 10 arg constructor.
Is it really worth deprecating that 10 args constructor in favor of the 11 args one? We could keep both...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

10 arg constructor is back!

UTF8StreamJsonParser tracks read pointer (offset) and bytes processed
separately and uses those to generate JsonLocation. When the byte
payload starts with a UTF BOM, ByteSourceJsonBootstrapper processes a
few bytes ahead of the parser, moves/increases the offset and passes the
newly computed offset to the parser without telling it some bytes have
been pre-processed.
With this change, the number of bytes pre-processed for encoding
detection is passed to the parser. JsonLocation instances returned by
the parser now point to the correct byte offset when payload has a BOM.

Issue: FasterXML#533
@cowtowncoder cowtowncoder merged commit 5573f4a into FasterXML:master May 29, 2019
@cowtowncoder cowtowncoder added 2.10 and removed 2.9 labels May 29, 2019
@cowtowncoder cowtowncoder added this to the 2.10.0 milestone May 29, 2019
@fabienrenaud fabienrenaud deleted the fix-533 branch November 17, 2019 22:57
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants