-
-
Notifications
You must be signed in to change notification settings - Fork 795
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Account for bytes processed by encoding detection #534
Conversation
Ah! Sorry I somehow missed this part. Since we already got your CLA (I think?) I'll just need to quickly skim over this and should be able to merge. The only blocked is that there's a CVE that I need to handle first, but I should be able to get to this next! Thank you again for contributing this! |
@@ -252,7 +254,7 @@ public JsonParser constructParser(ObjectReadContext readCtxt, | |||
ByteQuadsCanonicalizer can = rootByteSymbols.makeChild(factoryFeatures); | |||
return new UTF8StreamJsonParser(readCtxt, _context, | |||
streamReadFeatures, formatReadFeatures, _in, can, | |||
_inputBuffer, _inputPtr, _inputEnd, _bufferRecyclable); | |||
_inputBuffer, _inputPtr, _inputEnd, bytesProcessed, _bufferRecyclable); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmh. This is bit problematic wrt backwards-compatibility. Need to think about it, needs to go in 2.10.0 not 2.9.x.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Problematic because UTF8StreamJsonParser
no longer has a constructor with 10 arguments (it has 11 now)? I could add a UTF8StreamJsonParser
public constructor that defaults bytesProcessed
to 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, only for that reason. So although technically one is supposed to always make sure jackson-core / jackson-databind minor versions align, we try to make sure that "adjacent" versions still have implementations of deprecated methods for at least one version.
So in this case keeping deprecated "old" version around would allow 2.9/2.10 combination to still work (2.9 databind, 2.10 core).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going to add back 10 arg constructor.
Is it really worth deprecating that 10 args constructor in favor of the 11 args one? We could keep both...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
10 arg constructor is back!
UTF8StreamJsonParser tracks read pointer (offset) and bytes processed separately and uses those to generate JsonLocation. When the byte payload starts with a UTF BOM, ByteSourceJsonBootstrapper processes a few bytes ahead of the parser, moves/increases the offset and passes the newly computed offset to the parser without telling it some bytes have been pre-processed. With this change, the number of bytes pre-processed for encoding detection is passed to the parser. JsonLocation instances returned by the parser now point to the correct byte offset when payload has a BOM. Issue: FasterXML#533
UTF8StreamJsonParser tracks read pointer (offset) and bytes processed
separately and uses those to generate JsonLocation. When the byte
payload starts with a UTF BOM, ByteSourceJsonBootstrapper processes a
few bytes ahead of the parser, moves/increases the offset and passes the
newly computed offset to the parser without telling it some bytes have
been pre-processed.
With this change, the number of bytes pre-processed for encoding
detection is passed to the parser. JsonLocation instances returned by
the parser now point to the correct byte offset when payload has a BOM.
Issue: #533