UTF-8 BOM not accounted for in JsonLocation.getByteOffset() #533
Thank you for reporting this and doing the troubleshooting. Sounds like a bug.
UTF8StreamJsonParser tracks the read pointer (offset) and the number of bytes processed separately, and uses both to generate JsonLocation.

When the byte payload starts with a UTF BOM, ByteSourceJsonBootstrapper processes a few bytes ahead of the parser, advances the offset, and passes the newly computed offset to the parser without telling it that some bytes have already been pre-processed.

With this change, the number of bytes pre-processed for encoding detection is passed to the parser. JsonLocation instances returned by the parser now point to the correct byte offset when the payload has a BOM.

Issue: FasterXML#533
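To illustrate the bookkeeping the commit message describes, here is a minimal sketch that uses only the Java standard library. It is not Jackson's code: the names BomOffsetSketch, utf8BomLength, and OffsetTracker are hypothetical, and the sketch assumes a UTF-8 BOM (bytes EF BB BF). It only shows why a parser that is not told about the pre-processed bytes reports an offset that is too small by the BOM length.

```java
import java.nio.charset.StandardCharsets;

public class BomOffsetSketch {

    /** Returns 3 if the data starts with the UTF-8 BOM (EF BB BF), otherwise 0. */
    public static int utf8BomLength(byte[] data) {
        if (data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF) {
            return 3;
        }
        return 0;
    }

    /**
     * Hypothetical stand-in for a parser's position tracking.
     * bytesPreProcessed is what was consumed before the parser started
     * (for example, a BOM skipped during encoding detection).
     */
    public static final class OffsetTracker {
        private final long bytesPreProcessed;
        private long bytesConsumed;

        public OffsetTracker(long bytesPreProcessed) {
            this.bytesPreProcessed = bytesPreProcessed;
        }

        public void consume(int n) {
            bytesConsumed += n;
        }

        /** Offset relative to the start of the original byte array. */
        public long byteOffset() {
            return bytesPreProcessed + bytesConsumed;
        }
    }

    public static void main(String[] args) {
        byte[] json = "{\"a\":1}".getBytes(StandardCharsets.UTF_8); // 7 bytes
        byte[] withBom = new byte[json.length + 3];
        withBom[0] = (byte) 0xEF;
        withBom[1] = (byte) 0xBB;
        withBom[2] = (byte) 0xBF;
        System.arraycopy(json, 0, withBom, 3, json.length);

        int skipped = utf8BomLength(withBom);

        // Tracker that is never told about the skipped BOM bytes.
        OffsetTracker unaware = new OffsetTracker(0);
        // Tracker that receives the number of pre-processed bytes.
        OffsetTracker aware = new OffsetTracker(skipped);

        unaware.consume(json.length);
        aware.consume(json.length);

        System.out.println(unaware.byteOffset()); // 7, short by the BOM length
        System.out.println(aware.byteOffset());   // 10, matches withBom.length
    }
}
```

The design point is simply that whichever component skips bytes before the parser starts has to hand that count over, otherwise every reported offset is relative to the first byte after the BOM rather than to the start of the array.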
I submitted a PR a couple of weeks ago. Please let me know of any feedback or necessary changes.
Looks good. I can merge this in. The only thing we now need is the Contributor License Agreement, available at https://github.com/FasterXML/jackson/blob/master/contributor-agreement.pdf Thank you in advance!
Just submitted the contributor agreement.
@fabienrenaud I think I'd rather backport it only to 2.10 (I had to do that manually). But 2.10.0 should be out within a month.
Version: Jackson 2.9.8
parser.getCurrentLocation().getByteOffset() returns the wrong byte offset for the underlying byte array when the payload starts with a BOM. The JSON parser handles such payloads correctly, but the JsonLocation it returns ignores the offset introduced by the BOM.
Full standalone repro:
Output:
You can see that the result for the BOM payload is shifted to the left by exactly 3 bytes, while other payloads, with or without padding characters, are handled as expected.