Fix decoding UTF-8 constant pool entries #150

Merged (1 commit) on Oct 4, 2021

Conversation

Ladicek
Contributor

@Ladicek Ladicek commented Oct 4, 2021

Constant pool entries of type `CONSTANT_Utf8_info` do not, despite
the name, use the UTF-8 encoding. They use a modified variant
of UTF-8, as specified by `java.io.Data{Input,Output}`.
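
To make the difference concrete, here is a small sketch that compares the two encodings. The class and method names are made up for illustration; only `DataOutputStream.writeUTF` and `String.getBytes` are real JDK APIs. `writeUTF` prepends a 2-byte length, which the helper strips so that only the encoded characters are compared.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ModifiedUtf8Demo {
    // Hypothetical helper: encodes with DataOutputStream.writeUTF and
    // drops the leading 2-byte length prefix.
    static byte[] modifiedUtf8(String s) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            new DataOutputStream(out).writeUTF(s);
            byte[] all = out.toByteArray();
            return Arrays.copyOfRange(all, 2, all.length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // ASCII text: both encodings produce the same bytes.
        System.out.println(Arrays.equals(standardUtf8("abc"), modifiedUtf8("abc"))); // true

        // The null character: one zero byte vs. the two bytes 0xC0 0x80.
        System.out.println(Arrays.toString(standardUtf8("\0"))); // [0]
        System.out.println(Arrays.toString(modifiedUtf8("\0"))); // [-64, -128]

        // A supplementary character (a surrogate pair in a Java String):
        // 4 bytes in standard UTF-8, 2 x 3 bytes in the modified variant.
        String supplementary = "\uD83D\uDE00";
        System.out.println(standardUtf8(supplementary).length); // 4
        System.out.println(modifiedUtf8(supplementary).length); // 6
    }
}
```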

When decoding these entries, the `Indexer.decodeUtf8Entry` method
interpreted the data as UTF-8. This usually didn't cause issues, because
the constants are mostly human-readable strings that fit in the ASCII
table, in which case the two encodings do not differ.

With machine-generated content, the difference can easily show up,
for example in a string that contains the null character. One
realistic example is the Kotlin standard library JAR, where the
`kotlin/collections/ArraysKt___ArraysKt.class` class contains a
`@KotlinMetadata` annotation whose `d1` member contains such a
"weird" string.
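
As a rough illustration of the failure mode (not the actual Jandex code), the sketch below decodes the modified UTF-8 bytes of `"a\0b"` as if they were standard UTF-8. The class and method names are hypothetical. The two-byte form `0xC0 0x80` is not valid standard UTF-8, so the JDK decoder substitutes replacement characters and the null character is lost.

```java
import java.nio.charset.StandardCharsets;

class WrongDecodingDemo {
    // Modified UTF-8 encoding of "a\0b": 'a', 0xC0 0x80 (the null character), 'b'.
    static final byte[] MODIFIED = {'a', (byte) 0xC0, (byte) 0x80, 'b'};

    // Hypothetical stand-in for a decoder that assumes standard UTF-8.
    static String decodeAsStandardUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String decoded = decodeAsStandardUtf8(MODIFIED);
        // The result is not the original string.
        System.out.println(decoded.equals("a\0b")); // false
        // Malformed input is replaced with U+FFFD.
        System.out.println(decoded.indexOf('\uFFFD') >= 0); // true
    }
}
```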

The fix is simple: use `DataInputStream` to read a `String` out of
the byte array.
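
A minimal sketch of that approach follows; it is not the code from this commit. The class name, method name, and parameters are assumptions. It relies on the fact that `DataInputStream.readUTF` expects a 2-byte length followed by the modified UTF-8 bytes, which is also how a `CONSTANT_Utf8_info` entry is laid out after its tag byte.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class Utf8EntryDecoder {
    // Hypothetical helper: `offset` must point at the u2 length field of the
    // entry, i.e. just past the tag byte.
    static String decode(byte[] pool, int offset) {
        // Read the unsigned 16-bit length so the stream can be bounded
        // to exactly this one entry.
        int length = ((pool[offset] & 0xFF) << 8) | (pool[offset + 1] & 0xFF);
        try (DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(pool, offset, length + 2))) {
            return in.readUTF();
        } catch (IOException e) {
            // readUTF reports malformed data as UTFDataFormatException.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // tag = 1 (CONSTANT_Utf8), length = 4, then 'a', 0xC0 0x80, 'b'.
        byte[] entry = {1, 0, 4, 'a', (byte) 0xC0, (byte) 0x80, 'b'};
        System.out.println(decode(entry, 1).equals("a\0b")); // true
    }
}
```

Bounding the `ByteArrayInputStream` to `length + 2` bytes keeps the read from running into a following entry if the data is corrupt.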

Fixes #49

@Ladicek Ladicek added this to the 3.0.0 milestone Oct 4, 2021
@Ladicek Ladicek merged commit aac1389 into smallrye:smallrye Oct 4, 2021
@Ladicek Ladicek deleted the utf8-constant-encoding branch October 4, 2021 11:05
@Ladicek Ladicek linked an issue Oct 4, 2021 that may be closed by this pull request
Development

Successfully merging this pull request may close these issues.

Exception when writing big indices
1 participant