Fix resource decompression when resource has multi-byte characters #139

Merged
rspencer01 merged 2 commits into man-group:master from rspencer01:permit_no_compression
Jan 8, 2025

rspencer01 (Collaborator) commented Jan 8, 2025

The decompression code has bugs that will be difficult to fix with the version of JS supported by wkhtmltox. This PR therefore stops us from compressing the resources when saving the HTML to convert with wkhtmltox.

Once this is done, we can use TextDecoder to decode resources as UTF-8 instead of UTF-16 (the default for JavaScript), which fixes #136. There is still a small bug: if a multi-byte character straddles a 1024-byte boundary, we truncate it. This should be addressed by #137.
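
To illustrate the boundary problem mentioned above, here is a minimal sketch. It is not the project's code and not necessarily how #137 handles it. The function names and the tiny chunk sizes (standing in for the 1024-byte blocks) are assumptions for illustration, and the exact symptom in the project may differ: here the split character shows up as replacement characters (U+FFFD) rather than disappearing.

```javascript
// "aé" in UTF-8 is 0x61 0xC3 0xA9. The chunks are split so that the
// two-byte "é" straddles the boundary between them.
const chunks = [new Uint8Array([0x61, 0xc3]), new Uint8Array([0xa9])];

// Hypothetical helper: decode every chunk on its own. The half character
// at the end of chunk 1 and the half at the start of chunk 2 are both lost.
function decodeChunksIndependently(chunks) {
  return chunks.map((c) => new TextDecoder("utf-8").decode(c)).join("");
}

// Hypothetical helper: reuse one decoder with { stream: true }, which
// buffers an incomplete trailing sequence until the next chunk arrives.
function decodeChunksStreaming(chunks) {
  const decoder = new TextDecoder("utf-8");
  let out = "";
  for (const chunk of chunks) {
    out += decoder.decode(chunk, { stream: true });
  }
  // A final call without arguments flushes anything still buffered.
  return out + decoder.decode();
}

console.log(decodeChunksIndependently(chunks)); // "a" followed by two U+FFFD
console.log(decodeChunksStreaming(chunks)); // "aé"
```

Like the rest of the TextDecoder-based path, this only applies where TextDecoder exists, so it would not help under wkhtmltox.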

coveralls commented Jan 8, 2025

Coverage Status

coverage: 87.329% (-0.005%) from 87.334%
when pulling becc70d on rspencer01:permit_no_compression
into 7dfb490 on man-group:master.

The `String.fromCharCode` call we were using to decode resources
character by character treats each value as a UTF-16 code unit. This
breaks UTF-8 encoded resources, where one character can span several
bytes, so we now use TextDecoder to decode them.

However, the TextDecoder API is not available in wkhtmltox, which is
why compression is disabled entirely when rendering with wkhtmltox.
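
A minimal sketch of the difference described in this commit message, assuming a runtime where TextDecoder is available (so not wkhtmltox). The helper names are hypothetical and are not taken from the repository.

```javascript
// UTF-8 encoding of "é" (U+00E9) is the two bytes 0xC3 0xA9.
const bytes = new Uint8Array([0xc3, 0xa9]);

// Hypothetical helper: the old approach, one fromCharCode call per byte.
// Each byte becomes its own UTF-16 code unit, so "é" turns into two characters.
function decodeByteByByte(bytes) {
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    out += String.fromCharCode(bytes[i]);
  }
  return out;
}

// Hypothetical helper: TextDecoder reads the whole byte sequence as UTF-8.
function decodeUtf8(bytes) {
  return new TextDecoder("utf-8").decode(bytes);
}

console.log(decodeByteByByte(bytes)); // "Ã©"
console.log(decodeUtf8(bytes)); // "é"
```

For pure ASCII resources both functions return the same string, which is likely why the problem only appears with multi-byte characters.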
rspencer01 changed the title from "[wip] Forbid storing compressed JS when rendering with wkhtmltox" to "Fix resource decompression when resource has multi-byte characters" on Jan 8, 2025
rspencer01 merged commit 1c48cea into man-group:master on Jan 8, 2025
3 checks passed
rspencer01 deleted the permit_no_compression branch on January 8, 2025 16:13
Successfully merging this pull request may close these issues.

Deflation doesn't handle multi-byte characters
3 participants