Fix resource decompression when resource has multi-byte characters #139

rspencer01 · 2025-01-08T13:25:29Z

The decompression code has bugs that will be difficult to fix with the version of JS supported by wkhtmltox. This prevents us from compressing the resources when saving the HTML to be converted.

Once this is done, we can use `TextDecoder` to decode resources as UTF-8 instead of UTF-16 (JavaScript's native string encoding), which fixes #136. There is still a small bug: if a multi-byte character straddles a 1024-byte boundary, it is truncated. This should be addressed by #137.
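To illustrate the boundary problem, here is a minimal TypeScript sketch. It assumes, hypothetically, that decompressed bytes are decoded in independent 1024-byte chunks; `decodeInChunks` and `decodeChunksNaively` are illustrative names, not functions from this project, and #137 may fix the bug differently. Decoding each chunk with a fresh decoder corrupts a two-byte character such as `é` that is split across chunks, whereas a single `TextDecoder` called with `{ stream: true }` buffers the incomplete sequence until the next chunk arrives.

```typescript
// Hypothetical: decode each 1024-byte chunk independently (the buggy pattern).
function decodeChunksNaively(bytes: Uint8Array, chunkSize = 1024): string {
  let out = "";
  for (let start = 0; start < bytes.length; start += chunkSize) {
    // A fresh decoder per chunk cannot see bytes from neighbouring chunks,
    // so a split multi-byte sequence becomes replacement characters (U+FFFD).
    out += new TextDecoder("utf-8").decode(bytes.subarray(start, start + chunkSize));
  }
  return out;
}

// Hypothetical: one decoder shared across chunks in streaming mode.
function decodeInChunks(bytes: Uint8Array, chunkSize = 1024): string {
  const decoder = new TextDecoder("utf-8");
  let out = "";
  for (let start = 0; start < bytes.length; start += chunkSize) {
    // { stream: true } holds back an incomplete trailing sequence
    // until the following chunk completes it.
    out += decoder.decode(bytes.subarray(start, start + chunkSize), { stream: true });
  }
  // Calling decode() with no arguments flushes anything still buffered.
  out += decoder.decode();
  return out;
}

const sample = "a".repeat(1023) + "é" + "b";
const encoded = new TextEncoder().encode(sample); // 1026 bytes; "é" occupies bytes 1023 and 1024
console.log(decodeInChunks(encoded) === sample);      // true
console.log(decodeChunksNaively(encoded) === sample); // false
```

The final argument-less `decode()` call matters: without it, bytes still held back in streaming mode would never be emitted.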


coveralls · 2025-01-08T13:32:32Z

coverage: 87.329% (-0.005%) from 87.334% when pulling becc70d on rspencer01:permit_no_compression into 7dfb490 on man-group:master.

The standard `String.fromCharCode`, which we were using to decode character by character, treats each value as a UTF-16 code unit. This broke resources encoded as UTF-8, so we now decode them with `TextDecoder` instead. However, the `TextDecoder` API is not available in wkhtmltox, which is why compression is disabled entirely when rendering with wkhtmltox.
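A minimal TypeScript sketch of the difference described above. The function names are illustrative, not taken from the project, and the sketch assumes `TextEncoder`/`TextDecoder` are available (which is exactly what wkhtmltox's engine lacks). Feeding UTF-8 bytes one at a time into `String.fromCharCode` turns each byte of a multi-byte character into a separate code unit, producing mojibake; `TextDecoder` reassembles the sequence.

```typescript
// Byte-by-byte decoding: each byte becomes its own UTF-16 code unit,
// so the two bytes of "é" (0xC3 0xA9) come out as "Ã" and "©".
function decodeByteByByte(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    out += String.fromCharCode(bytes[i]);
  }
  return out;
}

// TextDecoder interprets the bytes as UTF-8 and rebuilds multi-byte characters.
function decodeUtf8(bytes: Uint8Array): string {
  return new TextDecoder("utf-8").decode(bytes);
}

const utf8Bytes = new TextEncoder().encode("café"); // 5 bytes: 63 61 66 c3 a9
console.log(decodeByteByByte(utf8Bytes)); // "cafÃ©"
console.log(decodeUtf8(utf8Bytes));       // "café"
```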

Forbid storing compressed JS when rendering with wkhtmltox

e4001dc


rspencer01 changed the title ~~[wip] Forbid storing compressed JS when rendering with wkhtmltox~~ Fix resource decompression when resource has multi-byte characters Jan 8, 2025

mrjackbarker approved these changes Jan 8, 2025


rspencer01 merged commit 1c48cea into man-group:master Jan 8, 2025
3 checks passed

rspencer01 deleted the permit_no_compression branch January 8, 2025 16:13

rspencer01 mentioned this pull request Jan 11, 2025

Use browsers' DecompressionStream to do zlib decompression instead of rolling our own #142

Merged
