WASM build can crash with "RuntimeError Index out of bounds" when converting UTF-16LE text to UTF8 #11

Closed
ajrcarey opened this issue Feb 11, 2022 · 7 comments

Comments

@ajrcarey
Owner

Follow-on from #10. The examples/objects sample, when compiled to WASM and run in the browser, will crash with a RuntimeError Index out of bounds during text conversion. Strangely, this appears to be happening when attempting to return a value from utils::get_string_from_pdfium_utf16le_bytes(), which may suggest an out-of-memory problem with the stack frame sizing in the default WASM compile settings.

@ajrcarey
Owner Author

Possibly related to rustwasm/wasm-pack#479

@ajrcarey
Owner Author

ajrcarey commented Feb 24, 2022

Set examples/.cargo/config.toml to create an 8 MB stack rather than the default 1 MB, as per rustwasm/wasm-pack#479. Confirmed that the 8 MB stack size was applied correctly using the wasm2wat tool. The RuntimeError continues. Used the debug profile to confirm that the problem occurs in the following WASM instruction:

call $core::ptr::drop_in_place<alloc::vec::Vec<u8>>::h52e9c51caa5dfb98

inside the compiled utils::get_string_from_pdfium_utf16le_bytes() function. This appears to happen right on the exit from the function. Since the byte buffer containing the UTF-16LE data is taken by ownership into utils::get_string_from_pdfium_utf16le_bytes(), I suspect this is the freeing of that byte buffer.

drop_in_place() calls a stack of functions that ultimately end up inside the memory allocator.
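
To make the ownership point concrete, here is a minimal sketch of a function that takes a UTF-16LE byte buffer by value and returns a String. The function name and body are hypothetical stand-ins, not the actual implementation of utils::get_string_from_pdfium_utf16le_bytes(), and the NUL terminator handling is an assumption about the buffer layout.

```rust
/// Hypothetical stand-in for the crate's helper: takes ownership of a
/// UTF-16LE byte buffer and returns an owned String.
fn utf16le_bytes_to_string(bytes: Vec<u8>) -> String {
    // Pair up bytes into little-endian u16 code units.
    // A trailing odd byte, if any, is ignored by chunks_exact().
    let units: Vec<u16> = bytes
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
        .collect();

    // Assumption: the buffer may end with a NUL terminator; stop at the first one.
    let end = units.iter().position(|&unit| unit == 0).unwrap_or(units.len());

    // Invalid code units are replaced with U+FFFD rather than causing an error.
    String::from_utf16_lossy(&units[..end])
    // `bytes` goes out of scope here, so its heap allocation is freed as the
    // function exits. This is the point where a corrupted heap would surface.
}
```

Because the buffer is moved into the function, the deallocation happens on the way out, which matches the position of the drop_in_place() call in the compiled output.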

@ajrcarey
Owner Author

ajrcarey commented Feb 24, 2022

The actual error occurs in call $dlmalloc::dlmalloc::Dlmalloc<A>::malloc::h1538d4b11d1da1be, i.e. inside the standard Rust allocator used when targeting the wasm32-unknown-unknown architecture, dlmalloc. Could consider switching out for a different allocator when compiling to WASM?
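
For reference, Rust selects the process-wide allocator through the #[global_allocator] attribute. The sketch below names the default System allocator explicitly so that it compiles without extra dependencies; on wasm32-unknown-unknown, System is the dlmalloc-backed allocator. Swapping in wee_alloc would, as far as I know, mean replacing the static with wee_alloc::WeeAlloc::INIT, but treat that as an assumption to check against the crate's documentation. The helper function is hypothetical.

```rust
use std::alloc::System;

// Every heap allocation in the program goes through this static.
// Replacing `System` with another type implementing `GlobalAlloc`
// swaps the allocator for the whole binary.
#[global_allocator]
static GLOBAL: System = System;

/// Hypothetical helper: allocates a zeroed buffer via the global allocator.
fn allocate_buffer(len: usize) -> Vec<u8> {
    vec![0u8; len]
}
```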

@ajrcarey
Owner Author

ajrcarey commented Feb 24, 2022

Changing the allocator to wee-alloc instead of dlmalloc changes the point at which the failure occurs (it carries on a bit longer), but a RuntimeError is still thrown. On Edge and Chrome, the error is reported as "memory access out of bounds", which is a bit more descriptive at least. A big hint, however, is that all three browsers show the heap size as 35.6 MB, with pdfium consuming roughly 10 MB for its WASM heap and pdfium-render consuming a little over 20 MB. The heap sizes are the same across browsers, which strongly suggests to me a set size limit of about 32 MB for the entire runtime of the browser tab.

@ajrcarey
Owner Author

ajrcarey commented Feb 24, 2022

Growing the module heap using instance.memory.grow() does correctly raise the heap limit - in my testing I grew the heaps assigned to both pdfium and pdfium-render by 100 MB each, but the RuntimeError occurs at the same place :/

Forcing Rust to avoid freeing the byte buffer via std::mem::forget() shifts the allocation failure to a call to call $log::__private_api_log::h8c2be2e67ed23b4a, which itself fails on a call to call $alloc::fmt::format::h6ab9c6dede04b06a. This suggests that the failure is now occurring, somewhat ironically, in a log::info!() debugging statement. That ultimately does not matter; the point is that, whether the error occurs in an allocation or a deallocation, it is nevertheless occurring.
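
As a small illustration of what std::mem::forget() does, the sketch below uses a hypothetical Tracked type whose destructor sets a flag. Forgetting the value skips the destructor, and for a Vec that means the deallocation never runs. Nothing here is taken from pdfium-render.

```rust
use std::cell::Cell;
use std::mem;

/// Hypothetical type that records whether its destructor has run.
struct Tracked<'a>(&'a Cell<bool>);

impl Drop for Tracked<'_> {
    fn drop(&mut self) {
        self.0.set(true);
    }
}

/// Returns true if the destructor ran, false if the value was leaked.
fn was_dropped(leak: bool) -> bool {
    let flag = Cell::new(false);
    let value = Tracked(&flag);

    if leak {
        // Takes ownership without running the destructor.
        mem::forget(value);
    } else {
        // Runs the destructor immediately.
        drop(value);
    }

    flag.get()
}
```

Skipping the free only moves the symptom: if the heap is already corrupted, the next allocator call that touches the damaged region fails instead.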

I need better disassembly in order to diagnose exactly what values are triggering the out of bounds error.

@ajrcarey
Owner Author

ajrcarey commented Feb 24, 2022

Well, the stack frame sizing turned out to be a red herring after all. The error during the drop_in_place() deallocation should have been the big hint: a buffer was being allocated with the wrong length. The call to FPDFTextObj_GetText() was the culprit. Its return value specifies the number of bytes copied into the buffer in Pdfium's WASM heap, but when copying that buffer back to pdfium-render's WASM heap we passed that value to core::ptr::mut_ptr::copy_from() as the count of FPDF_WCHARs to copy, which is twice as many bytes. This quite predictably ended up corrupting the memory heap. What an annoying wild goose chase.
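
The unit mismatch can be sketched as follows. The count argument of copy_from() on a *mut u16 is measured in u16 elements, not bytes, so a byte count has to be divided by the element size first. The function and parameter names are hypothetical, and the two slices merely stand in for the two WASM heaps; this is not the actual code in WasmPdfiumBindings::FPDFTextObj_GetText().

```rust
use std::mem::size_of;

/// Copies UTF-16 code units out of `src`, where `bytes_written` is a length
/// reported in bytes (hypothetical stand-in for the Pdfium return value).
fn copy_back(src: &[u16], bytes_written: usize) -> Vec<u16> {
    // Convert the byte count into a count of 2-byte code units.
    // Passing `bytes_written` directly would copy twice as many bytes.
    let unit_count = bytes_written / size_of::<u16>();
    assert!(unit_count <= src.len(), "reported length exceeds source buffer");

    let mut dst = vec![0u16; unit_count];

    // SAFETY: both pointers are aligned for u16 and valid for `unit_count`
    // elements, as checked by the assertion and the allocation above.
    unsafe {
        dst.as_mut_ptr().copy_from(src.as_ptr(), unit_count);
    }

    dst
}
```

With a 6-byte report, only three code units are copied; using 6 as the element count would read and write 12 bytes and overrun a 6-byte destination.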

@ajrcarey
Owner Author

Removed resizing of stack frame in examples/.cargo/config.toml, as it was never the problem. Reset default allocator from wee-alloc to dlmalloc. Removed explicit heap memory growth in JavaScript. Corrected error in WasmPdfiumBindings::FPDFTextObj_GetText(). Removed debugging statements. Bumped crate version to 0.5.5. Pushed bug fix release to crates.io.
