Inefficient String Concatenation When Compiling C to WASM/WAT #389

Open
XinyuShe opened this issue Mar 4, 2024 · 2 comments

Comments

@XinyuShe

XinyuShe commented Mar 4, 2024

I've run into an issue while compiling C code to WASM and then converting it to WAT. It concerns how string concatenation is handled in the WAT output.
b.zip
Here's a snippet of my C source code:

char src[50] = "Hello, ";
char dest[50] = "World!";
strcat(src, dest);

After compiling this C code to WASM and then converting it to WAT, I expected to find both strings 'Hello, ' and 'World!' in the data section of the WAT file. However, I could only find 'Hello, ' in the data section.

Instead of finding 'World!' as a contiguous string in the data section, I found it concatenated character by character in the function body, like so:

i32.const 87
local.set 44
local.get 4
local.get 44
i32.store8 offset=16
i32.const 111
local.set 45
local.get 4
local.get 45
i32.store8 offset=17
i32.const 114
local.set 46
local.get 4
local.get 46
i32.store8 offset=18
i32.const 108
local.set 47
local.get 4
local.get 47
i32.store8 offset=19
i32.const 100
local.set 48
local.get 4
local.get 48
i32.store8 offset=20
i32.const 33
local.set 49
local.get 4
local.get 49
i32.store8 offset=21

I'm puzzled by this behavior. Storing the strings in the data section seems to be a more efficient approach than concatenating them character by character in the function body. Is there a specific reason for this implementation? Could this be an optimization issue with the compiler?

@sunfishcode
Member

It's probably a target-independent optimization in upstream LLVM doing this. Are you compiling with -O2? If so, it may be worth trying with -Oz or -Os instead.

@XinyuShe
Author

XinyuShe commented Mar 5, 2024

Hi @sunfishcode, thanks for your suggestion!
I tried -O0, -O1, -O2, -O3, -Os, and -Oz one by one. Only the -O0 output contains the string 'Hello, ', and none of them contains the string 'World!'.
Here is my command:

 clang -O0 --target=wasm32-wasi -o b_o.wasm b.c ; wasm2wat b_o.wasm -o b_o.wat
