Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Inline TLS field access for linux/osx x64/arm64 #87082

Merged
merged 88 commits into from
Jul 6, 2023

Conversation

kunalspathak
Copy link
Member

@kunalspathak kunalspathak commented Jun 2, 2023

Extending #85619 and #82973 for linux.

       mov      rdi, 0x7FA9DB182F28      ; GOT descriptor address of `maxThreadStaticBlocks`
       mov      rax, 0x7FA9DB7A5DB0      ; function address of __tls_get_addr
       call     rax                      ; gets the address of `maxThreadStaticBlocks`
       cmp      dword ptr [rax], 2
       jl       SHORT G_M8151_IG06
       mov      rdi, qword ptr [rax+08H]   ; access the `threadStaticBlocks` variable
       mov      rdi, qword ptr [rdi+10H]
       test     rdi, rdi
       je       SHORT G_M8151_IG06
       mov      rax, bword ptr [rdi]
       add      rax, 16

Base:

|          Method |     Mean |     Error |    StdDev |   Median |      Min |      Max | Allocated |
|---------------- |---------:|----------:|----------:|---------:|---------:|---------:|----------:|
| GetThreadStatic | 3.383 ns | 0.0378 ns | 0.0316 ns | 3.374 ns | 3.346 ns | 3.459 ns |         - |
| SetThreadStatic | 5.334 ns | 0.0733 ns | 0.0650 ns | 5.338 ns | 5.261 ns | 5.451 ns |         - |

Diff:

|          Method |     Mean |     Error |    StdDev |   Median |      Min |      Max | Allocated |
|---------------- |---------:|----------:|----------:|---------:|---------:|---------:|----------:|
| GetThreadStatic | 1.048 ns | 0.0200 ns | 0.0177 ns | 1.041 ns | 1.026 ns | 1.087 ns |         - |
| SetThreadStatic | 2.844 ns | 0.0379 ns | 0.0354 ns | 2.837 ns | 2.807 ns | 2.915 ns |         - | 

This also adds mrs instruction in arm64 to access TLS on linux. The codegen is:

G_M8151_IG02:  
            mrs     x0, tpidr_el0
            ldr     w1, [x0, #0xE8]  ; `maxThreadStaticBlocks` value
            cmp     w1, #2
            blt     G_M8151_IG05
            ldr     x0, [x0, #0xF0]  ; `threadStaticBlocks` array
            ldr     x1, [x0, #0x10]
            cbz     x1, G_M8151_IG05

G_M8151_IG03:  
            mov     w0, #5
            str     w0, [x1, #0x24]

osx/arm64:

Because of register allocator doesn't reuse the x0 properly, we end up creating the same constant again in x1, but this is how osx/arm64 looks like:

            movz    x0, #0x8C00
            movk    x0, #817 LSL #16
            movk    x0, #1 LSL #32
            movz    x1, #0x8C00
            movk    x1, #817 LSL #16
            movk    x1, #1 LSL #32
            ldr     x1, [x1]
            blr     x1

osx/x64:

image

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 2, 2023
@ghost ghost assigned kunalspathak Jun 2, 2023
@ghost
Copy link

ghost commented Jun 2, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Extending #85619 and #82973 for linux.

Author: kunalspathak
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

JITDUMP("offsetOfThreadLocalStoragePointer= %u\n", threadStaticBlocksInfo.offsetOfThreadLocalStoragePointer);
JITDUMP("offsetOfMaxThreadStaticBlocks= %u\n", offsetOfMaxThreadStaticBlocksVal);
JITDUMP("offsetOfThreadStaticBlocks= %u\n", offsetOfThreadStaticBlocksVal);
JITDUMP("tlsIndex= %u\n", (ssize_t)threadStaticBlocksInfo.tlsIndex.addr);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dump threadStaticBlocksInfo.tlsIndex.accessType also?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it will be useful? I don't see we print accessType elsewhere.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it will be useful? I don't see we print accessType elsewhere.

I don't know. Structurally, it seems odd to omit it. Up to you.

@ghost ghost added needs-author-action An issue or pull request that requires more info or actions from the author. and removed needs-author-action An issue or pull request that requires more info or actions from the author. labels Jul 1, 2023
@ghost ghost added needs-author-action An issue or pull request that requires more info or actions from the author. and removed needs-author-action An issue or pull request that requires more info or actions from the author. labels Jul 3, 2023
@jakobbotsch jakobbotsch dismissed their stale review July 3, 2023 21:15

Addressed

@kunalspathak
Copy link
Member Author

Ping @BruceForstall

Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM w/ one nit

# for free to subscribe to this conversation on GitHub. Already have an account? #.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants