
[AArch64][CodeGen][CodeSize] Redundant 'and' can be removed with shifts in addr mode #34101

Open
llvmbot opened this issue Sep 27, 2017 · 7 comments
Labels
backend:AArch64 bugzilla Issues migrated from bugzilla confirmed Verified by a second party good first issue https://github.com/llvm/llvm-project/contribute missed-optimization

Comments


llvmbot commented Sep 27, 2017

Bugzilla Link 34753
Version trunk
OS Windows NT
Reporter LLVM Bugzilla Contributor

Extended Description

Test case:

int test(unsigned long long a, unsigned long long b, int *table) {    
  return table[(a * b) >> 58];  
}

Current assembly:

test:                                   // @test
// BB#0:                                // %entry
        mul x8, x1, x0
        lsr x8, x8, #56
        and x8, x8, #0xfc
        ldr w0, [x2, x8]
        ret

We should be able to remove the 'and' for a code-size win, as follows:

test:                                   // @test
// BB#0:                                // %entry
        mul x8, x1, x0
        lsr x8, x8, #58
        ldr w0, [x2, x8, lsl#2]
        ret

I'm not interested in pursuing this optimization, but I figured I'd file the bug, regardless.


llvmbot commented Sep 28, 2017

In DAGCombiner.cpp around line 5590 we have this target-independent combine which is folding the 'srl' and introducing the 'and'.

// fold (shl (srl x, c1), c2) -> (and (shl x, (sub c2, c1), MASK) or
//                               (and (srl x, (sub c1, c2), MASK)
// Only fold this if the inner shift has no other uses -- if it does, folding
// this will increase the total number of instructions.


llvmbot commented Sep 28, 2017

Here's the test in IR:

define i32 @test(i64 %a, i64 %b, i32* nocapture readonly %table) {
entry:
  %mul = mul i64 %b, %a
  %shr = lshr i64 %mul, 58
  %arrayidx = getelementptr inbounds i32, i32* %table, i64 %shr
  %0 = load i32, i32* %arrayidx, align 4
  ret i32 %0
}

And the initial DAG:

Initial selection DAG: BB#0 'test:entry'
SelectionDAG has 19 nodes:
  t0: ch = EntryToken
  t13: i64 = Constant<0>
        t6: i64,ch = CopyFromReg t0, Register:i64 %vreg2
              t4: i64,ch = CopyFromReg t0, Register:i64 %vreg1
              t2: i64,ch = CopyFromReg t0, Register:i64 %vreg0
            t7: i64 = mul t4, t2
          t9: i64 = srl t7, Constant:i64<58>
        t11: i64 = shl t9, Constant:i64<2>
      t12: i64 = add t6, t11
    t15: i32,ch = load<LD4[%arrayidx]> t0, t12, undef:i64
  t17: ch,glue = CopyToReg t0, Register:i32 %W0, t15
  t18: ch = AArch64ISD::RET_FLAG t17, Register:i32 %W0, t17:1


llvmbot commented Nov 13, 2017

I'm not sure how to implement this. This assembly

test:                                   // @test
// BB#0:                                // %entry
        mul x8, x1, x0
        lsr x8, x8, #58
        ldr w0, [x2, x8, lsl#2]
        ret

is what I get when I skip the fold (shl (srl x, c1), c2) -> (and (srl x, (sub c1, c2)), MASK).
I'm not sure what opt-addr-mode means. As I understand it, I need to add an if statement that calls canFoldInAddressingMode before the fold to check the addressing mode. Please correct me if I'm wrong.

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
@llvmbot llvmbot added the confirmed Verified by a second party label Jan 26, 2022
@ilinpv ilinpv added the good first issue https://github.com/llvm/llvm-project/contribute label Mar 30, 2022

llvmbot commented Mar 30, 2022

@llvm/issue-subscribers-good-first-issue

@varunkumare99

Is this still open to work on?


RKSimon commented May 30, 2023

Still seems to be a problem in trunk: https://simd.godbolt.org/z/595f991fP

@ParkHanbum

Can this issue be resolved by disabling the fold (shl (srl x, c1), c2) -> (and (shl x, (sub c2, c1)), MASK) when a code-size-related option is enabled?

ParkHanbum added a commit to ParkHanbum/llvm-project that referenced this issue Apr 21, 2024
Currently, the process of replacing bitwise operations consisting of
`LSR`/`LSL` with `AND` is performed by `DAGCombiner`.

However, in certain cases, the `AND` generated by this process
can be removed.

Consider the following case:
```
        lsr x8, x8, #56
        and x8, x8, #0xfc
        ldr w0, [x2, x8]
        ret
```

In this case, we can remove the `AND` by changing the target of `LDR`
to `[X2, X8, LSL #2]` and changing the right-shift amount from 56 to 58.

After the change:
```
        lsr x8, x8, #58
        ldr w0, [x2, x8, lsl #2]
        ret
```

This patch checks whether the `SHIFT` + `AND` operation on the load
target can be optimized, and optimizes it if so.
ParkHanbum added a commit to ParkHanbum/llvm-project that referenced this issue Apr 21, 2024
davemgreen pushed a commit that referenced this issue Aug 24, 2024
qiaojbao pushed a commit to GPUOpen-Drivers/llvm-project that referenced this issue Sep 30, 2024
…1863d2644

Local branch amd-gfx 08f1863 Merged main:6f618a7b8249e7baa3b2d18f8bbec3c5b6f6d24e into amd-gfx:e5edfda5900b
Remote branch main 77fccb3 [AArch64] Replace AND with LSL#2 for LDR target (llvm#34101) (llvm#89531)