SimplifyDemandedBitsForTargetNode - Missing AArch64ISD::BIC & AArch64ISD::BICi handling #53881

RKSimon · 2022-02-16T14:30:37Z

These get lowered quite early, meaning that there are missed opportunities to further simplify the DAG based on the masked bits.

Noticed while looking at Issue #53622

llvmbot · 2022-02-16T14:31:04Z

@llvm/issue-subscribers-backend-aarch64

llvmbot · 2022-09-23T14:45:14Z

@llvm/issue-subscribers-good-first-issue

sjoerdmeijer · 2023-08-11T07:44:46Z

Hi @RKSimon, can you expand a little on what the idea of this work is? Do we expect it to trigger a rewrite?

RKSimon · 2023-08-11T10:09:48Z

@sjoerdmeijer I'll try to find the work I was doing on #53622 and see if I can repro

RKSimon · 2023-08-13T18:27:08Z

@davemgreen @sjoerdmeijer This is the kind of thing I had in mind: https://rust.godbolt.org/z/M6WbxTaYv

define <8 x i16> @haddu_known(<8 x i8> %a0, <8 x i8> %a1) {
  %x0 = zext <8 x i8> %a0 to <8 x i16>
  %x1 = zext <8 x i8> %a1 to <8 x i16>
  %hadd = call <8 x i16> @llvm.aarch64.neon.uhadd.v8i16(<8 x i16> %x0, <8 x i16> %x1)
  %res = and <8 x i16> %hadd, <i16 511, i16 511, i16 511, i16 511,i16 511, i16 511, i16 511, i16 511>
  ret <8 x i16> %res
}
declare <8 x i16> @llvm.aarch64.neon.uhadd.v8i16(<8 x i16>, <8 x i16>)

->

Combining: t0: ch,glue = EntryToken
Optimized lowered selection DAG: %bb.0 'haddu_known:'
SelectionDAG has 15 nodes:
  t0: ch,glue = EntryToken
          t2: v8i8,ch = CopyFromReg t0, Register:v8i8 %0
        t5: v8i16 = zero_extend t2
          t4: v8i8,ch = CopyFromReg t0, Register:v8i8 %1
        t6: v8i16 = zero_extend t4
      t8: v8i16 = llvm.aarch64.neon.uhadd TargetConstant:i64<635>, t5, t6
    t18: v8i16 = AArch64ISD::BICi t8, Constant:i32<254>, Constant:i32<8>
  t13: ch,glue = CopyToReg t0, Register:v8i16 $q0, t18
  t14: ch = AArch64ISD::RET_GLUE t13, Register:v8i16 $q0, t13:1

The AND should be removable, as the uhadd (ISD::AVGFLOORU) node should never have the topmost bits set, but the AND gets converted to an AArch64ISD::BICi node very early, so we need:

- SelectionDAG::computeKnownBits to handle ISD::AVGFLOORU nodes (this is a very basic extension of the ISD::AVGCEILU handling from D119629 for #53622, "[DAG] Add knownbits/signbits handling for ISD::AVG* nodes")
- SimplifyDemandedBitsForTargetNode handling for AArch64ISD::BICi to correctly grab the known bits for the input operand and realise the AArch64ISD::BICi is superfluous.
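To make the arithmetic behind "superfluous" concrete, here is a minimal Python sketch of the check, not the LLVM implementation. The helper names are hypothetical. It assumes 16-bit lanes, that BICi clears `imm << shift` in each lane (matching `Constant:i32<254>, Constant:i32<8>` in the DAG dump), and that the unsigned floor average of two values zero-extended from 8 bits keeps the same known-zero upper bits, since floor((a+b)/2) can never exceed max(a, b).

```python
LANE_BITS = 16
LANE_MASK = (1 << LANE_BITS) - 1


def bici_cleared_bits(imm: int, shift: int) -> int:
    """Bits cleared in every lane by a BICi with the given immediate and shift."""
    return (imm << shift) & LANE_MASK


def known_zero_after_zext(src_bits: int) -> int:
    """Known-zero mask of a lane that was zero-extended from src_bits bits."""
    return LANE_MASK & ~((1 << src_bits) - 1)


def bici_is_redundant(known_zero: int, imm: int, shift: int) -> bool:
    """The BICi is a no-op if every bit it clears is already known to be zero."""
    cleared = bici_cleared_bits(imm, shift)
    return cleared & ~known_zero == 0


# `and x, 511` on i16 lanes clears the same bits as BICi #254, lsl #8.
cleared = bici_cleared_bits(254, 8)
print(hex(cleared), cleared == (~511 & LANE_MASK))

# The uhadd operands are zext'd from i8, so the top 8 bits are known zero.
print(bici_is_redundant(known_zero_after_zext(8), 254, 8))
```

With a narrower known-zero mask (for example, inputs zero-extended from 10 bits) the subset test fails and the BICi would have to stay.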

snikitav · 2023-12-12T00:17:12Z

May I take this one?

RKSimon · 2023-12-12T09:12:05Z

Sure, go for it

RKSimon · 2023-12-12T09:13:47Z

#53622 needs addressing as well if you're interested :)

snikitav · 2023-12-12T10:21:03Z

Sure

snikitav · 2023-12-13T23:38:42Z

@RKSimon could you please assign it to me so no one will be confused? and #53622 as well

snikitav · 2023-12-31T01:12:57Z

@RKSimon could you please provide an example similar to https://rust.godbolt.org/z/M6WbxTaYv but with BIC instead of BICi. Because I don't see how this optimization could be applied to BIC version without immediate, sorry.

#76644 PTAL

RKSimon · 2023-12-31T14:24:33Z

It wasn't necessarily for HADD etc. But you should be able to at least use KnownBits getMinValue/getMaxValue bounds to work out the known upper zero bits etc.
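As a rough illustration of the min/max-bounds idea, the Python sketch below derives known upper zero bits from the largest value each operand can take. It is only a model with hypothetical helper names: `max_value` plays the role of KnownBits getMaxValue (all unknown bits assumed set), and the averaging rule is an assumption about how an unsigned floor average could be bounded, not a copy of the LLVM code.

```python
def max_value(known_zero: int, width: int) -> int:
    """Largest value consistent with the known-zero mask (unknown bits set to 1)."""
    return ~known_zero & ((1 << width) - 1)


def known_zero_from_max(max_val: int, width: int) -> int:
    """Every bit above the highest set bit of max_val must be zero."""
    return ((1 << width) - 1) & ~((1 << max_val.bit_length()) - 1)


def avgflooru_known_zero(known_zero_a: int, known_zero_b: int, width: int) -> int:
    """Upper known-zero bits of floor((a + b) / 2) from the operands' upper bounds."""
    max_avg = (max_value(known_zero_a, width) + max_value(known_zero_b, width)) >> 1
    return known_zero_from_max(max_avg, width)


# Both operands fit in 8 bits, so the average does too.
print(hex(avgflooru_known_zero(0xFF00, 0xFF00, 16)))
```

The same bound-based reasoning applies to a non-immediate BIC only when the known bits of the mask operand pin down which bits it clears.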

…ndling (#76644)

Fold BICi if all destination bits are already known to be zeroes

```llvm
define <8 x i16> @haddu_known(<8 x i8> %a0, <8 x i8> %a1) {
  %x0 = zext <8 x i8> %a0 to <8 x i16>
  %x1 = zext <8 x i8> %a1 to <8 x i16>
  %hadd = call <8 x i16> @llvm.aarch64.neon.uhadd.v8i16(<8 x i16> %x0, <8 x i16> %x1)
  %res = and <8 x i16> %hadd, <i16 511, i16 511, i16 511, i16 511,i16 511, i16 511, i16 511, i16 511>
  ret <8 x i16> %res
}
declare <8 x i16> @llvm.aarch64.neon.uhadd.v8i16(<8 x i16>, <8 x i16>)
```

```
haddu_known:                            // @haddu_known
        ushll   v0.8h, v0.8b, #0
        ushll   v1.8h, v1.8b, #0
        uhadd   v0.8h, v0.8h, v1.8h
        bic     v0.8h, #254, lsl #8 <-- this one will be removed as we know high bits are zero extended
        ret
```

Fixes #53881
Fixes #53622
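The claim that the `bic` can be dropped for this test case can be sanity-checked by brute force. The Python sketch below models one 16-bit lane with hypothetical helpers (`uhadd` as an unsigned floor average, `bic_imm` as clearing `imm << shift`) and tries every pair of 8-bit inputs; it illustrates the expected behaviour and is not taken from the patch.

```python
LANE_MASK = 0xFFFF  # one 16-bit lane of the v8i16 vector


def uhadd(a: int, b: int) -> int:
    """Unsigned halving add on one lane: floor((a + b) / 2)."""
    return ((a + b) >> 1) & LANE_MASK


def bic_imm(x: int, imm: int, shift: int) -> int:
    """Clear the bits selected by imm << shift in one lane."""
    return x & ~(imm << shift) & LANE_MASK


def bic_is_noop_for_all_u8_inputs() -> bool:
    """Check bic #254, lsl #8 leaves uhadd(zext a, zext b) unchanged for all i8 a, b."""
    return all(
        bic_imm(uhadd(a, b), 254, 8) == uhadd(a, b)
        for a in range(256)
        for b in range(256)
    )


print(bic_is_noop_for_all_u8_inputs())
```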

RKSimon added the backend:AArch64 label Feb 16, 2022

RKSimon added the good first issue label Sep 23, 2022

davemgreen assigned snikitav Dec 13, 2023

snikitav mentioned this issue Dec 31, 2023

[AArch64] SimplifyDemandedBitsForTargetNode - add AArch64ISD::BICi handling #76644

Merged

davemgreen closed this as completed in #76644 Apr 6, 2024
