Skip to content

LoopFullUnrollPass failed due to constant inference in opaque pointer mode. #65763

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Closed
dianqk opened this issue Sep 8, 2023 · 7 comments · Fixed by #65875
Closed

LoopFullUnrollPass failed due to constant inference in opaque pointer mode. #65763

dianqk opened this issue Sep 8, 2023 · 7 comments · Fixed by #65875

Comments

@dianqk
Copy link
Member

dianqk commented Sep 8, 2023

I tried to disable the opaque pointer in rust-lang/rust#115339 to get the SLPVectorizer optimization.
See https://godbolt.org/z/GqdPGarsn.

According to my investigation, the success of optimization in non-opaque pointer mode relates to SROA.cpp#L1831-L1845 or InstructionCombining.cpp#L2220-L2274. Unfortunately, these two transformations have been removed during opaque pointer migration.

My plan is to restore the behavior of InstructionCombining.cpp#L2220-L2274 using alloc instead of bitcast.
Looking at the code comments, this seems to be an important optimization. So it might be worth backporting to LLVM 17?

@fhahn
Copy link
Contributor

fhahn commented Sep 8, 2023

Could you share the LLVM IR before the SLP vectorizer?

@dianqk
Copy link
Member Author

dianqk commented Sep 9, 2023

Could you share the LLVM IR before the SLP vectorizer?

There are two smaller IRs:

But I dont's see the value. See #65764 (comment).

@dianqk
Copy link
Member Author

dianqk commented Sep 10, 2023

Could you share the LLVM IR before the SLP vectorizer?

I added an IR that looks pretty good in #65875.

dianqk added a commit that referenced this issue Sep 19, 2023
Closes #65763.
This will provide more opportunities for constant propagation for
subsequent optimizations.
@krasimirgg
Copy link
Contributor

Hey @dianqk, it appears that after 2d1e8a0, one of rust's codegen tests started failing:
https://buildkite.com/llvm-project/rust-llvm-integrate-prototype/builds/22405#018aafae-043a-4a37-86ef-02493663ee73/689-694. I'm not a rust or LLVM expert, but naively it looks like a possible regression. Could you please take a look?

The test code looks like this: https://github.com/rust-lang/rust/blob/ed33e408c5299b4c1fd87a37e050180db9d642d5/tests/codegen/simd/simd-wide-sum.rs#L52-L59

#[no_mangle]
// CHECK-LABEL: @wider_reduce_into_iter
pub fn wider_reduce_into_iter(x: Simd<u8, N>) -> u16 {
    // CHECK: zext <8 x i8>
    // CHECK-SAME: to <8 x i16>
    // CHECK: call i16 @llvm.vector.reduce.add.v8i16(<8 x i16>
    x.to_array().into_iter().map(u16::from).sum()
}

With 2d1e8a0, it looks like now the code is exploded into a long sequence that doesn't call llvm vector anymore: https://buildkite.com/llvm-project/rust-llvm-integrate-prototype/builds/22405#018aafae-043a-4a37-86ef-02493663ee73/689-740:

33: ; Function Attrs: mustprogress nofree norecurse nosync nounwind nonlazybind willreturn memory(argmem: read) uwtable
--
  | 34: define noundef i16 @wider_reduce_into_iter(ptr noalias nocapture noundef readonly align 8 dereferenceable(8) %x) unnamed_addr #1 personality ptr @rust_eh_personality {
  | check:55'0                                               X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
  | 35: start:
  | check:55'0     ~~~~~~~
  | 36:  %0 = load i64, ptr %x, align 8
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 37:  %self.sroa.4.16.extract.trunc = trunc i64 %0 to i16
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 38:  %_0.i.i.i.i.i.i.i.i.i = and i16 %self.sroa.4.16.extract.trunc, 255
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 39:  %1 = trunc i64 %0 to i16
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 40:  %2 = lshr i16 %1, 8
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~
  | 41:  %_4.0.i.i.i.i.i.i.i.i.1 = add nuw nsw i16 %_0.i.i.i.i.i.i.i.i.i, %2
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 42:  %self.sroa.4.18.extract.shift = lshr i64 %0, 16
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 43:  %self.sroa.4.18.extract.trunc = trunc i64 %self.sroa.4.18.extract.shift to i16
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 44:  %_0.i.i.i.i.i.i.i.i.i.2 = and i16 %self.sroa.4.18.extract.trunc, 255
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 45:  %_4.0.i.i.i.i.i.i.i.i.2 = add nuw nsw i16 %_4.0.i.i.i.i.i.i.i.i.1, %_0.i.i.i.i.i.i.i.i.i.2
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 46:  %self.sroa.4.19.extract.shift = lshr i64 %0, 24
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 47:  %self.sroa.4.19.extract.trunc = trunc i64 %self.sroa.4.19.extract.shift to i16
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 48:  %_0.i.i.i.i.i.i.i.i.i.3 = and i16 %self.sroa.4.19.extract.trunc, 255
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 49:  %_4.0.i.i.i.i.i.i.i.i.3 = add nuw nsw i16 %_4.0.i.i.i.i.i.i.i.i.2, %_0.i.i.i.i.i.i.i.i.i.3
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 50:  %self.sroa.4.20.extract.shift = lshr i64 %0, 32
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 51:  %self.sroa.4.20.extract.trunc = trunc i64 %self.sroa.4.20.extract.shift to i16
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 52:  %_0.i.i.i.i.i.i.i.i.i.4 = and i16 %self.sroa.4.20.extract.trunc, 255
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 53:  %_4.0.i.i.i.i.i.i.i.i.4 = add nuw nsw i16 %_4.0.i.i.i.i.i.i.i.i.3, %_0.i.i.i.i.i.i.i.i.i.4
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 54:  %self.sroa.4.21.extract.shift = lshr i64 %0, 40
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 55:  %self.sroa.4.21.extract.trunc = trunc i64 %self.sroa.4.21.extract.shift to i16
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 56:  %_0.i.i.i.i.i.i.i.i.i.5 = and i16 %self.sroa.4.21.extract.trunc, 255
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 57:  %_4.0.i.i.i.i.i.i.i.i.5 = add nuw nsw i16 %_4.0.i.i.i.i.i.i.i.i.4, %_0.i.i.i.i.i.i.i.i.i.5
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 58:  %self.sroa.4.22.extract.shift = lshr i64 %0, 48
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 59:  %self.sroa.4.22.extract.trunc = trunc i64 %self.sroa.4.22.extract.shift to i16
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 60:  %_0.i.i.i.i.i.i.i.i.i.6 = and i16 %self.sroa.4.22.extract.trunc, 255
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 61:  %_4.0.i.i.i.i.i.i.i.i.6 = add nuw nsw i16 %_4.0.i.i.i.i.i.i.i.i.5, %_0.i.i.i.i.i.i.i.i.i.6
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 62:  %self.sroa.4.23.extract.shift = lshr i64 %0, 56
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 63:  %self.sroa.4.23.extract.trunc = trunc i64 %self.sroa.4.23.extract.shift to i16
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 64:  %_4.0.i.i.i.i.i.i.i.i.7 = add nuw nsw i16 %_4.0.i.i.i.i.i.i.i.i.6, %self.sroa.4.23.extract.trunc
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 65:  ret i16 %_4.0.i.i.i.i.i.i.i.i.7
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 66: }

I suspect that this may be a potential regression since the other related functions in this test still expand to short sequences that use llvm.vector as expected, e.g.:

6: ; Function Attrs: mustprogress nofree nosync nounwind nonlazybind willreturn memory(argmem: read) uwtable
--
  | 7: define noundef i16 @wider_reduce_simd(ptr noalias nocapture noundef readonly align 8 dereferenceable(8) %x) unnamed_addr #0 {
  | 8: start:
  | 9:  %0 = load <8 x i8>, ptr %x, align 8
  | 10:  %1 = zext <8 x i8> %0 to <8 x i16>
  | 11:  %2 = tail call i16 @llvm.vector.reduce.add.v8i16(<8 x i16> %1)
  | 12:  ret i16 %2
  | 13: }

https://buildkite.com/llvm-project/rust-llvm-integrate-prototype/builds/22405#018aafae-043a-4a37-86ef-02493663ee73/689-712.

@dianqk
Copy link
Member Author

dianqk commented Sep 20, 2023

Hey @dianqk, it appears that after 2d1e8a0, one of rust's codegen tests started failing: https://buildkite.com/llvm-project/rust-llvm-integrate-prototype/builds/22405#018aafae-043a-4a37-86ef-02493663ee73/689-694. I'm not a rust or LLVM expert, but naively it looks like a possible regression. Could you please take a look?

I'm sorry for the trouble. I'm not sure yet. But I'll look into it tomorrow. If this blocks something, you can revert this commit

@dianqk dianqk reopened this Sep 20, 2023
@nikic
Copy link
Contributor

nikic commented Sep 20, 2023

This regression is expected (we've hit it before, see #55693). You can adjust the codegen test.

@dianqk
Copy link
Member Author

dianqk commented Sep 21, 2023

This regression is expected (we've hit it before, see #55693). You can adjust the codegen test.

Yes. I close this issue. The new one is vectorize-related.

@dianqk dianqk closed this as completed Sep 21, 2023
# for free to join this conversation on GitHub. Already have an account? # to comment