[PATCH] D141512: [SLP]Improve isGatherShuffledEntry by looking deeper through the reused scalars.

Tue Jan 17 15:31:27 PST 2023

ABataev added inline comments.

================
Comment at: llvm/test/Transforms/SLPVectorizer/X86/commutativity.ll:78-87
+; SSE-NEXT:    [[TMP1:%.*]] = insertelement <4 x i32> poison, i32 [[C:%.*]], i32 0
+; SSE-NEXT:    [[TMP2:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <4 x i32> zeroinitializer
+; SSE-NEXT:    [[TMP3:%.*]] = insertelement <4 x i32> poison, i32 [[A:%.*]], i32 0
+; SSE-NEXT:    [[TMP4:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> poison, <4 x i32> zeroinitializer
+; SSE-NEXT:    [[TMP5:%.*]] = add <4 x i32> [[TMP2]], [[TMP4]]
+; SSE-NEXT:    [[TMP6:%.*]] = insertelement <4 x i32> [[TMP3]], i32 [[B:%.*]], i32 1
+; SSE-NEXT:    [[TMP7:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[C]], i32 2
----------------
vdmitrie wrote:
> Why this test changed behavior?
> My expectation would be the original scalar code is likely noticeably faster than vectorized for this case.
Forgot to adjust the cost of the remaining buildvector elements, the problem of separating the code from the big patch, will fix.
The cost will be synchronized with the codegen once the main patch is committed.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141512/new/

https://reviews.llvm.org/D141512