[PATCH] D102920: [SLP]Better detection of perfect/shuffles matches for gather nodes.

Fri May 21 07:27:53 PDT 2021

ABataev added inline comments.

================
Comment at: llvm/test/Transforms/SLPVectorizer/X86/blending-shuffle-inseltpoison.ll:25-65
+; CHECK-NEXT:    [[X0:%.*]] = extractelement <4 x i8> [[X:%.*]], i32 0
+; CHECK-NEXT:    [[X3:%.*]] = extractelement <4 x i8> [[X]], i32 3
+; CHECK-NEXT:    [[Y1:%.*]] = extractelement <4 x i8> [[Y:%.*]], i32 1
+; CHECK-NEXT:    [[Y2:%.*]] = extractelement <4 x i8> [[Y]], i32 2
+; CHECK-NEXT:    [[X0X0:%.*]] = mul i8 [[X0]], [[X0]]
+; CHECK-NEXT:    [[X3X3:%.*]] = mul i8 [[X3]], [[X3]]
+; CHECK-NEXT:    [[Y1Y1:%.*]] = mul i8 [[Y1]], [[Y1]]
----------------
Regression is caused by the incorrect cost model. It returns cost 12 for mul <4 x i8> and it is compensated by the fact that we calculate the cost of gather of extractelement instructions twice (its cost is -3). After this patch we correctly calculate the cost for the gather node only once (-3 for the first gather and 0 for the second one, perfect diamond match). llvm-mca returns the cost of code is 2 or 4 (for normalized mul <16 x i8> it is 4, for the original code it is 2). Need to tweak the cost model.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D102920/new/

https://reviews.llvm.org/D102920