[PATCH] D102920: [SLP]Better detection of perfect/shuffles matches for gather nodes.

Wed May 26 07:57:52 PDT 2021

RKSimon added inline comments.

================
Comment at: llvm/test/Transforms/SLPVectorizer/X86/blending-shuffle.ll:84-124
+; CHECK-NEXT:    [[X0:%.*]] = extractelement <4 x i8> [[X:%.*]], i32 0
+; CHECK-NEXT:    [[X3:%.*]] = extractelement <4 x i8> [[X]], i32 3
+; CHECK-NEXT:    [[Y1:%.*]] = extractelement <4 x i8> [[Y:%.*]], i32 1
+; CHECK-NEXT:    [[Y2:%.*]] = extractelement <4 x i8> [[Y]], i32 2
+; CHECK-NEXT:    [[X0X0:%.*]] = mul i8 [[X0]], [[X0]]
+; CHECK-NEXT:    [[X3X3:%.*]] = mul i8 [[X3]], [[X3]]
+; CHECK-NEXT:    [[Y1Y1:%.*]] = mul i8 [[Y1]], [[Y1]]
----------------
ABataev wrote:
> RKSimon wrote:
> > ABataev wrote:
> > > Regressions caused by the incorrect cost of `mul <2 x i8>`. Per mca tool the cost is `2`, cost model reports `3`.
> > Are we counting the costs of the v4i8 mul twice here?
> Yes, but the cost of mul of v2i8. It is extended to mul v4i8 by the instcombine.
The cost is trickier than that, as the cost tables aren't usually CPU-specific: the worst case for a v2i8 multiply is at least 4, so that's what the cost table reports. Incidentally, this test is for bdver2 (cost = 3.5).

But tbh, I wouldn't worry too much about this scalarization.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D102920