[PATCH] D125712: [SLP][X86] Improve reordering to consider alternate instruction bundles

Wed May 25 10:07:22 PDT 2022

ABataev added inline comments.

================
Comment at: llvm/test/Transforms/SLPVectorizer/X86/reorder_with_external_users.ll:121-123
+; CHECK-NEXT:    [[TMP2:%.*]] = fsub <2 x double> [[TMP1]], <double 1.100000e+00, double 1.200000e+00>
+; CHECK-NEXT:    [[TMP3:%.*]] = fadd <2 x double> [[TMP1]], <double 1.100000e+00, double 1.200000e+00>
+; CHECK-NEXT:    [[TMP4:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> [[TMP3]], <2 x i32> <i32 0, i32 3>
----------------
vporpo wrote:
> ABataev wrote:
> > vporpo wrote:
> > > ABataev wrote:
> > > > I don't quite understand what's the difference here. Could you explain, please?
> > > Before this patch the pattern `shuffle + fadd + fsub` lowers to 3 instructions: blend  + vector add + vector sub (the shuffle selects TMP3[0],TMP2[1], which is fadd[0],fsub[1] , the inverse of the addsub pattern).
> > > 
> > > With this patch `shuffle + fadd +fsub` lowers to a single addsub instruction (the shuffle selects TMP2[0], TMP3[2] which is fsub[0],fadd[1]). 
> > > This saves 2 instructions  which means that during reordering we should keep track of this pattern since reordering it can increase the overhead. 
> > Ok, why not fixing it in the backend? This new function you added does not affect the cost, but ignoring shuffle actually increases the cost of the tree.
> I think it is better to do it here because we may end up not vectorizing some code because of the additional cost of the blend + add + sub pattern. I think I can fix the tree cost with a follow-up patch that fixes the cost of the altshuffle pattern when it corresponds to the addsub instruction.
Ah, I see, thanks. What about trying to do both - lowering in the backend and the cost adjustment? Is it possible?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125712/new/

https://reviews.llvm.org/D125712