[PATCH] D125712: [SLP][X86] Improve reordering to consider alternate instruction bundles

Wed May 25 11:02:06 PDT 2022

ABataev added inline comments.

================
Comment at: llvm/test/Transforms/SLPVectorizer/X86/reorder_with_external_users.ll:121-123
+; CHECK-NEXT:    [[TMP2:%.*]] = fsub <2 x double> [[TMP1]], <double 1.100000e+00, double 1.200000e+00>
+; CHECK-NEXT:    [[TMP3:%.*]] = fadd <2 x double> [[TMP1]], <double 1.100000e+00, double 1.200000e+00>
+; CHECK-NEXT:    [[TMP4:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> [[TMP3]], <2 x i32> <i32 0, i32 3>
----------------
vporpo wrote:
> ABataev wrote:
> > vporpo wrote:
> > > ABataev wrote:
> > > > vporpo wrote:
> > > > > ABataev wrote:
> > > > > > I don't quite understand what's the difference here. Could you explain, please?
> > > > > Before this patch the pattern `shuffle + fadd + fsub` lowers to 3 instructions: blend  + vector add + vector sub (the shuffle selects TMP3[0],TMP2[1], which is fadd[0],fsub[1] , the inverse of the addsub pattern).
> > > > > 
> > > > > With this patch `shuffle + fadd +fsub` lowers to a single addsub instruction (the shuffle selects TMP2[0], TMP3[2] which is fsub[0],fadd[1]). 
> > > > > This saves 2 instructions  which means that during reordering we should keep track of this pattern since reordering it can increase the overhead. 
> > > > Ok, why not fixing it in the backend? This new function you added does not affect the cost, but ignoring shuffle actually increases the cost of the tree.
> > > I think it is better to do it here because we may end up not vectorizing some code because of the additional cost of the blend + add + sub pattern. I think I can fix the tree cost with a follow-up patch that fixes the cost of the altshuffle pattern when it corresponds to the addsub instruction.
> > Ah, I see, thanks. What about trying to do both - lowering in the backend and the cost adjustment? Is it possible?
> Hmm I guess in the back-end we won't have access to cost benefit analysis like the one we are doing during the reordering step (i.e., finding the most popular order). So we would have to do a simple conversion of any sub-add pattern to an add-sub + shuffles, but I am not sure that this would always be profitable. I think that the addsub pattern should be taken into consideration when looking for the most popular ordering so that it can influence the decision, but I am not so sure we can always justify the cost of the extra shuffles.
The backend does not need the cost, it sjut checks the pattern and lowers the sequence to the instructions. Why does it need the cost?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125712/new/

https://reviews.llvm.org/D125712