[PATCH] D29540: Scalarization overhead estimation in getIntrinsicInstrCost() improved

Thu Mar 9 13:13:49 PST 2017

hfinkel accepted this revision.
hfinkel added a comment.
This revision is now accepted and ready to land.

LGTM.

> In BBVectorize, things got a bit tricky while handling merged arguments. Here, the scalarization cost is computed locally, by considering all the input operands of both instructions, plus the vectorized return type. Since vectorization is done by arguments merging, getOperandsScalarizationOverhead() is then called for all operands with a VF of 1. I hope this is right.

Eeh, don't worry about it. I'm going to delete the whole pass at some point soon anyway.

================
Comment at: include/llvm/CodeGen/BasicTTIImpl.h:741
+      unsigned ScalarizationCost = UINT_MAX;
+      if (RetVF > 1 || VF > 1) {
+        ScalarizationCost = getScalarizationOverhead(RetTy, true, false);
----------------
jonpa wrote:
> hfinkel wrote:
> > I don't understand why `RetVF > 1` is here. If RetVF is not one, then VF needs to be one, and so we're not vectorizing. As a result, we're not scalarizing either, and so adding the scalarization cost associated with the return type seems odd.
> My idea was that this function can be called from different contexts ("three cases"). 
> 
> The LoopVectorizer can pass the scalar instruction and VF > 1.
> CostModel can pass a vector instruction, RetVF > 1 (and VF argument defaults to 1).
> 
> Both cases should produce the same return value.
> 
> Am I missing something?
> 
Maybe the underlying problem here is the:

   // Assume that we need to scalarize this intrinsic.

this seems like a bad assumption for a general code model (although it probably makes sense for the vectorizer because the vectorizer's legality checking will bail out before we get here for a vector intrinsic).

In any case, a comment here explaining the logic would be good.

================
Comment at: include/llvm/CodeGen/BasicTTIImpl.h:782
     default: {
       // Assume that we need to scalarize this intrinsic.
+      unsigned ScalarizationCost = ScalarizationCostPassed;
----------------
jonpa wrote:
> hfinkel wrote:
> > This interface seems a bit broken for how we're using it. Why are we assuming that we need to scalarize all intrinsics? (many target-specific intrinsics are really vector intrinsics - maybe only assume we need to scalarize the generic ones for which we don't have special handling?).
> > 
> I was assuming this made sense in the 'default:' case at least.
> 
> How scalarization cost is computed, and when it is done are two separate issues, so perhaps that can be handled later with a separate patch?
> 
> 
Yea, this should be cleaned up as a separate patch.

https://reviews.llvm.org/D29540