[PATCH] D21251: [TTI] The cost model should not assume illegal vector casts get completely scalarized

Wed Jun 22 23:37:10 PDT 2016

mkuper added a comment.

In http://reviews.llvm.org/D21251#465230, @arsenm wrote:

> In http://reviews.llvm.org/D21251#462734, @mkuper wrote:
>
> > Unfortunately, we can't just plug in getVectorSplitCost() (either this or a more sophisticated version) into getTypeLegalizationCost().
> >
> > This is because most of the users of getTypeLegalizationCost() apparently don't use the returned value as a "cost". Rather, they seem assume it's just the number of post-legalization pieces, and multiply per-piece costs by that. So that needs a completely separate clean-up first. 
> >  AMDGPU itself is also guilty of this - it multiplies by getTypeLegalizationCost().first, so if getTypeLegalizationCost(LegalType) were to return 0 - which it ought to, if we really consider it the "cost" of legalization - things would get seriously out of whack. But this happens across all targets.
>
>
> I was thinking that was to get the cost of the split. Everything should be cost * NElts * scalar cost, so e.g. <8 x float> -> <4 x float>, <4 x float> = 2 * 4 * cost


What exactly do you mean when you say "cost of the split"?
My expectation was for getTypeLegalizationCost() to return an approximation of the total cost of legalization (cost of INSERT_SUBVECTOR and EXTRACT_SUBVECTOR operations), that is, an additive factor.  So, it would make sense to have "legalization cost + NElts * scalar cost". Or, in the more general case, when we're not fully scalarizing, "legalizaiton cost + NPieces * piece cost". Instead, what it's actually used for is to get NPieces. This is even explicitly referred to as the "split factor" in one of the X86 TTI uses:

  std::pair<int, MVT> IdxsLT = TLI->getTypeLegalizationCost(DL, IndexVTy);
  std::pair<int, MVT> SrcLT = TLI->getTypeLegalizationCost(DL, SrcVTy);
  int SplitFactor = std::max(IdxsLT.first, SrcLT.first);

I'll have to go over all the uses of getTypeLegalizationCost(). If everything uses it as a proxy for the split factor, it's probably just a matter of changing the name to indicate what it really does. If not, then we may need two different functions, one to get the split factor, and one to get the *cost* of performing the splits. 
But I think that's independent of this patch.


================
Comment at: test/Analysis/CostModel/ARM/cast.ll:267
@@ -266,3 +266,3 @@
   %r117 = fptosi <4 x float> undef to <4 x i32>
-  ; CHECK:  Found an estimated cost of 64 for instruction:   %r118 = fptoui <4 x float> undef to <4 x i64>
+  ; CHECK:  Found an estimated cost of 65 for instruction:   %r118 = fptoui <4 x float> undef to <4 x i64>
   %r118 = fptoui <4 x float> undef to <4 x i64>
----------------
arsenm wrote:
> LGTM, but it looks to me like this should be adding 0, so not increasing by 1?
Generally, with the new formula, it makes sense to have costs like 2 * 32 + 1, so I didn't pay too much attention to those little changes (what concerned me more were the big drops e.g. 64 -> 11 on line 326). But you're right, I need to verify that this is really reasonable and not some unexpected artifact. 
Thanks!


http://reviews.llvm.org/D21251