[PATCH] D59149: [LV] move useEmulatedMaskMemRefHack() functionality to TTI.

Sat Mar 16 05:32:15 PDT 2019

rengolin added a comment.

In D59149#1427946 <https://reviews.llvm.org/D59149#1427946>, @hsaito wrote:

> >> Cost model for emulated masked load/store is completely broken.
> > 
> > comment is still valid. What would it take to address this properly?
>
> Each target to run many applications/benchmarks to come up with the "right" adjustment to the cost model. No way around that.

Indeed, not something that can be done overnight.

But if the intention of this patch is to start preparing terrain for that work to start (at least on Intel), then I guess the current strategy needs to start changing from "hacked big enough values overriding the cost" to "adding up an estimate of the number of cycles or something".

As I said on my comment, this doesn't mean the patch needs to do everything, just the difference between (Cost = BigEnough) versus (Cost += Adjustment) in the current case.

cheers,
--renato

================
Comment at: llvm/include/llvm/Analysis/TargetTransformInfoImpl.h:270
+                         (isa<StoreInst>(Inst) && NumPredStores > Threshold);
+    return useHackedCost ? 3000000 : 0;
+  }
----------------
hsaito wrote:
> rengolin wrote:
> > markus wrote:
> > > This value seems a bit arbitrary. Could it not be e.g. INT_MAX instead if it is supposed to represent something infinitely expensive?
> > Costs will be added to others, INT_MAX would wrap.
> I can certainly add a comment saying that this is an arbitrarily chosen "big enough value" --- just need to be sufficiently bigger than a typical instruction cost, which is a single digit or a low double digit, so as to disable most cases of emulated masked memory references.
> 
This is not the only case, and I think it would be interesting to look at all the other "big enough values" and perhaps create a global const int PROHIBITIVE_COST and set to a "big enough value" that would suit all cases.

But this is not necessarily for this patch, though. Just an idea.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5488

-    if (useEmulatedMaskMemRefHack(I))
+    if (int HackedCost = TTI.getEmulatedMaskMemRefCost(
+            I, NumPredStores, NumberOfStoresToPredicate))
----------------
hsaito wrote:
> rengolin wrote:
> > This if doesn't need the Cost above, so you can avoid the division by moving it up to the beginning of the block.
> > 
> > Nit: you can just return `HackedCost` instead of assigning, but that will depend on the new pattern.
> This is actually a good question.
> 
> Maybe, we want to structure this TTI interface as the adjustment to the base cost computed above.
> In which case, the code would look like
>    Cost += TTI.getEmulatedMaskMemRefAdjustment(...)
> 
> Does this make more sense?
> 
> Even then, I still agree that we can return early like
> if (!isPredicaredInst(I))
>      return Cost;
> Cost /= getReciprocalPredBlockProb();
> return Cost + TTI.getEmulatedMaskMemRefAdjustment(...)
> 
> Does this sound better?
If the prohibitive cost is large enough, then adding up will make no difference, and you can early return on either Cost or Adjusted.

If the cost isn't meant as just "big enough" but to (later) actually emulate the masking costs, then it makes sense to add like you propose on the comment above.

Right now, the case looks like the former, but if you're planning it to be more like the latter, than the new proposal makes sense.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59149/new/

https://reviews.llvm.org/D59149