[PATCH] D59149: [LV] move useEmulatedMaskMemRefHack() functionality to TTI.

Wed Mar 13 12:43:19 PDT 2019

hsaito marked 2 inline comments as done.
hsaito added a comment.

In D59149#1426070 <https://reviews.llvm.org/D59149#1426070>, @markus wrote:

> Any improvement is an improvement so I am happy with that but it is still mentioned that this solution is a hack and I guess the
>
> > Cost model for emulated masked load/store is completely broken.
>
> comment is still valid. What would it take to address this properly?

Each target to run many applications/benchmarks to come up with the "right" adjustment to the cost model. No way around that.

================
Comment at: llvm/include/llvm/Analysis/TargetTransformInfoImpl.h:270
+                         (isa<StoreInst>(Inst) && NumPredStores > Threshold);
+    return useHackedCost ? 3000000 : 0;
+  }
----------------
rengolin wrote:
> markus wrote:
> > This value seems a bit arbitrary. Could it not be e.g. INT_MAX instead if it is supposed to represent something infinitely expensive?
> Costs will be added to others, INT_MAX would wrap.
I can certainly add a comment saying that this is an arbitrarily chosen "big enough value" --- just need to be sufficiently bigger than a typical instruction cost, which is a single digit or a low double digit, so as to disable most cases of emulated masked memory references.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5488

-    if (useEmulatedMaskMemRefHack(I))
+    if (int HackedCost = TTI.getEmulatedMaskMemRefCost(
+            I, NumPredStores, NumberOfStoresToPredicate))
----------------
rengolin wrote:
> This if doesn't need the Cost above, so you can avoid the division by moving it up to the beginning of the block.
> 
> Nit: you can just return `HackedCost` instead of assigning, but that will depend on the new pattern.
This is actually a good question.

Maybe, we want to structure this TTI interface as the adjustment to the base cost computed above.
In which case, the code would look like
   Cost += TTI.getEmulatedMaskMemRefAdjustment(...)

Does this make more sense?

Even then, I still agree that we can return early like
if (!isPredicaredInst(I))
     return Cost;
Cost /= getReciprocalPredBlockProb();
return Cost + TTI.getEmulatedMaskMemRefAdjustment(...)

Does this sound better?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59149/new/

https://reviews.llvm.org/D59149