[PATCH] D42309: [LV] Use Demanded Bits and ValueTracking for reduction type-shrinking

Wed Jan 24 16:06:14 PST 2018

mssimpso added a comment.

In https://reviews.llvm.org/D42309#984646, @dcaballe wrote:

> Hi Matthew,
>
> thanks for taking care of this! I like the general idea of the fix but I have a concern regarding the TODO in line 408:
>
>   // TODO: We should not rely on InstCombine to rewrite the reduction in the
>   //       smaller type. We should just generate a correctly typed expression
>   //       to begin with.
>   
>
> The cost model is relying on an optimization that will hopefully be applied by InstCombine. I wonder if it would be too complicated to implement the actual type shrinking in LV code gen as part of the fix.
>  That would better align cost modeling with LV code generation, which is one the major concerns of the current infrastructure.
>
> In the future VPlan infrastructure, we will definitely need to address this optimization as a VPlan-to-VPlan transformation. Thanks for bringing up this issue.
> Diego

Yes, my comment was intended to draw attention to something that needs to be addressed in the future VPlan work. I think we will want to generate the actual type-shrunk code instead of inserting trunc/ext pairs, and assuming InstCombine will do what we want it to. We also generate trunc/ext pairs in SLP for type-shrinking, which needs to be addressed as well, I think. As you point out, this will better align the cost model with code generation.

================
Comment at: lib/Transforms/Utils/LoopUtils.cpp:132

-      // Terminate the traversal if the operand is not an instruction, or we
-      // reach the starting value.
-      Instruction *J = dyn_cast<Instruction>(U.get());
-      if (!J || J == Start)
-        continue;
+  if (MaxBitWidth == DL.getTypeSizeInBits(Exit->getType()) && AC && DT) {
+    // If demanded bits wasn't able to limit the bit width, we can try to use
----------------
dcaballe wrote:
> (I don't know DB/ValueTracking in depth so maybe I'm missing something.)
> If I understand correctly, we would try with value tracking in the following scenario:
> 
> ```
> 117    MaxBitWidth = 64
> ...
> 127    MaxBitWidth = 33
> ...
> 129    MaxBitWidth = 64
> ...
> 132    First condition is true
> }
> ```
> 
> But we wouldn't in the following one:
> 
> ```
> 117    MaxBitWidth = 64
> ...
> 127    MaxBitWidth = 31
> ...
> 129    MaxBitWidth = 32
> ...
> 132    First condition is false
> }
> ```
> 
> Is this the expected behavior? In other words, if DB returns a width narrower than the original one and it's later rounded up to the original width (first scenario), could value tracking return an even narrower width?
> 
> If so, shouldn't we always try with value tracking, even when MaxBitWidth != DL.getTypeSizeInBits?
> Otherwise, shouldn't we skip value tracking for those cases (first scenario)? We could invoke isPowerOf2_64 only once at the end of the function.
Demanded bits and value tracking will report different bit widths, in general. My thinking was that if demanded bits is able to limit the bit width at all, we use that width. Otherwise, we can try to do something with value tracking, which I think can be a more expensive analysis.

So I think what we should do there is move the isPowerOf2_64 round-ups to the very end. The `MaxBitWidth == DL.getTypeSizeInBits(Exit->getType())` condition will then check if demanded bits was able to tell us anything before we do the rounding.

================
Comment at: lib/Transforms/Utils/LoopUtils.cpp:389
+    // looked through an 'and' instruction when evaluating a potential
+    // arithmetic reduction to determine if it may been type-promoted.
+    //
----------------
dcaballe wrote:
> may been -> may have been?
Nice catch!

Repository:
  rL LLVM

https://reviews.llvm.org/D42309