[PATCH] D21620: [LV] Don't widen trivial induction variables
    Adam Nemet via llvm-commits 
    llvm-commits at lists.llvm.org
       
    Tue Jun 28 11:33:59 PDT 2016
    
    
  
anemet added inline comments.
================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:2191
@@ +2190,3 @@
+
+  // We can't create a vector with less than two elements.
+  assert(VF > 1 && "VF should be greater than one");
----------------
mssimpso wrote:
> anemet wrote:
> > mssimpso wrote:
> > > anemet wrote:
> > > > Matt, your argument about instcombine not being adequate for this is a convincing one.
> > > > 
> > > > On the other hand, I am still wondering if the solution is general enough.  There should be cases where we'd be better off having *both* the vector and the scalar version of the same induction variable.  I.e. depending on the use, you may want to use the vector version (e.g. store) or the scalar version (address calc, compares).
> > > > 
> > > > I am not saying that we need to necessarily implement this but the current solution should be taking us incrementally closer to the full solution.
> > > Adam,
> > > 
> > > One way to handle the cases you mention would be to allow the vectorizer to generate both a scalar version and a vector version of the same induction variable. Then it could select whether to generate a use of the scalar or vector version based on the instruction (address calculation, store, etc.). DCE and Instcombine would presumably clean up afterwards.
> > > 
> > > The current patch does take us incrementally to that kind of solution. It handles the case where we know we will always prefer to use the scalar version. If that's the case, we generate the scalar steps; otherwise, we continue to generate the vector steps. It doesn't, however, generate the scalar version of the induction variable along side the vector version.
> > > One way to handle the cases you mention would be to allow the vectorizer to generate both a scalar version and a vector version of the same induction variable. Then it could select whether to generate a use of the scalar or vector version based on the instruction (address calculation, store, etc.). DCE and Instcombine would presumably clean up afterwards.
> > 
> > Yes that is the direction I was thinking of going in the long term.  (If the later passes are ineffective cleaning up unused versions we can also generate them on demand.)
> > 
> > > The current patch does take us incrementally to that kind of solution. It handles the case where we know we will always prefer to use the scalar version. If that's the case, we generate the scalar steps; otherwise, we continue to generate the vector steps. It doesn't, however, generate the scalar version of the induction variable along side the vector version.
> > 
> > Fair enough.
> > 
> > I have one last high-level question.  Does this fix the pointer arithmetic code in https://llvm.org/bugs/show_bug.cgi?id=27881#c6 ?
> > 
> > I will look at the specifics of the patch later today.
> > 
> > Does this fix the pointer arithmetic code in https://llvm.org/bugs/show_bug.cgi?id=27881#c6 ?
> 
> I think so. With this patch //post-instcombine//, we generate code like the following for the pointer arithmetic:
> 
> ```
> vector.body:
>   %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
>   %offset.idx = shl i64 %index, 3
>   %3 = or i64 %offset.idx, 8
>   %4 = or i64 %offset.idx, 16
>   %5 = or i64 %offset.idx, 24
>   %6 = getelementptr inbounds float, float* %a, i64 %offset.idx
>   %7 = getelementptr inbounds float, float* %a, i64 %3
>   %8 = getelementptr inbounds float, float* %a, i64 %4
>   %9 = getelementptr inbounds float, float* %a, i64 %5
> ```
> 
> The bigger issue there seems to be the cost model though. The loop still builds up vectors with the loaded values that feed the vector fadd's. 
> 
> Regarding the IR explosion mentioned in that bug //pre-instcombine//, this patch will not help. This is because I scalarize the arithmetic for the step vectors, but still insert the results into a vector, since the rest of the code expects the IV's to be vectors. (All the scalarization is handled in this way I believe). There's perhaps room for a compile-time optimization here where the vectorizer would avoid all the scalar-to-vector-to-scalar conversions.
> 
> > I will look at the specifics of the patch later today.
> 
> Thanks, as always!
> I think so. With this patch post-instcombine, we generate code like the following for the pointer arithmetic:
Great, thanks!
http://reviews.llvm.org/D21620
    
    
More information about the llvm-commits
mailing list