[PATCH] D92132: [LV] Support widened induction variables in epilogue vectorization.

Thu Oct 13 08:06:09 PDT 2022

fhahn added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:10528
+            ResumeV = MainILV.createInductionResumeValue(
+                IndPhi, *ID, {EPI.MainLoopIterationCountCheck});
           }
----------------
venkataramanan.kumar.llvm wrote:
> fhahn wrote:
> > venkataramanan.kumar.llvm wrote:
> > > Just a question: here we are creating induction resume values twice for the main vector loop, once while setting up the resume value for the epilogue vector loop and a second time while setting up the resume value for the scalar loop. Is it possible to reuse the one already created?
> > > 
> > > If implementing the reuse is not worth the effort compared to creating a new one, please ignore my comment.
> > 
> > I think in general the resume values could be different, e.g. for the edge main-vector-loop-successor -> scalar loop it would be based on (N - main vector TC), while for the edge epilogue-vector-loop-successor it would be based on (N - main vector TC - epilogue vector TC).
> > 
> > 
> > Is it possible that you were looking at the resume values in `@test_widen_ptr_induction`? I think those are a bit misleading: the resume values are the same because the epilogue vector loop never executes, AFAICT. I added a few more interesting variants with runtime trip counts (e.g. `@test_widen_induction`).
> > 
> I am referring to the induction end value computation in "InnerLoopVectorizer::createInductionResumeValue". It is called at this line (N - main vector TC).
> 
> It is then called again via CreateInductionResumeValues({VecEpilogueIterationCountCheck, EPI.VectorTripCount} /* AdditionalBypass */) in "EpilogueVectorizerEpilogueLoop::createEpilogueVectorizedLoopSkeleton", where we create (N - main vector TC) again at the additional bypass block.
> 
> Example:
> 
> int a[10000],n;
> void fn() {
>  for (int i=2;i<n;i++) {
>    a[i] = a[i] + i;
>  }
> }
> 
> ---Snip--
> vector.ph:                                        ; preds = %vector.main.loop.iter.check
>   %n.vec = and i64 %1, -64
>   **%ind.end = or i64 %n.vec, 2**
> 
> vec.epilog.iter.check:                            ; preds = %middle.block
>  **%ind.end20 = or i64 %n.vec, 2**
> ---Snip--
> 
> But I think creating it twice should be OK.
> 
Oh right, thanks for sharing the case. I added a similar test case in 518bccfd6e8b.

I don't think it is worth caching those values for now: it would require additional state, which might add extra complexity, and the extra instructions should be cleaned up by later passes. Going forward, it might make sense to use SCEVExpander to create the end values, which would mean we would get reuse for free.
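
For reference, the duplication discussed above can be modelled with plain integers. This is only a rough sketch, not code from the patch: `ResumeValues`, `computeResumeValues` and all parameter names are hypothetical, and it assumes no tail folding, no required scalar epilogue, and an epilogue VF*UF that evenly divides the main VF*UF. It shows that the resume value for the epilogue vector loop and the one for the additional bypass edge into the scalar loop are computed from the same main vector trip count, which is why the two identical `%ind.end` instructions appear.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of the induction resume values; not LLVM code.
struct ResumeValues {
  uint64_t EpilogueVectorLoop; // induction value on entry to the epilogue vector loop
  uint64_t ScalarViaBypass;    // scalar loop entry when the epilogue vector loop is skipped
  uint64_t ScalarViaEpilogue;  // scalar loop entry after the epilogue vector loop has run
};

ResumeValues computeResumeValues(uint64_t TripCount, uint64_t Start,
                                 uint64_t Step, uint64_t MainVFxUF,
                                 uint64_t EpiVFxUF) {
  // Assumption: the epilogue factor evenly divides the main factor.
  assert(EpiVFxUF != 0 && MainVFxUF % EpiVFxUF == 0);

  // Iterations covered once the main vector loop is done.
  uint64_t MainVectorTC = TripCount - TripCount % MainVFxUF;
  // Iterations covered once the epilogue vector loop is done as well.
  uint64_t EpiVectorTC = TripCount - TripCount % EpiVFxUF;

  ResumeValues R;
  // Both of these are derived from MainVectorTC, hence the duplicate.
  R.EpilogueVectorLoop = Start + MainVectorTC * Step;
  R.ScalarViaBypass = Start + MainVectorTC * Step;
  // This one differs whenever the epilogue vector loop executes.
  R.ScalarViaEpilogue = Start + EpiVectorTC * Step;
  return R;
}
```

For the example loop (start 2, step 1) with a trip count of 150, a main VF*UF of 64 and an assumed epilogue VF*UF of 8, the first two values are both 130 and the third is 146.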

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D92132/new/

https://reviews.llvm.org/D92132