[PATCH] D79976: [LV] Handle Fold-Tail of loops with vectorization factor (VF) equal to 1

Sat May 16 16:25:34 PDT 2020

anhtuyen marked 2 inline comments as done.
anhtuyen added inline comments.

================
Comment at: llvm/test/Transforms/LoopVectorize/tail-folding-vectorization-factor-1-scalar.ll:17
+; CHECK:         [[INDEX_NEXT]] = add i64 [[INDEX]], 4
+; CHECK-NEXT:    [[TMP4:%.*]] = icmp eq i64 [[INDEX_NEXT]], 16
+; CHECK-NEXT:    br i1 [[TMP4]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !0
----------------
fhahn wrote:
> bmahjour wrote:
> > anhtuyen wrote:
> > > Ayal wrote:
> > > > bmahjour wrote:
> > > > > How is it that the original loop executes 15 iterations, but the vector loop iterates 16? It seems the minimum iteration count check branch at the top should branch to the scalar loop instead of vector.ph.
> > > > (Thanks for asking; it reminded me to check above and below that fold-tail emits the desired scalar `icmp ule`'s, which are the focus of this patch.)
> > > > 
> > > > Fold-tail is responsible for rounding-up the trip count from 15 to 16, see https://reviews.llvm.org/D50480.
> > > > Regarding minimum iteration count check branch, fold-tail is also responsible in general for branching directly to vector.ph w/o an "if (trip-count < VF*UF)", which in this case is known to be false anyhow.
> > > @Ayal, thank you very much for clarifying. My guess is that Bardia (@bmahjour) was concerned about whether the effective addition of the 16th iteration could affect the correctness of the generated code. I have neither evidence nor counterevidence to address his concern, so if you can shed some light on it when you have some time, that would be great.
> > > 
> > > Back to this patch: since the value of the trip count is not its main focus, I have taken the liberty of omitting that value from the checks. If anyone here objects, please let me know.
> > > 
> > I guess I would understand how the rounding-up would work if the instructions in the body were somehow predicated, but I don't see any predication in the output IR. Is that because there are no instructions in this test case with side effects, reads/writes, etc.?
> yes, this would probably need some actual instructions in the loop body, so there is something to predicate. @anhtuyen could you add a small vectorizable body, e.g. just storing to ptr+induction?
I will add it later tonight.
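
In the meantime, here is a rough C++ sketch of how I understand the rounding-up to interact with predication once the body contains a store. It is only an emulation under my own assumptions, not LoopVectorize code: the helper names `roundUpTripCount` and `foldTailStore` are made up for illustration, the step of 4 stands for VF*UF as in the checks above, and the guard `iv <= tripCount - 1` mirrors the scalar `icmp ule` mentioned by Ayal.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: round the trip count up to a multiple of the
// step (VF*UF), as fold-tail does, e.g. 15 becomes 16 for a step of 4.
std::uint64_t roundUpTripCount(std::uint64_t tripCount, std::uint64_t step) {
  assert(step > 0 && "step (VF*UF) must be non-zero");
  return (tripCount + step - 1) / step * step;
}

// Hypothetical emulation of a fold-tail loop whose body stores to
// ptr + induction. Each "lane" is guarded by iv <= tripCount - 1
// (an unsigned less-or-equal), so the extra iterations introduced by
// the rounding-up never touch memory.
std::vector<std::uint64_t> foldTailStore(std::uint64_t tripCount,
                                         std::uint64_t step) {
  std::vector<std::uint64_t> out(tripCount, 0);
  if (tripCount == 0)
    return out;

  const std::uint64_t lastIV = tripCount - 1;  // backedge-taken count
  const std::uint64_t roundedUp = roundUpTripCount(tripCount, step);

  for (std::uint64_t index = 0; index != roundedUp; index += step) {
    for (std::uint64_t lane = 0; lane < step; ++lane) {
      const std::uint64_t iv = index + lane;
      if (iv <= lastIV)      // predicate: masked-off lanes do nothing
        out[iv] = iv + 1;    // the "store to ptr + induction"
    }
  }
  return out;
}
```

With a trip count of 15 and a step of 4, the outer loop runs until the index reaches 16, but the lane with `iv == 15` fails the guard, so only elements 0 through 14 are written. Without a store (or some other side effect) in the body there is nothing for the guard to protect, which would explain why no predication shows up in the current output IR.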

================
Comment at: llvm/test/Transforms/LoopVectorize/tail-folding-vectorization-factor-1.ll:111
+
+!llvm.module.flags = !{!0}
+
----------------
fhahn wrote:
> metadata here and below unused?
Hi Florian (@fhahn), the short answer is that the metadata is used, and we need it.
As you can see on https://reviews.llvm.org/D78847, it is simple to create a minimal reproducer that asserts when going through function **VPlan::execute()**. It is, however, a bit tricky to craft a minimal LIT test which exhibits the same problem when going through the other function, **VPWidenCanonicalIVRecipe::execute()**. I came up with the idea of using profile data for the function on line 57

```
define void @vectorize-factor-1-vector-bound(double* %pt1) !prof !12 {
```
to guide it through **VPWidenCanonicalIVRecipe::execute()**. Although I made the data up, I am sure almost any meaningful values will serve the purpose.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79976/new/

https://reviews.llvm.org/D79976