[PATCH] D79976: [LV] Handle Fold-Tail of loops with vectorization factor (VF) equal to 1

Florian Hahn via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sat May 16 14:17:59 PDT 2020


fhahn added inline comments.


================
Comment at: llvm/test/Transforms/LoopVectorize/tail-folding-vectorization-factor-1-scalar.ll:17
+; CHECK:         [[INDEX_NEXT]] = add i64 [[INDEX]], 4
+; CHECK-NEXT:    [[TMP4:%.*]] = icmp eq i64 [[INDEX_NEXT]], 16
+; CHECK-NEXT:    br i1 [[TMP4]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !0
----------------
bmahjour wrote:
> anhtuyen wrote:
> > Ayal wrote:
> > > bmahjour wrote:
> > > > How is it that the original loop executes 15 iterations, but the vector loop iterates 16 times? It seems the minimum iteration count check at the top should branch to the scalar loop instead of vector.ph.
> > > (Thanks for asking; this reminded me to check above and below that fold-tail emits the desired scalar `icmp ule`s, which are the focus of this patch.)
> > > 
> > > Fold-tail is responsible for rounding up the trip count from 15 to 16, see https://reviews.llvm.org/D50480.
> > > Regarding the minimum iteration count check, fold-tail is also responsible in general for branching directly to vector.ph without an "if (trip-count < VF*UF)" check, which in this case is known to be false anyway.
> > @Ayal, thank you very much for clarifying. My guess is that @bmahjour (Bardia) may have been concerned about whether effectively adding a 16th iteration could affect the correctness of the generated code. Since I have neither evidence nor counterevidence to address this concern, it would be great if you could shed some light on it when you have time.
> > 
> > Back to this patch: since the trip-count value is not its main focus, I have taken the liberty of omitting it from the checks. If that is not acceptable to anyone here, please let me know.
> > 
> I would understand how the rounding up works if the instructions in the body were predicated, but I don't see any predication in the output IR. Is that because this test case has no instructions with side effects, memory reads/writes, etc.?
Yes, this would probably need some actual instructions in the loop body so that there is something to predicate. @anhtuyen, could you add a small vectorizable body, e.g. just a store to ptr + induction?
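To make the point about predication concrete, here is a minimal Python sketch (a model, not LLVM code) of what fold-tail is expected to do, assuming VF*UF = 4 as suggested by the `add i64 [[INDEX]], 4` CHECK line. The function name `simulate_fold_tail` is hypothetical. It rounds the trip count up to a multiple of VF*UF and enables each lane only when its induction value is `ule` the backedge-taken count, so the extra 16th iteration is masked off rather than executed.

```python
def simulate_fold_tail(trip_count, vf, uf=1):
    """Hypothetical model of a tail-folded loop: returns the rounded-up
    trip count and the induction values whose lanes are enabled by the mask."""
    step = vf * uf
    # Fold-tail rounds the trip count up to a multiple of VF*UF (15 -> 16 for step 4).
    rounded = (trip_count + step - 1) // step * step
    # The mask compares each lane's induction value against the
    # backedge-taken count (trip_count - 1) with an unsigned <= test.
    backedge_taken = trip_count - 1
    enabled = []
    for index in range(0, rounded, step):
        # With VF == 1, each of the UF unrolled parts is a single scalar lane,
        # so the mask is one scalar compare per part, not a vector compare.
        for lane in range(step):
            iv = index + lane
            if iv <= backedge_taken:
                enabled.append(iv)
    return rounded, enabled
```

With `simulate_fold_tail(15, vf=1, uf=4)` the loop runs 16 iterations, but the lane for induction value 15 is disabled, so only values 0 through 14 do any work. When the loop body is empty there is nothing to guard with the mask, which is consistent with no predication showing up in the current test output.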


Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D79976
