[PATCH] D154314: [LV] Remove the reminder loop if we know the mask is always true

Fri Jul 7 05:00:39 PDT 2023

david-arm added inline comments.

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/eliminate-tail-predication.ll:34
 ; CHECK:       middle.block:
-; CHECK-NEXT:    [[CMP_N:%.*]] = icmp eq i64 1024, [[N_VEC]]
-; CHECK-NEXT:    br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
----------------
Allen wrote:
> david-arm wrote:
> > I'm guessing that InstCombine does not determine this is guaranteed to always be true? However, I thought that someone did work in the DAGCombiner that will replace this with
> > 
> >   br i1 true, label [[EXIT:%.*]], label [[SCALAR_PH]]
> > 
> > when vscale is known to be a power of 2? Are you hoping to benefit from eliminating the scalar tail in IR because it helps us to make better decisions later in the pipeline? I can imagine it's beneficial for LTO where the scalar tail could prevent inlining.
> > 
> > If I remember correctly one of the problems with folding away the icmp in InstCombine is that it doesn't have access to the TTI interface so we cannot query the target.
> I may not have caught your idea. Are you saying that the current optimization needs to be handled in combine?
Well, I'm just trying to understand what this patch is trying to achieve, that's all. I'm not against it, because it does clean up the IR generated by the vectoriser. However, I'm not sure I'd expect you to see many real performance gains from doing this, because we should already delete the scalar tail during codegen.

It might also be worth investigating whether or not InstCombine already optimises the urem calculation, similar to what this patch (https://reviews.llvm.org/D129609) did in codegen.
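
For reference, here is a rough sketch of the arithmetic behind the compare in middle.block. It is plain Python rather than LLVM code. It assumes the vectoriser computes n.vec as the trip count minus (trip count urem (vscale * VF)), and it uses VF = 4 purely as an illustrative value; the function names are hypothetical.

```python
def vector_trip_count(trip_count: int, vscale: int, vf: int) -> int:
    """Model of n.vec: trip_count - (trip_count urem (vscale * vf))."""
    step = vscale * vf  # elements processed per vector iteration
    return trip_count - (trip_count % step)


def scalar_tail_is_dead(trip_count: int, vscale: int, vf: int) -> bool:
    """True when 'icmp eq trip_count, n.vec' would always hold."""
    return vector_trip_count(trip_count, vscale, vf) == trip_count


if __name__ == "__main__":
    # Assumed VF of 4; vscale restricted to powers of two (1, 2, 4, 8, 16).
    for vscale in (1, 2, 4, 8, 16):
        print(vscale, scalar_tail_is_dead(1024, vscale, 4))
    # A non-power-of-two vscale leaves a remainder, so the tail is needed.
    print(3, scalar_tail_is_dead(1024, 3, 4))
```

With a trip count of 1024 and a power-of-two vscale, the step divides the trip count, the urem is zero, and the branch to the scalar preheader is never taken. Knowing that vscale is a power of two is target information, which is why access to TTI matters here.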

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154314/new/

https://reviews.llvm.org/D154314