[PATCH] D78847: [LV] Fix recording of BranchTakenCount for FoldTail

Anh Tuyen Tran via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sun May 10 21:49:01 PDT 2020


anhtuyen added a comment.

In D78847#2028819 <https://reviews.llvm.org/D78847#2028819>, @anhtuyen wrote:

> In D78847#2028759 <https://reviews.llvm.org/D78847#2028759>, @Ayal wrote:
>
> > In D78847#2028708 <https://reviews.llvm.org/D78847#2028708>, @anhtuyen wrote:
> >
> > > Hello,
> >
> >
> > [snip]
> >
> > > In this example, operand[0] (%induction) correctly has type i64, but the loop bound (14) has vector type <1 x i64>.
> > > 
> > > There might be multiple ways to address this assertion failure. I list a few simple ones below for your reference; they may or may not be good solutions.
> > > 
> > > 1. Option 1: Do not generate the icmp instructions for %induction. In this particular test case, these instructions seem to be redundant.
> > > 2. Option 2: If we do generate the icmp instructions above, can we set the BackedgeTakenCount in the State depending on the type of the first operand? In cases like this one, where the first operand is not a vector type, using Value *TCMO instead of Value *VTCMO might be an option.
> > > 
> > >   I will open a Bugzilla report and post its link on this page once my password reset goes through.
> > > 
> > >   Thanks, Anh
> >
> > Yes, thanks for catching this!
> >  One quick fix is indeed to set VTCMO to TCMO when State->VF == 1, instead of "splatting" it into a vector of a single element.
> >  Thinking if fold-tail-by-masking should be restricted to work for VF>1 only, given that only vectors (loads/stores) get masked.
>
>
> I also came up with (and gave up on) that fix last week, because it would not work for a loop whose VF is 1 but whose loop bound is a vector. I will post an example shortly to demonstrate my point.


Below is an example demonstrating that **setting VTCMO to TCMO when State->VF == 1** does not help for a loop with VF 1 whose compare is nevertheless vector-typed (the induction variable being compared is <1 x i64>).
In the example below, I added some profile metadata to guide the optimization selection process.
As you can tell from the metadata, the numeric values were picked more or less arbitrarily: 10, 20, 30, and so on.
Almost any values will work, as long as the metadata is well-formed.

  $ cat ./simple2.ll
  
  define void @foo() !prof !12 {
  entry:
    br label %for.body
  
  for.cond.cleanup:
    ret void
  
  for.body:
    %addr = phi double* [ %ptr, %for.body ], [ undef, %entry ]
    %ptr = getelementptr inbounds double, double* %addr, i64 1
    %cond = icmp eq double* %ptr, undef
    br i1 %cond, label %for.cond.cleanup, label %for.body
  }
  
  !llvm.module.flags = !{!0}
  
  !0 = !{i32 1, !"ProfileSummary", !1}
  !1 = !{!2, !3, !4, !5, !6, !7, !8, !9}
  !2 = !{!"ProfileFormat", !"InstrProf"}
  !3 = !{!"TotalCount", i64 10}
  !4 = !{!"MaxCount", i64 20}
  !5 = !{!"MaxInternalCount", i64 30}
  !6 = !{!"MaxFunctionCount", i64 40}
  !7 = !{!"NumCounts", i64 50}
  !8 = !{!"NumFunctions", i64 60}
  !9 = !{!"DetailedSummary", !10}
  !10 = !{!11}
  !11 = !{i32 999999, i64 70, i32 80}
  !12 = !{!"function_entry_count", i64 0}

Again, I use the same options, **-loop-vectorize -force-vector-interleave=4**:

  $ opt -loop-vectorize -force-vector-interleave=4 -S ./simple2.ll
  
  opt: llvm-project/llvm/include/llvm/IR/Instructions.h:1144: void llvm::ICmpInst::AssertOK(): Assertion `getOperand(0)->getType() == getOperand(1)->getType() && "Both operands to ICmp instruction are not of the same type!"' failed.
  
  Stack dump:
  0.      Program arguments: build/bin/opt -loop-vectorize -force-vector-interleave=4 -S -debug-only=loop-vectorize simple2.ll
  1.      Running pass 'Function Pass Manager' on module 'simple2.ll'.
  2.      Running pass 'Loop Vectorization' on function '@foo'
   #0 0x000070cd938b9bf4 PrintStackTraceSignalHandler(void*) (build/bin/../lib/libLLVMSupport.so.11git+0x1e9bf4)
   #1 0x000070cd938b6b98 llvm::sys::RunSignalHandlers() (build/bin/../lib/libLLVMSupport.so.11git+0x1e6b98)
   #2 0x000070cd938b9f04 SignalHandler(int) (build/bin/../lib/libLLVMSupport.so.11git+0x1e9f04)
   #3 0x000070cd998f04d8 (linux-vdso64.so.1+0x4d8)
   #4 0x000070cd92ace98c __libc_signal_restore_set /build/glibc-uvws04/glibc-2.27/signal/../sysdeps/unix/sysv/linux/nptl-signals.h:80:0
   #5 0x000070cd92ace98c raise /build/glibc-uvws04/glibc-2.27/signal/../sysdeps/unix/sysv/linux/raise.c:48:0
   #6 0x000070cd92ad0be0 abort /build/glibc-uvws04/glibc-2.27/stdlib/abort.c:79:0
   #7 0x000070cd92abbb38 __assert_fail_base /build/glibc-uvws04/glibc-2.27/assert/assert.c:92:0
   #8 0x000070cd92abbbe4 __assert_fail /build/glibc-uvws04/glibc-2.27/assert/assert.c:101:0
   #9 0x000070cd932ee5bc llvm::ICmpInst::AssertOK() (build/bin/../lib/libLLVMVectorize.so.11git+0x7e5bc)
  #10 0x000070cd932bc37c llvm::IRBuilderBase::CreateICmp(llvm::CmpInst::Predicate, llvm::Value*, llvm::Value*, llvm::Twine const&) (build/bin/../lib/libLLVMVectorize.so.11git+0x4c37c)
  #11 0x000070cd933513cc llvm::VPInstruction::generateInstruction(llvm::VPTransformState&, unsigned int) (build/bin/../lib/libLLVMVectorize.so.11git+0xe13cc)
  #12 0x000070cd933518d0 llvm::VPInstruction::execute(llvm::VPTransformState&) (build/bin/../lib/libLLVMVectorize.so.11git+0xe18d0)
  #13 0x000070cd9335057c llvm::VPBasicBlock::execute(llvm::VPTransformState*) (build/bin/../lib/libLLVMVectorize.so.11git+0xe057c)
  #14 0x000070cd93352188 llvm::VPlan::execute(llvm::VPTransformState*) (build/bin/../lib/libLLVMVectorize.so.11git+0xe2188)
  #15 0x000070cd932db3c8 llvm::LoopVectorizationPlanner::executePlan(llvm::InnerLoopVectorizer&, llvm::DominatorTree*) (build/bin/../lib/libLLVMVectorize.so.11git+0x6b3c8)
  #16 0x000070cd932e73e0 llvm::LoopVectorizePass::processLoop(llvm::Loop*) (build/bin/../lib/libLLVMVectorize.so.11git+0x773e0)
  #17 0x000070cd932e8bd4 llvm::LoopVectorizePass::runImpl(llvm::Function&, llvm::ScalarEvolution&, llvm::LoopInfo&, llvm::TargetTransformInfo&, llvm::DominatorTree&, llvm::BlockFrequencyInfo&, llvm::TargetLibraryInfo*, llvm::DemandedBits&, llvm::AAResults&, llvm::AssumptionCache&, std::function<llvm::LoopAccessInfo const& (llvm::Loop&)>&, llvm::OptimizationRemarkEmitter&, llvm::ProfileSummaryInfo*) (build/bin/../lib/libLLVMVectorize.so.11git+0x78bd4)
  #18 0x000070cd932f33e0 (anonymous namespace)::LoopVectorize::runOnFunction(llvm::Function&) (build/bin/../lib/libLLVMVectorize.so.11git+0x833e0)
  #19 0x000070cd948f073c llvm::FPPassManager::runOnFunction(llvm::Function&) (build/bin/../lib/libLLVMCore.so.11git+0x27073c)
  #20 0x000070cd948f0ba0 llvm::FPPassManager::runOnModule(llvm::Module&) (build/bin/../lib/libLLVMCore.so.11git+0x270ba0)
  #21 0x000070cd948f1354 llvm::legacy::PassManagerImpl::run(llvm::Module&) (build/bin/../lib/libLLVMCore.so.11git+0x271354)
  #22 0x000070cd948f19fc llvm::legacy::PassManager::run(llvm::Module&) (build/bin/../lib/libLLVMCore.so.11git+0x2719fc)
  #23 0x0000000010031aec main (build/bin/opt+0x10031aec)
  #24 0x000070cd92aa441c generic_start_main /build/glibc-uvws04/glibc-2.27/csu/../csu/libc-start.c:310:0
  #25 0x000070cd92aa4618 __libc_start_main /build/glibc-uvws04/glibc-2.27/csu/../sysdeps/unix/sysv/linux/powerpc/libc-start.c:116:0

If BackedgeTakenCount is set to TCMO when State->VF == 1, the type mismatch occurs here:

  vector.body:                                      ; preds = %vector.body, %vector.ph
    %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
    %0 = add i64 %index, 0
    %next.gep = getelementptr double, double* undef, i64 %0
    %1 = add i64 %index, 1
    %next.gep1 = getelementptr double, double* undef, i64 %1
    %2 = add i64 %index, 2
    %next.gep2 = getelementptr double, double* undef, i64 %2
    %3 = add i64 %index, 3
    %next.gep3 = getelementptr double, double* undef, i64 %3
    %broadcast.splatinsert = insertelement <1 x i64> undef, i64 %index, i32 0
    %broadcast.splat = shufflevector <1 x i64> %broadcast.splatinsert, <1 x i64> undef, <1 x i32> zeroinitializer
    %vec.iv = add <1 x i64> %broadcast.splat, zeroinitializer
    %vec.iv4 = add <1 x i64> %broadcast.splat, <i64 1>
    %vec.iv5 = add <1 x i64> %broadcast.splat, <i64 2>
    %vec.iv6 = add <1 x i64> %broadcast.splat, <i64 3>
    %4 = icmp ule <1 x i64> %vec.iv, <i64 2305843009213693951>  <======== assert when generating this line

In this case, operand[0] (%vec.iv) has type <1 x i64>, but the loop bound would have type **i64** instead of the expected **<1 x i64>** (i.e. the splat constant **<i64 2305843009213693951>** shown above).
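
For illustration, here is a hand-written IR sketch (not vectorizer output; the function names and %btc are hypothetical) of the two well-typed forms of this compare. When the IV is a scalar i64, as with %induction in the earlier test case, a scalar bound works; when the IV is <1 x i64>, as with %vec.iv here, the bound has to be splatted to <1 x i64> as well, even though VF == 1. This suggests the choice between TCMO and VTCMO should follow the type of the compared IV rather than VF alone, along the lines of Option 2 above.

  ; Hypothetical sketch: both icmp operands must have the same type.
  ; Scalar IV (like %induction in the earlier test case): a scalar bound is fine.
  define i1 @mask_scalar_iv(i64 %induction, i64 %btc) {
    %mask = icmp ule i64 %induction, %btc
    ret i1 %mask
  }
  
  ; <1 x i64> IV (like %vec.iv in simple2.ll): the bound must be splatted too, even with VF == 1.
  define <1 x i1> @mask_vector_iv(<1 x i64> %vec.iv, i64 %btc) {
    %btc.splatinsert = insertelement <1 x i64> undef, i64 %btc, i32 0
    %btc.splat = shufflevector <1 x i64> %btc.splatinsert, <1 x i64> undef, <1 x i32> zeroinitializer
    %mask = icmp ule <1 x i64> %vec.iv, %btc.splat
    ret <1 x i1> %mask
  }

Both functions should be accepted by opt; writing icmp ule <1 x i64> %vec.iv, %btc with the scalar %btc is instead rejected by the IR parser, which is the textual counterpart of the ICmpInst::AssertOK() failure above.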


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D78847