[PATCH] D78847: [LV] Fix recording of BranchTakenCount for FoldTail
Anh Tuyen Tran via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sun May 10 21:49:01 PDT 2020
anhtuyen added a comment.
In D78847#2028819 <https://reviews.llvm.org/D78847#2028819>, @anhtuyen wrote:
> In D78847#2028759 <https://reviews.llvm.org/D78847#2028759>, @Ayal wrote:
>
> > In D78847#2028708 <https://reviews.llvm.org/D78847#2028708>, @anhtuyen wrote:
> >
> > > Hello,
> >
> >
> > [snip]
> >
> > > In this example, operand[0] (%induction) correctly has type i64, but the loop bound (14) has vector type <1 x i64>.
> > >
> > > There might be multiple ways to address this assertion failure. I list a few simple ones below for reference; they may or may not be good solutions.
> > >
> > > 1. Option 1: Do not generate the icmp instructions for %induction. For this particular test case, they seem to be redundant.
> > > 2. Option 2: If we do generate the icmp instructions above, can we set the BackedgeTakenCount in the State based on the type of the first operand? When the first operand is not a vector type, as in this case, using Value *TCMO instead of Value *VTCMO might be an option.
> > >
> > > I will open a Bugzilla report and post its link on this page once my password reset goes through.
> > >
> > > Thanks, Anh
> >
> > Yes, thanks for catching this!
> > One quick fix is indeed to set VTCMO to TCMO when State->VF == 1, instead of "splatting" it into a vector of a single element.
> > I am wondering whether fold-tail-by-masking should be restricted to VF>1 only, given that only vectors (loads/stores) get masked.
>
>
> I also came up with (and gave up on) that fix last week, because it would not work for a loop whose VF is 1 but whose loop bound is a vector. I will post an example shortly to demonstrate this.
Below is an example to demonstrate that **setting VTCMO to TCMO when State->VF == 1** will not help in the case of a loop of VF 1 having a vector loop-bound.
In the example below, I add some profile metadata to steer the vectorizer's decisions.
As the metadata shows, the numeric values (10, 20, 30, and so on) were picked more or less at random.
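Almost any values will work as long as the metadata is well formed. Presumably what matters is that a profile summary is present and function_entry_count is 0: that marks @foo as cold, so it is optimized for size and the vectorizer folds the tail by masking instead of emitting a scalar epilogue.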
$ cat ./simple2.ll
define void @foo() !prof !12 {
entry:
  br label %for.body

for.cond.cleanup:
  ret void

for.body:
  %addr = phi double* [ %ptr, %for.body ], [ undef, %entry ]
  %ptr = getelementptr inbounds double, double* %addr, i64 1
  %cond = icmp eq double* %ptr, undef
  br i1 %cond, label %for.cond.cleanup, label %for.body
}
!llvm.module.flags = !{!0}
!0 = !{i32 1, !"ProfileSummary", !1}
!1 = !{!2, !3, !4, !5, !6, !7, !8, !9}
!2 = !{!"ProfileFormat", !"InstrProf"}
!3 = !{!"TotalCount", i64 10}
!4 = !{!"MaxCount", i64 20}
!5 = !{!"MaxInternalCount", i64 30}
!6 = !{!"MaxFunctionCount", i64 40}
!7 = !{!"NumCounts", i64 50}
!8 = !{!"NumFunctions", i64 60}
!9 = !{!"DetailedSummary", !10}
!10 = !{!11}
!11 = !{i32 999999, i64 70, i32 80}
!12 = !{!"function_entry_count", i64 0}
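The bound 2305843009213693951 that appears in the failing icmp further down can be derived by hand for this loop. The sketch below is plain arithmetic, not LLVM's trip-count code, and it assumes that ScalarEvolution treats both undef operands as the same unknown start pointer and that %ptr advances by sizeof(double) = 8 bytes per iteration.

```python
# Assumption: SCEV sees both undef operands as the same unknown pointer U, so
# the exit test "%ptr == undef" on iteration n becomes U + 8*(n+1) == U, i.e.
# 8*(n+1) == 0 (mod 2**64).
BITS = 64
ELEM_SIZE = 8  # bytes per double: each GEP step advances %ptr by 8

# 8*k == 0 (mod 2**64) holds exactly when 2**61 divides k, so the first exit
# happens when n + 1 == 2**61. That is the trip count; the backedge is taken
# one time fewer.
trip_count = 2**BITS // ELEM_SIZE
backedge_taken_count = trip_count - 1

assert (ELEM_SIZE * trip_count) % 2**BITS == 0
print(backedge_taken_count)  # 2305843009213693951
```

Under these assumptions the backedge-taken count is 2^61 - 1, which matches the constant compared against %vec.iv with icmp ule in the vector body.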
Again, I use the same options **-loop-vectorize -force-vector-interleave=4**:
$ opt -loop-vectorize -force-vector-interleave=4 -S ./simple2.ll
opt: llvm-project/llvm/include/llvm/IR/Instructions.h:1144: void llvm::ICmpInst::AssertOK(): Assertion `getOperand(0)->getType() == getOperand(1)->getType() && "Both operands to ICmp instruction are not of the same type!"' failed.
Stack dump:
0. Program arguments: build/bin/opt -loop-vectorize -force-vector-interleave=4 -S -debug-only=loop-vectorize simple2.ll
1. Running pass 'Function Pass Manager' on module 'simple2.ll'.
2. Running pass 'Loop Vectorization' on function '@foo'
#0 0x000070cd938b9bf4 PrintStackTraceSignalHandler(void*) (build/bin/../lib/libLLVMSupport.so.11git+0x1e9bf4)
#1 0x000070cd938b6b98 llvm::sys::RunSignalHandlers() (build/bin/../lib/libLLVMSupport.so.11git+0x1e6b98)
#2 0x000070cd938b9f04 SignalHandler(int) (build/bin/../lib/libLLVMSupport.so.11git+0x1e9f04)
#3 0x000070cd998f04d8 (linux-vdso64.so.1+0x4d8)
#4 0x000070cd92ace98c __libc_signal_restore_set /build/glibc-uvws04/glibc-2.27/signal/../sysdeps/unix/sysv/linux/nptl-signals.h:80:0
#5 0x000070cd92ace98c raise /build/glibc-uvws04/glibc-2.27/signal/../sysdeps/unix/sysv/linux/raise.c:48:0
#6 0x000070cd92ad0be0 abort /build/glibc-uvws04/glibc-2.27/stdlib/abort.c:79:0
#7 0x000070cd92abbb38 __assert_fail_base /build/glibc-uvws04/glibc-2.27/assert/assert.c:92:0
#8 0x000070cd92abbbe4 __assert_fail /build/glibc-uvws04/glibc-2.27/assert/assert.c:101:0
#9 0x000070cd932ee5bc llvm::ICmpInst::AssertOK() (build/bin/../lib/libLLVMVectorize.so.11git+0x7e5bc)
#10 0x000070cd932bc37c llvm::IRBuilderBase::CreateICmp(llvm::CmpInst::Predicate, llvm::Value*, llvm::Value*, llvm::Twine const&) (build/bin/../lib/libLLVMVectorize.so.11git+0x4c37c)
#11 0x000070cd933513cc llvm::VPInstruction::generateInstruction(llvm::VPTransformState&, unsigned int) (build/bin/../lib/libLLVMVectorize.so.11git+0xe13cc)
#12 0x000070cd933518d0 llvm::VPInstruction::execute(llvm::VPTransformState&) (build/bin/../lib/libLLVMVectorize.so.11git+0xe18d0)
#13 0x000070cd9335057c llvm::VPBasicBlock::execute(llvm::VPTransformState*) (build/bin/../lib/libLLVMVectorize.so.11git+0xe057c)
#14 0x000070cd93352188 llvm::VPlan::execute(llvm::VPTransformState*) (build/bin/../lib/libLLVMVectorize.so.11git+0xe2188)
#15 0x000070cd932db3c8 llvm::LoopVectorizationPlanner::executePlan(llvm::InnerLoopVectorizer&, llvm::DominatorTree*) (build/bin/../lib/libLLVMVectorize.so.11git+0x6b3c8)
#16 0x000070cd932e73e0 llvm::LoopVectorizePass::processLoop(llvm::Loop*) (build/bin/../lib/libLLVMVectorize.so.11git+0x773e0)
#17 0x000070cd932e8bd4 llvm::LoopVectorizePass::runImpl(llvm::Function&, llvm::ScalarEvolution&, llvm::LoopInfo&, llvm::TargetTransformInfo&, llvm::DominatorTree&, llvm::BlockFrequencyInfo&, llvm::TargetLibraryInfo*, llvm::DemandedBits&, llvm::AAResults&, llvm::AssumptionCache&, std::function<llvm::LoopAccessInfo const& (llvm::Loop&)>&, llvm::OptimizationRemarkEmitter&, llvm::ProfileSummaryInfo*) (build/bin/../lib/libLLVMVectorize.so.11git+0x78bd4)
#18 0x000070cd932f33e0 (anonymous namespace)::LoopVectorize::runOnFunction(llvm::Function&) (build/bin/../lib/libLLVMVectorize.so.11git+0x833e0)
#19 0x000070cd948f073c llvm::FPPassManager::runOnFunction(llvm::Function&) (build/bin/../lib/libLLVMCore.so.11git+0x27073c)
#20 0x000070cd948f0ba0 llvm::FPPassManager::runOnModule(llvm::Module&) (build/bin/../lib/libLLVMCore.so.11git+0x270ba0)
#21 0x000070cd948f1354 llvm::legacy::PassManagerImpl::run(llvm::Module&) (build/bin/../lib/libLLVMCore.so.11git+0x271354)
#22 0x000070cd948f19fc llvm::legacy::PassManager::run(llvm::Module&) (build/bin/../lib/libLLVMCore.so.11git+0x2719fc)
#23 0x0000000010031aec main (build/bin/opt+0x10031aec)
#24 0x000070cd92aa441c generic_start_main /build/glibc-uvws04/glibc-2.27/csu/../csu/libc-start.c:310:0
#25 0x000070cd92aa4618 __libc_start_main /build/glibc-uvws04/glibc-2.27/csu/../sysdeps/unix/sysv/linux/powerpc/libc-start.c:116:0
If BackedgeTakenCount is set to TCMO when State->VF == 1, a type mismatch still occurs:
vector.body:                                      ; preds = %vector.body, %vector.ph
  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %0 = add i64 %index, 0
  %next.gep = getelementptr double, double* undef, i64 %0
  %1 = add i64 %index, 1
  %next.gep1 = getelementptr double, double* undef, i64 %1
  %2 = add i64 %index, 2
  %next.gep2 = getelementptr double, double* undef, i64 %2
  %3 = add i64 %index, 3
  %next.gep3 = getelementptr double, double* undef, i64 %3
  %broadcast.splatinsert = insertelement <1 x i64> undef, i64 %index, i32 0
  %broadcast.splat = shufflevector <1 x i64> %broadcast.splatinsert, <1 x i64> undef, <1 x i32> zeroinitializer
  %vec.iv = add <1 x i64> %broadcast.splat, zeroinitializer
  %vec.iv4 = add <1 x i64> %broadcast.splat, <i64 1>
  %vec.iv5 = add <1 x i64> %broadcast.splat, <i64 2>
  %vec.iv6 = add <1 x i64> %broadcast.splat, <i64 3>
  %4 = icmp ule <1 x i64> %vec.iv, <i64 2305843009213693951>   <======== assert when generating this line
In this case, operand[0] (%vec.iv) has type <1 x i64>. The loop bound, however, gets type **i64** instead of the expected **<1 x i64>** (that is, the splat **<i64 2305843009213693951>**).
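To compare the two suggested fixes, here is a toy Python model. It is hypothetical: it tracks IR types as plain strings and uses no LLVM API. For the two VF = 1 cases in this thread (the scalar %induction from the first report and the widened %vec.iv from simple2.ll), it checks whether the icmp operands agree when the bound's type is chosen from VF (the quick fix) versus from the first operand's type (Option 2).

```python
# Toy model only: IR types are plain strings, not llvm::Type objects.

def bound_type_vf_keyed(vf: int) -> str:
    """Quick fix: use the scalar TCMO when VF == 1, else a splat of VF lanes."""
    return "i64" if vf == 1 else f"<{vf} x i64>"


def bound_type_operand_keyed(lhs_type: str) -> str:
    """Option 2: give the bound the same type as the icmp's first operand."""
    return lhs_type


def icmp_ok(lhs_type: str, rhs_type: str) -> bool:
    """Mirrors the AssertOK check that both icmp operands share one type."""
    return lhs_type == rhs_type


# First icmp operand in the two VF == 1 cases discussed in this thread.
CASES = {"%induction": "i64", "%vec.iv": "<1 x i64>"}
VF = 1

results = {}
for name, lhs in CASES.items():
    results[name] = (
        icmp_ok(lhs, bound_type_vf_keyed(VF)),
        icmp_ok(lhs, bound_type_operand_keyed(lhs)),
    )
    print(f"{name}: VF-keyed ok={results[name][0]}, operand-keyed ok={results[name][1]}")
```

In this model, keying on VF only works when the compared value happens to be scalar, whereas keying on the first operand's type handles both cases.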
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D78847