[llvm] r220345 - LTO: respect command-line options that disable vectorization.
Arnold Schwaighofer
aschwaighofer at apple.com
Mon Oct 27 19:44:52 PDT 2014
Alexey,
Are you sure your linker is picking up the right libLTO library? It works for me:
Vectorization is on by default:
$ "/usr/bin/ld" -mllvm -debug-only=loop-vectorize -dynamic -arch x86_64 -o a.out test_lto_vectorization.o -lSystem -lto_library /.../debug-cmake/lib/libLTO.dylib
LV: Checking a loop in "foo" from ld-temp.o
LV: Interleaving disabled by the pass manager
LV: Loop hints: force=? width=0 unroll=1
LV: Found a loop: for.body
LV: Found an induction variable.
LV: Found a write-only loop!
LV: We can vectorize this loop!
LV: Found trip count: 128
LV: The Widest type: 32 bits.
LV: The Widest register is: 128 bits.
LV: Found an estimated cost of 0 for VF 1 For instruction: %indvars.iv = phi i64 [ 0, %for.body.lr.ph ], [ %indvars.iv.next, %for.body ]
LV: Found an estimated cost of 0 for VF 1 For instruction: %arrayidx = getelementptr inbounds float* %r, i64 %indvars.iv
LV: Found an estimated cost of 1 for VF 1 For instruction: %0 = load float* %arrayidx, align 4, !tbaa !1
LV: Found an estimated cost of 2 for VF 1 For instruction: %mul = fmul float %0, 2.000000e+00
LV: Found an estimated cost of 1 for VF 1 For instruction: store float %mul, float* %arrayidx, align 4, !tbaa !1
LV: Found an estimated cost of 1 for VF 1 For instruction: %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
LV: Found an estimated cost of 1 for VF 1 For instruction: %exitcond1 = icmp eq i64 %indvars.iv, 127
LV: Found an estimated cost of 0 for VF 1 For instruction: br i1 %exitcond1, label %for.end, label %for.body
LV: Scalar loop costs: 6.
LV: Found an estimated cost of 0 for VF 2 For instruction: %indvars.iv = phi i64 [ 0, %for.body.lr.ph ], [ %indvars.iv.next, %for.body ]
LV: Found an estimated cost of 0 for VF 2 For instruction: %arrayidx = getelementptr inbounds float* %r, i64 %indvars.iv
LV: Found an estimated cost of 1 for VF 2 For instruction: %0 = load float* %arrayidx, align 4, !tbaa !1
LV: Found an estimated cost of 2 for VF 2 For instruction: %mul = fmul float %0, 2.000000e+00
LV: Found an estimated cost of 1 for VF 2 For instruction: store float %mul, float* %arrayidx, align 4, !tbaa !1
LV: Found an estimated cost of 1 for VF 2 For instruction: %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
LV: Found an estimated cost of 1 for VF 2 For instruction: %exitcond1 = icmp eq i64 %indvars.iv, 127
LV: Found an estimated cost of 0 for VF 2 For instruction: br i1 %exitcond1, label %for.end, label %for.body
LV: Vector loop of width 2 costs: 3.
LV: Found an estimated cost of 0 for VF 4 For instruction: %indvars.iv = phi i64 [ 0, %for.body.lr.ph ], [ %indvars.iv.next, %for.body ]
LV: Found an estimated cost of 0 for VF 4 For instruction: %arrayidx = getelementptr inbounds float* %r, i64 %indvars.iv
LV: Found an estimated cost of 1 for VF 4 For instruction: %0 = load float* %arrayidx, align 4, !tbaa !1
LV: Found an estimated cost of 2 for VF 4 For instruction: %mul = fmul float %0, 2.000000e+00
LV: Found an estimated cost of 1 for VF 4 For instruction: store float %mul, float* %arrayidx, align 4, !tbaa !1
LV: Found an estimated cost of 4 for VF 4 For instruction: %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
LV: Found an estimated cost of 1 for VF 4 For instruction: %exitcond1 = icmp eq i64 %indvars.iv, 127
LV: Found an estimated cost of 0 for VF 4 For instruction: br i1 %exitcond1, label %for.end, label %for.body
LV: Vector loop of width 4 costs: 2.
LV: Selecting VF: 4.
LV: Found a vectorizable loop (4) in ld-temp.o
LV: Unroll Factor is 1
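
For anyone reading the numbers in the log above: the "costs" lines look like the summed per-instruction estimates divided by the vectorization factor, and the width with the lowest per-lane cost is selected. Here is a minimal sketch of that arithmetic. It is only a model under that assumption, not the actual LoopVectorize cost model, and the names Candidate, perLaneCost, selectVF and costsFromLog are made up for illustration.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical stand-in for one candidate vectorization factor (VF) and the
// per-instruction cost estimates the log prints for it.
struct Candidate {
  unsigned VF;
  std::vector<unsigned> InstCosts;
};

// Assumption: per-lane cost = summed instruction estimates / VF.
double perLaneCost(const Candidate &C) {
  assert(C.VF != 0 && "VF must be at least 1");
  unsigned Total = std::accumulate(C.InstCosts.begin(), C.InstCosts.end(), 0u);
  return static_cast<double>(Total) / C.VF;
}

// Pick the VF with the lowest per-lane cost; on a tie the earlier candidate wins.
unsigned selectVF(const std::vector<Candidate> &Candidates) {
  assert(!Candidates.empty() && "need at least one candidate");
  unsigned BestVF = Candidates.front().VF;
  double BestCost = perLaneCost(Candidates.front());
  for (const Candidate &C : Candidates) {
    double Cost = perLaneCost(C);
    if (Cost < BestCost) {
      BestCost = Cost;
      BestVF = C.VF;
    }
  }
  return BestVF;
}

// Estimates copied from the -debug-only=loop-vectorize output above.
std::vector<Candidate> costsFromLog() {
  return {{1, {0, 0, 1, 2, 1, 1, 1, 0}},
          {2, {0, 0, 1, 2, 1, 1, 1, 0}},
          {4, {0, 0, 1, 2, 1, 4, 1, 0}}};
}
```

With the estimates from the log this gives 6 for VF 1, 3 for VF 2 and 2.25 for VF 4 (printed as 2 if truncated), so selectVF(costsFromLog()) picks 4, in line with "LV: Selecting VF: 4."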
Disabled by passing "-mllvm -disable-lto-vectorization":
$ "/usr/bin/ld" -mllvm -debug-only=loop-vectorize -dynamic -arch x86_64 -o a.out test_lto_vectorization.o -lSystem -lto_library /.../debug-cmake/lib/libLTO.dylib -mllvm -disable-lto-vectorization
LV: Checking a loop in "foo" from ld-temp.o
LV: Interleaving disabled by the pass manager
LV: Loop hints: force=? width=0 unroll=1
LV: Not vectorizing: No #pragma vectorize enable.
LV: Checking a loop in "main" from ld-temp.o
LV: Interleaving disabled by the pass manager
LV: Loop hints: force=? width=0 unroll=1
LV: Not vectorizing: No #pragma vectorize enable.
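
The difference between the two runs can be modelled roughly like this. It is a self-contained sketch with stand-in types, not LLVM's classes: BuilderModel, ForceKind, Decision, makeLTOBuilder and decide are hypothetical names. It assumes that the builder's LoopVectorize default comes from a boolean flag with no initializer (so false), that the LTO code generator overrides it with the negated disable-lto-vectorization option, and that a loop without an explicit hint ("force=?") is only considered when vectorization is enabled by default.

```cpp
// Stand-in for the loop hint that the log prints as "force=?".
enum class ForceKind { Undefined, Disabled, Enabled };

// What the model decides to do with a loop.
enum class Decision {
  Analyze,            // go on to legality and cost checks
  SkipNoPragmaEnable, // the "No #pragma vectorize enable." case in the log
  SkipPragmaDisable   // the loop was explicitly marked as not to be vectorized
};

// Stand-in for the pass manager builder: the default mirrors a boolean
// command-line flag that has no explicit initial value.
struct BuilderModel {
  bool LoopVectorize = false;
};

// Stand-in for the LTO setup: vectorize unless the disable option was given.
BuilderModel makeLTOBuilder(bool DisableLTOVectorization) {
  BuilderModel PMB;
  PMB.LoopVectorize = !DisableLTOVectorization;
  return PMB;
}

// Assumed decision logic for a single loop.
Decision decide(const BuilderModel &PMB, ForceKind Force) {
  if (Force == ForceKind::Disabled)
    return Decision::SkipPragmaDisable;
  if (!PMB.LoopVectorize && Force != ForceKind::Enabled)
    return Decision::SkipNoPragmaEnable;
  return Decision::Analyze;
}
```

In this model decide(makeLTOBuilder(false), ForceKind::Undefined) corresponds to the first run, where the loop is analyzed, and decide(makeLTOBuilder(true), ForceKind::Undefined) to the second run. A default-constructed BuilderModel behaves like the second run, which is what a libLTO without the fix would do.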
On 10/27/14, Alexey Volkov wrote:
> Hi Arnold,
>
> Unfortunately this change doesn't fix the problem.
> I still see the message "LV: Not vectorizing: No #pragma vectorize enable." in the debug log.
>
>
> 2014-10-27 1:07 GMT+03:00 Arnold Schwaighofer <aschwaighofer at apple.com>:
>
> > r220652 restored the old behavior (vectorization on by default) while still allowing LTO vectorization to be disabled.
> >
> > 'disable-lto-vectorization' disables LTO vectorization if so desired.
> >
> > Alexey, could you let me know whether your testcase works again?
> >
> > Thanks,
> > Arnold
> >
> > On 10/24/14, JF Bastien wrote:
> >
> > > Yes, cl::opt<bool> defaults to false when there's no init. Changing the default will affect non-LTO too. This may be the right thing to do, but isn't my call. Turning it on only for LTO sounds better IMO, but I'm not sure what you're suggesting: I think there shouldn't be vectorization flags that differ between LTO and non-LTO.
> > >
> > > On Fri, Oct 24, 2014 at 8:57 AM, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
> > >
> > > > JF, are you sure that “LoopVectorize” is set to true by default by the PassManager instance of libLTO?
> > > >
> > > > The reason I forced these parameters to true is that, if I remember correctly, this is not the case.
> > > >
> > > > We wanted libLTO to vectorize by default.
> > > >
> > > > PassManagerBuilder.cpp:
> > > >
> > > > static cl::opt<bool>
> > > > RunLoopVectorization("vectorize-loops", cl::Hidden,
> > > >                      cl::desc("Run the Loop vectorization passes"));
> > > >
> > > > PassManagerBuilder::PassManagerBuilder() {
> > > >   OptLevel = 2;
> > > >   SizeLevel = 0;
> > > >   LibraryInfo = nullptr;
> > > >   Inliner = nullptr;
> > > >   DisableTailCalls = false;
> > > >   DisableUnitAtATime = false;
> > > >   DisableUnrollLoops = false;
> > > >   BBVectorize = RunBBVectorization;
> > > >   SLPVectorize = RunSLPVectorization;
> > > >   LoopVectorize = RunLoopVectorization;
> > > >   RerollLoops = RunLoopRerolling;
> > > >   LoadCombine = RunLoadCombine;
> > > >   DisableGVNLoadPRE = false;
> > > >   VerifyInput = false;
> > > >   VerifyOutput = false;
> > > >   StripDebug = false;
> > > >   MergeFunctions = false;
> > > > }
> > > >
> > > > LTOCodeGenerator.cpp:
> > > >
> > > > /// Optimize merged modules using various IPO passes
> > > > bool LTOCodeGenerator::generateObjectFile(raw_ostream &out,
> > > >                                           bool DisableOpt,
> > > >                                           bool DisableInline,
> > > >                                           bool DisableGVNLoadPRE,
> > > >                                           std::string &errMsg) {
> > > >   if (!this->determineTarget(errMsg))
> > > >     return false;
> > > >
> > > >   Module *mergedModule = IRLinker.getModule();
> > > >
> > > >   // Mark which symbols can not be internalized
> > > >   this->applyScopeRestrictions();
> > > >
> > > >   // Instantiate the pass manager to organize the passes.
> > > >   PassManager passes;
> > > >
> > > >   // Add an appropriate DataLayout instance for this module...
> > > >   mergedModule->setDataLayout(TargetMach->getSubtargetImpl()->getDataLayout());
> > > >
> > > >   Triple TargetTriple(TargetMach->getTargetTriple());
> > > >   PassManagerBuilder PMB;
> > > >   PMB.DisableGVNLoadPRE = DisableGVNLoadPRE;
> > > >   if (!DisableInline)
> > > >     PMB.Inliner = createFunctionInliningPass();
> > > >   PMB.LibraryInfo = new TargetLibraryInfo(TargetTriple);
> > > >   if (DisableOpt)
> > > >     PMB.OptLevel = 0;
> > > >   PMB.VerifyInput = true;
> > > >   PMB.VerifyOutput = true;
> > > >
> > > >   PMB.populateLTOPassManager(passes, TargetMach);
> > > >
> > > > I think cl::opt<bool> defaults to false and your commit effectively disabled vectorization during LTO. We can recover this by changing the default of the cl::opt flags 'vectorize-loops' and 'vectorize-slp' to true. If that does not work (because we make assumptions somewhere about the default being false), we can follow the example of “DisableGVNLoadPRE” in LTOCodeGenerator.cpp and add a flag to disable vectorization during LTO and pass that to the PassManager created in generateObjectFile.
> > > >
> > > > PMB.LoopVectorize = !DisableLTOVectorization;
> > > >
> > > > Thanks,
> > > > Arnold
> > > >
> > > > > DisableUnrollLoops
> > > > > On Oct 24, 2014, at 5:27 AM, Alexey Volkov <avolkov.intel at gmail.com> wrote:
> > > > >
> > > > > Hi JF,
> > > > >
> > > > > After your commit I see a performance regression because the Loop Vectorizer is disabled:
> > > > > LV: Not vectorizing: No #pragma vectorize enable.
> > > > > This is really strange, since I built the application with clang's -Ofast -flto options.
> > > > > Before this change the loop was successfully vectorized by the Loop Vectorizer.
> > > > >
> > > > > Thanks, Alexey.
> > > > >
> > > > > 2014-10-22 3:18 GMT+04:00 JF Bastien <jfb at google.com>:
> > > > > Author: jfb
> > > > > Date: Tue Oct 21 18:18:21 2014
> > > > > New Revision: 220345
> > > > >
> > > > > URL: http://llvm.org/viewvc/llvm-project?rev=220345&view=rev
> > > > > Log:
> > > > > LTO: respect command-line options that disable vectorization.
> > > > >
> > > > > Summary: Patches 202051 and 208013 added calls to LTO's PassManager which unconditionally add LoopVectorizePass and SLPVectorizerPass instead of following the logic in PassManagerBuilder::populateModulePassManager and honoring the -vectorize-loops -run-slp-after-loop-vectorization flags.
> > > > >
> > > > > Reviewers: nadav, aschwaighofer, yijiang
> > > > >
> > > > > Subscribers: llvm-commits
> > > > >
> > > > > Differential Revision: http://reviews.llvm.org/D5884
> > > > >
> > > > > Modified:
> > > > > llvm/trunk/lib/Transforms/IPO/PassManagerBuilder.cpp
> > > > >
> > > > > Modified: llvm/trunk/lib/Transforms/IPO/PassManagerBuilder.cpp
> > > > > URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/IPO/PassManagerBuilder.cpp?rev=220345&r1=220344&r2=220345&view=diff
> > > > > ==============================================================================
> > > > > --- llvm/trunk/lib/Transforms/IPO/PassManagerBuilder.cpp (original)
> > > > > +++ llvm/trunk/lib/Transforms/IPO/PassManagerBuilder.cpp Tue Oct 21 18:18:21 2014
> > > > > @@ -440,10 +440,12 @@ void PassManagerBuilder::addLTOOptimizat
> > > > >    // More loops are countable; try to optimize them.
> > > > >    PM.add(createIndVarSimplifyPass());
> > > > >    PM.add(createLoopDeletionPass());
> > > > > -  PM.add(createLoopVectorizePass(true, true));
> > > > > +  PM.add(createLoopVectorizePass(DisableUnrollLoops, LoopVectorize));
> > > > >
> > > > >    // More scalar chains could be vectorized due to more alias information
> > > > > -  PM.add(createSLPVectorizerPass()); // Vectorize parallel scalar chains.
> > > > > +  if (RunSLPAfterLoopVectorization)
> > > > > +    if (SLPVectorize)
> > > > > +      PM.add(createSLPVectorizerPass()); // Vectorize parallel scalar chains.
> > > > >
> > > > >    // After vectorization, assume intrinsics may tell us more about pointer
> > > > >    // alignments.
> > > > >
> > > > > _______________________________________________
> > > > > llvm-commits mailing list
> > > > > llvm-commits at cs.uiuc.edu
> > > > > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> > > > >
> > > > > --
> > > > > Alexey Volkov
> > > > > Intel Corporation
>
> --
> Alexey Volkov
> Intel Corporation