[PATCH] improve scheduling in SLPVectorizer

Fri Jul 25 10:53:17 PDT 2014

----- Original Message -----
> From: "Erik Eckstein" <eeckstein at apple.com>
> To: llvm-commits at cs.uiuc.edu
> Sent: Tuesday, July 15, 2014 3:37:56 AM
> Subject: [PATCH] improve scheduling in SLPVectorizer
> 
> 
> 
> Hi,
> 
> This patch improves the scheduling in the SLPVectorizer. Originally
> the SLPVectorizer could only move instructions of a bundle down. The
> new scheduling algorithm is more general, which means that (in
> theory) it should always schedule better than the old approach.
> 
> 
> 
> The change fixes: < rdar://problem/13677020 > [SLP vectorizer] - SLP
> Vectorizer needs to be able to start trees at multiple user
> reductions.
> Another example is Bug 19657 , where the new algorithm replaces the
> need for the heuristics for fixing it.
> 
> 
> The compile time and execution times for the test suite look ok.
> Measured with the test suite on x86.
> 
> 
> 
> Compile time: approximately the same as before. The new algorithm
> fixes a compile time problem for the (synthetic) test
> SingleSource/UnitTest/Vector/constpool. This compiles much faster
> now.
> 
> 
> Execution time: not too much changes. A few benchmarks show
> improvements when compiled with -O3 -flto.
> 
> 
> Statistics: With the new scheduling algorithm the SLPVectorizer can
> generate about 9% more llvm vector instructions. Note that this is a
> static count, not the number of _executed_ instructions (So if not
> inside a critical loop it will not show up in execution time).
> 
> 
> As this patch is not trivial, I plan to commit it after the 3.5
> branch.

Sounds good. I took a quick look at the patch...

+STATISTIC(NumVectorInstructions, "Number of vector instructions generated");

+

As a high-level comment (which may not be specific to this patch): It would be really nice to have statistics on the amount of reordering that was done during scheduling. I'm not sure exactly what form this should take, but I'd like to get some idea of how much code that we vectorize was abababab vs aaaabbbb ;)

+      // Not really required in the constructor (except SchedulingRegion) as

+      // it's done as soon as it becomes a member of the scheduling region.

+      init(-1);

I don't like this comment. How about, "We need to call init here because it is not otherwise called when...".

+            if (SrcMayWrite || DepDest->Inst->mayWriteToMemory()) {

+              AliasAnalysis::Location DstLoc = getLocation(DepDest->Inst, AA);

+              if (!SrcLoc.Ptr || !DstLoc.Ptr || AA->alias(SrcLoc, DstLoc)) {

You probably want to use a ModRef query here for call sites (not that we really care about vectorizing around calls, but we do care about vectorizing around vector intrinsics and memcpy, etc., and this will be important to avoid problems with @llvm.assume).

Otherwise, this looks good to me. Thanks for working on this!

 -Hal

> 
> 
> Please let me know your comments (Arnold and Nadav already looked at
> the patch).
> 
> 
> Thanks,
> Erik
> 
> 
> 
> 
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory