[PATCH] improve scheduling in SLPVectorizer

Mon Jul 28 05:47:54 PDT 2014

Hal, Eric,

thanks for your feedback! I attached an updated version.

> It would be really nice to have statistics on the amount of reordering that was done during scheduling.

I'll let this open for another patch. Just to let you know: the new scheduling algorithm tries to keep the original instruction order as good as possible.

> I don't like this comment

I now initialise everything in the constructor and don't need to call init any more. Therefore this comment is gone anyway.

> You probably want to use a ModRef query here for call sites

I'll also want to keep this open for another patch, unless you think it is necessary. 
Also in the old version the alias checking is done this way. I just "copied" it.

> Please go ahead and initialize everything in the constructor

done

> std::make_unique?

As I understand make_unique is not yet available in C++11

> It'd also be nice if you could break this up to make it a bit more
> incremental. I.e. things like the statistic can just be added first.

I'm not sure I understood.
I tried to make only little changes in this large function.

> Block comment above this function documenting it?

Is at the function declaration in the class definition

> "safe"

done

Thanks,
Erik

On 25 Jul 2014, at 19:53, Hal Finkel <hfinkel at anl.gov> wrote:

> ----- Original Message -----
>> From: "Erik Eckstein" <eeckstein at apple.com>
>> To: llvm-commits at cs.uiuc.edu
>> Sent: Tuesday, July 15, 2014 3:37:56 AM
>> Subject: [PATCH] improve scheduling in SLPVectorizer
>> 
>> 
>> 
>> Hi,
>> 
>> This patch improves the scheduling in the SLPVectorizer. Originally
>> the SLPVectorizer could only move instructions of a bundle down. The
>> new scheduling algorithm is more general, which means that (in
>> theory) it should always schedule better than the old approach.
>> 
>> 
>> 
>> The change fixes: < rdar://problem/13677020 > [SLP vectorizer] - SLP
>> Vectorizer needs to be able to start trees at multiple user
>> reductions.
>> Another example is Bug 19657 , where the new algorithm replaces the
>> need for the heuristics for fixing it.
>> 
>> 
>> The compile time and execution times for the test suite look ok.
>> Measured with the test suite on x86.
>> 
>> 
>> 
>> Compile time: approximately the same as before. The new algorithm
>> fixes a compile time problem for the (synthetic) test
>> SingleSource/UnitTest/Vector/constpool. This compiles much faster
>> now.
>> 
>> 
>> Execution time: not too much changes. A few benchmarks show
>> improvements when compiled with -O3 -flto.
>> 
>> 
>> Statistics: With the new scheduling algorithm the SLPVectorizer can
>> generate about 9% more llvm vector instructions. Note that this is a
>> static count, not the number of _executed_ instructions (So if not
>> inside a critical loop it will not show up in execution time).
>> 
>> 
>> As this patch is not trivial, I plan to commit it after the 3.5
>> branch.
> 
> Sounds good. I took a quick look at the patch...
> 
> +STATISTIC(NumVectorInstructions, "Number of vector instructions generated");
> 
> +
> 
> As a high-level comment (which may not be specific to this patch): It would be really nice to have statistics on the amount of reordering that was done during scheduling. I'm not sure exactly what form this should take, but I'd like to get some idea of how much code that we vectorize was abababab vs aaaabbbb ;)
> 
> +      // Not really required in the constructor (except SchedulingRegion) as
> 
> +      // it's done as soon as it becomes a member of the scheduling region.
> 
> +      init(-1);
> 
> I don't like this comment. How about, "We need to call init here because it is not otherwise called when...".
> 
> +            if (SrcMayWrite || DepDest->Inst->mayWriteToMemory()) {
> 
> +              AliasAnalysis::Location DstLoc = getLocation(DepDest->Inst, AA);
> 
> +              if (!SrcLoc.Ptr || !DstLoc.Ptr || AA->alias(SrcLoc, DstLoc)) {
> 
> You probably want to use a ModRef query here for call sites (not that we really care about vectorizing around calls, but we do care about vectorizing around vector intrinsics and memcpy, etc., and this will be important to avoid problems with @llvm.assume).
> 
> Otherwise, this looks good to me. Thanks for working on this!
> 
> -Hal
> 
>> 
>> 
>> Please let me know your comments (Arnold and Nadav already looked at
>> the patch).
>> 
>> 
>> Thanks,
>> Erik
>> 
>> 
>> 
>> 
>> 
> 
> -- 
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140728/219b177e/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: slp-vectorizer-scheduling-4.patch
Type: application/octet-stream
Size: 51684 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140728/219b177e/attachment.obj>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140728/219b177e/attachment-0001.html>