[PATCH] improve scheduling in SLPVectorizer

Mon Jul 28 15:55:16 PDT 2014

>
> std::make_unique?
>
>
> As I understand make_unique is not yet available in C++11
>

What Dave said here.

> It'd also be nice if you could break this up to make it a bit more
> incremental. I.e. things like the statistic can just be added first.
>
>
> I'm not sure I understood.
> I tried to make only little changes in this large function.

It looks like the statistic here:

+STATISTIC(NumVectorInstructions, "Number of vector instructions generated");

Can be added completely separately from the rest of your patch.

>
> Block comment above this function documenting it?
>
>
> Is at the function declaration in the class definition
>

It's a big function, describing the general use isn't describing the algorithm.

Few new comments:

+          UnscheduledDepsInBundle(InvalidDeps), IsScheduled(false)
+    {
+    }

formatting.

+    // The initial value for the dependency counters. It menas that the

"means"

+          BS->cancelScheduling(VL);

Seem to have this scattered around a lot. I don't know if the logic
can be refactored so there's a little less cut and paste here.

+  if (NextLoadStore) {
+    if (CurrentLoadStore) CurrentLoadStore->NextLoadStore = NextLoadStore;
+  } else {
+    LastLoadStoreInRegion = CurrentLoadStore;
+  }

Formatting.

+            // Should never happen, just to be safe.
+            // This lets the instruction/bundle never be scheduled
and eventally
+            // disable vectorization.

Why shouldn't it happen?

+    /// Simple memory allocation for ScheduleData.
+    std::vector<ScheduleData *> ScheduleDataChunks;

unique_ptr here maybe so you don't have to do the explicit deletes?

Still using -1/1 a lot without comments or anything describing why
you've got those values.

Mostly for maintainability for someone who didn't write the code and
wants to look at it. Otherwise the patch is fine. You can reformat it
using clang-format on the diff to make it easier.

-eric

> "safe"
>
>
> done
>
> Thanks,
> Erik
>
>
>
>
> On 25 Jul 2014, at 19:53, Hal Finkel <hfinkel at anl.gov> wrote:
>
> ----- Original Message -----
>
> From: "Erik Eckstein" <eeckstein at apple.com>
> To: llvm-commits at cs.uiuc.edu
> Sent: Tuesday, July 15, 2014 3:37:56 AM
> Subject: [PATCH] improve scheduling in SLPVectorizer
>
>
>
> Hi,
>
> This patch improves the scheduling in the SLPVectorizer. Originally
> the SLPVectorizer could only move instructions of a bundle down. The
> new scheduling algorithm is more general, which means that (in
> theory) it should always schedule better than the old approach.
>
>
>
> The change fixes: < rdar://problem/13677020 > [SLP vectorizer] - SLP
> Vectorizer needs to be able to start trees at multiple user
> reductions.
> Another example is Bug 19657 , where the new algorithm replaces the
> need for the heuristics for fixing it.
>
>
> The compile time and execution times for the test suite look ok.
> Measured with the test suite on x86.
>
>
>
> Compile time: approximately the same as before. The new algorithm
> fixes a compile time problem for the (synthetic) test
> SingleSource/UnitTest/Vector/constpool. This compiles much faster
> now.
>
>
> Execution time: not too much changes. A few benchmarks show
> improvements when compiled with -O3 -flto.
>
>
> Statistics: With the new scheduling algorithm the SLPVectorizer can
> generate about 9% more llvm vector instructions. Note that this is a
> static count, not the number of _executed_ instructions (So if not
> inside a critical loop it will not show up in execution time).
>
>
> As this patch is not trivial, I plan to commit it after the 3.5
> branch.
>
>
> Sounds good. I took a quick look at the patch...
>
> +STATISTIC(NumVectorInstructions, "Number of vector instructions
> generated");
>
> +
>
> As a high-level comment (which may not be specific to this patch): It would
> be really nice to have statistics on the amount of reordering that was done
> during scheduling. I'm not sure exactly what form this should take, but I'd
> like to get some idea of how much code that we vectorize was abababab vs
> aaaabbbb ;)
>
> +      // Not really required in the constructor (except SchedulingRegion)
> as
>
> +      // it's done as soon as it becomes a member of the scheduling region.
>
> +      init(-1);
>
> I don't like this comment. How about, "We need to call init here because it
> is not otherwise called when...".
>
> +            if (SrcMayWrite || DepDest->Inst->mayWriteToMemory()) {
>
> +              AliasAnalysis::Location DstLoc = getLocation(DepDest->Inst,
> AA);
>
> +              if (!SrcLoc.Ptr || !DstLoc.Ptr || AA->alias(SrcLoc, DstLoc))
> {
>
> You probably want to use a ModRef query here for call sites (not that we
> really care about vectorizing around calls, but we do care about vectorizing
> around vector intrinsics and memcpy, etc., and this will be important to
> avoid problems with @llvm.assume).
>
> Otherwise, this looks good to me. Thanks for working on this!
>
> -Hal
>
>
>
> Please let me know your comments (Arnold and Nadav already looked at
> the patch).
>
>
> Thanks,
> Erik
>
>
>
>
>
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
>
>
>