[PATCH] improve scheduling in SLPVectorizer

Thu Jul 31 17:29:27 PDT 2014

LGTM with the changes.

> On Jul 30, 2014, at 1:16 PM, Eric Christopher <echristo at gmail.com> wrote:
> 
> On Wed, Jul 30, 2014 at 7:39 AM, Erik Eckstein <eeckstein at apple.com> wrote:
>> Hi Eric,
>> 
>> thanks again for your comments. I attached a new version.
>> 
>>>> std::make_unique?
>> 
>> Argh.. I still failed to use make_unique here because it didn't manage to set the BS pointer in addition to storing it in the map. Maybe I'm missing some c++ knowledge.
> 
> I, or someone else (Dave maybe), will have to look a bit at it. We can
> save it for later though.
> 
>> 
>>> Can be added completely separately from the rest of your patch.
>> 
>> I understand and agree. But I hope you don't mind if I don't split the current patch. I will consider it for the next commits.
> 
> I suppose, but just so I've said it and hope you understand here's the
> reasoning behind it:
> 
> The idea of splitting up the patch into smaller pieces is for everyone
> except the you that is making this commit today.
> 
> a) It's useful for code reviewers. I spent a non-zero time figuring
> out that this part of your patch wasn't related to any other part of
> the patch. If everyone does that then it increases the review load
> quite a bit. I have much less time to review patches than you have to
> write them :)
> 
> b) It's useful to the next person to look at the code or the commit.
> Since you will be committing it as a single hunk someone else will
> assume that it was a necessary part of the patch and go looking for
> the reason why, wasting their time in either reasoning about the code
> to extend it or to debug it.
> 
> c) It's useful to later you, when you go back to the code if you don't
> recall yourself why. :)
> 
> d) Going back to the review, it's much easier to review shorter
> patches than longer ones, so why make them longer if you don't need
> to?
> 
> Hope that helps explain why I cared how big the patch was.
> 
>> 
>>> It's a big function, describing the general use isn't describing the algorithm.
>> 
>> I added some comments above and inside the function.
>> 
>>> Seem to have this scattered around a lot. I don't know if the logic
>>> can be refactored so there's a little less cut and paste here.
>> 
>> I know and it's somehow a compromise. But I couldn't avoid it without changing the complete logic of this large function.
> 
> *nod* Something to think about for a follow-on patch :)
> 
>> 
>>> Why shouldn't it happen?
>> 
>> Actually this is the wrong formulation. I just didn't see this when running the test suite. I changed the wording.
> 
> Perhaps an assert instead? Is it bad behavior if it happens?
> 
>>> Still using -1/1 a lot without comments or anything describing why
>>> you've got those values.
>> 
>> I added some comments. Note that the -1/1 as arguments to incrementUnscheduledDeps() really means to decrement/increment by one and are not magic numbers.
>> 
> 
> Ah, I see. Poor naming perhaps? Maybe just updateUnscheduledDeps(int)?
> (I'm bad at naming things I admit).
> 
>>> You can reformat it using clang-format
>> 
>> Thanks for this hint! I did this.
> 
> Should be good to go with these changes.
> 
> Thanks :)
> 
> -eric
> 
>> 
>> Erik
>> 
>> 
>> On 29 Jul 2014, at 00:55, Eric Christopher <echristo at gmail.com> wrote:
>> 
>>>> 
>>>> std::make_unique?
>>>> 
>>>> 
>>>> As I understand make_unique is not yet available in C++11
>>>> 
>>> 
>>> What Dave said here.
>>> 
>>>> It'd also be nice if you could break this up to make it a bit more
>>>> incremental. I.e. things like the statistic can just be added first.
>>>> 
>>>> 
>>>> I'm not sure I understood.
>>>> I tried to make only little changes in this large function.
>>> 
>>> It looks like the statistic here:
>>> 
>>> +STATISTIC(NumVectorInstructions, "Number of vector instructions generated");
>>> 
>>> Can be added completely separately from the rest of your patch.
>>> 
>>>> 
>>>> Block comment above this function documenting it?
>>>> 
>>>> 
>>>> Is at the function declaration in the class definition
>>>> 
>>> 
>>> It's a big function, describing the general use isn't describing the algorithm.
>>> 
>>> Few new comments:
>>> 
>>> +          UnscheduledDepsInBundle(InvalidDeps), IsScheduled(false)
>>> +    {
>>> +    }
>>> 
>>> formatting.
>>> 
>>> +    // The initial value for the dependency counters. It menas that the
>>> 
>>> "means"
>>> 
>>> +          BS->cancelScheduling(VL);
>>> 
>>> Seem to have this scattered around a lot. I don't know if the logic
>>> can be refactored so there's a little less cut and paste here.
>>> 
>>> +  if (NextLoadStore) {
>>> +    if (CurrentLoadStore) CurrentLoadStore->NextLoadStore = NextLoadStore;
>>> +  } else {
>>> +    LastLoadStoreInRegion = CurrentLoadStore;
>>> +  }
>>> 
>>> Formatting.
>>> 
>>> +            // Should never happen, just to be safe.
>>> +            // This lets the instruction/bundle never be scheduled
>>> and eventally
>>> +            // disable vectorization.
>>> 
>>> Why shouldn't it happen?
>>> 
>>> +    /// Simple memory allocation for ScheduleData.
>>> +    std::vector<ScheduleData *> ScheduleDataChunks;
>>> 
>>> unique_ptr here maybe so you don't have to do the explicit deletes?
>>> 
>>> Still using -1/1 a lot without comments or anything describing why
>>> you've got those values.
>>> 
>>> Mostly for maintainability for someone who didn't write the code and
>>> wants to look at it. Otherwise the patch is fine. You can reformat it
>>> using clang-format on the diff to make it easier.
>>> 
>>> -eric
>>> 
>>>> "safe"
>>>> 
>>>> 
>>>> done
>>>> 
>>>> Thanks,
>>>> Erik
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On 25 Jul 2014, at 19:53, Hal Finkel <hfinkel at anl.gov> wrote:
>>>> 
>>>> ----- Original Message -----
>>>> 
>>>> From: "Erik Eckstein" <eeckstein at apple.com>
>>>> To: llvm-commits at cs.uiuc.edu
>>>> Sent: Tuesday, July 15, 2014 3:37:56 AM
>>>> Subject: [PATCH] improve scheduling in SLPVectorizer
>>>> 
>>>> 
>>>> 
>>>> Hi,
>>>> 
>>>> This patch improves the scheduling in the SLPVectorizer. Originally
>>>> the SLPVectorizer could only move instructions of a bundle down. The
>>>> new scheduling algorithm is more general, which means that (in
>>>> theory) it should always schedule better than the old approach.
>>>> 
>>>> 
>>>> 
>>>> The change fixes: < rdar://problem/13677020 > [SLP vectorizer] - SLP
>>>> Vectorizer needs to be able to start trees at multiple user
>>>> reductions.
>>>> Another example is Bug 19657 , where the new algorithm replaces the
>>>> need for the heuristics for fixing it.
>>>> 
>>>> 
>>>> The compile time and execution times for the test suite look ok.
>>>> Measured with the test suite on x86.
>>>> 
>>>> 
>>>> 
>>>> Compile time: approximately the same as before. The new algorithm
>>>> fixes a compile time problem for the (synthetic) test
>>>> SingleSource/UnitTest/Vector/constpool. This compiles much faster
>>>> now.
>>>> 
>>>> 
>>>> Execution time: not too much changes. A few benchmarks show
>>>> improvements when compiled with -O3 -flto.
>>>> 
>>>> 
>>>> Statistics: With the new scheduling algorithm the SLPVectorizer can
>>>> generate about 9% more llvm vector instructions. Note that this is a
>>>> static count, not the number of _executed_ instructions (So if not
>>>> inside a critical loop it will not show up in execution time).
>>>> 
>>>> 
>>>> As this patch is not trivial, I plan to commit it after the 3.5
>>>> branch.
>>>> 
>>>> 
>>>> Sounds good. I took a quick look at the patch...
>>>> 
>>>> +STATISTIC(NumVectorInstructions, "Number of vector instructions
>>>> generated");
>>>> 
>>>> +
>>>> 
>>>> As a high-level comment (which may not be specific to this patch): It would
>>>> be really nice to have statistics on the amount of reordering that was done
>>>> during scheduling. I'm not sure exactly what form this should take, but I'd
>>>> like to get some idea of how much code that we vectorize was abababab vs
>>>> aaaabbbb ;)
>>>> 
>>>> +      // Not really required in the constructor (except SchedulingRegion)
>>>> as
>>>> 
>>>> +      // it's done as soon as it becomes a member of the scheduling region.
>>>> 
>>>> +      init(-1);
>>>> 
>>>> I don't like this comment. How about, "We need to call init here because it
>>>> is not otherwise called when...".
>>>> 
>>>> +            if (SrcMayWrite || DepDest->Inst->mayWriteToMemory()) {
>>>> 
>>>> +              AliasAnalysis::Location DstLoc = getLocation(DepDest->Inst,
>>>> AA);
>>>> 
>>>> +              if (!SrcLoc.Ptr || !DstLoc.Ptr || AA->alias(SrcLoc, DstLoc))
>>>> {
>>>> 
>>>> You probably want to use a ModRef query here for call sites (not that we
>>>> really care about vectorizing around calls, but we do care about vectorizing
>>>> around vector intrinsics and memcpy, etc., and this will be important to
>>>> avoid problems with @llvm.assume).
>>>> 
>>>> Otherwise, this looks good to me. Thanks for working on this!
>>>> 
>>>> -Hal
>>>> 
>>>> 
>>>> 
>>>> Please let me know your comments (Arnold and Nadav already looked at
>>>> the patch).
>>>> 
>>>> 
>>>> Thanks,
>>>> Erik
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Hal Finkel
>>>> Assistant Computational Scientist
>>>> Leadership Computing Facility
>>>> Argonne National Laboratory
>>>> 
>>>> 
>>>> 
>> 
>> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits