[PATCH] improve scheduling in SLPVectorizer
Nadav Rotem
nrotem at apple.com
Thu Jul 31 23:24:24 PDT 2014
LGTM!
> On Jul 31, 2014, at 5:29 PM, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
>
> LGTM with the changes.
>
>
>> On Jul 30, 2014, at 1:16 PM, Eric Christopher <echristo at gmail.com> wrote:
>>
>> On Wed, Jul 30, 2014 at 7:39 AM, Erik Eckstein <eeckstein at apple.com> wrote:
>>> Hi Eric,
>>>
>>> thanks again for your comments. I attached a new version.
>>>
>>>>> std::make_unique?
>>>
>>> Argh... I still failed to use make_unique here because I couldn't manage to set the BS pointer in addition to storing the object in the map. Maybe I'm missing some C++ knowledge.
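>>>
>>> (A rough sketch of the pattern in question, assuming the map holds
>>> std::unique_ptr<BlockScheduling> values and BS is a raw
>>> BlockScheduling pointer; the names here are hypothetical:)
>>>
>>>   // Build the object once, keep a raw pointer for local use, then move
>>>   // ownership into the map (std::make_unique would replace the first
>>>   // line where available).
>>>   std::unique_ptr<BlockScheduling> NewBS(new BlockScheduling(BB));
>>>   BS = NewBS.get();
>>>   BlocksSchedules[BB] = std::move(NewBS);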
>>
>> I, or someone else (Dave maybe), will have to look a bit at it. We can
>> save it for later though.
>>
>>>
>>>> Can be added completely separately from the rest of your patch.
>>>
>>> I understand and agree. But I hope you don't mind if I don't split the current patch. I will keep it in mind for future commits.
>>
>> I suppose, but just so I've said it (and I hope you understand), here's
>> the reasoning behind it:
>>
>> The idea of splitting the patch into smaller pieces is for the benefit
>> of everyone except the you who is making this commit today.
>>
>> a) It's useful for code reviewers. I spent a non-zero amount of time
>> figuring out that this part of your patch wasn't related to any other
>> part of the patch. If everyone does that, it increases the review load
>> quite a bit. I have much less time to review patches than you have to
>> write them :)
>>
>> b) It's useful to the next person to look at the code or the commit.
>> Since you will be committing it as a single hunk, someone else will
>> assume that it was a necessary part of the patch and go looking for
>> the reason why, wasting their time whether they are reasoning about
>> the code to extend it or trying to debug it.
>>
>> c) It's useful to a later you, when you go back to the code and don't
>> recall why it's there. :)
>>
>> d) Going back to the review, it's much easier to review shorter
>> patches than longer ones, so why make them longer if you don't need
>> to?
>>
>> Hope that helps explain why I cared how big the patch was.
>>
>>>
>>>> It's a big function, describing the general use isn't describing the algorithm.
>>>
>>> I added some comments above and inside the function.
>>>
>>>> Seem to have this scattered around a lot. I don't know if the logic
>>>> can be refactored so there's a little less cut and paste here.
>>>
>>> I know, and it's something of a compromise. But I couldn't avoid it without changing the complete logic of this large function.
>>
>> *nod* Something to think about for a follow-on patch :)
>>
>>>
>>>> Why shouldn't it happen?
>>>
>>> Actually, that was the wrong formulation. I just never saw this case when running the test suite. I changed the wording.
>>
>> Perhaps an assert instead? Is it bad behavior if it happens?
>>
>>>> Still using -1/1 a lot without comments or anything describing why
>>>> you've got those values.
>>>
>>> I added some comments. Note that the -1/1 as arguments to incrementUnscheduledDeps() really mean to decrement/increment by one and are not magic numbers.
>>>
>>
>> Ah, I see. Poor naming perhaps? Maybe just updateUnscheduledDeps(int)?
>> (I'm bad at naming things I admit).
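>>
>> (Just to sketch the renaming idea; the member names here are
>> hypothetical and may not match the patch exactly:)
>>
>>   // One helper taking a signed delta instead of separate +1/-1 helpers.
>>   void updateUnscheduledDeps(int Delta) {
>>     UnscheduledDeps += Delta;
>>     if (FirstInBundle)
>>       FirstInBundle->UnscheduledDepsInBundle += Delta;
>>   }
>>
>>   // Call sites then read naturally: updateUnscheduledDeps(-1) when a
>>   // dependency is resolved, updateUnscheduledDeps(+1) when one is added.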
>>
>>>> You can reformat it using clang-format
>>>
>>> Thanks for this hint! I did this.
>>
>> Should be good to go with these changes.
>>
>> Thanks :)
>>
>> -eric
>>
>>>
>>> Erik
>>>
>>>
>>> On 29 Jul 2014, at 00:55, Eric Christopher <echristo at gmail.com> wrote:
>>>
>>>>>
>>>>> std::make_unique?
>>>>>
>>>>>
>>>>> As I understand it, std::make_unique is not available in C++11 (it was only added in C++14)
>>>>>
>>>>
>>>> What Dave said here.
>>>>
>>>>> It'd also be nice if you could break this up to make it a bit more
>>>>> incremental. I.e. things like the statistic can just be added first.
>>>>>
>>>>>
>>>>> I'm not sure I understood.
>>>>> I tried to make only little changes in this large function.
>>>>
>>>> It looks like the statistic here:
>>>>
>>>> +STATISTIC(NumVectorInstructions, "Number of vector instructions generated");
>>>>
>>>> Can be added completely separately from the rest of your patch.
>>>>
>>>>>
>>>>> Block comment above this function documenting it?
>>>>>
>>>>>
>>>>> Is at the function declaration in the class definition
>>>>>
>>>>
>>>> It's a big function, describing the general use isn't describing the algorithm.
>>>>
>>>> Few new comments:
>>>>
>>>> + UnscheduledDepsInBundle(InvalidDeps), IsScheduled(false)
>>>> + {
>>>> + }
>>>>
>>>> formatting.
>>>>
>>>> + // The initial value for the dependency counters. It menas that the
>>>>
>>>> "means"
>>>>
>>>> + BS->cancelScheduling(VL);
>>>>
>>>> Seem to have this scattered around a lot. I don't know if the logic
>>>> can be refactored so there's a little less cut and paste here.
>>>>
>>>> + if (NextLoadStore) {
>>>> + if (CurrentLoadStore) CurrentLoadStore->NextLoadStore = NextLoadStore;
>>>> + } else {
>>>> + LastLoadStoreInRegion = CurrentLoadStore;
>>>> + }
>>>>
>>>> Formatting.
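>>>>
>>>> (For reference, clang-format with the LLVM style would likely render
>>>> this roughly as:)
>>>>
>>>>   if (NextLoadStore) {
>>>>     if (CurrentLoadStore)
>>>>       CurrentLoadStore->NextLoadStore = NextLoadStore;
>>>>   } else {
>>>>     LastLoadStoreInRegion = CurrentLoadStore;
>>>>   }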
>>>>
>>>> + // Should never happen, just to be safe.
>>>> + // This lets the instruction/bundle never be scheduled and eventally
>>>> + // disable vectorization.
>>>>
>>>> Why shouldn't it happen?
>>>>
>>>> + /// Simple memory allocation for ScheduleData.
>>>> + std::vector<ScheduleData *> ScheduleDataChunks;
>>>>
>>>> unique_ptr here maybe so you don't have to do the explicit deletes?
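>>>>
>>>> (Something like this, as a sketch; ChunkSize stands in for whatever
>>>> the allocator actually uses:)
>>>>
>>>>   /// Owning storage; the array unique_ptrs delete the chunks
>>>>   /// automatically when the vector goes away.
>>>>   std::vector<std::unique_ptr<ScheduleData[]>> ScheduleDataChunks;
>>>>
>>>>   // Allocating a new chunk then becomes:
>>>>   //   ScheduleDataChunks.push_back(
>>>>   //       std::unique_ptr<ScheduleData[]>(new ScheduleData[ChunkSize]));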
>>>>
>>>> Still using -1/1 a lot without comments or anything describing why
>>>> you've got those values.
>>>>
>>>> Mostly for maintainability for someone who didn't write the code and
>>>> wants to look at it. Otherwise the patch is fine. You can reformat it
>>>> using clang-format on the diff to make it easier.
>>>>
>>>> -eric
>>>>
>>>>> "safe"
>>>>>
>>>>>
>>>>> done
>>>>>
>>>>> Thanks,
>>>>> Erik
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 25 Jul 2014, at 19:53, Hal Finkel <hfinkel at anl.gov> wrote:
>>>>>
>>>>> ----- Original Message -----
>>>>>
>>>>> From: "Erik Eckstein" <eeckstein at apple.com>
>>>>> To: llvm-commits at cs.uiuc.edu
>>>>> Sent: Tuesday, July 15, 2014 3:37:56 AM
>>>>> Subject: [PATCH] improve scheduling in SLPVectorizer
>>>>>
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> This patch improves the scheduling in the SLPVectorizer. Originally
>>>>> the SLPVectorizer could only move instructions of a bundle down. The
>>>>> new scheduling algorithm is more general, which means that (in
>>>>> theory) it should always schedule better than the old approach.
>>>>>
>>>>>
>>>>>
>>>>> The change fixes <rdar://problem/13677020> [SLP vectorizer] - SLP
>>>>> Vectorizer needs to be able to start trees at multiple user
>>>>> reductions.
>>>>> Another example is Bug 19657, where the new algorithm removes the
>>>>> need for the heuristics proposed to fix it.
>>>>>
>>>>>
>>>>> The compile times and execution times for the test suite look OK,
>>>>> measured on x86.
>>>>>
>>>>>
>>>>>
>>>>> Compile time: approximately the same as before. The new algorithm
>>>>> fixes a compile time problem for the (synthetic) test
>>>>> SingleSource/UnitTest/Vector/constpool. This compiles much faster
>>>>> now.
>>>>>
>>>>>
>>>>> Execution time: not much change. A few benchmarks show
>>>>> improvements when compiled with -O3 -flto.
>>>>>
>>>>>
>>>>> Statistics: With the new scheduling algorithm the SLPVectorizer can
>>>>> generate about 9% more LLVM vector instructions. Note that this is a
>>>>> static count, not the number of _executed_ instructions (so code that
>>>>> is not inside a critical loop will not show up in execution time).
>>>>>
>>>>>
>>>>> As this patch is not trivial, I plan to commit it after the 3.5
>>>>> branch.
>>>>>
>>>>>
>>>>> Sounds good. I took a quick look at the patch...
>>>>>
>>>>> +STATISTIC(NumVectorInstructions, "Number of vector instructions
>>>>> generated");
>>>>>
>>>>> +
>>>>>
>>>>> As a high-level comment (which may not be specific to this patch): It would
>>>>> be really nice to have statistics on the amount of reordering that was done
>>>>> during scheduling. I'm not sure exactly what form this should take, but I'd
>>>>> like to get some idea of how much of the code we vectorize was abababab vs
>>>>> aaaabbbb ;)
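>>>>>
>>>>> (A sketch of the kind of thing meant here, with made-up names:)
>>>>>
>>>>>   STATISTIC(NumSchedReorders,
>>>>>             "Number of instructions moved by the SLP scheduler");
>>>>>
>>>>>   // ...and in the scheduler, whenever an instruction is actually
>>>>>   // moved away from its original position:
>>>>>   ++NumSchedReorders;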
>>>>>
>>>>> + // Not really required in the constructor (except SchedulingRegion)
>>>>> as
>>>>>
>>>>> + // it's done as soon as it becomes a member of the scheduling region.
>>>>>
>>>>> + init(-1);
>>>>>
>>>>> I don't like this comment. How about, "We need to call init here because it
>>>>> is not otherwise called when...".
>>>>>
>>>>> + if (SrcMayWrite || DepDest->Inst->mayWriteToMemory()) {
>>>>>
>>>>> + AliasAnalysis::Location DstLoc = getLocation(DepDest->Inst,
>>>>> AA);
>>>>>
>>>>> + if (!SrcLoc.Ptr || !DstLoc.Ptr || AA->alias(SrcLoc, DstLoc))
>>>>> {
>>>>>
>>>>> You probably want to use a ModRef query here for call sites (not that we
>>>>> really care about vectorizing around calls, but we do care about vectorizing
>>>>> around vector intrinsics and memcpy, etc., and this will be important to
>>>>> avoid problems with @llvm.assume).
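>>>>>
>>>>> (A sketch of the kind of ModRef query meant here; the surrounding
>>>>> variable names come from the quoted hunk, and the exact placement is
>>>>> illustrative, not prescriptive:)
>>>>>
>>>>>   if (isa<CallInst>(DepDest->Inst) || isa<InvokeInst>(DepDest->Inst)) {
>>>>>     ImmutableCallSite CS(DepDest->Inst);
>>>>>     // Ask whether the callee may modify the source location, rather
>>>>>     // than relying on a plain alias() query.
>>>>>     if (AA->getModRefInfo(CS, SrcLoc) & AliasAnalysis::Mod) {
>>>>>       // record the memory dependency, as the existing code does
>>>>>     }
>>>>>   }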
>>>>>
>>>>> Otherwise, this looks good to me. Thanks for working on this!
>>>>>
>>>>> -Hal
>>>>>
>>>>>
>>>>>
>>>>> Please let me know your comments (Arnold and Nadav already looked at
>>>>> the patch).
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Erik
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Hal Finkel
>>>>> Assistant Computational Scientist
>>>>> Leadership Computing Facility
>>>>> Argonne National Laboratory
>>>>>
>>>>>
>>>>>
>>>
>>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits