[PATCH] improve scheduling in SLPVectorizer
Nadav Rotem
nrotem at apple.com
Thu Jul 31 23:24:24 PDT 2014
LGTM!
> On Jul 31, 2014, at 5:29 PM, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
>
> LGTM with the changes.
>
>
>> On Jul 30, 2014, at 1:16 PM, Eric Christopher <echristo at gmail.com> wrote:
>>
>> On Wed, Jul 30, 2014 at 7:39 AM, Erik Eckstein <eeckstein at apple.com> wrote:
>>> Hi Eric,
>>>
>>> thanks again for your comments. I attached a new version.
>>>
>>>>> std::make_unique?
>>>
>>> Argh... I still failed to use make_unique here because I couldn't manage to set the BS pointer in addition to storing the object in the map. Maybe I'm missing some C++ knowledge.
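>>>
>>> (A rough sketch of the pattern in question, assuming the map holds
>>> std::unique_ptr<BlockScheduling> values and BS is a raw
>>> BlockScheduling pointer; the names here are hypothetical:)
>>>
>>>   // Build the object once, keep a raw pointer for local use, then move
>>>   // ownership into the map (std::make_unique would replace the first
>>>   // line where available).
>>>   std::unique_ptr<BlockScheduling> NewBS(new BlockScheduling(BB));
>>>   BS = NewBS.get();
>>>   BlocksSchedules[BB] = std::move(NewBS);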
>>
>> I, or someone else (Dave maybe), will have to look a bit at it. We can
>> save it for later though.
>>
>>>
>>>> Can be added completely separately from the rest of your patch.
>>>
>>> I understand and agree. But I hope you don't mind if I don't split the current patch. I will keep it in mind for future commits.
>>
>> I suppose, but just so I've said it (and I hope you understand), here's
>> the reasoning behind it:
>>
>> The idea of splitting the patch into smaller pieces is for the benefit
>> of everyone except the you who is making this commit today.
>>
>> a) It's useful for code reviewers. I spent a non-zero amount of time
>> figuring out that this part of your patch wasn't related to any other
>> part of the patch. If everyone does that, it increases the review load
>> quite a bit. I have much less time to review patches than you have to
>> write them :)
>>
>> b) It's useful to the next person to look at the code or the commit.
>> Since you will be committing it as a single hunk, someone else will
>> assume that it was a necessary part of the patch and go looking for
>> the reason why, wasting their time whether they are reasoning about
>> the code to extend it or trying to debug it.
>>
>> c) It's useful to a later you, when you go back to the code and don't
>> recall why it's there. :)
>>
>> d) Going back to the review, it's much easier to review shorter
>> patches than longer ones, so why make them longer if you don't need
>> to?
>>
>> Hope that helps explain why I cared how big the patch was.
>>
>>>
>>>> It's a big function, describing the general use isn't describing the algorithm.
>>>
>>> I added some comments above and inside the function.
>>>
>>>> Seem to have this scattered around a lot. I don't know if the logic
>>>> can be refactored so there's a little less cut and paste here.
>>>
>>> I know, and it's something of a compromise. But I couldn't avoid it without changing the complete logic of this large function.
>>
>> *nod* Something to think about for a follow-on patch :)
>>
>>>
>>>> Why shouldn't it happen?
>>>
>>> Actually, that was the wrong formulation. I just never saw this case when running the test suite. I changed the wording.
>>
>> Perhaps an assert instead? Is it bad behavior if it happens?
>>
>>>> Still using -1/1 a lot without comments or anything describing why
>>>> you've got those values.
>>>
>>> I added some comments. Note that the -1/1 as arguments to incrementUnscheduledDeps() really mean to decrement/increment by one and are not magic numbers.
>>>
>>
>> Ah, I see. Poor naming perhaps? Maybe just updateUnscheduledDeps(int)?
>> (I'm bad at naming things I admit).
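>>
>> (Just to sketch the renaming idea; the member names here are
>> hypothetical and may not match the patch exactly:)
>>
>>   // One helper taking a signed delta instead of separate +1/-1 helpers.
>>   void updateUnscheduledDeps(int Delta) {
>>     UnscheduledDeps += Delta;
>>     if (FirstInBundle)
>>       FirstInBundle->UnscheduledDepsInBundle += Delta;
>>   }
>>
>>   // Call sites then read naturally: updateUnscheduledDeps(-1) when a
>>   // dependency is resolved, updateUnscheduledDeps(+1) when one is added.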
>>
>>>> You can reformat it using clang-format
>>>
>>> Thanks for this hint! I did this.
>>
>> Should be good to go with these changes.
>>
>> Thanks :)
>>
>> -eric
>>
>>>
>>> Erik
>>>
>>>
>>> On 29 Jul 2014, at 00:55, Eric Christopher <echristo at gmail.com> wrote:
>>>
>>>>>
>>>>> std::make_unique?
>>>>>
>>>>>
>>>>> As I understand it, std::make_unique is not available in C++11 (it was only added in C++14)
>>>>>
>>>>
>>>> What Dave said here.
>>>>
>>>>> It'd also be nice if you could break this up to make it a bit more
>>>>> incremental. I.e. things like the statistic can just be added first.
>>>>>
>>>>>
>>>>> I'm not sure I understood.
>>>>> I tried to make only little changes in this large function.
>>>>
>>>> It looks like the statistic here:
>>>>
>>>> +STATISTIC(NumVectorInstructions, "Number of vector instructions generated");
>>>>
>>>> Can be added completely separately from the rest of your patch.
>>>>
>>>>>
>>>>> Block comment above this function documenting it?
>>>>>
>>>>>
>>>>> Is at the function declaration in the class definition
>>>>>
>>>>
>>>> It's a big function, describing the general use isn't describing the algorithm.
>>>>
>>>> Few new comments:
>>>>
>>>> + UnscheduledDepsInBundle(InvalidDeps), IsScheduled(false)
>>>> + {
>>>> + }
>>>>
>>>> formatting.
>>>>
>>>> + // The initial value for the dependency counters. It menas that the
>>>>
>>>> "means"
>>>>
>>>> + BS->cancelScheduling(VL);
>>>>
>>>> Seem to have this scattered around a lot. I don't know if the logic
>>>> can be refactored so there's a little less cut and paste here.
>>>>
>>>> + if (NextLoadStore) {
>>>> + if (CurrentLoadStore) CurrentLoadStore->NextLoadStore = NextLoadStore;
>>>> + } else {
>>>> + LastLoadStoreInRegion = CurrentLoadStore;
>>>> + }
>>>>
>>>> Formatting.
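>>>>
>>>> (For reference, clang-format with the LLVM style would likely render
>>>> this roughly as:)
>>>>
>>>>   if (NextLoadStore) {
>>>>     if (CurrentLoadStore)
>>>>       CurrentLoadStore->NextLoadStore = NextLoadStore;
>>>>   } else {
>>>>     LastLoadStoreInRegion = CurrentLoadStore;
>>>>   }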
>>>>
>>>> + // Should never happen, just to be safe.
>>>> + // This lets the instruction/bundle never be scheduled and eventally
>>>> + // disable vectorization.
>>>>
>>>> Why shouldn't it happen?
>>>>
>>>> + /// Simple memory allocation for ScheduleData.
>>>> + std::vector<ScheduleData *> ScheduleDataChunks;
>>>>
>>>> unique_ptr here maybe so you don't have to do the explicit deletes?
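>>>>
>>>> (Something like this, as a sketch; ChunkSize stands in for whatever
>>>> the allocator actually uses:)
>>>>
>>>>   /// Owning storage; the array unique_ptrs delete the chunks
>>>>   /// automatically when the vector goes away.
>>>>   std::vector<std::unique_ptr<ScheduleData[]>> ScheduleDataChunks;
>>>>
>>>>   // Allocating a new chunk then becomes:
>>>>   //   ScheduleDataChunks.push_back(
>>>>   //       std::unique_ptr<ScheduleData[]>(new ScheduleData[ChunkSize]));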
>>>>
>>>> Still using -1/1 a lot without comments or anything describing why
>>>> you've got those values.
>>>>
>>>> Mostly for maintainability for someone who didn't write the code and
>>>> wants to look at it. Otherwise the patch is fine. You can reformat it
>>>> using clang-format on the diff to make it easier.
>>>>
>>>> -eric
>>>>
>>>>> "safe"
>>>>>
>>>>>
>>>>> done
>>>>>
>>>>> Thanks,
>>>>> Erik
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 25 Jul 2014, at 19:53, Hal Finkel <hfinkel at anl.gov> wrote:
>>>>>
>>>>> ----- Original Message -----
>>>>>
>>>>> From: "Erik Eckstein" <eeckstein at apple.com>
>>>>> To: llvm-commits at cs.uiuc.edu
>>>>> Sent: Tuesday, July 15, 2014 3:37:56 AM
>>>>> Subject: [PATCH] improve scheduling in SLPVectorizer
>>>>>
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> This patch improves the scheduling in the SLPVectorizer. Originally
>>>>> the SLPVectorizer could only move instructions of a bundle down. The
>>>>> new scheduling algorithm is more general, which means that (in
>>>>> theory) it should always schedule better than the old approach.
>>>>>
>>>>>
>>>>>
>>>>> The change fixes <rdar://problem/13677020> [SLP vectorizer] - SLP
>>>>> Vectorizer needs to be able to start trees at multiple user
>>>>> reductions.
>>>>> Another example is Bug 19657, where the new algorithm removes the
>>>>> need for the heuristics proposed to fix it.
>>>>>
>>>>>
>>>>> The compile times and execution times for the test suite look OK,
>>>>> measured on x86.
>>>>>
>>>>>
>>>>>
>>>>> Compile time: approximately the same as before. The new algorithm
>>>>> fixes a compile time problem for the (synthetic) test
>>>>> SingleSource/UnitTest/Vector/constpool. This compiles much faster
>>>>> now.
>>>>>
>>>>>
>>>>> Execution time: not much change. A few benchmarks show
>>>>> improvements when compiled with -O3 -flto.
>>>>>
>>>>>
>>>>> Statistics: With the new scheduling algorithm the SLPVectorizer can
>>>>> generate about 9% more LLVM vector instructions. Note that this is a
>>>>> static count, not the number of _executed_ instructions (so code that
>>>>> is not inside a critical loop will not show up in execution time).
>>>>>
>>>>>
>>>>> As this patch is not trivial, I plan to commit it after the 3.5
>>>>> branch.
>>>>>
>>>>>
>>>>> Sounds good. I took a quick look at the patch...
>>>>>
>>>>> +STATISTIC(NumVectorInstructions, "Number of vector instructions
>>>>> generated");
>>>>>
>>>>> +
>>>>>
>>>>> As a high-level comment (which may not be specific to this patch): It would
>>>>> be really nice to have statistics on the amount of reordering that was done
>>>>> during scheduling. I'm not sure exactly what form this should take, but I'd
>>>>> like to get some idea of how much of the code we vectorize was abababab vs
>>>>> aaaabbbb ;)
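>>>>>
>>>>> (A sketch of the kind of thing meant here, with made-up names:)
>>>>>
>>>>>   STATISTIC(NumSchedReorders,
>>>>>             "Number of instructions moved by the SLP scheduler");
>>>>>
>>>>>   // ...and in the scheduler, whenever an instruction is actually
>>>>>   // moved away from its original position:
>>>>>   ++NumSchedReorders;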
>>>>>
>>>>> + // Not really required in the constructor (except SchedulingRegion)
>>>>> as
>>>>>
>>>>> + // it's done as soon as it becomes a member of the scheduling region.
>>>>>
>>>>> + init(-1);
>>>>>
>>>>> I don't like this comment. How about, "We need to call init here because it
>>>>> is not otherwise called when...".
>>>>>
>>>>> + if (SrcMayWrite || DepDest->Inst->mayWriteToMemory()) {
>>>>>
>>>>> + AliasAnalysis::Location DstLoc = getLocation(DepDest->Inst,
>>>>> AA);
>>>>>
>>>>> + if (!SrcLoc.Ptr || !DstLoc.Ptr || AA->alias(SrcLoc, DstLoc))
>>>>> {
>>>>>
>>>>> You probably want to use a ModRef query here for call sites (not that we
>>>>> really care about vectorizing around calls, but we do care about vectorizing
>>>>> around vector intrinsics and memcpy, etc., and this will be important to
>>>>> avoid problems with @llvm.assume).
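>>>>>
>>>>> (A sketch of the kind of ModRef query meant here; the surrounding
>>>>> variable names come from the quoted hunk, and the exact placement is
>>>>> illustrative, not prescriptive:)
>>>>>
>>>>>   if (isa<CallInst>(DepDest->Inst) || isa<InvokeInst>(DepDest->Inst)) {
>>>>>     ImmutableCallSite CS(DepDest->Inst);
>>>>>     // Ask whether the callee may modify the source location, rather
>>>>>     // than relying on a plain alias() query.
>>>>>     if (AA->getModRefInfo(CS, SrcLoc) & AliasAnalysis::Mod) {
>>>>>       // record the memory dependency, as the existing code does
>>>>>     }
>>>>>   }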
>>>>>
>>>>> Otherwise, this looks good to me. Thanks for working on this!
>>>>>
>>>>> -Hal
>>>>>
>>>>>
>>>>>
>>>>> Please let me know your comments (Arnold and Nadav already looked at
>>>>> the patch).
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Erik
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Hal Finkel
>>>>> Assistant Computational Scientist
>>>>> Leadership Computing Facility
>>>>> Argonne National Laboratory
>>>>>
>>>>>
>>>>>
>>>
>>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits