[PATCH] improve scheduling in SLPVectorizer
Arnold Schwaighofer
aschwaighofer at apple.com
Thu Jul 31 17:29:27 PDT 2014
LGTM with the changes.
> On Jul 30, 2014, at 1:16 PM, Eric Christopher <echristo at gmail.com> wrote:
>
> On Wed, Jul 30, 2014 at 7:39 AM, Erik Eckstein <eeckstein at apple.com> wrote:
>> Hi Eric,
>>
>> thanks again for your comments. I attached a new version.
>>
>>>> std::make_unique?
>>
>> Argh.. I still failed to use make_unique here because it didn't manage to set the BS pointer in addition to storing it in the map. Maybe I'm missing some c++ knowledge.
>
> I, or someone else (Dave maybe), will have to look a bit at it. We can
> save it for later though.
>
>>
>>> Can be added completely separately from the rest of your patch.
>>
>> I understand and agree. But I hope you don't mind if I don't split the current patch. I will consider it for the next commits.
>
> I suppose, but just so I've said it and hope you understand here's the
> reasoning behind it:
>
> The idea of splitting up the patch into smaller pieces is for everyone
> except the you that is making this commit today.
>
> a) It's useful for code reviewers. I spent a non-zero time figuring
> out that this part of your patch wasn't related to any other part of
> the patch. If everyone does that then it increases the review load
> quite a bit. I have much less time to review patches than you have to
> write them :)
>
> b) It's useful to the next person to look at the code or the commit.
> Since you will be committing it as a single hunk someone else will
> assume that it was a necessary part of the patch and go looking for
> the reason why, wasting their time in either reasoning about the code
> to extend it or to debug it.
>
> c) It's useful to later you, when you go back to the code if you don't
> recall yourself why. :)
>
> d) Going back to the review, it's much easier to review shorter
> patches than longer ones, so why make them longer if you don't need
> to?
>
> Hope that helps explain why I cared how big the patch was.
>
>>
>>> It's a big function, describing the general use isn't describing the algorithm.
>>
>> I added some comments above and inside the function.
>>
>>> Seem to have this scattered around a lot. I don't know if the logic
>>> can be refactored so there's a little less cut and paste here.
>>
>> I know and it's somehow a compromise. But I couldn't avoid it without changing the complete logic of this large function.
>
> *nod* Something to think about for a follow-on patch :)
>
>>
>>> Why shouldn't it happen?
>>
>> Actually this is the wrong formulation. I just didn't see this when running the test suite. I changed the wording.
>
> Perhaps an assert instead? Is it bad behavior if it happens?
>
>>> Still using -1/1 a lot without comments or anything describing why
>>> you've got those values.
>>
>> I added some comments. Note that the -1/1 as arguments to incrementUnscheduledDeps() really means to decrement/increment by one and are not magic numbers.
>>
>
> Ah, I see. Poor naming perhaps? Maybe just updateUnscheduledDeps(int)?
> (I'm bad at naming things I admit).
>
>>> You can reformat it using clang-format
>>
>> Thanks for this hint! I did this.
>
> Should be good to go with these changes.
>
> Thanks :)
>
> -eric
>
>>
>> Erik
>>
>>
>> On 29 Jul 2014, at 00:55, Eric Christopher <echristo at gmail.com> wrote:
>>
>>>>
>>>> std::make_unique?
>>>>
>>>>
>>>> As I understand make_unique is not yet available in C++11
>>>>
>>>
>>> What Dave said here.
>>>
>>>> It'd also be nice if you could break this up to make it a bit more
>>>> incremental. I.e. things like the statistic can just be added first.
>>>>
>>>>
>>>> I'm not sure I understood.
>>>> I tried to make only little changes in this large function.
>>>
>>> It looks like the statistic here:
>>>
>>> +STATISTIC(NumVectorInstructions, "Number of vector instructions generated");
>>>
>>> Can be added completely separately from the rest of your patch.
>>>
>>>>
>>>> Block comment above this function documenting it?
>>>>
>>>>
>>>> Is at the function declaration in the class definition
>>>>
>>>
>>> It's a big function, describing the general use isn't describing the algorithm.
>>>
>>> Few new comments:
>>>
>>> + UnscheduledDepsInBundle(InvalidDeps), IsScheduled(false)
>>> + {
>>> + }
>>>
>>> formatting.
>>>
>>> + // The initial value for the dependency counters. It menas that the
>>>
>>> "means"
>>>
>>> + BS->cancelScheduling(VL);
>>>
>>> Seem to have this scattered around a lot. I don't know if the logic
>>> can be refactored so there's a little less cut and paste here.
>>>
>>> + if (NextLoadStore) {
>>> + if (CurrentLoadStore) CurrentLoadStore->NextLoadStore = NextLoadStore;
>>> + } else {
>>> + LastLoadStoreInRegion = CurrentLoadStore;
>>> + }
>>>
>>> Formatting.
>>>
>>> + // Should never happen, just to be safe.
>>> + // This lets the instruction/bundle never be scheduled
>>> and eventally
>>> + // disable vectorization.
>>>
>>> Why shouldn't it happen?
>>>
>>> + /// Simple memory allocation for ScheduleData.
>>> + std::vector<ScheduleData *> ScheduleDataChunks;
>>>
>>> unique_ptr here maybe so you don't have to do the explicit deletes?
>>>
>>> Still using -1/1 a lot without comments or anything describing why
>>> you've got those values.
>>>
>>> Mostly for maintainability for someone who didn't write the code and
>>> wants to look at it. Otherwise the patch is fine. You can reformat it
>>> using clang-format on the diff to make it easier.
>>>
>>> -eric
>>>
>>>> "safe"
>>>>
>>>>
>>>> done
>>>>
>>>> Thanks,
>>>> Erik
>>>>
>>>>
>>>>
>>>>
>>>> On 25 Jul 2014, at 19:53, Hal Finkel <hfinkel at anl.gov> wrote:
>>>>
>>>> ----- Original Message -----
>>>>
>>>> From: "Erik Eckstein" <eeckstein at apple.com>
>>>> To: llvm-commits at cs.uiuc.edu
>>>> Sent: Tuesday, July 15, 2014 3:37:56 AM
>>>> Subject: [PATCH] improve scheduling in SLPVectorizer
>>>>
>>>>
>>>>
>>>> Hi,
>>>>
>>>> This patch improves the scheduling in the SLPVectorizer. Originally
>>>> the SLPVectorizer could only move instructions of a bundle down. The
>>>> new scheduling algorithm is more general, which means that (in
>>>> theory) it should always schedule better than the old approach.
>>>>
>>>>
>>>>
>>>> The change fixes: < rdar://problem/13677020 > [SLP vectorizer] - SLP
>>>> Vectorizer needs to be able to start trees at multiple user
>>>> reductions.
>>>> Another example is Bug 19657 , where the new algorithm replaces the
>>>> need for the heuristics for fixing it.
>>>>
>>>>
>>>> The compile time and execution times for the test suite look ok.
>>>> Measured with the test suite on x86.
>>>>
>>>>
>>>>
>>>> Compile time: approximately the same as before. The new algorithm
>>>> fixes a compile time problem for the (synthetic) test
>>>> SingleSource/UnitTest/Vector/constpool. This compiles much faster
>>>> now.
>>>>
>>>>
>>>> Execution time: not too much changes. A few benchmarks show
>>>> improvements when compiled with -O3 -flto.
>>>>
>>>>
>>>> Statistics: With the new scheduling algorithm the SLPVectorizer can
>>>> generate about 9% more llvm vector instructions. Note that this is a
>>>> static count, not the number of _executed_ instructions (So if not
>>>> inside a critical loop it will not show up in execution time).
>>>>
>>>>
>>>> As this patch is not trivial, I plan to commit it after the 3.5
>>>> branch.
>>>>
>>>>
>>>> Sounds good. I took a quick look at the patch...
>>>>
>>>> +STATISTIC(NumVectorInstructions, "Number of vector instructions
>>>> generated");
>>>>
>>>> +
>>>>
>>>> As a high-level comment (which may not be specific to this patch): It would
>>>> be really nice to have statistics on the amount of reordering that was done
>>>> during scheduling. I'm not sure exactly what form this should take, but I'd
>>>> like to get some idea of how much code that we vectorize was abababab vs
>>>> aaaabbbb ;)
>>>>
>>>> + // Not really required in the constructor (except SchedulingRegion)
>>>> as
>>>>
>>>> + // it's done as soon as it becomes a member of the scheduling region.
>>>>
>>>> + init(-1);
>>>>
>>>> I don't like this comment. How about, "We need to call init here because it
>>>> is not otherwise called when...".
>>>>
>>>> + if (SrcMayWrite || DepDest->Inst->mayWriteToMemory()) {
>>>>
>>>> + AliasAnalysis::Location DstLoc = getLocation(DepDest->Inst,
>>>> AA);
>>>>
>>>> + if (!SrcLoc.Ptr || !DstLoc.Ptr || AA->alias(SrcLoc, DstLoc))
>>>> {
>>>>
>>>> You probably want to use a ModRef query here for call sites (not that we
>>>> really care about vectorizing around calls, but we do care about vectorizing
>>>> around vector intrinsics and memcpy, etc., and this will be important to
>>>> avoid problems with @llvm.assume).
>>>>
>>>> Otherwise, this looks good to me. Thanks for working on this!
>>>>
>>>> -Hal
>>>>
>>>>
>>>>
>>>> Please let me know your comments (Arnold and Nadav already looked at
>>>> the patch).
>>>>
>>>>
>>>> Thanks,
>>>> Erik
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Hal Finkel
>>>> Assistant Computational Scientist
>>>> Leadership Computing Facility
>>>> Argonne National Laboratory
>>>>
>>>>
>>>>
>>
>>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
More information about the llvm-commits
mailing list