[PATCH] improve scheduling in SLPVectorizer

David Blaikie dblaikie at gmail.com
Wed Jul 30 13:33:35 PDT 2014


On Wed, Jul 30, 2014 at 1:16 PM, Eric Christopher <echristo at gmail.com> wrote:
> On Wed, Jul 30, 2014 at 7:39 AM, Erik Eckstein <eeckstein at apple.com> wrote:
>> Hi Eric,
>>
>> thanks again for your comments. I attached a new version.
>>
>>>> std::make_unique?
>>
>> Argh... I still failed to use make_unique here because I didn't manage to set the BS pointer in addition to storing it in the map. Maybe I'm missing some C++ knowledge.
>
> I, or someone else (Dave maybe), will have to look a bit at it. We can
> save it for later though.

Yeah, assuming Erik just replaced this:

    BS = new BlockScheduling(BB);
    BlocksSchedules[BB] = std::unique_ptr<BlockScheduling>(BS);

with this:

    BlocksSchedules[BB] = llvm::make_unique<BlockScheduling>(BB);

Then "BS" will be out of date.

The usual way to write this sort of code, assuming that BlocksSchedules
isn't modified in the rest of this function, is:

    auto &BS = BlocksSchedules[BB];
    if (!BS)
      BS = llvm::make_unique<BlockScheduling>(BB);

This avoids the need for an extra assignment (one into the map and one
to the local pointer) and avoids an extra map lookup.

If BlocksSchedules will be modified during the required lifetime/use of
BS, then there are a few options:

    auto &BSRef = BlocksSchedules[BB];
    if (!BSRef)
      BSRef = llvm::make_unique<BlockScheduling>(BB); // assuming the BlockScheduling ctor doesn't mutate BlocksSchedules
    auto *BS = BSRef.get();

(I'd consider making BS a BlockScheduling& instead of a BlockScheduling*
here, since you can do that safely now that the map entry is lazily
initialized before BS is bound.)

>
>>
>>> Can be added completely separately from the rest of your patch.
>>
>> I understand and agree. But I hope you don't mind if I don't split the current patch. I will consider it for future commits.
>
> I suppose, but just so I've said it and you understand, here's the
> reasoning behind it:
>
> The idea of splitting up the patch into smaller pieces is for the
> benefit of everyone except the you that is making this commit today.
>
> a) It's useful for code reviewers. I spent a non-zero amount of time
> figuring out that this part of your patch wasn't related to any other
> part of the patch. If everyone does that, it increases the review load
> quite a bit. I have much less time to review patches than you have to
> write them :)
>
> b) It's useful to the next person to look at the code or the commit.
> Since you will be committing it as a single hunk, someone else will
> assume that it was a necessary part of the patch and go looking for
> the reason why, wasting their time either reasoning about the code
> to extend it or debugging it.
>
> c) It's useful to later you, when you go back to the code and don't
> recall why you did it. :)
>
> d) Going back to the review, it's much easier to review shorter
> patches than longer ones, so why make them longer if you don't need
> to?
>
> Hope that helps explain why I cared how big the patch was.
>
>>
>>> It's a big function, describing the general use isn't describing the algorithm.
>>
>> I added some comments above and inside the function.
>>
>>> Seem to have this scattered around a lot. I don't know if the logic
>>> can be refactored so there's a little less cut and paste here.
>>
>> I know, and it's somewhat of a compromise. But I couldn't avoid it without changing the complete logic of this large function.
>
> *nod* Something to think about for a follow-on patch :)
>
>>
>>> Why shouldn't it happen?
>>
>> Actually this was the wrong formulation. I just didn't see it happen when running the test suite. I changed the wording.
>
> Perhaps an assert instead? Is it bad behavior if it happens?
>
>>> Still using -1/1 a lot without comments or anything describing why
>>> you've got those values.
>>
>> I added some comments. Note that the -1/1 arguments to
>> incrementUnscheduledDeps() really mean decrement/increment by one and
>> are not magic numbers.
>>
>
> Ah, I see. Poor naming, perhaps? Maybe just updateUnscheduledDeps(int)?
> (I'm bad at naming things, I admit.)
>
>>> You can reformat it using clang-format
>>
>> Thanks for this hint! I did this.
>
> Should be good to go with these changes.
>
> Thanks :)
>
> -eric
>
>>
>> Erik
>>
>>
>> On 29 Jul 2014, at 00:55, Eric Christopher <echristo at gmail.com> wrote:
>>
>>>>
>>>> std::make_unique?
>>>>
>>>>
>>>> As I understand it, make_unique is not yet available in C++11
>>>>
>>>
>>> What Dave said here.
>>>
>>>> It'd also be nice if you could break this up to make it a bit more
>>>> incremental. I.e. things like the statistic can just be added first.
>>>>
>>>>
>>>> I'm not sure I understood.
>>>> I tried to make only little changes in this large function.
>>>
>>> It looks like the statistic here:
>>>
>>> +STATISTIC(NumVectorInstructions, "Number of vector instructions generated");
>>>
>>> Can be added completely separately from the rest of your patch.
>>>
>>>>
>>>> Block comment above this function documenting it?
>>>>
>>>>
>>>> Is at the function declaration in the class definition
>>>>
>>>
>>> It's a big function, describing the general use isn't describing the algorithm.
>>>
>>> Few new comments:
>>>
>>> +          UnscheduledDepsInBundle(InvalidDeps), IsScheduled(false)
>>> +    {
>>> +    }
>>>
>>> formatting.
>>>
>>> +    // The initial value for the dependency counters. It menas that the
>>>
>>> "means"
>>>
>>> +          BS->cancelScheduling(VL);
>>>
>>> Seem to have this scattered around a lot. I don't know if the logic
>>> can be refactored so there's a little less cut and paste here.
>>>
>>> +  if (NextLoadStore) {
>>> +    if (CurrentLoadStore) CurrentLoadStore->NextLoadStore = NextLoadStore;
>>> +  } else {
>>> +    LastLoadStoreInRegion = CurrentLoadStore;
>>> +  }
>>>
>>> Formatting.
>>>
>>> +            // Should never happen, just to be safe.
>>> +            // This lets the instruction/bundle never be scheduled and eventally
>>> +            // disable vectorization.
>>>
>>> Why shouldn't it happen?
>>>
>>> +    /// Simple memory allocation for ScheduleData.
>>> +    std::vector<ScheduleData *> ScheduleDataChunks;
>>>
>>> unique_ptr here maybe so you don't have to do the explicit deletes?
>>>
>>> Still using -1/1 a lot without comments or anything describing why
>>> you've got those values.
>>>
>>> Mostly for maintainability for someone who didn't write the code and
>>> wants to look at it. Otherwise the patch is fine. You can reformat it
>>> using clang-format on the diff to make it easier.
>>>
>>> -eric
>>>
>>>> "safe"
>>>>
>>>>
>>>> done
>>>>
>>>> Thanks,
>>>> Erik
>>>>
>>>>
>>>>
>>>>
>>>> On 25 Jul 2014, at 19:53, Hal Finkel <hfinkel at anl.gov> wrote:
>>>>
>>>> ----- Original Message -----
>>>>
>>>> From: "Erik Eckstein" <eeckstein at apple.com>
>>>> To: llvm-commits at cs.uiuc.edu
>>>> Sent: Tuesday, July 15, 2014 3:37:56 AM
>>>> Subject: [PATCH] improve scheduling in SLPVectorizer
>>>>
>>>>
>>>>
>>>> Hi,
>>>>
>>>> This patch improves the scheduling in the SLPVectorizer. Originally
>>>> the SLPVectorizer could only move instructions of a bundle down. The
>>>> new scheduling algorithm is more general, which means that (in
>>>> theory) it should always schedule better than the old approach.
>>>>
>>>>
>>>>
>>>> The change fixes <rdar://problem/13677020> ([SLP vectorizer] - SLP
>>>> Vectorizer needs to be able to start trees at multiple user
>>>> reductions).
>>>> Another example is Bug 19657, where the new algorithm removes the
>>>> need for the heuristics that fixed it.
>>>>
>>>>
>>>> The compile time and execution times for the test suite look ok.
>>>> Measured with the test suite on x86.
>>>>
>>>>
>>>>
>>>> Compile time: approximately the same as before. The new algorithm
>>>> fixes a compile time problem for the (synthetic) test
>>>> SingleSource/UnitTest/Vector/constpool. This compiles much faster
>>>> now.
>>>>
>>>>
>>>> Execution time: not much change. A few benchmarks show
>>>> improvements when compiled with -O3 -flto.
>>>>
>>>>
>>>> Statistics: With the new scheduling algorithm the SLPVectorizer can
>>>> generate about 9% more LLVM vector instructions. Note that this is a
>>>> static count, not the number of _executed_ instructions (so code that
>>>> is not inside a hot loop will not show up in execution time).
>>>>
>>>>
>>>> As this patch is not trivial, I plan to commit it after the 3.5
>>>> branch.
>>>>
>>>>
>>>> Sounds good. I took a quick look at the patch...
>>>>
>>>> +STATISTIC(NumVectorInstructions, "Number of vector instructions generated");
>>>>
>>>> +
>>>>
>>>> As a high-level comment (which may not be specific to this patch): It would
>>>> be really nice to have statistics on the amount of reordering that was done
>>>> during scheduling. I'm not sure exactly what form this should take, but I'd
>>>> like to get some idea of how much code that we vectorize was abababab vs
>>>> aaaabbbb ;)
>>>>
>>>> +      // Not really required in the constructor (except SchedulingRegion) as
>>>> +      // it's done as soon as it becomes a member of the scheduling region.
>>>>
>>>> +      init(-1);
>>>>
>>>> I don't like this comment. How about, "We need to call init here because it
>>>> is not otherwise called when...".
>>>>
>>>> +            if (SrcMayWrite || DepDest->Inst->mayWriteToMemory()) {
>>>> +              AliasAnalysis::Location DstLoc = getLocation(DepDest->Inst, AA);
>>>> +              if (!SrcLoc.Ptr || !DstLoc.Ptr || AA->alias(SrcLoc, DstLoc)) {
>>>>
>>>> You probably want to use a ModRef query here for call sites (not that we
>>>> really care about vectorizing around calls, but we do care about vectorizing
>>>> around vector intrinsics and memcpy, etc., and this will be important to
>>>> avoid problems with @llvm.assume).
>>>>
>>>> Otherwise, this looks good to me. Thanks for working on this!
>>>>
>>>> -Hal
>>>>
>>>>
>>>>
>>>> Please let me know your comments (Arnold and Nadav already looked at
>>>> the patch).
>>>>
>>>>
>>>> Thanks,
>>>> Erik
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Hal Finkel
>>>> Assistant Computational Scientist
>>>> Leadership Computing Facility
>>>> Argonne National Laboratory
>>>>
>>>>
>>>>
>>
>>


