[PATCH] improve scheduling in SLPVectorizer

Wed Jul 30 07:39:53 PDT 2014

Hi Eric,

thanks again for your comments. I attached a new version.

>> std::make_unique?

Argh.. I still failed to use make_unique here because it didn't manage to set the BS pointer in addition to storing it in the map. Maybe I'm missing some c++ knowledge.

> Can be added completely separately from the rest of your patch.

I understand and agree. But I hope you don't mind if I don't split the current patch. I will consider it for the next commits.

> It's a big function, describing the general use isn't describing the algorithm.

I added some comments above and inside the function.

> Seem to have this scattered around a lot. I don't know if the logic
> can be refactored so there's a little less cut and paste here.

I know and it's somehow a compromise. But I couldn't avoid it without changing the complete logic of this large function.

> Why shouldn't it happen?

Actually this is the wrong formulation. I just didn't see this when running the test suite. I changed the wording.

> unique_ptr here maybe so you don't have to do the explicit deletes?

Good idea! I changed it.

> Still using -1/1 a lot without comments or anything describing why
> you've got those values.

I added some comments. Note that the -1/1 as arguments to incrementUnscheduledDeps() really means to decrement/increment by one and are not magic numbers.

> You can reformat it using clang-format

Thanks for this hint! I did this.

Erik
-------------- next part --------------
A non-text attachment was scrubbed...
Name: slp-vectorizer-scheduling-5.patch
Type: application/octet-stream
Size: 52188 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140730/b44fb47d/attachment.obj>
-------------- next part --------------

On 29 Jul 2014, at 00:55, Eric Christopher <echristo at gmail.com> wrote:

>> 
>> std::make_unique?
>> 
>> 
>> As I understand make_unique is not yet available in C++11
>> 
> 
> What Dave said here.
> 
>> It'd also be nice if you could break this up to make it a bit more
>> incremental. I.e. things like the statistic can just be added first.
>> 
>> 
>> I'm not sure I understood.
>> I tried to make only little changes in this large function.
> 
> It looks like the statistic here:
> 
> +STATISTIC(NumVectorInstructions, "Number of vector instructions generated");
> 
> Can be added completely separately from the rest of your patch.
> 
>> 
>> Block comment above this function documenting it?
>> 
>> 
>> Is at the function declaration in the class definition
>> 
> 
> It's a big function, describing the general use isn't describing the algorithm.
> 
> Few new comments:
> 
> +          UnscheduledDepsInBundle(InvalidDeps), IsScheduled(false)
> +    {
> +    }
> 
> formatting.
> 
> +    // The initial value for the dependency counters. It menas that the
> 
> "means"
> 
> +          BS->cancelScheduling(VL);
> 
> Seem to have this scattered around a lot. I don't know if the logic
> can be refactored so there's a little less cut and paste here.
> 
> +  if (NextLoadStore) {
> +    if (CurrentLoadStore) CurrentLoadStore->NextLoadStore = NextLoadStore;
> +  } else {
> +    LastLoadStoreInRegion = CurrentLoadStore;
> +  }
> 
> Formatting.
> 
> +            // Should never happen, just to be safe.
> +            // This lets the instruction/bundle never be scheduled
> and eventally
> +            // disable vectorization.
> 
> Why shouldn't it happen?
> 
> +    /// Simple memory allocation for ScheduleData.
> +    std::vector<ScheduleData *> ScheduleDataChunks;
> 
> unique_ptr here maybe so you don't have to do the explicit deletes?
> 
> Still using -1/1 a lot without comments or anything describing why
> you've got those values.
> 
> Mostly for maintainability for someone who didn't write the code and
> wants to look at it. Otherwise the patch is fine. You can reformat it
> using clang-format on the diff to make it easier.
> 
> -eric
> 
>> "safe"
>> 
>> 
>> done
>> 
>> Thanks,
>> Erik
>> 
>> 
>> 
>> 
>> On 25 Jul 2014, at 19:53, Hal Finkel <hfinkel at anl.gov> wrote:
>> 
>> ----- Original Message -----
>> 
>> From: "Erik Eckstein" <eeckstein at apple.com>
>> To: llvm-commits at cs.uiuc.edu
>> Sent: Tuesday, July 15, 2014 3:37:56 AM
>> Subject: [PATCH] improve scheduling in SLPVectorizer
>> 
>> 
>> 
>> Hi,
>> 
>> This patch improves the scheduling in the SLPVectorizer. Originally
>> the SLPVectorizer could only move instructions of a bundle down. The
>> new scheduling algorithm is more general, which means that (in
>> theory) it should always schedule better than the old approach.
>> 
>> 
>> 
>> The change fixes: < rdar://problem/13677020 > [SLP vectorizer] - SLP
>> Vectorizer needs to be able to start trees at multiple user
>> reductions.
>> Another example is Bug 19657 , where the new algorithm replaces the
>> need for the heuristics for fixing it.
>> 
>> 
>> The compile time and execution times for the test suite look ok.
>> Measured with the test suite on x86.
>> 
>> 
>> 
>> Compile time: approximately the same as before. The new algorithm
>> fixes a compile time problem for the (synthetic) test
>> SingleSource/UnitTest/Vector/constpool. This compiles much faster
>> now.
>> 
>> 
>> Execution time: not too much changes. A few benchmarks show
>> improvements when compiled with -O3 -flto.
>> 
>> 
>> Statistics: With the new scheduling algorithm the SLPVectorizer can
>> generate about 9% more llvm vector instructions. Note that this is a
>> static count, not the number of _executed_ instructions (So if not
>> inside a critical loop it will not show up in execution time).
>> 
>> 
>> As this patch is not trivial, I plan to commit it after the 3.5
>> branch.
>> 
>> 
>> Sounds good. I took a quick look at the patch...
>> 
>> +STATISTIC(NumVectorInstructions, "Number of vector instructions
>> generated");
>> 
>> +
>> 
>> As a high-level comment (which may not be specific to this patch): It would
>> be really nice to have statistics on the amount of reordering that was done
>> during scheduling. I'm not sure exactly what form this should take, but I'd
>> like to get some idea of how much code that we vectorize was abababab vs
>> aaaabbbb ;)
>> 
>> +      // Not really required in the constructor (except SchedulingRegion)
>> as
>> 
>> +      // it's done as soon as it becomes a member of the scheduling region.
>> 
>> +      init(-1);
>> 
>> I don't like this comment. How about, "We need to call init here because it
>> is not otherwise called when...".
>> 
>> +            if (SrcMayWrite || DepDest->Inst->mayWriteToMemory()) {
>> 
>> +              AliasAnalysis::Location DstLoc = getLocation(DepDest->Inst,
>> AA);
>> 
>> +              if (!SrcLoc.Ptr || !DstLoc.Ptr || AA->alias(SrcLoc, DstLoc))
>> {
>> 
>> You probably want to use a ModRef query here for call sites (not that we
>> really care about vectorizing around calls, but we do care about vectorizing
>> around vector intrinsics and memcpy, etc., and this will be important to
>> avoid problems with @llvm.assume).
>> 
>> Otherwise, this looks good to me. Thanks for working on this!
>> 
>> -Hal
>> 
>> 
>> 
>> Please let me know your comments (Arnold and Nadav already looked at
>> the patch).
>> 
>> 
>> Thanks,
>> Erik
>> 
>> 
>> 
>> 
>> 
>> 
>> --
>> Hal Finkel
>> Assistant Computational Scientist
>> Leadership Computing Facility
>> Argonne National Laboratory
>> 
>> 
>>