[PATCH] improve scheduling in SLPVectorizer
Erik Eckstein
eeckstein at apple.com
Wed Jul 30 07:39:53 PDT 2014
Hi Eric,
thanks again for your comments. I attached a new version.
>> std::make_unique?
Argh.. I still failed to use make_unique here because it didn't manage to set the BS pointer in addition to storing it in the map. Maybe I'm missing some c++ knowledge.
> Can be added completely separately from the rest of your patch.
I understand and agree. But I hope you don't mind if I don't split the current patch. I will consider it for the next commits.
> It's a big function, describing the general use isn't describing the algorithm.
I added some comments above and inside the function.
> Seem to have this scattered around a lot. I don't know if the logic
> can be refactored so there's a little less cut and paste here.
I know and it's somehow a compromise. But I couldn't avoid it without changing the complete logic of this large function.
> Why shouldn't it happen?
Actually this is the wrong formulation. I just didn't see this when running the test suite. I changed the wording.
> unique_ptr here maybe so you don't have to do the explicit deletes?
Good idea! I changed it.
> Still using -1/1 a lot without comments or anything describing why
> you've got those values.
I added some comments. Note that the -1/1 as arguments to incrementUnscheduledDeps() really means to decrement/increment by one and are not magic numbers.
> You can reformat it using clang-format
Thanks for this hint! I did this.
Erik
-------------- next part --------------
A non-text attachment was scrubbed...
Name: slp-vectorizer-scheduling-5.patch
Type: application/octet-stream
Size: 52188 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140730/b44fb47d/attachment.obj>
-------------- next part --------------
On 29 Jul 2014, at 00:55, Eric Christopher <echristo at gmail.com> wrote:
>>
>> std::make_unique?
>>
>>
>> As I understand make_unique is not yet available in C++11
>>
>
> What Dave said here.
>
>> It'd also be nice if you could break this up to make it a bit more
>> incremental. I.e. things like the statistic can just be added first.
>>
>>
>> I'm not sure I understood.
>> I tried to make only little changes in this large function.
>
> It looks like the statistic here:
>
> +STATISTIC(NumVectorInstructions, "Number of vector instructions generated");
>
> Can be added completely separately from the rest of your patch.
>
>>
>> Block comment above this function documenting it?
>>
>>
>> Is at the function declaration in the class definition
>>
>
> It's a big function, describing the general use isn't describing the algorithm.
>
> Few new comments:
>
> + UnscheduledDepsInBundle(InvalidDeps), IsScheduled(false)
> + {
> + }
>
> formatting.
>
> + // The initial value for the dependency counters. It menas that the
>
> "means"
>
> + BS->cancelScheduling(VL);
>
> Seem to have this scattered around a lot. I don't know if the logic
> can be refactored so there's a little less cut and paste here.
>
> + if (NextLoadStore) {
> + if (CurrentLoadStore) CurrentLoadStore->NextLoadStore = NextLoadStore;
> + } else {
> + LastLoadStoreInRegion = CurrentLoadStore;
> + }
>
> Formatting.
>
> + // Should never happen, just to be safe.
> + // This lets the instruction/bundle never be scheduled
> and eventally
> + // disable vectorization.
>
> Why shouldn't it happen?
>
> + /// Simple memory allocation for ScheduleData.
> + std::vector<ScheduleData *> ScheduleDataChunks;
>
> unique_ptr here maybe so you don't have to do the explicit deletes?
>
> Still using -1/1 a lot without comments or anything describing why
> you've got those values.
>
> Mostly for maintainability for someone who didn't write the code and
> wants to look at it. Otherwise the patch is fine. You can reformat it
> using clang-format on the diff to make it easier.
>
> -eric
>
>> "safe"
>>
>>
>> done
>>
>> Thanks,
>> Erik
>>
>>
>>
>>
>> On 25 Jul 2014, at 19:53, Hal Finkel <hfinkel at anl.gov> wrote:
>>
>> ----- Original Message -----
>>
>> From: "Erik Eckstein" <eeckstein at apple.com>
>> To: llvm-commits at cs.uiuc.edu
>> Sent: Tuesday, July 15, 2014 3:37:56 AM
>> Subject: [PATCH] improve scheduling in SLPVectorizer
>>
>>
>>
>> Hi,
>>
>> This patch improves the scheduling in the SLPVectorizer. Originally
>> the SLPVectorizer could only move instructions of a bundle down. The
>> new scheduling algorithm is more general, which means that (in
>> theory) it should always schedule better than the old approach.
>>
>>
>>
>> The change fixes: < rdar://problem/13677020 > [SLP vectorizer] - SLP
>> Vectorizer needs to be able to start trees at multiple user
>> reductions.
>> Another example is Bug 19657 , where the new algorithm replaces the
>> need for the heuristics for fixing it.
>>
>>
>> The compile time and execution times for the test suite look ok.
>> Measured with the test suite on x86.
>>
>>
>>
>> Compile time: approximately the same as before. The new algorithm
>> fixes a compile time problem for the (synthetic) test
>> SingleSource/UnitTest/Vector/constpool. This compiles much faster
>> now.
>>
>>
>> Execution time: not too much changes. A few benchmarks show
>> improvements when compiled with -O3 -flto.
>>
>>
>> Statistics: With the new scheduling algorithm the SLPVectorizer can
>> generate about 9% more llvm vector instructions. Note that this is a
>> static count, not the number of _executed_ instructions (So if not
>> inside a critical loop it will not show up in execution time).
>>
>>
>> As this patch is not trivial, I plan to commit it after the 3.5
>> branch.
>>
>>
>> Sounds good. I took a quick look at the patch...
>>
>> +STATISTIC(NumVectorInstructions, "Number of vector instructions
>> generated");
>>
>> +
>>
>> As a high-level comment (which may not be specific to this patch): It would
>> be really nice to have statistics on the amount of reordering that was done
>> during scheduling. I'm not sure exactly what form this should take, but I'd
>> like to get some idea of how much code that we vectorize was abababab vs
>> aaaabbbb ;)
>>
>> + // Not really required in the constructor (except SchedulingRegion)
>> as
>>
>> + // it's done as soon as it becomes a member of the scheduling region.
>>
>> + init(-1);
>>
>> I don't like this comment. How about, "We need to call init here because it
>> is not otherwise called when...".
>>
>> + if (SrcMayWrite || DepDest->Inst->mayWriteToMemory()) {
>>
>> + AliasAnalysis::Location DstLoc = getLocation(DepDest->Inst,
>> AA);
>>
>> + if (!SrcLoc.Ptr || !DstLoc.Ptr || AA->alias(SrcLoc, DstLoc))
>> {
>>
>> You probably want to use a ModRef query here for call sites (not that we
>> really care about vectorizing around calls, but we do care about vectorizing
>> around vector intrinsics and memcpy, etc., and this will be important to
>> avoid problems with @llvm.assume).
>>
>> Otherwise, this looks good to me. Thanks for working on this!
>>
>> -Hal
>>
>>
>>
>> Please let me know your comments (Arnold and Nadav already looked at
>> the patch).
>>
>>
>> Thanks,
>> Erik
>>
>>
>>
>>
>>
>>
>> --
>> Hal Finkel
>> Assistant Computational Scientist
>> Leadership Computing Facility
>> Argonne National Laboratory
>>
>>
>>
More information about the llvm-commits
mailing list