[llvm-dev] [Proposal][RFC] Epilog loop vectorization
Hal Finkel via llvm-dev
llvm-dev at lists.llvm.org
Wed Mar 15 09:46:18 PDT 2017
On 03/14/2017 07:50 PM, Adam Nemet wrote:
>
>> On Mar 14, 2017, at 11:33 AM, Hal Finkel <hfinkel at anl.gov
>> <mailto:hfinkel at anl.gov>> wrote:
>>
>>
>>
>> On 03/14/2017 12:11 PM, Adam Nemet wrote:
>>>
>>>> On Mar 14, 2017, at 9:49 AM, Hal Finkel <hfinkel at anl.gov
>>>> <mailto:hfinkel at anl.gov>> wrote:
>>>>
>>>>
>>>> On 03/14/2017 11:21 AM, Adam Nemet wrote:
>>>>>
>>>>>> On Mar 14, 2017, at 6:00 AM, Nema, Ashutosh
>>>>>> <Ashutosh.Nema at amd.com <mailto:Ashutosh.Nema at amd.com>> wrote:
>>>>>>
>>>>>> Summarizing the discussion on the implementation approaches.
>>>>>> Discussed about two approaches, first running
>>>>>> ‘InnerLoopVectorizer’ again on the epilog loop immediately after
>>>>>> vectorizing the original loop within the same vectorization pass,
>>>>>> the second approach where re-running vectorization pass and
>>>>>> limiting vectorization factor of epilog loop by metadata.
>>>>>> <Approach-2>
>>>>>> Challenges with re-running the vectorizer pass:
>>>>>> 1)Reusing alias check result:
>>>>>> When vectorizer pass runs again it finds the epilog loop as a new
>>>>>> loop and it may generates alias check, this new alias check may
>>>>>> overkill the gains of epilog vectorization.
>>>>>> We should use the already computed alias check result instead of
>>>>>> re computing again.
>>>>>> 2)Rerun the vectorizer and hoist the new alias check:
>>>>>> It’s not possible to hoist alias checks as its not fully
>>>>>> redundant (not dominated by other checks), it’s not getting
>>>>>> execute in all paths.
>>>>>> <Mail Attachment.jpeg>
>>>>>
>>>>>
>>>>> I don’t understand. Looks like you have the same alias checks for
>>>>> the epilog loop too. How is this CFG different from the
>>>>> re-vectorization of the scalar loop?
>>>>
>>>> You're looking at the wrong thing. This *is* the image from
>>>> re-vectorization. The other image (linked below in step (3)) shows
>>>> the other option.
>>>
>>> Ah ok, the numbering confused me here.
>>>
>>>>
>>>>> Would be good to have both CFGs here and highlighting the difference.
>>>>>
>>>>> I thought that the whole point was that *if* you reached the
>>>>> epilog vector loop via the first vector loop, you want to bypass
>>>>> the alias checks before the epilog vector.
>>>>
>>>> Yes, but, that's not quite true now. You can also reach the
>>>> epilogue loop if you fail the min-trip-count check, and so you
>>>> don't know anything about the aliasing checks.
>>>
>>> OK, so we want this loops to be handled specially. We effectively
>>> say that we only vectorize this loop if it does not require any
>>> alias checks or if the alias checks can be predicate-forwarded to
>>> this loop from existing checks.
>>>
>>> This still seems like an orthogonal issue that may be interesting to
>>> solve independently. In other words this could be a nice feature in
>>> the vectorizer anyway: the loop is estimated to be low-trip count so
>>> feel free to predicate the new vector loop so that the alias check
>>> result could be reused from some other block. We obviously don’t
>>> have this capability today but it’s something that could be nice
>>> aside from the vectorizer.
>>
>> That sounds great. I'm not sure, however, exactly what this means.
>> This "predicate forwarding" sounds like a non-local restructuring of
>> the surrounding code (because the predicates aren't known to be true
>> at that point, we need to search different predecessors to find one
>> in which the conditions might be true and insert the vector loop
>> across that predecessor edge instead). Maybe we could do this, for
>> example, by calling SE->isKnownPredicate enhanced with some
>> additional context sensitivity because we currently check dominating
>> conditional branches for things that are AddRecs in some loop?
>> Moreover, we then have the problem of restructuring the order of the
>> trip-count checks (because we need to fast fail to the scalar loop
>> for the smallest trip counts). Maybe we can do this the same way?
>> This means finding a dominating check on the trip count that implies
>> the check we're about to insert, change that check to the check we
>> want, keeping the original only on the true side of the trip-count
>> check (i.e. if the trip count is larger than the small threshold,
>> then check the large threshold).
>>
>> If we're going to do all of that, then I'd lean toward saying that
>> this does not belong in the vectorizer at all. Rather, this seems
>> like something we'd want in some general transformation (this seems
>> somewhat akin to what JumpThreading does). The general transformation
>> seems something like this; the basic vectorized loop looks like this:
>>
>> int start = 0;
>> if (n >= vf) {
>> if (check) {
>> for (...; start += vf)
>> ...
>> }
>> }
>>
>> for (i = start; i < n; ++i) {
>> ...
>> }
>>
>> and after we vectorize the epilogue loop we end up with this:
>>
>> int start = 0;
>> if (n >= vf) {
>> if (check) {
>> for (...; start += vf)
>> ...
>> }
>> }
>>
>> if (n >= vf2) {
>
> This is (n - start >= vf2) which I think makes this a bit more
> complicated since now “check” is not always redundant if n >= vf2 so
> hoisting becomes a cost decision.
>
>> if (check) {
>
> But we can turn this into a predicate we already have: if (n >= vf &&
> check). BTW, this is I think how Ashutosh’ approach 1 works too; if
> the first vector loop does not iterate once the alias result will be
> false in the epilog vector loop.
>
>> for (...; start += vf2)
>> ...
>> }
>> }
>>
>> for (i = start; i < n; ++i) {
>> ...
>> }
>>
>> and we need to end up with this:
>>
>> int start = 0;
>> if (n >= vf2) {
>> if (check) {
>> if (n >= vf) {
>> for (...; start += vf)
>> ...
>> }
>>
>> for (...; start += vf2)
>> ...
>> }
>> }
>>
>> for (i = start; i < n; ++i) {
>> ...
>> }
>>
>> where we've recognized here that 'check' is the same in both cases,
>> and that because vf2 < vf, the one trip-count check implies the
>> other. This latter part seems like the part that our existing passes
>> might not know what to do with currently. Thoughts?
>
>
> And then we get:
>
> int start = 0;
> *bool check_result = false;
> *if (n >= vf) {
> if (check) {
> *check_result = true;*
> for (...; start += vf)
> ...
> }
> }
>
> if (n - start >= vf2) {
> if (*check_result*) {
> for (...; start += vf2)
> ...
> }
> }
>
> for (i = start; i < n; ++i) {
> …
> }
>
> What do you think?
I think this seems about right. We shouldn't re-compare the trip counts
in the epilogue loop if the alias check is false anyway:
int start = 0;*
*if (n >= vf) {
if (check) {
for (...; start += vf)
...
} else { goto scalar; }
} else { goto scalar; }
if (n - start >= vf2) {
for (...; start += vf2)
...
}
scalar:
for (i = start; i < n; ++i) {
…
}
-Hal
>
> Adam
>
>>
>> -Hal
>>
>>>
>>> Adam
>>>
>>>>
>>>> -Hal
>>>>
>>>>>
>>>>> I still don’t understand why that’s not possible with some
>>>>> sophisticated predicate propagation independent from the
>>>>> vectorizer. I am not saying it’s already possible but it should be.
>>>>>
>>>>> Adam
>>>>>
>>>>>
>>>>>> NOTE: We cannot prepone alias check as its expensive compared to
>>>>>> other checks.
>>>>>> <Approach-1>
>>>>>> 1)Current patch depends on the existing functionality of
>>>>>> LoopVectorizer, it uses ‘InnerLoopVectorizer’ again to vectorize
>>>>>> the epilog loop, as it happens in the same vectorization pass we
>>>>>> have flexibility to reuse already computed alias result check &
>>>>>> limit vectorization factor for the epilog loop.
>>>>>> 2)It does not generate the blocks for new block layout
>>>>>> explicitly, rather it depends on
>>>>>> ‘InnerLoopVectorizer::createEmptyLoop’ to generate new block
>>>>>> layout. The new block layout get automatically generated by
>>>>>> calling the ‘InnerLoopVectorizer:: vectorize’ again.
>>>>>> 3)Block layout description with epilog loop vectorization is
>>>>>> available at
>>>>>> https://reviews.llvm.org/file/data/fxg5vx3capyj257rrn5j/PHID-FILE-x6thnbf6ub55ep5yhalu/LayoutDescription.png
>>>>>> Approach-1 looks feasible, please comment if any objections.
>>>>>> Regards,
>>>>>> Ashutosh
>>>>>> ...
>>
>> --
>> Hal Finkel
>> Lead, Compiler Technology and Programming Languages
>> Leadership Computing Facility
>> Argonne National Laboratory
>
--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170315/a610dec3/attachment-0001.html>
More information about the llvm-dev
mailing list