[llvm-dev] [Proposal][RFC] Epilog loop vectorization

Tue Mar 14 09:49:37 PDT 2017

On 03/14/2017 11:21 AM, Adam Nemet wrote:
>
>> On Mar 14, 2017, at 6:00 AM, Nema, Ashutosh <Ashutosh.Nema at amd.com 
>> <mailto:Ashutosh.Nema at amd.com>> wrote:
>>
>> Summarizing the discussion on the implementation approaches.
>> Discussed about two approaches, first running ‘InnerLoopVectorizer’ 
>> again on the epilog loop immediately after vectorizing the original 
>> loop within the same vectorization pass, the second approach where 
>> re-running vectorization pass and limiting vectorization factor of 
>> epilog loop by metadata.
>> <Approach-2>
>> Challenges with re-running the vectorizer pass:
>> 1)Reusing alias check result:
>> When vectorizer pass runs again it finds the epilog loop as a new 
>> loop and it may generates alias check, this new alias check may 
>> overkill the gains of epilog vectorization.
>> We should use the already computed alias check result instead of re 
>> computing again.
>> 2)Rerun the vectorizer and hoist the new alias check:
>> It’s not possible to hoist alias checks as its not fully redundant 
>> (not dominated by other checks), it’s not getting execute in all paths.
>
>
> I don’t understand. Looks like you have the same alias checks for the 
> epilog loop too.  How is this CFG different from the re-vectorization 
> of the scalar loop?

You're looking at the wrong thing. This *is* the image from 
re-vectorization. The other image (linked below in step (3)) shows the 
other option.

>  Would be good to have both CFGs here and highlighting the difference.
>
> I thought that the whole point was that *if* you reached the epilog 
> vector loop via the first vector loop, you want to bypass the alias 
> checks before the epilog vector.

Yes, but, that's not quite true now. You can also reach the epilogue 
loop if you fail the min-trip-count check, and so you don't know 
anything about the aliasing checks.

  -Hal

>
> I still don’t understand why that’s not possible with some 
> sophisticated predicate propagation independent from the vectorizer. 
>  I am not saying it’s already possible but it should be.
>
> Adam
>
>
>> NOTE: We cannot prepone alias check as its expensive compared to 
>> other checks.
>> <Approach-1>
>> 1)Current patch depends on the existing functionality of 
>> LoopVectorizer, it uses ‘InnerLoopVectorizer’ again to vectorize the 
>> epilog loop, as it happens in the same vectorization pass we have 
>> flexibility to reuse already computed alias result check & limit 
>> vectorization factor for the epilog loop.
>> 2)It does not generate the blocks for new block layout explicitly, 
>> rather it depends on ‘InnerLoopVectorizer::createEmptyLoop’ to 
>> generate new block layout. The new block layout get automatically 
>> generated by calling the ‘InnerLoopVectorizer:: vectorize’ again.
>> 3)Block layout description with epilog loop vectorization is available at
>> https://reviews.llvm.org/file/data/fxg5vx3capyj257rrn5j/PHID-FILE-x6thnbf6ub55ep5yhalu/LayoutDescription.png
>> Approach-1 looks feasible, please comment if any objections.
>> Regards,
>> Ashutosh
>> *From:*Nema, Ashutosh
>> *Sent:*Wednesday, March 1, 2017 10:42 AM
>> *To:*'Daniel Berlin' <dberlin at dberlin.org <mailto:dberlin at dberlin.org>>
>> *Cc:*anemet at apple.com <mailto:anemet at apple.com>; Hal Finkel 
>> <hfinkel at anl.gov <mailto:hfinkel at anl.gov>>; Zaks, Ayal 
>> <ayal.zaks at intel.com <mailto:ayal.zaks at intel.com>>; Renato Golin 
>> <renato.golin at linaro.org 
>> <mailto:renato.golin at linaro.org>>;mkuper at google.com 
>> <mailto:mkuper at google.com>; Mehdi Amini <mehdi.amini at apple.com 
>> <mailto:mehdi.amini at apple.com>>; llvm-dev <llvm-dev at lists.llvm.org 
>> <mailto:llvm-dev at lists.llvm.org>>
>> *Subject:*RE: [llvm-dev] [Proposal][RFC] Epilog loop vectorization
>> Sorry I misunderstood, gvn/newgvn/gvnhoist cannot help here as these 
>> checks are not dominated by all paths.
>> Regards,
>> Ashutosh
>> *From:*Daniel Berlin [mailto:dberlin at dberlin.org]
>> *Sent:*Tuesday, February 28, 2017 6:58 PM
>> *To:*Nema, Ashutosh <Ashutosh.Nema at amd.com 
>> <mailto:Ashutosh.Nema at amd.com>>
>> *Cc:*anemet at apple.com <mailto:anemet at apple.com>; Hal Finkel 
>> <hfinkel at anl.gov <mailto:hfinkel at anl.gov>>; Zaks, Ayal 
>> <ayal.zaks at intel.com <mailto:ayal.zaks at intel.com>>; Renato Golin 
>> <renato.golin at linaro.org 
>> <mailto:renato.golin at linaro.org>>;mkuper at google.com 
>> <mailto:mkuper at google.com>; Mehdi Amini <mehdi.amini at apple.com 
>> <mailto:mehdi.amini at apple.com>>; llvm-dev <llvm-dev at lists.llvm.org 
>> <mailto:llvm-dev at lists.llvm.org>>
>> *Subject:*Re: [llvm-dev] [Proposal][RFC] Epilog loop vectorization
>> Hoisting or removing?
>> Neither pass does hoisting, you'd need gvnhoist for that.
>> Even then:
>> Staring at your example, none of the checks are fully redundant  (IE 
>> they are not dominated by other checks) or execute on all paths.
>> Thus, hoisting them  would be purely speculative code motion, which 
>> none of our passes do.
>> If you would like these sets of checks to be removed, you would need 
>> to place them in a place that they execute unconditionally.
>> Otherwise, this is not a standard code hoisting/removal transform.
>> The only redundancy i can see here at all is the repeated 
>> getelementptr computation.
>> If you move it to the preheader, it will be eliminated.
>> Otherwise, none of the checks are redundant.
>>
>> What would you hope to happen in this case?
>> On Tue, Feb 28, 2017 at 5:09 AM, Nema, Ashutosh 
>> <Ashutosh.Nema at amd.com <mailto:Ashutosh.Nema at amd.com>> wrote:
>>
>>     I have tried running both gvn and newgvn but it did not helped in
>>     hoisting the alias checks:
>>     Please check, maybe I have missed something.
>>     <TestCase>
>>     void foo (char *A, char *B, char *C, int len) {
>>       int i = 0;
>>       for (i=0 ; i< len; i++)
>>         A[i] = B[i] + C[i];
>>     }
>>     <Command>
>>       $ opt –O3 –gvn test.ll –o test.opt.ll
>>       $ opt –O3 –newgvn test.ll –o test.opt.ll
>>     “test.ll” is attached, it got already vectorized by the approach
>>     running vectorizer twice by annotate the remainder loop with
>>     metadata to limit the vectorization factor for epilog vector loop.
>>     Regards,
>>     Ashutosh
>>     *From:*anemet at apple.com
>>     <mailto:anemet at apple.com>[mailto:anemet at apple.com
>>     <mailto:anemet at apple.com>]
>>     *Sent:*Tuesday, February 28, 2017 1:33 AM
>>     *To:*Hal Finkel <hfinkel at anl.gov <mailto:hfinkel at anl.gov>>
>>     *Cc:*Daniel Berlin <dberlin at dberlin.org
>>     <mailto:dberlin at dberlin.org>>; Nema, Ashutosh
>>     <Ashutosh.Nema at amd.com <mailto:Ashutosh.Nema at amd.com>>; Zaks,
>>     Ayal <ayal.zaks at intel.com <mailto:ayal.zaks at intel.com>>; Renato
>>     Golin <renato.golin at linaro.org
>>     <mailto:renato.golin at linaro.org>>;mkuper at google.com
>>     <mailto:mkuper at google.com>; Mehdi Amini <mehdi.amini at apple.com
>>     <mailto:mehdi.amini at apple.com>>; llvm-dev
>>     <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>>
>>     *Subject:*Re: [llvm-dev] [Proposal][RFC] Epilog loop vectorization
>>
>>         On Feb 27, 2017, at 12:01 PM, Hal Finkel <hfinkel at anl.gov
>>         <mailto:hfinkel at anl.gov>> wrote:
>>         On 02/27/2017 01:47 PM, Daniel Berlin wrote:
>>
>>             On Mon, Feb 27, 2017 at 11:29 AM, Adam Nemet
>>             <anemet at apple.com <mailto:anemet at apple.com>> wrote:
>>
>>                     On Feb 27, 2017, at 10:11 AM, Hal Finkel
>>                     <hfinkel at anl.gov <mailto:hfinkel at anl.gov>> wrote:
>>                     On 02/27/2017 11:47 AM, Adam Nemet wrote:
>>
>>                             On Feb 27, 2017, at 9:39 AM, Daniel
>>                             Berlin <dberlin at dberlin.org
>>                             <mailto:dberlin at dberlin.org>> wrote:
>>                             On Mon, Feb 27, 2017 at 9:29 AM, Adam
>>                             Nemet <anemet at apple.com
>>                             <mailto:anemet at apple.com>> wrote:
>>
>>                                     On Feb 27, 2017, at 7:27 AM, Hal
>>                                     Finkel <hfinkel at anl.gov
>>                                     <mailto:hfinkel at anl.gov>> wrote:
>>
>>                                     On 02/27/2017 06:29 AM, Nema,
>>                                     Ashutosh wrote:
>>
>>                                         Thanks for looking into this.
>>                                         1) Issues with re running
>>                                         vectorizer:
>>                                         Vectorizer might generate
>>                                         redundant alias checks while
>>                                         vectorizing epilog loop.
>>                                         Redundant alias checks are
>>                                         expensive, we like to reuse
>>                                         the results of already
>>                                         computed alias checks.
>>                                         With metadata we can limit
>>                                         the width of epilog loop, but
>>                                         not sure about reusing alias
>>                                         check result.
>>                                         Any thoughts on rerunning
>>                                         vectorizer with reusing the
>>                                         alias check result ?
>>
>>
>>                                     One way of looking at this is:
>>                                     Reusing the alias-check result is
>>                                     really just a conditional
>>                                     propagation problem; if we don't
>>                                     already have an optimization that
>>                                     can combine these after the fact,
>>                                     then we should.
>>
>>                                 +Danny
>>                                 Isn’t Extended SSA supposed to help
>>                                 with this?
>>
>>                             Yes, it will solve this with no issue
>>                             already.  GVN probably does already too.
>>                             even if if you have
>>                             if (a == b)
>>                             if (a == c)
>>                              if (a == d)
>>                              if (a == e)
>>                              if (a == g)
>>                             and  we can prove a ... g equivalent,
>>                             newgvn will eliminate them all and set
>>                             all the branches true.
>>                             If you need a simpler clean up pass, we
>>                             could run it on sub-graphs.
>>
>>                         Yes we probably don’t want to run a full GVN
>>                         after the “loop-scheduling” passes.
>>
>>
>>                     FWIW, we could, just without the
>>                     memory-dependence analysis enabled (i.e. set the
>>                     NoLoads constructor parameter to true). GVN is
>>                     pretty fast in that mode.
>>
>>                 OK. Another data point is that I’ve seen cases in the
>>                 past where the alias checks required for the loop
>>                 passes could enable GVN to remove redundant
>>                 loads/stores. Currently we can only pick these up
>>                 with LTO when GVN is rerun.
>>
>>             This is just GVN brokenness, newgvn should not have this
>>             problem.
>>             If it does, i'd love to see it.
>>
>>
>>         I thought that the problem is that we just don't run GVN
>>         after that point in the pipeline.
>>
>>     Yeah, that is the problem but I think Danny misunderstood what I
>>     was trying to say.
>>     This was a datapoint to possibly rerun GVN with memory-awareness.
>>
>>
>>          -Hal
>>
>>             (I'm working on the last few parts of turning it on by
>>             default, but it requires a new getModRefInfo interface to
>>             be able to get the last few testcases)
>>
>>         -- 
>>
>>         Hal Finkel
>>
>>         Lead, Compiler Technology and Programming Languages
>>
>>         Leadership Computing Facility
>>
>>         Argonne National Laboratory
>>
>

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170314/b6c4e0a2/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/jpeg
Size: 18444 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170314/b6c4e0a2/attachment-0001.jpe>