SLP/Loop vectorizer pass ordering

Wed Oct 29 14:29:53 PDT 2014

Hi Chandler,

I found a problem exposed by this commit: when CSE runs right after the loop vectorizer, it perform incorrect transformation which cause verifier to fail later. The reason for that is seemingly that after the vectorizer some analysis (DomTree?) isn’t invalidated, while it should be. This happens on several benchmarks from spec2006 when compiled with PGO/LTO. A simple workaround is to invoke simplify-cfg before early-cse, but that’s just papering over the issue.

What is a proper way for killing/updating dom-tree (I suspect it’s the rootcause) info after some pass?

Here is a small reduced test-case to reproduce the problem:

$ opt  -loop-vectorize -instcombine -early-cse  -S reduced.ll
Instruction does not dominate all uses!
  %18 = xor i64 %umax, -1
  %scevgep45 = getelementptr i8* %d.020, i64 %18
LLVM ERROR: Broken function found, compilation aborted!

Thanks,
Michael

> On Oct 13, 2014, at 5:44 PM, Chandler Carruth <chandlerc at google.com> wrote:
> 
> (and the flag went in in r219644 and is '-extra-vectorizer-passes', sorry for failing to mention that)
> 
> On Mon, Oct 13, 2014 at 5:44 PM, Chandler Carruth <chandlerc at google.com <mailto:chandlerc at google.com>> wrote:
> So, I've added a flag (off by default naturally) which adds several passes that folks have suggested either here or in other conversations around the vectorizers.
> 
> The theory behind these suggestions makes a lot of sense to me, but this will be one of the hard things to benchmark, so I wanted to just get an easy switch in place that anyone could try out and report back results.
> 
> I'll start a general discussion about this on a new thread. I think at least loop-rotate makes perfect sense, and we'll see if the others seem worth their compile-time cost.
> 
> On Thu, Oct 9, 2014 at 1:51 PM, Chandler Carruth <chandlerc at google.com <mailto:chandlerc at google.com>> wrote:
> On Thu, Oct 9, 2014 at 1:46 PM, Gerolf Hoflehner <ghoflehner at apple.com <mailto:ghoflehner at apple.com>> wrote:
> Are you going to test ARM and x86? Otherwise could you send out your patch even though it is preliminary?
> 
> Only x86 sadly. I'll send it out later today hopefully. 
> 
> Thanks
> Gerolf
> 
>> On Oct 9, 2014, at 12:44 PM, Chandler Carruth <chandlerc at google.com <mailto:chandlerc at google.com>> wrote:
>> 
>> I have a patch I've been testing to clean up a lot of the passes around the vectorizers. I'll add this in and finish testing it, then send it out with numbers.
>> 
>> On Oct 9, 2014 12:40 PM, "Andrew Trick" <atrick at apple.com <mailto:atrick at apple.com>> wrote:
>> 
>>> On Oct 9, 2014, at 8:48 AM, Hal Finkel <hfinkel at anl.gov <mailto:hfinkel at anl.gov>> wrote:
>>> 
>>> ----- Original Message -----
>>>> From: "Arnold Schwaighofer" <aschwaighofer at apple.com <mailto:aschwaighofer at apple.com>>
>>>> To: "Zinovy Nis" <zinovy.nis at gmail.com <mailto:zinovy.nis at gmail.com>>
>>>> Cc: "Hal Finkel" <hfinkel at anl.gov <mailto:hfinkel at anl.gov>>, "LLVM Commits" <llvm-commits at cs.uiuc.edu <mailto:llvm-commits at cs.uiuc.edu>>, "Tobias Grosser" <tobias at grosser.es <mailto:tobias at grosser.es>>,
>>>> "Chandler Carruth" <chandlerc at google.com <mailto:chandlerc at google.com>>, "Nadav Rotem" <nrotem at apple.com <mailto:nrotem at apple.com>>
>>>> Sent: Thursday, October 9, 2014 10:07:42 AM
>>>> Subject: Re: SLP/Loop vectorizer pass ordering
>>>> 
>>>> 
>>>> The loop vectorizer now sees this loop:
>>>> 
>>>> define void
>>>> @_Z21ambient_occlusion_vecP6_IsectR5vrandILm8EE(%struct._Isect*
>>>> nocapture %isect, %class.vrand* nocapture readonly
>>>> dereferenceable(32) %rng) #0 {
>>>> entry:
>>>>  br label %for.body
>>>> 
>>>> for.body:                                         ; preds =
>>>> %for.inc.for.body_crit_edge, %entry
>>>>  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next,
>>>>  %for.inc.for.body_crit_edge ]
>>>>  %occlusion.017 = phi float [ 1.000000e+00, %entry ], [ %phitmp,
>>>>  %for.inc.for.body_crit_edge ]
>>>>  %exitcond = icmp eq i64 %indvars.iv, 63
>>>>  br i1 %exitcond, label %for.end, label %for.inc.for.body_crit_edge
>>>> 
>>>> for.inc.for.body_crit_edge:                       ; preds = %for.body
>>>>  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
>>>>  %phitmp = fadd fast float %occlusion.017, 1.000000e+00
>>>>  br label %for.body
>>>> 
>>>> for.end:                                          ; preds = %for.body
>>>>  %occlusion.017.lcssa = phi float [ %occlusion.017, %for.body ]
>>>>  %t5 = getelementptr inbounds %struct._Isect* %isect, i64 0, i32 0
>>>>  store float %occlusion.017.lcssa, float* %t5, align 4, !tbaa !1
>>>>  ret void
>>>> }
>>>> 
>>>> Notice that the loop exit block is the loop header and the latch is
>>>> not guaranteed to be executed. The loop vectorizer assumes such
>>>> loops have been rotated.
>>>> 
>>>> 
>>>> If we send this IR through loop-rotate it will vectorize.
>>>> 
>>>> The farther away we move the loop vectorizer from loop rotate the
>>>> likelier some optimization will destroy the rotated from. We might
>>>> just want to run loop rotate before the loop vectorizer ...
>>>> 
>>> 
>>> I think that makes sense -- and I don't recall loop rotation being expensive, plus is preserves just about everything (and I think does a reasonable job cleaning up after itself) ;)
>>> 
>>> I'd say we run some benchmarks, and barring any issues, we just do it.
>> 
>> Well, that is a classic candidate for rotate. So assuming whatever GVN is doing is sane, then I’d say it makes sense to rerun rotation.
>> 
>> -Andy
>> 
>> 
>>>> 
>>>> 
>>>>> On Oct 9, 2014, at 1:15 AM, Zinovy Nis <zinovy.nis at gmail.com <mailto:zinovy.nis at gmail.com>>
>>>>> wrote:
>>>>> 
>>>>> Hi.
>>>>> 
>>>>> Did you have a chance to look at my reproducer?
>>>>> 
>>>>> 2014-10-07 21:34 GMT+04:00 Zinovy Nis <zinovy.nis at gmail.com <mailto:zinovy.nis at gmail.com>>:
>>>>>> Hi.
>>>>>> 
>>>>>> I attached a reduced sample, based on
>>>>>> https://code.google.com/p/aobench/ <https://code.google.com/p/aobench/>.
>>>>>> 
>>>>>> Run it first with an old SLP order:
>>>>>> 
>>>>>> 1) clang -c -Ofast -static -march=core-avx2 aobench.cpp -Rpass=.
>>>>>> -mllvm -debug-only=loop-vectorize -mllvm
>>>>>> -run-slp-after-loop-vectorization=0
>>>>>> 
>>>>>> and then with a new order:
>>>>>> 
>>>>>> 2) clang -c -Ofast -static -march=core-avx2 aobench.cpp -Rpass=.
>>>>>> -debug-only=loop-vectorize -mllvm
>>>>>> -run-slp-after-loop-vectorization=1
>>>>>> 
>>>>>> and see the logs:
>>>>>> 
>>>>>> 1) aobench.cpp:59:9: remark: vectorized loop (vectorization
>>>>>> factor: 8,
>>>>>> unrolling interleave factor: 1) [-Rpass=loop-vectorize]
>>>>>> 2) aobench.cpp:59:9: remark: loop ***not*** vectorized: use
>>>>>> -Rpass-analysis=loop-vectorize for more info
>>>>>> [-Rpass-missed=loop-vectorize]
>>>>>> 
>>>>>> LV: Found an unidentified PHI.  %occlusion.017 = phi float [
>>>>>> 1.000000e+00, %entry ], [ %phitmp, %for.inc.for.body_crit_edge ]
>>>>>> LV: Can't vectorize the instructions or CFG
>>>>>> LV: Not vectorizing: Cannot prove legality.
>>>>>> 
>>>>>> 2014-10-06 17:46 GMT+04:00 Hal Finkel <hfinkel at anl.gov <mailto:hfinkel at anl.gov>>:
>>>>>>> ----- Original Message -----
>>>>>>>> From: "Zinovy Nis" <zinovy.nis at gmail.com <mailto:zinovy.nis at gmail.com>>
>>>>>>>> To: "Hal Finkel" <hfinkel at anl.gov <mailto:hfinkel at anl.gov>>
>>>>>>>> Cc: "LLVM Commits" <llvm-commits at cs.uiuc.edu <mailto:llvm-commits at cs.uiuc.edu>>, "Tobias Grosser"
>>>>>>>> <tobias at grosser.es <mailto:tobias at grosser.es>>, "Chandler Carruth"
>>>>>>>> <chandlerc at google.com <mailto:chandlerc at google.com>>, "Nadav Rotem" <nrotem at apple.com <mailto:nrotem at apple.com>>,
>>>>>>>> "Arnold Schwaighofer" <aschwaighofer at apple.com <mailto:aschwaighofer at apple.com>>
>>>>>>>> Sent: Monday, October 6, 2014 8:44:28 AM
>>>>>>>> Subject: Re: SLP/Loop vectorizer pass ordering
>>>>>>>> 
>>>>>>>> A bit later. At least GVN creates critical edges which are not
>>>>>>>> handled
>>>>>>>> by loop vectorizer then.
>>>>>>> 
>>>>>>> Okay, please do (this is fairly important) -- if you can extract
>>>>>>> some relevant IR, filing a bug report would be great. Are you
>>>>>>> saying that running SLP early inhibits GVN from creating
>>>>>>> critical edges that the loop vectorizer does not understand?
>>>>>>> 
>>>>>>> Thanks again,
>>>>>>> Hal
>>>>>>> 
>>>>>>>> 
>>>>>>>> 2014-10-06 17:33 GMT+04:00 Hal Finkel <hfinkel at anl.gov <mailto:hfinkel at anl.gov>>:
>>>>>>>>> ----- Original Message -----
>>>>>>>>>> From: "Zinovy Nis" <zinovy.nis at gmail.com <mailto:zinovy.nis at gmail.com>>
>>>>>>>>>> To: "Chandler Carruth" <chandlerc at google.com <mailto:chandlerc at google.com>>
>>>>>>>>>> Cc: "LLVM Commits" <llvm-commits at cs.uiuc.edu <mailto:llvm-commits at cs.uiuc.edu>>, "Tobias
>>>>>>>>>> Grosser"
>>>>>>>>>> <tobias at grosser.es <mailto:tobias at grosser.es>>
>>>>>>>>>> Sent: Monday, October 6, 2014 8:19:24 AM
>>>>>>>>>> Subject: Re: SLP/Loop vectorizer pass ordering
>>>>>>>>>> 
>>>>>>>>>> Please wait a while, I'm using it to revert the new order as
>>>>>>>>>> it
>>>>>>>>>> introduces regression in our internal benchmark: SLP was
>>>>>>>>>> creating
>>>>>>>>>> loop
>>>>>>>>>> vectorization opportunities when was called before LV. Now no
>>>>>>>>>> such
>>>>>>>>>> opportunities are available, so we've got a regression.
>>>>>>>>> 
>>>>>>>>> Interesting. Can you provide any further details?
>>>>>>>>> 
>>>>>>>>> -Hal
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 2014-10-06 3:28 GMT+04:00 Chandler Carruth
>>>>>>>>>> <chandlerc at google.com <mailto:chandlerc at google.com>>:
>>>>>>>>>>> 
>>>>>>>>>>> On Thu, Sep 4, 2014 at 6:32 AM, James Molloy
>>>>>>>>>>> <james at jamesmolloy.co.uk <mailto:james at jamesmolloy.co.uk>>
>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Hi Hal, Chandler,
>>>>>>>>>>>> 
>>>>>>>>>>>> r217144.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Is anyone still using the option to disable this? If I don't
>>>>>>>>>>> hear
>>>>>>>>>>> anything,
>>>>>>>>>>> I'll remove this option entirely in the next week.
>>>>>>>>>>> 
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> llvm-commits mailing list
>>>>>>>>>>> llvm-commits at cs.uiuc.edu <mailto:llvm-commits at cs.uiuc.edu>
>>>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits <http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits>
>>>>>>>>>>> 
>>>>>>>>>> _______________________________________________
>>>>>>>>>> llvm-commits mailing list
>>>>>>>>>> llvm-commits at cs.uiuc.edu <mailto:llvm-commits at cs.uiuc.edu>
>>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits <http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits>
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> Hal Finkel
>>>>>>>>> Assistant Computational Scientist
>>>>>>>>> Leadership Computing Facility
>>>>>>>>> Argonne National Laboratory
>>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Hal Finkel
>>>>>>> Assistant Computational Scientist
>>>>>>> Leadership Computing Facility
>>>>>>> Argonne National Laboratory
>>>> 
>>>> 
>>> 
>>> -- 
>>> Hal Finkel
>>> Assistant Computational Scientist
>>> Leadership Computing Facility
>>> Argonne National Laboratory
>> 
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu <mailto:llvm-commits at cs.uiuc.edu>
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits <http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits>
> 
> 
> 
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20141029/39edb839/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: reduced.ll
Type: application/octet-stream
Size: 5373 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20141029/39edb839/attachment.obj>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20141029/39edb839/attachment-0001.html>