[LLVMdev] RFC: Should we have (something like) -extra-vectorizer-passes in -O2?

Andrew Trick atrick at apple.com
Tue Oct 14 15:04:16 PDT 2014


> On Oct 14, 2014, at 11:16 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> 
> ----- Original Message -----
>> From: "Chandler Carruth" <chandlerc at google.com>
>> To: "Andrew Trick" <atrick at apple.com>
>> Cc: "James Molloy" <james at jamesmolloy.co.uk>, "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>
>> Sent: Tuesday, October 14, 2014 12:56:46 PM
>> Subject: Re: [LLVMdev] RFC: Should we have (something like) -extra-vectorizer-passes in -O2?
>> 
>> On Tue, Oct 14, 2014 at 10:11 AM, Andrew Trick <atrick at apple.com> wrote:
>> 
>>>> + correlated-propagation
>> 
>> A little worried about this.
>> 
>>>> + instcombine
>> 
>> I'm *very* concerned about rerunning instcombine, but understand it
>> may help cleanup the vectorized preheader.
>> 
>> Why are you concerned? Is instcombine that slow? I usually don't see
>> huge overhead from re-running it on nearly-canonical code. (Oh, I
>> see you just replied to Hal here, fair enough.)
>> 
>>>> + licm
>>>> + loop-unswitch
>> 
>> These should be limited to the relevant loop nest.
>> 
>> We have no way to do that currently. Do you think they will in
>> practice be too slow? If so, why? I would naively expect unswitch to
>> be essentially free unless it can do something, and LICM not much
>> more expensive.
>> 
>>>> + simplifycfg
>> 
>> OK if the CFG actually changed.
>> 
>> Again, we have no mechanism to gate this. Frustratingly, the only
>> thing I want here is to delete dead code formed by earlier passes.
>> We just don't have anything cheaper (and I don't have any
>> measurements indicating we need something cheaper).
>> 
>>>> + instcombine
>> 
>> instcombine again! This can’t be good.
>> 
>> I actually have no specific reason to think we need this other than
>> the fact that we run instcombine after simplifycfg in a bunch of
>> other places. If you're looking for one to rip out, this would be
>> the first one I would rip out because I'm doubtful of its value.
>> 
>> On a separate note:
>> 
>>>> + early-cse
>> 
>> Passes like loop-vectorize should be able to do their own CSE without
>> much engineering effort.
>> 
>>>> slp-vectorize
>>>> + early-cse
>> 
>> SLP should do its own CSE.
>> 
>> I actually agree with you in principle, but I would rather run the
>> pass now (and avoid hacks downstream to essentially do CSE in the
>> backend) than hold up progress on the hope of advanced on-demand CSE
>> layers being added to the vectorizers. I don't know of anyone
>> actually working on that, and so I'm somewhat concerned it will
>> never materialize.
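
(For reference, the sequence being discussed amounts to roughly the fragment below, written against the legacy pass-creation helpers. This is only a sketch to make the ordering concrete, not the actual patch; the helper name is hypothetical, and header locations and creator default arguments shift a bit between releases.)

#include "llvm/IR/LegacyPassManager.h"
#include "llvm/Transforms/Scalar.h"
#include "llvm/Transforms/Vectorize.h"

using namespace llvm;

// Hypothetical sketch of the pipeline fragment under discussion. Lines
// marked "+" are the proposed additions; the two vectorizers themselves
// are already part of the -O2 pipeline.
static void addVectorizerAndCleanupPasses(legacy::PassManagerBase &MPM) {
  MPM.add(createLoopVectorizePass());              //   loop-vectorize (existing)
  MPM.add(createCorrelatedValuePropagationPass()); // + correlated-propagation
  MPM.add(createInstructionCombiningPass());       // + instcombine (clean up the vectorized preheader)
  MPM.add(createLICMPass());                       // + licm
  MPM.add(createLoopUnswitchPass());               // + loop-unswitch
  MPM.add(createCFGSimplificationPass());          // + simplifycfg (delete dead code left by earlier passes)
  MPM.add(createInstructionCombiningPass());       // + instcombine (the one most likely to be dropped)
  MPM.add(createEarlyCSEPass());                   // + early-cse
  MPM.add(createSLPVectorizerPass());              //   slp-vectorize (existing)
  MPM.add(createEarlyCSEPass());                   // + early-cse (CSE across SLP trees)
}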
> 
> I mentioned this in another mail, but to be specific, I'm also inclined to think that, globally, it shouldn't materialize. SLP should do its own internal cleanup, per tree, but cross-tree CSE should likely be left to an actual CSE pass (or perhaps GVN at -O3, but that's another matter).
> 
> What I mean is that if we have:
> 
> entry:
>  br %cond, %block1, %block2
> block1:
>  stuff in here is SLP vectorized
>  ...
> block2:
>  stuff in here is SLP vectorized
>  ...
> 
> 
> or even just:
> 
> entry:
>  ...
>  stuff in here is SLP vectorized
>  ...
>  stuff here is also SLP vectorized (using some of the same inputs)
>  ...
> 
> there might be some common vector shuffles, insert/extractelement instructions, etc. that are generated in both places and that CSE might combine. But this is a general CSE problem (especially as these things might be memory operations, and thus need to deal with memory-dependency issues), and we should not grow new generalized CSE logic in the vectorizers (although we could certainly think about factoring some of the current logic out into utility functions).
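
(To make the second case above concrete at the source level: in something like the sketch below, each pair of statements would typically become its own SLP tree, and both trees want the same <2 x double> values built from a0/a1 and b0/b1, i.e. the same insertelement sequences. Cleaning up that duplication, modulo the memory-dependence caveats just mentioned, is the job of a general CSE/GVN pass rather than the vectorizer. The function is hypothetical and purely illustrative.)

// Hypothetical example: two independent SLP trees reusing the same inputs.
void two_trees(double a0, double a1, double b0, double b1,
               double *x, double *y) {
  // First SLP tree: the vectorizer would typically build <2 x double>
  // vectors from {a0, a1} and {b0, b1} (insertelement sequences), add
  // them, and store the result to x[0..1].
  x[0] = a0 + b0;
  x[1] = a1 + b1;
  // Second SLP tree: it needs the very same {a0, a1} and {b0, b1}
  // vectors; without a later CSE run, those insertelement sequences
  // may simply be emitted a second time.
  y[0] = a0 * b0;
  y[1] = a1 * b1;
}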

Thanks for taking the time to explain! I wasn’t aware of the need to CSE across trees. In general, it would be great to see more comments, either in the pass setup or in the pass headers, explaining the dependencies between passes and the motivation for those dependencies. Otherwise it gets hard to distinguish coincidental from intentional pass ordering as things evolve.

-Andy

> 
> -Hal
> 
> 
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> 
> 
> -- 
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory




