[llvm-dev] [RFC] Delaying phi-to-select transformation until later in the pass pipeline

Wed Sep 12 14:53:57 PDT 2018

Sorry for the delayed response.

On 08/23/2018 10:01 AM, John Brawn wrote:
>
> > Here's another motivating bug where early transform to select 
> prevents optimization:
>
> > https://bugs.llvm.org/show_bug.cgi?id=36760
>
> > Is that case affected by this patch?
>
> From a quick reading of that bug it looks like CVP needs to be 
> enhanced to do a certain transformation, but the presence
> of a select means that even with that enhancement it wouldn’t do 
> anything. With this patch we have a phi and so I think
> that if the CVP enhancement were done it should work.
>
> >>> 1. LICM does not know how to hoist or sink regions.  It does know 
> how to hoist and sink selects.
> >>I think we should teach LICM to do this eventually.
> >Agreed.  I just don't currently see patches to do so. Once LICM 
> supports region hoisting, much of my concern disappears.
>
> I’ve looked into what happens with LICM with this change, and it looks 
> like everything except the phi gets hoisted, then
> later the phi is turned into a select, then a later LICM pass hoists 
> the select. For the very simple tests I tried we end up with
> the same result in the end, but I wouldn’t be surprised if indeed LICM 
> not doing this makes things worse in more complex
> cases.
>
> I’ve hacked together a modification to LICM to make it able to hoist 
> phis (by essentially hoisting entire blocks instead of
> moving everything into a single preheader block) and it seemed to work 
> OK, so I’ll get to work on getting that working
> properly and committed before I touch phi-to-select transformation.
>
This would be a really useful extension to LICM.  I look forward to the 
patch.  I was half considering an approach recently where LICM itself 
did phi-to-select if all of the operands to the resulting phi would be 
loop invariant.  It was just ugly enough I didn't move forward with it, 
but we have seen cases where it would resolve pass ordering problems 
leading to strictly better results overall.  I suspect your region 
hoister would have the same overall effect.

> > InstCombine has limited support for triangles/diamonds, but fairly 
> extensive support for selects.
>
> Currently I have InstCombine running directly after phis have been 
> turned into selects, so this should only be a problem if
> something before that relies on the select instcombine having already 
> happened. I’ll look into that to see if it is indeed a
> problem.
>
> >>>EarlyCSE and GVN do not know how to eliminate fully redundant 
> triangles/diamonds.  PRE *may* get some of these cases, but that is by 
> no means guaranteed or likely to be robust.
> >>Agreed, we’ll need a plan to deal with these issues.
> >Again, I'd love to see these issues fixed.  Once that's done, the 
> convert about phi vs select as canonical form is much less relevant.
>
> I’ll look into this.
>
> > For the one of the specific cases mentioned, teaching GVN to do FRE 
> and PRE for loads from selects of pointers just doesn't seem that 
> painful.  I would be really tempted to do that
> > instead.  Similarly, walking facts back from select uses in CVP 
> seems generally useful since we have use dependent facts in a bunch of 
> contexts, not just selects.  (Call arguments for
> > instance,  non-null from unconditional deref, etc..)
>
> Actually making GVN able to handle selects being painful is exactly 
> the reason I’m proposing doing this. GVN has a hard
> constraint that a single block can only have a single available value, 
> and attempting to change that (which you have to do
> when selecting between values defined in the same block) causes a ton 
> of problems that have to be resolved.
>
> John
>
> *From:*Philip Reames [mailto:listmail at philipreames.com]
> *Sent:* 17 August 2018 21:17
> *To:* Amara Emerson; Hal Finkel; John Brawn
> *Cc:* Sanjay Patel; llvm-dev at lists.llvm.org; nd
> *Subject:* Re: [llvm-dev] [RFC] Delaying phi-to-select transformation 
> until later in the pass pipeline
>
> On 08/17/2018 09:02 AM, Amara Emerson wrote:
>
>         On Aug 15, 2018, at 10:57 PM, Hal Finkel via llvm-dev
>         <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
>
>         On 08/15/2018 02:38 PM, Philip Reames via llvm-dev wrote:
>
>             I'm concerned that we're focusing on one side of this. 
>             Let me point out a few concerns w/changing the canonical
>             form here:
>
>              1. LICM does not know how to hoist or sink regions.  It
>                 does know how to hoist and sink selects.
>
>     I think we should teach LICM to do this eventually.
>
> Agreed.  I just don't currently see patches to do so.  Once LICM 
> supports region hoisting, much of my concern disappears.
>
>          2. InstCombine has limited support for triangles/diamonds,
>             but fairly extensive support for selects.
>          3. EarlyCSE and GVN do not know how to eliminate fully
>             redundant triangles/diamonds.  PRE *may* get some of these
>             cases, but that is by no means guaranteed or likely to be
>             robust.
>
> Agreed, we’ll need a plan to deal with these issues.
>
> Again, I'd love to see these issues fixed.  Once that's done, the 
> convert about phi vs select as canonical form is much less relevant.
>
>         My personal opinion is that selects are the appropriate
>         canonical form.
>
>         For the one of the specific cases mentioned, teaching GVN to
>         do FRE and PRE for loads from selects of pointers just doesn't
>         seem that painful.  I would be really tempted to do that
>         instead.  Similarly, walking facts back from select uses in
>         CVP seems generally useful since we have use dependent facts
>         in a bunch of contexts, not just selects.  (Call arguments for
>         instance,  non-null from unconditional deref, etc..)
>
>         To be clear, I am raising concerns, not actively objecting to
>         this.  If you want to move forward and commit to work through
>         all of the issues identified I wont actively stand in the way.
>
>
>     As I've expressed in the past, I think that not using select as
>     part of our canonical form is potentially a superior design.
>     However, it would be a major change. In addition to the issues
>     that Philip mentions, there's also the fact that we'll just have
>     more basic blocks and that, in itself, could lead to an increase
>     in compile time. However, working through these issues will likely
>     leave us with a more-robust optimizer.
>
>     See also: https://bugs.llvm.org/show_bug.cgi?id=34603#c19
>
> Canonicalizing to phis is my preference too.
>
>
>  -Hal
>
>
> Philip
>
> On 08/14/2018 12:39 PM, Sanjay Patel via llvm-dev wrote:
>
>     I didn't look closely at the new patch, but this appears to be a
>     small extension to:
>
>     https://reviews.llvm.org/D38566
>
>     ...and the GVN-based reasons for delaying transformation to
>     'select' are discussed in detail in the motivating bug for that patch:
>
>     https://bugs.llvm.org/show_bug.cgi?id=34603
>
>     So this sounds like the right direction to me. Note that there was
>     objection to the implementation (a pile of pass options vs.
>     uniquely named passes).
>
>     Here's another motivating bug where early transform to select
>     prevents optimization:
>
>     https://bugs.llvm.org/show_bug.cgi?id=36760
>
>     Is that case affected by this patch?
>
>     On Tue, Aug 14, 2018 at 11:17 AM, John Brawn via llvm-dev
>     <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
>
>     Summary
>     =======
>
>     I'm planning on adjusting SimplifyCFG so that it doesn't turn
>     two-entry phi
>     nodes into selects until later in the pass pipeline, to give
>     passes which can
>     understand phis but not selects more opportunity to optimize. The
>     thing I'm
>     trying to do which made me think of doing this is described below,
>     but from the
>     benchmarking I've done it looks like this is overall a good idea
>     regardless of
>     if I manage to get that done or not.
>
>     Motivation
>     ==========
>
>     My goal is to get clang to optimize some code containing a call to
>     std::min_element which is dereferenced, so something like:
>
>       float min_element_example(float *data, int size)
>       {
>         return *std::min_element(data, data+size);
>       }
>
>     which, after inlining a specialization, looks like:
>
>       float min_element_example_inlined(float *first, float *last)
>       {
>         for (float *p = first; p != last; ++p)
>         {
>           if (*p < *first)
>             first = p;
>         }
>         return *first;
>       }
>
>     There are two loads in the loop, *p and *first, but actually the
>     load *p can be
>     eliminated by using either the previous load *p or the previous
>     *first,
>     depending on if the if-condition was taken or not. However the
>     "if (*p < *first) first = p" gets turned by simplifycfg into a
>     select and this
>     makes optimizing this a lot harder because you no longer have
>     distinct paths
>     through the CFG.
>
>     I have some ideas on how to do the optimization (see my previous
>     RFC "Making GVN
>     able to visit the same block more than once" posted in April,
>     though I've
>     decided that the specific idea presented there isn't the right way
>     to do it),
>     but I think the first step is to make sure we don't have a select
>     when we try
>     to optimise this.
>
>     Approach
>     ========
>
>     I've posted a patch to https://reviews.llvm.org/D50723
>     <https://reviews.llvm.org/D50723> showing what I'm
>     intending to do. An extra parameter is added to SimplifyCFG to
>     control whether
>     two-entry phi nodes are converted into select, and this is set to
>     false in all
>     instances before the end of module simplification. At the end of
>     module
>     simplification we do SimplifyCFG, then Instcombine to optimise the
>     selects that
>     are introduced, then EarlyCSE to eliminate common subexpressions
>     introduced by
>     instcombine.
>
> Is scheduling another simplifycfg, instcombine + earlycse pass really 
> necessary? I’m concerned about the compile time impact.
>
> Cheers,
>
> Amara
>
>
>             Benchmark Results
>             =================
>
>             These are performance differences reported by LNT when
>             running llvm-test-suite,
>             spec2000, and spec2006 at -O3 with and without the patch
>             linked above (using
>             trunk llvm from a week or so ago).
>
>             AArch64 results on ARM Cortex-A72:
>
>             Performance Regressions - execution_time                  
>                     Change
>             SingleSource/Benchmarks/Shootout/Shootout-ary3  9.48%
>             MultiSource/Benchmarks/TSVC/Packing-flt/Packing-flt 3.79%
>             SingleSource/Benchmarks/CoyoteBench/huffbench 1.40%
>
>             Performance Improvements - execution_time                
>                      Change
>             MultiSource/Benchmarks/TSVC/Searching-dbl/Searching-dbl
>             -23.74%
>             External/SPEC/CINT2000/256.bzip2/256.bzip2                
>                       -9.82%
>             MultiSource/Benchmarks/TSVC/Searching-flt/Searching-flt
>              -9.57%
>             MultiSource/Benchmarks/TSVC/Equivalencing-flt/Equivalencing-flt
>              -4.38%
>             MultiSource/Benchmarks/TSVC/LinearDependence-flt/LinearDependence-flt
>             -3.94%
>             MultiSource/Benchmarks/TSVC/Packing-dbl/Packing-dbl  -3.44%
>             External/SPEC/CFP2006/453.povray/453.povray              
>                        -2.50%
>             SingleSource/Benchmarks/Adobe-C++/stepanov_vector  -1.49%
>
>             X86_64 results on Intel Xeon E5-2690:
>
>             Performance Regressions - execution_time        Change
>             MultiSource/Benchmarks/Ptrdist/yacr2/yacr2         5.62%
>
>             Performance Improvements - execution_time       Change
>             SingleSource/Benchmarks/Misc-C++/Large/sphereflake -4.43%
>             External/SPEC/CINT2006/456.hmmer/456.hmmer        -2.50%
>             External/SPEC/CINT2006/464.h264ref/464.h264ref     -1.60%
>             MultiSource/Benchmarks/nbench/nbench        -1.19%
>             SingleSource/Benchmarks/Adobe-C++/functionobjects  -1.07%
>
>             I had a brief look at the regressions and they all look to
>             be caused by
>             getting bad luck with branch mispredictions: I looked into
>             the Shootout-ary3 and
>             yacr2 cases and in both the hot code path was the same
>             with and without the
>             patch, but with more mispredictions probably caused by
>             changes elsewhere.
>
>             John
>
>             _______________________________________________
>             LLVM Developers mailing list
>             llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>             http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
>
>
>         _______________________________________________
>
>         LLVM Developers mailing list
>
>         llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>
>         http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
>
>
>
>     _______________________________________________
>
>     LLVM Developers mailing list
>
>     llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>
>     http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
>
> -- 
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180912/3f6912b6/attachment-0001.html>