[llvm-dev] [RFC] Delaying phi-to-select transformation until later in the pass pipeline

Tue Aug 14 12:39:38 PDT 2018

I didn't look closely at the new patch, but this appears to be a small
extension to:
https://reviews.llvm.org/D38566
...and the GVN-based reasons for delaying transformation to 'select' are
discussed in detail in the motivating bug for that patch:
https://bugs.llvm.org/show_bug.cgi?id=34603

So this sounds like the right direction to me. Note that there was
objection to the implementation (a pile of pass options vs. uniquely named
passes).

Here's another motivating bug where early transform to select prevents
optimization:
https://bugs.llvm.org/show_bug.cgi?id=36760

Is that case affected by this patch?

On Tue, Aug 14, 2018 at 11:17 AM, John Brawn via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Summary
> =======
>
> I'm planning on adjusting SimplifyCFG so that it doesn't turn two-entry phi
> nodes into selects until later in the pass pipeline, to give passes which
> can
> understand phis but not selects more opportunity to optimize. The thing I'm
> trying to do which made me think of doing this is described below, but
> from the
> benchmarking I've done it looks like this is overall a good idea
> regardless of
> if I manage to get that done or not.
>
> Motivation
> ==========
>
> My goal is to get clang to optimize some code containing a call to
> std::min_element which is dereferenced, so something like:
>
>   float min_element_example(float *data, int size)
>   {
>     return *std::min_element(data, data+size);
>   }
>
> which, after inlining a specialization, looks like:
>
>   float min_element_example_inlined(float *first, float *last)
>   {
>     for (float *p = first; p != last; ++p)
>     {
>       if (*p < *first)
>         first = p;
>     }
>     return *first;
>   }
>
> There are two loads in the loop, *p and *first, but actually the load *p
> can be
> eliminated by using either the previous load *p or the previous *first,
> depending on if the if-condition was taken or not. However the
> "if (*p < *first) first = p" gets turned by simplifycfg into a select and
> this
> makes optimizing this a lot harder because you no longer have distinct
> paths
> through the CFG.
>
> I have some ideas on how to do the optimization (see my previous RFC
> "Making GVN
> able to visit the same block more than once" posted in April, though I've
> decided that the specific idea presented there isn't the right way to do
> it),
> but I think the first step is to make sure we don't have a select when we
> try
> to optimise this.
>
> Approach
> ========
>
> I've posted a patch to https://reviews.llvm.org/D50723 showing what I'm
> intending to do. An extra parameter is added to SimplifyCFG to control
> whether
> two-entry phi nodes are converted into select, and this is set to false in
> all
> instances before the end of module simplification. At the end of module
> simplification we do SimplifyCFG, then Instcombine to optimise the selects
> that
> are introduced, then EarlyCSE to eliminate common subexpressions
> introduced by
> instcombine.
>
> Benchmark Results
> =================
>
> These are performance differences reported by LNT when running
> llvm-test-suite,
> spec2000, and spec2006 at -O3 with and without the patch linked above
> (using
> trunk llvm from a week or so ago).
>
> AArch64 results on ARM Cortex-A72:
>
> Performance Regressions - execution_time
> Change
> SingleSource/Benchmarks/Shootout/Shootout-ary3
>  9.48%
> MultiSource/Benchmarks/TSVC/Packing-flt/Packing-flt
> 3.79%
> SingleSource/Benchmarks/CoyoteBench/huffbench
> 1.40%
>
> Performance Improvements - execution_time
>  Change
> MultiSource/Benchmarks/TSVC/Searching-dbl/Searching-dbl
> -23.74%
> External/SPEC/CINT2000/256.bzip2/256.bzip2
> -9.82%
> MultiSource/Benchmarks/TSVC/Searching-flt/Searching-flt
>  -9.57%
> MultiSource/Benchmarks/TSVC/Equivalencing-flt/Equivalencing-flt
>  -4.38%
> MultiSource/Benchmarks/TSVC/LinearDependence-flt/LinearDependence-flt
> -3.94%
> MultiSource/Benchmarks/TSVC/Packing-dbl/Packing-dbl
>  -3.44%
> External/SPEC/CFP2006/453.povray/453.povray
>  -2.50%
> SingleSource/Benchmarks/Adobe-C++/stepanov_vector
>  -1.49%
>
> X86_64 results on Intel Xeon E5-2690:
>
> Performance Regressions - execution_time           Change
> MultiSource/Benchmarks/Ptrdist/yacr2/yacr2          5.62%
>
> Performance Improvements - execution_time          Change
> SingleSource/Benchmarks/Misc-C++/Large/sphereflake -4.43%
> External/SPEC/CINT2006/456.hmmer/456.hmmer         -2.50%
> External/SPEC/CINT2006/464.h264ref/464.h264ref     -1.60%
> MultiSource/Benchmarks/nbench/nbench               -1.19%
> SingleSource/Benchmarks/Adobe-C++/functionobjects  -1.07%
>
> I had a brief look at the regressions and they all look to be caused by
> getting bad luck with branch mispredictions: I looked into the
> Shootout-ary3 and
> yacr2 cases and in both the hot code path was the same with and without the
> patch, but with more mispredictions probably caused by changes elsewhere.
>
> John
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180814/7cbb7212/attachment.html>