<div dir="ltr"><div>I didn't look closely at the new patch, but this appears to be a small extension to:</div><div><a href="https://reviews.llvm.org/D38566">https://reviews.llvm.org/D38566</a></div><div>...and the GVN-based reasons for delaying transformation to 'select' are discussed in detail in the motivating bug for that patch:</div><div><a href="https://bugs.llvm.org/show_bug.cgi?id=34603">https://bugs.llvm.org/show_bug.cgi?id=34603</a></div><div><br></div><div>So this sounds like the right direction to me. Note that there was objection to the implementation (a pile of pass options vs. uniquely named passes).<br></div><div><br></div><div>Here's another motivating bug where early transform to select prevents optimization:</div><div><a href="https://bugs.llvm.org/show_bug.cgi?id=36760">https://bugs.llvm.org/show_bug.cgi?id=36760</a><br></div><div><br></div><div>Is that case affected by this patch?<br></div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Aug 14, 2018 at 11:17 AM, John Brawn via llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Summary<br>

=======<br>

<br>

I'm planning on adjusting SimplifyCFG so that it doesn't turn two-entry phi<br>

nodes into selects until later in the pass pipeline, to give passes which can<br>

understand phis but not selects more opportunity to optimize. The thing I'm<br>

trying to do which made me think of doing this is described below, but from the<br>

benchmarking I've done it looks like this is a good idea overall, regardless of<br>

whether I manage to get that done.<br>

<br>

Motivation<br>

==========<br>

<br>

My goal is to get clang to optimize some code containing a call to<br>

std::min_element which is dereferenced, so something like:<br>

<br>

  float min_element_example(float *data, int size)<br>

  {<br>

    return *std::min_element(data, data+size);<br>

  }<br>

<br>

which, after inlining a specialization, looks like:<br>

<br>

  float min_element_example_inlined(float *first, float *last)<br>

  {<br>

    for (float *p = first; p != last; ++p)<br>

    {<br>

      if (*p < *first)<br>

        first = p;<br>

    }<br>

    return *first;<br>

  }<br>

<br>

There are two loads in the loop, *p and *first, but actually the load *first can be<br>

eliminated by using either the previous load *p or the previous *first,<br>

depending on whether the branch was taken. However the<br>

"if (*p < *first) first = p" gets turned by SimplifyCFG into a select and this<br>

makes optimizing this a lot harder because you no longer have distinct paths<br>

through the CFG.<br>

<br>
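As a rough sketch of the result being aimed for, the loop can be written by hand so that the value of *first is carried in a local instead of being reloaded. The name min_element_example_forwarded and the locals min_val and cur are hypothetical, and like the original it assumes the range is non-empty:<br>

```cpp
// Hypothetical hand-optimized form of min_element_example_inlined.
// Assumes first != last, as the original does when it dereferences the result.
float min_element_example_forwarded(float *first, float *last)
{
  float min_val = *first;   // single load of *first, done before the loop
  for (float *p = first; p != last; ++p)
  {
    float cur = *p;         // the one load that remains inside the loop
    if (cur < min_val)
      min_val = cur;        // branch taken: the new *first is the *p just loaded
                            // branch not taken: *first keeps its previous value
  }
  return min_val;
}
```

Each path through the if supplies its own known value for *first, which is the information that is harder to recover once the branch has become a select.<br>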

I have some ideas on how to do the optimization (see my previous RFC "Making GVN<br>

able to visit the same block more than once" posted in April, though I've<br>

decided that the specific idea presented there isn't the right way to do it),<br>

but I think the first step is to make sure we don't have a select when we try<br>

to optimise this.<br>

<br>

Approach<br>

========<br>

<br>

I've posted a patch to <a href="https://reviews.llvm.org/D50723" rel="noreferrer" target="_blank">https://reviews.llvm.org/D50723</a> showing what I'm<br>

intending to do. An extra parameter is added to SimplifyCFG to control whether<br>

two-entry phi nodes are converted into select, and this is set to false in all<br>

instances before the end of module simplification. At the end of module<br>

simplification we do SimplifyCFG, then InstCombine to optimise the selects that<br>

are introduced, then EarlyCSE to eliminate common subexpressions introduced by<br>

InstCombine.<br>

<br>

Benchmark Results<br>

=================<br>

<br>

These are performance differences reported by LNT when running llvm-test-suite,<br>

spec2000, and spec2006 at -O3 with and without the patch linked above (using<br>

trunk llvm from a week or so ago).<br>

<br>

AArch64 results on ARM Cortex-A72:<br>

<br>

Performance Regressions - execution_time                              Change<br>

SingleSource/Benchmarks/Shootout/Shootout-ary3                         9.48%<br>

MultiSource/Benchmarks/TSVC/Packing-flt/Packing-flt                    3.79%<br>

SingleSource/Benchmarks/CoyoteBench/huffbench                          1.40%<br>

<br>

Performance Improvements - execution_time                             Change<br>

MultiSource/Benchmarks/TSVC/Searching-dbl/Searching-dbl              -23.74%<br>

External/SPEC/CINT2000/256.bzip2/256.bzip2                            -9.82%<br>

MultiSource/Benchmarks/TSVC/Searching-flt/Searching-flt               -9.57%<br>

MultiSource/Benchmarks/TSVC/Equivalencing-flt/Equivalencing-flt       -4.38%<br>

MultiSource/Benchmarks/TSVC/LinearDependence-flt/LinearDependence-flt -3.94%<br>

MultiSource/Benchmarks/TSVC/Packing-dbl/Packing-dbl                   -3.44%<br>

External/SPEC/CFP2006/453.povray/453.povray                           -2.50%<br>

SingleSource/Benchmarks/Adobe-C++/stepanov_vector                     -1.49%<br>

<br>

X86_64 results on Intel Xeon E5-2690:<br>

<br>

Performance Regressions - execution_time           Change<br>

MultiSource/Benchmarks/Ptrdist/yacr2/yacr2          5.62%<br>

<br>

Performance Improvements - execution_time          Change<br>

SingleSource/Benchmarks/Misc-C++/Large/sphereflake -4.43%<br>

External/SPEC/CINT2006/456.hmmer/456.hmmer         -2.50%<br>

External/SPEC/CINT2006/464.h264ref/464.h264ref     -1.60%<br>

MultiSource/Benchmarks/nbench/nbench               -1.19%<br>

SingleSource/Benchmarks/Adobe-C++/functionobjects  -1.07%<br>

<br>

I had a brief look at the regressions and they all look to be caused by<br>

getting bad luck with branch mispredictions: I looked into the Shootout-ary3 and<br>

yacr2 cases and in both the hot code path was the same with and without the<br>

patch, but with more mispredictions probably caused by changes elsewhere.<br>

<br>

John<br>

<br>


</blockquote></div><br></div>