[llvm-dev] [RFC] Delaying phi-to-select transformation until later in the pass pipeline
Krzysztof Parzyszek via llvm-dev
llvm-dev at lists.llvm.org
Tue Aug 14 10:45:14 PDT 2018
I think it would be good to have the CFG simplification options provided
by TargetTransformInfo (TTI), including the limit on the phi->select
transformation. We have some code on Hexagon that would benefit from
converting not just 1 phi to a select (the current limit), but 4 of them.
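
To illustrate the kind of shape I mean (a made-up sketch, not the actual
Hexagon code; State, update_branch and update_select are hypothetical names):
an "if" that updates four values roughly corresponds to four two-entry phis
at the merge point, and converting all of them gives four selects.

```cpp
// Hypothetical: four values that are live across the branch.
struct State { int a, b, c, d; };

// Branch form: after the "if", each of a, b, c, d roughly corresponds to
// its own two-entry phi at the merge point.
State update_branch(State s, bool cond, int x)
{
  if (cond)
  {
    s.a = x;
    s.b = x + 1;
    s.c = x + 2;
    s.d = x + 3;
  }
  return s;
}

// Select form: the same computation with one select per value and no branch.
State update_select(State s, bool cond, int x)
{
  s.a = cond ? x     : s.a;
  s.b = cond ? x + 1 : s.b;
  s.c = cond ? x + 2 : s.c;
  s.d = cond ? x + 3 : s.d;
  return s;
}
```

Both functions compute the same result; the difference is only whether the
merge is expressed through control flow or through selects.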
-Krzysztof
On 8/14/2018 12:17 PM, John Brawn via llvm-dev wrote:
> Summary
> =======
>
> I'm planning on adjusting SimplifyCFG so that it doesn't turn two-entry phi
> nodes into selects until later in the pass pipeline, to give passes which can
> understand phis but not selects more opportunity to optimize. The work that
> prompted this is described below, but the benchmarking I've done suggests the
> change is a good idea overall, whether or not I manage to get that work done.
>
> Motivation
> ==========
>
> My goal is to get clang to optimize some code containing a call to
> std::min_element which is dereferenced, so something like:
>
> float min_element_example(float *data, int size)
> {
>   return *std::min_element(data, data+size);
> }
>
> which, after inlining a specialization, looks like:
>
> float min_element_example_inlined(float *first, float *last)
> {
>   for (float *p = first; p != last; ++p)
>   {
>     if (*p < *first)
>       first = p;
>   }
>   return *first;
> }
>
> There are two loads in the loop, *p and *first, but the load *first can in fact
> be eliminated by reusing either the previous load of *p or the previous value
> of *first, depending on whether the if-branch was taken. However, the
> "if (*p < *first) first = p" gets turned by simplifycfg into a select, which
> makes this much harder to optimize because there are no longer distinct paths
> through the CFG.
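
For reference, a minimal sketch of what the loop looks like when the
redundant load is forwarded by hand, assuming a non-empty range. The function
name and the locals first_val and p_val are hypothetical, chosen only for
illustration; this is not taken from the patch.

```cpp
#include <cassert>

// Hypothetical hand-optimised form of min_element_example_inlined: the value
// of *first is carried in a local, so the only load left in the loop is *p.
float min_element_example_forwarded(float *first, float *last)
{
  assert(first != last);        // assumption: the range is non-empty
  float first_val = *first;     // the one load of *first, before the loop
  for (float *p = first; p != last; ++p)
  {
    float p_val = *p;           // single load per iteration
    if (p_val < first_val)
    {
      // Taken path: "first = p" would make *first the value just loaded.
      first_val = p_val;
    }
    // Not-taken path: first_val is still the previous value of *first.
  }
  // The pointer itself is no longer needed once its value is tracked.
  return first_val;
}
```

The two paths through the "if" are exactly what tells us which earlier value
stands in for *first, which is why keeping the branch around helps here.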
>
> I have some ideas on how to do the optimization (see my previous RFC "Making GVN
> able to visit the same block more than once" posted in April, though I've
> decided that the specific idea presented there isn't the right way to do it),
> but I think the first step is to make sure we don't have a select when we try
> to optimise this.
>
> Approach
> ========
>
> I've posted a patch to https://reviews.llvm.org/D50723 showing what I'm
> intending to do. An extra parameter is added to SimplifyCFG to control whether
> two-entry phi nodes are converted into select, and this is set to false in all
> instances before the end of module simplification. At the end of module
> simplification we do SimplifyCFG, then Instcombine to optimise the selects that
> are introduced, then EarlyCSE to eliminate common subexpressions introduced by
> instcombine.
>
> Benchmark Results
> =================
>
> These are performance differences reported by LNT when running llvm-test-suite,
> spec2000, and spec2006 at -O3 with and without the patch linked above (using
> trunk llvm from a week or so ago).
>
> AArch64 results on ARM Cortex-A72:
>
> Performance Regressions - execution_time                                Change
> SingleSource/Benchmarks/Shootout/Shootout-ary3                           9.48%
> MultiSource/Benchmarks/TSVC/Packing-flt/Packing-flt                      3.79%
> SingleSource/Benchmarks/CoyoteBench/huffbench                            1.40%
>
> Performance Improvements - execution_time                               Change
> MultiSource/Benchmarks/TSVC/Searching-dbl/Searching-dbl                -23.74%
> External/SPEC/CINT2000/256.bzip2/256.bzip2                              -9.82%
> MultiSource/Benchmarks/TSVC/Searching-flt/Searching-flt                 -9.57%
> MultiSource/Benchmarks/TSVC/Equivalencing-flt/Equivalencing-flt         -4.38%
> MultiSource/Benchmarks/TSVC/LinearDependence-flt/LinearDependence-flt   -3.94%
> MultiSource/Benchmarks/TSVC/Packing-dbl/Packing-dbl                     -3.44%
> External/SPEC/CFP2006/453.povray/453.povray                             -2.50%
> SingleSource/Benchmarks/Adobe-C++/stepanov_vector                       -1.49%
>
> X86_64 results on Intel Xeon E5-2690:
>
> Performance Regressions - execution_time                                Change
> MultiSource/Benchmarks/Ptrdist/yacr2/yacr2                               5.62%
>
> Performance Improvements - execution_time                               Change
> SingleSource/Benchmarks/Misc-C++/Large/sphereflake                      -4.43%
> External/SPEC/CINT2006/456.hmmer/456.hmmer                              -2.50%
> External/SPEC/CINT2006/464.h264ref/464.h264ref                          -1.60%
> MultiSource/Benchmarks/nbench/nbench                                    -1.19%
> SingleSource/Benchmarks/Adobe-C++/functionobjects                       -1.07%
>
> I had a brief look at the regressions and they all look to be caused by
> getting bad luck with branch mispredictions: I looked into the Shootout-ary3 and
> yacr2 cases and in both the hot code path was the same with and without the
> patch, but with more mispredictions probably caused by changes elsewhere.
>
> John
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation