[PATCH] D67318: [SimplifyCFG] FoldTwoEntryPHINode(): consider *total* speculation cost, not per-BB cost
Roman Lebedev via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sat Sep 7 07:07:29 PDT 2019
lebedev.ri created this revision.
lebedev.ri added reviewers: efriedma, craig.topper, dmgreen, jmolloy.
lebedev.ri added a project: LLVM.
Herald added a subscriber: hiraditya.
Previously, with a threshold of 2, we were willing to speculatively
execute 2 cheap instructions in each of the two basic blocks (i.e. a
total speculation cost of 4), but we weren't willing to speculate
when one BB had 3 instructions and the other one had none,
even though that would have a total cost of only 3.
This looks inconsistent to me.
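To make the inconsistency concrete, here is a minimal sketch of the two budgeting schemes. It is a toy model, not the actual SimplifyCFG code: the function names are hypothetical, and each cost is simply the number of cheap instructions that would be speculated from one of the two blocks.

```cpp
// Toy model (hypothetical names, not the SimplifyCFG implementation).
// costA / costB: number of cheap instructions to speculate from each block.

// Old scheme: each block is checked against the threshold on its own.
inline bool foldsWithPerBlockBudget(unsigned costA, unsigned costB,
                                    unsigned threshold) {
  return costA <= threshold && costB <= threshold;
}

// Proposed scheme: both blocks share a single budget.
inline bool foldsWithTotalBudget(unsigned costA, unsigned costB,
                                 unsigned threshold) {
  return costA + costB <= threshold;
}
```

With a per-block threshold of 2, the 2+2 case is accepted while the 3+0 case is rejected. With a shared budget of 4 (an illustrative value), both are accepted and 3+2 is rejected.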
I don't think `cmov`-like instructions will start executing
until both of their inputs are available: https://godbolt.org/z/zgHePf
So I don't see why the existing behavior is the correct one.
Also, let's add a dedicated `cl::opt` for this threshold.
This is an alternative solution to D65148 <https://reviews.llvm.org/D65148>.
This fix is mainly motivated by the `signbit-like-value-extension.ll` test.
That pattern comes up in JPEG decoding, see e.g.
`Figure F.12 – Extending the sign bit of a decoded value in V`
of `ITU T.81` (JPEG specification).
That branch is not predictable, and it is within the innermost loop,
so the fact that that pattern ends up being stuck with a branch
instead of `select` (i.e. `CMOV` for x86) is unlikely to be beneficial.
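For reference, a rough C++ sketch of that pattern, following the EXTEND procedure of Figure F.12. The function names are hypothetical, and it assumes 1 <= t <= 15 and 0 <= v < (1 << t). The second function only illustrates the shape one would hope for once the PHI is folded into a `select`; it is not the output of the pass.

```cpp
// EXTEND(V, T) from ITU T.81, Figure F.12, written with a branch.
// Assumption: 1 <= t <= 15 and 0 <= v < (1 << t).
inline int extendBranchy(int v, int t) {
  const int vt = 1 << (t - 1);
  if (v < vt)
    v += 1 - (1 << t); // same value as (-1 << t) + 1 in the spec
  return v;
}

// Same computation with the adjustment computed unconditionally
// (i.e. speculated) and the result chosen by a select-like ternary.
inline int extendSelect(int v, int t) {
  const int vt = 1 << (t - 1);
  const int adjusted = v + 1 - (1 << t);
  return v < vt ? adjusted : v;
}
```

Both functions compute the same value. The difference is only whether the add/shift work sits behind a branch or is executed unconditionally.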
Performance- and codesize-wise, this appears to be mostly neutral to positive.
I'm seeing 4 **major** improvements on the RawSpeed benchmark:
Benchmark                                                               Time      CPU  Time Old  Time New  CPU Old  CPU New
---------------------------------------------------------------------------------------------------------------------------
Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_pvalue   0.0000   0.0000  U Test, Repetitions: 27 vs 27
Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_mean    -0.3052  -0.3052       225       156      225      156
Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_median  -0.3065  -0.3066       225       156      225      156
Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_stddev  -0.7143  -0.7198         1         0        1        0
Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_pvalue    0.0000   0.0000  U Test, Repetitions: 27 vs 27
Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_mean     -0.1468  -0.1466        79        67       79       67
Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_median   -0.1513  -0.1513        79        67       79       67
Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_stddev   +3.1372  +3.7836         0         1        0        1
Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_pvalue   0.0000   0.0000  U Test, Repetitions: 27 vs 27
Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_mean    -0.1331  -0.1331       170       147      170      147
Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_median  -0.1329  -0.1327       170       147      170      147
Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_stddev  +1.4339  +1.9116         0         0        0        0
Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_pvalue    0.0000   0.0000  U Test, Repetitions: 27 vs 27
Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_mean     -0.0532  -0.0532       279       265      279      264
Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_median   -0.0528  -0.0529       279       265      279      265
Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_stddev   -0.2031  -0.2007         0         0        0        0
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D67318
Files:
llvm/lib/Transforms/Utils/SimplifyCFG.cpp
llvm/test/Transforms/SimplifyCFG/PhiEliminate3.ll
llvm/test/Transforms/SimplifyCFG/SpeculativeExec.ll
llvm/test/Transforms/SimplifyCFG/X86/speculate-cttz-ctlz.ll
llvm/test/Transforms/SimplifyCFG/X86/switch_to_lookup_table.ll
llvm/test/Transforms/SimplifyCFG/safe-abs.ll
llvm/test/Transforms/SimplifyCFG/safe-low-bit-extract.ll
llvm/test/Transforms/SimplifyCFG/signbit-like-value-extension.ll
llvm/test/Transforms/SimplifyCFG/speculate-math.ll
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D67318.219239.patch
Type: text/x-patch
Size: 29563 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20190907/86616c06/attachment.bin>