[PATCH] D67318: [SimplifyCFG] FoldTwoEntryPHINode(): consider *total* speculation cost, not per-BB cost

Roman Lebedev via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sat Sep 7 07:07:29 PDT 2019


lebedev.ri created this revision.
lebedev.ri added reviewers: efriedma, craig.topper, dmgreen, jmolloy.
lebedev.ri added a project: LLVM.
Herald added a subscriber: hiraditya.

Previously, if the threshold was 2, we were willing to speculatively
execute 2 cheap instructions in each of the two basic blocks (i.e. a
total speculation cost of 4), but we weren't willing to speculate
when one BB had 3 instructions and the other one had none,
even though that would have a total cost of only 3.
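
To make the difference concrete, here is a minimal sketch of the two
budgeting schemes. It is a simplified model, not the code in
`SimplifyCFG.cpp`: the function names are hypothetical, each block is
reduced to a single pre-summed cost, and the total budget of 4 used in
the usage note is only an illustrative value.

```cpp
// Hypothetical model: costA/costB are the summed costs of the instructions
// that would be speculated from each of the two predecessor blocks.

// Old scheme: every block is checked against the threshold on its own.
constexpr bool fitsPerBlockBudget(unsigned costA, unsigned costB,
                                  unsigned threshold) {
  return costA <= threshold && costB <= threshold;
}

// Proposed scheme: both blocks draw from one shared budget.
constexpr bool fitsTotalBudget(unsigned costA, unsigned costB,
                               unsigned threshold) {
  return costA + costB <= threshold;
}
```

With a per-block threshold of 2, the 2+2 case is accepted and the 3+0
case is rejected. With a single total budget (4 in this sketch), both
are accepted, and anything costlier in total is rejected regardless of
how the cost is split between the blocks.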

This looks inconsistent to me.
I don't think `cmov`-like instructions will start executing
until both of their inputs are available: https://godbolt.org/z/zgHePf
So I don't see why the existing behavior is the correct one.

Also, let's add a dedicated `cl::opt` for this threshold.

This is an alternative solution to D65148 <https://reviews.llvm.org/D65148>.
This fix is mainly motivated by the `signbit-like-value-extension.ll` test.
That pattern comes up in JPEG decoding, see e.g.
`Figure F.12 – Extending the sign bit of a decoded value in V`
of `ITU T.81` (JPEG specification).
That branch is not predictable, and it is within the innermost loop,
so leaving that pattern as a branch instead of a `select`
(i.e. `CMOV` for x86) is unlikely to be beneficial.
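
For reference, the pattern in question looks roughly like the sketch
below. This is my own C++ rendering of the sign-extension step, not
code from the patch or from the test file; the function names are
hypothetical, and it assumes `1 <= t <= 16` with `v` being a
non-negative `t`-bit value.

```cpp
// Branchy form: the adjustment is only computed when the condition holds.
constexpr int extendWithBranch(int v, int t) {
  int vt = 1 << (t - 1);
  if (v < vt) {
    // Same value as (-1 << t) + 1, written without shifting a negative number.
    vt = 1 - (1 << t);
    v += vt;
  }
  return v;
}

// Select-like form: the adjusted value is computed unconditionally
// (i.e. speculated), and the comparison only picks one of two results.
constexpr int extendWithSelect(int v, int t) {
  const int adjusted = v + 1 - (1 << t);
  return v < (1 << (t - 1)) ? adjusted : v;
}
```

Both forms return the same value; the second is the shape that maps
directly onto a `select`, at the price of always executing the few
cheap instructions that compute `adjusted`.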

Performance- and codesize-wise this appears to be mostly neutral to positive.
I'm seeing 4 **major** improvements on the RawSpeed benchmark:

  Benchmark                                                                                             Time             CPU      Time Old      Time New       CPU Old       CPU New
  ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_pvalue                                 0.0000          0.0000      U Test, Repetitions: 27 vs 27
  Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_mean                                  -0.3052         -0.3052           225           156           225           156
  Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_median                                -0.3065         -0.3066           225           156           225           156
  Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_stddev                                -0.7143         -0.7198             1             0             1             0
  Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_pvalue                                  0.0000          0.0000      U Test, Repetitions: 27 vs 27
  Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_mean                                   -0.1468         -0.1466            79            67            79            67
  Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_median                                 -0.1513         -0.1513            79            67            79            67
  Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_stddev                                 +3.1372         +3.7836             0             1             0             1
  Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_pvalue                                 0.0000          0.0000      U Test, Repetitions: 27 vs 27
  Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_mean                                  -0.1331         -0.1331           170           147           170           147
  Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_median                                -0.1329         -0.1327           170           147           170           147
  Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_stddev                                +1.4339         +1.9116             0             0             0             0
  Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_pvalue                                  0.0000          0.0000      U Test, Repetitions: 27 vs 27
  Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_mean                                   -0.0532         -0.0532           279           265           279           264
  Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_median                                 -0.0528         -0.0529           279           265           279           265
  Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_stddev                                 -0.2031         -0.2007             0             0             0             0


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D67318

Files:
  llvm/lib/Transforms/Utils/SimplifyCFG.cpp
  llvm/test/Transforms/SimplifyCFG/PhiEliminate3.ll
  llvm/test/Transforms/SimplifyCFG/SpeculativeExec.ll
  llvm/test/Transforms/SimplifyCFG/X86/speculate-cttz-ctlz.ll
  llvm/test/Transforms/SimplifyCFG/X86/switch_to_lookup_table.ll
  llvm/test/Transforms/SimplifyCFG/safe-abs.ll
  llvm/test/Transforms/SimplifyCFG/safe-low-bit-extract.ll
  llvm/test/Transforms/SimplifyCFG/signbit-like-value-extension.ll
  llvm/test/Transforms/SimplifyCFG/speculate-math.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D67318.219239.patch
Type: text/x-patch
Size: 29563 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20190907/86616c06/attachment.bin>