[all-commits] [llvm/llvm-project] 94d3dd: [NFC][EarlyCSE][InstSimplify] Add tests for CSE of...

Roman Lebedev via All-commits all-commits at lists.llvm.org
Thu Aug 27 08:47:44 PDT 2020


  Branch: refs/heads/master
  Home:   https://github.com/llvm/llvm-project
  Commit: 94d3dd8b08a1abcd2964c4a5d5f6a3a3da2fa4cf
      https://github.com/llvm/llvm-project/commit/94d3dd8b08a1abcd2964c4a5d5f6a3a3da2fa4cf
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2020-08-27 (Thu, 27 Aug 2020)

  Changed paths:
    A llvm/test/Transforms/EarlyCSE/phi.ll
    A llvm/test/Transforms/InstSimplify/phi-cse.ll

  Log Message:
  -----------
  [NFC][EarlyCSE][InstSimplify] Add tests for CSE of PHI nodes

PHI nodes depend on the block they're in,
so we can only deal with the most basic case of same-BB PHI's.


  Commit: 6102310d814ad73eab60a88b21dd70874f7a056f
      https://github.com/llvm/llvm-project/commit/6102310d814ad73eab60a88b21dd70874f7a056f
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2020-08-27 (Thu, 27 Aug 2020)

  Changed paths:
    M llvm/lib/Analysis/InstructionSimplify.cpp
    M llvm/lib/IR/Instruction.cpp
    M llvm/test/CodeGen/X86/statepoint-vector.ll
    M llvm/test/Transforms/EarlyCSE/phi.ll
    M llvm/test/Transforms/InstCombine/merging-multiple-stores-into-successor.ll
    M llvm/test/Transforms/InstCombine/phi-aware-aggregate-reconstruction.ll
    M llvm/test/Transforms/InstCombine/phi-equal-incoming-pointers.ll
    M llvm/test/Transforms/InstCombine/select.ll
    M llvm/test/Transforms/InstSimplify/phi-cse.ll
    M llvm/test/Transforms/JumpThreading/loop-phi.ll
    M llvm/test/Transforms/LoopVectorize/reduction.ll

  Log Message:
  -----------
  [InstSimplify][EarlyCSE] Try to CSE PHI nodes in the same basic block

Apparently, we don't do this, neither in EarlyCSE, nor in InstSimplify,
nor in (old) GVN, but do in NewGVN and SimplifyCFG of all places..

While i could teach EarlyCSE how to hash PHI nodes,
we can't really do much (anything?) even if we find two identical
PHI nodes in different basic blocks, same-BB case is the interesting one,
and if we teach InstSimplify about it (which is what i wanted originally,
https://reviews.llvm.org/D86530), we get EarlyCSE support for free.

So i would think this is pretty uncontroversial.

On vanilla llvm test-suite + RawSpeed, this has the following effects:
```
| statistic name                                     | baseline  | proposed  |      Δ |        % |    \|%\| |
|----------------------------------------------------|-----------|-----------|-------:|---------:|---------:|
| instsimplify.NumPHICSE                             | 0         | 23779     |  23779 |    0.00% |    0.00% |
| asm-printer.EmittedInsts                           | 7942328   | 7942392   |     64 |    0.00% |    0.00% |
| assembler.ObjectBytes                              | 273069192 | 273084704 |  15512 |    0.01% |    0.01% |
| correlated-value-propagation.NumPhis               | 18412     | 18539     |    127 |    0.69% |    0.69% |
| early-cse.NumCSE                                   | 2183283   | 2183227   |    -56 |    0.00% |    0.00% |
| early-cse.NumSimplify                              | 550105    | 542090    |  -8015 |   -1.46% |    1.46% |
| instcombine.NumAggregateReconstructionsSimplified  | 73        | 4506      |   4433 | 6072.60% | 6072.60% |
| instcombine.NumCombined                            | 3640264   | 3664769   |  24505 |    0.67% |    0.67% |
| instcombine.NumDeadInst                            | 1778193   | 1783183   |   4990 |    0.28% |    0.28% |
| instcount.NumCallInst                              | 1758401   | 1758799   |    398 |    0.02% |    0.02% |
| instcount.NumInvokeInst                            | 59478     | 59502     |     24 |    0.04% |    0.04% |
| instcount.NumPHIInst                               | 330557    | 330533    |    -24 |   -0.01% |    0.01% |
| instcount.TotalInsts                               | 8831952   | 8832286   |    334 |    0.00% |    0.00% |
| simplifycfg.NumInvokes                             | 4300      | 4410      |    110 |    2.56% |    2.56% |
| simplifycfg.NumSimpl                               | 1019808   | 999607    | -20201 |   -1.98% |    1.98% |
```
I.e. it fires ~24k times, causes +110 (+2.56%) more `invoke` -> `call`
transforms, and counter-intuitively results in *more* instructions total.

That being said, the PHI count doesn't decrease that much,
and looking at some examples, it seems at least some of them
were previously getting PHI CSE'd in SimplifyCFG of all places..

I'm adjusting `Instruction::isIdenticalToWhenDefined()` at the same time.
As a comment in `InstCombinerImpl::visitPHINode()` already stated,
there are no guarantees on the ordering of the operands of a PHI node,
so if we just naively compare them, we may false-negatively say that
the nodes are not equal when the only difference is operand order,
which is especially important since the fold is in InstSimplify,
so we can't rely on InstCombine sorting them beforehand.

Fixing this for the general case is costly (geomean +0.02%),
and does not appear to catch anything in test-suite, but for
the same-BB case, it's trivial, so let's fix at least that.

As per http://llvm-compile-time-tracker.com/compare.php?from=04879086b44348cad600a0a1ccbe1f7776cc3cf9&to=82bdedb888b945df1e9f130dd3ac4dd3c96e2925&stat=instructions
this appears to cause geomean +0.03% compile time increase (regression),
but geomean -0.01%..-0.04% code size decrease (improvement).


Compare: https://github.com/llvm/llvm-project/compare/c9455d3c5792...6102310d814a


More information about the All-commits mailing list