[PATCH] D65148: [SimplifyCFG] Bump phi-node-folding-threshold from 2 to 3

Sat Sep 7 01:57:00 PDT 2019

lebedev.ri added a comment.

Thank you for taking a look!

In D65148#1661691 <https://reviews.llvm.org/D65148#1661691>, @efriedma wrote:

> That looks mostly fine, then.  Maybe still a few more cmovs than I'd like, still... but close enough.

Aha! :)

> In terms of the actual code, I have a couple questions I'm trying to understand:
> 
> 1. It looks like PHINodeFoldingThreshold actually controls multiple SimplifyCFG transforms?

Yes, it appears to control 3 separate folds:

- `SpeculativelyExecuteBB()`
- `FoldTwoEntryPHINode((`
- `mergeConditionalStoreToAddress()`

> Which transforms are actually providing the improvements you see?

`FoldTwoEntryPHINode()`.

> Should we be tying all of them together?

Uh, good question.
I'm not all that familiar with SimplifyCFG, but i believe this
is okay. That being said, i see at least 4 other bugs there.

Both `FoldTwoEntryPHINode((` and `mergeConditionalStoreToAddress()` use
`PHINodeFoldingThreshold` separately for each BB, both of them use it to
count the accumulative cost of all per-BB instructions, so this is consistent.

(1) minor:
`mergeConditionalStoreToAddress()` uses `PHINodeFoldingThreshold`
as instruction count while others actually use it as cost.
It should likely actually check the cost, not instruction count.

(2,3) critical:
Both `FoldTwoEntryPHINode((` and `mergeConditionalStoreToAddress()` use it separately
for each BB, so even though `PHINodeFoldingThreshold` is at `2`,
we actually allow to speculate 4 instructions (at most 2 per BB),
and patch this will bump it to 6 - https://godbolt.org/z/WLfpEP

This not my intention, actually. Moreover, why would we be ok
flattening PHI with 2 instructions in both of BB's,
but not a PHI with 3 in one BB and 0 in another?
That looks just broken to me.

Do you agree that the threshold should specify the *total* cost
of instructions to speculate, not per-BB maximal speculation cost?

After fixing this, `PHINodeFoldingThreshold` will then need to be adjusted x2 (to 4)
to keep the //existing// behavior, but then the interesting cases
will be handled so there will be no immediate need to bump it to 3 (i.e. 6).

(4) insignificant:
In `SpeculativelyExecuteBB()`, we seem to at most speculate a single instruction,
so `PHINodeFoldingThreshold` is used as a cutoff threshold for that single instruction.
So this //could// be a separate `cl::opt`, but i'm not sure it's worth changing this.

> 2. For if-converting a triangle, like the testcases, it looks like we're using separate thresholds for the number of speculated instructions, and the number of generated select instructions? Should we be using some sort of combined cost, instead?

I'm not sure what you are talking about here?
We don't count the cost of the `select` we will need to produce
just like we don't consider the cost of the `PHINode` itself,
so i suppose we consider that we may for `select` by replacing `PHINode`.
Was that the question?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D65148/new/

https://reviews.llvm.org/D65148