[PATCH] D37467: Add a new pass to speculate around PHI nodes with constant (integer) operands when profitable.

Tue Oct 3 19:09:13 PDT 2017

chandlerc added a comment.

In https://reviews.llvm.org/D37467#864858, @dberlin wrote:

> I'll admit to not staring at the cost model too hard (I rarely have useful input on that kind of target specific thing), but it looked at a glance like you trying to calculate  which might constant fold as well.
>  If not, or it's not part of the goal, awesome.

Correct, I'm not trying to calculate that, and it isn't part of the goal. The idea being, by the point this pass runs any profitable folding-through-PHI-speculation should be handled by something much more like GVN. The intent is for this to catch cases where the minimal, canonical form has edge-dependent constant inputs and without any *folding* it is profitable on the target to speculate users along the edge. The things I imagine firing here are things like x86's immediate operands and other architecture's barrel shifter operands.

Anyways, hopefully that clarifies. Also, the examples in the code now probably help. I've added a comment to specifically talk about the fact that we *don't* try to handle this case.

> i'm happy to whiteboard the safety part with you.
>  What you are doing would be safe, but maybe not cost effective, if you check and move the entire operand chain until you hit a phi in the same block. It looks like you might be, but it's not entirely clear you are doing that.

I may have bugs, but that is exactly the intent of the code.

> The logic in speculatephis and visiting deps/users is hard to follow for me to ensure it comes up with the right list.

Yeah, I tried to find a cleaner way to write this to make this clear, but failed to come up with one. It's in an awkward part of LLVM's IR for walking in the precise way we want. That is compounded by the fact that we expect for this whole thing to happen relatively rarely, and be profitable even more rarely. So the entire thing tries to avoid walk large amounts of the IR until there is at least some evidence that this is useful.

> The core point, however, to  take the very simple example above:
> 
>   a = phi(0, b)
>   c = add a, 2
>   d = mul c, a
> 
> 
> It would not be safe to speculate mul c,a without copying + translating add a, 2 as well.
> 
> A valid translation to each predecessor for mul is not:
>  mul c, 0
>  and 
>  mul c, b
> 
> it's
>  cop1 = add 0, 2
>  mul cop1, 0
> 
> and
> 
> cop2 = add a, b
>  mul cop2, b
> 
> So you must copy and translate c to translate d.

Correct. And I think I have test cases covering this, for example `@test_speculate_free_insts` where we are required to copy and speculate `trunc` instructions even though they don't use any PHI nodes just as we would have to do that for the `add` in your example.

> Also, if you do the speculation in postorder, you are going to presumably reevaluate the same ops again and again in smaller chains (first you do d above, then you do c).
> 
> It's unclear to me that you are really doing it in postorder, and not preorder :)
>  If you are, I imagine it would be better to go the other way around, in preorder, because you could save redundant checks and cut off earlier.
>  This assumes, however, there is no reason to speculate d above if you choose not to speculate c.

I think the profitability walk is OK -- it is in postorder, but of the operand graph rather than the use graph -- essentially, the inverse graph. It is specifically designed to check the shorter chains first and memoize that result when checking the longer chains.

It turns out that we have to do this to get good profitability measurements anyways, even setting aside the complexity. If we don't find small chains that are themselves profitable and mark them as profitable on their own, we would incorrectly count their speculation cost against the benefit of speculating the large chain. We still can't get this to be perfect because it is a greedy algorithm and there are cases that doesn't handle, but by being greedy in the right order we handle important, obvious cases such as exactly the one you describe where we want to first consider the chain rooted at c, then at d, etc etc., where each is a superset of the one before and it is important to consider the cost incrementally. I have some specific test cases for that which motivate this.

However, the *safety* check is I think still a problem (as you pointed out in your first comment). Specifically, it caches safe subgraphs but not unsafe subgraphs and so it can walk the unsafe subgraph over and over again and go superlinear. I'm working on a fix to that, but wanted to write up this much and see if we're on the same page.

https://reviews.llvm.org/D37467