[llvm] r179957 - SimplifyCFG: If convert single conditional stores

Sun Apr 21 05:52:36 PDT 2013

All,

I will revert the patch given I now doubt there is a clear answer of whether we should do it (I still believe in the short term we should but that is not necessarily a good argument ;).

See below for a longer argument.

Best,
Arnold

On Apr 21, 2013, at 3:21 AM, Andrew Trick <atrick at apple.com> wrote:

> 
> On Apr 20, 2013, at 6:47 PM, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
> 
>> I don't push for general if conversation but just if-conversion of single stores. This is always going to be deemed (or not - but always the same answer) beneficial by the existing early if conversion pass (see my previous email):
>> 
>> Basically we say if-convert a store: if critical-path + cost of store < CONST
>> 
>> (Jakob correct me if I am wrong)
>> 
>> This is a simple boolean value (obtainable via TTI if need to be).
>> 
>> So even if the early-if conversion pass implemented conversion of stores (which it currently does not - but it could be made to do so) and were enabled for x86 the decision would be the same. Lest the benefit of doing it earlier, and we do some select formation (I don't think we want to totally get rid of it - I might be wrong of course).
>> 
>> There is also the argument about canonicalization. Maybe we only want to do it in a target lowering phase run of SimplifyCFG - but we currently don't have this distinction. 
>> 
>> I do think in our existing pass structure we want to straighten basic blocks when we can which is why we have some select formation: InstCombine, Vectorization, ISel all benefit from it.
>> 
>> My judgement call was that this is a case where we want do it early, which is why I pushed it. Again, others might disagree with this judgement and I am happy to revert it if this is the case.
>> 
>> I do understand the concerns about guidance what and what not should be done. I agree discussion should be had. I defer to higher powers on this and try to make my point.
>> 
>> Best,
>> Arnold
>> 
>> On Apr 20, 2013, at 8:19 PM, Chandler Carruth <chandlerc at google.com> wrote:
>> 
>>> 
>>> On Sat, Apr 20, 2013 at 11:44 PM, Arnold <aschwaighofer at apple.com> wrote:
>>> Given that there are potential positive secondary effects of having longer basic blocks I believe the overall effect to be positive.
>>> 
>>> Also I believe in the short term the machine if conversation pass is not in the shape to be turned on and this opt temporarily improves things until it can (possibly) be superceeded by machine if conversation. 
>>> 
>>> These two arguments were *exactly* the ones I used, and they were very loudly objected to by everyone I mentioned. I largely agreed with you, but also saw the concerns put forward by others. They were very firm and demanded the patch be reverted. I don't even know *exactly* how similar the patches are, but at the very least I think there should be a discussion and explanation of what has changed since then. Otherwise, we have inconsistent and confusing direction on how to develop any and all of these passes.
>> 
> 
> This is different than Chandler's case, because we know the MI "early" if-converter doesn't currently handle Arnold's optimization. Also, Arnold has not yet proposed any target level heuristics that attempt to predict cpu behavior, which was the main objection.
> 
> That said, we should have a reason to if-convert before lowering other than optimizing for a machine's cpu pipeline.
> 
> Are we all convinced that if-converting a single store is the proper canonical form?

I guess the answer to this is pick your poison and depends on what your goals are. An "if" makes sinking of instructions easier/possible (we have a heuristic in place for this). A "select" gets rid of control-flow, enabling IR optimizations that operate on a single basic block basis and vectorization.

Given the push back I have received so far the answer seems to be no.

> 
> I would only be convinced if
> (1a) It is a locally profitable thing to do for most normal targets (independent of surrounding code)

I claimed this is the case, although Shuxin disagreed. He mentioned that conditional moves can be expensive on some platforms (disabling speculation)

And I already sort of argued that we might want to only do this transform in a lowering phase version of SimplifyCFG only.
I am referring to this email:

> "I don't push for general if conversation but just if-conversion of single stores. This is always going to be deemed (or not - but always the same answer) beneficial by the existing early if conversion pass (see my previous email):
> 
> Basically we say if-convert a store: if critical-path + cost of store < CONST
> 
> (Jakob correct me if I am wrong)
> 
> This is a simple boolean value (obtainable via TTI if need to be).
> 
> So even if the early-if conversion pass implemented conversion of stores (which it currently does not - but it could be made to do so) and were enabled for x86 the decision would be the same. Lest the benefit of doing it earlier, and we do some select formation (I don't think we want to totally get rid of it - I might be wrong of course).
> 
> There is also the argument about canonicalization. Maybe we only want to do it in a target lowering phase run of SimplifyCFG - but we currently don't have this distinction. 
> "

> (1b) AND downstream IR optimizations are highly likely to benefit
Vectorization currently is disabled for loops like this (this can be fixed locally in vectorization). 
for (i in …) 
 a[i] = 
  if (cond)
    a[i] =

DSE could potentially remove a redundant store after this optimization took place (making this transform likelier to be profitable):
  a[i] = 
  if (cond)
    a[i] =
  a[i] =
We could do this after the MI if-converter too (assuming, we can tell that the two stores must-alias, they probably will have the same address vreg in many cases).

> (2) OR we can specifically detect a target-indendent optimization that depends on the transformation
> 
> I understand that the MI if-converter will eventually do the same thing, but that doesn't argue for doing it early. We don't want canonicalization passes to produce lowered IR that differs across targets (or worse, sub targets).

This is the argument against this patch. I already had the temptation to say make this transform dependent on TTI.

> The "optimizer likes larger blocks" has never been a compelling argument for me (I could argue either way).

I am not sure i understand but this is the case in our current infrastructure and once we make instcombine and isel global/superlocal simplify cfg will probably not exist in its current form.

> MI "early" if-conversion is clearly a good place to do this. Is there any reason not to implement it there?

Other than pragmatical expedience of getting results sooner than later there are no reasons.

> 
> If there are temporary benefits to be derived then I think we should put the effort into evaluating MI if-conversion performance instead of just focusing on this patch. I think it can be turned on as soon as someone declares it worthwhile.

Given all these arguments I think it is less than clear that we should do this in simplifycfg (as I initially thought when committing this patch) and will revert it.

Best,
Arnold