[PATCH] D11393: Allow X86::COND_NE_OR_P and X86::COND_NP_OR_E to be reversed.

Thu Jul 23 16:11:54 PDT 2015

On Thu, Jul 23, 2015 at 3:52 PM, Duncan P. N. Exon Smith
<dexonsmith at apple.com> wrote:
>
>> On 2015-Jul-22, at 15:54, Cong Hou <congh at google.com> wrote:
>>
>> On Wed, Jul 22, 2015 at 3:35 PM, Duncan P. N. Exon Smith
>> <dexonsmith at apple.com> wrote:
>>>
>>>> On 2015-Jul-22, at 12:45, Cong Hou <congh at google.com> wrote:
>>>>
>>>> On Tue, Jul 21, 2015 at 3:17 PM, Duncan P. N. Exon Smith
>>>> <dexonsmith at apple.com> wrote:
>>>>>
>>>>>> On 2015-Jul-21, at 11:27, Cong Hou <congh at google.com> wrote:
>>>>>>
>>>>>> congh created this revision.
>>>>>> congh added reviewers: dexonsmith, davidxl.
>>>>>> congh added a subscriber: llvm-commits.
>>>>>>
>>>>>> Currently, AnalyzeBranch() fails non-equality comparison between floating points on X86 (see https://llvm.org/bugs/show_bug.cgi?id=23875). This is because this function can modify the branch by reversing the conditional jump and removing unconditional jump if there is a proper fall-through. However, in the case of non-equality comparison between floating points, this can turn the branch "unanalyzable". Consider the following case:
>>>>>>
>>>>>> jne   .BB1
>>>>>> jp    .BB1
>>>>>> jmp   .BB2
>>>>>> .BB1:
>>>>>> ...
>>>>>> .BB2:
>>>>>> ...
>>>>>>
>>>>>> AnalyzeBranch() will reverse "jp  .BB1" to "jnp  .BB2" and then "jmp .BB2" will be removed:
>>>>>>
>>>>>> jne   .BB1
>>>>>> jnp   .BB2
>>>>>> .BB1:
>>>>>> ...
>>>>>> .BB2:
>>>>>> ...
>>>>>>
>>>>>> However, AnalyzeBranch() cannot analyze this branch anymore as there are two conditional jumps with different targets. This may disable some optimizations like block-placement: in this case the fall-through behavior is enforced even if the fall-through block is very cold, which is suboptimal.
>>>>>>
>>>>>> Actually this optimization is also done in block-placement pass, which means we can remove this optimization from AnalyzeBranch(). However, currently X86::COND_NE_OR_P and X86::COND_NP_OR_E are not reversible: there is no defined negation conditions for them.
>>>>>>
>>>>>> In order to reverse them, this patch defines two new CondCode X86::COND_NEG_NE_OR_P and X86::COND_NEG_NP_OR_E
>>>>>
>>>>> In terms of naming, `COND_NP_AND_E` and `COND_NE_AND_P` make more
>>>>> sense to me, but maybe there's precedent for your names?
>>>>
>>>> Here "negation" is not for the whole condition:
>>>>
>>>> if (!(a or b)) goto B1; else goto B2;
>>>>
>>>> but only for the second condition:
>>>>
>>>> if (a) { goto B1; } else if (!b) { goto B2; }
>>>>
>>>> So COND_NP_AND_E is imprecise and can be confusing. Getting a precise
>>>> name for the new condition is difficult, and I have added comments to
>>>> explain the behavior of the new condition.
>>>
>>> I didn't look carefully at your patch, but your example conditions here
>>> are confusing.  Assuming they're in this context:
>>>
>>>    /*** Conditional gotos! ***/
>>>
>>>    B1:
>>>    /*** B1 code ***/
>>>
>>>    B2:
>>>    /*** B2 code ***/
>>>
>>> your two examples have exactly opposite conditions.  In particular,
>>> your first set of conditions:
>>>
>>>    if (!(a || b)) goto B1;
>>>    goto B2;
>>>
>>> reduces to:
>>>
>>>    if (a || b) goto B2;
>>>
>>> whereas your second set of conditions:
>>>
>>>    if (a) goto B1;
>>>    if (!b) goto B2;
>>>
>>> reduces to:
>>>
>>>    if (!(a || b)) goto B2;
>>>
>>> Maybe I'm not following your examples?  (Sorry if I'm just being
>>> thick here...)
>>
>> Sorry for making this example confusing. The negation (or reversal?)
>> should not change the semantic of the program. I should give the
>> following example:
>>
>> The original program:
>>
>>
>> if (a) goto B1;
>> else if (b) goto B1;
>> else goto B2;
>>
>>
>> If we negate the condition, the program becomes:
>>
>>
>> if (!(a || b)) goto B2;
>> else goto B1;
>>
>> or
>>
>> if (!a && !b) goto B2;      (1)
>> else goto B1;
>>
>>
>> However, the negation in this patch is only done to the second
>> condition b but not a || b:
>>
>> if (a) goto B1;
>> else if (!b) goto B2;         (2)
>> else goto B1;
>>
>
> Thanks for the new examples.  Interestingly, it's arbitrary which one
> you flip (which one is first).
>
> You could also do:
>
>     if (b) goto B1;
>     else if (!a) goto B2;
>     else goto B1;
>
> The conditions are symmetric.
>
>> The name COND_NP_AND_E, IMO, implies (1) not (2). Actually I also
>> could not find out a good name for the semantic in (2) (note that (2)
>> has two instructions).
>
> The original, `COND_NE_OR_P` also has two instructions; it's not a
> literal `if (a || b) goto B2`, is it?
>
>>>
>>> Going back to the new conditions you're adding, I don't see how:
>>>
>>>    GetOppositeBranchCondition(COND_NE_OR_P)
>>>
>>> could possibly give something that isn't semantically equivalent
>>> to `COND_E_AND_NP`/`COND_NP_AND_E`.
>>
>> Right.
>>
>>>
>>> I checked the comment in the patch that you mentioned, but it
>>> didn't really explain:
>>
>> I will improve the comment and make it more clear.
>>
>>>
>>>> Index: lib/Target/X86/X86InstrInfo.h
>>>> ===================================================================
>>>> --- lib/Target/X86/X86InstrInfo.h
>>>> +++ lib/Target/X86/X86InstrInfo.h
>>>> @@ -69,6 +69,15 @@
>>>>     COND_NE_OR_P,
>>>>     COND_NP_OR_E,
>>>>
>>>> +    // Artificial condition codes. These are used to represent the "negation" of
>>>> +    // above two conditions. Here "negation" is not in terms of logic. The only
>>>
>>> What then, if not logic?
>>
>> COND_NE_OR_P can only be represented by two branch instructions on
>> X86. It is true that there is a "or" relation between two branch
>> conditions. However, the new condition only negates the second branch
>> condition in COND_NE_OR_P. This is because that we reverse a branch
>> condition to switch true and false bodies. In the case of
>> COND_NE_OR_P, we can only switch bodies of the second condition not
>> the condition combined by two. For example, the following branch:
>>
>> if (a) TB; else FB;
>>
>> can be transformed to
>>
>> if (!a) FB; else TB;
>>
>> For COND_NE_OR_P, we actually have
>>
>> if (a) TB;
>> if (b) TB; else FB;
>>
>> in which only b can be reversed:
>>
>> if (a) TB;
>> if (!b) FB; else TB;
>
> Or, equivalently:
>
>     if (b) TB;
>     if (!a) FB; else TB;
>
> Either way, it looks like this is the logical inverse.  If your
> new logic is `COND_???????`, then:
>
>   - COND_NE_OR_P takes the "branch"      when "NE or P"
>   - COND_NE_OR_P takes the "fallthrough" when "E and NP"
>   - COND_??????? takes the "branch"      when "E and NP"
>   - COND_??????? takes the "fallthrough" when "NE or P"

This is a great demonstration that convinced me that your suggested
name is better. The only weird behavior here is that the first
instruction of this condition takes fall-through and the second
instruction takes the branch. Thank you very much for your analysis!

>
> I guess I still think `COND_E_AND_NP` is a decent name.
>
> The only asterisk here is that your new logic requires a label for
> both the "fallthrough" and the "branch", whereas COND_NE_OR_P just
> required a label for the "branch".  I don't really know if that's a
> problem.

When running tests on LLVM benchmarks I only found this is a problem
when the condition is inserted with false body being null, in which
case I will find the block's layout successor and let it be the false
body (fall-through). I inserted two assertions checking that the BB
with the condition is not the tail block in the layout. I am not sure
if there is still any potential issue from it.

Cong

>
> But, as I said to begin with, I'm not the right person to review this
> so I'm just going to back away slowly... :)
>
>>
>>>
>>>> +    // scenario we need these two conditions is when we try to reverse above two
>>>> +    // conditions in order to remove redundant unconditional jumps. Note that both
>>>> +    // true and false bodies need to be avaiable in order to correctly synthesize
>>>> +    // instructions for them. These are never used in MachineInstrs.
>>>> +    COND_NEG_NE_OR_P,
>>>> +    COND_NEG_NP_OR_E,
>>>> +
>>>>     COND_INVALID
>>>>   };
>>>>
>>>>
>>>
>>>
>>>>>
>>>>>> . It also defines how to synthesize instructions for them. Here only the second conditional jump is reversed. This is valid as we only need them to do this "unconditional jump removal" optimization.
>>>>>>
>>>>>> The test cases haven't been updated accordingly. If this design is OK I will do it later.
>>>>>>
>>>>>> http://reviews.llvm.org/D11393
>>>>>>
>>>>>> Files:
>>>>>> lib/Target/X86/X86InstrInfo.cpp
>>>>>> lib/Target/X86/X86InstrInfo.h
>>>>>>
>>>>>> <D11393.30275.patch>
>>>>>
>>>>> This sounds reasonable to me, but I'm not the right person to review
>>>>> X86 backend patches.  You might try to CC a few people that commit
>>>>> there regularly.
>>>>
>>>> Sure, I will do that. Thanks!
>