[PATCH] D116270: [AMDGPU] Enable divergence-driven XNOR selection

Fri Jan 7 03:21:44 PST 2022

alex-t added a comment.

In D116270#3225806 <https://reviews.llvm.org/D116270#3225806>, @rampitec wrote:

> In D116270#3224818 <https://reviews.llvm.org/D116270#3224818>, @alex-t wrote:
>
>> In D116270#3215217 <https://reviews.llvm.org/D116270#3215217>, @rampitec wrote:
>>
>>> In D116270#3209307 <https://reviews.llvm.org/D116270#3209307>, @alex-t wrote:
>>>
>>>> This looks like a regression in xnor.ll :
>>>>
>>>>   	s_not_b32 s0, s0                        	v_not_b32_e32 v0, v0
>>>>   	v_xor_b32_e32 v0, s0, v0                        v_xor_b32_e32 v0, s4, v0
>>>>
>>>> but it is not really.  All the nodes in the example are divergent and the divergent ( xor, x -1) is selected to V_NOT_B32 as of https://reviews.llvm.org/D115884 has been committed.
>>>> S_NOT_B32 appears at the left because of the custom optimization that converts S_XNOR_B32 back to NOT (XOR) for the targets which have no V_XNOR. This optimization relies on the fact that if the NOT operand is SGPR and V_XOR_B32_e32 can accept SGPR as a first source operand.
>>>> I am not sure if it is always safe. The VALU instructions execution is controlled by the EXEC mask but SALU is not.
>>>
>>> This is indeed a regression. It is always safe to keep s_not_b32 on SALU. Also note this effectively makes `SIInstrInfo::lowerScalarXnor()` useless. This is why XNOR was left behind by the D111907 <https://reviews.llvm.org/D111907>.
>>
>> `SIInstrInfo::lowerScalarXnor()` is exactly the part of the "manual" SALU to VALU lowering that I am trying to get rid of.
>> The divergent "not" must be selected to the "V_NOT_B32_e32/64" otherwise we still have illegal VGPR to SGPR copies.
>> This happens because the divergent "not" node has divergent operands and their result will be likely in VGPR.
>> Also, we should select everything correctly first and can apply some peephole optimizations after. 
>> In other words: we should not "cheat ourselves"  during the selection. The selection should be done fairly corresponding to the node divergence bit.
>> Then we can apply the optimization in case it is safe.
>> Note that this is not the only case when we would like to further optimize the code after selection.
>> I'm planning to further add a separate pass for that.
>>
>> We cannot solve the problem in the custom selection procedure because NOT node operand has not yet been selected and we do not know if it is SGPR or VGPR.
>> The only way, for now, is to post-process not(xor)/xor(not) in SIFixSGPRCopies. This may be considered a temporary hack until we have no proper pass for that.
>
> `SIInstrInfo::lowerScalarXnor()` is dead after your patch and thus the patch has to remove it.
>
> Then this is a clear regression, so if this requires a separate peephole later we need that peephole first and make sure the test does not regress.

I would not say that we need the peephole "first" because until this change is not applied there is nothing to optimize.
The only way I see here is to apply the peephole and the selection change at once. Hence we have to keep the pieces of the further post-isel optimizer within the SIFixSGPRCopies.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D116270/new/

https://reviews.llvm.org/D116270