[PATCH] D116270: [AMDGPU] Enable divergence-driven XNOR selection

Thu Jan 6 11:05:27 PST 2022

rampitec added a comment.

In D116270#3224818 <https://reviews.llvm.org/D116270#3224818>, @alex-t wrote:

> In D116270#3215217 <https://reviews.llvm.org/D116270#3215217>, @rampitec wrote:
>
>> In D116270#3209307 <https://reviews.llvm.org/D116270#3209307>, @alex-t wrote:
>>
>>> This looks like a regression in xnor.ll :
>>>
>>>   s_not_b32 s0, s0                 v_not_b32_e32 v0, v0
>>>   v_xor_b32_e32 v0, s0, v0         v_xor_b32_e32 v0, s4, v0
>>>
>>> but it is not really. All the nodes in the example are divergent, and the divergent (xor x, -1) is selected to V_NOT_B32 now that https://reviews.llvm.org/D115884 has been committed.
>>> S_NOT_B32 appears on the left because of the custom optimization that converts S_XNOR_B32 back to NOT (XOR) for the targets which have no V_XNOR. This optimization relies on the fact that the NOT operand is an SGPR and that V_XOR_B32_e32 can accept an SGPR as its first source operand.
>>> I am not sure if it is always safe. Execution of VALU instructions is controlled by the EXEC mask, but execution of SALU instructions is not.
>>
>> This is indeed a regression. It is always safe to keep s_not_b32 on SALU. Also note this effectively makes `SIInstrInfo::lowerScalarXnor()` useless. This is why XNOR was left behind by D111907 <https://reviews.llvm.org/D111907>.
>
> `SIInstrInfo::lowerScalarXnor()` is exactly the part of the "manual" SALU to VALU lowering that I am trying to get rid of.
> The divergent "not" must be selected to "V_NOT_B32_e32/64"; otherwise we still have illegal VGPR to SGPR copies.
> This happens because the divergent "not" node has divergent operands and their results will likely be in VGPRs.
> Also, we should select everything correctly first and can apply some peephole optimizations afterwards.
> In other words: we should not "cheat ourselves" during the selection. The selection should be done fairly, according to the node divergence bit.
> Then we can apply the optimization in case it is safe.
> Note that this is not the only case when we would like to further optimize the code after selection.
> I'm planning to further add a separate pass for that.
>
> We cannot solve the problem in the custom selection procedure because the NOT node operand has not yet been selected and we do not know if it is an SGPR or a VGPR.
> The only way, for now, is to post-process not(xor)/xor(not) in SIFixSGPRCopies. This may be considered a temporary hack until we have a proper pass for that.

`SIInstrInfo::lowerScalarXnor()` is dead after your patch and thus the patch has to remove it.
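
For reference, the rewrite discussed in the quoted text (XNOR turned into an XOR with one inverted operand) rests on a plain bitwise identity. The following is only a minimal sketch in Python, assuming 32-bit registers; the helper names are hypothetical and are not LLVM APIs.

```python
MASK32 = 0xFFFFFFFF  # assumption: model registers as 32-bit unsigned values


def not32(x):
    """Bitwise NOT truncated to 32 bits."""
    return ~x & MASK32


def xnor_via_not_of_xor(a, b):
    """XNOR written as NOT(XOR(a, b))."""
    return not32(a ^ b)


def xnor_via_inverted_operand(s, v):
    """XNOR written as XOR(NOT(s), v): only the first operand is inverted."""
    return not32(s) ^ v


# Check the identity ~(a ^ b) == (~a) ^ b == a ^ (~b) on a few sample values.
samples = [0, 1, 0xFFFFFFFF, 0x12345678, 0xDEADBEEF]
for a in samples:
    for b in samples:
        expected = xnor_via_not_of_xor(a, b)
        assert expected == xnor_via_inverted_operand(a, b)
        assert expected == a ^ not32(b)
```

Because either operand can carry the inversion, the NOT can be attached to whichever operand is uniform, which is what lets it stay on the scalar unit.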

Then this is a clear regression, so if this requires a separate peephole later, we need that peephole first, and we need to make sure the test does not regress.
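
To illustrate why the two sequences from xnor.ll are interchangeable with respect to the EXEC mask, here is a toy lane-level model. It is a sketch under simplifying assumptions: a VGPR is a Python list with one value per lane, an SGPR is a single integer, and a VALU write only touches lanes whose EXEC bit is set. The function names are hypothetical and do not correspond to real LLVM or hardware interfaces.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit lanes


def s_not(sgpr):
    """Scalar NOT: not affected by the EXEC mask."""
    return ~sgpr & MASK32


def v_not(vgpr, exec_mask):
    """Vector NOT: only lanes enabled in exec_mask are written."""
    return [(~x & MASK32) if (exec_mask >> lane) & 1 else x
            for lane, x in enumerate(vgpr)]


def v_xor(sgpr, vgpr, exec_mask):
    """Vector XOR with a scalar first operand, masked by exec_mask."""
    return [(sgpr ^ x) if (exec_mask >> lane) & 1 else x
            for lane, x in enumerate(vgpr)]


def salu_not_sequence(s, v, exec_mask):
    """s_not_b32 followed by v_xor_b32 (one VALU instruction)."""
    return v_xor(s_not(s), v, exec_mask)


def valu_not_sequence(s, v, exec_mask):
    """v_not_b32 followed by v_xor_b32 (two VALU instructions)."""
    return v_xor(s, v_not(v, exec_mask), exec_mask)


# Compare both sequences for every EXEC mask over four lanes.
lanes = [1, 2, 3, 0xFFFFFFFF]
for exec_mask in range(1 << len(lanes)):
    assert (salu_not_sequence(0xF0, lanes, exec_mask)
            == valu_not_sequence(0xF0, lanes, exec_mask))
```

In this model the inverted scalar is only ever consumed by the masked vector XOR, so active lanes get the same value either way and inactive lanes are left untouched; the SALU variant simply uses one VALU instruction fewer.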

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D116270/new/

https://reviews.llvm.org/D116270