[PATCH] D116270: [AMDGPU] Enable divergence-driven XNOR selection

Alexander via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jan 6 03:39:04 PST 2022


alex-t added a comment.

In D116270#3215217 <https://reviews.llvm.org/D116270#3215217>, @rampitec wrote:

> In D116270#3209307 <https://reviews.llvm.org/D116270#3209307>, @alex-t wrote:
>
>> This looks like a regression in xnor.ll :
>>
>>   	s_not_b32 s0, s0                        	v_not_b32_e32 v0, v0
>>   	v_xor_b32_e32 v0, s0, v0                        v_xor_b32_e32 v0, s4, v0
>>
>> but it is not really.  All the nodes in the example are divergent and the divergent ( xor, x -1) is selected to V_NOT_B32 as of https://reviews.llvm.org/D115884 has been committed.
>> S_NOT_B32 appears at the left because of the custom optimization that converts S_XNOR_B32 back to NOT (XOR) for the targets which have no V_XNOR. This optimization relies on the fact that if the NOT operand is SGPR and V_XOR_B32_e32 can accept SGPR as a first source operand.
>> I am not sure if it is always safe. The VALU instructions execution is controlled by the EXEC mask but SALU is not.
>
> This is indeed a regression. It is always safe to keep s_not_b32 on SALU. Also note this effectively makes `SIInstrInfo::lowerScalarXnor()` useless. This is why XNOR was left behind by the D111907 <https://reviews.llvm.org/D111907>.

`SIInstrInfo::lowerScalarXnor()` is exactly the part of the "manual" SALU to VALU lowering that I am trying to get rid of.
The divergent "not" must be selected to the "V_NOT_B32_e32/64" otherwise we still have illegal VGPR to SGPR copies.
This happens because the divergent "not" node has divergent operands and their result will be likely in VGPR.
Also, we should select everything correctly first and can apply some peephole optimizations after. 
In other words: we should not "cheat ourselves"  during the selection. The selection should be done fairly corresponding to the node divergence bit.
Then we can apply the optimization in case it is safe.
Note that this is not the only case when we would like to further optimize the code after selection.
I'm planning to further add a separate pass for that.
Currently, we have 2 options:

1. Implement the custom selection for xor(not) and not(xor) patterns
2. Add the exemption to SIFixSGPRCopies and process VGPR 2 SGPR copies originated from that patterns in a specific way

Both ways look not clean enough but the first one is preferable to me.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D116270/new/

https://reviews.llvm.org/D116270



More information about the llvm-commits mailing list