[PATCH] D116270: [AMDGPU] Enable divergence-driven XNOR selection

Tue Jan 11 08:36:34 PST 2022

alex-t added a comment.

In D116270#3231666 <https://reviews.llvm.org/D116270#3231666>, @foad wrote:

> In D116270#3231657 <https://reviews.llvm.org/D116270#3231657>, @alex-t wrote:
>
>> Once again, in my case BOTH nodes (not,xor) are divergent!
>>
>>    %s.load = load i32, i32 addrspace(4)* %s.kernarg.offset.cast, align 4, !invariant.load !0
>>   DIVERGENT:       %v = call i32 @llvm.amdgcn.workitem.id.x(), !range !1
>>   DIVERGENT:       %xor = xor i32 %v, %s.load
>>   DIVERGENT:       %d = xor i32 %xor, -1
>>   DIVERGENT:       store i32 %d, i32 addrspace(1)* %out.load, align 4
>
> I know. I am suggesting that a DAG combine can rewrite this code to the equivalent of:
>
>                    %s.load = load i32, i32 addrspace(4)* %s.kernarg.offset.cast, align 4, !invariant.load !0
>   DIVERGENT:       %v = call i32 @llvm.amdgcn.workitem.id.x(), !range !1
>                    %not = xor i32 %s.load, -1
>   DIVERGENT:       %d = xor i32 %v, %not
>   DIVERGENT:       store i32 %d, i32 addrspace(1)* %out.load, align 4
>
> Now %not is uniform, so it is trivial to select it to s_not.

The idea is brilliant. Unfortunately, it only can be implemented by changing LLVM common code.
If you look in the SITargetLowering::reassociateScalarOps you can see the operand constantness check and corresponding comment:

   // If either operand is constant this will conflict with
   // DAGCombiner::ReassociateOps().
  if (DAG.isConstantIntBuildVectorOrConstantInt(Op0) ||
       DAG.isConstantIntBuildVectorOrConstantInt(Op1))
     return SDValue();

Removing this guard leads to the infinite transforming the pattern back and forth in SITargetLowering::reassociateScalarOps and DAGCombiner::ReassociateOps().
The former transform (xor (xor uniform, divergent), -1) to (xor (xor uniform, -1), divergent) but the latter one transform it back by applying this rule:

  if (N0.hasOneUse()) {
   ** // Reassociate: (op (op x, c1), y) -> (op (op x, y), c1)
    //              iff (op x, c1) has one use**
    if (SDValue OpNode = DAG.getNode(Opc, SDLoc(N0), VT, N00, N1))
      return DAG.getNode(Opc, DL, VT, OpNode, N01);
    return SDValue();
  }

The infinite loop is formed because we add to the worklist the users of the transformed node. Hence this user comes to the combine in the next step and the same pattern gets transformed again and again.
What I can do here is to change the DAGCombiner::ReassociateOps() to make it consider node divergence. Not sure if this is suitable for upstreaming.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D116270/new/

https://reviews.llvm.org/D116270