[PATCH] D116270: [AMDGPU] Enable divergence-driven XNOR selection

Thu Jan 20 09:25:02 PST 2022

foad added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12491-12493
+    if (isCommutativeBinOp(N1->getOpcode()) &&
+        DAG.isConstantIntBuildVectorOrConstantInt(N1->getOperand(1)))
+      return true;
----------------
alex-t wrote:
> foad wrote:
> > I don't understand this heuristic. Can you give an example of when it would help?
> I could demonstrate a concrete example, but pasting the DAGs here looks like overkill, so I will try to explain without the drawing.
> Imagine a sub-tree made up of commutative arithmetic operations.
> Suppose there is a path in the tree such that each node has at least one constant operand.
> Then it is very likely that the combiner will simplify this sub-tree by applying arithmetic rules and constant folding.
> This heuristic gives such constant folding priority over keeping the outer node uniform.
> 
> ```
>   %and = and i32 %tmp, 16711935     ; 0x00ff00ff
>   %tmp1 = and i32 %arg1, 4294967040 ; 0xffffff00
>   %tmp2 = or i32 %tmp1, -65536
>   %tmp3 = or i32 %tmp2, %and
> 
> ```
> With this heuristic, this is folded and can be selected to v_perm_b32; without it, it will be 4 scalar operations.
I still don't see why this would be useful //in general//. I think it means we should do this reassociation:
`(op (op n00, C), (op2 n10, C2)) --> (op (op n00, (op2 n10, C2)), C)`
where op2 is commutative but not necessarily the same as op. E.g. `(x|C)|(z&C2) --> (x|(z&C2))|C`
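
To make the shapes concrete, here is a minimal standalone sketch (plain C++, not LLVM code) that checks two things on sample values: that the `(x|C)|(z&C2) --> (x|(z&C2))|C` reassociation preserves the result, and that the quoted IR computes the same value as a single byte-select expression, which is presumably the form the quoted comment expects to be selected to v_perm_b32. All function names are hypothetical and exist only for illustration; the folded form is my own derivation from the quoted constants, not something taken from the patch.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; none of these exist in LLVM.

// Original shape: (op (op n00, C), (op2 n10, C2)) with op = or, op2 = and.
inline uint32_t original(uint32_t x, uint32_t z, uint32_t C, uint32_t C2) {
  return (x | C) | (z & C2);
}

// Reassociated shape: (op (op n00, (op2 n10, C2)), C).
inline uint32_t reassociated(uint32_t x, uint32_t z, uint32_t C, uint32_t C2) {
  return (x | (z & C2)) | C;
}

// The quoted IR, evaluated one instruction at a time.
inline uint32_t quotedIR(uint32_t tmp, uint32_t arg1) {
  uint32_t andv = tmp & 0x00ff00ffu;   // %and  = and i32 %tmp, 16711935
  uint32_t tmp1 = arg1 & 0xffffff00u;  // %tmp1 = and i32 %arg1, 4294967040
  uint32_t tmp2 = tmp1 | 0xffff0000u;  // %tmp2 = or i32 %tmp1, -65536
  return tmp2 | andv;                  // %tmp3 = or i32 %tmp2, %and
}

// Assumed folded form: bytes 3 and 2 are 0xff, byte 1 comes from %arg1,
// byte 0 comes from %tmp.
inline uint32_t folded(uint32_t tmp, uint32_t arg1) {
  return 0xffff0000u | (arg1 & 0x0000ff00u) | (tmp & 0x000000ffu);
}

// Compares both pairs of functions over a small set of sample inputs.
inline bool checkSamples() {
  const uint32_t samples[] = {0x00000000u, 0xffffffffu, 0x12345678u,
                              0xdeadbeefu, 0x00ff00ffu, 0xa5a5a5a5u};
  for (uint32_t a : samples) {
    for (uint32_t b : samples) {
      if (original(a, b, 0xffff0000u, 0x00ff00ffu) !=
          reassociated(a, b, 0xffff0000u, 0x00ff00ffu))
        return false;
      if (quotedIR(a, b) != folded(a, b))
        return false;
    }
  }
  return true;
}
```

This only demonstrates that the rewrite is value-preserving for or/and on a handful of inputs; it says nothing about whether the reassociation is profitable in general, which is the open question above.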

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D116270/new/

https://reviews.llvm.org/D116270