[PATCH] D114257: [AMDGPU] Enable fneg and fabs divergence-driven instruction selection.

Mon Nov 22 07:22:42 PST 2021

alex-t added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/SIInstructions.td:1509
   (fabs (f32 VGPR_32:$src)),
   (V_AND_B32_e32 (S_MOV_B32 (i32 0x7fffffff)), VGPR_32:$src)
 >;
----------------
rampitec wrote:
> I think all of that should use VOP3 forms. That way you will avoid copy from SGPR to VGPR which appears in your test.
VOP2 form allows SGPR as 1st operand and the patterns are written to make a profit from this fact.
The COPY from SGPR to VGPR that you have mentioned resulted from the type legalization and further combining.  In fact, the test does not specify the subtarget. As a result, for subtargets that have no fp16,  
```
 t32: f16,ch = load ...
     t33: f16 = fneg t32
```
 is legalized to 
```
       t42: i32 = and t40, Constant:i32<65535>
     t38: f32 = fp16_to_fp t42
   t39: f32 = fneg t38
 t45: i32 = fp_to_fp16 t39
``` 
The latter, in order,  gets combined to 
```
 t47: i32,ch = load ...
     t49: i32 = xor t47, Constant:i32<32768>
```
The problem here is that whatever order of operands we use in combiner, the SelectionDAG::getNode will canonicalize it making the constant RHS. So, we always get 
```
xor t47, Constant:i32<32768>
```
For fp16 capable subtargets the explicit pattern is used and there are no SGPR to VGPR COPY. For now, I am going to update the test to check fp16 with the gfx900 subtarget.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114257/new/

https://reviews.llvm.org/D114257