[PATCH] D140907: [GlobalISel] New combine to commute constant operands to the RHS

Thu Jan 5 03:04:43 PST 2023

foad added inline comments.

================
Comment at: llvm/include/llvm/Target/GlobalISel/Combine.td:349
+  (match (wip_match_opcode G_ADD, G_MUL, G_AND, G_OR, G_XOR):$root, [{
+    return getIConstantVRegVal(${root}->getOperand(1).getReg(), MRI).has_value();
+  }]),
----------------
tsymalla wrote:
> This looks alright to me, but what is the point in swapping the operands if both of them are constants except making the ISA more readable?
> For instance:
> 
> `s_add_i32 s1, 0x1000, 0 => s_add_i32 s1, 0, 0x1000`
If both operands are constants then all of these opcodes will be constant-folded anyway, so the operand order does not matter in that case.
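
To illustrate the reasoning, here is a minimal standalone sketch. It is not the code from this patch and it does not use any LLVM API: `Operand`, `BinOp`, `Action`, `classify` and `commuteConstantToRHS` are hypothetical names, and the sketch assumes that constant folding gets to handle the both-constant case, so commuting only matters when the LHS alone is constant.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>

// Toy stand-in for an operand of a commutative binary op: it either holds a
// known integer constant or is an opaque, non-constant value.
struct Operand {
  std::optional<int64_t> Const; // empty => not a known constant
};

// Hypothetical helpers for building operands.
inline Operand constant(int64_t V) { return Operand{V}; }
inline Operand nonConstant() { return Operand{std::nullopt}; }

// Toy stand-in for G_ADD/G_MUL/G_AND/G_OR/G_XOR with two source operands.
struct BinOp {
  Operand LHS, RHS;
};

enum class Action { None, Commute, Fold };

// Decide what should happen to the op in this toy model.
inline Action classify(const BinOp &I) {
  const bool LHSConst = I.LHS.Const.has_value();
  const bool RHSConst = I.RHS.Const.has_value();
  if (LHSConst && RHSConst)
    return Action::Fold;    // both constant: folded, operand order is moot
  if (LHSConst)
    return Action::Commute; // constant only on the LHS: move it to the RHS
  return Action::None;      // already canonical, or no constants at all
}

// Swap the operands when only the LHS is constant; report whether we changed
// anything.
inline bool commuteConstantToRHS(BinOp &I) {
  if (classify(I) != Action::Commute)
    return false;
  std::swap(I.LHS, I.RHS);
  return true;
}

// Small usage check: (5 op x) becomes (x op 5); (0x1000 op 0) is left to
// constant folding.
inline void demo() {
  BinOp A{constant(5), nonConstant()};
  assert(commuteConstantToRHS(A));
  assert(!A.LHS.Const.has_value() && *A.RHS.Const == 5);

  BinOp B{constant(0x1000), constant(0)};
  assert(classify(B) == Action::Fold);
}
```

In this model the `s_add_i32 s1, 0x1000, 0` case from the question never reaches the swap, because it is classified as foldable first.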

================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/mul-known-bits.i64.ll:53
 ; a 64 bit multiplication where the second argument was zero extended.
 define amdgpu_kernel void @v_mul_i64_zext_01(ptr addrspace(1) %out, ptr addrspace(1) %aptr, ptr addrspace(1) %bptr) {
 ; GFX10-LABEL: v_mul_i64_zext_01:
----------------
OutOfCache wrote:
> @tsymalla suggested in my revision to give the test cases more descriptive names.
That's fine but I think it should be a separate patch.

================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/mul-known-bits.i64.ll:174
 ; GFX11-NEXT:    v_lshlrev_b32_e32 v0, 2, v0
+; GFX11-NEXT:    v_mov_b32_e32 v2, 0
 ; GFX11-NEXT:    s_waitcnt lgkmcnt(0)
----------------
tsymalla wrote:
> Shouldn't this one be eliminated?
Why? It's used by the global_store below.

================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/mul-known-bits.i64.ll:258
 ; GFX10-NEXT:    s_waitcnt vmcnt(1)
 ; GFX10-NEXT:    v_mad_u64_u32 v[4:5], s0, 0, v0, 0
 ; GFX10-NEXT:    v_mul_lo_u32 v1, 0, v1
----------------
OutOfCache wrote:
> This is a neat approach! Is there a possibility to extend this to `G_MAD` instructions as well? It's trickier since the operands don't have the same indices as for `G_MUL` etc. though.
There are no generic G_MAD instructions, so I think we would need to add a target-specific combine for target-specific opcodes.

================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/mul-known-bits.i64.ll:312
 ; GFX10-NEXT:    s_waitcnt vmcnt(1)
 ; GFX10-NEXT:    v_mad_u64_u32 v[4:5], s0, v0, 0, 0
 ; GFX10-NEXT:    s_waitcnt vmcnt(0)
----------------
OutOfCache wrote:
> `G_MAD` does not take advantage of the `binop_right_to_zero` rule.
Likewise, I think that would need to be a target-specific combine.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D140907