[PATCH] D140907: [GlobalISel] New combine to commute constant operands to the RHS
Jay Foad via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Jan 5 03:04:43 PST 2023
foad added inline comments.
================
Comment at: llvm/include/llvm/Target/GlobalISel/Combine.td:349
+ (match (wip_match_opcode G_ADD, G_MUL, G_AND, G_OR, G_XOR):$root, [{
+ return getIConstantVRegVal(${root}->getOperand(1).getReg(), MRI).has_value();
+ }]),
----------------
tsymalla wrote:
> This looks alright to me, but what is the point in swapping the operands if both of them are constants except making the ISA more readable?
> For instance:
>
> `s_add_i32 s1, 0x1000, 0 => s_add_i32 s1, 0, 0x1000`
If both operands are constants, the instruction gets constant-folded for every one of these opcodes, so there is nothing left to commute.
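
To make the interaction between the two rules concrete, here is a minimal toy model in plain C++. It is only a sketch and does not use any LLVM API: `Operand`, `BinOp`, `constant`, `reg`, `constantFold` and `commuteConstantToRHS` are hypothetical names invented for illustration. Note that the match quoted above only checks that the first source operand is a constant; the explicit "RHS is not a constant" check below is an assumption added to keep the toy self-contained, standing in for the fact that the both-constant case is folded away.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>

// Toy stand-ins for illustration only; these are NOT LLVM types.
enum class Opcode { Add, Mul, And, Or, Xor }; // all commutative

struct Operand {
  std::optional<int64_t> Imm; // set when the operand is a known constant
  unsigned Reg = 0;           // toy virtual register number otherwise
};

struct BinOp {
  Opcode Opc;
  Operand LHS, RHS;
};

// Hypothetical helpers to build operands.
Operand constant(int64_t V) { Operand O; O.Imm = V; return O; }
Operand reg(unsigned R) { Operand O; O.Reg = R; return O; }

// Folds the operation when both operands are constants.
std::optional<int64_t> constantFold(const BinOp &I) {
  if (!I.LHS.Imm || !I.RHS.Imm)
    return std::nullopt;
  // Compute in unsigned arithmetic so wraparound is well defined.
  uint64_t A = static_cast<uint64_t>(*I.LHS.Imm);
  uint64_t B = static_cast<uint64_t>(*I.RHS.Imm);
  uint64_t R = 0;
  switch (I.Opc) {
  case Opcode::Add: R = A + B; break;
  case Opcode::Mul: R = A * B; break;
  case Opcode::And: R = A & B; break;
  case Opcode::Or:  R = A | B; break;
  case Opcode::Xor: R = A ^ B; break;
  }
  return static_cast<int64_t>(R);
}

// Moves a constant LHS to the RHS. Returns true if the operands were swapped.
// Assumption: the both-constant case is left to constantFold.
bool commuteConstantToRHS(BinOp &I) {
  if (!I.LHS.Imm || I.RHS.Imm)
    return false;
  std::swap(I.LHS, I.RHS);
  assert(I.RHS.Imm && "constant should now be on the RHS");
  return true;
}
```

In this model, an add of `constant(0x1000)` and `reg(1)` is commuted once and then left alone, while an add of `constant(0x1000)` and `constant(0)` is never commuted and instead folds to a single value.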
================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/mul-known-bits.i64.ll:53
; a 64 bit multiplication where the second argument was zero extended.
define amdgpu_kernel void @v_mul_i64_zext_01(ptr addrspace(1) %out, ptr addrspace(1) %aptr, ptr addrspace(1) %bptr) {
; GFX10-LABEL: v_mul_i64_zext_01:
----------------
OutOfCache wrote:
> @tsymalla suggested in my revision to give the test cases more descriptive names.
That's fine but I think it should be a separate patch.
================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/mul-known-bits.i64.ll:174
; GFX11-NEXT: v_lshlrev_b32_e32 v0, 2, v0
+; GFX11-NEXT: v_mov_b32_e32 v2, 0
; GFX11-NEXT: s_waitcnt lgkmcnt(0)
----------------
tsymalla wrote:
> Shouldn't this one be eliminated?
Why? It's used by the global_store below.
================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/mul-known-bits.i64.ll:258
; GFX10-NEXT: s_waitcnt vmcnt(1)
; GFX10-NEXT: v_mad_u64_u32 v[4:5], s0, 0, v0, 0
; GFX10-NEXT: v_mul_lo_u32 v1, 0, v1
----------------
OutOfCache wrote:
> This is a neat approach! Is there a possibility to extend this to `G_MAD` instructions as well? It's trickier since the operands don't have the same indices as for `G_MUL` etc. though.
There is no generic G_MAD opcode, so I think we would need to add a target-specific combine for the target-specific opcodes.
================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/mul-known-bits.i64.ll:312
; GFX10-NEXT: s_waitcnt vmcnt(1)
; GFX10-NEXT: v_mad_u64_u32 v[4:5], s0, v0, 0, 0
; GFX10-NEXT: s_waitcnt vmcnt(0)
----------------
OutOfCache wrote:
> `G_MAD` does not take advantage of the `binop_right_to_zero` rule.
Likewise, I think that would need to be a target-specific combine.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D140907/new/
https://reviews.llvm.org/D140907