[PATCH] D119696: [AMDGPU] Improve v_cmpx usage on GFX10.3.

Thomas Symalla via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Feb 14 10:19:54 PST 2022


tsymalla marked 6 inline comments as done.
tsymalla added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp:359-363
+  // If there are defines of Exec between the v_cmp and the saveexec instr, insert the
+  // new sequence right behind the last define.
+  MachineBasicBlock::reverse_iterator A = SaveExecInstr.getReverseIterator(),
+                                      E = SaveExecInstr.getParent()->rend();
+
----------------
nhaehnle wrote:
> There shouldn't be any EXEC modifications between those two instructions. If there are, it's a sign that something really weird is going on and certainly this trick isn't sufficient to ensure correctness. You should probably move this check into the find function and have that function bail out if it finds an EXEC modification.
This is possible. In wqm.ll, check the following sequence:


```
; GFX10-W32-NEXT:    v_cmp_nlt_f32_e32 vcc_lo, 0, v0
; GFX10-W32-NEXT:    s_and_b32 exec_lo, exec_lo, s12
; GFX10-W32-NEXT:    buffer_store_dword v4, v2, s[0:3], 0 idxen
; GFX10-W32-NEXT:    ; implicit-def: $vgpr0
; GFX10-W32-NEXT:    s_wqm_b32 exec_lo, exec_lo
; GFX10-W32-NEXT:    s_and_saveexec_b32 s13, vcc_lo

```
Here, the second instruction modifies the exec register.
However, I investigated that case already and found out that there's no way to give correctness here, so I decided to do as you proposed and moved the check into the find function. :-)


================
Comment at: llvm/test/CodeGen/AMDGPU/wqm.ll:385-386
 ; GFX10-W32-NEXT:    v_mbcnt_hi_u32_b32 v0, -1, v0
-; GFX10-W32-NEXT:    v_cmp_gt_u32_e32 vcc_lo, 16, v0
+; GFX10-W32-NEXT:    v_cmpx_gt_u32_e64 16, v0
 ; GFX10-W32-NEXT:    v_mov_b32_e32 v0, 0
 ; GFX10-W32-NEXT:    s_cbranch_execz .LBB9_2
----------------
tsymalla wrote:
> sebastian-ne wrote:
> > I think the v_cmpx instruction should be inserted the the place of the s_and_saveexec, not at the place of the v_cmp. Otherwise this v_mov gets executed with the wrong exec mask.
> Thanks, going to fix this.
Apparently, there is another case to consider. Moving the v_cmpx to the place of the s_and_saveexec will use the wrong value for v0, so a copy needs to be inserted whenever a v_cmp register source operand gets redefined inbetween in the original sequence, and the cmpx needs to use the new operand.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D119696/new/

https://reviews.llvm.org/D119696



More information about the llvm-commits mailing list