[PATCH] D119696: [AMDGPU] Improve v_cmpx usage on GFX10.3.

Nicolai Hähnle via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Feb 14 07:05:09 PST 2022


nhaehnle added a comment.

Thank you for doing ths. Please make sure to address that clang-format complaints. I have a bunch of other comments inline.



================
Comment at: llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp:301-303
+  for (const MachineOperand &op : SaveExec.operands()) {
+    if (!op.isReg())
+      continue;
----------------
Just look at the operands explicitly by argument instead of iterating over them.


================
Comment at: llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp:305
+
+    // Check if the dest of the saveexec instruction is an SGPR.
+    const Register &Reg = op.getReg();
----------------
Can it be anything else? This should probably just be an assertion.


================
Comment at: llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp:325
+      // optimization.
+      const Register &VCmpDest = I.getOperand(0).getReg();
+      if (MRI.hasOneUse(VCmpDest) && VCmpDest == Reg) {
----------------
VCmpDest should be equal to Reg by definition. Again, this can be an assertion.


================
Comment at: llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp:326
+      const Register &VCmpDest = I.getOperand(0).getReg();
+      if (MRI.hasOneUse(VCmpDest) && VCmpDest == Reg) {
+        VCmp = &I;
----------------
This doesn't work how you want it to work because after SSA elimination, if you need to iterate over defs then almost certainly you have multiple uses as well.

Do you have liveness information available and can you use it to get to the relevant defining instruction in a more targeted way?

Alternatively: this transform is really only helpful if the V_CMP and S_AND_SAVEEXEC are close together. If the V_CMP appears much earlier in the code (perhaps even in a different basic block!) then we shouldn't do the transform. So a better approach may be to do a bounded backwards walk through instructions. (This interacts with the point about checking for intermediate EXEC writes in a later comment.)


================
Comment at: llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp:359-363
+  // If there are defines of Exec between the v_cmp and the saveexec instr, insert the
+  // new sequence right behind the last define.
+  MachineBasicBlock::reverse_iterator A = SaveExecInstr.getReverseIterator(),
+                                      E = SaveExecInstr.getParent()->rend();
+
----------------
There shouldn't be any EXEC modifications between those two instructions. If there are, it's a sign that something really weird is going on and certainly this trick isn't sufficient to ensure correctness. You should probably move this check into the find function and have that function bail out if it finds an EXEC modification.


================
Comment at: llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp:504-505
           SaveExecInst = &*J;
           LLVM_DEBUG(dbgs() << "Found save exec op: " << *SaveExecInst << '\n');
+
           continue;
----------------
Spurious whitespace change.


================
Comment at: llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp:571
+  // After all s_op_saveexec instructions are inserted,
+  // replace (on GFX10.3)
+  // v_cmp_* SGPR, IMM, VGPR
----------------
gfx10.3 and later.


================
Comment at: llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp:583-594
+    for (MachineBasicBlock &MBB : MF) {
+      for (MachineInstr &MI : MBB) {
+        // Try to record existing s_and_saveexec instructions, iff
+        // they are reading from a v_cmp dest SGPR write.
+        if (MI.getOpcode() != AndSaveExecOpcode)
+          continue;
+
----------------
The S_AND_SAVEEXEC_Bnn instruction has to be one of the last instructions of the basic block. It actually gets derived from a terminator instruction, though for some reason we apparently never changed the S_xxx_SAVEEXEC instructions to be terminators as well.

In any case, what you should do here from a compile-time perspective is to iterate basic blocks backwards and limit the lop to a small number of iterations. Maybe 5?

Also, this loop should break if it finds a different instruction that writes EXEC (shouldn't happen in practice, but...).


================
Comment at: llvm/lib/Target/AMDGPU/VOPCInstructions.td:222
+             VCMPXNoSDstTable<1, opName#"_e32">,
+             VCMPVCMPXTable<0, opName#"_e32"> {
     let Defs = !if(DefExec, [VCC, EXEC], [VCC]);
----------------
Shouldn't the is_vcmp field here logically be 1?


================
Comment at: llvm/lib/Target/AMDGPU/VOPCInstructions.td:259
+             VCMPXNoSDstTable<0, opName#"_e32">,
+             VCMPVCMPXTable<!ne(!subst("v_cmpx", "v_cmp", opName), opName), !subst("v_cmpx", "v_cmp", opName#"_e32")> {
     let Defs = [EXEC];
----------------
Why the complicated expression for is_vcmp? Shouldn't that just be 0?

More to the point, VOPC_Pseudo already derives from VCMPVCMPXTable, so this is rather dodgy. I'd suggest changing the table to have an is_cmpx field and using a `let` here to override the field's default value from the VOPC_Pseudo.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D119696/new/

https://reviews.llvm.org/D119696



More information about the llvm-commits mailing list