[PATCH] D129073: [AMDGPU] Combine s_or_saveexec, s_xor instructions.

Wed Jul 20 01:46:48 PDT 2022

tsymalla marked 6 inline comments as done.
tsymalla added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp:751-754
+      // Peek at the next instruction and check if this is a relevant
+      // s_xor.
+      MachineInstr &PossibleXor = *MI.getNextNode();
+      if (PossibleXor.getOpcode() == XorOpcode) {
----------------
nhaehnle wrote:
> This is sort of backwards. When matching code for this kind of combine, the general pattern is to start from the definition of the final value you want to rewrite. In this case, that would be the S_XOR. You can see the approach taken in findPossibleVCMPVCMPXOptimization, using findInstrBackwards.
But wouldn't this cause more work to do? I think it's more likely to first find a S_XOR instruction which the compiler could use as start point for looking for the s_or_saveexec. The issue is, that we basically would require to stop at a lot of s_xor and see if the instruction before is a eligible s_or_saveexec instruction. The amount of checks would be reduced if we could first stop at the s_or_saveexec instruction and _then_ look at the s_xor instruction.

See following (artificial) example:

```
BB0:
s_or_saveexec_b32 s0, s1
s_xor_b32 exec_lo, exec_lo, s0
s_xor_b32 exec_lo, exec_lo, s1
s_xor_b32 exec_lo, exec_lo, s2
```

Three checks for three s_xor instructions when checking s_xor first and two of them will fail. In comparison, with the current approach, we'd only check once at the cost of incrementing the iterator.

================
Comment at: llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp:818-820
+      // While the or_saveexec sequence is most likely to
+      // appear at the end of the basic blocks, the v_cmp
+      // sequences can be spread over the whole basic block.
----------------
nhaehnle wrote:
> This isn't true. The v_cmp may potentially be far away, but the s_and_saveexec must be near the end of the block, so the entire loop should exit after the SearchWindow.
> 
> Also, and I know this isn't due to your change, but debug instructions should not be counted towards the search window. The goal is to avoid codegen changes caused by debug.
> 
> Also also, shouldn't we be able to exit this loop once we found the first write to EXEC?
I don't get the first part of your comment. If we exit after SearchWindow, we're likely to ignore some v_cmp instructions.

For the debug instructions, you're correct. I missed that.

For the EXEC part, I'll double-check.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D129073/new/

https://reviews.llvm.org/D129073