[PATCH] D129073: [AMDGPU] Combine s_or_saveexec, s_xor instructions.

Wed Jul 20 06:33:35 PDT 2022

tsymalla added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp:751-754
+      // Peek at the next instruction and check if this is a relevant
+      // s_xor.
+      MachineInstr &PossibleXor = *MI.getNextNode();
+      if (PossibleXor.getOpcode() == XorOpcode) {
----------------
nhaehnle wrote:
> tsymalla wrote:
> > nhaehnle wrote:
> > > This is sort of backwards. When matching code for this kind of combine, the general pattern is to start from the definition of the final value you want to rewrite. In this case, that would be the S_XOR. You can see the approach taken in findPossibleVCMPVCMPXOptimization, using findInstrBackwards.
> > But wouldn't this cause more work to do? I think it's more likely to first find a S_XOR instruction which the compiler could use as start point for looking for the s_or_saveexec. The issue is, that we basically would require to stop at a lot of s_xor and see if the instruction before is a eligible s_or_saveexec instruction. The amount of checks would be reduced if we could first stop at the s_or_saveexec instruction and _then_ look at the s_xor instruction.
> > 
> > See following (artificial) example:
> > 
> > ```
> > BB0:
> > s_or_saveexec_b32 s0, s1
> > s_xor_b32 exec_lo, exec_lo, s0
> > s_xor_b32 exec_lo, exec_lo, s1
> > s_xor_b32 exec_lo, exec_lo, s2
> > ```
> > 
> > Three checks for three s_xor instructions when checking s_xor first and two of them will fail. In comparison, with the current approach, we'd only check once at the cost of incrementing the iterator.
> Can such sequences actually happen, though? Before register allocation, we generally follow the rule that EXEC can only be written by special terminator instructions, and there is really no reason to have more than one of those. So I don't see where this sequence would come from.
> 
> See also the note about being able to stop the search after seeing the first instruction (from the end) that writes EXEC.
Not exactly. My point is, we will see single occurrences of s_xor_b32 exec_lo, exec_lo, s* as terminator instructions, and will stop at any of them even if in a lot of cases the combine won't be applied. But I can change the order of checks anyhow.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D129073/new/

https://reviews.llvm.org/D129073