[PATCH] D119696: [AMDGPU] Improve v_cmpx usage on GFX10.3.

Thomas Symalla via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sat Feb 19 02:10:06 PST 2022


tsymalla marked 3 inline comments as done.
tsymalla added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp:645
+        SaveExecInstr->eraseFromParent();
+        VCmpInstr->eraseFromParent();
+      }
----------------
tsymalla wrote:
> sebastian-ne wrote:
> > Not sure if this was already mentioned, but do you make sure that the v_cmp is not used anywhere outside the s_saveexec?
> > 
> > I guess it would still be beneficial to insert a v_cmpx if the v_cmp has more uses, but it shouldn’t be removed in that case.
> No, I did not take this into account (it should mostly just write to VCC as this is checking sequences of s_and_saveexec and v_cmp).
> I am going to add some more control flow for cases like this.
> 
> I don't really get your last comment though. Why would it beneficial to add the v_cmpx in addition to the v_cmp in cases like this?
Looking at this again.

The result of v_cmp will be VCC = Exec & ThreadsPassed, the result of v_cmpx will be Exec[Executing Thread] = bool for each executing thread. It could be that the result inside the VCC Is used as integer carry, so in these cases the transformation would be wrong. Another case which was already discussed was the early-exit option by using VCCZ, but I think it makes sense to ignore this for now.

It would require additional logic to make sure any following instructions using VCC (if there are any) are using the correct input value, which probably reduces the gain here - so I'm going to just don't do the transformation if any following instruction uses the destination operand of v_cmp.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D119696/new/

https://reviews.llvm.org/D119696



More information about the llvm-commits mailing list