[PATCH] D119696: [AMDGPU] Improve v_cmpx usage on GFX10.3.

Jay Foad via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Feb 16 07:23:52 PST 2022


foad added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp:571
+  // After all s_op_saveexec instructions are inserted,
+  // replace (on GFX10.3)
+  // v_cmp_* SGPR, IMM, VGPR
----------------
tsymalla wrote:
> arsenm wrote:
> > nhaehnle wrote:
> > > gfx10.3 and later.
> > Why only do this on gfx10.3? Every target has v_cmpx?
> We restricted it to GFX10.3 because it is unclear whether this gives any performance advantage before GFX10, and on GFX10.1 / GFX10.2 an additional s_waitcnt_depctr needs to be inserted for correctness, so there probably won't be any performance gain on those targets.
> 
> @foad might be able to elaborate on this.
I don't have much to add. The performance of v_cmp followed by s_and_saveexec is particularly bad on GFX10 because of instruction counter stalls. So yes, we probably could do this pre-GFX10, but I don't know if there's any reason to.

For the bugs in GFX10.1 and GFX10.2 see GCNHazardRecognizer::fixVcmpxExecWARHazard. Having to insert S_WAITCNT_DEPCTR probably negates any advantage you get from using v_cmpx in the first place.
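
To summarize the gating argued for in this thread, here is a rough, self-contained sketch. It is not code from the patch: the enum and both function names are hypothetical and exist only to illustrate the reasoning (no known benefit pre-GFX10, an extra S_WAITCNT_DEPCTR on GFX10.1 / GFX10.2, so the rewrite is only enabled on GFX10.3 and later).

```cpp
// Hypothetical model of the target gating discussed above.
// None of these names are LLVM APIs; they are assumptions for illustration.
enum class Gen { PreGfx10, Gfx10_1, Gfx10_2, Gfx10_3OrLater };

// GFX10.1 and GFX10.2 need an S_WAITCNT_DEPCTR inserted for correctness
// when v_cmpx is used (see GCNHazardRecognizer::fixVcmpxExecWARHazard).
constexpr bool needsDepctrWait(Gen G) {
  return G == Gen::Gfx10_1 || G == Gen::Gfx10_2;
}

// Only rewrite v_cmp + s_and_saveexec into v_cmpx where it is expected to
// pay off: no known benefit pre-GFX10, and the extra wait on
// GFX10.1 / GFX10.2 probably negates the gain.
constexpr bool shouldRewriteToVCmpx(Gen G) {
  return G == Gen::Gfx10_3OrLater;
}

static_assert(!shouldRewriteToVCmpx(Gen::PreGfx10), "no known gain pre-GFX10");
static_assert(needsDepctrWait(Gen::Gfx10_1), "GFX10.1 needs the extra wait");
static_assert(shouldRewriteToVCmpx(Gen::Gfx10_3OrLater), "enabled on GFX10.3+");
```

Keeping the hazard check separate from the profitability check mirrors the two distinct reasons given above: one is about correctness, the other about performance.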


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D119696/new/

https://reviews.llvm.org/D119696


