[PATCH] D119696: [AMDGPU] Improve v_cmpx usage on GFX10.3.
Jay Foad via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Feb 16 07:23:52 PST 2022
foad added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp:571
+ // After all s_op_saveexec instructions are inserted,
+ // replace (on GFX10.3)
+ // v_cmp_* SGPR, IMM, VGPR
----------------
tsymalla wrote:
> arsenm wrote:
> > nhaehnle wrote:
> > > gfx10.3 and later.
> > Why only do this on gfx10.3? Every target has v_cmpx?
> We decided to do so because it is unclear if on < GFX10 this gives any performance advantage and on 10.1 / 10.2 an additional s_waitcnt_depctr needs to be inserted for correctness, so probably there won't be any performance gain from this on these targets.
>
> @foad might elaborate more details for this.
I don't have much to add. The performance of v_cmp followed by s_and_saveexec is particularly bad on GFX10 because of instruction counter stalls. So yes, we probably could do this pre-GFX10, but I don't know if there's any reason to.
For the bugs in GFX10.1 and GFX10.2 see GCNHazardRecognizer::fixVcmpxExecWARHazard. Having to insert S_WAITCNT_DEPCTR probably negates any advantage you get from using v_cmpx in the first place.
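As a rough illustration of the gating discussed in this thread, the decision could be sketched as below. Every name here (Gen, needsDepctrWait, shouldFoldToVCmpx) is hypothetical and made up for the example; this is not the code in SIOptimizeExecMasking.cpp or GCNHazardRecognizer, only a model of the reasoning above.

```cpp
// Hypothetical model of the target gating discussed in this review.
// None of these names exist in LLVM; they are assumptions for illustration.
enum class Gen { PreGFX10, GFX10_1, GFX10_2, GFX10_3Plus };

// GFX10.1 and GFX10.2 need an extra s_waitcnt_depctr after v_cmpx for
// correctness (see GCNHazardRecognizer::fixVcmpxExecWARHazard).
constexpr bool needsDepctrWait(Gen G) {
  return G == Gen::GFX10_1 || G == Gen::GFX10_2;
}

// Fold v_cmp + s_and_saveexec into v_cmpx only where a gain is expected:
// - pre-GFX10: would probably work, but no known performance benefit;
// - GFX10.1 / GFX10.2: the extra wait likely negates the benefit;
// - GFX10.3 and later: no workaround needed, so do the fold.
constexpr bool shouldFoldToVCmpx(Gen G) {
  return G == Gen::GFX10_3Plus;
}
```

The two predicates are kept separate so that enabling the fold on an older target later would only mean changing shouldFoldToVCmpx, while the hazard check stays as it is.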
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D119696/new/
https://reviews.llvm.org/D119696