[PATCH] D119696: [AMDGPU] Improve v_cmpx usage on GFX10.3.
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sat Feb 19 06:40:47 PST 2022
arsenm added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp:571
+ // After all s_op_saveexec instructions are inserted,
+ // replace (on GFX10.3)
+ // v_cmp_* SGPR, IMM, VGPR
----------------
foad wrote:
> tsymalla wrote:
> > arsenm wrote:
> > > nhaehnle wrote:
> > > > gfx10.3 and later.
> > > Why only do this on gfx10.3? Every target has v_cmpx?
> > We decided to do so because it is unclear whether this gives any performance advantage before GFX10, and on GFX10.1 / GFX10.2 an additional s_waitcnt_depctr has to be inserted for correctness, so there probably won't be any performance gain on those targets.
> >
> > @foad might elaborate more details for this.
> I don't have much to add. The performance of v_cmp followed by s_and_saveexec is particularly bad on GFX10 because of instruction counter stalls. So yes, we probably could do this pre-GFX10, but I don't know if there's any reason to.
>
> For the bugs in GFX10.1 and GFX10.2 see GCNHazardRecognizer::fixVcmpxExecWARHazard. Having to insert S_WAITCNT_DEPCTR probably negates any advantage you get from using v_cmpx in the first place.
Should explain this in the comment.
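The following is a simplified, non-authoritative sketch of the reasoning in this thread. It is not the patch's implementation. It models two things. The first is the subtarget gate: leave code alone before GFX10, skip GFX10.1/10.2 because of the extra S_WAITCNT_DEPCTR, and use v_cmpx on GFX10.3 and later. The second is why replacing `v_cmp_* SGPR, IMM, VGPR` + `s_and_saveexec` with `s_mov_b32 DEST, exec_lo` + `v_cmpx_*` leaves EXEC and the saved mask unchanged.

The sketch rests on several assumptions. It assumes wave32, so the saveexec is `s_and_saveexec_b32`. It assumes v_cmpx writes only EXEC, with inactive lanes yielding 0, and that the v_cmp SGPR result has no other users. It ignores SCC. All names (`Gen`, `shouldUseVCmpx`, `WaveState`, `compareMask`, `runVCmpSaveexec`, `runMovVCmpx`) are hypothetical, and "greater than" is only an example predicate.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical generation enum; NOT LLVM's GCNSubtarget API.
enum class Gen { PreGFX10, GFX10_1, GFX10_2, GFX10_3, Later };

// Gate mirroring the review discussion:
//  - pre-GFX10: unclear whether v_cmpx is any faster, so don't bother;
//  - GFX10.1 / GFX10.2: v_cmpx needs an extra S_WAITCNT_DEPCTR (see
//    GCNHazardRecognizer::fixVcmpxExecWARHazard), likely negating the gain;
//  - GFX10.3 and later: avoid the v_cmp + s_and_saveexec stalls.
bool shouldUseVCmpx(Gen g) { return g == Gen::GFX10_3 || g == Gen::Later; }

// Simplified wave32 state: one bit per lane. SCC is deliberately ignored.
struct WaveState {
  uint32_t exec;     // exec_lo
  uint32_t sgprCmp;  // SGPR written by v_cmp_* (unused after the rewrite)
  uint32_t saved;    // EXEC_SGPR_DEST
};

using Vgpr = std::array<int32_t, 32>;

// Per-lane "IMM > VGPR[lane]" for active lanes; inactive lanes produce 0.
uint32_t compareMask(int32_t imm, const Vgpr &vgpr, uint32_t exec) {
  uint32_t mask = 0;
  for (int lane = 0; lane < 32; ++lane)
    if (((exec >> lane) & 1u) && imm > vgpr[lane])
      mask |= 1u << lane;
  return mask;
}

// Original: v_cmp_* SGPR, IMM, VGPR ; s_and_saveexec_b32 DEST, SGPR
void runVCmpSaveexec(WaveState &s, int32_t imm, const Vgpr &vgpr) {
  s.sgprCmp = compareMask(imm, vgpr, s.exec);
  s.saved = s.exec;       // DEST = old EXEC
  s.exec &= s.sgprCmp;    // EXEC = old EXEC & compare result
}

// Rewritten: s_mov_b32 DEST, exec_lo ; v_cmpx_* IMM, VGPR
void runMovVCmpx(WaveState &s, int32_t imm, const Vgpr &vgpr) {
  s.saved = s.exec;                          // DEST = old EXEC
  s.exec = compareMask(imm, vgpr, s.exec);   // v_cmpx writes EXEC only
}
```

For example, take VGPR lane values 0..31, IMM = 10 and an initial exec_lo of 0xF0F0F0F0. Both sequences then leave EXEC = 0xF0 (lanes 4-7) and save 0xF0F0F0F0. The difference is that the rewritten form never writes the intermediate SGPR. That is why, under these assumptions, the rewrite is only valid when that SGPR has no other uses.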
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D119696