[PATCH] D141728: [AMDGPU] Tune scheduler on GFX10 and GFX11

Stanislav Mekhanoshin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Jan 13 14:17:08 PST 2023


rampitec created this revision.
rampitec added reviewers: kerbowa, foad.
Herald added subscribers: kosarev, StephenFan, wenlei, asbirlea, arphaman, hiraditya, tpr, dstuttard, yaxunl, jvesely, kzhuravl, arsenm, MatzeB.
Herald added a project: All.
rampitec requested review of this revision.
Herald added a subscriber: wdng.
Herald added a project: LLVM.

Unlike older ASICs, GFX10+ targets have a lot of VGPRs. Therefore, it is
possible to achieve high occupancy even with all or almost all
addressable VGPRs used. Our scheduler was never tuned for this scenario.
The VGPR Critical Limit threshold always comes out very high, even if
maximum occupancy is targeted. For example, on gfx1100 it is set to 192
registers even with a requested occupancy of 16. As a result, the
scheduler starts prioritizing register pressure reduction very late and
we easily end up spilling.

This patch makes scheduling on new targets much closer to GFX9. The
value of the VGPR critical limit is now based on the number of
addressable registers and not on the total VGPR budget.
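
The difference between the two bases can be sketched roughly as follows.
This is only an illustration of one plausible reading of the description
above, not the code in GCNSchedStrategy.cpp: the helper names
(alignDownTo, criticalLimit), the allocation granule, and the idea of
dividing the pool by the target occupancy are all assumptions made for
the example.

```cpp
#include <algorithm>
#include <cassert>

// Round Value down to a multiple of Granule (hypothetical helper).
unsigned alignDownTo(unsigned Value, unsigned Granule) {
  assert(Granule > 0 && "granule must be non-zero");
  return Value / Granule * Granule;
}

// Hypothetical sketch: derive a critical limit as the per-wave share of a
// register pool at the requested occupancy. The share is rounded down to
// the allocation granule, kept at one granule or more, and capped by the
// excess limit. PoolSize is either a total VGPR budget or the number of
// addressable VGPRs, depending on which scheme is being modelled.
unsigned criticalLimit(unsigned PoolSize, unsigned TargetOccupancy,
                       unsigned Granule, unsigned ExcessLimit) {
  assert(TargetOccupancy > 0 && "occupancy must be non-zero");
  unsigned Budget =
      std::max(alignDownTo(PoolSize / TargetOccupancy, Granule), Granule);
  return std::min(Budget, ExcessLimit);
}
```

With made-up numbers (not real hardware parameters), a pool of 1024
registers at occupancy 16 and granule 8 gives a limit of 64, while using
256 addressable registers as the pool gives 16, so pressure reduction
would kick in much earlier under the second scheme.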

The intent of the patch is to have no impact on GFX9 and older targets;
a massive lit test update shows no changes on these.


https://reviews.llvm.org/D141728

Files:
  llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
  llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/combine-fma-add-ext-fma.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/combine-fma-add-fma-mul.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/combine-fma-sub-ext-neg-mul.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement.i128.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement.i16.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement.i8.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/fdiv.f16.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/fdiv.f32.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/fdiv.f64.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/fmed3.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement.i16.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement.i8.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement.large.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.atomic.inc.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.gather4.o.dim.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.intersect_ray.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.sdot4.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.udot4.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/load-local.128.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/load-local.96.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/mul-known-bits.i64.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/mul.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/saddsat.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/sdivrem.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/sext_inreg.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/shl.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/ssubsat.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/store-local.128.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/store-local.96.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/uaddsat.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/udivrem.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/usubsat.ll
  llvm/test/CodeGen/AMDGPU/atomic_optimizations_global_pointer.ll
  llvm/test/CodeGen/AMDGPU/atomic_optimizations_local_pointer.ll
  llvm/test/CodeGen/AMDGPU/bf16.ll
  llvm/test/CodeGen/AMDGPU/bug-sdag-emitcopyfromreg.ll
  llvm/test/CodeGen/AMDGPU/chain-hi-to-lo.ll
  llvm/test/CodeGen/AMDGPU/cluster_stores.ll
  llvm/test/CodeGen/AMDGPU/cvt_f32_ubyte.ll
  llvm/test/CodeGen/AMDGPU/dagcombine-fma-fmad.ll
  llvm/test/CodeGen/AMDGPU/ds-sub-offset.ll
  llvm/test/CodeGen/AMDGPU/fdiv.ll
  llvm/test/CodeGen/AMDGPU/fshl.ll
  llvm/test/CodeGen/AMDGPU/fshr.ll
  llvm/test/CodeGen/AMDGPU/gfx-callable-argument-types.ll
  llvm/test/CodeGen/AMDGPU/idiv-licm.ll
  llvm/test/CodeGen/AMDGPU/idot4s.ll
  llvm/test/CodeGen/AMDGPU/idot4u.ll
  llvm/test/CodeGen/AMDGPU/idot8s.ll
  llvm/test/CodeGen/AMDGPU/idot8u.ll
  llvm/test/CodeGen/AMDGPU/insert_vector_elt.v2i16.ll
  llvm/test/CodeGen/AMDGPU/lds-atomic-fmin-fmax.ll
  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.a16.dim.ll
  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.g16.a16.dim.ll
  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.buffer.load.ll
  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.struct.buffer.load.format.v3f16.ll
  llvm/test/CodeGen/AMDGPU/llvm.is.fpclass.ll
  llvm/test/CodeGen/AMDGPU/llvm.mulo.ll
  llvm/test/CodeGen/AMDGPU/load-local.128.ll
  llvm/test/CodeGen/AMDGPU/load-local.96.ll
  llvm/test/CodeGen/AMDGPU/memcpy-scoped-aa.ll
  llvm/test/CodeGen/AMDGPU/memory_clause.ll
  llvm/test/CodeGen/AMDGPU/mubuf-legalize-operands.ll
  llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm.ll
  llvm/test/CodeGen/AMDGPU/saddo.ll
  llvm/test/CodeGen/AMDGPU/saddsat.ll
  llvm/test/CodeGen/AMDGPU/schedule-regpressure-limit3.ll
  llvm/test/CodeGen/AMDGPU/scratch-simple.ll
  llvm/test/CodeGen/AMDGPU/smrd.ll
  llvm/test/CodeGen/AMDGPU/splitkit-getsubrangeformask.ll
  llvm/test/CodeGen/AMDGPU/ssubsat.ll
  llvm/test/CodeGen/AMDGPU/stack-pointer-offset-relative-frameindex.ll
  llvm/test/CodeGen/AMDGPU/store-local.128.ll
  llvm/test/CodeGen/AMDGPU/store-local.96.ll
  llvm/test/CodeGen/AMDGPU/strict_fadd.f16.ll
  llvm/test/CodeGen/AMDGPU/strict_fma.f16.ll
  llvm/test/CodeGen/AMDGPU/strict_fmul.f16.ll
  llvm/test/CodeGen/AMDGPU/strict_fsub.f16.ll
  llvm/test/CodeGen/AMDGPU/uaddsat.ll
  llvm/test/CodeGen/AMDGPU/usubsat.ll
  llvm/test/CodeGen/AMDGPU/vector_shuffle.packed.ll
  llvm/test/CodeGen/AMDGPU/vgpr-liverange.ll


