[PATCH] D104730: [MCA] [AMDGPU] Adding CustomBehaviour implementation for AMDGPU.

Patrick Holland via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jun 24 10:31:12 PDT 2021


holland11 added a comment.

> ... could you change this to let SchedModel = SIFullSpeedModel, RetireOOO = 1 and the same for all the other let SchedModel = ... lines? Would that still work? I think that would make it slightly easier for the next person who cut'n'pastes one of these schedmodels to create a new one, to get it right.

Unfortunately, this does not build. The `RetireOOO` flag can only be applied to scheduling classes, so its `let` block can't contain any `InstRW` expressions.
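
To illustrate the problem, here is a minimal sketch using hypothetical stand-in classes (`WriteResLike` and `InstRWLike` are made up for this example and are not the real LLVM definitions). As I understand it, a `let` block applies its override to every record defined inside it, so a record whose class has no `RetireOOO` field gets rejected:

```tablegen
// Hypothetical stand-ins for illustration only; not the real LLVM classes.
class WriteResLike {
  string SchedModel = "";
  bit RetireOOO = 0;      // only this class carries the flag
}
class InstRWLike {
  string SchedModel = ""; // no RetireOOO field here
}

// Combined form (assumed to fail): the let covers B, which has no
// RetireOOO field to override.
// let SchedModel = "Demo", RetireOOO = 1 in {
//   def A : WriteResLike;
//   def B : InstRWLike;
// }

// Nested form: RetireOOO is scoped to the records that have the field.
let SchedModel = "Demo" in {
  let RetireOOO = 1 in {
    def A : WriteResLike;
  }
  def B : InstRWLike;
}
```

This should parse with `llvm-tblgen` as written, and uncommenting the combined form should produce an error for `B`.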

I understand and agree with your point, though. Do you have any other ideas for achieving something similar? I'm a rookie when it comes to TableGen, so I can't think of a better way to do it than what I'm doing right now.

I could maybe format it a bit better, for example by moving the flag to the top of the block so it's more obvious:

  let SchedModel = GFX10SpeedModel in {
  let RetireOOO = 1 in { // llvm-mca specific flag
  
  // The latency values are 1 / (operations / cycle).
  // Add 1 stall cycle for VGPR read.
  def : HWWriteRes<Write32Bit,         [HWVALU, HWRC],   5>;
  def : HWWriteRes<WriteFloatCvt,      [HWVALU, HWRC],   5>;
  def : HWWriteRes<Write64Bit,         [HWVALU, HWRC],   6>;
  def : HWWriteRes<WriteTrans32,       [HWTransVALU, HWRC], 10>;
  def : HWWriteRes<WriteQuarterRate32, [HWVALU, HWRC],   8>;
  def : HWWriteRes<WriteFloatFMA,      [HWVALU, HWRC],   5>;
  def : HWWriteRes<WriteDouble,        [HWVALU, HWRC],   22>;
  def : HWWriteRes<WriteDoubleAdd,     [HWVALU, HWRC],   22>;
  def : HWWriteRes<WriteDoubleCvt,     [HWVALU, HWRC],   22>;
  def : HWWriteRes<WriteIntMul,        [HWVALU, HWRC],   8>;
  def : HWWriteRes<WriteTrans64,       [HWVALU, HWTransVALU, HWRC], 24>;
  
  def : HWWriteRes<WriteBranch,        [HWBranch],       32>;
  def : HWWriteRes<WriteExport,        [HWExport, HWRC], 16>;
  def : HWWriteRes<WriteLDS,           [HWLGKM,   HWRC], 20>;
  def : HWWriteRes<WriteSALU,          [HWSALU,   HWRC], 2>;
  def : HWWriteRes<WriteSMEM,          [HWLGKM,   HWRC], 20>;
  def : HWWriteRes<WriteVMEM,          [HWVMEM,   HWRC], 320>;
  def : HWWriteRes<WriteBarrier,       [HWBranch],       2000>;
  } // End RetireOOO = 1 (Can't be applied to InstRW expressions)
  
  def : InstRW<[WriteCopy], (instrs COPY)>;
  
  }  // End SchedModel = GFX10SpeedModel

Let me know if you want me to make this change, or if you have any other ideas.

> Thanks. The RetireOOO output looks much better. Independent VALU instructions can definitely execute while there are outstanding loads. It would be great to add a new mca test case that shows this effect directly, if you wouldn't mind.

I will update the current tests with your new `timeline-max-cycles` change and I will also add 1-2 new tests.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104730/new/

https://reviews.llvm.org/D104730


