[PATCH] D104730: [MCA] [AMDGPU] Adding CustomBehaviour implementation for AMDGPU.

Thu Jun 24 02:01:36 PDT 2021

foad added a comment.

> AFAIU, many of the design decision within mca are made with the idea that mca could potentially end up being used in a backend pass one day. I can remove the pseudo cases if you'd like, but in the hypothetical future where mca gets used in an AMDGPU backend pass, you'd probably want to add them back in.

OK, it's fine to leave the Pseudos in this patch. It's only a pain when we have long lists of opcodes, and hopefully in future most of those can be replaced by checking some flags (possibly new flags that don't exist yet).

>> Hmm, I wonder if you could examine SIInstrFlags::LGKM_CNT instead of having to list these opcodes explicitly?
>
> I can make that change if you'd like, but those specific opcodes are checked within the SIInsertWaitcnts::updateEventWaitcntAfter() function which all of this is based on.

In that case it's fine to leave it as-is. Any future cleanup can be done in both places.

> Added an example to the end of the original post's body to show demonstrate the effects that the global RetireOOO flag has.

Thanks. The RetireOOO output looks //much// better. Independent VALU instructions can definitely execute while there are outstanding loads. It would be great to add a new mca test case that shows this effect directly, if you wouldn't mind.

================
Comment at: llvm/lib/Target/AMDGPU/SISchedule.td:140

+  let RetireOOO = 1 in { // llvm-mca specific flag
   def : HWWriteRes<WriteBranch,  [HWBranch], 8>;
----------------
Nit: it's a shame that this adds 10 lines to the patch instead of just 1. As an alternative...

================
Comment at: llvm/lib/Target/AMDGPU/SISchedule.td:183

 let SchedModel = SIFullSpeedModel in {

----------------
... could you change this to `let SchedModel = SIFullSpeedModel, RetireOOO = 1` and the same for all the other `let SchedModel = ...` lines? Would that still work? I think that would make it slightly easier for the next person who cut'n'pastes one of these schedmodels to create a new one, to get it right.

================
Comment at: llvm/test/tools/llvm-mca/AMDGPU/gfx10-double.s:44
 # CHECK-NEXT: Instructions:      28
-# CHECK-NEXT: Total Cycles:      224
+# CHECK-NEXT: Total Cycles:      205
 # CHECK-NEXT: Total uOps:        29
----------------
I'm surprised that there are no related changes to the tables below. Is this a case where the timeline view would have changed, except that it gets truncated due to some archaic notion of screen width? :-)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104730/new/

https://reviews.llvm.org/D104730