[PATCH] D123578: [RISCV] Add sched to pseudo function call instructions

Thu Apr 21 23:19:15 PDT 2022

pcwang-thead added inline comments.

================
Comment at: llvm/lib/Target/RISCV/RISCVInstrInfo.td:1282
     mayStore = 0, mayLoad = 0 in
-def PseudoCALLReg : Pseudo<(outs GPR:$rd), (ins call_symbol:$func), []> {
+def PseudoCALLReg : Pseudo<(outs GPR:$rd), (ins call_symbol:$func), []>, Sched<[WriteIALU, WriteJalr, ReadJalr]> {
   let AsmString = "call\t$rd, $func";
----------------
andreadb wrote:
> pcwang-thead wrote:
> > craig.topper wrote:
> > > pcwang-thead wrote:
> > > > craig.topper wrote:
> > > > > craig.topper wrote:
> > > > > > Don't exceed 80 characters per line.
> > > > > I'm not sure it's correct to have more than one Write scheduling class with only a single output register. Each Write is supposed to correspond to the latency of the register.
> > > > According to comments of `Sched` class in `llvm/include/llvm/Target/TargetSchedule.td`:
> > > > 
> > > > > One SchedWrite type must be listed for each explicit def operand in order. Additional SchedWrite types may optionally be listed for implicit def operands.
> > > > 
> > > > It is legal to add more Writes for implicit operands and I see a lot of definitions like this in other targets. Example in AArch64:
> > > > ```
> > > > def TLSDESC_CALLSEQ
> > > >     : Pseudo<(outs), (ins i64imm:$sym),
> > > >              [(AArch64tlsdesc_callseq tglobaltlsaddr:$sym)]>,
> > > >       Sched<[WriteI, WriteLD, WriteI, WriteBrReg]>;
> > > > ```
> > > > 
> > > > 
> > > > 
> > > My understanding of "implicit def operands" would be physical registers listed in `let Defs = []` syntax.
> > > 
> > > @andreadb is this the right thing to do?
> > I'm not for sure. Here is an example in `llvm/lib/Target/AArch64/AArch64InstrInfo.td`:
> > ```
> > let isCall = 1, Defs = [LR, X0, X1], hasSideEffects = 1, Size = 16,
> >     isCodeGenOnly = 1 in
> > def TLSDESC_CALLSEQ
> >     : Pseudo<(outs), (ins i64imm:$sym),
> >              [(AArch64tlsdesc_callseq tglobaltlsaddr:$sym)]>,
> >       Sched<[WriteI, WriteLD, WriteI, WriteBrReg]>;
> > ```
> > Defs are 3 registers ([LR, X0, X1]) and there is no output register, while there are 4 writes.
> > 
> Sorry for the late reply (I am just back from holiday).
> 
> > My understanding of "implicit def operands" would be physical registers listed in `let Defs = []` syntax.
> > 
> > @andreadb is this the right thing to do?
> 
> Your understanding is correct.
> 
> Each write is associated with a specific register definition. The sequence of writes matters, and implicit definitions always follow explicit definitions. Implicit definitions come from the `let Defs = [...]` directive.
> 
> To answer to your question:
> 
> It is legal to declare a scheduling class with more writes than register definitions.
> The subtarget emitter doesn't consider it a user error, and a valid scheduling class would be generated for it.
> 
> The generated scheduling class would declare a set of MCWriteProcResEntry entries which also includes contributions from the extra write(s). However, that class would declare more MCWriteLatencyEntry than register definitions.
> 
> Scheduling algorithms would still work fine because method MCSchedModel::computeInstrLatency always iterates over the fill set of MCWriteLatencyEntry when computing the "max" instruction latency.
> 
> Method `TargetSchedModel::computeOperandLatency` (lib/CodeGen/TargetSchedule.cpp) can be used by scheduling algorithms to query the latency of specific register definition. That method would still work fine (simply because it would never be queried for an invalid (e.g. off-by-one) `DefOperIdx` value.
> 
> In conclusion: resource consumption related to extra writes would be added to the set of processor resources. Extra writes can still affect the total instruction latency. However, scheduling algorithms won't be able to associate those to any particular register operand.
> 
> Ideally, if possible, it would be better to have a matching number of writes. That's just my opinion though.
> 
> --
> 
> The TLSDESC_CALLSEQ pseudo that Wang mentioned in his last comment is associated with a scheduling class descriptor named `WriteI_WriteLD_WriteI_WriteBrReg` (whose definition can be found in AArch64GenSubtargetInfo.inc). 
> That scheduling class reports the right number of MCWriteProcResEntry (it also includes contributions from the "extra" write, i.e. WriteBrReg).
> The "problem" is that it declares 4 MCWriteLatencyEntry, but the instruction only specifies 3 (implicit) register writes.
> So, the fourth write latency cannot be attributed to any register operand in particular.
> 
> I hope it helps.
Thank you for your detailed explanation! I have learnt a lot from it!

So such definitions in this patch and other targets won't cause errors and scheduler will work fine, the only 'problem' is that we can't associate extra writes to any specific register. But I think it is not a problem in this patch, because all calls are scheduling boundaries (sees `isSchedBoundary` in `CodeGen/MachineScheduler.cpp`) and `llvm-mca` simply sets max latency of calls to 100 cycles.

Correct me if I am wrong, this patch makes no difference to scheduler, just makes `llvm-mca` happy. :)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123578/new/

https://reviews.llvm.org/D123578