[PATCH] D68266: [MCA][LSUnit] Track loads and stores until retirement.

Fri Oct 4 03:52:42 PDT 2019

andreadb marked 2 inline comments as done.
andreadb added a comment.

Thanks Roman,

================
Comment at: include/llvm/MCA/HardwareUnits/LSUnit.h:298-299
+  // Stores are tracked by the STQ (store queue) from dispatch until commitment.
+  // By default we conservatively assume that the LDQ receives a load at
+  // dispatch. Loads leave the LDQ at retirement stage.
+  virtual void onInstructionRetired(const InstRef &IR);
----------------
lebedev.ri wrote:
> > // By default we conservatively assume that the LDQ receives a load at dispatch.
> 
> I think this may explain some of the weird throughput numbers i was seeing
> for load-folded instructions. (as compared with llvm-exegesis measurements)
> Is there a bug that tracks this? I wonder if the correct choice would be
> to make it wait for L1 latency here.
It would be interesting to see what code is compiled and run by exegesis to obtain the latency/throughput of those load folded instructions. Not knowing what kernel is run by exegesis makes it hard for me to understand your last comment. Could you please post an example in PR39830 (or raise a separate bug)? That would be very useful. Thanks.

================
Comment at: include/llvm/MCA/HardwareUnits/LSUnit.h:298-299
+  // Stores are tracked by the STQ (store queue) from dispatch until commitment.
+  // By default we conservatively assume that the LDQ receives a load at
+  // dispatch. Loads leave the LDQ at retirement stage.
+  virtual void onInstructionRetired(const InstRef &IR);
----------------
andreadb wrote:
> lebedev.ri wrote:
> > > // By default we conservatively assume that the LDQ receives a load at dispatch.
> > 
> > I think this may explain some of the weird throughput numbers i was seeing
> > for load-folded instructions. (as compared with llvm-exegesis measurements)
> > Is there a bug that tracks this? I wonder if the correct choice would be
> > to make it wait for L1 latency here.
> It would be interesting to see what code is compiled and run by exegesis to obtain the latency/throughput of those load folded instructions. Not knowing what kernel is run by exegesis makes it hard for me to understand your last comment. Could you please post an example in PR39830 (or raise a separate bug)? That would be very useful. Thanks.
To clarify this further: the LDQ does receive load opcodes at dispatch.
The 'conservative assumption' here is that loads leave the LDQ at retirement rather than at the end of execution.
Stores are always tracked by the STQ from dispatch until retirement.
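
The difference between the two load-release points can be illustrated with a toy occupancy model. This is a hypothetical sketch, not the llvm::mca::LSUnit interface: the class SimpleLSQueues, the LoadReleasePolicy enum and the callbacks below are invented for illustration. Loads take an LDQ entry at dispatch and stores take an STQ entry at dispatch; stores always free their entry at retirement, while loads free theirs either at the end of execution or, as in the conservative default described above, at retirement.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical policy knob: when does a load release its LDQ entry?
enum class LoadReleasePolicy { AtExecuted, AtRetired };

// Hypothetical, simplified model of LDQ/STQ occupancy (not the MCA API).
class SimpleLSQueues {
public:
  SimpleLSQueues(std::size_t NumLQ, std::size_t NumSQ, LoadReleasePolicy P)
      : LQSize(NumLQ), SQSize(NumSQ), Policy(P) {}

  // A load needs a free LDQ entry; a store needs a free STQ entry.
  bool canDispatch(bool IsLoad, bool IsStore) const {
    if (IsLoad && UsedLQ == LQSize)
      return false;
    if (IsStore && UsedSQ == SQSize)
      return false;
    return true;
  }

  // Entries are allocated at dispatch.
  void onDispatch(bool IsLoad, bool IsStore) {
    assert(canDispatch(IsLoad, IsStore) && "queue full");
    if (IsLoad)
      ++UsedLQ;
    if (IsStore)
      ++UsedSQ;
  }

  // Under the less conservative policy, a load leaves the LDQ here.
  void onExecuted(bool IsLoad) {
    if (IsLoad && Policy == LoadReleasePolicy::AtExecuted) {
      assert(UsedLQ > 0);
      --UsedLQ;
    }
  }

  // Stores always leave the STQ at retirement; loads do too under the
  // conservative policy.
  void onRetired(bool IsLoad, bool IsStore) {
    if (IsLoad && Policy == LoadReleasePolicy::AtRetired) {
      assert(UsedLQ > 0);
      --UsedLQ;
    }
    if (IsStore) {
      assert(UsedSQ > 0);
      --UsedSQ;
    }
  }

private:
  std::size_t LQSize, SQSize;
  LoadReleasePolicy Policy;
  std::size_t UsedLQ = 0, UsedSQ = 0;
};

// Hypothetical scenario: a 2-entry LDQ holds two loads that have finished
// executing but have not retired. Can a third load be dispatched?
bool thirdLoadCanDispatch(LoadReleasePolicy P) {
  SimpleLSQueues Q(/*NumLQ=*/2, /*NumSQ=*/2, P);
  Q.onDispatch(/*IsLoad=*/true, /*IsStore=*/false);
  Q.onDispatch(/*IsLoad=*/true, /*IsStore=*/false);
  Q.onExecuted(/*IsLoad=*/true);
  Q.onExecuted(/*IsLoad=*/true);
  return Q.canDispatch(/*IsLoad=*/true, /*IsStore=*/false);
}
```

In this toy model, thirdLoadCanDispatch returns false under AtRetired, because both LDQ entries are still held by loads that have executed but not yet retired, and true under AtExecuted. Assuming the LDQ is the bottleneck, this is one way the conservative policy can lower simulated throughput compared to releasing loads at the end of execution.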


https://reviews.llvm.org/D68266