[PATCH] D144010: [X86] AMD Znver4 (Genoa) Scheduler enablement

Thu Mar 2 05:35:23 PST 2023

RKSimon added a comment.

In D144010#4163853 <https://reviews.llvm.org/D144010#4163853>, @GGanesh wrote:

>> Don't you need to account for double pumping for ZMM? In the Jaguar model we'd typically double the resource usage to [2] to simulate it, so uops stays at 1.
>
> It is not really double pumping like that of previous AMD archs! The micro-ops are doubled; however, they enter the same pipeline one cycle at a time. So, in any given cycle, two different 512-bit instructions can be fed into the FP pipes (unlike the previously double-pumped archs). So, uops will be 2 for 512-bit instructions, but the other units in the FP pipeline are still available to pick up another 512-bit instruction in the same cycle.
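
A toy cycle count makes the behaviour described above concrete. This is only an illustrative sketch, not AMD's documented pipeline: it assumes two identical FP pipes can execute the operation (as implied by the 0.5 rthroughput for xmm/ymm), that each pipe accepts one uop per cycle, and that all uops of one instruction go down the same pipe on consecutive cycles. `cycles_to_issue` is a hypothetical helper.

```python
def cycles_to_issue(num_insns: int, uops_per_insn: int, num_pipes: int) -> int:
    """Toy model: cycles needed to issue `num_insns` independent instructions.

    Assumptions (hypothetical, for illustration only):
    - each pipe accepts one uop per cycle;
    - every uop of an instruction goes down the SAME pipe, back to back;
    - each instruction is steered to the pipe that becomes idle first.
    """
    pipe_free_at = [0] * num_pipes  # first cycle at which each pipe is idle
    for _ in range(num_insns):
        # Pick the earliest-idle pipe (ties go to the lowest index).
        pipe = min(range(num_pipes), key=lambda i: pipe_free_at[i])
        # The instruction's uops occupy that pipe on consecutive cycles,
        # while the other pipe stays free to start a different instruction.
        pipe_free_at[pipe] += uops_per_insn
    return max(pipe_free_at)


N = 1000
for width, uops in (("xmm/ymm", 1), ("zmm", 2)):
    print(width, "rthroughput ~", cycles_to_issue(N, uops, num_pipes=2) / N)
# xmm/ymm rthroughput ~ 0.5
# zmm rthroughput ~ 1.0
```

Even though two 512-bit instructions can start in the same cycle, each one holds its pipe for two cycles, so the steady-state rate is one zmm instruction per cycle.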

I agree that the resource usage in LLVM scheduler models doesn't fully match what actually happens on the hardware (we have similar problems with hardware having different concepts of microcoded instructions), but we do need to indicate that the rthroughput of, say, VADDPD zmm is 1.0 while xmm/ymm is 0.5. The easiest way to approximately model this is by doubling the resource count; the side effects are minimal in comparison (it doesn't affect latency / uop counts, it's just not very accurate about how it uses a resource group).
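
To see why doubling the resource count yields the desired numbers, here is a simplified Python sketch of the arithmetic, roughly following how LLVM derives reciprocal throughput from a sched class (for each resource used, cycles divided by the group's unit count, keeping the most constrained). `ProcResource`, `reciprocal_throughput` and the two-unit `FPAddPipes` group are hypothetical stand-ins, not the actual Znver4 model definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcResource:
    name: str       # hypothetical resource-group name
    num_units: int  # number of identical pipes in the group


def reciprocal_throughput(uses: list[tuple[ProcResource, int]]) -> float:
    """Approximate rthroughput as the largest cycles / num_units ratio over
    every resource the instruction consumes. Latency and uop count are not
    inputs, so changing the resource cycles leaves them untouched."""
    ratios = [cycles / res.num_units for res, cycles in uses if cycles > 0]
    if not ratios:
        raise ValueError("instruction consumes no resources")
    return max(ratios)


fp_pipes = ProcResource("FPAddPipes", num_units=2)  # hypothetical 2-pipe group
print(reciprocal_throughput([(fp_pipes, 1)]))  # xmm/ymm: 0.5
print(reciprocal_throughput([(fp_pipes, 2)]))  # zmm, resource count doubled: 1.0
```

The doubled entry only changes how long the group is considered busy, which is the "not very accurate on how it uses a resource group" trade-off; the reported rthroughput moves from 0.5 to 1.0 as intended.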

> @RKSimon, we would like to have this patch as part of the 16.0 release and would like to know your thoughts on that. Revisions to the model can follow after the release!

Apart from the 512-bit throughput underestimations, I'm happy for the patch to get in for 16.0.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D144010