[all-commits] [llvm/llvm-project] 2b93c9: [X86] AMD Zen 3 Scheduler Model

Roman Lebedev via All-commits all-commits at lists.llvm.org
Sat May 1 12:11:17 PDT 2021


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 2b93c9c16c586c26d20a5166c6ffbd71bc85b2e6
      https://github.com/llvm/llvm-project/commit/2b93c9c16c586c26d20a5166c6ffbd71bc85b2e6
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-05-01 (Sat, 01 May 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86.td
    M llvm/lib/Target/X86/X86PfmCounters.td
    A llvm/lib/Target/X86/X86ScheduleZnver3.td
    M llvm/test/CodeGen/X86/slow-unaligned-mem.ll
    M llvm/test/CodeGen/X86/x86-64-double-shifts-var.ll
    A llvm/test/tools/llvm-mca/X86/Znver3/partial-reg-update-2.s
    A llvm/test/tools/llvm-mca/X86/Znver3/partial-reg-update-3.s
    A llvm/test/tools/llvm-mca/X86/Znver3/partial-reg-update-4.s
    A llvm/test/tools/llvm-mca/X86/Znver3/partial-reg-update-5.s
    A llvm/test/tools/llvm-mca/X86/Znver3/partial-reg-update-6.s
    A llvm/test/tools/llvm-mca/X86/Znver3/partial-reg-update-7.s
    A llvm/test/tools/llvm-mca/X86/Znver3/partial-reg-update.s
    A llvm/test/tools/llvm-mca/X86/Znver3/resources-adx.s
    A llvm/test/tools/llvm-mca/X86/Znver3/resources-aes.s
    A llvm/test/tools/llvm-mca/X86/Znver3/resources-avx1.s
    A llvm/test/tools/llvm-mca/X86/Znver3/resources-avx2.s
    A llvm/test/tools/llvm-mca/X86/Znver3/resources-bmi1.s
    A llvm/test/tools/llvm-mca/X86/Znver3/resources-bmi2.s
    A llvm/test/tools/llvm-mca/X86/Znver3/resources-clflushopt.s
    A llvm/test/tools/llvm-mca/X86/Znver3/resources-clzero.s
    A llvm/test/tools/llvm-mca/X86/Znver3/resources-cmov.s
    A llvm/test/tools/llvm-mca/X86/Znver3/resources-cmpxchg.s
    A llvm/test/tools/llvm-mca/X86/Znver3/resources-f16c.s
    A llvm/test/tools/llvm-mca/X86/Znver3/resources-fma.s
    A llvm/test/tools/llvm-mca/X86/Znver3/resources-fsgsbase.s
    A llvm/test/tools/llvm-mca/X86/Znver3/resources-lea.s
    A llvm/test/tools/llvm-mca/X86/Znver3/resources-lzcnt.s
    A llvm/test/tools/llvm-mca/X86/Znver3/resources-mmx.s
    A llvm/test/tools/llvm-mca/X86/Znver3/resources-movbe.s
    A llvm/test/tools/llvm-mca/X86/Znver3/resources-mwaitx.s
    A llvm/test/tools/llvm-mca/X86/Znver3/resources-pclmul.s
    A llvm/test/tools/llvm-mca/X86/Znver3/resources-popcnt.s
    A llvm/test/tools/llvm-mca/X86/Znver3/resources-prefetchw.s
    A llvm/test/tools/llvm-mca/X86/Znver3/resources-rdrand.s
    A llvm/test/tools/llvm-mca/X86/Znver3/resources-rdseed.s
    A llvm/test/tools/llvm-mca/X86/Znver3/resources-sha.s
    A llvm/test/tools/llvm-mca/X86/Znver3/resources-sse1.s
    A llvm/test/tools/llvm-mca/X86/Znver3/resources-sse2.s
    A llvm/test/tools/llvm-mca/X86/Znver3/resources-sse3.s
    A llvm/test/tools/llvm-mca/X86/Znver3/resources-sse41.s
    A llvm/test/tools/llvm-mca/X86/Znver3/resources-sse42.s
    A llvm/test/tools/llvm-mca/X86/Znver3/resources-sse4a.s
    A llvm/test/tools/llvm-mca/X86/Znver3/resources-ssse3.s
    A llvm/test/tools/llvm-mca/X86/Znver3/resources-x86_32.s
    A llvm/test/tools/llvm-mca/X86/Znver3/resources-x86_64.s
    A llvm/test/tools/llvm-mca/X86/Znver3/resources-x87.s
    M llvm/test/tools/llvm-mca/X86/cpus.s
    M llvm/test/tools/llvm-mca/X86/in-order-cpu.s
    M llvm/test/tools/llvm-mca/X86/read-after-ld-1.s
    M llvm/test/tools/llvm-mca/X86/register-file-statistics.s
    M llvm/test/tools/llvm-mca/X86/scheduler-queue-usage.s

  Log Message:
  -----------
  [X86] AMD Zen 3 Scheduler Model

Introduce a basic schedule model for AMD Zen 3 CPUs, a.k.a. `znver3`.

This is fully built from scratch, from llvm-mca measurements
and documented reference materials.
Nothing was copied from `znver2`/`znver1`.

I believe this is in a reasonable state of completion for inclusion,
probably better than D52779 `bdver2` was :)

Namely:
* uops are pretty spot-on (at least what llvm-mca can measure)
  {F16422596}
* latency is also pretty spot-on (at least what llvm-mca can measure)
  {F16422601}
* throughput is within reason
  {F16422607}

I haven't run many benchmarks with this,
however RawSpeed benchmarks say this is beneficial:
{F16603978}
{F16604029}

I'll call out the obvious problems there:
* I didn't really bother with X87 instructions
* I didn't really bother with obviously-microcoded/system instructions
* There is a large discrepancy in throughput for `mr` and `rm` instructions.
  I'm not really sure if it's a modelling defect that needs to be fixed,
  or a defect of the measurements.
* Pipe distributions are probably bad :)
  I can't do much here until AMD allows that to be fixed
  by documenting the appropriate counters and updating libpfm

That being said, as @RKSimon notes:
>>! In D94395#2647381, @RKSimon wrote:
> I'll mention again that all the znver* models appear to be very inaccurate wrt SIMD/FPU instructions <...>
so how much worse could this possibly be?!

Things that aren't there:
* Various tunings: zero idioms, etc. Those are follow-ups.

Differential Revision: https://reviews.llvm.org/D94395



