[all-commits] [llvm/llvm-project] 84610a: AMDGPU: Add AMDGPUSubtarget::getEUsPerCU()

Mon Jan 23 12:43:27 PST 2023

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 84610a82a1aa9bac828569560818b919951039e1
      https://github.com/llvm/llvm-project/commit/84610a82a1aa9bac828569560818b919951039e1
  Author: Nicolai Hähnle <nicolai.haehnle at amd.com>
  Date:   2023-01-23 (Mon, 23 Jan 2023)

  Changed paths:
    M llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
    M llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h

  Log Message:
  -----------
  AMDGPU: Add AMDGPUSubtarget::getEUsPerCU()

We will use this for more accurate occupancy computations. Note that
IsaInfo takes WGP mode vs. CU mode into account on gfx10+.

Differential Revision: https://reviews.llvm.org/D139467

  Commit: 07ed1d631c097aea6f5cc9d9c9b52d2617dea670
      https://github.com/llvm/llvm-project/commit/07ed1d631c097aea6f5cc9d9c9b52d2617dea670
  Author: Nicolai Hähnle <nicolai.haehnle at amd.com>
  Date:   2023-01-23 (Mon, 23 Jan 2023)

  Changed paths:
    M llvm/test/CodeGen/AMDGPU/dbg-value-ends-sched-region.mir
    M llvm/test/CodeGen/AMDGPU/debug-value-scheduler-crash.mir
    M llvm/test/CodeGen/AMDGPU/licm-regpressure.mir
    M llvm/test/CodeGen/AMDGPU/machine-scheduler-sink-trivial-remats.mir

  Log Message:
  -----------
  AMDGPU: Re-run UTC scripts on some test cases

Reduce the diff of subsequent changes.

  Commit: 0775f21b62a5fe6422c3d021685a779e647bf8d1
      https://github.com/llvm/llvm-project/commit/0775f21b62a5fe6422c3d021685a779e647bf8d1
  Author: Nicolai Hähnle <nicolai.haehnle at amd.com>
  Date:   2023-01-23 (Mon, 23 Jan 2023)

  Changed paths:
    A llvm/test/CodeGen/AMDGPU/schedule-regpressure-lds.ll

  Log Message:
  -----------
  AMDGPU: Add a scheduler test to demonstrate an upcoming change

  Commit: 10cef708a7ccaf69c18be86460583f9b62ee3c29
      https://github.com/llvm/llvm-project/commit/10cef708a7ccaf69c18be86460583f9b62ee3c29
  Author: Nicolai Hähnle <nicolai.haehnle at amd.com>
  Date:   2023-01-23 (Mon, 23 Jan 2023)

  Changed paths:
    M llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
    M llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
    M llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
    M llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h
    M llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
    M llvm/lib/Target/AMDGPU/R600Subtarget.cpp
    M llvm/lib/Target/AMDGPU/SIISelLowering.cpp
    M llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
    M llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
    M llvm/test/CodeGen/AMDGPU/GlobalISel/add.vni16.ll
    M llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement.i128.ll
    M llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement.ll
    M llvm/test/CodeGen/AMDGPU/GlobalISel/fdiv.f64.ll
    M llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll
    M llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll
    M llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement.ll
    M llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.memcpy.ll
    M llvm/test/CodeGen/AMDGPU/GlobalISel/mul.ll
    M llvm/test/CodeGen/AMDGPU/GlobalISel/saddsat.ll
    M llvm/test/CodeGen/AMDGPU/GlobalISel/sdiv.i64.ll
    M llvm/test/CodeGen/AMDGPU/GlobalISel/sdivrem.ll
    M llvm/test/CodeGen/AMDGPU/GlobalISel/srem.i64.ll
    M llvm/test/CodeGen/AMDGPU/GlobalISel/ssubsat.ll
    M llvm/test/CodeGen/AMDGPU/GlobalISel/udiv.i64.ll
    M llvm/test/CodeGen/AMDGPU/GlobalISel/udivrem.ll
    M llvm/test/CodeGen/AMDGPU/GlobalISel/urem.i64.ll
    M llvm/test/CodeGen/AMDGPU/GlobalISel/usubsat.ll
    M llvm/test/CodeGen/AMDGPU/amdhsa-trap-num-sgprs.ll
    M llvm/test/CodeGen/AMDGPU/bf16.ll
    M llvm/test/CodeGen/AMDGPU/branch-relax-spill.ll
    M llvm/test/CodeGen/AMDGPU/dbg-value-ends-sched-region.mir
    M llvm/test/CodeGen/AMDGPU/debug-value-scheduler-crash.mir
    M llvm/test/CodeGen/AMDGPU/exec-mask-opt-cannot-create-empty-or-backward-segment.ll
    M llvm/test/CodeGen/AMDGPU/half.ll
    M llvm/test/CodeGen/AMDGPU/idot8s.ll
    M llvm/test/CodeGen/AMDGPU/insert_vector_dynelt.ll
    M llvm/test/CodeGen/AMDGPU/insert_vector_elt.v2i16.ll
    M llvm/test/CodeGen/AMDGPU/licm-regpressure.mir
    M llvm/test/CodeGen/AMDGPU/llvm.round.f64.ll
    M llvm/test/CodeGen/AMDGPU/load-constant-i16.ll
    M llvm/test/CodeGen/AMDGPU/load-global-i16.ll
    M llvm/test/CodeGen/AMDGPU/machine-scheduler-sink-trivial-remats-debug.mir
    M llvm/test/CodeGen/AMDGPU/machine-scheduler-sink-trivial-remats.mir
    M llvm/test/CodeGen/AMDGPU/memory_clause.mir
    M llvm/test/CodeGen/AMDGPU/mul24-pass-ordering.ll
    M llvm/test/CodeGen/AMDGPU/occupancy-levels.ll
    M llvm/test/CodeGen/AMDGPU/pr51516.mir
    M llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm.ll
    M llvm/test/CodeGen/AMDGPU/resource-optimization-remarks.ll
    M llvm/test/CodeGen/AMDGPU/sched-handleMoveUp-subreg-def-across-subreg-def.mir
    M llvm/test/CodeGen/AMDGPU/schedule-barrier.mir
    M llvm/test/CodeGen/AMDGPU/schedule-regpressure-lds.ll
    M llvm/test/CodeGen/AMDGPU/sdiv.ll
    M llvm/test/CodeGen/AMDGPU/sdiv64.ll
    M llvm/test/CodeGen/AMDGPU/shift-i128.ll
    M llvm/test/CodeGen/AMDGPU/shl.ll
    M llvm/test/CodeGen/AMDGPU/sra.ll
    M llvm/test/CodeGen/AMDGPU/srl.ll
    M llvm/test/CodeGen/AMDGPU/ssubsat.ll
    M llvm/test/CodeGen/AMDGPU/udiv.ll
    M llvm/test/CodeGen/AMDGPU/unstructured-cfg-def-use-issue.ll
    M llvm/test/CodeGen/MIR/AMDGPU/machine-function-info-no-ir.mir
    M llvm/test/CodeGen/MIR/AMDGPU/machine-function-info.ll

  Log Message:
  -----------
  AMDGPU: Clean up LDS-related occupancy calculations

Occupancy is expressed as waves per SIMD. This means that we need to
take into account the number of SIMDs per "CU" or, to be more precise,
the number of SIMDs over which a workgroup may be distributed.

getOccupancyWithLocalMemSize was wrong because it didn't take SIMDs
into account at all.

At the same time, we need to take into account that WGP mode offers
access to a larger total amount of LDS, since this can affect how
non-power-of-two LDS allocations are rounded. To make this work
consistently, we distinguish between (available) local memory size and
addressable local memory size (which is always limited by 64kB on
gfx10+, even with WGP mode).

This change results in a massive amount of test churn. A lot of it is
caused by the fact that the default work group size is 1024, which means
that (due to rounding effects) the default occupancy on older hardware
is 8 instead of 10, which affects scheduling via register pressure
estimates. I've adjusted most tests by just running the UTC tools, but
in some cases I manually changed the work group size to 32 or 64 to make
sure that work group size chunkiness has no effect.

Differential Revision: https://reviews.llvm.org/D139468

Compare: https://github.com/llvm/llvm-project/compare/20f895cd7104...10cef708a7cc