[all-commits] [llvm/llvm-project] 84610a: AMDGPU: Add AMDGPUSubtarget::getEUsPerCU()
Nicolai Hähnle via All-commits
all-commits at lists.llvm.org
Mon Jan 23 12:43:27 PST 2023
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: 84610a82a1aa9bac828569560818b919951039e1
https://github.com/llvm/llvm-project/commit/84610a82a1aa9bac828569560818b919951039e1
Author: Nicolai Hähnle <nicolai.haehnle at amd.com>
Date: 2023-01-23 (Mon, 23 Jan 2023)
Changed paths:
M llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
M llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h
Log Message:
-----------
AMDGPU: Add AMDGPUSubtarget::getEUsPerCU()
We will use this for more accurate occupancy computations. Note that
IsaInfo takes WGP mode vs. CU mode into account on gfx10+.
Differential Revision: https://reviews.llvm.org/D139467
Commit: 07ed1d631c097aea6f5cc9d9c9b52d2617dea670
https://github.com/llvm/llvm-project/commit/07ed1d631c097aea6f5cc9d9c9b52d2617dea670
Author: Nicolai Hähnle <nicolai.haehnle at amd.com>
Date: 2023-01-23 (Mon, 23 Jan 2023)
Changed paths:
M llvm/test/CodeGen/AMDGPU/dbg-value-ends-sched-region.mir
M llvm/test/CodeGen/AMDGPU/debug-value-scheduler-crash.mir
M llvm/test/CodeGen/AMDGPU/licm-regpressure.mir
M llvm/test/CodeGen/AMDGPU/machine-scheduler-sink-trivial-remats.mir
Log Message:
-----------
AMDGPU: Re-run UTC scripts on some test cases
Reduce the diff of subsequent changes.
Commit: 0775f21b62a5fe6422c3d021685a779e647bf8d1
https://github.com/llvm/llvm-project/commit/0775f21b62a5fe6422c3d021685a779e647bf8d1
Author: Nicolai Hähnle <nicolai.haehnle at amd.com>
Date: 2023-01-23 (Mon, 23 Jan 2023)
Changed paths:
A llvm/test/CodeGen/AMDGPU/schedule-regpressure-lds.ll
Log Message:
-----------
AMDGPU: Add a scheduler test to demonstrate an upcoming change
Commit: 10cef708a7ccaf69c18be86460583f9b62ee3c29
https://github.com/llvm/llvm-project/commit/10cef708a7ccaf69c18be86460583f9b62ee3c29
Author: Nicolai Hähnle <nicolai.haehnle at amd.com>
Date: 2023-01-23 (Mon, 23 Jan 2023)
Changed paths:
M llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
M llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
M llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
M llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h
M llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
M llvm/lib/Target/AMDGPU/R600Subtarget.cpp
M llvm/lib/Target/AMDGPU/SIISelLowering.cpp
M llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
M llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
M llvm/test/CodeGen/AMDGPU/GlobalISel/add.vni16.ll
M llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement.i128.ll
M llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement.ll
M llvm/test/CodeGen/AMDGPU/GlobalISel/fdiv.f64.ll
M llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll
M llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll
M llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement.ll
M llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.memcpy.ll
M llvm/test/CodeGen/AMDGPU/GlobalISel/mul.ll
M llvm/test/CodeGen/AMDGPU/GlobalISel/saddsat.ll
M llvm/test/CodeGen/AMDGPU/GlobalISel/sdiv.i64.ll
M llvm/test/CodeGen/AMDGPU/GlobalISel/sdivrem.ll
M llvm/test/CodeGen/AMDGPU/GlobalISel/srem.i64.ll
M llvm/test/CodeGen/AMDGPU/GlobalISel/ssubsat.ll
M llvm/test/CodeGen/AMDGPU/GlobalISel/udiv.i64.ll
M llvm/test/CodeGen/AMDGPU/GlobalISel/udivrem.ll
M llvm/test/CodeGen/AMDGPU/GlobalISel/urem.i64.ll
M llvm/test/CodeGen/AMDGPU/GlobalISel/usubsat.ll
M llvm/test/CodeGen/AMDGPU/amdhsa-trap-num-sgprs.ll
M llvm/test/CodeGen/AMDGPU/bf16.ll
M llvm/test/CodeGen/AMDGPU/branch-relax-spill.ll
M llvm/test/CodeGen/AMDGPU/dbg-value-ends-sched-region.mir
M llvm/test/CodeGen/AMDGPU/debug-value-scheduler-crash.mir
M llvm/test/CodeGen/AMDGPU/exec-mask-opt-cannot-create-empty-or-backward-segment.ll
M llvm/test/CodeGen/AMDGPU/half.ll
M llvm/test/CodeGen/AMDGPU/idot8s.ll
M llvm/test/CodeGen/AMDGPU/insert_vector_dynelt.ll
M llvm/test/CodeGen/AMDGPU/insert_vector_elt.v2i16.ll
M llvm/test/CodeGen/AMDGPU/licm-regpressure.mir
M llvm/test/CodeGen/AMDGPU/llvm.round.f64.ll
M llvm/test/CodeGen/AMDGPU/load-constant-i16.ll
M llvm/test/CodeGen/AMDGPU/load-global-i16.ll
M llvm/test/CodeGen/AMDGPU/machine-scheduler-sink-trivial-remats-debug.mir
M llvm/test/CodeGen/AMDGPU/machine-scheduler-sink-trivial-remats.mir
M llvm/test/CodeGen/AMDGPU/memory_clause.mir
M llvm/test/CodeGen/AMDGPU/mul24-pass-ordering.ll
M llvm/test/CodeGen/AMDGPU/occupancy-levels.ll
M llvm/test/CodeGen/AMDGPU/pr51516.mir
M llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm.ll
M llvm/test/CodeGen/AMDGPU/resource-optimization-remarks.ll
M llvm/test/CodeGen/AMDGPU/sched-handleMoveUp-subreg-def-across-subreg-def.mir
M llvm/test/CodeGen/AMDGPU/schedule-barrier.mir
M llvm/test/CodeGen/AMDGPU/schedule-regpressure-lds.ll
M llvm/test/CodeGen/AMDGPU/sdiv.ll
M llvm/test/CodeGen/AMDGPU/sdiv64.ll
M llvm/test/CodeGen/AMDGPU/shift-i128.ll
M llvm/test/CodeGen/AMDGPU/shl.ll
M llvm/test/CodeGen/AMDGPU/sra.ll
M llvm/test/CodeGen/AMDGPU/srl.ll
M llvm/test/CodeGen/AMDGPU/ssubsat.ll
M llvm/test/CodeGen/AMDGPU/udiv.ll
M llvm/test/CodeGen/AMDGPU/unstructured-cfg-def-use-issue.ll
M llvm/test/CodeGen/MIR/AMDGPU/machine-function-info-no-ir.mir
M llvm/test/CodeGen/MIR/AMDGPU/machine-function-info.ll
Log Message:
-----------
AMDGPU: Clean up LDS-related occupancy calculations
Occupancy is expressed as waves per SIMD. This means that we need to
take into account the number of SIMDs per "CU" or, to be more precise,
the number of SIMDs over which a workgroup may be distributed.
getOccupancyWithLocalMemSize was wrong because it didn't take SIMDs
into account at all.
At the same time, we need to take into account that WGP mode offers
access to a larger total amount of LDS, since this can affect how
non-power-of-two LDS allocations are rounded. To make this work
consistently, we distinguish between (available) local memory size and
addressable local memory size (which is always limited by 64kB on
gfx10+, even with WGP mode).
This change results in a massive amount of test churn. A lot of it is
caused by the fact that the default work group size is 1024, which means
that (due to rounding effects) the default occupancy on older hardware
is 8 instead of 10, which affects scheduling via register pressure
estimates. I've adjusted most tests by just running the UTC tools, but
in some cases I manually changed the work group size to 32 or 64 to make
sure that work group size chunkiness has no effect.
Differential Revision: https://reviews.llvm.org/D139468
Compare: https://github.com/llvm/llvm-project/compare/20f895cd7104...10cef708a7cc
More information about the All-commits
mailing list