[PATCH] D83674: [AMDGPU] Calculate minimum allowed occupancy based on threads per lane

Thu Jul 16 12:59:06 PDT 2020

rampitec added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp:200
+  // - 16 waves per SIMD for GFX10 wave32
+  unsigned MinOccupancy = ST.getTotalNumVGPRs() / 64;
+  return std::min(Occupancy, MinOccupancy);
----------------
nhaehnle wrote:
> rampitec wrote:
> > This is not expressable in a number of VGPRs. The limiter is about memory and not about register budget at all. Meanwhile it looks like 8 waves might be more reasonable for wave32, but wave64 shall still use the same value across the targets, because at the end of the day it boils down to a number of concurrent memory operations.
> I agree that using number of VGPRs here is an awkward roundabout way of doing it. The point isn't really about VGPRs, it's about how many threads are alive per lane in order to allow a good mix of VALU and VMEM instruction issue in the face of certain (IMHO underspecified) memory pressure situations.
> 
> I also disagree that 8 is the right answer for Wave32, for precisely the same reason: it is about the number of concurrent memory operations. 16x Wave32 gives you the same amount of memory operations per CU (2 SIMD * 16 waves * 32 = 1024 workitems) as 4 waves on <=gfx9 (4 SIMD * 4 waves * 64 = 1024 workitems).
I can understand the logic behind it, but using of TotalNumVGPRs() is not really what we want here. It probably needs to use getMaxWavesPerEU() and wavefront size to estimate a number of threads in flight. Besides the same logic will give answer 16 even on gfx1030 where 16 is the absolute maximum. The latter does not sound right.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83674/new/

https://reviews.llvm.org/D83674