[PATCH] D159452: [AMDGPU] Machine scheduler should take care of the 'amdgpu-num-vgp' function attribute

Alexander via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Sep 5 14:18:12 PDT 2023


alex-t added a comment.

In D159452#4638836 <https://reviews.llvm.org/D159452#4638836>, @rampitec wrote:

> In D159452#4638728 <https://reviews.llvm.org/D159452#4638728>, @alex-t wrote:
>
>> Thanks to all who took the time to look at this.
>> The aim of this review is to discuss and find the right approach.
>> We have a target description interface that keeps the HW details.
>> We also have a set of user-defined attributes like: "amdgpu-waves-per-eu", "amdgpu-num-vgpr" etc. which affect the defaults.
>> We also take into account the LDS size.
>> Shouldn't we compute the occupancy uniformly accounting for all constraints together?
>
> There is `GCNSubtarget::computeOccupancy` which aims to do exactly that.

No, it does not care of "amdgpu-num-vgpr" attribute.

  if (NumVGPRs)
     Occupancy = std::min(Occupancy, getOccupancyWithNumVGPRs(NumVGPRs));



  unsigned GCNSubtarget::getOccupancyWithNumVGPRs(unsigned NumVGPRs) const {
    return AMDGPU::IsaInfo::getNumWavesPerEUWithNumVGPRs(this, NumVGPRs);
  }



  unsigned getNumWavesPerEUWithNumVGPRs(const MCSubtargetInfo *STI,
                                        unsigned NumVGPRs) {
    unsigned MaxWaves = getMaxWavesPerEU(STI);
    unsigned Granule = getVGPRAllocGranule(STI);
    if (NumVGPRs < Granule)
      return MaxWaves;
    unsigned RoundedRegs = alignTo(NumVGPRs, Granule);
    return std::min(std::max(**getTotalNumVGPRs(STI) **/ RoundedRegs, 1u), MaxWaves);
  }

getTotalNumVGPRs only takes the defaults.

> In D159452#4638833 <https://reviews.llvm.org/D159452#4638833>, @alex-t wrote:
>
>> One more case that we may want to consider: with "amdgpu-num-vgpr"=8 the difference between 2 RPs of 2 is significant.
>> The schedule that spends 7 VGPRs does not spill but another one, which spends 9 will.
>> Currently, we have a "less" operator defined for the GCNRegPressure which only compares the occupancy based on the default target number of available registers and the number of VGPRs from the RP.  To this method, RPs in my case are equal as they both have occupancy 10. This is yet another reason why we cannot use the excess RP set to decide if we need to keep or revert the current schedule.
>
> 8 vgpr case is artificial and only used for stress testing. There is no other reason to use anything below maximum occupancy.

But it is still a valid input and we don't want that tests to become spilling.
Another question: What this attribute is intended for? If it imposes the constraint we have to take it into account when computing the occupancy, otherwise, we schedule/allocate more registers than was requested by the user and this will be the UB.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D159452/new/

https://reviews.llvm.org/D159452



More information about the llvm-commits mailing list