[PATCH] D76356: [AMDGPU] Introduce more scratch registers in the ABI.

Mon Mar 23 14:12:57 PDT 2020

rampitec added inline comments.

================
Comment at: llvm/docs/AMDGPUUsage.rst:8710
+    * All SGPR registers except the clobbered registers of SGPR4-31.
+    * VGPR36-39
+      VGPR44-47
----------------
arsenm wrote:
> rampitec wrote:
> > t-tye wrote:
> > > t-tye wrote:
> > > > cdevadas wrote:
> > > > > arsenm wrote:
> > > > > > cdevadas wrote:
> > > > > > > t-tye wrote:
> > > > > > > > b-sumner wrote:
> > > > > > > > > arsenm wrote:
> > > > > > > > > > t-tye wrote:
> > > > > > > > > > > arsenm wrote:
> > > > > > > > > > > > t-tye wrote:
> > > > > > > > > > > > > arsenm wrote:
> > > > > > > > > > > > > > A description of why it's split this way may be helpful
> > > > > > > > > > > > > Is the striping being picked at 4 VGPRs to match the hardware VGPR allocation granularity (4 for <=GFX9 and 8 for >=GFX10)? How does this stripping impact register file fragmentation? What is the impact of objects being promoted to registers that are larger than 4 VGPRs?
> > > > > > > > > > > > These aren't used for argument passing, so there's no concept of objects to consider
> > > > > > > > > > > Also this is an ABI breaking change (as is the change for the handling of the wave scratch offset) so should the EI_ABIVERSION for each EI_OSABI in the ELF header be bumped? My thinking is no since AMD has not yet started to support isa level linking nor function pointers so this change cannot affect any existing programs.
> > > > > > > > > > We didn't have, and still don't, have an ABI worthy of considering for this
> > > > > > > > > Seems OK to me to not bump the version since we're not touching the kernel ABI and we don't have ISA linking.
> > > > > > > > > These aren't used for argument passing, so there's no concept of objects to consider
> > > > > > > > 
> > > > > > > > But if the compiler wants to promote an object to contiguous registers, then it will end up spanning both clobbered and non-clobbered registers for objects larger than the granularity, forcing some spill/restore code for the parts that are in non-clobbered registers. If the stripping was at a courser granularity then this may be unnecessary if the object will fit in the courser granularity.
> > > > > > > > 
> > > > > > > > Do we think this could be an issue?
> > > > > > > If you are still talking about passing the object, the compiler won't promote an object of large size into registers beyond the range defined by the convention (the first 32 VGPRs, in our case, that we didn't split anyway). 
> > > > > > We have few contexts with single VPGR tuples > 4. Some image instructions and indirect register indexing. We probably don’t have any tests with calls stressing these cases
> > > > > Are you saying, there are cases more than 4 contiguous VGPRs to be allocated by RA? If yes, can you tell me the instructions?
> > > > > I was under the impression that we have instructions requiring at most 4 contiguous VGPRs (for instance, FLAT_LOAD_DWORDX4)
> > > > No I am not talking about argument passing, I am talking about generating code for the body of the function. It would presumably be best that it only uses clobbered registers for values that do not want to be live across the calls it makes. So for objects larger than 4 dwords that will mean prologue/epilogue spilling/restoring, and register shuffling at call sites.
> > > We do have T# and V# values using in image instructions that are larger than 4 dwords. The instructions that use them currently need them in SGPRs but they may need to be moved from VGPRs (which would not need them to be contiguous). Do we also want to be sure the ABI will work for future hardware that may change, or are we ok with different ABIs for different targets?
> > There are some image instructions which have address in much longer VGPR tuple. If you look into SiRegisterInfo.td our largest VGPR register class is 32 dwords long. I'd say this also shall be a minimal interleave value.
> But those are AGPRs, which last I checked were not considered preserved across calls anyway. The question is also how often do those need to be live across calls, which is probably very rare.
AGPRs can be copied to VGPRs. Anyway, there are image addresses too which are pretty big.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76356/new/

https://reviews.llvm.org/D76356