[PATCH] D76356: [AMDGPU] Introduce more scratch registers in the ABI.

Sat Mar 21 10:40:53 PDT 2020

cdevadas marked an inline comment as done.
cdevadas added inline comments.

================
Comment at: llvm/docs/AMDGPUUsage.rst:8710
+    * All SGPR registers except the clobbered registers of SGPR4-31.
+    * VGPR36-39
+      VGPR44-47
----------------
arsenm wrote:
> cdevadas wrote:
> > t-tye wrote:
> > > b-sumner wrote:
> > > > arsenm wrote:
> > > > > t-tye wrote:
> > > > > > arsenm wrote:
> > > > > > > t-tye wrote:
> > > > > > > > arsenm wrote:
> > > > > > > > > A description of why it's split this way may be helpful
> > > > > > > > Is the striping being picked at 4 VGPRs to match the hardware VGPR allocation granularity (4 for <=GFX9 and 8 for >=GFX10)? How does this striping impact register file fragmentation? What is the impact of objects being promoted to registers that are larger than 4 VGPRs?
> > > > > > > These aren't used for argument passing, so there's no concept of objects to consider
> > > > > > Also, this is an ABI-breaking change (as is the change to the handling of the wave scratch offset), so should the EI_ABIVERSION for each EI_OSABI in the ELF header be bumped? My thinking is no, since AMD has not yet started to support ISA-level linking or function pointers, so this change cannot affect any existing programs.
> > > > > We didn't have, and still don't have, an ABI worth considering for this
> > > > Seems OK to me to not bump the version since we're not touching the kernel ABI and we don't have ISA linking.
> > > > These aren't used for argument passing, so there's no concept of objects to consider
> > > 
> > > But if the compiler wants to promote an object to contiguous registers, then it will end up spanning both clobbered and non-clobbered registers for objects larger than the granularity, forcing some spill/restore code for the parts that are in non-clobbered registers. If the striping was at a coarser granularity then this may be unnecessary if the object will fit in the coarser granularity.
> > > 
> > > Do we think this could be an issue?
> > If you are still talking about passing the object, the compiler won't promote an object of large size into registers beyond the range defined by the convention (the first 32 VGPRs, in our case, that we didn't split anyway). 
> We have a few contexts with single VGPR tuples > 4: some image instructions and indirect register indexing. We probably don’t have any tests with calls stressing these cases
Are you saying there are cases where RA has to allocate more than 4 contiguous VGPRs? If so, can you tell me which instructions?
I was under the impression that our instructions require at most 4 contiguous VGPRs (for instance, FLAT_LOAD_DWORDX4).
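
To make the granularity question concrete, here is a minimal sketch of how a contiguous VGPR tuple would straddle stripes. It is a toy model, not the patch's implementation: it assumes VGPR0-31 are all clobbered and that stripes from VGPR32 onward alternate clobbered/preserved, which is consistent with VGPR36-39 and VGPR44-47 being listed as preserved above. The names `is_preserved` and `split_tuple` are hypothetical.

```python
def is_preserved(vgpr, first_striped=32, stripe=4):
    """Toy model (assumption): VGPRs below first_striped are clobbered;
    from first_striped on, stripes alternate clobbered, preserved, ..."""
    if vgpr < first_striped:
        return False
    return ((vgpr - first_striped) // stripe) % 2 == 1


def split_tuple(base, size, stripe=4):
    """Partition a contiguous tuple [base, base + size) into the registers
    that land in clobbered stripes and those that land in preserved stripes."""
    regs = range(base, base + size)
    clobbered = [r for r in regs if not is_preserved(r, stripe=stripe)]
    preserved = [r for r in regs if is_preserved(r, stripe=stripe)]
    return clobbered, preserved


# An 8-wide tuple starting at VGPR32 under two hypothetical stripe widths.
narrow = split_tuple(32, 8, stripe=4)
wide = split_tuple(32, 8, stripe=8)
print(narrow)
print(wide)
```

Under this model, an 8-wide tuple at VGPR32 has half of its registers in a preserved stripe when the stripe width is 4, so those parts would need save/restore code, while with a stripe width of 8 the same tuple fits entirely in a clobbered stripe.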

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76356/new/

https://reviews.llvm.org/D76356