[PATCH] D156852: [AMDGPU] Use inreg for hint to preload kernel arguments

Sun Aug 13 21:35:12 PDT 2023

kerbowa added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp:922

+static void addPreloadKernArgHint(Function &F) {
+  for (unsigned I = 0;
----------------
arsenm wrote:
> This isn't really gaining value by being in the attributor as it stands. Are you planning on a more sophisticated user analysis to select the starting point?
Yes, that's the plan. Any heuristics of that sort will be implemented here.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp:924
+  for (unsigned I = 0;
+       I < F.arg_size() && I < std::min(KernargPreloadCount.getValue(), 16u);
+       ++I) {
----------------
arsenm wrote:
> rampitec wrote:
> > arsenm wrote:
> > > Assuming 1 register : 1 argument which is not the case.
> > > 
> > > Also the user SGPR count went up (in gfx10 I think?) so you should query the number for that
> > AFAIR it was increased to 32 in gfx9, then again was 16 in gfx10. I also do not think this is implemented.
> We already have getMaxNumUserSGPRs but it's just hardcoded
InReg is just a hint so being super accurate with the number of available SGPRs isn't needed for correctness here. The intent is just to have it stop at the maximum, but if it goes over it doesn't matter. I wasn't aware that the number of available user SGPRs had changed i.e. at https://llvm.org/docs/AMDGPUUsage.html I just see 16, I've only ever heard the number 16 talked about. Anyway, I will change this to getMaxNumUserSGPRs as I look into what's actually supported more.

================
Comment at: llvm/test/CodeGen/AMDGPU/preload-kernal-args-inreg-hints.ll:2
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --function-signature
+; RUN: opt -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -amdgpu-attributor -S < %s | FileCheck -check-prefix=NO-PRELOAD %s
+; RUN: opt -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -amdgpu-kernarg-preload-count=1 -amdgpu-attributor -S < %s | FileCheck -check-prefix=PRELOAD-1 %s
----------------
arsenm wrote:
> Use -passes
I think it won't work for amdgpu-attributor with old pass manager?

================
Comment at: llvm/test/CodeGen/AMDGPU/preload-kernal-args-inreg-hints.ll:235
+}
+
+declare void @func(ptr) #0
----------------
arsenm wrote:
> arsenm wrote:
> > Needs some exotic types. i8, <2 x half>, <3 x half> odd bit sized scalars, odd bit sized element vectors, odd vectors, structs, arrays and nested structs
> We could probably get away with making the verifier reject aggregate arguments for amdgpu_kernel
Kind of unnecessary here since none of that impacts the addition of inreg at this point, it will just reject preloading the argument later.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156852/new/

https://reviews.llvm.org/D156852