[llvm] [AMDGPU] Enable kernarg preloading by default on gfx940 (PR #110691)

Wed Oct 2 13:35:00 PDT 2024

================
@@ -1014,12 +1014,49 @@ struct AAAMDGPUNoAGPR
 
 const char AAAMDGPUNoAGPR::ID = 0;
 
+static unsigned getMaxNumPreloadArgs(const Function &F, const DataLayout &DL,
+                                     const TargetMachine &TM) {
+  const GCNSubtarget &ST = TM.getSubtarget<GCNSubtarget>(F);
+  unsigned Offset = 0;
+  unsigned ArgsToPreload = 0;
+  for (const auto &Arg : F.args()) {
+    if (Arg.hasByRefAttr())
+      break;
+
+    Type *Ty = Arg.getType();
+    Align ArgAlign = DL.getABITypeAlign(Ty);
+    auto Size = DL.getTypeAllocSize(Ty);
----------------
kerbowa wrote:

I think the alloc size is correct with respect to the number of user SGPRs that are initially allocated for preloading. It maps directly onto how the data is laid out in the kernarg segment, and the number of registers used for preloading is derived from that layout.
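The layout argument above can be illustrated with a small standalone model: each argument is placed at its ABI-aligned offset, occupies its alloc size, and the register count is however many 4-byte SGPRs it takes to cover the segment up to that argument's end. This is a sketch under stated assumptions, not the LLVM implementation. `KernArg`, `countPreloadableArgs`, and the `MaxPreloadSGPRs` budget are hypothetical. In the real pass the sizes come from `DataLayout::getTypeAllocSize`/`getABITypeAlign`, and the available SGPR count depends on the subtarget and other user SGPRs already in use.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified kernel argument: the values DataLayout's
// getTypeAllocSize / getABITypeAlign would report, plus whether it is byref.
struct KernArg {
  uint64_t AllocSize;
  uint64_t ABIAlign;
  bool ByRef;
};

// Round Offset up to the next multiple of Align (Align must be a power of 2).
static uint64_t alignTo(uint64_t Offset, uint64_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 && "power-of-2 alignment");
  return (Offset + Align - 1) & ~(Align - 1);
}

// Count how many leading arguments fit in a hypothetical budget of
// MaxPreloadSGPRs user SGPRs, assuming each SGPR holds 4 bytes of the
// kernarg segment. Like the loop in the diff, stop at the first byref arg.
static unsigned countPreloadableArgs(const std::vector<KernArg> &Args,
                                     unsigned MaxPreloadSGPRs) {
  uint64_t Offset = 0;
  unsigned Count = 0;
  for (const KernArg &A : Args) {
    if (A.ByRef)
      break;
    // End of this argument within the kernarg segment (padding included).
    uint64_t End = alignTo(Offset, A.ABIAlign) + A.AllocSize;
    // SGPRs needed to cover every byte up to the end of this argument.
    uint64_t SGPRsNeeded = (End + 3) / 4;
    if (SGPRsNeeded > MaxPreloadSGPRs)
      break;
    Offset = End;
    ++Count;
  }
  return Count;
}
```

For arguments `(ptr, i32, i16, ptr)` with a budget of 4 SGPRs, the first three end at byte 14 (4 SGPRs), while the last pointer is aligned up to byte 16 and ends at byte 24 (6 SGPRs), so only three arguments fit. The padding before the final pointer is exactly why the alloc size and alignment, not just the raw type size, determine the register count.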

> I think it would be better to be more precise (and maybe even make the inreg a hard requirement to respect)

I sort of agree, but we have to decide what to do when frontends like Triton simply add inreg to every argument. Should we remove it in cases where we cannot preload the argument, or print a warning?

I'm leaning towards the first option: removing inreg from arguments that won't actually be preloaded, somewhere like AMDGPULowerKernelArguments, after all the attributes are finalized, etc.
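The first option could look roughly like the following standalone sketch. It assumes, consistent with the loop in the diff that stops at the first byref argument, that the preloaded arguments form a leading prefix whose length is already known. `KernelArgAttrs` and `stripUnpreloadedInReg` are hypothetical names used only for illustration, not existing LLVM APIs. In LLVM the flag would be the `inreg` attribute on each `llvm::Argument`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of one kernel argument's attribute state; in LLVM this
// would be the `inreg` attribute on an llvm::Argument.
struct KernelArgAttrs {
  bool InReg;
};

// Hypothetical cleanup step: once the number of leading arguments that will
// actually be preloaded is known, clear InReg on every argument past that
// prefix so the IR no longer requests preloading that will not happen.
// Returns the number of InReg flags that were cleared.
static unsigned stripUnpreloadedInReg(std::vector<KernelArgAttrs> &Args,
                                      std::size_t NumPreloaded) {
  unsigned Removed = 0;
  // If NumPreloaded >= Args.size(), every argument is preloaded: no-op.
  for (std::size_t I = NumPreloaded; I < Args.size(); ++I) {
    if (Args[I].InReg) {
      Args[I].InReg = false;
      ++Removed;
    }
  }
  return Removed;
}
```

If a frontend marks all four arguments inreg but only two can be preloaded, the last two lose the flag and a second run is a no-op. That makes it safe to run once after attributes are finalized. Emitting a warning instead would only require reporting the same indices rather than clearing them.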

https://github.com/llvm/llvm-project/pull/110691