[llvm] [AMDGPU][NFCI] Reorder AGPRs to allow skipping over them (PR #105633)

Thu Aug 22 04:37:40 PDT 2024

================
@@ -3365,6 +3365,24 @@ SIRegisterInfo::getConstrainedRegClassForOperand(const MachineOperand &MO,
   return nullptr;
 }
 
+unsigned SIRegisterInfo::getNumSupportedRegs(const MachineFunction &MF) const {
+#ifndef NDEBUG
+  for (unsigned K = AMDGPU::AGPR0; K < AMDGPU::NUM_TARGET_REGS; ++K) {
+    // Skip lo16 registers, they're "fake" and don't have a regclass assigned.
+    if (K >= AMDGPU::AGPR0_HI16 && K <= AMDGPU::AGPR255_HI16)
+      continue;
+    if (!isAGPR(MF.getRegInfo(), K))
+      report_fatal_error("register at index " + Twine(K) + " is not an AGPR!");
+  }
+#endif
+
+  // Don't include AGPRs on targets that don't have them.
+  // This cuts about 4000 register (almost half of all registers) off.
+  return MF.getInfo<SIMachineFunctionInfo>()->mayUseAGPRs(MF.getFunction())
----------------
Pierre-vh wrote:

> Actually this is bugged. You should probably just start with considering ST.hasGFX90AInsts(), but you want MFI::usesAGPRs, not mayUseAGPRs which looks at the underlying IR attribute.

So hasGFX90AInsts && usesAGPR ?

> The second possible bug is amdgpu-no-agprs means the IR doesn't require the kernel to allocate AGPRs. But AGPRs may still appear as a result of the occupancy bounds (we probably should fix the inference logic to consider that case)

This is worrying because if we have to always enable AGPRs on all GFX90AInsts, then this patch is more or less useless. I think the large inputs that'd benefit from this often target MI and if MI targets don't benefit from it then I'd rather not bother

Would the fix be easy? Is there an extra check this can do to avoid the issue?

> I haven't seen this getNumSupportedRegs callback before; it probably shouldn't exist? It seems redundant with RegisterClassInfo, which is supposed to filter out a lot of the redundant cases

I think this is targeted at reducing iteration, and the size of containers that are used for registers (arrays, BitVectors, etc.). If this lands, I'd like to see if we can `getNumSupportedRegs` in more places.

Note that I felt like this was a good idea, but if you think it's too confusing and not worth it then I don't want to force it. I've been looking at ways to improve performance with backend passes for AMDGPU because some of them scale very poorly with a large amount of registers like we have, so reducing the amount of registers we're dealing with seemed like a good idea.



https://github.com/llvm/llvm-project/pull/105633