[llvm] [AMDGPU] Align loop headers to prevent instruction fetch split on GFX950 (PR #181999)

Thu Feb 19 02:13:53 PST 2026

================
@@ -18811,6 +18825,30 @@ Align SITargetLowering::getPrefLoopAlignment(MachineLoop *ML) const {
   return CacheLineAlign;
 }
 
+unsigned SITargetLowering::getMaxPermittedBytesForAlignment(
+    MachineBasicBlock *MBB) const {
+  // GFX950: Limit padding to 4 bytes (one s_nop) for blocks where an 8-byte
+  // instruction could be split by the 32-byte fetch window boundary.
+  // See getPrefLoopAlignment() for context.
+  if (needsFetchWindowAlignment(MBB))
+    return 4;
+  return TargetLowering::getMaxPermittedBytesForAlignment(MBB);
+}
+
+bool SITargetLowering::needsFetchWindowAlignment(
+    const MachineBasicBlock *MBB) const {
----------------
arsenm wrote:

```suggestion
    const MachineBasicBlock &MBB) const {
```
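
For context on the code comment in the hunk above, here is a minimal sketch of the arithmetic behind the 4-byte cap. It is not code from the patch: `FetchWindowBytes`, `NopBytes`, `straddlesFetchWindow` and `paddingToAvoidSplit` are hypothetical names, and it assumes instructions start at 4-byte-aligned offsets, so an 8-byte instruction can only be split when it begins 28 bytes into a 32-byte window.

```cpp
#include <cstdint>

// Hypothetical constants for illustration only; not taken from the patch.
constexpr uint64_t FetchWindowBytes = 32; // fetch window size named in the comment
constexpr uint64_t NopBytes = 4;          // one s_nop, the padding cap in the hunk

// True if an instruction of Size bytes starting at Offset has its first and
// last bytes in different fetch windows, i.e. it crosses a window boundary.
constexpr bool straddlesFetchWindow(uint64_t Offset, uint64_t Size) {
  return Offset / FetchWindowBytes != (Offset + Size - 1) / FetchWindowBytes;
}

// Padding needed before the instruction: one s_nop if it would be split,
// otherwise none. Assumes 4-byte-aligned offsets (an assumption of this sketch).
constexpr uint64_t paddingToAvoidSplit(uint64_t Offset, uint64_t Size) {
  return straddlesFetchWindow(Offset, Size) ? NopBytes : 0;
}
```

Under those assumptions, a single 4-byte nop is always enough to move an 8-byte instruction off the boundary, which would explain why the permitted padding is capped at 4 rather than a full alignment's worth.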

https://github.com/llvm/llvm-project/pull/181999