[llvm] [AMDGPU] Push amdgpu-preload-kern-arg-prolog after livedebugvalues (PR #126148)
Scott Linder via llvm-commits
llvm-commits at lists.llvm.org
Mon Feb 17 10:27:42 PST 2025
https://github.com/slinder1 updated https://github.com/llvm/llvm-project/pull/126148
From 158abadbe998acc4ab70da3e4af0af1c4ff3b989 Mon Sep 17 00:00:00 2001
From: Scott Linder <Scott.Linder at amd.com>
Date: Thu, 6 Feb 2025 00:01:07 +0000
Subject: [PATCH] [AMDGPU] Push amdgpu-preload-kern-arg-prolog after
livedebugvalues
This is effectively a workaround for a bug in livedebugvalues, but it
also seems to be a general improvement, as BB sections could plausibly
break the special 256-byte prelude scheme that
amdgpu-preload-kern-arg-prolog requires anyway. Moving the pass even
later doesn't seem to have any material impact, and just adds
livedebugvalues to the list of passes which no longer have to deal with
pseudo multiple-entry functions.
AMDGPU debug-info isn't supported upstream yet, so the bug being avoided
isn't testable here. I am posting the patch upstream to avoid an
unnecessary diff with AMD's fork.
---
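For readers unfamiliar with how a pass-config hook determines pipeline position, the reordering can be sketched with a toy model. This is only an illustration under stated assumptions: `ToyPassConfig`, `ToyGCNBefore`, `ToyGCNAfter`, `indexOf`, and `demo` are hypothetical names, not LLVM APIs, and the fixed skeleton lists only the passes that appear in the llc-pipeline.ll hunks of this patch (BB sections itself does not appear there, so it is left out).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Pass names as they are printed in llc-pipeline.ll.
static const std::string kBranchRelax = "Branch relaxation pass";
static const std::string kProlog = "AMDGPU Preload Kernel Arguments Prolog";
static const std::string kLiveDebugValues = "Live DEBUG_VALUE analysis";

// Toy stand-in for a pass config; NOT LLVM's real TargetPassConfig.
class ToyPassConfig {
public:
  virtual ~ToyPassConfig() = default;

  // Fixed skeleton: target hooks run at set points between common passes.
  std::vector<std::string> buildPipeline() {
    Passes.clear();
    addPreEmitPass();
    addPass("Register Usage Information Collector Pass");
    addPass("Remove Loads Into Fake Uses");
    addPass(kLiveDebugValues);
    addPass("Machine Sanitizer Binary Metadata");
    addPostBBSections();
    addPass("Lazy Machine Block Frequency Analysis");
    return Passes;
  }

protected:
  void addPass(std::string Name) { Passes.push_back(std::move(Name)); }
  virtual void addPreEmitPass() {}
  virtual void addPostBBSections() {}

private:
  std::vector<std::string> Passes;
};

// Ordering before the patch: the prolog pass is the last pre-emit pass.
class ToyGCNBefore : public ToyPassConfig {
  void addPreEmitPass() override {
    addPass(kBranchRelax);
    addPass(kProlog);
  }
};

// Ordering after the patch: the prolog pass moves to the later hook.
class ToyGCNAfter : public ToyPassConfig {
  void addPreEmitPass() override { addPass(kBranchRelax); }
  void addPostBBSections() override { addPass(kProlog); }
};

// Position of Name in the pipeline, or P.size() if it is absent.
std::size_t indexOf(const std::vector<std::string> &P,
                    const std::string &Name) {
  return static_cast<std::size_t>(
      std::find(P.begin(), P.end(), Name) - P.begin());
}

// Checks that the move places the prolog pass after livedebugvalues.
void demo() {
  std::vector<std::string> Before = ToyGCNBefore().buildPipeline();
  std::vector<std::string> After = ToyGCNAfter().buildPipeline();
  assert(indexOf(Before, kProlog) < indexOf(Before, kLiveDebugValues));
  assert(indexOf(After, kProlog) > indexOf(After, kLiveDebugValues));
}
```

Calling `demo()` from a `main` exercises both orderings. The sketch only shows that the choice of hook, rather than anything inside the pass itself, decides whether the pass runs before or after livedebugvalues.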
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | 6 ++++++
llvm/test/CodeGen/AMDGPU/llc-pipeline.ll | 10 +++++-----
2 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index eb488843b53e0..92ab106dd4a98 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -1151,6 +1151,7 @@ class GCNPassConfig final : public AMDGPUPassConfig {
void addPostRegAlloc() override;
void addPreSched2() override;
void addPreEmitPass() override;
+ void addPostBBSections() override;
};
} // end anonymous namespace
@@ -1690,6 +1691,11 @@ void GCNPassConfig::addPreEmitPass() {
addPass(&AMDGPUInsertDelayAluID);
addPass(&BranchRelaxationPassID);
+}
+
+void GCNPassConfig::addPostBBSections() {
+ // We run this later to avoid passes like livedebugvalues and BBSections
+ // having to deal with the apparent multi-entry functions we may generate.
addPass(createAMDGPUPreloadKernArgPrologLegacyPass());
}
diff --git a/llvm/test/CodeGen/AMDGPU/llc-pipeline.ll b/llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
index 893b9fa6fb40d..d7f54f3b8e9e2 100644
--- a/llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
+++ b/llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
@@ -145,11 +145,11 @@
; GCN-O0-NEXT: Post RA hazard recognizer
; GCN-O0-NEXT: AMDGPU Insert waits for SGPR read hazards
; GCN-O0-NEXT: Branch relaxation pass
-; GCN-O0-NEXT: AMDGPU Preload Kernel Arguments Prolog
; GCN-O0-NEXT: Register Usage Information Collector Pass
; GCN-O0-NEXT: Remove Loads Into Fake Uses
; GCN-O0-NEXT: Live DEBUG_VALUE analysis
; GCN-O0-NEXT: Machine Sanitizer Binary Metadata
+; GCN-O0-NEXT: AMDGPU Preload Kernel Arguments Prolog
; GCN-O0-NEXT: Lazy Machine Block Frequency Analysis
; GCN-O0-NEXT: Machine Optimization Remark Emitter
; GCN-O0-NEXT: Stack Frame Layout Analysis
@@ -430,11 +430,11 @@
; GCN-O1-NEXT: AMDGPU Insert waits for SGPR read hazards
; GCN-O1-NEXT: AMDGPU Insert Delay ALU
; GCN-O1-NEXT: Branch relaxation pass
-; GCN-O1-NEXT: AMDGPU Preload Kernel Arguments Prolog
; GCN-O1-NEXT: Register Usage Information Collector Pass
; GCN-O1-NEXT: Remove Loads Into Fake Uses
; GCN-O1-NEXT: Live DEBUG_VALUE analysis
; GCN-O1-NEXT: Machine Sanitizer Binary Metadata
+; GCN-O1-NEXT: AMDGPU Preload Kernel Arguments Prolog
; GCN-O1-NEXT: Lazy Machine Block Frequency Analysis
; GCN-O1-NEXT: Machine Optimization Remark Emitter
; GCN-O1-NEXT: Stack Frame Layout Analysis
@@ -743,11 +743,11 @@
; GCN-O1-OPTS-NEXT: AMDGPU Insert waits for SGPR read hazards
; GCN-O1-OPTS-NEXT: AMDGPU Insert Delay ALU
; GCN-O1-OPTS-NEXT: Branch relaxation pass
-; GCN-O1-OPTS-NEXT: AMDGPU Preload Kernel Arguments Prolog
; GCN-O1-OPTS-NEXT: Register Usage Information Collector Pass
; GCN-O1-OPTS-NEXT: Remove Loads Into Fake Uses
; GCN-O1-OPTS-NEXT: Live DEBUG_VALUE analysis
; GCN-O1-OPTS-NEXT: Machine Sanitizer Binary Metadata
+; GCN-O1-OPTS-NEXT: AMDGPU Preload Kernel Arguments Prolog
; GCN-O1-OPTS-NEXT: Lazy Machine Block Frequency Analysis
; GCN-O1-OPTS-NEXT: Machine Optimization Remark Emitter
; GCN-O1-OPTS-NEXT: Stack Frame Layout Analysis
@@ -1062,11 +1062,11 @@
; GCN-O2-NEXT: AMDGPU Insert waits for SGPR read hazards
; GCN-O2-NEXT: AMDGPU Insert Delay ALU
; GCN-O2-NEXT: Branch relaxation pass
-; GCN-O2-NEXT: AMDGPU Preload Kernel Arguments Prolog
; GCN-O2-NEXT: Register Usage Information Collector Pass
; GCN-O2-NEXT: Remove Loads Into Fake Uses
; GCN-O2-NEXT: Live DEBUG_VALUE analysis
; GCN-O2-NEXT: Machine Sanitizer Binary Metadata
+; GCN-O2-NEXT: AMDGPU Preload Kernel Arguments Prolog
; GCN-O2-NEXT: Lazy Machine Block Frequency Analysis
; GCN-O2-NEXT: Machine Optimization Remark Emitter
; GCN-O2-NEXT: Stack Frame Layout Analysis
@@ -1394,11 +1394,11 @@
; GCN-O3-NEXT: AMDGPU Insert waits for SGPR read hazards
; GCN-O3-NEXT: AMDGPU Insert Delay ALU
; GCN-O3-NEXT: Branch relaxation pass
-; GCN-O3-NEXT: AMDGPU Preload Kernel Arguments Prolog
; GCN-O3-NEXT: Register Usage Information Collector Pass
; GCN-O3-NEXT: Remove Loads Into Fake Uses
; GCN-O3-NEXT: Live DEBUG_VALUE analysis
; GCN-O3-NEXT: Machine Sanitizer Binary Metadata
+; GCN-O3-NEXT: AMDGPU Preload Kernel Arguments Prolog
; GCN-O3-NEXT: Lazy Machine Block Frequency Analysis
; GCN-O3-NEXT: Machine Optimization Remark Emitter
; GCN-O3-NEXT: Stack Frame Layout Analysis