[llvm] 3597f02 - [AMDGPU] Add GlobalDCE before internalization pass

Yaxun Liu via llvm-commits llvm-commits at lists.llvm.org
Sat Apr 17 08:25:56 PDT 2021


Author: Yaxun (Sam) Liu
Date: 2021-04-17T11:25:25-04:00
New Revision: 3597f02fd5c62f7c49c71b92e467128ffe2cf9cd

URL: https://github.com/llvm/llvm-project/commit/3597f02fd5c62f7c49c71b92e467128ffe2cf9cd
DIFF: https://github.com/llvm/llvm-project/commit/3597f02fd5c62f7c49c71b92e467128ffe2cf9cd.diff

LOG: [AMDGPU] Add GlobalDCE before internalization pass

The internalization pass only internalizes global variables
with no users. If the global variable has some dead user,
the internalization pass will not internalize it.

To be able to internalize global variables with dead
users, a global dce pass is needed before the
internalization pass.

This patch adds that.

Reviewed by: Artem Belevich, Matt Arsenault

Differential Revision: https://reviews.llvm.org/D98783

Added: 
    clang/test/CodeGenCUDA/unused-global-var.cu

Modified: 
    llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

Removed: 
    


################################################################################
diff  --git a/clang/test/CodeGenCUDA/unused-global-var.cu b/clang/test/CodeGenCUDA/unused-global-var.cu
new file mode 100644
index 0000000000000..1dbb3a22563c8
--- /dev/null
+++ b/clang/test/CodeGenCUDA/unused-global-var.cu
@@ -0,0 +1,53 @@
+// REQUIRES: amdgpu-registered-target
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fcuda-is-device -x hip %s \
+// RUN:   -std=c++11 -O3 -mllvm -amdgpu-internalize-symbols -emit-llvm -o - \
+// RUN:   -target-cpu gfx906 | FileCheck %s
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fcuda-is-device -x hip %s \
+// RUN:   -std=c++11 -O3 -mllvm -amdgpu-internalize-symbols -emit-llvm -o - \
+// RUN:   -target-cpu gfx906 | FileCheck -check-prefix=NEGCHK %s
+
+#include "Inputs/cuda.h"
+
+// AMDGPU internalize unused global variables for whole-program compilation
+// (-fno-gpu-rdc for each TU, or -fgpu-rdc for LTO), which are then
+// eliminated by global DCE. If there are invisible unused address space casts
+// for global variables, these dead users need to be eliminated by global
+// DCE before internalization. This test makes sure unused global variables
+// are eliminated.
+
+// Check unused device/constant variables are eliminated.
+
+// NEGCHK-NOT: @v1
+__device__ int v1;
+
+// NEGCHK-NOT: @v2
+__constant__ int v2;
+
+// NEGCHK-NOT: @_ZL2v3
+constexpr int v3 = 1;
+
+// Check managed variables are always kept.
+
+// CHECK-DAG: @v4
+__managed__ int v4;
+
+// Check used device/constant variables are not eliminated.
+// CHECK-DAG: @u1
+__device__ int u1;
+
+// CHECK-DAG: @u2
+__constant__ int u2;
+
+// Check u3 is kept because its address is taken.
+// CHECK-DAG: @_ZL2u3
+constexpr int u3 = 2;
+
+// Check u4 is not kept because it is not ODR-use.
+// NEGCHK-NOT: @_ZL2u4
+constexpr int u4 = 3;
+
+__device__ int fun1(const int& x);
+
+__global__ void kern1(int *x) {
+  *x = u1 + u2 + fun1(u3) + u4;
+}

diff  --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index ce39609da303d..1b3b56f5dc71f 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -575,6 +575,9 @@ void AMDGPUTargetMachine::registerPassBuilderCallbacks(PassBuilder &PB,
         PM.addPass(AMDGPUPrintfRuntimeBindingPass());
 
         if (InternalizeSymbols) {
+          // Global variables may have dead uses which need to be removed.
+          // Otherwise these useless global variables will not get internalized.
+          PM.addPass(GlobalDCEPass());
           PM.addPass(InternalizePass(mustPreserveGV));
         }
         PM.addPass(AMDGPUPropagateAttributesLatePass(*this));


        


More information about the llvm-commits mailing list