[Openmp-commits] [PATCH] D139730: [OpenMP][DeviceRTL][AMDGPU] Support code object version 5
Yaxun Liu via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Wed Aug 23 20:07:03 PDT 2023
yaxunl added inline comments.
================
Comment at: clang/test/CodeGenCUDA/amdgpu-code-object-version-linking.cu:12
+// RUN: llvm-link %t_0 %t_5 -o -| llvm-dis -o - | FileCheck -check-prefix=LINKED5 %s
+
+#include "Inputs/cuda.h"
----------------
This needs a test that uses clang -cc1 with -O3 and -mlink-builtin-bitcode to link the device lib and verifies that the load of llvm.amdgcn.abi.version is eliminated after optimization.
I think that currently does not happen, because llvm.amdgcn.abi.version is not internalized by the internalization pass. This can cause a significant perf drop, since the load is expensive. We need to either tweak the function that controls which variables can be internalized for amdgpu so that this variable gets internalized, or add a generic way to tell that function which variables should be internalized, e.g. by adding a metadata such as amdgcn.internalize.
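A rough, untested sketch of the kind of RUN line meant here. It assumes %t_0 is the device-lib bitcode already produced by the earlier RUN lines in this test and that it is in a form -mlink-builtin-bitcode accepts; the check prefix OPT5 is made up for illustration.

```cuda
// Hypothetical RUN line: compile the user code for code object v5 at -O3
// while linking the device-lib bitcode (%t_0, assumed to exist) into it.
// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm -O3 \
// RUN:   -mcode-object-version=5 -mlink-builtin-bitcode %t_0 -o - %s \
// RUN:   | FileCheck -check-prefix=OPT5 %s

// Once the variable is internalized and folded, no load of it should
// remain in the optimized device IR.
// OPT5-NOT: load {{.*}} @llvm.amdgcn.abi.version
```

With the current patch this check would presumably fail, which is the point: it should only pass once the variable gets internalized.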
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D139730/new/
https://reviews.llvm.org/D139730