[PATCH] D139730: [OpenMP][DeviceRTL][AMDGPU] Support code object version 5

Tue Aug 29 04:40:52 PDT 2023

saiislam added inline comments.

================
Comment at: clang/test/CodeGenCUDA/amdgpu-code-object-version-linking.cu:12
+// RUN: llvm-link %t_0 %t_5 -o -| llvm-dis -o - | FileCheck -check-prefix=LINKED5 %s
+
+#include "Inputs/cuda.h"
----------------
yaxunl wrote:
> saiislam wrote:
> > yaxunl wrote:
> > > need to test using clang -cc1 with -O3 and -mlink-builtin-bitcode to link the device lib and verify the load of llvm.amdgcn.abi.version being eliminated after optimization.
> > > 
> > > I think currently it cannot do that since llvm.amdgcn.abi.version is not internalized by the internalization pass. This can cause some significant perf drops since loading is expensive. Need to tweak the function controlling what variables can be internalized for amdgpu so that this variable gets internalized, or having a generic way to tell that function which variables should be internalized, e.g. by adding a metadata amdgcn.internalize
> > load of llvm.amdgcn.abi.version is being eliminated with cc1, -O3, and mlink-builtin-bitcode of device lib.
> It seems being eliminated by IPSCCP. It makes sense since it is constant weak_odr without externally_initialized. Either changing it to weak or adding externally_initialized will keep the load. Normal `__constant__` var in device code may be changed by host code, therefore they are emitted with externally_initialized and do not have the load eliminated.
Thank you @yaxunl !
I have added these observations as comments in the code at load emit and global emit locations.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139730/new/

https://reviews.llvm.org/D139730