[PATCH] D139730: [OpenMP][DeviceRTL][AMDGPU] Support code object version 5

Mon Aug 21 11:26:02 PDT 2023

saiislam marked 4 inline comments as done.
saiislam added inline comments.

================
Comment at: clang/lib/CodeGen/Targets/AMDGPU.cpp:383
+            CGM.getTarget().getTargetOpts().CodeObjectVersion, /*Size=*/32,
+            llvm::GlobalValue::WeakODRLinkage);
+}
----------------
yaxunl wrote:
> 
> I am not sure weak_odr linkage will work when code object version is none. This will cause conflict when a module emitted with cov none is linked with a module emitted with cov4 or cov5. Also, when all modules are emitted with cov none, we end up with a linked module with cov none and the work group size code will not work.
> 
> Probably we need to emit llvm.amdgcn.abi.version with external linkage for cov none.
> 
> Another issue is that llvm.amdgcn.abi.version is not internalized. It is always loaded from memory even though it is in constant address space. This will cause bad performance. Considering device libs may use clang builtin for workgroup size. The performance impact may be significant. To avoid performance degradation, we need to internalize it as early as possible in the optimization pipeline.
I tried external linkage but it didn't work. Only weak_odr is working fine.

================
Comment at: clang/lib/CodeGen/Targets/AMDGPU.cpp:369-386
+    if (CGM.getModule().getNamedGlobal(Name))
+      return;
+
+    auto *Type =
+        llvm::IntegerType::getIntNTy(CGM.getModule().getContext(), Size);
+    auto *GV = new llvm::GlobalVariable(
+        CGM.getModule(), Type, true, Linkage,
----------------
arsenm wrote:
> You moved GetOrCreateLLVMGlobal but don't use it? 
> 
> The lamdba is unnecessary for a single local use
I am using GetOrCreateLLVMGlobal in CGBuiltin.cpp while emitting code for amdgpu_worgroup_size.

================
Comment at: clang/lib/CodeGen/Targets/AMDGPU.cpp:369-386
+    if (CGM.getModule().getNamedGlobal(Name))
+      return;
+
+    auto *Type =
+        llvm::IntegerType::getIntNTy(CGM.getModule().getContext(), Size);
+    auto *GV = new llvm::GlobalVariable(
+        CGM.getModule(), Type, true, Linkage,
----------------
saiislam wrote:
> arsenm wrote:
> > You moved GetOrCreateLLVMGlobal but don't use it? 
> > 
> > The lamdba is unnecessary for a single local use
> I am using GetOrCreateLLVMGlobal in CGBuiltin.cpp while emitting code for amdgpu_worgroup_size.
I was hoping that this patch will pave way for D130096, so that it can generate rest of the control constants using the same lambda.
I can remove this and simplify the code if you want.

================
Comment at: clang/lib/Driver/ToolChain.cpp:1372
+      if (SameTripleAsHost ||
+          A->getOption().matches(options::OPT_mcode_object_version_EQ))
         DAL->append(A);
----------------
arsenm wrote:
> Don't understand why this is necessary
This function creates a derived argument list for OpenMP target specific flags.
`mcode-object-version` remains unset for device compilation step if we don't pass it here.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139730/new/

https://reviews.llvm.org/D139730