[Openmp-commits] [PATCH] D139730: [OpenMP][DeviceRTL][AMDGPU] Support code object version 5

Thu Aug 24 10:06:39 PDT 2023

saiislam added inline comments.

================
Comment at: clang/lib/Driver/ToolChains/Clang.cpp:8649-8650

+  // code-object-version=X needs to be passed to clang-linker-wrapper to ensure
+  // that it is used by lld.
+  if (const Arg *A = Args.getLastArg(options::OPT_mcode_object_version_EQ)) {
----------------
arsenm wrote:
> so device rtl is linked once as a normal library?
No, this is command generation for clang-linker-wrapper. Since, devicertl is compiled only to get bitcode file (-c), it is never called.

================
Comment at: clang/lib/Driver/ToolChains/Clang.cpp:8653-8654
+    CmdArgs.push_back(Args.MakeArgString("-mllvm"));
+    CmdArgs.push_back(Args.MakeArgString(
+        Twine("--amdhsa-code-object-version=") + A->getValue()));
+  }
----------------
arsenm wrote:
> Why do you need this? The code object version is supposed to come from a module flag. We should be getting rid of the command line argument for it
During command generation for clang-linker-wrapper, it is required to check user's provided `mcode-object-version=X` so that `amdhsa-code-object-version=X` can be passed to the clang/lto backend.

`getAmdhsaCodeObjectVersion()` and `getHsaAbiVersion()` both still use the above command line argument to override user's choice of COV, instead of the module flag.

================
Comment at: clang/test/CodeGenCUDA/amdgpu-code-object-version-linking.cu:12
+// RUN: llvm-link %t_0 %t_5 -o -| llvm-dis -o - | FileCheck -check-prefix=LINKED5 %s
+
+#include "Inputs/cuda.h"
----------------
yaxunl wrote:
> need to test using clang -cc1 with -O3 and -mlink-builtin-bitcode to link the device lib and verify the load of llvm.amdgcn.abi.version being eliminated after optimization.
> 
> I think currently it cannot do that since llvm.amdgcn.abi.version is not internalized by the internalization pass. This can cause some significant perf drops since loading is expensive. Need to tweak the function controlling what variables can be internalized for amdgpu so that this variable gets internalized, or having a generic way to tell that function which variables should be internalized, e.g. by adding a metadata amdgcn.internalize
load of llvm.amdgcn.abi.version is being eliminated with cc1, -O3, and mlink-builtin-bitcode of device lib.

================
Comment at: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:406-410
+  // pass on -mllvm options to the clang
+  for (const opt::Arg *Arg : Args.filtered(OPT_mllvm)) {
+    CmdArgs.push_back("-mllvm");
+    CmdArgs.push_back(Arg->getValue());
+  }
----------------
arsenm wrote:
> Shouldn't need this?
It is required so that when clang pass (not the lto backend) is called from clang-linker-wrapper due to `-save-temps`, user provided COV is correctly propagated.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139730/new/

https://reviews.llvm.org/D139730