[PATCH] D139730: [OpenMP][DeviceRTL][AMDGPU] Support code object version 5

Yaxun Liu via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Aug 7 09:33:41 PDT 2023


yaxunl added a comment.

We need a lit test for the codegen of the clang builtin with cov 4/5/none, and a lit test showing that the branching code generated with cov none can be optimized away when it is linked with cov4 or cov5.
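To make the second test concrete, here is a minimal C++ model (not clang's actual codegen) of the branch that cov none produces for the workgroup size builtin. The struct and field names, the 400/500 version encoding, and the assumption that cov4 reads the size from the dispatch packet while cov5 reads it from the implicit kernel arguments are illustrative assumptions.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the two places the workgroup size can be read
// from; only the x dimension is modeled and the field names are illustrative.
struct DispatchPacket {  // assumed source under code object v4
  std::uint16_t workgroup_size_x;
};
struct ImplicitArgs {  // assumed source under code object v5
  std::uint16_t group_size_x;
};

// Assumed encoding of the value stored in llvm.amdgcn.abi.version.
constexpr std::uint32_t kAbiV4 = 400;
constexpr std::uint32_t kAbiV5 = 500;

// Conceptual shape of the cov none codegen: the version is only known at
// run time, so both paths are emitted and selected by a branch.
constexpr std::uint16_t workgroupSizeX(std::uint32_t abi_version,
                                       const DispatchPacket& pkt,
                                       const ImplicitArgs& args) {
  return abi_version >= kAbiV5 ? args.group_size_x : pkt.workgroup_size_x;
}

// After linking with a cov5 module the version is a known constant, so a
// compiler can fold the comparison and drop the cov4 path entirely.
constexpr std::uint32_t kLinkedAbiVersion = kAbiV5;  // hypothetical link result
constexpr std::uint16_t workgroupSizeXLinked(const DispatchPacket& pkt,
                                             const ImplicitArgs& args) {
  return workgroupSizeX(kLinkedAbiVersion, pkt, args);
}
```

The lit test would check the IR analog of this: the compare and both loads are present for cov none in isolation, and only the cov5 (or cov4) load survives after linking and optimization.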



================
Comment at: clang/lib/CodeGen/Targets/AMDGPU.cpp:383
+            CGM.getTarget().getTargetOpts().CodeObjectVersion, /*Size=*/32,
+            llvm::GlobalValue::WeakODRLinkage);
+}
----------------

I am not sure weak_odr linkage will work when the code object version is none. It will cause a conflict when a module emitted with cov none is linked with a module emitted with cov4 or cov5, since weak_odr assumes all definitions are equivalent while their values differ here. Also, when all modules are emitted with cov none, the linked module ends up with cov none and the workgroup size code will not work.

We probably need to emit llvm.amdgcn.abi.version with external linkage for cov none.
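A toy model of the linking concern (not LLVM's linker API) may help. It assumes cov none stores 0, cov5 stores 500, and that "external linkage for cov none" means emitting only an external declaration with no initializer. Because weak_odr promises equivalent definitions, a linker may keep any one of them; this model simply keeps the first.

```cpp
#include <cstdint>
#include <initializer_list>

// How one module presents llvm.amdgcn.abi.version in this toy model.
enum class Linkage { WeakODR, ExternalDecl };

struct GlobalInModule {
  Linkage linkage;
  std::uint32_t value;  // ignored for ExternalDecl (no initializer)
};

struct LinkResult {
  bool defined;          // false: still only declared after linking
  std::uint32_t value;   // meaningful only when defined
};

// Resolve the global across the modules in link order.
constexpr LinkResult linkAbiVersion(std::initializer_list<GlobalInModule> mods) {
  LinkResult r{false, 0};
  for (const GlobalInModule& g : mods) {
    if (g.linkage == Linkage::ExternalDecl) continue;  // declaration: no value
    // weak_odr: definitions are assumed equivalent, so keep the first one.
    if (!r.defined) {
      r.defined = true;
      r.value = g.value;
    }
  }
  return r;
}
```

In this model, mixing weak_odr definitions silently keeps whichever comes first (so a cov none value of 0 can shadow 500), while an external declaration for cov none always resolves to the cov4/cov5 definition; if every module is cov none, the symbol just stays undefined.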

Another issue is that llvm.amdgcn.abi.version is not internalized, so it is always loaded from memory even though it is in the constant address space. This hurts performance, and since device libs may use the clang builtin for workgroup size, the impact may be significant. To avoid the degradation, we need to internalize it as early as possible in the optimization pipeline.


Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D139730


