[PATCH] D120129: [NVPTX] Enhance vectorization of ld.param & st.param

Mon Mar 21 08:29:58 PDT 2022

yaxunl added a comment.

In D120129#3392878 <https://reviews.llvm.org/D120129#3392878>, @tra wrote:

> In D120129#3392681 <https://reviews.llvm.org/D120129#3392681>, @yaxunl wrote:
>
>> For HIP, we mark non-kernel device functions with hidden visibility and internalize them in an LLVM pass for -fno-gpu-rdc.
>
> Looks like now we may have a reason to do so for CUDA, too. Could you point me to where we do it for HIP?

Make hidden visibility the default: https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/ToolChains/HIPAMD.cpp#L203

Give kernels protected visibility so that they do not become hidden: https://github.com/llvm/llvm-project/blob/main/clang/lib/CodeGen/TargetInfo.cpp#L9315

Tell the backend that it needs to internalize non-kernel functions: https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/ToolChains/HIPAMD.cpp#L189

Have the backend internalize non-kernel functions but not variables: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp#L702
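
As a rough illustration of how these four pieces fit together, here is a small standalone C++ model. It is only a sketch of the policy described above, not the actual Clang or LLVM code: the types (Symbol, Kind, Visibility, Linkage) and functions (applyVisibility, internalize, processModule) are hypothetical names made up for this example, and the real passes operate on LLVM IR and consider more cases than this.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for device-side symbols.
// These are NOT LLVM types; they only model the policy.
enum class Kind { Kernel, DeviceFunction, Variable };
enum class Visibility { Default, Hidden, Protected };
enum class Linkage { External, Internal };

struct Symbol {
  std::string name;
  Kind kind;
  Visibility visibility;
  Linkage linkage;
};

// Front-end side: everything is hidden by default, except kernels,
// which get protected visibility so they remain visible from outside.
void applyVisibility(Symbol &s) {
  s.visibility = (s.kind == Kind::Kernel) ? Visibility::Protected
                                          : Visibility::Hidden;
}

// Backend side: without relocatable device code (-fno-gpu-rdc),
// internalize non-kernel functions. Kernels and variables are left alone.
void internalize(Symbol &s, bool gpuRdc) {
  if (!gpuRdc && s.kind == Kind::DeviceFunction)
    s.linkage = Linkage::Internal;
}

// Apply both steps to every symbol and return the updated copy.
std::vector<Symbol> processModule(std::vector<Symbol> symbols, bool gpuRdc) {
  for (Symbol &s : symbols) {
    applyVisibility(s);
    internalize(s, gpuRdc);
    // Invariant of this model: a kernel is never internalized.
    assert(s.kind != Kind::Kernel || s.linkage == Linkage::External);
  }
  return symbols;
}
```

In this model, calling processModule with gpuRdc set to false on a kernel, a device function, and a variable leaves the kernel protected and external, makes the device function hidden and internal, and leaves the variable hidden but external. With gpuRdc set to true, nothing is internalized.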

https://reviews.llvm.org/D120129