[PATCH] D112492: [CUDA][HIP] Allow comdat for kernels

Tue Nov 9 14:19:52 PST 2021

yaxunl added a comment.

I think it is probably necessary to merge linkonce_odr symbols for kernels to work properly.

Consider the following testcase:

  // a.cu
  template<typename T>
  __global__ void foo(T x) {}

  void test1() {
      foo<<<1,1>>>(1);
  }

  // b.cu
  template<typename T>
  __global__ void foo(T x) {}

  void test2() {
      foo<<<1,1>>>(1);
  }

  // c.cu
  template<typename T>
  __global__ void foo(T x);

  int main() {
      foo<<<1,1>>>(1);
  }

Assume a.cu, b.cu, and c.cu are compiled with the default -fno-gpu-rdc option and linked together.

Both a.obj and b.obj contain a global symbol foo<int> as the kernel stub function. c.obj contains a reference to foo<int>, so that reference has to resolve to foo<int> in either a.obj or b.obj. The only sensible behavior is for the linker to merge foo<int> from a.obj and b.obj and let c.obj resolve to the merged symbol. This in turn requires that the fat binaries embedded in a.obj and b.obj contain identical definitions of kernel foo<int>. That is, if the ODR is followed, then even though two fat binaries contain kernel foo<int>, only one of them will be used, namely the one corresponding to the merged kernel stub foo<int>. This is fine because the two definitions are identical.

The implication is that we have to ask users to follow the ODR even with the default -fno-gpu-rdc option: users cannot have different definitions of the same template instantiation (e.g. foo<int>) in different TUs, otherwise the behavior is undefined.
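To make the argument concrete, here is a toy model in plain C++ (host code only, so it also compiles in CUDA mode). ObjectFile, resolveStub, and kernelBodyRun are hypothetical names made up for illustration. The model assumes first-in-link-order COMDAT selection, which is what ELF linkers typically do for section groups, and takes from the reasoning above that the kernel which actually runs is the one paired with the surviving stub. It is a sketch of the argument, not of how clang or any linker is implemented.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy model of one object file produced with -fno-gpu-rdc.
// Illustration only; this is not LLVM or linker source code.
struct ObjectFile {
  std::string name;        // e.g. "a.obj"
  bool definesStub;        // true if this TU emits the host stub foo<int>
  std::string deviceBody;  // body of kernel foo<int> in this TU's fat binary
};

// Models COMDAT selection as ELF linkers typically do it (an assumption of
// this sketch): the first group with a given signature in link order is kept
// and later duplicates are discarded. Returns the name of the object whose
// stub survives, or "" if no object defines the stub.
std::string resolveStub(const std::vector<ObjectFile>& linkOrder) {
  for (const ObjectFile& obj : linkOrder)
    if (obj.definesStub) return obj.name;
  return "";
}

// Following the review comment's reasoning: the kernel that runs is the one
// in the fat binary of the object whose stub survived the merge.
std::string kernelBodyRun(const std::vector<ObjectFile>& linkOrder) {
  const std::string keeper = resolveStub(linkOrder);
  // With no definition at all, c.obj's reference would be a link error.
  assert(!keeper.empty() && "undefined reference to foo<int>");
  for (const ObjectFile& obj : linkOrder)
    if (obj.name == keeper) return obj.deviceBody;
  return "";  // not reached: keeper always names an object in linkOrder
}
```

When a.obj and b.obj carry identical bodies, swapping their link order changes which copy is kept but not which code runs. When the bodies differ (an ODR violation), the kernel that runs depends on link order, which is exactly the undefined behavior described above.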

Since the ODR is a fundamental assumption of C++, I think it is justifiable to require users to follow it regardless of whether -fgpu-rdc or -fno-gpu-rdc is used.


https://reviews.llvm.org/D112492