[PATCH] D137154: Adding nvvm_reflect clang builtin

Artem Belevich via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Nov 11 10:52:23 PST 2022


tra added a comment.

> Yes, this probably would become untenable with really large sizes.

I think that horse left the barn long ago. The vast majority of NVIDIA GPU cycles these days are spent in libraries shipped by NVIDIA, and those are huge (O(1GB)), numerous (cuDNN, TensorRT, cuBLAS, cuSPARSE, cuFFT), and contain binaries for all major GPU variants NVIDIA supports (sm_3x, 5x, 6x, 7x, 8x, 9x). AMD's ROCm stack is comparable in size, too, I think.
Yes, it does cause issues when one links everything in statically. E.g., we sometimes end up overflowing 32-bit signed ELF relocations, but users typically link against shared libraries and avoid that particular issue.

Distributing large binaries does end up being a challenge, mostly due to storage and transfer costs. Tensorflow was forced to split the GPU support bits into a separate package and limit the set of GPUs the official packages are compiled for. I think we also considered per-GPU-variant packages, but that would multiply storage costs.

However, most users and libraries are either nowhere near cuDNN/ROCm in terms of size, in which case the size increase will likely be manageable, or they already have to deal with size issues, in which case the increase will be insignificant compared to their existing size.
JIT would be another option, but it has its own challenges at that scale.

Overall, I think there are enough existing use cases to consider compiling for all/most supported GPUs a practical solution, even for large projects. That does not mean it will work for everyone, but I think specifically for `libclc`, compiling for all GPUs would be the right thing to do.
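
For context, a minimal sketch (not part of the patch under review) of how `__nvvm_reflect` lets a single source or bitcode library, libdevice- or libclc-style, get specialized per GPU when compiled for every supported architecture. The `fast_op` helper and its choice of math functions are hypothetical; the `"__CUDA_ARCH"` query and its folding to `<SM version> * 10` are what LLVM's NVVMReflect pass recognizes, and the explicit declaration would presumably become unnecessary once the builtin from this patch is available.

  // Hypothetical device helper in a shared bitcode library (names are illustrative).
  // The NVVMReflect pass folds __nvvm_reflect("__CUDA_ARCH") to <SM version> * 10
  // (e.g. 800 for sm_80), so the branch is resolved per GPU variant when the same
  // code is compiled for all supported architectures, e.g. with repeated
  // --cuda-gpu-arch=sm_XX flags producing a fat binary.
  extern "C" __device__ int __nvvm_reflect(const char *);

  __device__ float fast_op(float x) {
    if (__nvvm_reflect("__CUDA_ARCH") >= 800)
      return __expf(x);  // faster path assumed acceptable on sm_80 and newer
    return expf(x);      // conservative fallback for older GPUs
  }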


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D137154/new/

https://reviews.llvm.org/D137154


