[cfe-dev] [RFC][OpenMP][CUDA] Unified Offloading Support in Clang Driver
cfe-dev at lists.llvm.org
Sun Mar 6 21:29:50 PST 2016
Phone reply so some formatting may get messed up.
Internally, we (PathScale) use unified objects with symbol name mangling for device sections. The only complication is that the assembler/runtime needs to know what to do when it hits these sections. I personally really don't like stuffing all the device code in a data section; it seems very hacky, but I'd agree it's far more friendly than multiple objects. In my perfect world, the unoptimized, optimized, and even offload versions are distinguished purely by name mangling (not far from how glibc handles optimized versions of a function).
The way LLVM would handle an SSE4 symbol vs. an AVX-512 symbol overlaps with this quite a bit.
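A toy model may make the "everything is just name mangling" idea concrete. In the Python sketch below, one logical function has several implementations registered under names of the form `<base>.<variant>`, and a resolver picks the most specialized variant the current target supports, loosely in the spirit of glibc's runtime selection of optimized functions. The naming scheme, the variant list, and the `register`/`resolve` helpers are all hypothetical; in a real toolchain the compiler would emit the variants and the loader or runtime would do the selection.

```python
"""Toy model: function variants distinguished only by a mangled name suffix.

Everything here (naming scheme, variant list, helpers) is hypothetical.
"""
from typing import Callable, Dict, Iterable

# Hypothetical symbol table: mangled name -> implementation.
SYMBOLS: Dict[str, Callable[[list, float], list]] = {}

# Preference order, most to least specialized (illustrative only).
PREFERENCE = ("nvptx", "avx512", "sse4", "generic")


def mangle(base: str, variant: str) -> str:
    """Build a mangled symbol name such as 'scale.avx512'."""
    return f"{base}.{variant}"


def register(base: str, variant: str):
    """Decorator that records an implementation under its mangled name."""
    def wrap(fn):
        SYMBOLS[mangle(base, variant)] = fn
        return fn
    return wrap


@register("scale", "generic")
def _scale_generic(xs, a):
    return [x * a for x in xs]


@register("scale", "avx512")
def _scale_avx512(xs, a):
    # Stand-in body: a real build would compile this for AVX-512.
    return [x * a for x in xs]


def resolve(base: str, available: Iterable[str]) -> str:
    """Return the mangled name of the best registered variant usable here.

    The 'generic' variant is assumed to run everywhere, so it is accepted
    even if it is not listed in `available`.
    """
    avail = set(available)
    for variant in PREFERENCE:
        name = mangle(base, variant)
        if (variant == "generic" or variant in avail) and name in SYMBOLS:
            return name
    raise LookupError(f"no usable variant of {base!r}")


if __name__ == "__main__":
    chosen = resolve("scale", available={"sse4", "avx512"})
    print(chosen, SYMBOLS[chosen]([1, 2, 3], 2.0))
```

Note that a target advertising SSE4 still falls back to `scale.generic` here, because no `scale.sse4` was registered; the caller never needs to know which variants exist, only the base name.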
4. Store the device code in special sections of the host object file. This seems the most build-system friendly, although perhaps the most complicated to implement on our end. Also, as has been pointed out, this is the technique nvcc uses.
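Option (4) can be illustrated with a second toy model, again in Python and purely hypothetical: a host object is treated as a mapping from section names to bytes, device images live in extra sections under an assumed `.offload.<target>` naming convention, and a runtime (or offload-aware tool) extracts them by prefix. The point it shows is why this is build-system friendly: the build still passes around one `.o` per source file, and a linker that knows nothing about offloading simply carries the extra sections through.

```python
"""Toy model of option (4): device images in special sections of the host object.

The '.offload.' prefix, the byte contents, and these helpers are hypothetical;
a real object would be ELF/COFF/Mach-O produced by the toolchain.
"""
from dataclasses import dataclass, field
from typing import Dict, List

OFFLOAD_PREFIX = ".offload."  # hypothetical section-naming convention


@dataclass
class HostObject:
    """One object file: ordinary host sections plus optional device sections."""
    sections: Dict[str, bytes] = field(default_factory=dict)

    def embed_device_image(self, target: str, image: bytes) -> None:
        """Store a device image in a section named after its target."""
        self.sections[OFFLOAD_PREFIX + target] = image

    def device_images(self) -> Dict[str, bytes]:
        """What a runtime would extract: target name -> image bytes."""
        return {
            name[len(OFFLOAD_PREFIX):]: data
            for name, data in self.sections.items()
            if name.startswith(OFFLOAD_PREFIX)
        }


def link(objects: List[HostObject]) -> HostObject:
    """Merge same-named sections by concatenation, like an ordinary linker.

    Device sections need no special handling here. Because images from
    several objects end up back to back in one section, a real format would
    need per-image headers or descriptors to split them apart again.
    """
    out = HostObject()
    for obj in objects:
        for name, data in obj.sections.items():
            out.sections[name] = out.sections.get(name, b"") + data
    return out
```

In this model the build system only ever sees `HostObject` instances, one per translation unit, exactly as it would without offloading.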
All things considered, I think that I'd prefer (4). If we're picking an option to minimize build-system changes, which I fully support, picking the option with the smallest chance of incompatibilities seems optimal. There is also other (prior) art here, and we should find out how GCC is handling this in GCC 6 for OpenACC and/or OpenMP 4 (https://gcc.gnu.org/wiki/OpenACC). Also, we can check on PGI and/or Pathscale (for OpenACC, OpenHMPP, etc.), in addition to any relevant details of what nvcc does here.