<html><body><p>Hi Justin,<br><br>Thanks for your response.<br><br>I am using a mix of our OpenMP NVPTX toolchain for OpenMP GPU programs and Clang-CUDA for the OpenMP runtime, which we've written in CUDA. This may be the source of some of your surprises.<br><br>I translate the CUDA code to LLVM IR and pull it into the user's GPU program (with <font face="Menlo-Regular">-mlink-cuda-bitcode</font>, similar to how you pull in libdevice.compute.bc). We then use our toolchain to build relocatable objects with ptxas. I'd be happy to talk more about our use case and how we can make the improvements you suggest.<br><br><tt>&gt; Given that "extern __shared__" means "get me a pointer to the<br>&gt; dynamically-allocated shared memory for this kernel," using a<br>&gt; non-array / non-pointer type would be...odd?<br>&gt; </tt><br><br><tt>I believe the difference is whether the CUDA code is being compiled in whole-program mode or in separate compilation mode. The following section of the CUDA C Programming Guide covers the case I described for separate compilation mode, which is what I'm using:</tt><br><a href="https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#device-memory-qualifiers"><tt>https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#device-memory-qualifiers</tt></a><br><br><tt>"When compiling in the separate compilation mode (see the nvcc user manual for a description of this mode), __device__, __shared__, and __constant__ variables can be defined as external using the extern keyword. nvlink will generate an error when it cannot find a definition for an external variable (unless it is a dynamically allocated __shared__ variable)."</tt><br><br><tt>Can we add a flag to Clang-CUDA to indicate separate compilation mode?</tt><br><tt><br>Could you point me to patches or code that I can look at to understand the implications of separate compilation? Which LLVM optimizations benefit from whole-program compilation mode? 
What, if anything, breaks if I use Clang-CUDA for separate compilation today?</tt><br><tt><br>Thanks in advance for your time.</tt><br><br><tt>Arpith</tt><br>
</body></html>