[PATCH] D44435: Add the module name to __cuda_module_ctor and __cuda_module_dtor for unique function names
Simeon Ehrig via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Wed Mar 14 03:38:53 PDT 2018
SimeonEhrig marked an inline comment as done.
SimeonEhrig added inline comments.
================
Comment at: lib/CodeGen/CGCUDANV.cpp:281
+ // get name from the module to generate unique ctor name for every module
+ SmallString<128> ModuleName
----------------
rjmccall wrote:
> Please explain in the comment *why* you're doing this. It's just for debugging, right? So that it's known which object file the constructor function comes from.
The motivation is the same as in this review: https://reviews.llvm.org/D34059
We are trying to enable incremental compilation of CUDA runtime code, so we need unique ctor/dtor names to handle the CUDA device code across different modules.
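For illustration, a rough sketch of the idea (not the verbatim patch; VoidTy, VoidPtrTy and TheModule follow the surrounding members in CGCUDANV.cpp):

  // Derive a suffix from the module name so every llvm::Module gets its own
  // ctor symbol instead of the shared "__cuda_module_ctor".
  SmallString<128> ModuleName =
      llvm::sys::path::filename(CGM.getModule().getName());
  llvm::Function *ModuleCtorFunc = llvm::Function::Create(
      llvm::FunctionType::get(VoidTy, VoidPtrTy, false),
      llvm::GlobalValue::InternalLinkage,
      "__cuda_module_ctor_" + llvm::Twine(ModuleName), &TheModule);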
================
Comment at: lib/CodeGen/CGCUDANV.cpp:281
+ // get name from the module to generate unique ctor name for every module
+ SmallString<128> ModuleName
----------------
tra wrote:
> SimeonEhrig wrote:
> > rjmccall wrote:
> > > Please explain in the comment *why* you're doing this. It's just for debugging, right? So that it's known which object file the constructor function comes from.
> > The motivation is the same as in this review: https://reviews.llvm.org/D34059
> > We are trying to enable incremental compilation of CUDA runtime code, so we need unique ctor/dtor names to handle the CUDA device code across different modules.
> I'm also interested in the motivation for this change.
>
> Also, if the goal is to have a unique module identifier, would compiling two different files with the same name be a problem? If the goal is to help identify a module, this may be OK, if not ideal. If you really need a unique name, then you may need to do something more elaborate. NVCC appears to use some random number (or a hash of something?) for that.
We need this modification for our C++ interpreter Cling, which we want to extend to interpret CUDA runtime code. Effectively, it is a JIT that reads in the program code line by line, and every line gets its own llvm::Module. The interpreter works with incremental and lazy compilation, and it is the lazy compilation that requires this modification. In CUDA mode, clang generates a __cuda_module_ctor and __cuda_module_dtor for every module if the compiler was started with a path to a fatbinary file. But the ctor also depends on the source code that is translated to LLVM IR in that module. For example, if a __global__ kernel is defined, CodeGen adds a call to __cuda_register_globals() to the ctor. Lazy compilation, however, prevents us from retranslating a function that has already been translated. Without this modification, the interpreter thinks the ctor is always the same and reuses the first translation that was generated. Therefore, it is impossible to add new kernels.
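To make this concrete, a hypothetical incremental session (the line and module names are made up for illustration):

  // Each input line becomes its own llvm::Module. With a single shared ctor
  // name, lazy compilation reuses the ctor of the first module, so the
  // __cuda_register_globals() call that registers k2 would never be run.
  __global__ void k1() {}  // module "input_line_1" -> __cuda_module_ctor_input_line_1
  __global__ void k2() {}  // module "input_line_2" -> __cuda_module_ctor_input_line_2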
================
Comment at: unittests/CodeGen/IncrementalProcessingTest.cpp:176-178
+
+// In CUDA incremental processing, a CUDA ctor or dtor will be generated for
+// every statement if a fatbinary file exists.
----------------
tra wrote:
> I don't understand the comment. What is 'CUDA incremental processing' and what exactly is meant by 'statement' here? I'd appreciate it if you could give me more details. My understanding is that ctor/dtor are generated once per TU. I suspect "incremental processing" may change that, but I have no idea what exactly it does.
A CUDA ctor/dtor is generated for every llvm::Module, and a TU can be composed of many modules; in our interpreter, we add new code to our AST with new modules at runtime.
The ctor/dtor generation depends on the fatbinary code: CodeGen checks whether a path to a fatbinary file is set. If it is, it generates a ctor with at least a __cudaRegisterFatBinary() function call. So the generation is independent of the source code in the module, and we can use any statement. A statement can be an expression, a declaration, a definition, and so on.
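The check in question lives in CGCUDANV.cpp; a condensed paraphrase (hedged: the early return matches the real code as far as I know, the rest of the body is only summarized in comments):

  llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() {
    // No fatbinary file was passed (-fcuda-include-gpubinary): there is
    // nothing to register, so no ctor/dtor is emitted at all.
    if (CGM.getCodeGenOpts().CudaGpuBinaryFileName.empty())
      return nullptr;
    // Otherwise a ctor is always emitted, independent of the statements in
    // the module: it embeds the GPU binary, calls __cudaRegisterFatBinary(),
    // and calls __cuda_register_globals() for any kernels that were seen.
    // [body condensed]
    return ModuleCtorFunc;
  }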
Repository:
rC Clang
https://reviews.llvm.org/D44435