[PATCH] D44435: Add the module name to __cuda_module_ctor and __cuda_module_dtor for unique function names

Wed Mar 14 15:39:19 PDT 2018

rjmccall added inline comments.

================
Comment at: lib/CodeGen/CGCUDANV.cpp:281

+  // get name from the module to generate unique ctor name for every module
+  SmallString<128> ModuleName
----------------
v.g.vassilev wrote:
> rjmccall wrote:
> > SimeonEhrig wrote:
> > > tra wrote:
> > > > SimeonEhrig wrote:
> > > > > rjmccall wrote:
> > > > > > Please explain in the comment *why* you're doing this.  It's just for debugging, right?  So that it's known which object file the constructor function comes from.
> > > > > The motivation is the same at this review: https://reviews.llvm.org/D34059
> > > > > We try to enable incremental compiling of cuda runtime code, so we need unique ctor/dtor names, to handle the cuda device code over different modules. 
> > > > I'm also interested in in the motivation for this change.
> > > > 
> > > > Also, if the goal is to have an unique module identifier, would compiling two different files with the same name be a problem? If the goal is to help identifying a module, this may be OK, if not ideal. If you really need to have unique name, then you may need to do something more elaborate. NVCC appears to use some random number (or hash of something?) for that.
> > > We need this modification for our C++-interpreter Cling, which we want to expand to interpret CUDA runtime code. Effective, it's a jit, which read in line by line the program code. Every line get his own llvm::Module. The Interpreter works with incremental and lazy compilation. Because the lazy compilation, we needs this modification. In the CUDA mode, clang generates  for every module an _ _cuda_module_ctor and _ _cuda_module_dtor, if the compiler was started with a path to a fatbinary file. But the ctor is also depend on the source code, which will translate to llvm IR in the module. For Example, if a _ _global_ _ kernel will defined, the CodeGen add the function call __cuda_register_globals() to the ctor. But the lazy compilations prevents, that we can translate a function, which is already translate. Without the modification, the interpreter things, that the ctor is always same and use the first translation of the function, which was generate. Therefore, it is impossible to add new kernels. 
> > I'm not asking you to explain to *me* why you're doing this, I'm asking you to explain *in the comment* why you're doing this.
> > 
> > That said, we should discuss this.  It sounds like you need the function to have a unique name because otherwise you're seeing inter-module conflicts between incremental slices.  Since the function is emitted with internal linkage, I assume that those conflicts must be because you're promoting internal linkage to external in order to make incremental processing able to link to declarations from an earlier slice of the translation unit.  I really think that a better solution would be to change how we assign LLVM linkage to static global declarations in IRGen — basically, recognizing the difference between internal linkage (where different parts of the translation unit can still refer to the same entity) and no linkage at all (where they cannot).  We could then continue to emit truly private entities, like global ctors/dtors, lambda bodies, block functions, and so on, with internal/private linkage without worrying about how your pass will mess up the linkage later.
> @rjmccall, I agree. What's the best way to discuss this? My irc handle is vvassilev and I am in CET timezone. I will be online for approx. 2 hours from now on.
Sorry, I seem to have missed you for today.  I think for the next day it would be best to just trade e-mail, because I have errands to run in the morning and early afternoon tomorrow.

I think the major piece of the plan would be to make things like the computation of GVALinkage in ASTContext.cpp consider your incremental mode.  Currently, `basicGVALinkageForFunction` and `basicGVALinkageForVariable` only consider `isExternallyVisible()`, which conflates no-linkage and internal-linkage; you would need to map internal linkage to `GVA_StrongExternal` when processing in incremental mode.

That alone might not be sufficient because there are things with no formal linkage that still do need to be shared across incremental slices; for example, anonymous structures at global scope.  To get those things right, we will need to split NoLinkage by adding an InternalNoLinkage for declarations that formally have no linkage but in reality are visible throughout the translation unit; but that seems quite feasible.

We should get Richard's thoughts on that plan first, though.

Repository:
  rC Clang

https://reviews.llvm.org/D44435