[cfe-dev] How to generate Unique Module identifier

Xiangling Liao via cfe-dev cfe-dev at lists.llvm.org
Fri May 29 12:15:28 PDT 2020

Hi All,

There have been recent discussions about how to generate unique module
identifiers which can be embedded in AIX static init function names.

On AIX, static init functions are sinit/sterm pairs looking like this:

    __sinit<priority #>_<unique module identifier>
    __sterm<priority #>_<unique module identifier>

There is one sinit/sterm pair per priority number for each module.

The AIX linker collects static init functions based solely on their names,
so we need to guarantee that each module has its own unique sinit/sterm
pairs. To achieve that, we need a unique module identifier to use as the
suffix of each static init function name.
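
To make the naming scheme concrete, here is a minimal sketch of how such
names could be composed. The helper `makeStaticInitName` is hypothetical (it
is not an existing LLVM or clang function), and printing the priority as a
plain decimal number is an assumption made only for illustration.

```cpp
#include <string>

// Hypothetical helper: builds a static init/term function name following the
// __sinit<priority>_<id> / __sterm<priority>_<id> pattern described above.
// Formatting the priority as a decimal number is an assumption.
std::string makeStaticInitName(bool isInit, unsigned priority,
                               const std::string &moduleId) {
  std::string name = isInit ? "__sinit" : "__sterm";
  name += std::to_string(priority);
  name += '_';
  name += moduleId;
  return name;
}
```

If two modules end up with the same identifier, the helper returns the same
name for both at a given priority, which is exactly the collision that
name-based collection cannot tolerate.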

Our thoughts so far are as follows:

*1. `getUniqueModuleId` function to generate unique module identifier*
https://llvm.org/doxygen/ModuleUtils_8cpp_source.html#l00255

*“Produce unique identifier for a module by taking the MD5 sum of the names
of the module's strong external symbols. However, if the module has no
strong external symbols (such a module may still have a semantic effect if
it performs global initialization), we cannot produce a unique identifier
for this module, so we return the empty string.”*

The issues with `getUniqueModuleId` are:
(1) Because this function does not consider global variables with
`Internal linkage` or `WeakOnceODR linkage`, it returns an empty string
for the following cases:

    class test {
    public:
      test();
      ~test();
    };
    static test t;  // Internal linkage

    extern "C" int puts(const char *);
    template <typename = void>
    struct A {
      A() { puts("hello\n"); }
      ~A() { puts("bye\n"); }
      static A instance;
    };
    template <typename T> A<T> A<T>::instance;
    template A<> A<>::instance;  // WeakOnceODR linkage

(2) Even if we add our own version of `getUniqueModuleId` that takes the
above linkage types into account, the biggest issue is that content-based
hashing won't work when two modules have identical content with internal
linkage.
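
Both issues can be illustrated with a simplified model. The sketch below is
not LLVM's implementation: the `Symbol` and `Linkage` types and the
`includeLocal` flag are hypothetical, and `std::hash` stands in for MD5
purely for brevity.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical, simplified model of a module's global symbols.
enum class Linkage { External, Internal, Weak };

struct Symbol {
  std::string name;
  Linkage linkage;
};

// Content-based module ID: hash the names of the symbols that qualify.
// With includeLocal == false only strong external symbols count, mirroring
// the quoted description; with includeLocal == true every symbol counts.
// Returns the empty string when no symbol qualifies.
std::string contentBasedModuleId(const std::vector<Symbol> &symbols,
                                 bool includeLocal) {
  std::string joined;
  for (const Symbol &s : symbols) {
    if (!includeLocal && s.linkage != Linkage::External)
      continue;
    joined += s.name;
    joined += '\n';
  }
  if (joined.empty())
    return "";
  return std::to_string(std::hash<std::string>{}(joined));
}
```

For a module that contains only `static test t;`, the strict variant yields
an empty ID (issue 1). The relaxed variant yields a non-empty ID, but two
different modules that both contain only `static test t;` get the same one
(issue 2).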

*2. Source filename string as the module identifier*
The `source filename` string is set to the original module identifier,
which will be the name of the compiled source file when compiling from
source through the clang front end.
[https://releases.llvm.org/10.0.0/docs/LangRef.html#source-filename]

That means that if multiple objects are compiled with the same command-line
source file path, they get the same module identifier, so the static init
functions are not guaranteed to be unique.

Also, there is the *Unique Names for Functions with Internal Linkage*
patch, whose solution does not guarantee uniqueness either.

*3. Using the information around the compilation process itself*
Although using information about the compilation process itself (PID,
timestamp) can give us unique module identifiers, it could be problematic
for reproducibility.

*4. Source file full path + OutputFile name following the -o option*
Another promising option is to use *the source file full path plus the
OutputFile name following the -o option* as something to hash on, or as a
suffix for static init functions on AIX.

So far we have not found any precedent in LLVM for doing so. It would also
require us to pass the OutputFile name given to -o from `FrontendOpts` to
`llvm::Module`, the way we pass each `Input` from `FrontendOpts.Inputs` to
`llvm::Module`, as in
https://llvm.org/doxygen/Module_8cpp_source.html#l00073
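
As a rough sketch of option 4, the identifier could be derived by hashing
the two strings together. Everything here is an assumption for illustration:
`pathBasedModuleId` is a hypothetical name, and FNV-1a is used only because
it is short and deterministic, not because it is what LLVM would choose.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// 64-bit FNV-1a hash, used here only as a small deterministic stand-in.
std::uint64_t fnv1a64(const std::string &data) {
  std::uint64_t hash = 14695981039346656037ULL;  // FNV offset basis
  for (unsigned char byte : data) {
    hash ^= byte;
    hash *= 1099511628211ULL;  // FNV prime
  }
  return hash;
}

// Hypothetical helper: derives a 16-hex-digit module ID from the source
// file's full path and the output file name given to -o.
std::string pathBasedModuleId(const std::string &sourcePath,
                              const std::string &outputFile) {
  // A NUL separator keeps ("a", "bc") distinct from ("ab", "c").
  std::string key = sourcePath;
  key.push_back('\0');
  key += outputFile;
  char buf[17];
  std::snprintf(buf, sizeof(buf), "%016llx",
                static_cast<unsigned long long>(fnv1a64(key)));
  return std::string(buf);
}
```

Because the ID depends only on the two strings, repeated builds give the
same result, and compiling the same source to `a1.o` and `a2.o` gives two
different IDs. Two compilations that share both the source path and the
output name would still collide.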

Any thoughts about what to hash on or encode into the unique ID we need?

Please let me know if there are any questions as well. Your feedback is
appreciated.

Xiangling Liao