[llvm-dev] A sufficient test for GV unification via unnamed_addr ?

Mon Nov 9 11:42:46 PST 2015

Does anyone know of a good test to estimate whether or not a pair of
GlobalVariables could potentially be unified due to the "unnamed_addr"
flag?  This is for an out-of-source AA project, and I need to err on the
side of assuming unification is possible.  But for precision reasons, I'd
really like to avoid false positives.

I'm having trouble understanding just how equivalent two GVs' initializers
must be for the linker to be allowed to unify them.  Here's what the docs
<http://llvm.org/docs/LangRef.html#global-variables> say:

> Global variables can be marked with unnamed_addr which indicates that the
> address is not significant, only the content. Constants marked like this
> can be merged with other constants if they have the same initializer. Note
> that a constant with significant address *can* be merged with a
> unnamed_addr constant, the result being a constant whose address is
> significant.

That wording does not say what counts as "the same initializer," that is,
how similar two GVs' initializers must be before the linker is free to unify
the GVs' storage.  I've got a few theories, but would appreciate any
suggestions.  I'm hoping for an overall test which is both precise and not
too computationally intensive on a program with very many globals.

   - Theory 1: At the LLVM C++ API level, GV1 and GV2 can only be unified
   if their initializers are the very same API object, i.e.,
   "GV1->getInitializer() == GV2->getInitializer()".  For this to be a
   sufficient test, I think there would need to be some strong promises by the
   C++ API implementation regarding using a single object to represent equal
   or equivalent initial values.

   - Theory 2: The linker requires that the initializers for GV1 and GV2
   are *syntactically* equivalent compile-time constants, but their
   initializers might not be described using the same llvm::Constant object.
   For example:

@GV1 = private unnamed_addr constant [4 x i8] c"Foo\00", align 1
@GV2 = private constant [4 x i8] c"Foo\00", align 1

   - Theory 3: Type-safe compile-time-constant semantic equivalence, but
   unlike Theory 2, it allows for syntactically different representations.
   All that matters is that the two initializer objects are equivalently
   typed, equivalently shaped, and ultimately have equivalent constituent
   scalar values.  For example:

@GV1 = unnamed_addr constant [4 x i32] zeroinitializer, align 16
@GV2 = constant [4 x i32] [i32 0, i32 0, i32 0, i32 0], align 16

   - Theory 4: Arbitrary compile-time-constant bit-pattern equivalence.
   For example:

@X = constant i32 -1, align 4
@Y = unnamed_addr constant [4 x i8] c"\FF\FF\FF\FF", align 1
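
To make the gap between Theories 3 and 4 concrete, here is a rough sketch on a toy model.  It is not LLVM code: ToyGlobal, bytesOfI32, flagsPermitMerge, mayUnifyTheory3 and mayUnifyTheory4 are all hypothetical names invented for illustration, and the initializer is assumed to be already flattened to its in-memory bytes for a little-endian target.  The flag precondition is only my reading of the LangRef wording quoted above.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical toy model of a global variable; this is NOT an LLVM class.
struct ToyGlobal {
    std::string type;                 // printed IR type, e.g. "[4 x i8]"
    std::vector<std::uint8_t> bytes;  // initializer flattened to memory bytes
    bool isConstant;                  // declared "constant" rather than "global"
    bool unnamedAddr;                 // carries the unnamed_addr flag
};

// Flatten an i32 to bytes, assuming a little-endian target such as x86-64.
std::vector<std::uint8_t> bytesOfI32(std::uint32_t v) {
    std::vector<std::uint8_t> out;
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<std::uint8_t>((v >> shift) & 0xFFu));
    return out;
}

// Precondition as I read the LangRef text: both globals are constants and
// at least one of them is marked unnamed_addr.
bool flagsPermitMerge(const ToyGlobal &a, const ToyGlobal &b) {
    return a.isConstant && b.isConstant && (a.unnamedAddr || b.unnamedAddr);
}

// Theory 3: same type and same constituent values.
bool mayUnifyTheory3(const ToyGlobal &a, const ToyGlobal &b) {
    return flagsPermitMerge(a, b) && a.type == b.type && a.bytes == b.bytes;
}

// Theory 4: same bit pattern, regardless of type.
bool mayUnifyTheory4(const ToyGlobal &a, const ToyGlobal &b) {
    return flagsPermitMerge(a, b) && a.bytes == b.bytes;
}
```

In this model the "Foo" pair from Theory 2 passes both predicates, while @X and @Y pass only mayUnifyTheory4, since their types differ but their bytes match.  Theory 4 is therefore the most conservative of the two for an alias analysis that must not miss a possible unification.
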

Note: I'm using LLVM 3.7's C++ API.  The target program will ultimately
be linked on a modern x86-64 Linux system, *probably* using GNU ld.  The
program is compiled with clang or clang++, and in some cases I've used
"llvm-link" to combine the target-program bitcode files into a single
module.  My analysis only considers a single bitcode file in isolation.

Thanks,
Christian