[LLVMdev] [Propose] Add address-taken bit to GlobalVariable for disambiguation purpose

Tue Oct 29 17:02:05 PDT 2013

----- Original Message -----
> Hi, There:
> 
>    I'd like to add a bit, called "addr_not_taken", to GlobalVariable in
> order to indicate that a GlobalVariable does not have its address taken.
> 
> 1.The motivation
> ===============
>     The motivation can be explained by the following example. In this
> example, variable x does not have its address taken; therefore, it cannot
> be accessed indirectly. So we can prove that "x" and "*p" don't overlap.
>    -----------------
>    static int x;
>    void foo(int *p) {
>      x = 1;
>      *p = 2;   // does not modify x
>      x = x + 1;
>    }
>    ---------------
> 
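To make the payoff of the example concrete, here is a minimal sketch of what the disambiguation buys. It is only an illustration: foo_folded() and run() are hypothetical names, and the folded form is one rewrite an optimizer could legally produce once it knows "x" and "*p" do not overlap, not a claim about what any particular pass emits.

```c
#include <assert.h>

static int x;  /* address is never taken anywhere in this file */

/* The function from the example above. */
void foo(int *p) {
    x = 1;
    *p = 2;      /* cannot modify x, because p cannot point to x */
    x = x + 1;
}

/* Hypothetical result after the optimizer uses the no-overlap fact:
   the first store to x is dead and the reload of x is folded to a constant. */
void foo_folded(int *p) {
    *p = 2;
    x = 2;
}

/* Hypothetical helper: runs f on the given pointee and returns the final x. */
int run(void (*f)(int *), int *pointee) {
    x = 0;
    f(pointee);
    return x;
}
```

Without the no-overlap fact, the compiler has to reload x after the store through p, since p might point to x.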
>    The semantics of llvm::GlobalVariable::addr_not_taken are:
>     1). Of course, the global variable in question doesn't have its
>         address taken, and
>     2). the global variable is not volatile, and

Why are you also imposing this restriction? I'm also not quite sure what it means, in the context of LLVM, where only loads and stores can be volatile.

>     3). the compiler needs to see all references to the variable.
> 
>    The address-not-taken bit can be used to disambiguate memory operations
> between:
>      o. a direct load/store to a global variable which does not have its
>         address taken, and
>      o. any arbitrary indirect memory operation.
> 
>    Condition 2) ensures that the compiler will not mark a volatile variable
> "addr-not-taken", even if all its references are direct accesses.
> 
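The disambiguation rule described above can be sketched as a small conservative query. This is a toy model under stated assumptions, not LLVM's alias-analysis interface: struct global_var, struct mem_op and may_alias() are hypothetical names, and the only thing modeled is the proposed bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a global variable; only the proposed bit is modeled. */
struct global_var {
    const char *name;
    bool addr_not_taken;
};

/* A memory operation is either a direct access to a named global
   (global != NULL) or an indirect access through a pointer (global == NULL). */
struct mem_op {
    const struct global_var *global;
};

/* Conservative query: returns false only when the two operations
   provably cannot overlap in this toy model. */
bool may_alias(struct mem_op a, struct mem_op b) {
    /* Two direct accesses: in this model they overlap only if they
       name the same global. */
    if (a.global != NULL && b.global != NULL)
        return a.global == b.global;

    /* Two indirect accesses: nothing is known, so assume they may overlap. */
    if (a.global == NULL && b.global == NULL)
        return true;

    /* One direct, one indirect: the indirect access can reach the global
       only if the global's address was taken somewhere. */
    const struct global_var *g = (a.global != NULL) ? a.global : b.global;
    return !g->addr_not_taken;
}
```

The point of the sketch is that the direct-versus-indirect case becomes a single flag test, with no points-to information needed.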
>    Condition 3) does not necessarily imply that the global variable has
> internal linkage; see scenarios S2 and S3 below.
> 
>    Since llvm::GlobalVariable has plenty of vacant bits (for flags), this
> bit will not increase its size.
> 
> 
> 2. Scenarios where llvm::GlobalVariable::addr_not_taken is useful
> =================================================================
>   NOTE: As you can see from S2 and S3, addr_not_taken can serve as
>      additional information about the input IR. In that sense, we should
>      view "addr_not_taken" as an attribute of a variable (akin to GNU's
>      function attributes).
> 
>   S1) Programs written in a C style are likely to have lots of global
>       variables, and most of them are only directly referenced.
> 
>       The bit would be more useful in LTO mode, since in this mode more
>       global variables will be recognized as "address-not-taken".
> 
>   S2) The compiler may generate global variables for some special purpose
>       or for some special languages. These global variables are normally
>       directly accessed. The compiler can mark such a variable
>       "addr-not-taken" to avoid unnecessary aliasing. We would otherwise
>       need to introduce special metadata for this purpose.
> 
>       Go back to condition 3). At the moment the compiler generates such
>       variables, it does not physically "see" all the references to them;
>       however, it is pretty sure how these variables will be accessed.
> 
>       Note that in this scenario, the references to these compiler-created
>       global variables may occur in multiple modules. Therefore, they
>       don't have "internal" linkage.
> 
>    S3) Consider this snippet:
>       ---------------------
>       static int xyz; // address-not-taken
>       void foo() {
>          /* access xyz */
>       }
> 
>       void bar() { /* access xyz */ }
>       ----------------------
> 
>       For some reason (say, to triage problematic functions), we may
>       separate foo() and bar() into different source files. For this
>       purpose, we have to remove "static" from "xyz", which may bring new
>       problems once in a while: for instance, the bug disappears or
>       performance degrades.
> 
>       Introducing address-not-taken will alleviate such troubles.
> 
> 3. Q&A
> ======
>    Q1: If we have a nice points-to analysis, do we still need what you
> propose today?
> --------------------------------------------------------------------------------
>    A1: For scenarios S2 and S3, points-to analysis will not help. I agree
> that S2 and S3 are not very important.
> 
>    For S1, sure, it does not need to be a "nice" analyzer; even a "stupid"
> analyzer should work for this purpose. The problem is that points-to
> analyses are normally very expensive. They are normally invoked only once
> in the entire compiler pipeline; before that point, we cannot disambiguate
> global-variable accesses from indirect mem-ops.
> 
>    On the other hand, the compiler can afford to re-analyze address-taken
> for a particular global variable, or for all global variables, again and
> again after some optimizations (say, DCE). We certainly cannot afford to
> invoke the prohibitively expensive points-to analysis that many times.
> 
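As a rough illustration of why re-analysis is cheap, the check can be modeled as one linear scan over a global's uses. This is a toy model under stated assumptions: enum use_kind, struct use and addr_not_taken() are hypothetical names, the "dead" field stands in for an instruction that DCE has removed, and treating a volatile access as disqualifying is one reading of condition 2).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of one use of a global variable. */
enum use_kind {
    USE_DIRECT_LOAD,   /* load whose address operand is the global itself */
    USE_DIRECT_STORE,  /* store whose address operand is the global itself */
    USE_OTHER          /* anything else: passed to a call, stored as a value, ... */
};

struct use {
    enum use_kind kind;
    bool is_volatile;  /* the load/store is volatile */
    bool dead;         /* set once DCE has removed the using instruction */
};

/* One linear scan over the uses: the bit may be set only if every live
   use is a direct, non-volatile load or store. */
bool addr_not_taken(const struct use *uses, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (uses[i].dead)
            continue;
        if (uses[i].kind == USE_OTHER || uses[i].is_volatile)
            return false;
    }
    return true;
}
```

In this model, a global whose only address-taking use is deleted by DCE flips from "taken" to "not taken" on the next scan, which is the case where re-analysis pays off.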
> 
>    Q2: Does addr-not-taken imply "unnamed_addr"? (Nick asked this
> question the other day)
> -------------------------------------------------------------------------------------
>    A2: No. Consider S2 and S3.
> 
>      For S1, I don't know for sure whether the variable name matters if
>    debug info is turned on.
> 
>    Q3: Can we save the addr-not-taken bit in metadata?
>    -------------------------------------------------
>    A3: Metadata, by its very nature, is part of the IR. It is pretty slow
>        to access.
>        We can view addr-not-taken as an attribute of the globalvar (just
>        like functions have attributes such as readonly/malloc/....).
> 
>          I'm aware of other compilers whose symbol tables have similar
>        bits for all kinds of symbols (both functions and data).
> 
>    Q4: Can we save the addr-not-taken bit in an analysis pass?
>    -----------------------------------------------------
>    A4: If we implement it this way:
>        o. we have to add a statement to most, if not all, passes to claim
>           that they preserve this addr-taken analysis result, which is a
>           bit annoying.
> 
>        o. The address-taken information collected before LTO (aka
>           pre-IPO) cannot be fed to LTO, unless we re-analyze it from the
>           ground up.

So is the idea that lib/Analysis/IPA/GlobalsModRef.cpp will set this bit when it runs?

 -Hal

> 
>    Q5: How difficult is it to maintain this flag?
>    -----------------------------------------------------
>    A5: It is almost maintenance free, in the sense that we almost never
>        come across a situation where an optimizer takes the address of
>        a globalvar which is marked addr-not-taken.
> 
>        It would be nice if we could afford to re-analyze addr-taken over
>        and over again to make sure the information is precise. However,
>        we don't have to do that: while the information may be imprecise,
>        it is conservatively correct.
> 
>     Q6: Analyzing addr-taken is inexpensive. Why not just disambiguate by
>         analyzing addr-taken on the fly during the AA phase (Nick asked
>         this question the other day)?
>     -----------------------------------------------------
>     A6: While analyzing addr-taken for data is more expensive
>         than analyzing addr-taken for functions, I think it is still
>         relatively inexpensive.
> 
>         That said, I don't think it stays inexpensive if we re-analyze
>         addr-taken again and again *ON THE FLY*. It is really difficult to
>         predict the compile-time impact. You never know how many global
>         variables a program has, and you never know how extensively they
>         are used. Poorly written programs tend to have lots of global vars.
> 
> 
> Thanks
> Shuxin
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory