[llvm-dev] RFC: New function attribute HasInaccessibleState

Tue Dec 8 10:12:05 PST 2015

[sorry for the re-send; fixed a couple critical typos inline]

We used a similar thing in the phx compiler.  We called it "ExternalMemoryTag".  It was a name for "all the program state that doesn't have another name", and in phx (unlike LLVM) it was easy to define what "has another name" meant since we kept an explicit universe of names in our alias metadata.  The inferences were flipped the other way, though; being able to access "external memory" was the default assumption, and what the compiler had to prove/propagate was that some functions (whose definitions could be inspected) did NOT touch "external memory".

Trying to translate that idea to LLVM, I'm having trouble getting around the lack of an explicit universe of named storage locations.  Different alias analyses of course have their own ways of partitioning program storage, and "external memory" (in the sense I'm used to) is the intersection of each analysis's notion of "other storage I'm not modeling precisely".  So e.g. in phx we'd start with a heap-insensitive analysis that would leave the entire heap in "external memory", but then running a heap-sensitive analysis would "chip away" at it, carving out a name for each allocation site and redefining "external memory" to exclude those.  The difficulty translating this to LLVM is that when you have different analyses all summarizing the same function definition, each one could tell you "I have specific names for all the things this function touches" or not, but there's no useful way to combine those results, since what you want to know is whether all accessed storage is covered by the union of all the specific names.  You'd need something along the lines of an API on each alias analysis that would look at an individual instruction and tell you whether it has specific names for all the storage accessed by that instruction; then the function could be summarized as "this doesn't touch anything we don't have a specific name for" if every instruction had at least one analysis claim to have a name for it.
Then, when querying for aliasing between two calls, the set of possible answers could be expanded so that each analysis could report "may alias because they both might touch something I have a specific name for" as one answer, and "may alias because neither touches anything I have a specific name for (but both might touch something else)" as another. The calls then don't alias if the function was summarized as "this doesn't touch anything we don't have a specific name for" and all the analyses report "may alias because neither touches anything I have a specific name for".
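A minimal C++ sketch of that combination rule follows. `PairAnswer`, `CoverageMatrix`, `touchesOnlyNamedStorage`, and `callsMayAlias` are hypothetical names for illustration, not part of LLVM's AliasAnalysis interface. The sketch assumes that the "neither touches anything I have a specific name for" answer means "any storage the two calls share is storage this analysis does not name". Under that reading, a summary for either callee is enough.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-analysis answer for a pair of calls (illustrative only).
enum class PairAnswer {
  NoAlias,             // this analysis proves the calls touch disjoint storage
  MayAliasNamed,       // overlap possible in storage this analysis names
  MayAliasUnnamedOnly  // overlap possible only in storage it does not name
};

// coverage[i][a] is true if analysis `a` has specific names for all storage
// accessed by memory instruction `i` of a function.
using CoverageMatrix = std::vector<std::vector<bool>>;

// "This doesn't touch anything we don't have a specific name for": every
// instruction is fully covered by at least one analysis.
bool touchesOnlyNamedStorage(const CoverageMatrix &coverage) {
  for (const std::vector<bool> &perAnalysis : coverage) {
    bool covered = false;
    for (bool c : perAnalysis)
      covered = covered || c;
    if (!covered)
      return false;
  }
  return true;
}

// Combine every analysis's answer for calls to functions F and G.
bool callsMayAlias(bool fOnlyNamed, bool gOnlyNamed,
                   const std::vector<PairAnswer> &answers) {
  bool allUnnamedOnly = true;
  for (PairAnswer a : answers) {
    if (a == PairAnswer::NoAlias)
      return false;  // one proof of NoAlias is enough
    if (a != PairAnswer::MayAliasUnnamedOnly)
      allUnnamedOnly = false;
  }
  // Any shared storage must be unnamed by every analysis; a callee that only
  // touches named storage cannot reach it.
  if (allUnnamedOnly && (fOnlyNamed || gOnlyNamed))
    return false;
  return true;
}
```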

The other way would of course be to make one analysis powerful enough to catch the cases we're concerned with.  Would it be workable to have GlobalsAA look through the stores (and other mutating operations) in a function to see if they're:

 1. a store to a local alloca
 2. a store to a global defined in this compilation unit
 3. a store through a pointer parameter
 4. a call to a function that it has inferred is "ReadOnly-like" (or that is actually readonly)
 5. a call to a function that it has inferred is "ArgMemOnly-like" (or that is actually argmemonly)
 6. other

so that:
 - if it gets through a function and all modifications are of type 1 or 4, it could summarize that function as "ReadOnly-like"; it could report two calls to "ReadOnly-like" functions as not aliasing in their writes, and report calls to "ReadOnly-like" functions as not modifying any global in the current compilation unit (i.e. any global it can be asked about)
 - if it gets through a function and all modifications are of type 1, 3, 4, or 5, it could summarize that function as "ArgMemOnly-like"; it could report a global (in the current compilation unit) as not modified by an "ArgMemOnly-like" function so long as it knows the global's address isn't passed as an argument to that function (which I hope is what its notion of "escaped" can tell it)?
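The six categories and the two summary rules above could be modeled roughly as follows. This is a hypothetical sketch, not GlobalsAA's actual code. The names `ModKind`, `Summary`, `summarize`, and `callMayModifyLocalGlobal` are illustrative, and the sketch assumes that every mutating operation in the function body has already been classified.

```cpp
#include <cassert>
#include <vector>

// Hypothetical classification of a mutating operation (categories 1-6 above).
enum class ModKind {
  LocalAllocaStore,    // 1. store to a local alloca
  LocalGlobalStore,    // 2. store to a global defined in this unit
  PointerParamStore,   // 3. store through a pointer parameter
  CallReadOnlyLike,    // 4. call to a readonly(-like) function
  CallArgMemOnlyLike,  // 5. call to an argmemonly(-like) function
  Other                // 6. anything else
};

enum class Summary { ReadOnlyLike, ArgMemOnlyLike, Unknown };

Summary summarize(const std::vector<ModKind> &mods) {
  bool readOnlyLike = true;    // only categories 1 and 4 seen so far
  bool argMemOnlyLike = true;  // only categories 1, 3, 4 and 5 seen so far
  for (ModKind m : mods) {
    switch (m) {
    case ModKind::LocalAllocaStore:
    case ModKind::CallReadOnlyLike:
      break;  // compatible with both summaries
    case ModKind::PointerParamStore:
    case ModKind::CallArgMemOnlyLike:
      readOnlyLike = false;  // still ArgMemOnly-like
      break;
    case ModKind::LocalGlobalStore:
    case ModKind::Other:
      readOnlyLike = false;
      argMemOnlyLike = false;
      break;
    }
  }
  if (readOnlyLike)
    return Summary::ReadOnlyLike;
  if (argMemOnlyLike)
    return Summary::ArgMemOnlyLike;
  return Summary::Unknown;
}

// Could a call to a function with summary `s` modify a global defined in this
// compilation unit? `globalAddressMayBePassed` would come from escape info.
bool callMayModifyLocalGlobal(Summary s, bool globalAddressMayBePassed) {
  switch (s) {
  case Summary::ReadOnlyLike:
    return false;
  case Summary::ArgMemOnlyLike:
    return globalAddressMayBePassed;
  case Summary::Unknown:
    return true;
  }
  return true;
}
```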

Then IIUC we could have it recognize some libcalls and treat printf as ArgMemOnly-like (so long as printf's definition is not in the current compilation unit), treat free as ReadOnly-like (under the same condition), etc.
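A libcall lookup of the kind suggested here might look like the following sketch. The table contents follow only the two examples given: printf as ArgMemOnly-like and free as ReadOnly-like. `summarizeLibCall` and its table are hypothetical. A real implementation would presumably identify library functions through TargetLibraryInfo rather than by raw name strings.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

enum class Summary { ReadOnlyLike, ArgMemOnlyLike, Unknown };

// Hypothetical libcall table; returns Unknown when the table does not apply.
Summary summarizeLibCall(const std::string &name,
                         const std::set<std::string> &definedInThisUnit) {
  // If the callee's definition is in the current compilation unit, the table
  // must not be trusted; the body should be analyzed instead.
  if (definedInThisUnit.count(name) != 0)
    return Summary::Unknown;
  static const std::map<std::string, Summary> kKnown = {
      {"printf", Summary::ArgMemOnlyLike},  // may still write via %n arguments
      {"free", Summary::ReadOnlyLike},      // per the proposal above
  };
  auto it = kKnown.find(name);
  return it == kKnown.end() ? Summary::Unknown : it->second;
}
```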

I'm probably mixing up details around how libcalls are analyzed and how GlobalsAA propagates information, but I think the key points I'm making are that we may be looking for attributes that would only mean anything to GlobalsAA, and that what they would mean to GlobalsAA may be the same things that "ReadOnly" and "ArgMemOnly" already mean to it.

Thanks
-Joseph

-----Original Message-----
From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Hal Finkel via llvm-dev
Sent: Friday, December 4, 2015 7:33 PM
To: Mehdi Amini <mehdi.amini at apple.com>
Cc: LLVM Dev <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] RFC: New function attribute HasInaccessibleState

----- Original Message -----
> From: "Mehdi Amini" <mehdi.amini at apple.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "Vaivaswatha Nagaraj" <vn at compilertree.com>, "LLVM Dev" 
> <llvm-dev at lists.llvm.org>
> Sent: Friday, December 4, 2015 12:47:00 PM
> Subject: Re: [llvm-dev] RFC: New function attribute 
> HasInaccessibleState
> 
> 
> > On Dec 4, 2015, at 10:33 AM, Hal Finkel via llvm-dev 
> > <llvm-dev at lists.llvm.org> wrote:
> > 
> > ----- Original Message -----
> >> From: "Vaivaswatha Nagaraj" <vn at compilertree.com>
> >> To: "James Molloy" <james at jamesmolloy.co.uk>, "Hal Finkel"
> >> <hfinkel at anl.gov>
> >> Cc: "LLVM Dev" <llvm-dev at lists.llvm.org>
> >> Sent: Friday, December 4, 2015 12:28:03 PM
> >> Subject: Re: [llvm-dev] RFC: New function attribute 
> >> HasInaccessibleState
> >> 
> >> that would be an escaping global, and as far as I know is handled 
> >> separately in GlobalsAA (AnalyzeUsesOfPointer checks if a global is 
> >> used as an operand to a function)
> >> 
> > 
> > More generally, I think this attribute is supposed to mean, "this 
> > function might access globals, but none of these globals are things 
> > you can name in the IR being optimized." You might, of course, pass 
> > in aliasing memory as a parameter, but that's a separate matter.
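Under Hal's reading of the attribute, asking whether a call can access a global defined in the module reduces to a question about the call's arguments. The following is a minimal C++ sketch. It assumes a hypothetical per-call flag `onlyInaccessibleState` that stands in for the proposed attribute, and a precomputed escape bit for the global (the escape notion Vaivaswatha attributes to GlobalsAA's AnalyzeUsesOfPointer). `GlobalInfo`, `CallInfo`, and `callMayModRefGlobal` are illustrative types, not LLVM API.

```cpp
#include <cassert>

// Hypothetical facts about a global defined in the module being optimized.
struct GlobalInfo {
  bool addressEscapes;  // address may be reachable through other pointers
};

// Hypothetical facts about one call site.
struct CallInfo {
  bool onlyInaccessibleState;  // stand-in for the proposed attribute
  bool passesGlobalDirectly;   // the global itself is a call argument
  bool hasPointerArgs;         // the call has any pointer arguments
};

// May the call read or write the global?
bool callMayModRefGlobal(const CallInfo &call, const GlobalInfo &g) {
  if (!call.onlyInaccessibleState)
    return true;  // no summary: stay conservative
  // The callee's hidden state cannot be named in the IR, so the only way it
  // can reach this global is through its arguments.
  if (call.passesGlobalDirectly)
    return true;
  if (call.hasPointerArgs && g.addressEscapes)
    return true;  // an argument might point (transitively) at the global
  return false;
}
```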
> 
> I’m not sure what "things you can name in the IR" means exactly; would this 
> be equivalent to "none of these globals can alias with any memory 
> location accessible from the IR being optimized"?
> 
> To come back to what I phrased earlier, this effectively splits the 
> state in two distinct parts; is this enough in all cases? Would there 
> be some need/benefit to model more partitions?

I agree this is a good question. I don't have any use cases right now where modeling multiple external states would be useful.

There might be a relationship to our long-standing problem of modeling errno, but that does not quite fit into this model, because errno is sometimes implemented as a global.

 -Hal

> 
> Thanks,
> 
> —
> Mehdi
> 
> > 
> >> 
> >> On December 4, 2015 11:47:20 PM GMT+05:30, James Molloy 
> >> <james at jamesmolloy.co.uk> wrote:
> >> 
> >> It is if one of the operands is, or can alias, a global.
> >> 
> >> 
> >> On Fri, 4 Dec 2015 at 18:16, Vaivaswatha Nagaraj < 
> >> vn at compilertree.com > wrote:
> >> 
> >> 
> >> 
> >> writing into operands is not the same as writing to globals, right?
> >> I added printf to the same category since we were discussing writing
> >> to globals.
> >> 
> >> 
> >> 
> >> On December 4, 2015 11:34:10 PM GMT+05:30, James Molloy < 
> >> james at jamesmolloy.co.uk > wrote:
> >> 
> >> 
> >> Hi,
> >> 
> >> 
> >> I just want to reiterate: printf and friends do *not* fall into 
> >> this category as they can write to their operands (unless you parse 
> >> and check the format string for %n).
> >> 
> >> 
> >> James
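James's caveat can be made concrete: a printf-family call can write through its arguments only if the format string contains a %n conversion. The following is a rough C++ sketch of that check. It assumes the format string is a known constant; a non-constant format would have to be treated as possibly writing. `formatMayWriteThroughArgs` is a hypothetical helper, and its conversion set includes a few common extensions (C, S, m).

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Returns true if a printf-style format string contains a %n conversion,
// i.e. the call may store through one of its pointer arguments.
bool formatMayWriteThroughArgs(const std::string &fmt) {
  // Characters that terminate a conversion directive.
  static const char *kConversions = "diouxXfFeEgGaAcspnCSm%";
  for (std::size_t i = 0; i < fmt.size(); ++i) {
    if (fmt[i] != '%')
      continue;
    // Skip flags, field width, precision, positional index ("1$") and
    // length modifiers (hh, h, l, ll, j, z, t, L) up to the conversion.
    std::size_t j = i + 1;
    while (j < fmt.size() && std::strchr(kConversions, fmt[j]) == nullptr)
      ++j;
    if (j == fmt.size())
      return true;  // malformed trailing directive: be conservative
    if (fmt[j] == 'n')
      return true;
    i = j;  // "%%" and all other conversions are skipped
  }
  return false;
}
```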
> >> 
> >> 
> >> On Fri, 4 Dec 2015 at 17:53 Vaivaswatha Nagaraj via llvm-dev < 
> >> llvm-dev at lists.llvm.org > wrote:
> >> 
> >>> Most of the time you don't have the entire call graph information.
> >>> Imagine that you are developing a module that is a part of a 
> >>> larger project.
> >> 
> >> 
> >> I now understand the concern. It looks to me that we will need to 
> >> set the flag by default to all functions whose definitions aren't 
> >> available (external), and then propagate from there on. I don't see 
> >> any optimizations being inhibited by such a setting, so it should 
> >> be okay.
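The default-then-propagate scheme could be sketched as a simple fixed-point iteration over the call graph. In this hypothetical C++ model, `Function` and `propagateInaccessibleState` are illustrative names. Indirect calls are ignored, and a callee missing from the module is treated as external. A real pass would more likely walk the call graph's SCCs bottom-up.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of a function in the module being optimized.
struct Function {
  bool hasDefinition;                // body is available to the optimizer
  std::vector<std::string> callees;  // direct callees only
};

// Returns the functions conservatively assumed to have inaccessible state.
std::set<std::string> propagateInaccessibleState(
    const std::map<std::string, Function> &module) {
  std::set<std::string> flagged;
  // Default: every function without an available definition is flagged.
  for (const auto &kv : module)
    if (!kv.second.hasDefinition)
      flagged.insert(kv.first);
  // Propagate to callers until nothing changes.
  bool changed = true;
  while (changed) {
    changed = false;
    for (const auto &kv : module) {
      if (flagged.count(kv.first) != 0)
        continue;
      for (const std::string &callee : kv.second.callees) {
        // A callee not in the module at all is treated as external.
        if (flagged.count(callee) != 0 || module.count(callee) == 0) {
          flagged.insert(kv.first);
          changed = true;
          break;
        }
      }
    }
  }
  return flagged;
}
```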
> >> 
> >> 
> >> 
> >>> I think we need to go back and look at the underlying use case (as 
> >>> I understand it): GlobalsAA should be able to figure out that calls 
> >>> to malloc/free don't touch global variables visible to the 
> >>> optimizer.
> >>> How do we address this problem?
> >> 
> >> 
> >> Yes, this is the primary concern. Most libc functions (including 
> >> printf, malloc, free) fall into the same category.
> >> 
> >> 
> >> 
> >> - Vaivaswatha
> >> 
> >> 
> >> 
> >> On Fri, Dec 4, 2015 at 11:12 PM, Hal Finkel < hfinkel at anl.gov >
> >> wrote:
> >> 
> >> 
> >> ----- Original Message -----
> >>> From: "Vaivaswatha Nagaraj via llvm-dev" < llvm-dev at lists.llvm.org >
> >>> To: "Krzysztof Parzyszek" < kparzysz at codeaurora.org >
> >>> Cc: "LLVM Dev" < llvm-dev at lists.llvm.org >
> >>> Sent: Friday, December 4, 2015 11:21:03 AM
> >>> Subject: Re: [llvm-dev] RFC: New function attribute 
> >>> HasInaccessibleState
> >> 
> >>>>> In the case of user-defined allocation functions, the 
> >>>>> definitions for those functions are available
> >> 
> >>>> Are they? probably not unless you're in an LTO build.
> >> 
> >>> Yes, I'm assuming an LTO build.
> >> 
> >> The concerns around LTO here, while legitimate, apply only to a 
> >> very specific kind of LTO: an LTO that includes the definitions of 
> >> the libc. This is actually quite tricky to support, semantically, 
> >> and already breaks our malloc aliasing assumptions. There are many 
> >> legitimate uses of LLVM, both for statically-compiled code and for 
> >> JIT'd code, that depend on a visibility boundary between certain 
> >> core runtime services and the user code being compiled to provide 
> >> for effective optimization.
> >> 
> >> So, yes, this will break LTO when you include libc itself in the 
> >> optimization process. We already don't support this (we'd need, at 
> >> least, to adjust our malloc noalias assumptions, if not many other 
> >> things). I don't think this is a major concern.
> >> 
> >> I think we need to go back and look at the underlying use case (as 
> >> I understand it): GlobalsAA should be able to figure out that calls 
> >> to malloc/free don't touch global variables visible to the 
> >> optimizer.
> >> How do we address this problem?
> >> 
> >> Thanks again,
> >> Hal
> >> 
> >> ...
> >> 
> >>> _______________________________________________
> >>> LLVM Developers mailing list
> >>> llvm-dev at lists.llvm.org
> >>> https://na01.safelinks.protection.outlook.com/?url=http%3a%2f%2fli
> >>> sts.llvm.org%2fcgi-bin%2fmailman%2flistinfo%2fllvm-dev&data=01%7c0
> >>> 1%7cjotrem%40microsoft.com%7c606569a7772347cd741408d2fd0babf8%7c72
> >>> f988bf86f141af91ab2d7cd011db47%7c1&sdata=n61OMLL2IjHgbhxfk14QNboJi
> >>> MdHphjpB5DlXBIqKso%3d
> >> 
> >> --
> >> 
> >> 
> >> 
> >> --
> >> Hal Finkel
> >> Assistant Computational Scientist
> >> Leadership Computing Facility
> >> Argonne National Laboratory
> >> 
> >> 
> >> 
> >> --
> >> Sent from my Android device with K-9 Mail. Please excuse my 
> >> brevity.
> > 
> 
> 
