[LLVMdev] Request for comments: TBAA on call

Fri Oct 11 04:27:49 PDT 2013

> On Oct 11, 2013, at 12:16 AM, Daniel Berlin <dberlin at dberlin.org> wrote:
> 
> 
> 
> 
>> On Thu, Oct 10, 2013 at 9:34 PM, Chris Lattner <clattner at apple.com> wrote:
>> On Oct 10, 2013, at 8:53 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
>>>> The data structures and algorithms we have are powerful enough to express these sorts of things, and so long as a frontend abides by the rules, there shouldn't be a problem.
>>> 
>>> 
>>> My concerns are simply that whether designed this way or not, it ends up fairly inconsistent.
>>> 
>>> For example, what would the plan be when a frontend does something like clang does now for C/C++ (generate type based TBAA), and also wants to do something like Filip is suggesting (which is also doable on C/C++ with simple frontend analysis)?
>> 
>> There are two possible answers here, depending on the constraints:
>> 
>> 1. The frontend author could unify them into one grand tree, like struct field TBAA does.
> 
> I would be impressed if you could unify the output of something like Andersen's analysis with something like the current nested TBAA structure without massive loss of precision.
>  
>> 2. The frontend author could model them as two separate TBAA trees, which the TBAA machinery in LLVM handles conservatively.
>> 
>> The conservative handling of different TBAA trees is critical, because you can LTO (for example) JavaScript into C++ code in principle, each using its own TBAA structure.  LLVM is already well set up for this, and it isn't an accident :-)
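
A minimal sketch of the conservative rule described above, written in Python rather than LLVM's C++. The `Node` class, `may_alias`, and all tag names are hypothetical and not LLVM API; the only behavior it tries to mirror is that tags from trees with different roots are never used to prove no-alias.

```python
class Node:
    """One TBAA tag; parent is None for the root of a schema."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    def ancestors(self):
        """Yield this node and every node above it, ending at the root."""
        node = self
        while node is not None:
            yield node
            node = node.parent

    def root(self):
        *_, last = self.ancestors()
        return last


def may_alias(a, b):
    """Return True unless the tags prove the two accesses are disjoint."""
    if a.root() is not b.root():
        # Different schemas: no aliasing conclusion can be drawn.
        return True
    # Same schema: accesses can overlap only along one ancestor chain.
    return any(n is b for n in a.ancestors()) or any(n is a for n in b.ancestors())


# Two unrelated schemas, e.g. C++ and JavaScript linked together by LTO.
cxx_root = Node("cxx everything")
cxx_int = Node("int", cxx_root)
cxx_float = Node("float", cxx_root)
js_root = Node("js everything")
js_heap = Node("js heap a", js_root)

print(may_alias(cxx_int, cxx_float))  # False: siblings in the same tree
print(may_alias(cxx_int, js_heap))    # True: different trees, stay conservative
```

The second query is the LTO case: nothing is learned, but nothing wrong is concluded either.
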
>> 
>> 
>>> You also run into issues with the existing metadata on loads/stores in this situation. Again, it's entirely possible for a load to conflict with both a TBAA type and a partitioned heap. In Filip's scheme, there is no way to represent this.
>> 
>> I'm not sure what you mean.
> 
> I mean it's not possible, in this scheme, to specify both to be used for disambiguation.
> 
> Right now, you get *one* tbaa tag on the load.
> So you have to choose which tree you use if you choose option #2 above.
> 
> You always have to choose one or the other, or else lose the disambiguation.
> 
> 
>  
>>  The compiler will handle this conservatively.
> 
> Only if you drop the tags/disambiguation capability from most things, or I'm seriously confused.
> 
> Let's take the following not-quite-LLVM-IR, but close enough.
> 
> 
> !tbaa tree
> 
> 0 -> everything
> 
> 1 (parent 0) -> int

This. 

It's a C-ism, and you'd only have a TBAA tree for it if you lacked any other information.

Hence if you had abstract heaps (such as ones you get from field names in Java-esque type systems, or tuple entry IDs in ML-esque type systems, etc) you wouldn't be sweating about reconciling that with a tree that talked about types. 

(As an aside, I'm sort of bothered by the implication that abstract heaps aren't types. A type is nothing more than a set of values. For a pointer that means an abstract set of heap locations. So, my use of TBAA corresponds exactly to "types" in *some* language.)

> 2 (parent 0) -> heap a
> 
> load <whatever> , !tbaa 1
> 
> call !tbaa.read 2
> 
> 
> Note: You can't specify tbaa.read 2, 1 (at least as proposed).
> The current machinery will say the call and the load never conflict.
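
The quoted example can be sketched in a few lines of Python to show why the answer comes out as "never conflict". The `PARENT` table copies the three tags from the example; `chain` and `may_conflict` are made-up helper names, and the ancestor rule is a simplified stand-in for what the TBAA machinery does, not its actual code.

```python
# Tag id -> parent id, mirroring the tree in the example; None marks the root.
PARENT = {0: None, 1: 0, 2: 0}  # 0 = everything, 1 = int, 2 = heap a


def chain(tag):
    """Return the tag followed by all of its ancestors up to the root."""
    out = []
    while tag is not None:
        out.append(tag)
        tag = PARENT[tag]
    return out


def may_conflict(tag_a, tag_b):
    """Ancestor rule: within one tree, only tags on the same chain overlap."""
    chain_a, chain_b = chain(tag_a), chain(tag_b)
    if chain_a[-1] != chain_b[-1]:
        return True  # different roots: stay conservative
    return tag_a in chain_b or tag_b in chain_a


load_tag = 1       # load <whatever>, !tbaa 1
call_read_tag = 2  # call !tbaa.read 2

print(may_conflict(load_tag, call_read_tag))  # False: siblings under tag 0
print(may_conflict(load_tag, 0))              # True: 0 is an ancestor of 1
```

Because "int" and "heap a" are siblings under one root, the rule treats them as disjoint even though the load may well touch heap a.
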
> 
> It would seem you have to either make heap tags also children of appropriate type tags, or only ever use one type of tree.
> What design for a split tree would work here?
> 
> I also can't think of a combined tree that would work without massive precision loss.
>  
>>  If you have two different schemas existing in the same application (e.g. due to LTO or due to a language implementing two different non-unified models for some weird reason) then the compiler just doesn't draw any aliasing implications from references using two different schemas.
> Which is, of course, bad.
> You should, in fact, be able to draw implications from both.  If the design essentially only allows one to draw from one, that seems like a pretty big flaw.

I agree that TBAA would be more powerful if you could say something like:

load blah !tbaa.list !42

!42 = (!tbaa !1, !tbaa !2, ...)

But it would probably also be more expensive to process such richer information, particularly if you had it on each load. I'd be inclined to bet that the current TBAA, even with its limitations, is a sensible compromise.
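
To make the idea concrete, here is a rough Python sketch of what a query over tag lists might look like. No semantics for `!tbaa.list` were specified in this thread, so the rule below is purely an assumption: two accesses are proven disjoint as soon as any one pair of their tags is disjoint within a shared tree. All names (`PARENT`, `chain`, `pair_may_alias`, `list_may_alias`, the tags) are hypothetical.

```python
from itertools import product

# Hypothetical tags: id -> parent id (None marks a root). Two separate trees.
PARENT = {
    "types": None, "int": "types", "float": "types",
    "heaps": None, "heap a": "heaps", "heap b": "heaps",
}


def chain(tag):
    """Return the tag followed by all of its ancestors up to the root."""
    out = []
    while tag is not None:
        out.append(tag)
        tag = PARENT[tag]
    return out


def pair_may_alias(a, b):
    """Single-tag rule: different roots are conservative, siblings are disjoint."""
    chain_a, chain_b = chain(a), chain(b)
    if chain_a[-1] != chain_b[-1]:
        return True
    return a in chain_b or b in chain_a


def list_may_alias(tags_a, tags_b):
    """Assumed list rule: one disjoint pair is enough to prove no-alias."""
    return all(pair_may_alias(a, b) for a, b in product(tags_a, tags_b))


load_1 = ["int", "heap a"]
load_2 = ["int", "heap b"]

print(list_may_alias(load_1, load_2))    # False: heap a and heap b are disjoint
print(list_may_alias(load_1, ["float"])) # False: int and float are disjoint
print(list_may_alias(load_1, ["int"]))   # True: nothing proves them disjoint
```

Each query now walks every pair of tags instead of one pair, which is one concrete form the extra processing cost could take.
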

OTOH, being able to list types that are affected by a call would be great. 

>  
> 
>> It is possible in principle to allow a load (for example) to have an arbitrary number of TBAA tags on it, which would solve this (allowing a single load to participate in multiple non-overlapping schemas) but I don't think it is worth the complexity at all.
> 
> Okay then, we'll agree to disagree. :)
> 
> This actually happens a lot of the time.  Restricting llvm to essentially choosing one tree per load to use for disambiguation, as this scheme will do, when it's not much more work to do better, seems shortsighted to me.

If someone hacked up a !tbaa.list, I would definitely try it out!

> 
>> 
>> -Chris
> 