[LLVMdev] Request for comments: TBAA on call

Filip Pizlo fpizlo at apple.com
Thu Oct 10 21:30:07 PDT 2013



> On Oct 10, 2013, at 8:53 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
> 
> 
> 
> 
>> On Thu, Oct 10, 2013 at 10:34 AM, Chris Lattner <clattner at apple.com> wrote:
>>> On Oct 7, 2013, at 11:49 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
>>>> 
>>>> Hence it’s more meaningful to reason about TBAA in terms of its semantics rather than hypothesizing about how and why someone would produce it.
>>> 
>>> That would be great, but it's not what the langref says, nor does it match up with the name of the thing you are creating, nor does it necessarily have optimal semantics, nor would it be great for future producers or the ability to do other analyses in those producers.
>> 
>> Hey Daniel,
>> 
>> Can you be more specific about your concerns here?  It's true that we describe the TBAA nodes in terms of expressing C-like type constraints, but the intention of the design has always been for it to be more general.
>> 
>> Specifically, partitioning the heap for use-cases like what Phil is doing with JavaScript was factored into the original design.  We have even talked about adding type tags to represent C++ vtables (for example) since language-level loads and stores can't touch them (not even through char*).
>> 
>> The data structures and algorithms we have are powerful enough to express these sorts of things, and so long as a frontend abides by the rules, there shouldn't be a problem.
> 
> 
> My concerns are simply that whether designed this way or not, it ends up fairly inconsistent.
> 
> For example, what would the plan be when a frontend does something like clang does now for C/C++ (generate type-based TBAA), and also wants to do something like Filip is suggesting (which is also doable on C/C++ with simple frontend analysis)?

Are you worried about clang doing this, or are you worried about WebKit doing this? Or are you worried about some other front end doing it?

WebKit won't do it because we only have abstract heaps. We have no notion of types that originate from the source language. 

I've also considered - as a thought experiment - front ends for other languages. For example, in Java you would probably use TBAA just to express the space of fields (i.e. Field name at Class name at ClassLoader).

Broadly, I believe that using TBAA to literally express the types of the source language is more of a C-ism than a general use case. I think that higher-level type-safe languages have a more-or-less obvious mapping to abstract heaps, and those heaps tend to be *mostly* just disjoint sets.

This mapping usually doesn't involve describing the source language's type hierarchy so much as describing the proofs about aliasing that arise from that language's type system (and whatever analyses the frontend is able to perform).

TBAA's ability to express a hierarchy, as opposed to just disjoint sets, is kind of an escape mechanism for the cases where disjoint sets are too constraining. 
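
To make the disjoint-sets-versus-hierarchy point concrete, here is a minimal Python sketch of a TBAA-style tree query. It is a toy model, not LLVM's implementation, and every heap name in it (`Point.x`, `int[] elements`, and so on) is made up for illustration. The rule it assumes is the usual one for a TBAA tree: two tags may alias only if they are the same tag or one is an ancestor of the other.

```python
class Heap:
    """A node in a TBAA-style tree; parent=None marks the root."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    def ancestors(self):
        """Yield this node and every node on the path up to the root."""
        node = self
        while node is not None:
            yield node
            node = node.parent


def may_alias(a, b):
    """Two tags may alias iff one lies on the other's path to the root."""
    return any(n is b for n in a.ancestors()) or any(n is a for n in b.ancestors())


# Hypothetical abstract heaps for a Java-like frontend.
root = Heap("root")
point_x = Heap("Point.x", root)            # one heap per field: disjoint sets
point_y = Heap("Point.y", root)
elements = Heap("array elements", root)    # the one place a hierarchy helps
int_elements = Heap("int[] elements", elements)
ref_elements = Heap("Object[] elements", elements)

print(may_alias(point_x, point_y))            # False: sibling field heaps
print(may_alias(int_elements, ref_elements))  # False: sibling element heaps
print(may_alias(int_elements, elements))      # True: child vs. its parent
print(may_alias(point_x, int_elements))       # False: fields vs. elements
```

Most of the tree is one level deep, which is the "mostly just disjoint sets" case; only the array-element heaps use a parent to say "some element, type unknown".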

> 
> Generate a split tree of some sort, one side of which represents TBAA info, and the other side which represents partitioned abstract heaps?[1]
> It seems like that would be awfully confusing.  

Let's take the Java example, since that's sort of a great example of a practical sound type system. Heck, JS VMs try to *at best* infer types that are Java-esque. 

I've given an example above of abstract heaps that I would construct using TBAA. In your split tree world, what would the *other* TBAA data be?  Why would you want to use TBAA for anything other than field names?

(And yes, I understand you'd use TBAA differently for array element types - but those would be disjoint from field types anyway. And having a hierarchy there is slightly useful.)

I think it would be useful to get an example of what you're worried about and how it would manifest in IR and attributes generated from some concrete frontend. 

> 
> However, it would now be necessary in order to use the new tbaa.read/tbaa.write metadata, since they will only reference tbaa tags.  But they only make a lot of sense on tbaa tags that point to partitioned heaps like Filip's, so if you did want to actually make use of them, you now have to put both the type info and the heap info in the same tree.
> 
> You also run into issues with the existing metadata on loads/stores in this situation. It's, again, entirely possible for a load to conflict with both a tbaa type and a partitioned heap.  In Filip's scheme, there is no way to represent this.
> 
> Because of this, the only thing I essentially asked Filip to do was not place it in the same exact tree as we are currently putting type info into.
> 
> Then your heap.read/heap.write metadata works with the heap tree (and you annotate loads/stores with your heap attributes), and your tbaa attributes work on the tbaa tree.  You can tag a load/store with both a heap tag and a tbaa tag, and disambiguate based on both of them.
> 
> Now, if the consensus is this is still a good idea, great. My suggestion would then be to update langref, rename the attributes, etc, to make this all more clear.
> 
> --Dan
> [1]  The other option of trying to generate some fake set of heaps that accurately represent the conflicts in both is, well, difficult :)
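
The two-tree arrangement Dan describes above (a heap tree next to the existing TBAA tree, with a load or store allowed to carry a tag from each) can be sketched in Python as follows. This is only an illustration under stated assumptions: the tree contents are invented, an access is modeled as a `(tbaa_tag, heap_tag)` pair, and a missing tag is assumed to prove nothing.

```python
def on_same_path(a, b, parent_of):
    """True if a is b, or one is an ancestor of the other in the given tree."""
    def ancestors(n):
        seen = set()
        while n is not None:
            seen.add(n)
            n = parent_of.get(n)
        return seen
    return a in ancestors(b) or b in ancestors(a)


# Two independent trees, each a child -> parent map (names are illustrative).
TBAA_TREE = {"char": None, "int": "char", "float": "char"}
HEAP_TREE = {"all heaps": None, "heap A": "all heaps", "heap B": "all heaps"}


def may_alias(x, y):
    """x and y are (tbaa_tag, heap_tag) pairs; None means 'no tag'.

    The accesses are disjoint if either tree proves it; a missing tag
    on either side proves nothing for that tree.
    """
    for (tx, ty), tree in (((x[0], y[0]), TBAA_TREE), ((x[1], y[1]), HEAP_TREE)):
        if tx is not None and ty is not None and not on_same_path(tx, ty, tree):
            return False
    return True


print(may_alias(("int", "heap A"), ("int", "heap B")))    # False: heap tree disambiguates
print(may_alias(("int", "heap A"), ("float", "heap A")))  # False: TBAA tree disambiguates
print(may_alias(("int", "heap A"), ("char", "heap A")))   # True: neither tree can
print(may_alias(("int", None), ("int", "heap B")))        # True: no heap tag on one side
```

Keeping the trees separate means neither producer has to encode the other's conflicts; the cost is that every query consults both.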