[llvm-dev] RFC: Resolving TBAA issues

Thu Aug 17 17:20:36 PDT 2017

On Thu, Aug 17, 2017 at 4:49 PM, Hal Finkel <hfinkel at anl.gov> wrote:

>
> On 08/17/2017 04:49 PM, Daniel Berlin via llvm-dev wrote:
>
>
>>
>> For two given access sequences we can determine if the accessed
>> objects are allowed to overlap by the rules of the input
>> language.
>
>
> Sadly, this is where this becomes "unlikely to want to use to replace
> TBAA", at least for me. It may still be a thing we want anyway.
>
>
> The rules proposed by Ivan for handling C/C++ seem pretty generic.
>

Uh?

"

  [X...] is allowed to overlap with [S1..., X..., S2...]
  and the most generic access sequence is [X...].

  [X1..., X2...] is allowed to overlap with [S1..., X1...]
  with the most generic access sequence to be [X1...].

  [X1..., U, U::m1, X2...] is allowed to overlap with
  [S1..., X1..., U, U::m2, S2...]
  for a union U of an unknown effective type, provided m1 != m2
  and the most generic access sequence is [X1..., U].

If neither of the given sequences contains the leading access
type of the other, then they are allowed to overlap if the
leading access type of one sequence is a direct or indirect field
of the final access type of the other sequence and then the most
generic access sequence consists of a single element, which is
that final access type.

For the purpose of determining whether one type is a direct or
indirect member of another type unions are considered to have no
members as accesses to members of unions are only allowed to
overlap if they have the base union object explicitly specified."

This sounds super specific to me.  It talks about unions, and how they
behave in C/C++, etc.
It has initial sequence rules, etc.
These would make no sense for "Rust", "Go", or "Haskell" at all.

We generally explain our current TBAA rules by saying that they're generic
> but motivated by C/C++ rules.
>

We do say that but that's not really what our implementation does in any
way.

> I could say the same thing about this proposed system with its proposed
> rules.
>
It is, IMHO, nowhere near as generic.

>
>

> This scheme is really an encoding of C/C++ TBAA info so it can be read by
> LLVM and requires that *LLVM* have some set of rules that it enforces about
> that scheme.
> But that scheme is still very language specific in how it is used.
> GCC still has something in between this and LLVM, where the language rules
> are a bit encoded (but not as much as you have).
>
> We (and gcc) have deliberately avoided such schemes, in favor of
> transforming the info into abstract set trees that then tag loads and
> stores.
>
> The encoding of "struct path" tbaa, is just a way of trading space vs time
> in that encoding.  We trade walking time for the space of transitive
> closure, etc.
>
>
> If the provided statistic of 15% holds up, maybe which way we go in this
> trade-off space doesn't matter much?
>

Sure, that part i'm indifferent about.

>
>
>
> None of the TBAA in LLVM really has any *real* relation to the original
> type system rules, and that is deliberate.  I've argued for years that
> calling it "TBAA" just badly confuses people, and i believe here is a good
> example :)
>
> So i don't think what you've written can be used to replace TBAA.
>
>
> So *if* we just take the proposed rules for C/C++ as the rules for the
> scheme in general, I'm not sure this is true.
>

Okay, let me try to rephrase a little more precisely:
It is almost certainly true that we *could* make *any* particular lowering
work, at different costs to the front ends and middle ends.  IE i totally
agree that the power of all of these systems (with small extensions) are
equal.  and also, for the record, i certainly think the rules/scheme
described here would almost certainly produce better optimization results
than what is currently implemented.

*But*
I don't see how you use this as a general scheme without ending up forcing
the other frontends to lower things to look as if it was a C/C++ based type
system.
This is definitely *not* true today.

They only have to lower things to look like a tree of sets.  Having been
down this path before, it's definitely easier to lower things to look like
trees of sets than like groups of C/C++ structs/bitfields/etc that follow
certain rules.

This is one of the reasons I'm honestly having a lot of trouble seeing how
this is is definitively better than what IBM's proposal was, which seemed
both more generic and incremental, and handled most concerns.

The only practical difference i see is that this just changes the scheme
from one where the language rules are mostly in the front ends to one where
the language rules are also partially in llvm.  To me that isn't
necessarily better, just different (heck, i'm even more used to that system
because it's a push closer to gcc :P)

So overall I don't see one as particularly better than the other.
You seem to.
So i'd really like to understand your perspective on this in terms of the
pro/con that you are seeing.

I mean, i'm  really not even opposed (though others may be) in principle to
saying "hey, let's just have a set of language specific tbaa passes with
semi-common metadata".
I'd just like to understand why we would change tradeoffs, etc.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170817/478f35fa/attachment-0001.html>