[llvm-dev] Ambiguity in !tbaa metadata?

Tue Nov 1 16:40:24 PDT 2016

> On Nov 1, 2016, at 10:52 AM, Sanjoy Das via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> I was trying to add some stronger assertions in the verifier around
> !tbaa metadata when I ran into an ambiguity: I think the encoding of
> the metadata nodes are such that a given node can be interpreted as
> either a struct type node or a scalar tbaa node.  I'd like a sanity
> check before I try to fix or work around this.
> 
> Consider some IR that I got from running clang over a small C++
> program:
> 
> ```
> define void @foo() {
>  ...
>  load ..., !tbaa !2
>  load ..., !tbaa !7
>  load ..., !tbaa !10
>  ...
> }
> 
> !2 = !{!3, !5, i64 0}
> !3 = !{!"T0", !4, i64 0}
> !4 = !{!"T1", !5, i64 0}
> !5 = !{!"T2", !6, i64 0}
> !6 = !{!"Root"}
> !7 = !{!8, !9, i64 0}
> !8 = !{!"T3", !9, i64 0}
> !9 = !{!"T4", !5, i64 0}
> !10 = !{!9, !9, i64 0}
> ```
> 
> I've erased the actual string names to make the ambiguity more
> obvious.
> 
> Here !2 and !7 are both struct tag nodes.  This means that !5 and !9
> are both scalar type nodes and !3 and !8 are struct type nodes[1].

For struct-path-aware TBAA, the scalar type node has name, parent node, and offset (optional).
This simplifies implementation by making formats of struct type nodes and scalar type nodes uniform.

Sorry that the comments are confusing.
For old scalar TBAA, the scalar node has name, parent node, and “constant”.

Why do you need to tell whether it is a struct type node or a scalar type node?

> 
> However, once we get to the first field of !3, !4 at offset 0, things
> become murkier.  !4 could either be a read-write scalar node with a !5
> as a parent, or be a struct type node containing !5 at offset 0.  I
> don't see a way to tell the two possibilities apart.
> 
> The ambiguity shown above is "fine" since (I think, but I'm not sure)
> that containing an object at offset 0 should be equivalent to
> "subclassing" it in the TBAA type system.  It still makes writing
> verifier Assertions more difficult than it should be, though.
> 
> Things get a bit more problematic once we allow for setting the
> "constant" tag on scalar TBAA.  If !4 was !{!"T1", !5, i64 1} then
> there we'd have a "real" ambiguity between it being a scalar node
> describing constant memory or a struct type node containing !5 at
> offset 1.
> 
> 
> Finally: we have a comment from 2013 in TypeBasedAliasAnalysis that
> implies scalar TBAA was slated for removal:
> 
> "After all testing cases are upgraded to use struct-path aware TBAA
> and we can auto-upgrade existing bc files, the support for scalar TBAA
> can be dropped."
> 
> Does anyone have some context for what the motivations were / why the
> work was stopped?  If all the use cases for scalar TBAA can be
> simulated using struct tbaa then that may be the easiest way to remove
> the ambiguity.

We auto-upgrade from scalar TBAA to struct-path-aware TBAA. I vaguely recall that we keep scalar TBAA around because of outside usage.
That is why we still have “createTBAANode” even though clang frontend does not use it.

Manman

> 
> [1]: I've assumed that the base type has to be a struct type node iff
>  it is different from access type node; without it things are even
>  more ambiguous.  This isn't explicitly stated today, though.
> 
> 
> For completeness, here is the C++ source that was used to generate the
> IR above:
> 
> struct A { char f; };
> struct B { A a; };
> struct C { int f; };
> 
> int f(B *b, C *c, int *i) {
>     return b->a.f + c->f + *i;
> }
> 
> and the metadata was:
> 
> !2 = !{!3, !5, i64 0}
> !3 = !{!"_ZTS1B", !4, i64 0}
> !4 = !{!"_ZTS1A", !5, i64 0}
> !5 = !{!"omnipotent char", !6, i64 0}
> !6 = !{!"Simple C++ TBAA"}
> !7 = !{!8, !9, i64 0}
> !8 = !{!"_ZTS1C", !9, i64 0}
> !9 = !{!"int", !5, i64 0}
> !10 = !{!9, !9, i64 0}
> 
> 
> -- Sanjoy
> 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev