[llvm-dev] Ambiguity in !tbaa metadata?

Tue Nov 1 10:52:09 PDT 2016

I was trying to add some stronger assertions in the verifier around
!tbaa metadata when I ran into an ambiguity: I think the encoding of
the metadata nodes is such that a given node can be interpreted as
either a struct type node or a scalar type node.  I'd like a sanity
check before I try to fix or work around this.

Consider some IR that I got from running clang over a small C++
program:

```
define void @foo() {
   ...
   load ..., !tbaa !2
   load ..., !tbaa !7
   load ..., !tbaa !10
   ...
}

!2 = !{!3, !5, i64 0}
!3 = !{!"T0", !4, i64 0}
!4 = !{!"T1", !5, i64 0}
!5 = !{!"T2", !6, i64 0}
!6 = !{!"Root"}
!7 = !{!8, !9, i64 0}
!8 = !{!"T3", !9, i64 0}
!9 = !{!"T4", !5, i64 0}
!10 = !{!9, !9, i64 0}
```

I've replaced the actual type names with placeholders to make the
ambiguity more obvious.

Here !2 and !7 are both struct tag nodes.  This means that !5 and !9
are both scalar type nodes and !3 and !8 are struct type nodes[1].

However, once we get to !4, the field of !3 at offset 0, things
become murkier.  !4 could either be a read-write scalar type node with
!5 as its parent, or a struct type node containing !5 at offset 0.  I
don't see a way to tell the two possibilities apart.
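To make the ambiguity concrete, here is a minimal C++ sketch that checks a node's operand shape against both layouts.  It is not LLVM's MDNode API: TypeNode, couldBeScalarTypeNode, couldBeStructTypeNode, isAmbiguous and TwoFieldStruct are hypothetical names, and the shape rules are assumptions based on the encodings discussed here (scalar: name, parent, 0/1 flag; struct: name followed by field/offset pairs).  Treat it as an illustration, not a verifier.

```
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified model of a TBAA type node: a name string
// followed by (node, integer) operand pairs.  This mirrors the operand
// layout in the IR above; it is not LLVM's MDNode API.
struct TypeNode {
  std::string Name;
  std::vector<std::pair<const TypeNode *, uint64_t>> Operands;
};

// Scalar reading: {name, parent, flag}.  The flag is assumed to be
// 0 (read-write) or 1 (constant).
bool couldBeScalarTypeNode(const TypeNode &N) {
  return N.Operands.size() == 1 && N.Operands[0].second <= 1;
}

// Struct reading: {name, field0, offset0, field1, offset1, ...} with at
// least one field.
bool couldBeStructTypeNode(const TypeNode &N) {
  return !N.Operands.empty();
}

// True when the node's own operands cannot tell the two readings apart.
bool isAmbiguous(const TypeNode &N) {
  return couldBeScalarTypeNode(N) && couldBeStructTypeNode(N);
}

// The nodes !6, !5 and !4 from the IR above.
const TypeNode Root{"Root", {}};
const TypeNode T2{"T2", {{&Root, 0}}};
const TypeNode T1{"T1", {{&T2, 0}}};

// A hypothetical struct with two fields, which only fits the struct shape.
const TypeNode TwoFieldStruct{"S", {{&T2, 0}, {&T2, 4}}};
```

With these definitions isAmbiguous(T1) holds: the node standing in for !4 passes both shape checks, while a node with two or more fields can only be read as a struct.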

The ambiguity shown above is "fine", since (I think, but I'm not sure)
containing an object at offset 0 should be equivalent to
"subclassing" it in the TBAA type system.  It still makes writing
verifier assertions more difficult than it should be, though.

Things get a bit more problematic once we allow setting the
"constant" flag on scalar TBAA nodes.  If !4 were !{!"T1", !5, i64 1},
there would be a "real" ambiguity between it being a scalar type node
describing constant memory and a struct type node containing !5 at
offset 1.
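The sketch below spells out what the same three operands would mean under each reading.  ThreeOperandNode, asScalar, asStruct and ConstT1 are hypothetical names, node references are reduced to plain names, and the constant flag is assumed to be 0 or 1.

```
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical model of a three-operand node {name, child, integer},
// e.g. !{!"T1", !5, i64 1}.  Only the child's name is kept.
struct ThreeOperandNode {
  std::string Name;
  std::string Child;
  uint64_t Int;
};

// Scalar reading: Child is the parent type, Int is the "constant" flag.
std::string asScalar(const ThreeOperandNode &N) {
  return N.Name + ": " + (N.Int == 1 ? "constant" : "read-write") +
         " scalar, parent " + N.Child;
}

// Struct reading: Child is a field's type, Int is that field's offset.
std::string asStruct(const ThreeOperandNode &N) {
  return N.Name + ": struct with field of type " + N.Child +
         " at offset " + std::to_string(N.Int);
}

// The hypothetical variant of !4 discussed above (T2 is !5).
const ThreeOperandNode ConstT1{"T1", "T2", 1};
```

With Int == 0 the two readings differ only in whether "contains at offset 0" counts as "subclasses"; with Int == 1 they describe genuinely different things (constant memory versus a field at offset 1).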

Finally: we have a comment from 2013 in TypeBasedAliasAnalysis that
implies scalar TBAA was slated for removal:

"After all testing cases are upgraded to use struct-path aware TBAA
and we can auto-upgrade existing bc files, the support for scalar TBAA
can be dropped."

Does anyone have some context on what the motivation was, and why the
work stopped?  If every use case for scalar TBAA can be expressed with
struct-path TBAA, dropping scalar TBAA may be the easiest way to remove
the ambiguity.

[1]: I've assumed that the base type must be a struct type node iff
   it differs from the access type node; without that assumption,
   things are even more ambiguous.  This isn't stated explicitly
   anywhere today, though.
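The rule assumed in [1] can be written as a one-line classification over access tags.  This is a sketch only: AccessTag, BaseKind, classifyBase and the Tag2/Tag7/Tag10 constants are hypothetical, and types are identified by their node numbers as strings rather than by real metadata node identity.

```
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical access-tag model: {base type, access type, offset}.
struct AccessTag {
  std::string Base;
  std::string Access;
  uint64_t Offset;
};

enum class BaseKind { Scalar, Struct };

// The assumed (not documented) rule: the base type is a struct type
// node iff it differs from the access type.
BaseKind classifyBase(const AccessTag &Tag) {
  return Tag.Base == Tag.Access ? BaseKind::Scalar : BaseKind::Struct;
}

// Tags !2, !7 and !10 from the IR above, by node number.
const AccessTag Tag2{"!3", "!5", 0};
const AccessTag Tag7{"!8", "!9", 0};
const AccessTag Tag10{"!9", "!9", 0};
```

Under this rule !3 and !8 are struct type nodes and the base of !10 (that is, !9) is a scalar; the rule says nothing about nodes reached further down, such as !4.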

For completeness, here is the C++ source that was used to generate the
IR above:

```
struct A { char f; };
struct B { A a; };
struct C { int f; };

int f(B *b, C *c, int *i) {
  return b->a.f + c->f + *i;
}
```

and the metadata was:

```
!2 = !{!3, !5, i64 0}
!3 = !{!"_ZTS1B", !4, i64 0}
!4 = !{!"_ZTS1A", !5, i64 0}
!5 = !{!"omnipotent char", !6, i64 0}
!6 = !{!"Simple C++ TBAA"}
!7 = !{!8, !9, i64 0}
!8 = !{!"_ZTS1C", !9, i64 0}
!9 = !{!"int", !5, i64 0}
!10 = !{!9, !9, i64 0}
```

-- Sanjoy