[llvm-dev] Ambiguity in !tbaa metadata?
Sanjoy Das via llvm-dev
llvm-dev at lists.llvm.org
Tue Nov 1 18:04:22 PDT 2016
Hi Manman,
Thanks for the quick reply!
Manman wrote:
>> On Nov 1, 2016, at 10:52 AM, Sanjoy Das via
llvm-dev<llvm-dev at lists.llvm.org> wrote:
>>
>> I was trying to add some stronger assertions in the verifier around
>> !tbaa metadata when I ran into an ambiguity: I think the encoding of
>> the metadata nodes are such that a given node can be interpreted as
>> either a struct type node or a scalar tbaa node. I'd like a sanity
>> check before I try to fix or work around this.
>>
>> Consider some IR that I got from running clang over a small C++
>> program:
>>
>> ```
>> define void @foo() {
>> ...
>> load ..., !tbaa !2
>> load ..., !tbaa !7
>> load ..., !tbaa !10
>> ...
>> }
>>
>> !2 = !{!3, !5, i64 0}
>> !3 = !{!"T0", !4, i64 0}
>> !4 = !{!"T1", !5, i64 0}
>> !5 = !{!"T2", !6, i64 0}
>> !6 = !{!"Root"}
>> !7 = !{!8, !9, i64 0}
>> !8 = !{!"T3", !9, i64 0}
>> !9 = !{!"T4", !5, i64 0}
>> !10 = !{!9, !9, i64 0}
>> ```
>>
>> I've erased the actual string names to make the ambiguity more
>> obvious.
>>
>> Here !2 and !7 are both struct tag nodes. This means that !5 and !9
>> are both scalar type nodes and !3 and !8 are struct type nodes[1].
>
> For struct-path-aware TBAA, the scalar type node has name, parent
node, and offset (optional).
> This simplifies implementation by making formats of struct type nodes
and scalar type nodes uniform.
So this answers one of my key questions -- the similarity is
not accidental.
> Sorry that the comments are confusing.
> For old scalar TBAA, the scalar node has name, parent node, and
“constant”.
>
> Why do you need to tell whether it is a struct type node or a scalar
type node?
Say !4 above is !{!"T1", !5, i64 1}, with the interpretation that it
is a scalar type node with the "constant" bit set. Then an access
through !4 with the offset 0 may alias with an access through !5.
However, if LLVM interprets !4 = !{!"T1", !5, i64 1} as a struct type
TBAA node that contains !5 at an offset of 1 byte, then an access
through !4 with the offset 0 is no alias with an access through !5.
Thus if the compiler thinks the latter and the frontend that generated
the TBAA metadata was thinking the former, then we may have a
miscompile.
The example is somewhat flimsy though, since ideally we'd force struct
type nodes to start with a 0 offset. However this isn't part of the
specification today.
The back-story here is that I think there are some unverified
invariants that our TBAA implementation uses which would be nice to
enforce in the verifier (in particular, we had a subtle bug in our
codebase that would've been caught by a sufficiently smart verifier).
The "struct type nodes must start with offset 0" is one. Another one
I want to verify is (this is the motivating cause): the final access
into the access type in a struct path has to be at offset 0. That is,
the following is illegal:
...
load ..., !tbaa !2
...
!0 = !{"struct S", !1, 0, !1, 4, !1, 8}
!1 = < scalar int >
!2 = !{!0, !1, 6}
since if the above is allowed, then the implementation of
getMostGenericTBAA looks suspect.
>> However, once we get to the first field of !3, !4 at offset 0, things
>> become murkier. !4 could either be a read-write scalar node with a !5
>> as a parent, or be a struct type node containing !5 at offset 0. I
>> don't see a way to tell the two possibilities apart.
>>
>> The ambiguity shown above is "fine" since (I think, but I'm not sure)
>> that containing an object at offset 0 should be equivalent to
>> "subclassing" it in the TBAA type system. It still makes writing
>> verifier Assertions more difficult than it should be, though.
>>
>> Things get a bit more problematic once we allow for setting the
>> "constant" tag on scalar TBAA. If !4 was !{!"T1", !5, i64 1} then
>> there we'd have a "real" ambiguity between it being a scalar node
>> describing constant memory or a struct type node containing !5 at
>> offset 1.
>>
>>
>> Finally: we have a comment from 2013 in TypeBasedAliasAnalysis that
>> implies scalar TBAA was slated for removal:
>>
>> "After all testing cases are upgraded to use struct-path aware TBAA
>> and we can auto-upgrade existing bc files, the support for scalar TBAA
>> can be dropped."
>>
>> Does anyone have some context for what the motivations were / why the
>> work was stopped? If all the use cases for scalar TBAA can be
>> simulated using struct tbaa then that may be the easiest way to remove
>> the ambiguity.
>
> We auto-upgrade from scalar TBAA to struct-path-aware TBAA. I vaguely
recall that we keep scalar TBAA around because of outside usage.
> That is why we still have “createTBAANode” even though clang frontend
does not use it.
Yes, I see that now.
Unless you have objections, I'd like to propose dropping support for
scalar TBAA metadata, since that lets us get rid of a non-trivial
amount of logic from TypeBasedAliasAnalysis. I'll start an RFC on the
mailing list.
-- Sanjoy
More information about the llvm-dev
mailing list