[llvm-dev] An ambiguity in TBAA info format
Ivan Kosarev via llvm-dev
llvm-dev at lists.llvm.org
Mon Oct 30 14:57:34 PDT 2017
Hello,
Consider these two TBAA access tags:
!1 = !{!5, !5, i64 0}
!3 = !{!7, !7, i64 0}
!5 = !{!"A", !9}
!7 = !{!"B", !9}
The tag !1 describes an access to an object of type "A" and !3 describes
an access to an object of type "B".
Both the type descriptors, !5 and !7, refer to node !9 as their type
group. A definition of that node could look like this:
!9 = !{!"omnipotent char", ...}
We know that these two accesses should be considered no-alias as neither
of them encloses the other; the least common type group for them is !9.
TypeBasedAAResult::Aliases() and MDNode::getMostGenericTBAA() respond
accordingly and all is good.
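The reasoning above can be sketched with a small Python model. This is
not LLVM's implementation, only an illustration of the parent-chain walk
for old-format scalar type nodes. The node named "hypothetical root" is
an assumption, since the parent of !9 is elided above.

```python
# Simplified model of old-format scalar type nodes: type name -> parent name.
# "hypothetical root" is an assumption; the message elides !9's parent.
PARENT = {
    "A": "omnipotent char",
    "B": "omnipotent char",
    "omnipotent char": "hypothetical root",
    "hypothetical root": None,
}


def ancestors(name):
    """Return the chain [name, parent, grandparent, ...] up to the root."""
    chain = []
    while name is not None:
        chain.append(name)
        name = PARENT[name]
    return chain


def may_alias(a, b):
    """Scalar accesses may alias only if one type encloses the other."""
    return a in ancestors(b) or b in ancestors(a)


def least_common_group(a, b):
    """Return the first node on a's parent chain that is also on b's."""
    b_chain = set(ancestors(b))
    for node in ancestors(a):
        if node in b_chain:
            return node
    return None
```

In this model may_alias("A", "B") is False and
least_common_group("A", "B") is "omnipotent char", matching the
behaviour described above.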
Then, let's change the definition for the node !9:
!9 = !{!"int", ...}
Now it doesn't look like a type group, but rather like a structure member.
And nodes !5 and !7 now look like descriptors for structure types, with
their offset fields added during auto-upgrade:
!5 = !{!"A", !9, i64 0}
!7 = !{!"B", !9, i64 0}
We know that, when interpreted as structure accesses, they should still
be considered no-alias. However, the least common type group for these
types is likely to be the "omnipotent char" node, and certainly not the
type of the field, which is "int".
The problem is that, since the formats for the member-of-structure and
member-of-type-group relationships match, MDNode::getMostGenericTBAA()
cannot disambiguate between them and always treats the first members of
structure types as type groups.
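To make the ambiguity concrete, here is a minimal Python sketch of a walk
that always takes the second operand as the parent link. It is a model of
the problem, not the actual code of MDNode::getMostGenericTBAA(). The node
id "!13" and the assumption that "int" has "omnipotent char" as its
parent are hypothetical; the message leaves that operand elided.

```python
# Nodes as operand tuples, keyed by metadata id.
NODES = {
    "!5": ("A", "!9", 0),                # struct-like: field of type !9 at offset 0
    "!7": ("B", "!9", 0),
    "!9": ("int", "!13"),                # assumption: parent of "int" is elided above
    "!13": ("omnipotent char", None),    # hypothetical id, treated as the top here
}


def naive_parent(node_id):
    """Treat operand 1 as the parent, whether it is a group or a field."""
    return NODES[node_id][1]


def naive_chain(node_id):
    """Follow naive_parent links until there is nothing left to follow."""
    chain = []
    while node_id is not None:
        chain.append(node_id)
        node_id = naive_parent(node_id)
    return chain


def naive_least_common(a, b):
    """Return the first node on a's naive chain that also appears on b's."""
    b_chain = set(naive_chain(b))
    for node in naive_chain(a):
        if node in b_chain:
            return node
    return None
```

Here naive_least_common("!5", "!7") returns "!9", whose name is "int":
the type of the first field is mistaken for a type group.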
To resolve this issue I'm thinking of changing the format of type nodes
so that all of them, except root ones, refer to their type groups with
their first operand. The scalar types "A" and "B" mentioned above would
then be rewritten as:
!5 = !{!9, !"A"}
!7 = !{!9, !"B"}
!9 = !{..., !"omnipotent char"}
and their structure versions would read:
!5 = !{!9, !"A", !11, i64 0}
!7 = !{!9, !"B", !11, i64 0}
!11 = !{!9, !"int"}
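As a rough check of the proposed layout, the following Python sketch
models the structure versions above and follows only the first operand
when looking for the common type group. It is a model under stated
assumptions, not LLVM code: the first operand of !9 is elided in the
message, so None is used here as a stand-in that ends the walk.

```python
# Proposed format: operand 0 is always the type group (None is a stand-in
# for the elided first operand of !9).
NODES = {
    "!5": ("!9", "A", "!11", 0),
    "!7": ("!9", "B", "!11", 0),
    "!11": ("!9", "int"),
    "!9": (None, "omnipotent char"),
}


def group_chain(node_id):
    """Follow operand 0 (the type group link) up to the top of the model."""
    chain = []
    while node_id is not None:
        chain.append(node_id)
        node_id = NODES[node_id][0]
    return chain


def least_common_group(a, b):
    """Return the first node on a's group chain that is also on b's."""
    b_chain = set(group_chain(b))
    for node in group_chain(a):
        if node in b_chain:
            return node
    return None
```

In this model least_common_group("!5", "!7") is "!9", the "omnipotent
char" node, and the field type !11 never enters the walk.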
The new format can be easily recognized by considering the type of the
first operand: a string would mean the old format and a metadata node
would suggest the new convention.
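The recognition rule could be sketched as follows. This is a Python
illustration only; the Ref class and the function names are hypothetical
stand-ins for metadata node references and are not part of any LLVM API.
Root nodes, which have no type group, are not covered by the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Hypothetical stand-in for a metadata node reference such as !9."""
    id: str


def is_new_format(operands):
    """A node reference as first operand signals the proposed format;
    a string signals the old one."""
    return isinstance(operands[0], Ref)


def type_group(operands):
    """Return the type group reference for a new-format node, or None
    for an old-format node, where operand 1 is ambiguous."""
    return operands[0] if is_new_format(operands) else None


old_scalar = ("A", Ref("!9"))                  # !5 = !{!"A", !9}
new_scalar = (Ref("!9"), "A")                  # !5 = !{!9, !"A"}
new_struct = (Ref("!9"), "A", Ref("!11"), 0)   # !5 = !{!9, !"A", !11, i64 0}
```

With these definitions, is_new_format(old_scalar) is False, while both
new_scalar and new_struct report Ref("!9") as their type group.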
The question to the community is: are there any reasons this wouldn't
work or would be undesirable? Or are there better alternatives to the
proposed solution?
As usual, any comments are highly appreciated.
Thanks,