[LLVMdev] dragonegg: switch from old TBAA format to the new struct-path aware TBAA format

Sat Oct 12 05:28:45 PDT 2013

Hi Manman, thanks for the heads up.  I looked into what it would take to produce
full struct TBAA metadata from the GCC aliasing info (GCC has aliasing info for
struct types, in fact for any type), but it looks kind of tricky.  The problem
is the "offset" field, which doesn't exist in GCC.  In GCC the aliasing
information forms a DAG, with a node for each type, plus a special root node.
How language types turn into GCC TBAA DAG nodes depends on the language, but
for a simple language like C it is set up like this: scalar types just have an
edge to the root node; struct types have edges to the nodes for the types of
their fields.  This is pretty similar to what you have set up, but there is no
offset, and it is not clear how I can get hold of one in general (I need to
look into this more though).
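
To make the shape of that DAG concrete, here is a minimal sketch in plain C++.
It is a toy model, not GCC's actual data structures: the names AliasDag, add,
reaches and mayAlias are invented for illustration, and the query rule (two
nodes may alias if either one reaches the other) is an assumption chosen to
match the description above.  Note that no edge carries an offset.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy alias DAG: each node is an opaque "cookie" with edges to other nodes.
// There is deliberately no offset attached to any edge.
struct AliasDag {
    struct Node {
        std::string name;                // label, for debugging only
        std::vector<std::size_t> edges;  // indices of successor nodes
    };
    std::vector<Node> nodes;

    // Add a node and return its index.  Edges may only point at nodes that
    // already exist, which keeps the graph acyclic by construction.
    std::size_t add(const std::string &name,
                    const std::vector<std::size_t> &edges) {
        for (std::size_t e : edges) assert(e < nodes.size());
        nodes.push_back(Node{name, edges});
        return nodes.size() - 1;
    }

    // True if `to` can be reached from `from` by following edges.
    bool reaches(std::size_t from, std::size_t to) const {
        if (from == to) return true;
        for (std::size_t e : nodes[from].edges)
            if (reaches(e, to)) return true;
        return false;
    }

    // Assumed query rule: two nodes may alias if either reaches the other.
    bool mayAlias(std::size_t a, std::size_t b) const {
        return reaches(a, b) || reaches(b, a);
    }
};
```

With a root node, "int" and "float" nodes that each have an edge to the root,
and a struct node with edges to both, the struct may alias either field type
while "int" and "float" do not alias each other.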

There are other problems with the offset field too:

(1) many languages have array types.  For these languages it is natural to have
a node for the array type with an edge to the node for the element type.  But
there is no reasonable offset in general: arrays have many elements (all of the
same type) but each at a different offset.  A possibility is to create one node
per element, but since arrays often have thousands of elements this would create
vast amounts of metadata.  There could be an artificial limit: only produce TBAA
info for arrays with fewer than X elements, but that is a bit nasty.  It would
also mean not producing any TBAA info for the variable-size array types found
in some languages.
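
As a rough illustration of the cost, the sketch below expands an array into one
(offset, type) entry per element, with a cut-off playing the role of X.  The
names FieldEntry and expandArray, and the integer typeId, are hypothetical; this
is not how LLVM or dragonegg represents anything.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One (offset, type) entry, as a per-element encoding would need.
struct FieldEntry {
    std::uint64_t offset;  // byte offset from the start of the array
    int typeId;            // hypothetical id of the element type node
};

// Expand an array of `count` elements, each `elemSize` bytes, into one entry
// per element.  Arrays with `limit` or more elements get no entries at all,
// which models "only produce TBAA info for arrays with fewer than X elements".
std::vector<FieldEntry> expandArray(std::uint64_t count,
                                    std::uint64_t elemSize,
                                    int typeId,
                                    std::uint64_t limit) {
    std::vector<FieldEntry> entries;
    if (count >= limit) return entries;  // too large: no info produced
    entries.reserve(count);
    for (std::uint64_t i = 0; i < count; ++i)
        entries.push_back(FieldEntry{i * elemSize, typeId});
    return entries;
}
```

The number of entries grows linearly with the element count, and an array whose
size is only known at run time has no `count` to pass in the first place.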

(2) some languages have struct types with fields at variable offsets (i.e. the
offset is determined by the value of some variable, often the value of another
field in the struct).  As the offset is not a constant, it is not possible to
put it in the metadata.  Not producing TBAA info for these types would be a
pity.
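
A small sketch of what a variable offset looks like, using an invented record
layout (a length-prefixed name followed by an aligned integer).  The layout and
the function name offsetOfValue are assumptions made up for this example, not
taken from any particular language.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical record layout:
//   bytes [0, 4)           : 32-bit unsigned nameLen
//   bytes [4, 4 + nameLen) : the name, nameLen bytes long
//   next 4-byte boundary   : 32-bit signed value
// The offset of `value` depends on the run-time contents of `nameLen`,
// so no single constant can be written into the metadata for it.
std::uint64_t offsetOfValue(std::uint32_t nameLen) {
    std::uint64_t end = 4 + static_cast<std::uint64_t>(nameLen);
    return (end + 3) & ~static_cast<std::uint64_t>(3);  // round up to 4
}
```

Two records of the same type can therefore place the same field at different
offsets, for example at 8 when nameLen is 1 and at 12 when nameLen is 5.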

An interesting part of the GCC design is that there is no distinction between
scalar and struct nodes: all nodes are equal.  You can *define* a "scalar" node
to be one which only has an edge to the root, but such nodes might not
correspond to the scalar types in the original language.  In fact I don't think
nodes have to have anything to do with a type at all; they seem to be arbitrary
cookies.

It's a pity that the LLVM design hardwires in the offset.  It looks like it has
been designed with only C-like languages in mind, which isn't surprising but is
limiting.  Is the offset field really useful/necessary?

Best wishes, Duncan.

On 08/10/13 02:47, Manman Ren wrote:
> Hi Duncan,
>
> I am hoping to remove the support for the old TBAA format soon.
> You should be able to switch to the new format by replacing
>
> MDNode *AliasTag = MDHelper.createTBAANode(TreeName, getTBAARoot());
>
> with
> MDNode *AliasType = MDHelper.createTBAAScalarTypeNode(TreeName, getTBAARoot());
> MDNode *AliasTag = MDHelper.createTBAAStructTagNode(AliasType, AliasType, 0);
>
> Also replacing
> LeafTag->replaceAllUsesWith(getTBAARoot());
>
> with
> MDNode *Root = getTBAARoot();
> LeafTag->replaceAllUsesWith(MDHelper.createTBAAStructTagNode(Root, Root, 0));
>
> The document is currently at the beginning of TypeBasedAliasAnalysis.cpp. I am
> going to update the language ref when struct-path aware TBAA is on by default.
>
> Let me know if you have any problem with it.
>
> Thanks,
> Manman