[llvm-dev] RFC: Representing unions in TBAA

Ivan A. Kosarev via llvm-dev llvm-dev at lists.llvm.org
Mon Aug 14 09:58:56 PDT 2017


Hello Steven, Hal and Daniel,

Thanks a lot for your discussion; it really helps with summarizing 
current TBAA issues and ways to resolve them.

Do you guys know anything about the current status of the proposed change? 
Steven, will you please let us know whether the work is in progress and 
whether there is any ETA you can share?

I'm asking because we are working on an alternative approach. It not 
only supports accesses to union members, bit fields, and fields of aggregate 
and union types, but also allows representing accesses to aggregates and 
unions the same way we do for scalars, so that !tbaa.struct is 
replaced with plain !tbaa. This means TBAA information can be propagated 
uniformly regardless of the types of the accessed objects. As a consequence, 
it supports identification of user types defined in different translation 
units, even if some of them are written in C and others in C++. It 
also defines a set of language-neutral formal rules that LLVM codegen 
follows to determine whether a given pair of accesses is allowed to 
overlap under the rules of the input language. As of today, this 
implementation covers all currently supported TBAA functionality 
exercised by the test suites, and to test the new functionality we have 
improved SROA to preserve TBAA information.

The key point is that our approach does not try to describe accesses as 
(type, offset) pairs. Instead, it represents access sequences explicitly, 
beginning with the base type followed by field descriptors, which is 
what makes the approach so flexible. TypeBasedAAResult::Aliases() and 
MDNode::getMostGenericTBAA() are a bit more complex than they used to be 
(they actually use the same internal function), but they rely exclusively 
on linear scans of access sequences, except when we have to check 
whether one of the accessed types is the type of a member of the other, 
in which case it seems we just have to traverse the fields recursively 
no matter what.
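To illustrate the idea, here is a minimal C++ sketch of an alias query over 
access sequences. It is not our actual patch and not LLVM API: TypeNode, 
AccessPath, accessedType(), containsType() and mayAlias() are hypothetical 
names, types are assumed to be uniquely identified by node pointer, and 
offsets, char/may_alias rules and scalar type compatibility are left out. 
It only shows the linear scan for a shared base type and the recursive 
member traversal otherwise.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical type descriptor (not LLVM's TBAA metadata layout).
struct TypeNode {
  std::string name;
  bool isUnion = false;
  std::vector<const TypeNode*> fields; // empty for scalar types
};

// Hypothetical access sequence: base type followed by field indices.
struct AccessPath {
  const TypeNode* base;
  std::vector<std::size_t> fields;
};

// Type of the object finally accessed (indices assumed to be valid).
const TypeNode* accessedType(const AccessPath& p) {
  const TypeNode* t = p.base;
  for (std::size_t i : p.fields) t = t->fields[i];
  return t;
}

// Recursive traversal: is `inner` the same as `outer` or nested in it?
bool containsType(const TypeNode* outer, const TypeNode* inner) {
  if (outer == inner) return true;
  for (const TypeNode* f : outer->fields)
    if (containsType(f, inner)) return true;
  return false;
}

bool mayAlias(const AccessPath& a, const AccessPath& b) {
  if (a.base == b.base) {
    // Same base type: linear scan over the common field descriptors.
    const TypeNode* t = a.base;
    std::size_t n = std::min(a.fields.size(), b.fields.size());
    for (std::size_t i = 0; i < n; ++i) {
      if (a.fields[i] != b.fields[i])
        return t->isUnion; // distinct struct fields never overlap
      t = t->fields[a.fields[i]];
    }
    return true; // one sequence is a prefix of the other
  }
  // Different base types: one accessed type must be (a member of) the other.
  const TypeNode* ta = accessedType(a);
  const TypeNode* tb = accessedType(b);
  return containsType(ta, tb) || containsType(tb, ta);
}
```

With struct S { int x; float f; } and union U { int i; float g; }, S.x and 
S.f are reported as disjoint, U.i and U.g as possibly overlapping, and a 
plain int access as possibly overlapping S.x but not S.f.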

So, I wonder whether this or similar approaches have ever been considered 
before, and what the drawbacks are, if there are any sound ones. Do you 
think it is worth considering now?

Thanks again,

-- 


