[llvm-dev] RFC: Resolving TBAA issues

Wed Aug 23 07:26:15 PDT 2017

Daniel,

 > GEP has no relation to original field accesses, as you know (IE
 > we allow them to access negative offsets, etc)
 > For a lot of these languages, more than the TBAA rules say that
 > you can't just  go marching through structures, etc.

So with the current approach we mix two different things: alias rules
for types and information about specific accesses, such as offsets. As a
result, whatever we conclude from a pair of accesses described with this
mix can never extend beyond the scope of what Clang treats as a single
access, that is, an expression of the form 'p->a.b.c'. The same
expression split into parts, e.g., 'p2 = &p->a.b; p2->c', results in a
less specific description of the access and, as a consequence, in a
greater number of potential false positives. Proving that 'p2' relates
to 'p' is then up to analyses that deal with memory locations rather
than memory accesses. Long term, the current approach looks like a dead
end.
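
To make the 'p->a.b.c' case concrete, here is a minimal C sketch. The
struct names S, A and B and the function names are made up for
illustration; only the two access forms come from the text above. Both
functions read the same bytes, but in the split form the load is
expressed relative to a 'struct B' pointer, so the path from 'struct S'
is no longer part of the access description.

```c
#include <stddef.h>

/* Hypothetical types, chosen only so that 'p->a.b.c' is well-formed. */
struct B { int c; int d; };
struct A { int x; struct B b; };
struct S { int y; struct A a; };

/* One expression: the whole path S -> a -> b -> c is visible at the
 * point where the load is emitted. */
int load_whole(struct S *p) {
    return p->a.b.c;
}

/* The same location, reached in two steps. The load itself only knows
 * it reads field 'c' of some struct B; relating p2 back to p is a
 * question about memory locations, not about this access. */
int load_split(struct S *p) {
    struct B *p2 = &p->a.b;
    return p2->c;
}

/* Byte offset of the accessed field from the start of struct S,
 * i.e. the information that the split form no longer carries. */
size_t offset_of_c_in_S(void) {
    return offsetof(struct S, a) + offsetof(struct A, b)
         + offsetof(struct B, c);
}
```

Calling both functions on the same object returns the same value; the
difference is only in how much of the access path is available to
describe the load.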

If I understand correctly, stripping offsets out of the TBAA information
leaves us with something like alias sets. The offsets then go into a
separate metadata tag that encodes accesses in terms of constraint
expressions. These tags are meant to be processed by what should
eventually become an implementation of field-sensitive points-to
analysis. This would also resolve the BasicAA vs. TBAA responses issue.
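
As a rough illustration of that split, here is a toy C model. Everything
in it is an assumption made for the sketch: the type and function names
are hypothetical, set id 0 is arbitrarily taken to mean "may alias
anything", and a plain (base, offset, size) triple stands in for real
constraint expressions. It is not how LLVM represents either kind of
information; it only shows the type question and the location question
being answered separately and then combined.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: type-only alias information, with no offsets.
 * In this toy model, set 0 may alias every other set. */
typedef unsigned alias_set_t;

/* Hypothetical: location information, kept apart from the type rules.
 * A real design would use constraint expressions instead. */
struct access_loc {
    int    base;   /* id of the base object, or -1 if unknown */
    size_t offset; /* byte offset from the base */
    size_t size;   /* number of bytes accessed */
};

struct access {
    alias_set_t       set;
    struct access_loc loc;
};

/* Type question: may accesses of these two sets touch the same memory? */
bool types_may_alias(alias_set_t a, alias_set_t b) {
    return a == 0 || b == 0 || a == b;
}

/* Location question: may the two byte ranges overlap? */
bool locs_may_overlap(struct access_loc a, struct access_loc b) {
    if (a.base < 0 || b.base < 0)
        return true; /* unknown base: stay conservative */
    if (a.base != b.base)
        return true; /* this toy model cannot relate distinct bases */
    return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}

/* Either side answering "no" is enough to rule aliasing out. */
bool may_alias(struct access a, struct access b) {
    return types_may_alias(a.set, b.set) && locs_may_overlap(a.loc, b.loc);
}
```

With this model, two 4-byte accesses in the same set at offsets 0 and 4
of the same base are disjoint by location alone, and two accesses at the
same offset in different non-zero sets are disjoint by type alone.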

I wonder whether !tbaa tags on loads and stores, reworked to refer to
both alias sets and constraint expressions, would work as a transitional
format while we work our way toward full field-sensitive analysis.

Thanks,

--