<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<p><br>
</p>
<div class="moz-cite-prefix">On 08/21/2017 04:43 PM, Xinliang David
Li wrote:<br>
</div>
<blockquote
cite="mid:CALRgJCNVM+tuoq_21wSwNmkAuCqWbG6OZ9KJYrmNacWbvVB7yw@mail.gmail.com"
type="cite">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Mon, Aug 21, 2017 at 2:19 PM, Hal
Finkel <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
<div>
<div class="gmail-h5">
<div
class="gmail-m_479059396580791321moz-cite-prefix">On
08/20/2017 08:47 PM, Xinliang David Li via
llvm-dev wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Hi Ivan, thanks for writing it up.
This is a pretty long thread, and there are many
good points brought up in the heated
discussions. Here is my take on many of the
points that have been mentioned:
<div><br>
</div>
<div>1) The type based aliasing IR annotation
should be generic enough to represent aliasing
constraints for any frontend languages; </div>
<div>2) The offset based aliasing rules and type
based aliasing rule have a lot in common. The
difference is that the offset based aliasing
rule require that the two accesses to have
same base pointer value, while type based
aliasing rules specifies whether two base
pointers can possibly point to the same
underlying object, and if yes, what are the
legally allowed offsets at which the two base
pointers can be aligned;</div>
<div>3) Type specific information is usually not
interesting to offset based aliasing rules if
offsets can be folded into constant. It is
when there is a variable involved in the
indexing that makes language guaranteed bounds
information also useful.</div>
<div><br>
</div>
<div>Given the above, the type based aliasing
annotation can be represented as list of
<base_type, offset> pairs, as well as
the original base pointer type in the source.
Each pair represent a way to access the memory
legally allowed by the language -- this field
can be accessed with a base pointer of type
'base_type' at 'offset'.</div>
<div><br>
</div>
<div>For instance:</div>
<div><br>
</div>
<div>struct A { struct B { struct C {int m, n; }
c1; int i, j; } b1, b2; int k; };</div>
<div>struct A*ap;</div>
<div>struct B *bp;</div>
<div>struct C *cp;</div>
<div>int *ip;</div>
<div><br>
</div>
<div>(1) access ap->b2.c1.n has following
annotation:</div>
<div><br>
</div>
<div> {A*, [ <A, 12>, <B, 4>,
<C,4>, <int, 0>] }</div>
<div><br>
</div>
<div>What it means </div>
<div> </div>
<div> Access (1) may only be aliased with
another access if that access's base pointer
is type A and the offset is 12, or type B at
offset 4, or type C at offset 4, or type int
at offset 0. Also access (1) is originally
accessed via path <A, 12>.</div>
<div><br>
</div>
<div>(2) access ap->k has only</div>
<div> </div>
<div> {A*, [<A, 16>, <int, 0>}</div>
<div><br>
</div>
<div>(3) access bp->c1.n has</div>
<div> </div>
<div> {B*, [<B, 4>, <int, 0>]}</div>
<div><br>
</div>
<div>(4) Access bp->j has</div>
<div><br>
</div>
<div> {B*, [<B, 12>, <int, 0>]}</div>
<div><br>
</div>
<div>(5) access *ip (implicitly) has</div>
<div><br>
</div>
<div> {int *, <int, 0>}</div>
<div><br>
</div>
<div>From the above, we can see (5) is aliased
with all the above, and (3) is also aliased
with (1). <br>
</div>
</div>
</blockquote>
<br>
</div>
</div>
This sounds reasonable. It is almost exactly an encoding
of our current TBAA scheme where, instead of causing the
client to walk up the tree, we explicitly provide the
path up the tree at each access. It should also handle
unions naturally because, regardless of the "currently
live" union member, all potentially-aliasing accesses
will have an entry in their list like <U, offset>.</div>
</blockquote>
<div><br>
</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"><span class="gmail-"><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div><br>
</div>
<div>With this representation, the type based
alias rules implementation becomes: given two
memory accesses, align the pointer of one access
to the other. If it can be done, they are not
aliased. If yes, the use the same
base+offset+access_size aliasing rule to check
overlapping.</div>
</div>
</blockquote>
<br>
</span> I'm not entirely sure what you mean by aligning
pointers (and why aligning them would mean they couldn't
alias). Can you please explain?</div>
</blockquote>
<div><br>
</div>
<div>I had a typo there. I meant 'if it can not be done,
they are not aliased ...".</div>
<div><br>
</div>
<div>By aligning pointers, I meant align the base pointer of
another access with one of the allowed types in the access
list.</div>
<div><br>
</div>
<div>Given:</div>
<div><br>
</div>
<div>Access (1) {A*, [ <A, 12>, <B, 4>,
<C,4>, <int, 0>] } and access (2) {B*,
[<B, 4>, <int, 0>]}, we know that access
(2)'s base pointer type is 'B*'. Its base pointer typed
then needs to be matched up against the types in Access
(1)'s list. In this case, it finds a match, so (2)'s base
pointer will be aligned to point to the type B subobject
of the access (1). After alignment, we then compare
offset and access size to determine overlapping.</div>
</div>
</div>
</div>
</blockquote>
<br>
Maybe I'm missing something. Isn't the aliasing check just a suffix
check in this scheme? As in:<br>
<br>
Might an access with [ <A, 12>, <B, 4>, <C,4>,
<int, 0> ], list (1), alias with an access with [<B, 4>,
<C, 4>, <int, 0>], list (2)? Yes, because the elements
in list (2) are a suffix of those in list (1).<br>
<br>
I'm assuming here that the offsets are to the start of each field,
and the offsets within the field are separate. Is dealing with the
intra-field-offset issue what you meant by alignment?<br>
<br>
To deal with unions, I think that we'd need an additional rule.
Something like: If the suffix check fails, for each union type in
the list, discard the path elements to the right of the union type
and try again. In this way, we can check things like:<br>
<br>
[ <A, 12>, <U, 4>, <C,4>, <int, 0> ], list
(1), checked against, [ <A, 12>, <U, 4>, <F,4>,
<float, 0> ], list (2). The initial suffix check fails, but we
then discard <F, 4> and <float, 0> from list (2), and
try again. At that point, the check succeeds and we conclude that
aliasing is possible.<br>
<br>
-Hal<br>
<br>
<blockquote
cite="mid:CALRgJCNVM+tuoq_21wSwNmkAuCqWbG6OZ9KJYrmNacWbvVB7yw@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div>David</div>
<div><br>
</div>
<div><br>
</div>
<div> <br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"><span class="gmail-"><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div><br>
</div>
<div>For languages that have rules about access
bounds of member arrays, the offset information
can be replaced with offset + range. By default,
the range is the size of the array type.</div>
</div>
</blockquote>
<br>
</span> Yes, and I'd certainly like to include this
information going forward (along with some flag saying
whether it's legal to over/under index the field).<br>
<br>
Thanks again,<br>
Hal
<div>
<div class="gmail-h5"><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div><br>
</div>
<div>David</div>
<div><br>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Wed, Aug 16, 2017
at 11:59 AM, Ivan A. Kosarev via llvm-dev <span
dir="ltr"><<a moz-do-not-send="true"
href="mailto:llvm-dev@lists.llvm.org"
target="_blank">llvm-dev@lists.llvm.org</a>></span>
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">Hal,
Daniel,<br>
<br>
Thanks for your responses. Here's a quick
formal introduction to the proposed
approach follows. I also attached a couple
files to put some visibility on
implementation details. We'd love to hear
from you gentlemen and LLVM community in
general on how this fits what you know
about TBAA.<br>
<br>
<br>
Overview<br>
========<br>
<br>
With this writing we propose a new
approach to the TBAA mechanism<br>
designed to overcome current issues such
as:<br>
- inability to represent accesses to
aggregates, unions, union<br>
members, bit fields, fields of union
and aggregate types,<br>
including array members and<br>
- lack of a mean to identify user types
defined in different<br>
translation units.<br>
<br>
As of today, we have a local patch that
implements this approach.<br>
This new implementation is known to be at
least as functionally<br>
complete as the current one. Additionally,
with this patch on top<br>
we improved SROA to propagate TBAA
information, thus making sure<br>
the new features work as expected.<br>
<br>
We should be able to upload that patch for
review in a few days.<br>
<br>
<br>
The Approach<br>
============<br>
<br>
With the proposed approach we represent
accesses as sequences<br>
that contain all accessed types and fields
in order. For example<br>
for this access:<br>
<br>
struct T { union U { struct S { int i1,
i2; } s1, s2; } u1, u2; } t;<br>
t.u1.s1.i1<br>
<br>
we generate an access sequence of the
form:<br>
<br>
[T, T::u1, U, U::s1, S, S::i1, int]<br>
<br>
An array is allowed to overlap with any
object of its element<br>
type, including other arrays of the same
element type, so an<br>
access to an array element is represented
as an access to an<br>
object of the element type:<br>
<br>
int a[7];<br>
a[5]<br>
<br>
[int]<br>
<br>
In case of a multi-dimensional array this
rule applies<br>
recursively:<br>
<br>
int a[7][9];<br>
a[3][5]<br>
<br>
[int]<br>
<br>
For member arrays we specify the member
field as usual so we can<br>
distinct it from other array members:<br>
<br>
struct S { int a[7], b[9]; } s;<br>
s.a[5]<br>
<br>
[S, S::a, int]<br>
<br>
Similarly to the scalar and struct-path
approaches, we consider<br>
every type to be a member of a type group
it explicitly refers<br>
to. Here's how the tree that describes
relations between type<br>
groups would look like for the example
above:<br>
<br>
<tbaa_root><br>
|- <may_alias><br>
|- <representation_byte><br>
|-<structure><br>
| |- S<br>
|- int<br>
<br>
The <vtable_pointer> group has a
special meaning and is used to<br>
describe accesses to virtual table
pointers. Similarly, the<br>
<union> type group includes all
union types and used by the TBAA<br>
implementation to distinct union types
from other types. The<br>
<may_alias> group is technically
equivalent to<br>
<representation_byte> and supposed
to be a group for<br>
may_alias-marked types.<br>
<br>
For two given access sequences we can
determine if the accessed<br>
objects are allowed to overlap by the
rules of the input<br>
language. Here's the list of rules
complete enough to support C<br>
and C++. Ellipsis elements denote
sequences of zero or more<br>
elements. For other input languages more
rules can be supported,<br>
if necessary.<br>
<br>
[X...] is allowed to overlap with
[S1..., X..., S2...]<br>
and the most generic access sequence is
[X...].<br>
<br>
[X1..., X2...] is allowed to overlap
with [S1..., X1...]<br>
with the most generic access sequence to
be [X1...].<br>
<br>
[X1..., U, U::m1, X2...] is allowed to
overlap with<br>
[S1..., X1..., U, U::m2, S2...]<br>
for a union U of an unknown effective
type, provided m1 != m2<br>
and the most generic access sequence is
[X1..., U].<br>
<br>
If neither of the given sequences contains
the leading access<br>
type of the other, then they are allowed
to overlap if the<br>
leading access type of one sequence is a
direct or indirect field<br>
of the final access type of the other
sequence and then the most<br>
generic access sequence consists of a
single element, which is<br>
that final access type.<br>
<br>
For the purpose of determining whether one
type is a direct or<br>
indirect member of another type unions are
considered to have no<br>
members as accesses to members of unions
are only allowed to<br>
overlap if they have the base union object
explicitly specified.<br>
<br>
Otherwise, given sequences overlap if
there is a type group that<br>
includes both the leading access types and
the most generic<br>
access sequence consists of the smallest
common type group as its<br>
only element.<br>
<br>
See the attached
TypeBasedAliasAnalysis.cpp file and
specifically<br>
the MatchAccessSequences() function for
how these rules can be<br>
implemented.<br>
<br>
TBAA information is encoded as metadata
nodes, as usual. Load and<br>
store instructions refer to access
sequences:<br>
store %struct.T* %p, %struct.T**
%p.addr, align 8, !tbaa !11<br>
<br>
A type node is either a terminal type node
that names a root type<br>
group:<br>
!0 = !{ !"<tbaa_root>" }<br>
<br>
or a non-terminal type node that names a
type and refers to a<br>
type group it belongs to:<br>
!1 = !{ !0, !"int" }<br>
<br>
Record types also refer to their field
descriptors:<br>
!3 = !{ !0, !"S", !9 }<br>
<br>
An access node is either a terminal access
node that refers to<br>
the corresponding access type:<br>
!5 = !{ !1 }<br>
!9 = !{ !3 }<br>
<br>
or a member node that refers to a
structure/class or union field<br>
descriptor and a subsequent access path
node:<br>
!7 = !{ !type_group, !field_id,
!field_offset, !field_size }<br>
!11 = !{ !5, !9, !7 }<br>
<br>
For a field node the first element refers
to its type. The<br>
purpose of other elements is to make the
field node unique. Their<br>
meaning is unspecified. Currently the
other members for C and C++<br>
are the field name, bit offset and bit
size of the member, but<br>
this may change in future and front ends
for other input<br>
languages may act differently, so TBAA
implementation in the<br>
codegen shall not rely on specific shape
or meaning of these<br>
elements.<br>
<br>
For types that are interchangeable for
purposes of TBAA it is<br>
important to encode them identically so
that descriptors of<br>
interchangeable types defined in different
modules merge into<br>
same metadata nodes.<br>
<br>
Structure/class fields are specified in
the order of declaration.<br>
For union fields there is a canonical
order that guarantee that<br>
definitions of the same union type will
result in identical<br>
descriptors regardless of the order of
member declarations.<br>
Currently we sort union fields with key
(field_id, field_offset,<br>
field_size).<br>
<br>
C++ tagged types with no linkage are to be
encoded as "distinct"<br>
nodes to guarantee their uniqueness.<br>
<br>
The support for !tbaa.struct information
is to be replaced with<br>
plain !tbaa tags representing accesses to
the corresponding<br>
record types.<br>
<br>
Another attached file, the tbaa.cpp one,
is a test case that can<br>
give an idea what encoded TBAA metadata
may look like.<br>
<br>
<br>
Space and Performance Analysis<br>
==============================<br>
<br>
In terms of metadata size, with the new
approach we generate<br>
about 15% more of metadata nodes. The
ratio of the total number<br>
of TBAA nodes to the amount of code
remains very low, meaning the<br>
size of TBAA metadata is unlikely to be a
problem any time soon.<br>
<br>
From the performance perspective, the
proposed approach differs<br>
from the current one in that:<br>
- it does not traverse through record
types if one of the access<br>
sequences to match contains the leading
access type of the<br>
other and<br>
- it never traverses union types.<br>
<br>
We thus expect that the new approach is at
least as efficient as<br>
the current one. Our experiments do not
indicate any sensible<br>
difference in performance between the
implementations.<span
class="gmail-m_479059396580791321HOEnZb"><font
color="#888888"><br>
<br>
-- <br>
<br>
</font></span><br>
______________________________<wbr>_________________<br>
LLVM Developers mailing list<br>
<a moz-do-not-send="true"
href="mailto:llvm-dev@lists.llvm.org"
target="_blank">llvm-dev@lists.llvm.org</a><br>
<a moz-do-not-send="true"
href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev"
rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>
<br>
</blockquote>
</div>
<br>
</div>
</div>
<br>
<fieldset
class="gmail-m_479059396580791321mimeAttachmentHeader"></fieldset>
<br>
<pre>______________________________<wbr>_________________
LLVM Developers mailing list
<a moz-do-not-send="true" class="gmail-m_479059396580791321moz-txt-link-abbreviated" href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>
<a moz-do-not-send="true" class="gmail-m_479059396580791321moz-txt-link-freetext" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a>
</pre>
</blockquote>
</div></div><span class="gmail-"><pre class="gmail-m_479059396580791321moz-signature" cols="72">--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
</span></div>
</blockquote></div>
</div></div>
</blockquote>
<pre class="moz-signature" cols="72">--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre></body></html>