[cfe-dev] [LLVMdev] [RFC] Scoped no-alias metadata

Sun Dec 2 21:21:00 PST 2012

On Sun, Dec 2, 2012 at 8:05 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> ----- Original Message -----
>> From: "Chandler Carruth" <chandlerc at google.com>
>> To: "Hal Finkel" <hfinkel at anl.gov>
>> Cc: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>, "Clang Developers" <cfe-dev at cs.uiuc.edu>, "Dan Gohman"
>> <dan433584 at gmail.com>
>> Sent: Sunday, December 2, 2012 7:02:52 PM
>> Subject: Re: [RFC] Scoped no-alias metadata
>>
>>
>>
>> On Sun, Dec 2, 2012 at 12:48 PM, Hal Finkel < hfinkel at anl.gov >
>> wrote:
>>
>>
>>
>>
>> Hello,
>>
>> I'd like to propose a new form of memory-aliasing metadata: scoped
>> no-alias metadata. This can be used to model local 'restrict'
>> pointers in C99, but should also be useful for other frontends (I
>> think, for example, that Fortran will want this as well). Currently,
>> we only use the restrict qualifier on function arguments in Clang
>> where we translate the restrict qualifier as a 'noalias' function
>> parameter attribute.
>>
>> In C99, the 'restrict' guarantee holds only for the lifetime of the
>> pointer, and different lifetime regions can exist within a single
>> function. Furthermore, these lifetime regions can be nested. As a
>> result, we'll need to assign a unique identifier to each lifetime
>> region (roughly a block in C99, although this may be only a partial
>> block if a restrict pointer is declared in the middle of the block).
>> Also, during optimization, instructions from these different
>> lifetime regions can be merged, so a particular pointer value may
>> end up associated with multiple lifetime regions.

C99 also states that
>>
>>
>>
>> I think a good way to think about this is through the inliner. When
>> we inline a function that has a noalias parameter, we should
>> preserve that noalias for the region of code inlined, no? This might
>> be simpler than describing it in terms of C99 initially, but I think
>> we'll be able to support everything we need... Anyways, not really
>> relevant to the proposal...
>
> This is also a good use case. When a function is inlined, you'd like to preserve the 'noalias', but only when comparing pointers from the callee, not when comparing pointers from the callee to pointers in the caller.
>
>>
>>
>> Similar to TBAA, the scoped lifetime regions are arranged into
>> disjoint tree structures:
>> !0 = metadata !{ metadata !"scope of foo()" }
>> !1 = metadata !{ metadata !"scope 1", metadata !0 }
>> !2 = metadata !{ metadata !"scope 2", metadata !0 }
>> !3 = metadata !{ metadata !"scope 2.1", metadata !2 }
>> !4 = metadata !{ metadata !"scope 2.2", metadata !2 }
>>
>> and these are used to mark the pointer values:
>> %arrayidx = getelementptr inbounds %struct.ST* %s, i64 1, i32 2, i32
>> 1, i64 5, i64 13, !sna !3
>>
>> in C99 this corresponds to something like:
>> void foo(...) {
>> // begin root scope for foo()
>> int *restrict a = ...;
>> ...
>> {
>> // things in this block have scope 1
>> int *restrict b = ...;
>> ...
>> }
>> ...
>> {
>> // things here have scope 2
>> int *restrict b = ...;
>> ...
>> {
>> // a new scope 2.1
>> }
>> ...
>> *b += 1; // some use of a restrict pointer
>> ...
>> // more restrict pointers are declared, start a new nested scope 2.2
>> int *restrict c = ...;
>> ...
>> }
>> }
>>
>> When BasicAA compares to pointers for aliasing, as it recurses to
>> find each pointer's base pointer, it collects a list of scope ids
>> associated with each pointer. If one of the pointers is associated
>> with a scope id that is identical to one associated with the other
>> pointer, or is a descendant or parent to one associated with the
>> other pointer, then the two pointers are assumed not to alias. When
>> merging two pointer values, if both have scope ids, then the
>> resulting value can carry both scope ids. If one does not have a
>> scope id, then the resulting pointer must carry none (we can
>> preserve optimization ability by moving the metadata from the value
>> to be merged to its uses that are also pointer-manipulation
>> instructions).
>>
>>
>>
>> Maybe I'm misunderstanding, but why can you assume that a particular
>> scope will have a distinct GEP to produce the pointer? Or are you
>> really relying on the propagation from the pre-optimization copy of
>> the pointer the frontend emits to the load/store instructions during
>> optimization?
>
> I'm not completely sure I understand your point. A particular GEP could carry multiple scope ids, the real danger comes with sharing a >GEP between a restrict pointer in one scope and a non-restrict pointer from a different scope. From my understanding of how Clang >CodeGen works, this won't happen (because each statement and initializer gets its own set of instructions; there is no effective CSE).

Well, actually, you have a number of problems if you attach noalias to GEP's.
>From C99 (6.7.3.1, example 3):
"

The function parameter declarations

void h(int n, int * restrict p, int * restrict q, int * restrict r)
{
int i;
for (i = 0; i < n; i++)
p[i] = q[i] + r[i];
}
illustrate how an unmodiﬁed object can be aliased through two
restricted pointers. In particular, if a and b
are disjoint arrays, a call of the form h(100, a, b, b) has deﬁned
behavior, because array b is not
modiﬁed within function h.
"

If you claim q and r (which may or may not share a GEP) noalias each
other, that would be wrong.
That's easy to handle, since you say they won't share a GEP.
But you can make the examples arbitrarily complex,  because the
restrict behavior is only really applicable *if you modify the
object*.
"During each execution of B, let L be any lvalue that has &L based on
P. If L is used to
access the value of the object X that it designates, and X is also
modiﬁed (by any means),then the following requirements apply:
...
If these requirements are not met, then the behavior is undeﬁned
"
(Note the last clause, requiring a modification for the requirements to apply).

So you can come up with arbitrarily complex aliased restrict pointers
and scopes, all of which share values,and as long as you don't modify
any of the objects, it's all okay.
Worse, none of the scoping requirements around restrict lifetimes
apply *unless* you modify the objects, since those are inside the
"..." part above.
Worse still, even if you modify the objects, you've only made the
restrict requirements valid for the pointers that point to that
object.

This all seems a bit complex to handle just by attaching the noalias
info to GEP's.

I do realize most of this has to be done in the frontend, but still,
you really are going to have to say in some cases, load alias load but
load noalias store, regardless of what GEP they come from.

Thus, IMHO, you really should attach these to loads/stores, so you can
properly translate restrict into "load-load aliases, unless a store is
present into a noaliased pointer, and then all restricted  pointers
that mustalias X now have to follow the rest of the rules"

Note, BTW, that a number of compilers get the above wrong, and simply
implement what most people want restrict to say, which is that
restricted pointers can't alias each other, rather than what the
standard actually says, because what the standard says is a bit
mind-numbing.

GCC gets this right, BTW, by ignoring most of restrict :(