[llvm-dev] Alias Analysis with inbound GEPs

Mon Jul 25 15:10:35 PDT 2016

On Mon, Jul 25, 2016 at 2:06 PM, Hal Finkel via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

>
> ------------------------------
>
> *From: *"Elena Demikhovsky" <elena.demikhovsky at intel.com>
> *To: *"Hal Finkel" <hfinkel at anl.gov>
> *Cc: *"llvm-dev" <llvm-dev at lists.llvm.org>
> *Sent: *Monday, July 25, 2016 2:46:34 PM
> *Subject: *RE: [llvm-dev] Alias Analysis with inbound GEPs
>
>
>
> I’m checking aliasing of two pointers:
>
>
>
>   %GEP1 = getelementptr inbounds %struct.s, %struct.s* %0, i64 0, i32 1,
> i64 %indvars.iv41, i64 %indvars.iv39
>
>   %GEP2 = getelementptr inbounds %struct.s, %struct.s* %0, i64 0, i32 16
>
>
>
> The result I got is “PartialAlias” because the indices of the GEP1 are
> variable.
>
> That seems like a bug. PartialAlias should only be returned when we can
> prove a partial overlap. Otherwise, MayAlias should be returned.
>
> *[Demikhovsky, Elena] There are some comments inside:*
>
>   // Statically, we can see that the base objects are the same, but the
>
>   // pointers have dynamic offsets which we can't resolve. And none of our
>
>   // little tricks above worked.
>
>   //
>
>   // TODO: Returning PartialAlias instead of MayAlias is a mild hack; the
>
>   // practical effect of this is protecting TBAA in the case of dynamic
>
>  // indices into arrays of unions or malloc'd memory.
>
>   return PartialAlias;
>
> Ah, thanks! That, unfortunately, makes sense.
>
>
>
> Shouldn’t the “inbounds” keyword mean that the access to sub-array is also
> in-bounds?
>
> No. inbounds applies only to the whole object.
>
> I’m trying to reach “NoAlias” consensus between GEP1 and GEP2.
>
> Did the original code come from C or C++? What are we modeling here?
>
> *[Demikhovsky, Elena] C-code:*
>
> *                           for (m=0; m < params->num; m++) {*
>
> *                                           params->a[i][m] = expr;*
>
> *                              }*
>
> *%GEP1 is the store for params->a[i][m]*
>
> *%GEP2 is the load for params->num.*
>
> *The loop is not vectorized due to a possible collision between
> params->num and params->a[i][m]. If I take loading of params->num outside
> the loop, it is vectorized.*
>
> *Bounds of array “a” are known at compile time. Limit of “i” and “m” are
> runtime variables.*
>
> The problem is, IIRC, it is not undefined behavior to access one structure
> field by over-indexing an earlier array member. C++ has rules for
> "safely-derived pointers", and I think they include all pointer arithmetic
> on addresses from subobjects. If array access is just pointer arithmetic,
> I'm not sure that helps you as much as you'd like. cc'ing Richard to
> correct me if necessary.
>
>
It is actually undefined behavior. This is explicitly called out in Annex
J.2: "An array subscript is out of range, even if an object is apparently
accessible with the given subscript (as in the lvalue expression a[1][7]
given the declaration int a[4][5]) ".  If you break it apart into separate
steps, the interesting bit is that the implicit array-to-pointer conversion
is not equivalent to a cast; "int* b = (int*)a;" is not equivalent to "int*
b = *a;", even though the expressions have the same type and value.

There currently isn't any way to model the aliasing behavior of the
address-of operator or array-to-pointer decay in LLVM IR. See also
http://lists.llvm.org/pipermail/llvm-dev/2016-July/102472.html .

-Eli
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160725/fd0b95eb/attachment.html>