[llvm-dev] RFC: inbounds on getelementptr indices for global splitting

David Majnemer via llvm-dev llvm-dev at lists.llvm.org
Mon Jul 18 20:29:34 PDT 2016


On Mon, Jul 18, 2016 at 6:02 PM, Peter Collingbourne via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Hi all,
>
> I'd like to propose an IR extension that allows the inbounds keyword to be
> attached to indices in a getelementptr constantexpr.
>
> By placing the inbounds keyword on an index, any pointer derived from the
> getelementptr outside of the bounds of the element referred to by that
> index, other than the pointer one past the end of the element, shall be
> treated as a poison value.
>


I have read this sentence several times and I am still not quite sure what
it means.
Can you please provide more examples of exactly what you are trying to
represent?

I know what it means for inbounds to be on the GEP but none of the indices:
that's the GEP of today.
What does it mean for the GEP to be marked inbounds while only some of the
indices are inbounds?  What if the GEP isn't marked inbounds?
What does it mean if all the GEP indices are marked inbounds but the GEP
isn't marked inbounds?

GEPs are folded and optimized; two GEPs can compute the same numeric
position with differing indices.  What happens when the optimizer rewrites
the indices of a GEP that has one (or more) inbounds indices?
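
To make the "same position, different indices" point concrete, here is a
rough sketch in Python (not LLVM code) of the offset arithmetic for the
{[4 x i8*], [4 x i8*]} type from your example.  The 8-byte pointer size and
the helper name gep_offset are assumptions made only for illustration.

```python
PTR_SIZE = 8  # assumption: 64-bit target, so each i8* occupies 8 bytes


def gep_offset(outer, field, elem):
    """Byte offset of a GEP into {[4 x i8*], [4 x i8*]}.

    outer: index applied to the base pointer (steps over whole structs)
    field: which [4 x i8*] array inside the struct
    elem:  which i8* slot, counted from the start of that array
    """
    array_size = 4 * PTR_SIZE
    struct_size = 2 * array_size
    return outer * struct_size + field * array_size + elem * PTR_SIZE


# Index lists (0, 1, 2) and (0, 0, 6) name the same address.
via_second_array = gep_offset(0, 1, 2)
via_first_array = gep_offset(0, 0, 6)
print(via_second_array, via_first_array)
```

Both index lists land on the same byte, so a per-index inbounds marker needs a
defined meaning when one form is rewritten into the other.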

I'm also a little confused when you talk about pointers derived from the
GEP being outside of the bounds... The inbounds of present day GEP refers
to the base pointer and offsets, it does not directly have semantics on
pointers derived from such a GEP.  This means it is perfectly OK to have an
out of bounds GEP of an inbounds GEP.  My reading of your extension says
that this is not possible...
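
Here is how I would model the difference, as a sketch in Python under my own
reading of the proposal (which may well be wrong).  The function names and
the 8-byte pointer size are assumptions for illustration, and the
"today" rule is deliberately simplified to a single GEP step.

```python
PTR_SIZE = 8                  # assumption: 64-bit target
ARRAY_SIZE = 4 * PTR_SIZE     # one [4 x i8*] virtual table
OBJECT_SIZE = 2 * ARRAY_SIZE  # the whole {[4 x i8*], [4 x i8*]} global


def poison_today(offset, gep_is_inbounds):
    """Simplified present-day rule: only an inbounds GEP can yield poison,
    and only if the result leaves the object (one past the end is fine)."""
    if not gep_is_inbounds:
        return False
    return not (0 <= offset <= OBJECT_SIZE)


def poison_under_proposal(offset):
    """My reading of the RFC for 'inbounds i32 1': any derived pointer
    outside the second array (one past its end excepted) is poison."""
    return not (ARRAY_SIZE <= offset <= 2 * ARRAY_SIZE)


# Address point from the RFC example: indices (0, 1, 2).
address_point = 1 * ARRAY_SIZE + 2 * PTR_SIZE
# A later, non-inbounds GEP steps back three slots into the first array.
derived = address_point - 3 * PTR_SIZE

print(poison_today(derived, gep_is_inbounds=False))
print(poison_under_proposal(derived))
```

Under today's rules the second GEP is fine because it is not itself inbounds;
under my reading of the extension the same derived pointer becomes poison.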


> The main motivation is to allow the optimizer to split vtable groups along
> vtable boundaries thus reducing code size when certain compiler features
> [1] are enabled, in a way that avoids breaking ABI. The idea is that it
> would be safe to split only if the vtable has local linkage and each
> reference to the global has an inbounds keyword on the correct index. If
> both of these conditions are satisfied, the address of the entire global is
> known to not be taken, and therefore a split would not break semantics. The
> new attribute could also potentially be used with other features such as
> alias analysis.
>
> This proposal arises from some concerns raised by Eli at
> http://reviews.llvm.org/D22295 regarding an earlier implementation of
> global splitting, which was based on metadata. I'm posting this new
> proposal as an RFC as the increased intrusiveness in the IR warrants
> greater visibility.
>
> Example
>
> i8** getelementptr inbounds ({[4 x i8*], [4 x i8*]}, {[4 x i8*], [4 x
> i8*]}* @_ZTVfoo, i32 0, inbounds i32 1, i32 2)
>
> This is a reference to the address point of the second element of _ZTVfoo,
> a virtual table group with two virtual tables, such that any pointer
> extending beyond the bounds of the second virtual table is a poison value.
>
> Alternatives
>
> We could consider attaching metadata to globals as proposed and
> implemented in D22295, which would avoid extending the constant expression
> IR. However, as pointed out by Eli, this is problematic because optimization
> passes may plausibly rebuild globals in non-permitted ways, for example by
> deriving a pointer to one past the end of a partition (which may cause it
> to point to another global).
>
> We could consider using intrinsics instead of constant expressions to
> represent references to partitions. However, this is not enough, at least for
> vtables, because other globals (e.g. VTTs) may need to contain pointers to
> vtables.
>
> Thanks,
> --
> Peter
>
> [1] Specifically, if either of the whole-program devirtualization or
> control flow integrity features are being used. In the former case, under
> virtual constant propagation we are able to place propagated constants
> directly in front of virtual tables of classes with multiple bases. In the
> latter case, we can arrange virtual tables with multiple bases in a more
> hierarchical order, which reduces the required amount of runtime data and
> simplifies the required checks.
>