[llvm-dev] RFC: inbounds on getelementptr indices for global splitting

Peter Collingbourne via llvm-dev llvm-dev at lists.llvm.org
Mon Jul 18 18:02:00 PDT 2016


Hi all,

I'd like to propose an IR extension that allows the inbounds keyword to be
attached to indices in a getelementptr constantexpr.

When the inbounds keyword is placed on an index, any pointer derived from
the getelementptr that falls outside the bounds of the element referred to
by that index, other than the pointer one past the end of the element,
shall be treated as a poison value.
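
As a rough illustration only (this is not part of the proposal or of LLVM),
the rule can be modelled in C++ as a range check on byte offsets. The names
ElementRange and is_poison are hypothetical, and the offsets in the usage
note assume 8-byte pointers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: the byte range [begin, begin + size) occupied by the
// element selected by the index that carries the inbounds keyword.
struct ElementRange {
    std::int64_t begin;  // byte offset of the element within the global
    std::int64_t size;   // size of the element in bytes
};

// Returns true if a pointer at byte offset `offset` from the start of the
// global would be poison under the proposed rule. Offsets inside the element
// are fine, and so is the single one-past-the-end offset.
inline bool is_poison(ElementRange elem, std::int64_t offset) {
    assert(elem.size >= 0);
    return offset < elem.begin || offset > elem.begin + elem.size;
}
```

For a global of type {[4 x i8*], [4 x i8*]} with 8-byte pointers, the second
element would be ElementRange{32, 32}: offsets 32 through 64 are acceptable
under this model, while 31 or 65 would be poison.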

The main motivation is to allow the optimizer to split vtable groups along
vtable boundaries, thus reducing code size when certain compiler features
[1] are enabled, in a way that avoids breaking ABI. The idea is that it
would be safe to split only if the vtable has local linkage and each
reference to the global has an inbounds keyword on the correct index. If
both of these conditions are satisfied, the address of the entire global is
known not to be taken, and therefore a split would not break semantics. The
new keyword could also potentially be used with other features such as
alias analysis.
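
The two conditions can be sketched as a simple predicate. This is a minimal
sketch under assumptions: Linkage, Reference, GlobalInfo and can_split are
hypothetical stand-ins invented for illustration, not LLVM classes or the
implementation of the splitting pass.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins; none of these are LLVM types.
enum class Linkage { Local, External };

struct Reference {
    bool has_inbounds_index;     // does any index carry the inbounds keyword?
    std::size_t inbounds_index;  // position of that index, if there is one
};

struct GlobalInfo {
    Linkage linkage;
    std::vector<Reference> references;  // every use of the global
};

// `split_index` is the position of the index that selects one vtable within
// the group. Splitting is considered safe only if the global has local
// linkage and every reference marks exactly that index as inbounds.
inline bool can_split(const GlobalInfo& g, std::size_t split_index) {
    if (g.linkage != Linkage::Local) return false;
    for (const Reference& r : g.references) {
        if (!r.has_inbounds_index || r.inbounds_index != split_index)
            return false;  // this use might reach outside a single vtable
    }
    return true;
}
```

In this model a single reference without the keyword, or with the keyword on
a different index, is enough to block the split, since that reference could
legitimately observe the layout of the whole global.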

This proposal arises from some concerns raised by Eli at
http://reviews.llvm.org/D22295 regarding an earlier implementation of
global splitting, which was based on metadata. I'm posting this new
proposal as an RFC as the increased intrusiveness in the IR warrants
greater visibility.

Example

i8** getelementptr inbounds ({[4 x i8*], [4 x i8*]},
    {[4 x i8*], [4 x i8*]}* @_ZTVfoo, i32 0, inbounds i32 1, i32 2)

This is a reference to the address point of the second element of _ZTVfoo,
a virtual table group with two virtual tables, such that any pointer
extending beyond the bounds of the second virtual table is a poison value.
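
For readers less familiar with vtable groups, here is a hypothetical C++
hierarchy of the kind that gives rise to one. The class names and members
are assumptions made for illustration. Under the Itanium C++ ABI, a class
like foo below would typically get a vtable group with two four-slot virtual
tables (offset-to-top, RTTI pointer and two function pointers each), and the
B subobject would carry its own vtable pointer referring to the second one.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical classes; two polymorphic bases give foo two vtable pointers.
struct A {
    virtual int f1() { return 1; }
    virtual int f2() { return 2; }
};

struct B {
    virtual int g1() { return 3; }
    virtual int g2() { return 4; }
};

struct foo : A, B {};

// Byte distance from the start of a foo object to its B subobject. On
// mainstream ABIs this is non-zero because the A subobject comes first.
inline std::ptrdiff_t b_subobject_offset(const foo& obj) {
    const B* as_b = &obj;
    return reinterpret_cast<const char*>(as_b) -
           reinterpret_cast<const char*>(&obj);
}

// Calls through a B reference dispatch via the second virtual table.
inline int call_g1_through_b(foo& obj) {
    B& b = obj;
    return b.g1();
}
```

The layout details are ABI-specific; the sketch only shows that the B
subobject is reached through a separate vtable pointer, which is why a
reference to the second virtual table never needs to reach into the first.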

Alternatives

We could consider attaching metadata to globals as proposed and implemented
in D22295, which would avoid extending the constant expression IR. However,
as pointed out by Eli, this is problematic because optimization passes may
plausibly rebuild globals in non-permitted ways, for example by deriving a
pointer to one past the end of a partition (which may cause it to point to
another global).

We could consider using intrinsics instead of constant expressions to
represent references to partitions. However, this is not enough, at least
for vtables, because other globals (e.g. VTTs) may need to contain pointers
to vtables, and a global initializer can only hold constants, not the
result of an intrinsic call.

Thanks,
-- 
Peter

[1] Specifically, if either the whole-program devirtualization or the
control flow integrity feature is being used. In the former case, under
virtual constant propagation we are able to place propagated constants
directly in front of virtual tables of classes with multiple bases. In the
latter case, we can arrange virtual tables with multiple bases in a more
hierarchical order, which reduces the required amount of runtime data and
simplifies the required checks.