[llvm-dev] [RFC] A proposal for byval in a world with opaque pointers

Sat Jan 23 19:16:45 PST 2016

Hi,

As there seems to be some concern that adding type attributes 
complicates the attributes system, I decided to experiment with 
implementing type attributes and byval as a type attribute.

Adding type attributes required mostly adding boilerplate code.  The 
only slightly tricky part was updating the types in the linker 
(IRMover.cpp).  This however seems to follow the common pattern in 
IRMover.cpp.  See D16515 for the implementation [1].

Adding the byval attribute on top of this was straightforward as well 
and probably similar to adding an integer attribute.  The tricky bit 
here was bitcode compatibility.  When reading the attribute group, we 
don't have access to the parameter types, so the current implementation 
casts `-1` to `Type *` as a placeholder and fixes the attribute later 
(when the attribute set is referenced by an actual function).  A similar 
hack would likely also be needed if it were an integer attribute.  See 
D16516 for the implementation [2].
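
To illustrate the two-phase fix-up, here is a minimal Python sketch.  It 
is only a model of the idea, not the code in D16516: the names 
PENDING_TYPE, read_attribute_group and resolve_for_param are 
hypothetical, and types are represented as plain strings.

```python
# Hypothetical model of the deferred fix-up; none of these names are LLVM APIs.
PENDING_TYPE = object()  # sentinel standing in for `-1` cast to `Type *`


def read_attribute_group(names):
    """Phase 1: old bitcode stores a bare 'byval', so park a sentinel as its type."""
    return {name: (PENDING_TYPE if name == "byval" else None) for name in names}


def resolve_for_param(group, pointee_type):
    """Phase 2: once a function references the group, fill in the real pointee type."""
    resolved = dict(group)  # copy: one group may be shared by several functions
    if resolved.get("byval") is PENDING_TYPE:
        resolved["byval"] = pointee_type
    return resolved
```

The copy in the second phase matters in this sketch because the same 
attribute group may be referenced by parameters with different pointee 
types.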

I think we can only derive dereferenceable(<size>) from byval(<type>) 
or byval(<size>), but not the other way around.  In particular, the 
current definition of the dereferenceable attribute contains "It is 
legal for the number of bytes to be less than the size of the pointee 
type.", so a dereferenceable size does not determine the byval size.

-Manuel

[1] http://reviews.llvm.org/D16515
[2] http://reviews.llvm.org/D16516

On 2016-01-19 23:47, Eddy B. via llvm-dev wrote:
> Hi,
> 
> In the past months, several options have been presented for making byval
> (and similar attributes, such as inalloca or sret) work with opaque
> pointers.
>
> The main two I've seen were byval(T) and byval(N) where N is the size
> of T.
>
> They both have their upsides and downsides, for example: byval(T) would be
> a type-parametric attribute, which, AFAIK, does not already exist and may
> complicate the attribute system significantly, while byval(N) would be hard
> to introduce in tests as computing N from T requires LLVM's DataLayout.
> 
> Also, this would have to be done for inalloca and sret as well - sret only
> needs it when targeting SPARC, although it is still generally useful in
> analysis.
> 
> To sidestep some of the concerns and allow a smooth transition towards a
> byval that works with opaque pointers, I've come up with a new approach:
>
> Reuse dereferenceable(S) and align A for the size and alignment of byval.
>
> That is, a byval dereferenceable(S) align A argument is guaranteed to have
> S bytes available to read from, *and only S*, aligned to a multiple of A.
> Reading past that size is UB, as LLVM will not copy more than S bytes.
> 
> An API can be provided to add the attribute alongside dereferenceable
> and align attributes, for a given Type* and DataLayout.
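
As a rough illustration of what such a helper would compute, here is a 
small Python sketch.  It is not the proposed API: the function names are 
hypothetical, it handles only integer fields in a non-packed struct, and 
it assumes every iN is naturally aligned (ABI alignment equal to its 
size in bytes), whereas a real DataLayout can differ per target.

```python
def int_layout(bits):
    """Return (size, align) in bytes for iN, assuming natural alignment."""
    size = bits // 8
    return size, size


def struct_layout(field_bits):
    """Return (size, align) in bytes of a non-packed struct of integer fields."""
    offset, struct_align = 0, 1
    for bits in field_bits:
        size, align = int_layout(bits)
        offset = (offset + align - 1) // align * align  # pad up to the field's alignment
        offset += size
        struct_align = max(struct_align, align)
    # Tail padding: the total size is a multiple of the struct's alignment.
    total = (offset + struct_align - 1) // struct_align * struct_align
    return total, struct_align


def byval_attrs(field_bits):
    """Spell out the attributes a byval argument of this type would carry."""
    size, align = struct_layout(field_bits)
    return f"byval dereferenceable({size}) align {align}"
```

Under these assumptions, byval_attrs([32]) gives 
"byval dereferenceable(4) align 4" for i32, and byval_attrs([8, 64]) 
gives "byval dereferenceable(16) align 8" for {i8, i64}.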
> 
> A preliminary implementation (w/o sret) can be found at:
> https://github.com/eddyb/llvm/compare/2579466...65ac99b
> 
> To maintain compatibility with existing code, dereferenceable and align
> attributes are automatically injected as soon as a non-default DataLayout
> is available. The "injection" mechanism could potentially be replaced with
> a pass, although it was easier to experiment with it being guaranteed.
> 
> This works out pretty well in practice, as analysis already understands
> dereferenceable and can make decisions based on it.
> 
> The verifier checks that for byval & friends, dereferenceable(S) and
> align A are present (clang always adds align, but not all tests have it)
> and that S is the exact size of the pointee type (while we still know
> that).
> 
> That last bit is very important, because it allows a script to do the
> following:
>
> 1. Find all byval arguments in tests that are missing dereferenceable, e.g.
>     ... i32* byval align 4 ...
>     ... {i8, i64}* byval ...
> 2. Add a bogus dereferenceable(unique ID) to each of them, i.e.
>     ... i32* byval dereferenceable(123400001) align 4 ...
>     ... {i8, i64}* byval dereferenceable(123400002) ...
> 3. Run the tests and record the errors, which may look like:
> 
> Attribute 'byval' expects 'dereferenceable(4)' for type i32*,
>     found 'dereferenceable(123400001)'
> 
> Attribute 'byval' expects 'dereferenceable(16) align 8' for type {i8, i64}*,
>     found 'dereferenceable(123400002)'
> 
> 4. Use the verifier error messages to replace the bogus attributes
> with the proper ones, which include align A when it is missing:
>     ... i32* byval dereferenceable(4) align 4 ...
>     ... {i8, i64}* byval dereferenceable(16) align 8 ...
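
Steps 2 and 4 of the list above could be sketched in Python roughly as 
follows.  This is an illustration only, not the script used for the 
preliminary implementation: the function names and the BASE_ID constant 
are hypothetical, and the regular expression assumes the verifier 
messages have exactly the shape quoted in step 3.

```python
import re

BASE_ID = 123400000  # arbitrary offset so a placeholder cannot be mistaken for a real size

# Matches "expects '<attrs>' for type <T>, found 'dereferenceable(<id>)'",
# including the line break before "found".
ERROR_RE = re.compile(
    r"expects '([^']+)' for type .*?,\s*found 'dereferenceable\((\d+)\)'",
    re.S,
)


def add_placeholders(ir_text):
    """Step 2: tag every byval that lacks dereferenceable with a unique bogus size."""
    counter = 0

    def tag(match):
        nonlocal counter
        counter += 1
        return f"byval dereferenceable({BASE_ID + counter})"

    return re.sub(r"\bbyval\b(?!\s+dereferenceable)", tag, ir_text)


def apply_fixes(ir_text, verifier_log):
    """Step 4: swap each bogus attribute for the one the verifier asked for."""
    for expected, bogus_id in ERROR_RE.findall(verifier_log):
        ir_text = ir_text.replace(f"dereferenceable({bogus_id})", expected)
    return ir_text
```

Because each placeholder ID is unique, the error message identifies the 
argument to patch without the script having to parse types or know the 
DataLayout.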
> 
> For what it's worth, the same scheme would also work for byval(N), and
> would be entirely unnecessary for byval(T).
> 
> I would love to know your thoughts on this, and more specifically:
> Which of the 3 (byval(T), byval(N) and byval + dereferenceable + align)
> do you think would provide the easiest transition path for front-ends?
> 
> Thank you,
>  - eddyb