[LLVMdev] alloc_size metadata

Nuno Lopes nunoplopes at sapo.pt
Tue May 29 13:28:06 PDT 2012


>> >> I think this is a good point, here's a suggestion:
>> >>
>> >> Have the metadata name two functions, both assumed to have the same
>> >> signature as the tagged function, one which returns the offset of
>> >> the start of the allocated region and one which returns the length
>> >> of the allocated region. Alternatively, these functions could take
>> >> the same signature and additionally the returned pointer of the
>> >> tagged function, and then one function can return the start of the
>> >> region and the other the length.
>> >
>> > Ok, so this seems to be the most general proposal, which can
>> > obviously handle all cases.
>>
>> I agree.  Variation: have one function return the offset of the start
>> of the memory, and the other the offset of the end of the memory (or
>> the end plus 1), i.e. a range.  This seems more uniform to me, but I
>> don't have a strong preference.
>>
>> > Something like this would work:
>> >
>> > define i8* @foo() {
>> >     %0 = tail call i32 @get_realloc_size(i8* null, i32 42)
>> >     %call = tail call i8* @my_recalloc(i8* null, i32 42) nounwind,
>> > !alloc_size !{i32 %0}
>> >     ret i8* %call
>> > }
>> >
>> > Basically I just added a function call as the metadata (it's not
>> > currently possible to add the function itself to the metadata; the
>> > function call is required instead).
>> > As long as the function is marked as readnone, I think it shouldn't
>> > interfere with the optimizers, and we can have a later pass to drop
>> > the metadata and remove the calls.  I still don't like having the
>> > explicit calls there, though.  Any suggestions to remove the
>> > functions calls from there?
>>
>> How about this:
>>
>> define i32 @lo(i32) {
>>    ret i32 0
>> }
>>
>> define i32 @hi(i32 %n) {
>>    ret i32 %n
>> }
>>
>> declare i8* @wonder_allocator(i32)
>>
>> define i8* @foo(i32 %n) {
>>    %r = call i8* @wonder_allocator(i32 %n), !alloc !0
>>    ret i8* %r
>> }
>>
>> !0 = metadata !{ i32 (i32)* @lo, i32 (i32)* @hi }
>
> This is the format that I had in mind.
>
>>
>>
>> The main problem I see is that if you declare @lo and @hi to have
>> internal linkage then the optimizers will zap them.  Maybe there's a
>> neat solution to that.
>
> I would consider the optimizer doing this a feature, not a problem.
> That having been said, we need to make sure that the optimzer does not
> zap them before the analysis/instrumentation passes get to run.

This is actually non-trivial to accomplish.
Metadata doesn't count as a user, so internal functions with no other  
usage will get removed. On the other hand, we could use @llvm.used to  
make sure the functions had (at least) one user, but that's probably  
equivalent to not using internal linkage.
And I still want to make sure that these functions disappear in the  
final binary..

Another thing that bothers me is the implementation on the objectsize  
intrinsic. This intrinsic returns the *constant* size of the pointed  
object given as argument (if the object has a constant size). However,  
with this function scheme, the implementation would be a bit heavy,  
since it would need to inline the @lo and @hi functions, simplify the  
resulting expression, and then check if the result is a ConstantInt.  
And remember that in general these functions can be arbitrary complex.


>> > I feel that the offset function is probably not required. I've never
>> > seen an allocation function that doesn't return a pointer to the
>> > beginning of the allocated buffer. Also, I cannot remember of any
>> > function in the C library that has that behavior.
>>
>> Yes, in C you probably never see such a thing, but we are not just
>> dealing with C here.  I think it is important to have the start
>> offset as well as the length.

Ok, that makes the whole thing more general. But is the added  
complexity worth it?  I mean, if it won't have any expected users,  
then probably it's not worth it.  I don't know much about ADA or other  
languages, so I'm just asking if there is a front-end that may use  
this functionality.


>> > We will also need a convenient syntax to export this feature in the
>> > languages we support.
>>
>> Actually, no you don't.  You could just implement GCC's alloc_size in
>> terms of this, at least for the moment.  Even in the long term it's
>> probably pretty pointless for clang to ever expose the start offset
>> functionality since clang only supports C-like languages and probably
>> (as you mentioned) this is pretty useless for them.
>
> As I mentioned in my other response, posix_memalign and friends do
> this. However, writing to the memory prior to the returned pointer
> would still be an error, so perhaps this does not matter.

Uhm, that's not what I understand from the man page. It even says that  
the returned pointer can be used with free and realloc.

Nuno



More information about the llvm-dev mailing list