[LLVMdev] alloc_size metadata

Tue May 29 10:23:56 PDT 2012

Hi Nuno,

>> I think this is a good point, here's a suggestion:
>>
>> Have the metadata name two functions, both assumed to have the same
>> signature as the tagged function, one which returns the offset of the
>> start of the allocated region and one which returns the length of the
>> allocated region. Alternatively, these functions could take the same
>> signature and additionally the returned pointer of the tagged
>> function, and then one function can return the start of the region and
>> the other the length.
>
> Ok, so this seems to be the most general proposal, which can obviously
> handle all cases.

I agree.  Variation: have one function return the offset of the start of
the memory, and the other the offset of the end of the memory (or the end
plus 1), i.e. a range.  This seems more uniform to me, but I don't have a
strong preference.

> Something like this would work:
>
> define i8* @foo() {
>     %0 = tail call i32 @get_realloc_size(i8* null, i32 42)
>     %call = tail call i8* @my_recalloc(i8* null, i32 42) nounwind,
> !alloc_size !{i32 %0}
>     ret i8* %call
> }
>
> Basically I just added a function call as the metadata (it's not
> currently possible to add the function itself to the metadata; the
> function call is required instead).
> As long as the function is marked as readnone, I think it shouldn't
> interfere with the optimizers, and we can have a later pass to drop
> the metadata and remove the calls.  I still don't like having the
> explicit calls there, though.  Any suggestions to remove the functions
> calls from there?

How about this:

define i32 @lo(i32) {
   ret i32 0
}

define i32 @hi(i32 %n) {
   ret i32 %n
}

declare i8* @wonder_allocator(i32)

define i8* @foo(i32 %n) {
   %r = call i8* @wonder_allocator(i32 %n), !alloc !0
   ret i8* %r
}

!0 = metadata !{ i32 (i32)* @lo, i32 (i32)* @hi }

The main problem I see is that if you declare @lo and @hi to have internal
linkage then the optimizers will zap them.  Maybe there's a neat solution
to that.

> I feel that the offset function is probably not required. I've never
> seen an allocation function that doesn't return a pointer to the
> beginning of the allocated buffer. Also, I cannot remember of any
> function in the C library that has that behavior.

Yes, in C you probably never see such a thing, but we are not just dealing
with C here.  I think it is important to have the start offset as well as the
length.

> We will also need a convenient syntax to export this feature in the
> languages we support.

Actually, no you don't.  You could just implement GCC's alloc_size in terms of
this, at least for the moment.  Even in the long term it's probably pretty
pointless for clang to ever expose the start offset functionality since clang
only supports C-like languages and probably (as you mentioned) this is pretty
useless for them.

> I personally would like to see '__attribute__((alloc_size( strlen(x)+1
> ))' in C, but the implementation seems to be non-trivial.
>
> About Duncan's comment about having the memory builtin analysis
> recognize this intrinsic, well I agree it should (and I'll take care
> of that), but I'm not sure if we should be very aggressive in
> optimizing based on this metadata.

It would be great for understanding that loads/stores from/to outside the bounds
of the allocation result in undef.  I think the optimizers already exploit this
kind of info in the case of alloca - maybe this helps generalize to heap
allocations.

> For example, do we really want to remove a call to a custom allocator
> whose return value is unused (like we do for malloc)?

No we don't, so LLVM's interface to malloc-like and calloc-like things would
have to be reworked to extract out different kinds of knowledge.

   If so, we'll
> also need a metadata node to mark de-allocation functions  (so that
> sequences like my_free(my_malloc(xx)) are removed).

Maybe!

Ciao, Duncan.