[LLVMdev] alloc_size metadata

Nuno Lopes nunoplopes at sapo.pt
Tue May 29 09:37:40 PDT 2012


>> On 25/05/12 17:22, John Criswell wrote:
>> > On 5/25/12 2:16 AM, Duncan Sands wrote:
>> >> Hi John,
>> >>
>> >>>>> I'm implementing the alloc_size function attribute in clang.
>> >>>> does anyone actually use this attribute? And if they do, can it
>> >>>> really buy them anything? How about "implementing" it by
>> >>>> ignoring it!
>> >>>
>> >> ...
>> >>>
>> >>> Currently, SAFECode has a pass which just recognizes certain
>> >>> functions as allocators and knows how to interpret the arguments
>> >>> to find the size. If we want SAFECode to work with another
>> >>> allocator (like a program's custom allocator, the Objective-C
>> >>> allocator, the Boehm garbage collector, etc), then that pass
>> >>> needs to be modified to recognize it. Having to update this pass
>> >>> for every allocator name and type is one of the few reasons why
>> >>> SAFECode only works with C/C++ and not just any old language that
>> >>> is compiled down to LLVM IR.
>> >>
>> >>
>> >>> Nuno's proposed feature would allow programmers to communicate
>> >>> the relevant information about allocators to tools like SAFECode
>> >>> and ASan. I think it might also make some of the optimizations in
>> >>> LLVM that require knowing about allocators work on non-C/C++ code.
>> >>
>> >> these are good points. The attribute and proposed implementation
>> >> feel pretty clunky though, which is my main gripe.
>> >
>> > Hrm. I haven't formed an opinion on what the attributes should look
>> > like. I think supporting the ones established by GCC would be
>> > important for compatibility, and on the surface, they look
>> > reasonable. Devising better ones for Clang is fine with me. What
>> > about them feels klunky?
>>
>> basically it feels like "I only know about C, here's something that
>> pretends to be general but only handles C".  Consider a language with
>> a string type that contains the string length as well as the
>> characters.  It has a library function allocate_string(length).  How
>> much does it allocate?  length+4 bytes. That can't be represented by
>> alloc_size.  What's more, it may well store the length at the start,
>> and return a pointer to just after the length: a pointer to the first
>> character.  alloc_size can't represent "the allocated memory starts 4
>> bytes before the return value" either.  In short, it feels like a
>> hack for handling something that turns up in some particular C code
>> that someone has, rather than a general solution to the general
>> problem.
>
> I think this is a good point, here's a suggestion:
>
> Have the metadata name two functions, both assumed to have the same
> signature as the tagged function, one which returns the offset of the
> start of the allocated region and one which returns the length of the
> allocated region. Alternatively, these functions could take the same
> signature and additionally the returned pointer of the tagged
> function, and then one function can return the start of the region and
> the other the length.
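
To make the quoted suggestion concrete, here is a minimal C sketch of
Duncan's length-prefixed string allocator with two companion functions
that take the same parameters as the allocator. All names
(allocate_string_size, allocate_string_offset, free_string) and the
4-byte header are assumptions made for illustration; nothing here is an
existing LLVM or clang interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed layout: a 4-byte length stored in front of the characters. */
enum { STRING_HEADER_BYTES = 4 };

/* The "tagged" allocator: returns a pointer to the first character,
   which sits STRING_HEADER_BYTES past the start of the allocated block. */
char *allocate_string(uint32_t length) {
    assert(sizeof length == (size_t)STRING_HEADER_BYTES);
    unsigned char *block = malloc((size_t)length + STRING_HEADER_BYTES);
    if (block == NULL)
        return NULL;
    memcpy(block, &length, sizeof length);  /* store the length prefix */
    return (char *)(block + STRING_HEADER_BYTES);
}

/* Hypothetical companion 1: total number of bytes allocated. */
size_t allocate_string_size(uint32_t length) {
    return (size_t)length + STRING_HEADER_BYTES;
}

/* Hypothetical companion 2: where the region starts, expressed as an
   offset relative to the pointer that allocate_string returns. */
ptrdiff_t allocate_string_offset(uint32_t length) {
    (void)length;  /* the offset does not depend on the length here */
    return -(ptrdiff_t)STRING_HEADER_BYTES;
}

/* Release a string by stepping back to the start of the block. */
void free_string(char *s) {
    if (s != NULL)
        free(s + allocate_string_offset(0));
}
```

For allocate_string(10), the two companions would report a 14-byte
region that starts 4 bytes before the returned pointer, which is the
information a checker would need.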

Ok, so this seems to be the most general proposal, which can obviously  
handle all cases.
Something like this would work:

define i8* @foo() {
   %0 = tail call i32 @get_realloc_size(i8* null, i32 42)
   %call = tail call i8* @my_recalloc(i8* null, i32 42) nounwind,
       !alloc_size !{i32 %0}
   ret i8* %call
}

Basically I just added a function call as the metadata (it's not  
currently possible to add the function itself to the metadata; the  
function call is required instead).
As long as the function is marked as readnone, I think it shouldn't  
interfere with the optimizers, and we can have a later pass to drop  
the metadata and remove the calls.  I still don't like having the  
explicit calls there, though.  Any suggestions for removing the function
calls from there?

I feel that the offset function is probably not required. I've never  
seen an allocation function that doesn't return a pointer to the  
beginning of the allocated buffer. Also, I cannot recall any
function in the C library that has that behavior.

We will also need a convenient syntax to export this feature in the  
languages we support.
I personally would like to see
'__attribute__((alloc_size(strlen(x)+1)))' in C, but the
implementation seems to be non-trivial.
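
For comparison, GCC's existing alloc_size attribute only accepts
argument positions (one position, or two whose product is the size).
The sketch below shows what that form can and cannot express. The
ALLOC_SIZE macro and the my_* functions are made-up names for
illustration, and the macro expands to nothing on compilers that do not
report support for the attribute.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Use the attribute only where the compiler reports support for it. */
#if defined(__has_attribute)
#  if __has_attribute(alloc_size)
#    define ALLOC_SIZE(...) __attribute__((alloc_size(__VA_ARGS__)))
#  endif
#endif
#ifndef ALLOC_SIZE
#  define ALLOC_SIZE(...)
#endif

/* Size is argument 1: expressible with the existing attribute. */
void *my_malloc(size_t n) ALLOC_SIZE(1);
/* Size is argument 1 times argument 2: also expressible. */
void *my_calloc(size_t count, size_t size) ALLOC_SIZE(1, 2);
/* Size is strlen(s) + 1: no argument position describes it,
   so this declaration has to stay unannotated. */
char *my_strdup(const char *s);

void *my_malloc(size_t n) { return malloc(n); }

void *my_calloc(size_t count, size_t size) { return calloc(count, size); }

char *my_strdup(const char *s) {
    assert(s != NULL);
    size_t n = strlen(s) + 1;  /* characters plus the terminating NUL */
    char *copy = my_malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}
```

An expression-based form of the attribute would be what lets my_strdup
carry a size annotation as well.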

About Duncan's comment about having the memory builtin analysis
recognize this intrinsic: I agree it should (and I'll take care
of that), but I'm not sure we should be very aggressive in
optimizing based on this metadata.
For example, do we really want to remove a call to a custom allocator  
whose return value is unused (like we do for malloc)?  If so, we'll
also need a metadata node to mark de-allocation functions (so that
sequences like my_free(my_malloc(xx)) can be removed).
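
As an illustration of why this needs care (a made-up example, not a
case taken from real code), consider a custom allocator that keeps its
own bookkeeping. The counters and accessor names below are
hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical bookkeeping kept by a custom allocator. */
static size_t live_blocks = 0;
static size_t total_calls = 0;

void *my_malloc(size_t n) {
    void *p = malloc(n);
    ++total_calls;              /* side effect visible outside the call */
    if (p != NULL)
        ++live_blocks;
    return p;
}

void my_free(void *p) {
    if (p != NULL) {
        assert(live_blocks > 0);
        --live_blocks;
        free(p);
    }
}

/* An allocation that is released immediately. Deleting this pair
   leaves live_blocks unchanged but also skips the total_calls update. */
void alloc_and_release(size_t n) {
    my_free(my_malloc(n));
}

size_t my_live_blocks(void) { return live_blocks; }
size_t my_total_calls(void) { return total_calls; }
```

Dropping the my_free(my_malloc(n)) pair changes what my_total_calls()
reports, so the optimizer would need to know both that my_free is the
matching de-allocator and that such bookkeeping may be ignored.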

Any feedback on the issues described is highly appreciated!

Thanks,
Nuno


