[LLVMdev] alloc_size metadata

Fri May 25 10:55:07 PDT 2012

On 5/25/12 12:41 PM, Nuno Lopes wrote:
>> basically it feels like "I only know about C, here's something that
>> pretends to be general but only handles C".  Consider a language with a
>> string type that contains the string length as well as the characters.
>> It has a library function allocate_string(length).  How much does it
>> allocate?  length+4 bytes.  That can't be represented by alloc_size.
>> What's more, it may well store the length at the start, and return a
>> pointer to just after the length: a pointer to the first character.
>> alloc_size can't represent "the allocated memory starts 4 bytes before
>> the return value" either.  In short, it feels like a hack for handling
>> something that turns up in some particular C code that someone has,
>> rather than a general solution to the general problem.
> It's not a general solution, not even for C, of course.
> But it's very useful for applications that have their own malloc
> wrappers and implementations. For example, LLVM, which has its own
> allocators! Without this metadata, you'll never be able to analyze
> LLVM's code at all. It's simply impossible to detect, in general, if a
> function is a custom allocator.
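To make the string case from the quoted message concrete, here is a minimal C++ sketch of such an allocator.  The names allocate_string, string_length and free_string and the 4-byte uint32_t header are hypothetical, chosen only to match the description above.  If alloc_size said "the size is argument 0", it would report length bytes, while malloc was actually asked for length + 4 bytes and the returned pointer points 4 bytes into the block.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical length-prefixed string allocator: requests length + 4 bytes
// and returns a pointer to the first character, 4 bytes past the block start.
char *allocate_string(std::uint32_t length) {
    void *block = std::malloc(sizeof(std::uint32_t) + length);
    if (block == nullptr) return nullptr;
    std::memcpy(block, &length, sizeof(length));                // length header
    return static_cast<char *>(block) + sizeof(std::uint32_t);  // interior pointer
}

// Reads the length stored in the 4 bytes before the returned pointer.
std::uint32_t string_length(const char *s) {
    assert(s != nullptr);
    std::uint32_t length;
    std::memcpy(&length, s - sizeof(std::uint32_t), sizeof(length));
    return length;
}

// Frees the whole block, which starts 4 bytes before the returned pointer.
void free_string(char *s) {
    if (s != nullptr) std::free(s - sizeof(std::uint32_t));
}
```

For example, allocate_string(5) yields a pointer p with string_length(p) == 5, yet the underlying malloc block is 9 bytes long and begins at p - 4.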

Just a nitpick here: in some cases, you can detect if a function is a 
wrapper around an allocator.  A simple data-flow analysis can detect 
whether the function's return value is always the result of a known 
allocator and whether the parameters to the known allocator are a 
function of the function's arguments.

This can work for some allocators, but not all, and I don't know how
well it works in practice.  However, it's not impossible in every case.
:)
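A rough sketch of that data-flow check, written against a deliberately tiny, hypothetical IR rather than real LLVM IR (this is not the poolalloc code).  It assumes SSA-like ordering (operands always refer to earlier values), no phi nodes, loads or stores, and treats any call result as opaque.  A function counts as a wrapper if every value it returns is a call to a known allocator whose arguments are computed only from the function's arguments and constants; iterating to a fixed point also catches wrappers of wrappers.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical toy IR: a value is an argument, a constant, an arithmetic op,
// or a call.  Operands are indices of earlier values in Function::values.
struct Value {
    enum Kind { Arg, Const, BinOp, Call };
    Kind kind;
    std::string callee;         // only meaningful for Call
    std::vector<int> operands;
};

struct Function {
    std::string name;
    std::vector<Value> values;
    std::vector<int> returns;   // index of the value returned at each return site
};

// True if value idx is computed only from arguments and constants.
bool derivedFromArgs(const Function &f, int idx) {
    const Value &v = f.values[idx];
    switch (v.kind) {
    case Value::Arg:
    case Value::Const:
        return true;
    case Value::BinOp:
        for (int op : v.operands)
            if (op >= idx || !derivedFromArgs(f, op)) return false;  // must precede use
        return true;
    case Value::Call:
        return false;  // conservatively treat call results as unknown
    }
    return false;
}

// True if value idx calls a known allocator with argument-derived operands.
bool isAllocatorCall(const Function &f, int idx, const std::set<std::string> &allocators) {
    const Value &v = f.values[idx];
    if (v.kind != Value::Call || allocators.count(v.callee) == 0) return false;
    for (int op : v.operands)
        if (op >= idx || !derivedFromArgs(f, op)) return false;
    return true;
}

// A function is a wrapper if every return site returns such a call.
bool isWrapper(const Function &f, const std::set<std::string> &allocators) {
    if (f.returns.empty()) return false;
    for (int r : f.returns)
        if (!isAllocatorCall(f, r, allocators)) return false;
    return true;
}

// Grow the set of known allocators until no new wrapper is found.
std::set<std::string> identifyAllocators(const std::vector<Function> &module,
                                         std::set<std::string> known) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (const Function &f : module)
            if (known.count(f.name) == 0 && isWrapper(f, known)) {
                known.insert(f.name);
                changed = true;
            }
    }
    return known;
}
```

Seeded with {"malloc"}, a hypothetical xmalloc(n) that returns malloc(n + 8) is identified on the first pass and a wrap2(n) that returns xmalloc(n) on the second, while a function returning malloc(getSize()) is rejected because its size does not depend on its arguments.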

I think the poolalloc project has an implementation of this analysis: 
poolalloc/lib/DSA/AllocatorIdentification.cpp.

-- John T.