[LLVMdev] alloc_size metadata

Fri May 25 10:41:19 PDT 2012

>>>> Currently, SAFECode has a pass which just recognizes certain functions as
>>>> allocators and knows how to interpret the arguments to find the size. If we
>>>> want SAFECode to work with another allocator (like a program's custom
>>>> allocator, the Objective-C allocator, the Boehm garbage collector, etc), then
>>>> that pass needs to be modified to recognize it. Having to update this pass
>>>> for every allocator name and type is one of the few reasons why SAFECode only
>>>> works with C/C++ and not just any old language that is compiled down to LLVM
>>>> IR.
>>>
>>>
>>>> Nuno's proposed feature would allow programmers to communicate the relevant
>>>> information about allocators to tools like SAFECode and ASan. I think it
>>>> might also make some of the optimizations in LLVM that require knowing about
>>>> allocators work on non-C/C++ code.
>>>
>>> these are good points. The attribute and proposed implementation feel pretty
>>> clunky though, which is my main gripe.
>>
>> Hrm. I haven't formed an opinion on what the attributes should look like. I
>> think supporting the ones established by GCC would be important for
>> compatibility, and on the surface, they look reasonable. Devising better ones
>> for Clang is fine with me. What about them feels klunky?
>
> basically it feels like "I only know about C, here's something that pretends
> to be general but only handles C".  Consider a language with a string type
> that contains the string length as well as the characters.  It has a library
> function allocate_string(length).  How much does it allocate?  length+4
> bytes. That can't be represented by alloc_size.  What's more, it may well
> store the length at the start, and return a pointer to just after the length:
> a pointer to the first character.  alloc_size can't represent "the allocated
> memory starts 4 bytes before the return value" either.  In short, it feels
> like a hack for handling something that turns up in some particular C code
> that someone has, rather than a general solution to the general problem.
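
For concreteness, the string allocator described in the quote above could look
roughly like the following C sketch. Only the name allocate_string and the
"length+4 bytes" layout come from the quoted text; the uint32_t length prefix
and the helpers string_length and free_string are assumptions made purely for
illustration.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-prefixed string allocator.
 * Allocates length + 4 bytes, stores the length in the first 4 bytes,
 * and returns a pointer to the first character (4 bytes past the start
 * of the underlying allocation). */
char *allocate_string(uint32_t length) {
    unsigned char *block = malloc(sizeof(uint32_t) + (size_t)length);
    if (block == NULL)
        return NULL;
    memcpy(block, &length, sizeof length);     /* length header */
    return (char *)(block + sizeof(uint32_t)); /* interior pointer */
}

/* Hypothetical helper: read the length stored just before the characters. */
uint32_t string_length(const char *s) {
    uint32_t length;
    memcpy(&length, s - sizeof length, sizeof length);
    return length;
}

/* Hypothetical helper: free must be given the start of the real block. */
void free_string(char *s) {
    if (s != NULL)
        free(s - sizeof(uint32_t));
}
```

Both properties that the quote says alloc_size cannot express show up here: the
allocated size is the argument plus a constant, and the returned pointer is
offset from the start of the allocation.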

It's not a general solution, of course, not even for C.
But it's very useful for applications that have their own malloc  
wrappers and implementations. For example, LLVM, which has its own  
allocators! Without this metadata, you'll never be able to analyze  
LLVM's code at all. It's simply impossible to detect, in general, if a  
function is a custom allocator.
So, yes, some metadata is necessary. I agree my proposal is not
general enough for all applications. For example, I ran the tool over
some code yesterday and found an allocator with the following form:
alloc(x, y, z), which allocates 'x * y + z' bytes. That can be
represented neither at source-code level (with GCC's attribute) nor at
IR level following my metadata proposal.
I'm happy to implement something more general if we come up with a  
better design.
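
To illustrate the limitation at source level, here is a minimal C sketch
using GCC's alloc_size attribute. The wrapper names my_malloc, my_calloc and
alloc_xyz are made up for this example, and the overflow check in alloc_xyz
is my own addition, not taken from the code I analyzed.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Expressible: the allocated size is argument 1. */
__attribute__((alloc_size(1)))
void *my_malloc(size_t size) {
    return malloc(size);
}

/* Expressible: the allocated size is argument 1 times argument 2. */
__attribute__((alloc_size(1, 2)))
void *my_calloc(size_t count, size_t size) {
    return calloc(count, size);
}

/* Not expressible: allocates x * y + z bytes. alloc_size accepts one
 * argument index or a product of two, so this function has to stay
 * unannotated. */
void *alloc_xyz(size_t x, size_t y, size_t z) {
    /* Reject requests where x * y + z would overflow size_t. */
    if (y != 0 && x > (SIZE_MAX - z) / y)
        return NULL;
    return malloc(x * y + z);
}
```

The first two wrappers cover the malloc-like and calloc-like cases; anything
with an additive term, like the third, falls outside what the attribute can
describe.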

Nuno