[LLVMdev] alloc_size metadata

John Criswell criswell at illinois.edu
Fri May 25 09:38:39 PDT 2012


On 5/25/12 10:43 AM, Duncan Sands wrote:
> Hi John,
>
> On 25/05/12 17:22, John Criswell wrote:
>> On 5/25/12 2:16 AM, Duncan Sands wrote:
>>> Hi John,
>>>
>>>>>> I'm implementing the alloc_size function attribute in clang.
>>>>> does anyone actually use this attribute? And if they do, can it really buy
>>>>> them anything? How about "implementing" it by ignoring it!
>>>>
>>> ...
>>>>
>>>> Currently, SAFECode has a pass which just recognizes certain functions as
>>>> allocators and knows how to interpret the arguments to find the size. If we want
>>>> SAFECode to work with another allocator (like a program's custom allocator, the
>>>> Objective-C allocator, the Boehm garbage collector, etc.), then that pass needs
>>>> to be modified to recognize it. Having to update this pass for every allocator
>>>> name and type is one of the few reasons why SAFECode only works with C/C++ and
>>>> not just any old language that is compiled down to LLVM IR.
>>>
>>>
>>>> Nuno's proposed feature would allow programmers to communicate the relevant
>>>> information about allocators to tools like SAFECode and ASan. I think it might
>>>> also make some of the optimizations in LLVM that require knowing about
>>>> allocators work on non-C/C++ code.
>>>
>>> these are good points. The attribute and proposed implementation feel pretty
>>> clunky though, which is my main gripe.
>>
>> Hrm. I haven't formed an opinion on what the attributes should look like. I
>> think supporting the ones established by GCC would be important for
>> compatibility, and on the surface, they look reasonable. Devising better ones
>> for Clang is fine with me. What about them feels clunky?
>
> basically it feels like "I only know about C, here's something that pretends to
> be general but only handles C".  Consider a language with a string type that
> contains the string length as well as the characters.  It has a library function
> allocate_string(length).  How much does it allocate?  length+4 bytes.  That can't
> be represented by alloc_size.  What's more, it may well store the length at the
> start, and return a pointer to just after the length: a pointer to the first
> character.  alloc_size can't represent "the allocated memory starts 4 bytes
> before the return value" either.  In short, it feels like a hack for handling
> something that turns up in some particular C code that someone has, rather than
> a general solution to the general problem.

True.  It also doesn't handle a number of C allocators such as strdup(), 
whose allocation size depends on the length of the string passed in rather 
than on the value of an argument.  Making the attribute that general, 
though, may be tricky, and I don't think it negates the utility of the 
simpler form.  I suspect a fair number of allocators could be described 
by the alloc_size feature.
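
To make Duncan's example concrete, here is a rough C sketch of such a 
length-prefixed string allocator.  Only the name allocate_string and the 
"length+4 bytes, pointer returned just past the length" layout come from 
his description; string_length(), free_string() and the use of a 32-bit 
length field are my own assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical runtime routine, named after the example in the quoted text.
 * Assumed layout: [uint32_t length][length bytes of characters]. */
char *allocate_string(uint32_t length)
{
    /* Allocates length + 4 bytes, so alloc_size(1) would under-report. */
    unsigned char *block = malloc(sizeof(uint32_t) + (size_t)length);
    if (block == NULL)
        return NULL;
    memcpy(block, &length, sizeof length);
    /* The returned pointer is 4 bytes past the start of the allocation,
     * which alloc_size has no way to express either. */
    return (char *)(block + sizeof(uint32_t));
}

/* Hypothetical helper: read the length stored just before the characters. */
uint32_t string_length(const char *s)
{
    uint32_t length;
    assert(s != NULL);
    memcpy(&length, s - sizeof(uint32_t), sizeof length);
    return length;
}

/* Hypothetical helper: free using the real start of the allocation. */
void free_string(char *s)
{
    if (s != NULL)
        free(s - sizeof(uint32_t));
}
```

A tool that only sees alloc_size(1) on allocate_string would assume the 
object is length bytes starting at the returned pointer, which happens to 
cover the characters but misses the 4-byte header in front of them.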

Even in the C/C++ world, I think it'd be useful.  There's the 
GC_malloc() in Boehm's garbage collector, kmalloc() in the Linux kernel, 
malloc wrappers in applications, memalign(), etc.
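
For the simple cases, the annotation might look like the sketch below, 
using the GCC attribute syntax.  The wrapper names xmalloc and xcalloc and 
the ALLOC_SIZE macro are made up for the example; they are not taken from 
any particular application.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical convenience macro: expands to the GCC-style attribute on
 * compilers that define __GNUC__, and to nothing elsewhere. */
#if defined(__GNUC__)
#define ALLOC_SIZE(...) __attribute__((alloc_size(__VA_ARGS__)))
#else
#define ALLOC_SIZE(...)
#endif

/* Size is given directly by argument 1. */
void *xmalloc(size_t size) ALLOC_SIZE(1);

/* Size is the product of arguments 1 and 2. */
void *xcalloc(size_t count, size_t size) ALLOC_SIZE(1, 2);

/* Hypothetical malloc wrapper that aborts instead of returning NULL. */
void *xmalloc(size_t size)
{
    void *p;
    assert(size > 0);
    p = malloc(size);
    if (p == NULL)
        abort();
    return p;
}

/* Hypothetical calloc wrapper with the same abort-on-failure policy. */
void *xcalloc(size_t count, size_t size)
{
    void *p;
    assert(count > 0 && size > 0);
    p = calloc(count, size);
    if (p == NULL)
        abort();
    return p;
}
```

With declarations like these, a tool would not need to know the wrapper 
names in advance; it could read the size straight off the call arguments.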

>
>>> Since LLVM already has utility functions for recognizing allocators (i.e. that
>>> know about malloc, realloc and -fno-builtin etc) can't SAFECode just make use
>>> of them?
>>
>> It probably could. It doesn't simply because SAFECode was written before these
>> features existed within LLVM.
>> :)
>>
>> [snip]
> no, I'm thinking that SAFECode won't need to look at or worry about the
> attribute at all, because the LLVM methods will know about it and serve
> up the appropriate info.  Take a look at Analysis/MemoryBuiltins.h.  In
> spite of the names, things like extractMallocCall are dealing with "malloc
> like" functions, such as C++'s "new" as well as malloc.  Similarly for
> calloc.  So you could use those right now to extract "malloc" and "calloc"
> sizes.  If alloc_size is implemented, presumably these would just magically
> start to give you useful sizes for functions annotated with that attribute too.

I see.  That makes sense.

-- John T.



