[LLVMdev] alloc_size metadata

Tue May 29 11:11:55 PDT 2012

On Tue, 29 May 2012 19:23:56 +0200
Duncan Sands <baldrick at free.fr> wrote:

> Hi Nuno,
> 
> >> I think this is a good point, here's a suggestion:
> >>
> >> Have the metadata name two functions, both assumed to have the same
> >> signature as the tagged function, one which returns the offset of
> >> the start of the allocated region and one which returns the length
> >> of the allocated region. Alternatively, these functions could take
> >> the same signature and additionally the returned pointer of the
> >> tagged function, and then one function can return the start of the
> >> region and the other the length.
> >
> > Ok, so this seems to be the most general proposal, which can
> > obviously handle all cases.
> 
> I agree.  Variation: have one function return the offset of the start
> of the memory, and the other the offset of the end of the memory (or
> the end plus 1), i.e. a range.  This seems more uniform to me, but I
> don't have a strong preference.
> 
> > Something like this would work:
> >
> > define i8* @foo() {
> >     %0 = tail call i32 @get_realloc_size(i8* null, i32 42)
> >     %call = tail call i8* @my_recalloc(i8* null, i32 42) nounwind,
> > !alloc_size !{i32 %0}
> >     ret i8* %call
> > }
> >
> > Basically I just added a function call as the metadata (it's not
> > currently possible to add the function itself to the metadata; the
> > function call is required instead).
> > As long as the function is marked as readnone, I think it shouldn't
> > interfere with the optimizers, and we can have a later pass to drop
> > the metadata and remove the calls.  I still don't like having the
> > explicit calls there, though.  Any suggestions to remove the
> > functions calls from there?
> 
> How about this:
> 
> define i32 @lo(i32) {
>    ret i32 0
> }
> 
> define i32 @hi(i32 %n) {
>    ret i32 %n
> }
> 
> declare i8* @wonder_allocator(i32)
> 
> define i8* @foo(i32 %n) {
>    %r = call i8* @wonder_allocator(i32 %n), !alloc !0
>    ret i8* %r
> }
> 
> !0 = metadata !{ i32 (i32)* @lo, i32 (i32)* @hi }

This is the format that I had in mind.

> 
> 
> The main problem I see is that if you declare @lo and @hi to have
> internal linkage then the optimizers will zap them.  Maybe there's a
> neat solution to that.

I would consider the optimizer doing this a feature, not a problem.
That having been said, we need to make sure that the optimzer does not
zap them before the analysis/instrumentation passes get to run.

> 
> > I feel that the offset function is probably not required. I've never
> > seen an allocation function that doesn't return a pointer to the
> > beginning of the allocated buffer. Also, I cannot remember of any
> > function in the C library that has that behavior.
> 
> Yes, in C you probably never see such a thing, but we are not just
> dealing with C here.  I think it is important to have the start
> offset as well as the length.
> 
> > We will also need a convenient syntax to export this feature in the
> > languages we support.
> 
> Actually, no you don't.  You could just implement GCC's alloc_size in
> terms of this, at least for the moment.  Even in the long term it's
> probably pretty pointless for clang to ever expose the start offset
> functionality since clang only supports C-like languages and probably
> (as you mentioned) this is pretty useless for them.

As I mentioned in my other response, posix_memalign and friends do
this. However, writing to the memory prior to the returned pointer
would still be an error, so perhaps this does not matter.

 -Hal

> 
> > I personally would like to see
> > '__attribute__((alloc_size( strlen(x)+1 ))' in C, but the
> > implementation seems to be non-trivial.
> >
> > About Duncan's comment about having the memory builtin analysis
> > recognize this intrinsic, well I agree it should (and I'll take care
> > of that), but I'm not sure if we should be very aggressive in
> > optimizing based on this metadata.
> 
> It would be great for understanding that loads/stores from/to outside
> the bounds of the allocation result in undef.  I think the optimizers
> already exploit this kind of info in the case of alloca - maybe this
> helps generalize to heap allocations.
> 
> > For example, do we really want to remove a call to a custom
> > allocator whose return value is unused (like we do for malloc)?
> 
> No we don't, so LLVM's interface to malloc-like and calloc-like
> things would have to be reworked to extract out different kinds of
> knowledge.
> 
>    If so, we'll
> > also need a metadata node to mark de-allocation functions  (so that
> > sequences like my_free(my_malloc(xx)) are removed).
> 
> Maybe!
> 
> Ciao, Duncan.
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory