[LLVMdev] alloc_size metadata
Hal Finkel
hfinkel at anl.gov
Tue May 29 11:11:55 PDT 2012
On Tue, 29 May 2012 19:23:56 +0200
Duncan Sands <baldrick at free.fr> wrote:
> Hi Nuno,
>
> >> I think this is a good point, here's a suggestion:
> >>
> >> Have the metadata name two functions, both assumed to have the same
> >> signature as the tagged function, one which returns the offset of
> >> the start of the allocated region and one which returns the length
> >> of the allocated region. Alternatively, these functions could take
> >> the same signature and additionally the returned pointer of the
> >> tagged function, and then one function can return the start of the
> >> region and the other the length.
> >
> > Ok, so this seems to be the most general proposal, which can
> > obviously handle all cases.
>
> I agree. Variation: have one function return the offset of the start
> of the memory, and the other the offset of the end of the memory (or
> the end plus 1), i.e. a range. This seems more uniform to me, but I
> don't have a strong preference.
>
> > Something like this would work:
> >
> > define i8* @foo() {
> > %0 = tail call i32 @get_realloc_size(i8* null, i32 42)
> > %call = tail call i8* @my_recalloc(i8* null, i32 42) nounwind,
> > !alloc_size !{i32 %0}
> > ret i8* %call
> > }
> >
> > Basically I just added a function call as the metadata (it's not
> > currently possible to add the function itself to the metadata; the
> > function call is required instead).
> > As long as the function is marked as readnone, I think it shouldn't
> > interfere with the optimizers, and we can have a later pass to drop
> > the metadata and remove the calls. I still don't like having the
> > explicit calls there, though. Any suggestions to remove the
> > function calls from there?
>
> How about this:
>
> define i32 @lo(i32) {
> ret i32 0
> }
>
> define i32 @hi(i32 %n) {
> ret i32 %n
> }
>
> declare i8* @wonder_allocator(i32)
>
> define i8* @foo(i32 %n) {
> %r = call i8* @wonder_allocator(i32 %n), !alloc !0
> ret i8* %r
> }
>
> !0 = metadata !{ i32 (i32)* @lo, i32 (i32)* @hi }
This is the format that I had in mind.
>
>
> The main problem I see is that if you declare @lo and @hi to have
> internal linkage then the optimizers will zap them. Maybe there's a
> neat solution to that.
I would consider the optimizer doing this a feature, not a problem.
That having been said, we need to make sure that the optimizer does not
zap them before the analysis/instrumentation passes get to run.
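
To make the pairing concrete, here is a minimal C sketch of how an
analysis pass might interpret the two functions named in the metadata.
It assumes the "end plus 1" variant, so the valid byte offsets are
[lo(n), hi(n)). The C functions lo and hi mirror @lo and @hi from the IR
above; access_in_bounds is a hypothetical helper, not an LLVM API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical C counterparts of @lo and @hi: both take the same
   argument as the tagged allocator. */
static uint32_t lo(uint32_t n) { (void)n; return 0; }
static uint32_t hi(uint32_t n) { return n; }

/* Returns true if an access of `width` bytes at byte offset `off` from
   the pointer returned by the allocator stays inside [lo(n), hi(n)).
   Assumption: hi(n) is one past the last valid byte. */
static bool access_in_bounds(uint32_t n, int64_t off, uint32_t width)
{
    int64_t start = (int64_t)lo(n);
    int64_t end = (int64_t)hi(n);

    assert(start <= end); /* a well-formed range is never inverted */
    /* Written as off <= end - width so the sum off + width is never
       formed and cannot overflow. */
    return off >= start && off <= end - (int64_t)width;
}
```

With n = 42, a one-byte access at offset 41 is inside the range and one
at offset 42 is not, which is the kind of fact a bounds-checking pass
would want to derive.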
>
> > I feel that the offset function is probably not required. I've never
> > seen an allocation function that doesn't return a pointer to the
> > beginning of the allocated buffer. Also, I cannot recall any
> > function in the C library that has that behavior.
>
> Yes, in C you probably never see such a thing, but we are not just
> dealing with C here. I think it is important to have the start
> offset as well as the length.
>
> > We will also need a convenient syntax to export this feature in the
> > languages we support.
>
> Actually, no you don't. You could just implement GCC's alloc_size in
> terms of this, at least for the moment. Even in the long term it's
> probably pretty pointless for clang to ever expose the start offset
> functionality since clang only supports C-like languages and probably
> (as you mentioned) this is pretty useless for them.
As I mentioned in my other response, posix_memalign and friends do
this. However, writing to the memory prior to the returned pointer
would still be an error, so perhaps this does not matter.
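
As a rough illustration of the kind of allocator in question, the
following C sketch over-allocates with malloc and returns an aligned
pointer into the middle of the block. This is only one possible
strategy, not a description of how any particular posix_memalign is
implemented. The names aligned_alloc_sketch, aligned_free_sketch, and
start_offset are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns `size` usable bytes aligned to `align`,
   or NULL on failure. Assumes `align` is a power of two and at least
   sizeof(void *). */
static void *aligned_alloc_sketch(size_t align, size_t size)
{
    assert(align >= sizeof(void *) && (align & (align - 1)) == 0);
    if (size > SIZE_MAX - align - sizeof(void *))
        return NULL; /* the padded request would overflow size_t */

    /* Room for the payload, worst-case padding, and a saved pointer. */
    unsigned char *base = malloc(size + align + sizeof(void *));
    if (base == NULL)
        return NULL;

    uintptr_t raw = (uintptr_t)base + sizeof(void *);
    uintptr_t aligned = (raw + align - 1) & ~(uintptr_t)(align - 1);

    /* Stash the malloc'ed base just before the returned pointer so the
       matching free can recover it. */
    ((void **)aligned)[-1] = base;
    return (void *)aligned;
}

/* Distance from the start of the underlying malloc block to the
   pointer handed back to the caller. */
static size_t start_offset(void *p)
{
    return (size_t)((unsigned char *)p - (unsigned char *)((void **)p)[-1]);
}

static void aligned_free_sketch(void *p)
{
    if (p != NULL)
        free(((void **)p)[-1]);
}
```

The underlying block starts before the returned pointer, but from the
caller's point of view the usable region still begins at offset 0, and
the bytes before it belong to the allocator.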
-Hal
>
> > I personally would like to see
> > '__attribute__((alloc_size(strlen(x)+1)))' in C, but the
> > implementation seems to be non-trivial.
> >
> > About Duncan's comment about having the memory builtin analysis
> > recognize this intrinsic, well I agree it should (and I'll take care
> > of that), but I'm not sure if we should be very aggressive in
> > optimizing based on this metadata.
>
> It would be great for understanding that loads/stores from/to outside
> the bounds of the allocation result in undef. I think the optimizers
> already exploit this kind of info in the case of alloca - maybe this
> helps generalize to heap allocations.
>
> > For example, do we really want to remove a call to a custom
> > allocator whose return value is unused (like we do for malloc)?
>
> No we don't, so LLVM's interface to malloc-like and calloc-like
> things would have to be reworked to extract out different kinds of
> knowledge.
>
> > If so, we'll also need a metadata node to mark de-allocation
> > functions (so that sequences like my_free(my_malloc(xx)) are
> > removed).
>
> Maybe!
>
> Ciao, Duncan.
--
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory