[PATCH] D14933: Add the allocsize attribute to LLVM

Tue Feb 16 19:11:37 PST 2016

george.burgess.iv added a comment.

Let me modify your code sample a bit, so it represents what the transformation would really look like if we wanted to support arbitrary expressions.

>    define i8* @checked_malloc(i32 %N) {

>    	%1 = call i8* @object_malloc(i32 %N)

>    	%2 = call {i32, i32} @llvm.objectsize.and.object.offset(i8* %1)

>   	%3 = call {i32, i32} @__autogenerated_get_object_size_of_object_malloc(i32 %N)

>   	%c = icmp eq %2, %3

>   	call void @llvm.assume(i1 %c)

>   	%4 = extractelement {i32, i32} %2, 0

>   	%5 = icmp lt %4, %N

>   	br i1 %3, label %Trap, label %Return

>   	...

>   }

>    

So let's walk through what needs to happen here:

- If we don't run a special piece of code (`ObjectSizeOffsetEvaluator`), then the call to `@__autogenerated_get_object_size_of_object_malloc` must go away without a trace.
- If we do run `ObjectSizeOffsetEvaluator` on `%1`, `@__autogenerated_get_object_size_of_object_malloc` must stay, as it will be used

Because of this, we either have to get rid of `@__autogenerated_get_object_size_of_object_malloc` before inlining, or we need to mark it `noinline` so it's not impossible to remove from functions. The former case means we lose our `allocsize` *really* early, and the latter means we probably get slower code (especially if the function amounts to `return {N - 8, 8};`)

Also, we'd need a highly restricted user function constant folder to make this work, just like in http://reviews.llvm.org/D16390.

Plus, `@llvm.assume` says on the tin that it makes optimizing things harder, so this makes optimizing any function that calls an allocation function harder. Ouch. Okay, let's try something else.

  define i8* @checked_malloc(i32 %N) {
    %1 = call i8* @object_malloc(i32 %N)
    call void @llvm.assert_object_size_is(i8* %1, {i32, i32}(i32)* @__autogenerated_get_object_size_of_object_malloc, i32 %N)
    ...
  }

This seems more reasonable. Here are the problems it poses, as I see it:

- It still makes computing `@llvm.objectsize` more complex (instead of chasing a pointer through bitcasts/GEPs/... to its definition, you need to check at every GEP/bitcast/... for uses of the pointer that are calls to this intrinsic.
- We still need a special user function constant folder.
- We still need to rely on optimizations happening to `@__autogenerated_get_object_size_of_object_malloc` in order to have a chance of constant folding it.
- We waste time optimizing `@__autogenerated_get_object_size_of_object_malloc`, a function that may never even hit an object file.
- We would need to magically turn the intrinsic into an actual function call in some cases. If this happens late, the function won't get inlined. Like said above, if the function amounts to `return {N - 8, 8};`, that's not good.
- `@__autogenerated_get_object_size_of_object_malloc` isn't allowed to have any side-effects (because the intrinsic would presumably be treated as though it has no side-effects, much unlike a `call` instruction).
- We need to keep around anything passed to `@__autogenerated_get_object_size_of_object_malloc` until the intrinsic is deleted, so it may suffer from similar optimization problems as `@llvm.assume`.
- We still need to emit this at every callsite to any allocation function.

...And this is the best alternate design I can think of. Does this match what you were thinking? If so, I'm not at all convinced that it's strictly better than what I'm presenting here, which prevents no optimizations, is very lightweight, doesn't require special handling everywhere, doesn't need a custom constant folder, etc.

--------

Honestly, and this is me speaking from a position of complete ignorance as to what the actual real-life use case of this would be, in what case are tiny wrapper functions unacceptable? The only things I have to pull from are here: http://lists.llvm.org/pipermail/cfe-dev/2012-June/022272.html -- can you tell me why the below alternatives don't work just as well? Or give me an example where it wouldn't be possible to write a similar wrapper? I'm unable to think of one.

  // Original
  char *my_strdup(char *str) __attribute__((alloc_size_ex(strlen(str)+1)));

  // Modified
  extern char *_my_strdup_impl(const char *str, int N) __attribute__((alloc_size(1)));
  inline char *my_strdup(const char *str) {
    size_t total_len = strlen(str)+1;
    return _my_strdup_impl(str, total_len);
  }

  // ------------------

  // Original
  void *my_complex_alloc(int n, int size, int add) __attribute__((alloc_size_ex(n * size + add)));

  // Modified -- _impl returns the start of the buffer
  extern void *_my_complex_alloc_impl(int total_bytes, int n, int size, int add) __attribute__((alloc_size(0)));
  inline void *my_complex_alloc(int n, int size, int add) {
    return _my_complex_alloc_impl(n * size + add, n, size, add);
  }

  // ------------------
  // Original
  char *middle(int size) __attribute__((alloc_size_ex(size, size/2)));

  // Modified - Again, _middle_impl returns the start of the buffer
  extern char *_middle_impl(int size) __attribute__((alloc_size(0)));
  inline char *middle(int size) { return _middle_impl(size) + size/2; }

http://reviews.llvm.org/D14933