[llvm-dev] [RFC] Adding support for marking allocator functions in LLVM IR

Wed Jan 5 14:32:31 PST 2022

Hi everyone! I’m working on making the Rust compiler able to track
LLVM HEAD more closely, and as part of that we need to obviate a patch[0]
that teaches LLVM about some Rust allocator implementation details. This
proposal is the product of many conversations and a couple of failed
attempts at simpler implementations.

Background
==========

Rust uses LLVM for codegen, and has its own allocator functions. In order
for LLVM to correctly optimize out allocations we have to tell the
optimizer about the allocation/deallocation functions used by Rust.

Languages supported by Clang, such as C and C++, have stable symbol names
for their allocation functions, which are hardcoded in LLVM[1][2].
Unfortunately, this strategy does not work for Rust, where developers don't
want to commit to a particular symbol name and calling convention yet.

Proposal
========

We add two attributes to LLVM IR:

 * `allocator(FAMILY)`: Marks a function as part of an allocator family,
named by the “primary” allocation function (e.g. `allocator("malloc")`,
`allocator("_Znwm")`, or `allocator("__rust_alloc")`).

 * `releaseptr(idx)`: Indicates that the function releases the pointer
passed as its `idx`th argument.

These attributes, combined with the existing `allocsize(n[, m])` attribute,
let us annotate alloc-, realloc-, and free-type functions in LLVM IR, which
relieves Rust of the need to carry a patch to describe its allocator
functions to LLVM’s optimizer. Some example IR of what this might look like:

; Function Attrs: nounwind ssp
define i8* @test5(i32 %n) #4 {
entry:
  %0 = tail call noalias dereferenceable_or_null(20) i8* @malloc(i32 20) #8
  %1 = load i8*, i8** @s, align 8
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* noundef nonnull align 1
      dereferenceable(10) %0, i8* noundef nonnull align 1 dereferenceable(10) %1,
      i32 10, i1 false) #0
  ret i8* %0
}

attributes #8 = { nounwind allocsize(0) "allocator"="malloc" }

Similarly, the call `free(foo)` would get the attributes
`"allocator"="malloc" releaseptr(1)`, and `realloc(foo, N)` would get
`"allocator"="malloc" releaseptr(1) allocsize(1)`. Note that in my current
draft the `releaseptr(n)` attribute is 1-indexed to avoid issues with
storing zero values in attributes. I’m very open to suggestions to change
that; it just seemed like a better solution than adding getters/setters
everywhere to increment/decrement a value.
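
To make the indexing trade-off concrete, here is a rough C++ sketch of the alternative: a 0-indexed attribute whose accessors shift the stored value so that a raw 0 can still mean "not present". The `ReleasePtrAttr` type and its members are hypothetical names for illustration only, not code from my draft or from LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wrapper around an integer attribute payload.
// Assumption: a raw value of 0 is reserved to mean "attribute absent".
struct ReleasePtrAttr {
    uint64_t raw;  // 0 = absent, N = the Nth argument (1-based)

    ReleasePtrAttr() : raw(0) {}

    // Setter: callers pass a 0-based index, the stored value is shifted by one.
    static ReleasePtrAttr forArgument(unsigned zeroBasedIdx) {
        ReleasePtrAttr attr;
        attr.raw = static_cast<uint64_t>(zeroBasedIdx) + 1;
        return attr;
    }

    bool isPresent() const { return raw != 0; }

    // Getter: undoes the shift. Only valid when the attribute is present.
    unsigned zeroBasedIndex() const {
        assert(isPresent() && "no releaseptr attribute to read");
        return static_cast<unsigned>(raw - 1);
    }
};
```

Every reader and writer of the attribute would have to go through accessors like these, which is the extra plumbing the 1-indexed spelling avoids.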

Benefits
========

In addition to the benefits for Rust, the LLVM optimizer could also be
improved to not optimize away defects like

{
  auto *foo = new Thing();
  free(foo);
}

which would then correctly crash instead of silently “working” until
something actually uses the allocation. Similarly, there’s a potential
defect when only one side of an overridden `operator new` and
`operator delete` is visible to the optimizer and inlineable, which can look
indistinguishable from the above after inlining.
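
As a sketch of the kind of check the family name would enable, the optimizer could refuse to elide an allocation/release pair unless both calls name the same family. The `AllocCallInfo` struct and `canElidePair` function below are hypothetical and only model the idea; they are not LLVM APIs.

```cpp
#include <string>

// Hypothetical summary of the proposed attributes on a single call site.
struct AllocCallInfo {
    std::string family;  // value of the "allocator" attribute, empty if none
    bool allocates;      // call returns a fresh allocation
    bool releases;       // call carries a releaseptr(...) attribute
};

// Assumed rule: an alloc/release pair may only be removed when both calls
// are annotated and belong to the same allocator family.
inline bool canElidePair(const AllocCallInfo &alloc,
                         const AllocCallInfo &release) {
    return alloc.allocates && release.releases &&
           !alloc.family.empty() && alloc.family == release.family;
}
```

Under this rule the `new`/`free` example above (family `_Znwm` paired with family `malloc`) would be left in place, while a matching `malloc`/`free` pair could still be removed.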

This also probably opens the door to fixing issues like
https://bugs.llvm.org/show_bug.cgi?id=49022 caused by overloading the
`builtin` annotation on allocator functions, but I’m unlikely to continue
in that direction.

What do people think?

Thanks,

Augie

[0] https://github.com/rust-lang/llvm-project/commit/b1f55f7159540862c407a2d89d49434ce65892e5

[1] https://github.com/llvm/llvm-project/blob/cd5f582c3dd747ab97b57df37642b0dffba398ee/llvm/lib/Analysis/MemoryBuiltins.cpp#L73

[2] https://github.com/llvm/llvm-project/blob/cd5f582c3dd747ab97b57df37642b0dffba398ee/llvm/lib/Analysis/MemoryBuiltins.cpp#L433