[llvm-dev] RFC: Adding argument allocas

Thu Dec 8 17:05:44 PST 2016

Clang is currently missing some low-hanging performance and code size
opportunities when receiving function parameters. For most scalar
parameters, Clang typically emits an alloca and stores the LLVM SSA
argument value into the alloca to initialize it. With optimizations, this
initialization is often removed, but it stays behind in debug builds and
when the user takes the address of a parameter (
https://llvm.org/bugs/show_bug.cgi?id=26328). In many cases, the memory
allocation and store duplicate work already done by the caller.
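
For concreteness, here is the kind of C source that hits the address-taken
case. The function names are made up for illustration. The comment about
-O0 output describes Clang's usual pattern (an alloca for the parameter plus
a store of the incoming argument), not a guaranteed instruction sequence.

```c
#include <assert.h>

/* Taking &x means x needs an address, so it cannot live only in a register.
   At -O0, Clang typically gives x an alloca and stores the incoming argument
   into it, even when the caller already passed x in memory. */
int bump_through_pointer(int x) {
    int *px = &x;  /* requires addressable storage for the parameter */
    *px += 1;      /* modifies the callee's copy only */
    return x;
}

/* The caller's variable is unaffected: parameters are passed by value. */
void check_by_value(void) {
    int value = 41;
    int result = bump_through_pointer(value);
    assert(value == 41 && result == 42);
}
```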

Case 1: The parameter is already being passed in memory. In this case, we
waste memory and do an extra load and store to copy it into our own alloca.
This is very common on 32-bit x86.
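
To see why the copy in case 1 is redundant, note that under 32-bit x86 cdecl
every argument already has a fixed address relative to ESP at function entry.
The sketch below computes those offsets. It assumes plain cdecl with scalar
arguments only, ignoring register conventions like fastcall and over-aligned
types. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Round a size up to the 4-byte stack slot granularity of 32-bit x86. */
size_t slot_size_x86_32(size_t size) {
    return (size + 3u) & ~(size_t)3u;
}

/* Offset of argument `index` relative to ESP at function entry under cdecl:
   [esp] holds the return address, and arguments follow from esp+4 in order. */
size_t cdecl_arg_offset(const size_t *arg_sizes, size_t index) {
    size_t offset = 4; /* skip the return address */
    for (size_t i = 0; i < index; ++i)
        offset += slot_size_x86_32(arg_sizes[i]);
    return offset;
}
```

For a function `f(int a, double b, char c)`, the argument sizes {4, 8, 1}
give entry offsets 4, 8 and 16: each parameter already has a home the callee
could address directly.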

Case 2: On Win64, the caller pre-allocates shadow stack slots for all
register parameters. This allows the callee to “home” register parameters
to ease debugging and the implementation of variadic functions. In this
case, the store is not dead, but we fail to use the pre-allocated memory.
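
On Win64 the first four parameters arrive in registers, but the caller
always reserves 32 bytes of home space for them just above the return
address. The sketch below computes where each parameter's memory sits at
function entry. It assumes scalar parameters of at most 8 bytes (larger ones
are passed by reference), and the function names are hypothetical.

```c
#include <assert.h>

/* Win64 passes the first four parameters in RCX, RDX, R8, R9 (or XMM0-XMM3),
   and the caller reserves one 8-byte home slot for each of them. */
enum { WIN64_REGISTER_PARAMS = 4, WIN64_SLOT_SIZE = 8 };

/* Offset of parameter `index` relative to RSP at function entry. Parameters
   0-3 map to their home slots; later ones are passed on the stack directly
   after the home area. Either way, the memory already exists. */
int win64_param_offset(int index) {
    assert(index >= 0);
    return 8 /* return address */ + WIN64_SLOT_SIZE * index;
}

/* Nonzero if the parameter's value arrives in a register. */
int win64_param_in_register(int index) {
    return index < WIN64_REGISTER_PARAMS;
}
```

For parameters 0-3 the value arrives in a register, so the store into the
home slot is still needed. Only the allocation is wasted.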

Case 3: The parameter is a register parameter in a normal Sys-V ABI. In
this case, nothing is wasted. Both the memory and store are needed.

I think we can improve our code for cases 1 and 2 by making it possible to
create allocas from parameters, and we can lower this down to a regular
stack object with a store in case 3. The syntax for this would look like:
define void @f(i32 %x) {
  %px = alloca i32, argument i32 %x, align 4
  ret void
}

The semantics are the same as the following:
define void @f(i32 %x) {
  %px = alloca i32, align 4
  store i32 %x, i32* %px, align 4
  ret void
}
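
The backend lowering for the three cases can be sketched as a small
decision function. The enum and struct names are hypothetical and model the
cases listed earlier. They are not an existing LLVM API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of where an incoming argument lives. */
typedef enum {
    ARG_PASSED_IN_MEMORY,      /* case 1: e.g. stack arguments on 32-bit x86 */
    ARG_IN_REG_WITH_HOME_SLOT, /* case 2: Win64 register args with home space */
    ARG_IN_REG_ONLY            /* case 3: e.g. register args under SysV */
} ArgLocation;

typedef struct {
    bool needs_new_stack_object; /* must the frame allocate fresh storage? */
    bool needs_store;            /* must the incoming value be stored? */
} ArgAllocaLowering;

ArgAllocaLowering lower_argument_alloca(ArgLocation loc) {
    ArgAllocaLowering result = {false, false};
    switch (loc) {
    case ARG_PASSED_IN_MEMORY:
        /* The caller's slot already holds the value: no copy at all. */
        break;
    case ARG_IN_REG_WITH_HOME_SLOT:
        /* Reuse the caller-allocated home slot, but spill the register. */
        result.needs_store = true;
        break;
    case ARG_IN_REG_ONLY:
        /* No memory exists yet: same code as a plain alloca plus store. */
        result.needs_new_stack_object = true;
        result.needs_store = true;
        break;
    }
    return result;
}
```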

It is invalid to make one of these argument allocas dynamic in any way,
either by using inalloca, using a dynamic element count, or being outside
the entry block.
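
The restriction above could be enforced by a verifier check along these
lines. `AllocaModel` is a hypothetical, simplified stand-in for an alloca and
is not LLVM's AllocaInst API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of an alloca for illustration only. */
typedef struct {
    bool has_argument_operand; /* the proposed "argument i32 %x" operand */
    bool is_inalloca;
    bool has_constant_count;   /* element count is a constant */
    bool in_entry_block;
} AllocaModel;

/* An argument alloca must be static: not inalloca, a constant element
   count, and located in the entry block. Ordinary allocas are unaffected. */
bool argument_alloca_is_valid(const AllocaModel *a) {
    if (!a->has_argument_operand)
        return true;
    return !a->is_inalloca && a->has_constant_count && a->in_entry_block;
}
```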

If the semantics are the same, it raises the question: why don’t we pattern
match the alloca and store to elide dead stores and reuse existing argument
stack slots? My main reason for preferring a new construct is that it gives
us simpler and smaller code in debug builds, where we presumably don’t want
to do this kind of pattern recognition. In my experience, our -O0 codegen
does a lot of useless copies to initialize parameters, which increases
output size and slows things down. Having a more easily recognizable idiom
for getting at the storage behind a parameter, when such storage exists,
feels like a win.

Changing the semantics of alloca affects a lot of things, but I think it
makes sense to extend alloca here rather than add a new intrinsic or
instruction that creates local stack memory, which passes would then have
to learn to reason about. Here’s a list of things we’d need to change, off
the top of my head:
1. Inliner: When hoisting static allocas to the entry block, the inliner
will now strip the argument operand off of allocas and insert the
equivalent store at the inlined call site.
2. Mem2reg: Loading from an argument alloca needs to produce the argument
value instead of undef.
3. GVN: We need to apply the same logic to GVN and similar store-to-load
forwarding transforms.
4. InstCombine: This pass has some simple store forwarding logic that
would need to be updated.
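
A toy model of store-to-load forwarding shows how small the semantic change
for mem2reg and GVN is. It is written in plain C rather than against LLVM's
actual pass code, and all names are hypothetical. Only the value a load sees
before any store changes.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of forwarding on a single alloca in straight-line code.
   UNDEF_VALUE stands in for LLVM's undef. */
enum { UNDEF_VALUE = -1 };

typedef struct {
    bool is_store; /* true: store `value`; false: load */
    int value;     /* stored value (ignored for loads) */
} MemOp;

/* Value observed by the load at position `load_index`. The only change the
   RFC asks for is the starting value: an argument alloca starts out holding
   its argument instead of undef. */
int forwarded_load_value(const MemOp *ops, int load_index,
                         bool is_argument_alloca, int argument_value) {
    assert(load_index >= 0 && !ops[load_index].is_store);
    int current = is_argument_alloca ? argument_value : UNDEF_VALUE;
    for (int i = 0; i < load_index; ++i)
        if (ops[i].is_store)
            current = ops[i].value;
    return current;
}
```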

I’m sure there are more, but the changes seem worth it to get at these
low-hanging opportunities.

One other questionable side benefit of doing this is that it would make it
possible to implement va_start by taking the address of a parameter and
doing pointer arithmetic. While that code is not valid C, it’s exactly what
the MSVC STL headers do for 32-bit x86. If we make this work in Clang, we
can remove our stdarg.h and vadefs.h wrapper headers. Users often pass
flags that cause Clang to skip these wrapper headers, and then they file
bugs complaining that va lists don't work.
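
The MSVC-style va_start for 32-bit x86 looks roughly like the macros below:
take the address of the last named parameter and step past its rounded-up
size. The names carry a SKETCH_ prefix because this is a paraphrase for
illustration, not a copy of vadefs.h.

```c
#include <assert.h>

/* Round the size of an argument up to a multiple of sizeof(int), the stack
   slot granularity on 32-bit x86. */
#define SKETCH_INTSIZEOF(n) \
    ((sizeof(n) + sizeof(int) - 1) & ~(sizeof(int) - 1))

/* va_start by pointer arithmetic: the variadic arguments are assumed to sit
   in memory right after the last named parameter `v`. This only works when
   `v` lives in its caller-provided stack slot (32-bit x86 cdecl). If `v` has
   been copied into a separate local alloca, `ap` ends up pointing into the
   callee's own frame instead of at the variadic arguments. */
#define SKETCH_VA_START(ap, v) \
    ((void)((ap) = (char *)&(v) + SKETCH_INTSIZEOF(v)))
```

This is why the wrapper headers are needed today: Clang copies the last
named parameter into its own alloca, so `&v` no longer points into the
incoming argument area. An argument alloca would make `&v` name the
caller's slot directly.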

At the end of the day, this feels like a straightforward engineering
improvement to LLVM, and it feels worth doing to me. Does anyone feel
otherwise or have any suggestions?