[llvm-dev] RFC: Adding argument allocas

Thu Dec 8 17:11:31 PST 2016

Hi Reid,

This seems pretty reasonable to me.

For the debug info people:

Reid and I also chatted about dbg.declare vs dbg.value as a consequence of
this. His work here takes us closer to being able to get rid of
dbg.declare, but removing it is not part of this proposal (and isn't
planned; it doesn't make sense as part of this).

-eric

On Thu, Dec 8, 2016 at 5:06 PM Reid Kleckner via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Clang is currently missing some low-hanging performance and code size
> opportunities when receiving function parameters. For most scalar
> parameters, Clang typically emits an alloca and stores the LLVM SSA
> argument value into the alloca to initialize it. With optimizations, this
> initialization is often removed, but it stays behind in debug builds and
> when the user takes the address of a parameter (
> https://llvm.org/bugs/show_bug.cgi?id=26328). In many cases, the memory
> allocation and store duplicate work already done by the caller.
>
> Case 1: The parameter is already being passed in memory. In this case, we
> waste memory and do an extra load and store to copy it into our own alloca.
> This is very common on 32-bit x86.
>
> Case 2: On Win64, the caller pre-allocates shadow stack slots for all
> register parameters. This allows the callee to “home” register parameters
> to ease debugging and the implementation of variadic functions. In this
> case, the store is not dead, but we fail to use the pre-allocated memory.
>
> Case 3: The parameter is a register parameter in a normal Sys-V ABI. In
> this case, nothing is wasted. Both the memory and store are needed.
>
> I think we can improve our code for cases 1 and 2 by making it possible to
> create allocas from parameters, and we can lower this down to a regular
> stack object with a store in case 3. The syntax for this would look like:
> define void @f(i32 %x) {
>   %px = alloca i32, argument i32 %x, align 4
>
> The semantics are the same as the following:
> define void @f(i32 %x) {
>   %px = alloca i32, align 4
>   store i32 %x, i32* %px, align 4
>
> It is invalid to make one of these argument allocas dynamic in any way,
> either by using inalloca, using a dynamic element count, or being outside
> the entry block.
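
A rough C-level illustration of the situation the RFC describes, not taken
from the thread: when a function takes the address of its own parameter, the
parameter needs real storage in the callee, which is what the alloca plus
store provides. The name bump_through_pointer is made up for this sketch.

```c
/* Hypothetical example function. Because the address of `x` is taken,
   the parameter needs a memory location in the callee; per the RFC,
   Clang emits an alloca for it plus a store of the incoming argument. */
static int bump_through_pointer(int x) {
    int *px = &x;  /* plays the role of %px in the IR sketch above */
    *px += 1;      /* modifies the callee's copy only */
    return x;
}
```

Calling it with a variable holding 41 returns 42 and leaves the caller's
variable at 41, since C passes arguments by value.
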
>
> If the semantics are the same, it raises the question: why don’t we pattern
> match the alloca and store to elide dead stores and reuse existing argument
> stack slots? My main reason for preferring a new way to do this is that it
> gives us simpler and smaller code in debug builds, where we presumably
> don’t want to do this kind of pattern recognition. My experience with our
> -O0 codegen is that we do a lot of useless copies to initialize parameters,
> and this leads to larger output size, which slows things down. Having a
> more easily recognizable idiom for getting the storage behind parameters if
> it exists feels like a win.
>
> Changing the semantics of alloca affects a lot of things, but I think it
> makes sense to extend alloca here rather than add a new intrinsic or
> instruction that creates local variable stack memory, which passes would
> then have to reason about. Here’s a list of things we’d need to change,
> off the top of my head:
> 1. Inliner: When hoisting static allocas to the entry block, the inliner
> will now strip the argument operand off of allocas and insert the
> equivalent store at the inlined call site.
> 2. Mem2reg: Loading from an argument alloca needs to produce the argument
> value instead of undef.
> 3. GVN: We need to apply the same logic to GVN and similar store-to-load
> forwarding transforms.
> 4. Instcombine: This transform has some simple store forwarding logic that
> would need to be updated.
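
To make the mem2reg and store-forwarding items concrete, here is a toy model
in C. It is only a sketch of the rule "a load with no preceding store yields
the argument value instead of undef"; ToyAlloca, ToyValue, and initial_value
are hypothetical names and do not correspond to LLVM's actual classes or APIs.

```c
#include <stdbool.h>

/* Toy stand-in for an alloca; NOT LLVM's AllocaInst. */
typedef struct {
    bool has_argument;    /* true for "alloca T, argument T %x" */
    int  argument_value;  /* stands in for the SSA argument %x */
} ToyAlloca;

/* Toy stand-in for an SSA value that may be undef. */
typedef struct {
    bool is_undef;
    int  value;
} ToyValue;

/* Value observed by a load that is not preceded by any store.
   A plain alloca yields undef; an argument alloca yields the argument. */
static ToyValue initial_value(const ToyAlloca *a) {
    ToyValue v;
    if (a->has_argument) {
        v.is_undef = false;
        v.value = a->argument_value;
    } else {
        v.is_undef = true;
        v.value = 0;  /* placeholder; meaningless when is_undef is set */
    }
    return v;
}
```

Any pass that forwards stored values to loads would need the equivalent of
the has_argument branch.
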
>
> I’m sure there are more, but the changes seem worth it to get at these
> low-hanging opportunities.
>
> One other questionable side benefit of doing this is that it would make it
> possible to implement va_start by taking the address of a parameter and
> doing pointer arithmetic. While that code is fairly invalid, it’s exactly
> what the MSVC STL headers do for 32-bit x86. If we make this work in Clang,
> we can remove our stdarg.h and vadefs.h wrapper headers. Users often pass
> flags that cause clang to skip these wrapper headers, and then they file
> bugs complaining that va lists don't work.
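
For contrast with the address-of-parameter trick mentioned above, this is the
portable way to walk variadic arguments using the standard <stdarg.h> macros.
It is a sketch added for illustration; sum_ints is a hypothetical helper and
is not taken from the MSVC headers or from the RFC.

```c
#include <stdarg.h>

/* Portable variadic sum: `count` says how many int arguments follow. */
static int sum_ints(int count, ...) {
    va_list ap;
    int total = 0;
    va_start(ap, count);  /* standard way to locate the first variadic argument */
    for (int i = 0; i < count; i++) {
        total += va_arg(ap, int);
    }
    va_end(ap);
    return total;
}
```

Headers that instead compute the va_list from the address of the last named
parameter rely on that parameter living in its caller-provided stack slot,
which is the behavior the argument alloca would make explicit.
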
>
> At the end of the day, this feels like a straightforward engineering
> improvement to LLVM, and it feels worth doing to me. Does anyone feel
> otherwise or have any suggestions?