[llvm-dev] RFC: Adding argument allocas

Hal Finkel via llvm-dev llvm-dev at lists.llvm.org
Sat Dec 10 08:20:00 PST 2016


----- Original Message -----

> From: "Reid Kleckner via llvm-dev" <llvm-dev at lists.llvm.org>
> To: "llvm-dev" <llvm-dev at lists.llvm.org>
> Sent: Thursday, December 8, 2016 7:05:44 PM
> Subject: [llvm-dev] RFC: Adding argument allocas

> Clang is currently missing some low-hanging performance and code size
> opportunities when receiving function parameters. For most scalar
> parameters, Clang typically emits an alloca and stores the LLVM SSA
> argument value into the alloca to initialize it. With optimizations,
> this initialization is often removed, but it stays behind in debug
> builds and when the user takes the address of a parameter (
> https://llvm.org/bugs/show_bug.cgi?id=26328 ). In many cases, the
> memory allocation and store duplicate work already done by the
> caller.

> Case 1: The parameter is already being passed in memory. In this
> case, we waste memory and do an extra load and store to copy it into
> our own alloca. This is very common on 32-bit x86.

> Case 2: On Win64, the caller pre-allocates shadow stack slots for all
> register parameters. This allows the callee to “home” register
> parameters to ease debugging and the implementation of variadic
> functions. In this case, the store is not dead, but we fail to use
> the pre-allocated memory.

> Case 3: The parameter is a register parameter in a normal Sys-V ABI.
> In this case, nothing is wasted. Both the memory and store are
> needed.

> I think we can improve our code for cases 1 and 2 by making it
> possible to create allocas from parameters, and we can lower this
> down to a regular stack object with a store in case 3. The syntax
> for this would look like:
> define void @f(i32 %x) {
>   %px = alloca i32, argument i32 %x, align 4

> The semantics are the same as the following:
> define void @f(i32 %x) {
>   %px = alloca i32, align 4
>   store i32 %x, i32* %px, align 4

> It is invalid to make one of these argument allocas dynamic in any
> way, whether by using inalloca, using a dynamic element count, or
> placing it outside the entry block.

Having an alloca with an initializer seems like a reasonable enhancement. Please do it, however, without all of these special restrictions: any value should be accepted as the initializer on any alloca. We can match the special argument cases in the backend and otherwise lower to the equivalent of an alloca+store.
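
To make that concrete, here is a rough sketch. It reuses the `argument` keyword from the proposal purely as placeholder syntax (an assumption on my part, not a settled spelling), and @g, @g.lowered, %sum, and %p are made-up names. The first function initializes an alloca from a computed value rather than from a function argument; the second is the alloca+store form it would be equivalent to:

```llvm
; Hypothetical syntax: the initializer is an arbitrary SSA value, not
; necessarily a function argument. No existing LLVM release parses this.
define i32 @g(i32 %x, i32 %y) {
entry:
  %sum = add i32 %x, %y
  %p = alloca i32, argument i32 %sum, align 4
  %v = load i32, i32* %p, align 4
  ret i32 %v
}

; Equivalent form in today's IR, which is what we would lower to
; whenever none of the special argument cases can be matched.
define i32 @g.lowered(i32 %x, i32 %y) {
entry:
  %sum = add i32 %x, %y
  %p = alloca i32, align 4
  store i32 %sum, i32* %p, align 4
  %v = load i32, i32* %p, align 4
  ret i32 %v
}
```

When the initializer is a plain argument, as in the %px example above, the backend could try to reuse the incoming stack slot (cases 1 and 2); for anything else it would just emit the store.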

-Hal 

> If the semantics are the same, it raises the question: why don’t we
> pattern match the alloca and store to elide dead stores and reuse
> existing argument stack slots? My main reason for preferring a new
> way to do this is that it gives us simpler and smaller code in debug
> builds, where we presumably don’t want to do this kind of pattern
> recognition. My experience with our -O0 codegen is that we do a lot
> of useless copies to initialize parameters, and this leads to larger
> output size, which slows things down. Having a more easily
> recognizable idiom for getting the storage behind parameters if it
> exists feels like a win.

> Changing the semantics of alloca affects a lot of things, but I think
> it makes sense to extend alloca here rather than add a new intrinsic
> or instruction that creates local variable stack memory and that
> passes would have to reason about. Here’s a list of things we’d need
> to change, off the top of my head:
> 1. Inliner: When hoisting static allocas to the entry block, the
> inliner will now strip the argument operand off of allocas and
> insert the equivalent store at the inlined call site.
> 2. Mem2reg: Loading from an argument alloca needs to produce the
> argument value instead of undef.
> 3. GVN: We need to apply the same logic to GVN and similar
> store-to-load forwarding transforms.
> 4. Instcombine: This transform has some simple store forwarding logic
> that would need to be updated.

> I’m sure there are more, but the changes seem worth it to get at
> these low-hanging opportunities.

> One other questionable side benefit of doing this is that it would
> make it possible to implement va_start by taking the address of a
> parameter and doing pointer arithmetic. While that code is fairly
> invalid, it’s exactly what the MSVC STL headers do for 32-bit x86.
> If we make this work in Clang, we can remove our stdarg.h and
> vadefs.h wrapper headers. Users often pass flags that cause clang to
> skip these wrapper headers, and then they file bugs complaining that
> va lists don't work.

> At the end of the day, this feels like a straightforward engineering
> improvement to LLVM, and it feels worth doing to me. Does anyone
> feel otherwise or have any suggestions?

-- 

Hal Finkel 
Lead, Compiler Technology and Programming Languages 
Leadership Computing Facility 
Argonne National Laboratory 