[llvm-dev] RFC: Adding argument allocas

Mehdi Amini via llvm-dev llvm-dev at lists.llvm.org
Thu Dec 8 17:37:09 PST 2016


> On Dec 8, 2016, at 5:05 PM, Reid Kleckner via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> Clang is currently missing some low-hanging performance and code size opportunities when receiving function parameters. For most scalar parameters, Clang typically emits an alloca and stores the LLVM SSA argument value into the alloca to initialize it. With optimizations, this initialization is often removed, but it stays behind in debug builds and when the user takes the address of a parameter (https://llvm.org/bugs/show_bug.cgi?id=26328). In many cases, the memory allocation and store duplicate work already done by the caller.
> 
> Case 1: The parameter is already being passed in memory. In this case, we waste memory and do an extra load and store to copy it into our own alloca. This is very common on 32-bit x86.
> 
> Case 2: On Win64, the caller pre-allocates shadow stack slots for all register parameters. This allows the callee to “home” register parameters to ease debugging and the implementation of variadic functions. In this case, the store is not dead, but we fail to use the pre-allocated memory.
> 
> Case 3: The parameter is a register parameter in a normal Sys-V ABI. In this case, nothing is wasted. Both the memory and store are needed.
> 
> I think we can improve our code for cases 1 and 2 by making it possible to create allocas from parameters, and we can lower this down to a regular stack object with a store in case 3. The syntax for this would look like:
> define void @f(i32 %x) {
>   %px = alloca i32, argument i32 %x, align 4
> 
> The semantics are the same as the following:
> define void @f(i32 %x) {
>   %px = alloca i32, align 4
>   store i32 %x, i32* %px, align 4
> 
> It is invalid to make one of these argument allocas dynamic in any way, either by using inalloca, using a dynamic element count, or being outside the entry block.
> 
> If the semantics are the same, it raises the question: why don’t we pattern match the alloca and store to elide dead stores and reuse existing argument stack slots? My main reason for preferring a new way to do this is that it gives us simpler and smaller code in debug builds, where we presumably don’t want to do this kind of pattern recognition.

So IIUC, the *only* reason for this IR change is that we don’t want to pattern match in debug builds?
I don’t yet understand why we wouldn’t want to do that pattern matching.
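
To make the pattern under discussion concrete, here is a minimal C sketch of the kind of source that keeps the alloca and store alive even with optimizations, because the parameter's address is taken. The names bump and add_one are hypothetical and chosen only for illustration; the exact IR a front end emits for this is not shown and may differ.

```c
/* Hypothetical helper: it receives a pointer, so the caller's
   parameter has to live in addressable memory. */
static void bump(int *p) {
    *p += 1;
}

/* The address of 'x' is taken, so the parameter needs a stack slot
   that is initialized from the incoming argument value. */
int add_one(int x) {
    bump(&x);
    return x;
}
```

Calling add_one(41) returns 42. With the parameter passed in memory (case 1 in the RFC), the slot the caller already filled in could in principle serve as that storage.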

Thanks,

— 
Mehdi 



> My experience with our -O0 codegen is that we do a lot of useless copies to initialize parameters, and this leads to larger output size, which slows things down. Having a more easily recognizable idiom for getting the storage behind parameters if it exists feels like a win.
> 
> Changing the semantics of alloca affects a lot of things, but I think it makes sense to extend alloca here rather than a new intrinsic or instruction that can create local variable stack memory that passes would have to reason about. Here’s a list of things we’d need to change off the top of my head:
> 1. Inliner: When hoisting static allocas to the entry block, the inliner will now strip the argument operand off of allocas and insert the equivalent store at the inlined call site.
> 2. Mem2reg: Loading from an argument alloca needs to produce the argument value instead of undef.
> 3. GVN: We need to apply the same logic to GVN and similar store to load forwarding transforms.
> 4. Instcombine: This transform has some simple store forwarding logic that would need to be updated.
> 
> I’m sure there are more, but the changes seem worth it to get at these low hanging opportunities.
> 
> One other questionable side benefit of doing this is that it would make it possible to implement va_start by taking the address of a parameter and doing pointer arithmetic. While that code is fairly invalid, it’s exactly what the MSVC STL headers do for 32-bit x86. If we make this work in Clang, we can remove our stdarg.h and vadefs.h wrapper headers. Users often pass flags that cause clang to skip these wrapper headers, and then they file bugs complaining that va lists don't work.
> 
> At the end of the day, this feels like a straightforward engineering improvement to LLVM, and it feels worth doing to me. Does anyone feel otherwise or have any suggestions?
