[LLVMbugs] [Bug 1226] NEW: scalarrepl should be able to scalarrepl aggregates with memcpy uses

bugzilla-daemon at cs.uiuc.edu bugzilla-daemon at cs.uiuc.edu
Sun Feb 25 12:35:26 PST 2007


           Summary: scalarrepl should be able to scalarrepl aggregates with
                    memcpy uses
           Product: libraries
           Version: 1.0
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Scalar Optimizations
        AssignedTo: unassignedbugs at nondot.org
        ReportedBy: sabre at nondot.org


#include <tr1/functional>
#include <algorithm>
void assign( long* variable, long v) {
        std::transform( variable, variable + 1, variable,
                std::tr1::bind( std::plus< long >(), 0L, v ) );
}

This compiles to a single store on x86, but to much more code on x86-64. This is because the
temporary structs are larger on x86-64, so EmitAggregateCopy in llvm-gcc copies them with a memcpy
instead of with scalar loads and stores.

The problem is that the memcpy later blocks scalarrepl from promoting the structs, causing much worse
code:

__Z6assignRll:    # x86-32
        movl 8(%esp), %eax
        movl 4(%esp), %ecx
        movl %eax, (%ecx)
        ret

__Z6assignRll:   # x86-64
        subq $88, %rsp
        movb $0, 64(%rsp)
        movq $0, 72(%rsp)
        movq %rsi, 80(%rsp)
        movq %rsi, 48(%rsp)
        movq 72(%rsp), %rax
        movq %rax, 40(%rsp)
        movq 64(%rsp), %rax
        movq %rax, 32(%rsp)
        movq 40(%rsp), %rax
        movq %rax, 8(%rsp)
        movq 48(%rsp), %rax
        movq %rax, 16(%rsp)
        movq 32(%rsp), %rax
        movq %rax, (%rsp)
        movq 16(%rsp), %rax
        addq 8(%rsp), %rax
        movq %rax, (%rdi)
        addq $88, %rsp
        ret
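At the source level, the transformation being asked for amounts to treating a memcpy between two promotable aggregates as a set of per-field copies. The C++ sketch below illustrates only that equivalence, not LLVM's scalarrepl code. `BoundPlus`, `viaMemcpy`, `viaFieldCopies` and `checkEquivalent` are hypothetical names, and the struct is only loosely modeled on the temporary in the x86-64 listing (a byte store followed by two 8-byte stores), not on the actual libstdc++ layout of the bind object.

```cpp
#include <cassert>
#include <cstring>
#include <functional>

// Hypothetical stand-in for the bind temporary: an empty function
// object followed by two bound longs. The real layout may differ.
struct BoundPlus {
    std::plus<long> op;  // empty class, but still occupies storage
    long lhs;            // the bound 0L
    long rhs;            // the bound v
};

// Aggregate copy as one block move, roughly the memcpy form that
// EmitAggregateCopy emits for the larger x86-64 temporaries.
long viaMemcpy(long v) {
    BoundPlus src{std::plus<long>(), 0L, v};
    BoundPlus dst;
    std::memcpy(&dst, &src, sizeof dst);
    return dst.op(dst.lhs, dst.rhs);
}

// The same copy written as per-field scalar transfers. Each field is
// now an independent scalar, which is the shape scalarrepl can
// promote to registers.
long viaFieldCopies(long v) {
    BoundPlus src{std::plus<long>(), 0L, v};
    BoundPlus dst;
    dst.op = src.op;
    dst.lhs = src.lhs;
    dst.rhs = src.rhs;
    return dst.op(dst.lhs, dst.rhs);
}

// Both forms must agree; that equivalence is what would justify
// rewriting the memcpy as scalar loads and stores.
void checkEquivalent(long v) {
    assert(viaMemcpy(v) == viaFieldCopies(v));
}
```

For example, `viaMemcpy(42)` and `viaFieldCopies(42)` both return 42 (0 + v), matching the single store of `v` in the x86-32 listing.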


