[LLVMdev] llc -O# / opt -O# differences

Sat Jun 30 11:09:30 PDT 2012

Hey everyone,

I'm running the stock LLVM 3.1 release. Both llc and opt accept -O#
arguments, but the results look somewhat different. Here's a silly,
unoptimized bit of code that I'm generating from my LLVM-backed
program:

; ModuleID = 'foo'

%Coord = type { double, double, double }

define double @foo(%Coord*, %Coord*) nounwind uwtable ssp {
entry:
  %dx_ptr = alloca double
  %b_ptr = alloca %Coord*
  %a_ptr = alloca %Coord*
  store %Coord* %0, %Coord** %a_ptr
  store %Coord* %1, %Coord** %b_ptr
  %a = load %Coord** %a_ptr
  %addr = getelementptr %Coord* %a, i64 0
  %2 = getelementptr inbounds %Coord* %addr, i32 0, i32 0
  %3 = load double* %2
  %b = load %Coord** %b_ptr
  %addr1 = getelementptr %Coord* %b, i64 0
  %4 = getelementptr inbounds %Coord* %addr1, i32 0, i32 0
  %5 = load double* %4
  %sub = fsub double %3, %5
  store double %sub, double* %dx_ptr
  %dx = load double* %dx_ptr
  %dx2 = load double* %dx_ptr
  %mult = fmul double %dx, %dx2
  ret double %mult
}

This roughly matches the following C code:

struct Coord { double x; double y; double z; };

double foo(struct Coord * a, struct Coord * b) {
    double dx = a[0].x - b[0].x;
    return dx * dx;
}
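
As a quick sanity check of what the IR computes, here is a minimal
standalone sketch. It assumes the intended computation is the one in
the IR above: both getelementptr instructions select field 0 (x), one
from each argument, and the difference is squared. The helper
foo_sample and its input values are made up for illustration and are
not part of the original program.

```c
struct Coord { double x; double y; double z; };

/* Same computation as the IR: load field 0 (x) of *a and *b,
 * subtract, then square the difference. */
double foo(struct Coord *a, struct Coord *b) {
    double dx = a[0].x - b[0].x;
    return dx * dx;
}

/* Hypothetical helper (illustration only): runs foo on sample inputs.
 * Only the x fields matter; y and z are never read. */
double foo_sample(void) {
    struct Coord a = { 5.0, 7.0, 9.0 };
    struct Coord b = { 3.0, 1.0, 2.0 };
    return foo(&a, &b);   /* (5.0 - 3.0) squared */
}
```

Compiling this with clang at -O3 is one way to get a reference
assembly listing to compare against the opt and llc outputs.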

Running it through opt and then llc:

$ llvm-as < x.ll | opt -O3 | llc > y.s

Produces the following:

_foo:                                   ## @foo
    .cfi_startproc
## BB#0:                                ## %entry
    movsd   (%rdi), %xmm0
    subsd   (%rsi), %xmm0
    mulsd   %xmm0, %xmm0
    ret
    .cfi_endproc

This also matches what clang compiles from the C function. However,
running it through llc alone with the same optimization flag gives a
different result:

$ llc -O3 x.ll -o z.s

_foo:                                   ## @foo
    .cfi_startproc
## BB#0:                                ## %entry
    movq    %rdi, -24(%rsp)
    movq    %rsi, -16(%rsp)
    movq    -24(%rsp), %rax
    movsd   (%rax), %xmm0
    subsd   (%rsi), %xmm0
    movsd   %xmm0, -8(%rsp)
    mulsd   %xmm0, %xmm0
    ret
    .cfi_endproc

This matches the output I get from LLVMCreateTargetMachine with
CodeGenLevelAggressive followed by LLVMTargetMachineEmitToFile, which
is what I'm using.

Is the llc/opt difference expected? I'm a bit confused, since I'd
expect the same -O level to run the same optimization passes. I have to
admit I'm not well versed in assembly, but it looks to me like the opt
pipeline eliminates a bunch of stack loads and stores. I'd
appreciate any insight into this.

Thanks,

Dimitri