sumarray-dbl now fails because SROA is promoting a struct to an i512 and
codegen is shifting by the wrong amount.  Here's a testcase:

define void @test(<8 x double> *%P, i64* %Q) nounwind {
        %A = load <8 x double>* %P
        %B = bitcast <8 x double> %A to i512            ; <i512> [#uses=2]
        %C = lshr i512 %B, 448          ; <i512> [#uses=1]
        %D = trunc i512 %C to i64               ; <i64> [#uses=1]
        volatile store i64 %D, i64* %Q
        ret void
}

I get:
$ llvm-as < ~/t.ll | llc -march=x86-64

        movq    24(%rdi), %rax
        movq    %rax, (%rsi)

Bit 448 is byte 56, not byte 24, so I'd expect the load to come from
56(%rdi).  It looks like the offset is off by 32, or maybe it is being taken
modulo 32 (56 % 32 = 24), or something weird like that?
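
For reference, here is a small Python model of the expected arithmetic. It is only a sketch of what the testcase should compute on a little-endian target, not anything taken from the LLVM sources. The helper names (lshr_trunc64, load64) and the memory contents (byte i holds the value i) are made up for illustration.

```python
# Hypothetical model: 64 bytes of little-endian memory standing in for the
# <8 x double> at %P. Byte i holds the value i so each qword is distinct.
mem = bytes(range(64))

def lshr_trunc64(memory: bytes, shift_bits: int) -> int:
    """Load memory as an i512, lshr by shift_bits, then trunc to i64."""
    value = int.from_bytes(memory, "little")
    return (value >> shift_bits) & (2**64 - 1)

def load64(memory: bytes, offset: int) -> int:
    """Plain 64-bit little-endian load at a byte offset (what movq does)."""
    return int.from_bytes(memory[offset:offset + 8], "little")

expected_offset = 448 // 8  # bit 448 -> byte 56

# The shift+trunc selects the qword at byte 56, not the one at byte 24.
assert lshr_trunc64(mem, 448) == load64(mem, expected_offset)
assert lshr_trunc64(mem, 448) != load64(mem, 24)

# Both guesses about the bug land on the observed offset of 24.
print(expected_offset, expected_offset - 32, expected_offset % 32)  # 56 24 24
```

Since 56 - 32 and 56 % 32 are both 24, this one testcase can't distinguish the two guesses. A shift whose byte offset is below 32 or at least 64 would, but that needs a different type than i512.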

