[PATCH] D14596: [SROA] Choose more profitable type in findCommonType

Wed Dec 9 16:38:47 PST 2015

Here is a simple case (not yet minimized):

#include <complex>
using namespace std;
typedef complex<float> DD;
extern DD dd, dd2;

DD compute(DD &c1, DD& c2){
   float r1 = c1.real();
    float i1 = c1.imag();
    float r2 = c2.real();
    float i2 = c2.imag();
   return DD(r1 * r2 - i1 * i2, r2 * i1 + r1 * i2);
}

void foo(int n) {
    int i;
    DD ldd = 0;

    for (i = 0; i < n; i++)
      ldd += compute(dd, dd2);

    dd = ldd;
}

On x86, the kernel loop generated by clang/LLVM is:

.LBB1_2:                                # =>This Inner Loop Header: Depth=1
        movd    %eax, %xmm3
        addss   %xmm1, %xmm3
        movd    %xmm3, %eax
        addss   %xmm0, %xmm2
        decl    %edi
        jne     .LBB1_2

The better one should be:

L4:
        vaddss  %xmm2, %xmm0, %xmm0
        vaddss  %xmm3, %xmm1, %xmm1
        decl  %edi
        jne     .L4

On Wed, Dec 9, 2015 at 3:34 PM, Chandler Carruth <chandlerc at gmail.com> wrote:
> chandlerc added a comment.
>
> I was really confused about this patch until I read the bug. =]
>
> For the record, bugs don't go to a widely followed mailing list, and so it is often helpful to provide a summary of the problem being faced when posting a patch for review in addition to a link to the bug. Otherwise it can be hard to figure out.
>
> However, I don't think this is the correct fix. Regardless of what type SROA chooses to use, the optimizer should *always* fix the type when we discover that in the end loads and stores end up feeding floating point instructions.

Yes -- if there is a such pass that does the type fixup. Whatever that
pass is, it also needs to do the cost analysis in case of mixed uses.
The analysis does need to
1) look at frequency of uses
2) follow DU chain deeper than stopping at load/store boundary.

> I worked really hard to fix this exact issue so that SROA could stay simple and ignorant of such issues, and so we could continue to lower memcpy as integer loads and stores.

>
> It would really help to provide a reduced IR test case. The bug is full of C++ code and asm, but only has excerpts of the IR. The integer loads and stores should only exist within the function as long as there are no floating point operations on the loaded or stored type. The question is why that isn't proving to be the case, as there is code that tries to ensure that in the tree.
>

See above test case.

David
>
> http://reviews.llvm.org/D14596
>
>
>