[PATCH] [CodeGen] Replace the outgoing chain when reusing stores for extractelt expansion.

Ahmed Bougacha ahmed.bougacha at gmail.com
Mon Mar 9 13:26:53 PDT 2015


Hi hfinkel,

This fixes a subtle issue that was introduced in r205153 [0].

When reusing a store for the extractelement expansion (to load directly from it, inserting of going through the stack), later stores to the same location might have overwritten the data we were expecting to extract from.  To fix that, we need to explicitly replace the chain going out of the reused store, so that later stores also depend on the generated element-extracting loads.


Here's a small annotated testcase, convoluted as it may be (any simpler and we're able to realize this is just a vector shuffle):


```
typedef int32_t v4i32 __attribute__ ((vector_size (16)));

void test(int32_t *x, int32_t *y, int i) {
  v4i32 b = (*(v4i32*)y);
  x[i*4 + 1] = b[i*4 + 0];
  x[i*4 + 0] = b[i*4 + 1];
}
```



Let's assume i is dynamically always 0.  Consider the case where x == y.  The function then does something like the swap pattern:

```
  b = *x;
  x[1] = b[0];
  x[0] = b[1];
```


The compiler used to generate:


```
        movl    8(%ebp), %eax        ; eax <- x
        movl    16(%ebp), %ecx       ; ecx <- i
        movl    12(%ebp), %edx       ; edx <- y
        movaps  (%edx), %xmm0        ; b <- *y

        movaps  %xmm0, -40(%ebp)     ; tmp = b
        shll    $4, %ecx             ; ecx <- ecx * 4;  ecx = i = 0
        leal    -40(%ebp), %edx      ;
        movl    (%ecx,%edx), %edx    ; edx <- tmp[0]
        movl    %edx, 4(%eax,%ecx)   ; x[1] = tmp[0] = b[0]
        movaps  %xmm0, -24(%ebp)     ; tmp2 = b
        leal    -24(%ebp), %edx      ;
        movl    4(%ecx,%edx), %edx   ; edx <- tmp2[1]
        movl    %edx, (%eax,%ecx)    ; x[0] = tmp2[1] = b[1]
```

The new code however is sometimes problematic:

```
        movl    8(%ebp), %eax        ; eax <- x
        movl    16(%ebp), %ecx       ; ecx <- i
        movl    12(%ebp), %edx       ; edx <- y
        movaps  (%edx), %xmm0        ; b <- *y

        movaps  %xmm0, -24(%ebp)     ; tmp = b
        shll    $4, %ecx             ; ecx <- ecx * 4;  ecx = i = 0
        leal    -24(%ebp), %edx      ;
        movl    (%ecx,%edx), %esi    ; esi <- tmp[0]
        movl    %esi, 4(%eax,%ecx)   ; 
        movl    4(%ecx,%edx), %edx   ; edx <- tmp[0]
        movl    %edx, (%eax,%ecx)    ; x[0] = tmp[0]

```
Broadcasting x[0].


[0] Quoting:
    Make use of previously generated stores in SelectionDAGLegalize::ExpandExtractFromVectorThroughStack

    When expanding EXTRACT_VECTOR_ELT and EXTRACT_SUBVECTOR using
    SelectionDAGLegalize::ExpandExtractFromVectorThroughStack, we store the entire
    vector and then load the piece we want. This is fine in isolation, but
    generating a new store (and corresponding stack slot) for each extraction ends
    up producing code of poor quality. When we scalarize a vector operation (using
    SelectionDAG::UnrollVectorOp for example) we generate one EXTRACT_VECTOR_ELT
    for each element in the vector. This used to generate one stored copy of the
    vector for each element in the vector. Now we search the uses of the vector for
    a suitable store before generating a new one, which results in much more
    efficient scalarization code.

http://reviews.llvm.org/D8180

Files:
  lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
  test/CodeGen/X86/extractelement-legalization-store-ordering.ll
  test/CodeGen/X86/vector-idiv.ll

EMAIL PREFERENCES
  http://reviews.llvm.org/settings/panel/emailpreferences/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D8180.21506.patch
Type: text/x-patch
Size: 16135 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150309/b7b049a6/attachment.bin>


More information about the llvm-commits mailing list