<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link bz_status_NEW"
   title="NEW - Wrong code generated for shufflevector"
   href="https://bugs.llvm.org/show_bug.cgi?id=44783">44783</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>Wrong code generated for shufflevector
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>Other
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Linux
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Backend: AArch64
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>scw@google.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>arnaud.degrandmaison@arm.com, llvm-bugs@lists.llvm.org, peter.smith@linaro.org, Ties.Stuij@arm.com
          </td>
        </tr></table>
      <p>
        <div>
        <pre>Created <span class=""><a href="attachment.cgi?id=23090" name="attach_23090" title="IR reproducer">attachment 23090</a></span>
IR reproducer

C++ code snippet:

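// 16-byte and 32-byte vectors of 64-bit lanes (2 and 4 lanes respectively).
typedef long long raw128 __attribute__((vector_size(16)));
typedef long long raw256 __attribute__((vector_size(32)));
void f() {
  raw128 a = {784, 785};
  // Indices 0 and 1 pick both lanes of a; repeating them widens it to 4 lanes.
  raw256 c = __builtin_shufflevector(a, a, 0, 1, 0, 1);
  (void) c;
}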
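
For reference, the lane selection this shuffle is expected to perform can be
written out in scalar C++. This is only a sketch: shuffleLane and expectedLane
are hypothetical helper names, not compiler builtins.

```cpp
// Scalar reference for a two-input shuffle of 2-lane vectors.
// shuffleLane and expectedLane are hypothetical names used for illustration.

// Indices 0..1 select from a, indices 2..3 select from b, i.e. the index
// addresses the concatenation of the two inputs.
long long shuffleLane(const long long *a, const long long *b, int index) {
  return index >= 2 ? b[index - 2] : a[index];
}

// Value expected in the given lane of c for the snippet in this report.
long long expectedLane(int lane) {
  const long long a[2] = {784, 785};
  const int mask[4] = {0, 1, 0, 1};
  return shuffleLane(a, a, mask[lane]);
}
```

Lanes 0 through 3 of c should therefore hold 784, 785, 784, 785.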

IR attached (t.ll).

Command: llc -O0 -mtriple=aarch64 t.ll

Result:

        .text                                                                   
        .file   "t.cc"                                                          
        .globl  _Z1fv                   // -- Begin function _Z1fv              
        .p2align        2                                                       
        .type   _Z1fv,@function                                                 
_Z1fv:                                  // @_Z1fv                               
.L_Z1fv$local:                                                                  
// %bb.0:                               // %entry                               
        sub     sp, sp, #48             // =48                                  
        mov     x8, #784                                                        
        mov     x9, #785                                                        
                                        // implicit-def: $q0                    
        fmov    d0, x8                                                          
        mov     v0.d[1], x9                                                     
        str     q0, [sp, #32]                                                   
        ldr     q0, [sp, #32]                                                   
        mov     v1.16b, v0.16b                                                  
        mov     d2, v0.d[1]                                                     
        mov     v3.16b, v0.16b                                                  
        mov     d4, v0.d[1]                                                     
        str     d1, [sp]                                                        
        str     d1, [sp, #8]                                                    
        str     d3, [sp, #16]                                                   
        str     d3, [sp, #24]                                                   
        add     sp, sp, #48             // =48                                  
        ret                                                                     
.Lfunc_end0:                                                                    
        .size   _Z1fv, .Lfunc_end0-_Z1fv                                        
                                        // -- End function 

Expected result:

The four "str" instructions toward the end of the function should store d1, d2,
d3 and d4 to the stack; instead, d1 and d3 are each stored twice.

I tracked it down to how AArch64InstructionSelector::preISelLower() handles the
G_STORE in the following snippet:

  %11:fpr(s64), %12:fpr(s64) = G_UNMERGE_VALUES %5:fpr(<2 x s64>)
  %22:gpr(s64) = COPY %12:fpr(s64)
  G_STORE %22:gpr(s64), %17:gpr(p0) :: (store 8 into %ir.c + 24, align 16)

AArch64InstructionSelector::contractCrossBankCopyIntoStore() calls
llvm::getDefIgnoringCopies() to find the instruction that defines %22, and
gets G_UNMERGE_VALUES. It then replaces %22 with operand 0 of
G_UNMERGE_VALUES, i.e. %11. However, G_UNMERGE_VALUES has two outputs, and %22
actually comes from %12, which is operand 1.
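
To illustrate the mix-up, here is a minimal standalone sketch of the lookup.
It is a toy model under my own assumptions, not LLVM code: Inst, findDef,
buggySource, trackedSource and reportExample are hypothetical names, and
trackedSource is just one possible way to avoid the problem in this model.

```cpp
// Toy model of the register lookup described above. This is NOT LLVM code;
// every name below is hypothetical.

enum Opcode { COPY, UNMERGE, OTHER };

struct Inst {
  Opcode op;
  int numDefs;  // how many registers this instruction defines
  int defs[2];  // the defined virtual registers
  int src;      // the single source register, or -1 if there is none
};

// Return the index of the instruction that defines reg, or -1 if none does.
int findDef(const Inst *insts, int count, int reg) {
  for (int i = 0; i != count; ++i)
    for (int d = 0; d != insts[i].numDefs; ++d)
      if (insts[i].defs[d] == reg)
        return i;
  return -1;
}

// Buggy pattern: walk through COPYs to the defining instruction, then
// assume that its operand 0 is the value we started from.
int buggySource(const Inst *insts, int count, int reg) {
  int i = findDef(insts, count, reg);
  while (i != -1 && insts[i].op == COPY)
    i = findDef(insts, count, insts[i].src);
  return i == -1 ? -1 : insts[i].defs[0];
}

// Safer pattern: keep track of the register itself while walking the COPYs.
int trackedSource(const Inst *insts, int count, int reg) {
  int i = findDef(insts, count, reg);
  while (i != -1 && insts[i].op == COPY) {
    reg = insts[i].src;
    i = findDef(insts, count, reg);
  }
  return reg;
}

// The snippet from this report:
//   %11, %12 = G_UNMERGE_VALUES %5
//   %22 = COPY %12
int reportExample(bool buggy) {
  const Inst insts[] = {
      {UNMERGE, 2, {11, 12}, 5},
      {COPY, 1, {22, -1}, 12},
  };
  return buggy ? buggySource(insts, 2, 22) : trackedSource(insts, 2, 22);
}
```

In this model reportExample(true) yields 11 (the wrong register, matching the
duplicated store), while reportExample(false) yields 12.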

AArch64InstructionSelector::contractCrossBankCopyIntoStore() seems to assume
that each MIR instruction has at most one output, but that's not the case here.</pre>
        </div>
      </p>


    </body>
</html>