Bug ID:         21780 (http://llvm.org/bugs/show_bug.cgi?id=21780)
Summary:        [X86][AVX] Expansion of 256 bit vector loads fails to fold into shuffles
Product:        new-bugs
Version:        trunk
Hardware:       PC
OS:             Windows NT
Status:         NEW
Severity:       normal
Priority:       P
Component:      new bugs
Assignee:       unassignedbugs@nondot.org
Reporter:       llvm-dev@redking.me.uk
CC:             llvmbugs@cs.uiuc.edu
Classification: Unclassified

Follow-up to Bug 21710 '[X86][AVX] suboptimal expansion of 256 bit vector loads.' (RESOLVED FIXED, http://llvm.org/bugs/show_bug.cgi?id=21710)

Merging consecutive loads into a 256-bit ymm register now works well for simple cases, and the merged load also folds nicely into bitwise ops (as well as basic float ops such as fadd and fsub).
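
As a minimal sketch of the kind of case that already folds (hypothetical function, not from the original report; assumes clang trunk with AVX enabled):

__m256d vadd_d4_fold(__m256d bar, const double* ptr) {
  /* the four scalar loads merge into a single 256-bit load... */
  __m256d foo = (__m256d){ ptr[0], ptr[1], ptr[2], ptr[3] };
  /* ...and that load is expected to fold into the add, e.g.
     vaddpd (%rdi), %ymm0, %ymm0 */
  return foo + bar;
}
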
Vector shuffle optimizations, however, attempt to selectively load the individual lanes, and in doing so prevent the load from being folded into the shuffle.
e.g.

__m256d vsht_d4(__m256d foo) {
  return __builtin_shufflevector( foo, foo, 0, 0, 2, 2 );
}

define <4 x double> @_Z7vsht_d4Dv4_d(<4 x double> %foo) #1 {
  %1 = shufflevector <4 x double> %foo, <4 x double> undef, <4 x i32> <i32 0, i32 0, i32 2, i32 2>
  ret <4 x double> %1
}

vpermilpd $0, %ymm0, %ymm0 # ymm0 = ymm0[0,0,2,2]
retq

__m256d vsht_d4_fold(const double* ptr) {
  __m256d foo = (__m256d){ ptr[0], ptr[1], ptr[2], ptr[3] };
  return __builtin_shufflevector( foo, foo, 0, 0, 2, 2 );
}

define <4 x double> @_Z12vsht_d4_foldPKd(double* nocapture readonly %ptr) #0 {
  %1 = load double* %ptr, align 8, !tbaa !1
  %2 = insertelement <4 x double> undef, double %1, i32 0
  %3 = getelementptr inbounds double* %ptr, i64 2
  %4 = load double* %3, align 8, !tbaa !1
  %5 = insertelement <4 x double> %2, double %4, i32 2
  %6 = shufflevector <4 x double> %5, <4 x double> undef, <4 x i32> <i32 0, i32 0, i32 2, i32 2>
  ret <4 x double> %6
}

vmovsd (%rdi), %xmm0
vmovsd 16(%rdi), %xmm1
vinsertf128 $1, %xmm1, %ymm0, %ymm0
vpermilpd $0, %ymm0, %ymm0 # ymm0 = ymm0[0,0,2,2]
retq

Manually editing the IR so that all four lanes are loaded does permit the fold to occur:

define <4 x double> @_Z12vsht_d4_foldPKd(double* nocapture readonly %ptr) #0 {
  %1 = load double* %ptr, align 8, !tbaa !1
  %2 = insertelement <4 x double> undef, double %1, i32 0
  %3 = getelementptr inbounds double* %ptr, i64 1
  %4 = load double* %3, align 8, !tbaa !1
  %5 = insertelement <4 x double> %2, double %4, i32 1
  %6 = getelementptr inbounds double* %ptr, i64 2
  %7 = load double* %6, align 8, !tbaa !1
  %8 = insertelement <4 x double> %5, double %7, i32 2
  %9 = getelementptr inbounds double* %ptr, i64 3
  %10 = load double* %9, align 8, !tbaa !1
  %11 = insertelement <4 x double> %8, double %10, i32 3
  %12 = shufflevector <4 x double> %11, <4 x double> undef, <4 x i32> <i32 0, i32 0, i32 2, i32 2>
  ret <4 x double> %12
}

vpermilpd $0, (%rdi), %ymm0 # ymm0 = mem[0,0,2,2]
retq
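
For reference, a possible source-level workaround (a sketch only, not part of the original report; assumes the AVX intrinsics from immintrin.h) is to request the full-width load explicitly, which should allow the load to fold:

#include <immintrin.h>

__m256d vsht_d4_fold2(const double* ptr) {
  /* single unaligned 256-bit load instead of per-lane scalar loads */
  __m256d foo = _mm256_loadu_pd(ptr);
  /* expected, if the fold succeeds: vpermilpd $0, (%rdi), %ymm0 */
  return __builtin_shufflevector(foo, foo, 0, 0, 2, 2);
}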