<html>
<head>
<base href="http://llvm.org/bugs/" />
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW --- - [X86][AVX] separate 2x128bit loads are not being merged into a single 256bit load."
href="http://llvm.org/bugs/show_bug.cgi?id=21709">21709</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>[X86][AVX] separate 2x128bit loads are not being merged into a single 256bit load.
</td>
</tr>
<tr>
<th>Product</th>
<td>libraries
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>Linux
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>normal
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>Backend: X86
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>andrea.dibiagio@gmail.com
</td>
</tr>
<tr>
<th>CC</th>
<td>llvmbugs@cs.uiuc.edu
</td>
</tr>
<tr>
<th>Classification</th>
<td>Unclassified
</td>
</tr></table>
<p>
<div>
<pre>Example 1.

///
__m256 unaligned_loads(const float *ptr) {
  __m128 lo = _mm_loadu_ps( ptr + 0 );
  __m128 hi = _mm_loadu_ps( ptr + 4 );
  return _mm256_insertf128_ps( _mm256_castps128_ps256( lo ), hi, 1);
}
///

clang -march=btver2 -O2 -S -o -

  vmovups (%rdi), %xmm0
  vinsertf128 $1, 16(%rdi), %ymm0, %ymm0
  retq
Ideally, it should generate a single unaligned 32B load:

  vmovups (%rdi), %ymm0

Basically, the backend should emit a single unaligned 32B load instead of the
sequence vmovups + vinsertf128 (+ folded load).

I think this should be done for AVX targets that have the 'FastUAMem' feature
and don't have 'SlowUAMem32'.

Also (probably a minor/separate issue?) the vinsertf128 may cause an exception
if alignment checking is enabled and the current privilege level is 3.
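
For reference, a minimal sketch (the function name is mine) showing that writing
the wide load directly should already give the desired single instruction on
such targets:

///
__m256 unaligned_load_wide(const float *ptr) {
  return _mm256_loadu_ps( ptr );
}
///

On an AVX target with 'FastUAMem' and without 'SlowUAMem32' this should lower to
the single 'vmovups (%rdi), %ymm0' shown above.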

Example 2.

///
__m256 aligned_loads(const float *ptr) {
  __m128 lo = _mm_load_ps( ptr + 0 );
  __m128 hi = _mm_load_ps( ptr + 4 );
  return _mm256_insertf128_ps( _mm256_castps128_ps256( lo ), hi, 1);
}
///

clang -march=btver2 -O2 -S -o -

  vmovaps (%rdi), %xmm0
  vinsertf128 $1, 16(%rdi), %ymm0, %ymm0
  retq

Again, this could be folded into:

  vmovaps (%rdi), %ymm0
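
Similarly, a minimal sketch (the function name is mine) of the aligned wide
load. Note that _mm256_load_ps assumes 32-byte alignment, which is stricter
than the 16-byte alignment implied by the two _mm_load_ps calls, so merging
into a vmovaps is only safe when the wider alignment is known (an unaligned
vmovups would always be safe):

///
__m256 aligned_load_wide(const float *ptr) {
  return _mm256_load_ps( ptr );
}
///

This should lower to the single 'vmovaps (%rdi), %ymm0' shown above.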

As a side note, the code from Example 1 is equivalent to the following:

///
__m256 unaligned_loads_v2(const float *ptr) {
  __m128 lo = _mm_loadu_ps( ptr + 0 );
  __m128 hi = _mm_loadu_ps( ptr + 4 );
  return (__m256) __builtin_shufflevector(lo, hi, 0, 1, 2, 3, 4, 5, 6, 7);
}
///

Here the call to the x86 intrinsic _mm256_insertf128_ps has been replaced with a
__builtin_shufflevector call.
The point is that we could teach the instruction combiner that a call to
_mm256_insertf128_ps is equivalent to a shuffle performing a `concat_vectors`:
the instruction combiner could replace that call with a shufflevector early on,
before we even reach the backend.

The codegen for 'unaligned_loads_v2' is currently still the same as for
'unaligned_loads':

  vmovups (%rdi), %xmm0
  vinsertf128 $1, 16(%rdi), %ymm0, %ymm0
  retq</pre>
</div>
</p>
<hr>
<span>You are receiving this mail because:</span>
<ul>
<li>You are on the CC list for the bug.</li>
</ul>
</body>
</html>