<html>

    <head>

      <base href="https://bugs.llvm.org/">

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW - Poor code generation for NEON unaligned loads/stores"

   href="https://bugs.llvm.org/show_bug.cgi?id=39414">39414</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>Poor code generation for NEON unaligned loads/stores

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>libraries

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>7.0

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>Windows NT

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>normal

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>Backend: ARM

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>fabiang@radgametools.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>llvm-bugs@lists.llvm.org

          </td>

        </tr></table>

      <p>

        <div>

        <pre>Sample repro on Godbolt:

<a href="https://godbolt.org/z/itWjaD">https://godbolt.org/z/itWjaD</a>

Here's the (C++) code verbatim:

----

#include <arm_neon.h>

typedef uint32_t U32unalign __attribute__((aligned(1)));

void f_unalign(void *p, uint8x8_t v)

{

    vst1_lane_u32((U32unalign *) p, vreinterpret_u32_u8(v), 0);

}

void f_align(void *p, uint8x8_t v)

{

    vst1_lane_u32((uint32_t *) p, vreinterpret_u32_u8(v), 0);

}

uint8x8_t g_unalign(const void *p)

{

    return vld1_dup_u32((U32unalign *)p);

}

uint8x8_t g_align(const void *p)

{

    return vld1_dup_u32((uint32_t *)p);

}

----

clang -target armv7a-none-eabi -O2 -S produces:

----

f_unalign(void*, __simd64_uint8_t):

        vmov    d16, r2, r3

        vmov.32 r1, d16[0]

        strb    r1, [r0]

        lsr     r2, r1, #24

        strb    r2, [r0, #3]

        lsr     r2, r1, #16

        lsr     r1, r1, #8

        strb    r2, [r0, #2]

        strb    r1, [r0, #1]

        bx      lr

f_align(void*, __simd64_uint8_t):

        vmov    d16, r2, r3

        vst1.32 {d16[0]}, [r0:32]

        bx      lr

g_unalign(void const*):

        ldrb    r1, [r0]

        ldrb    r2, [r0, #1]

        ldrb    r3, [r0, #2]

        ldrb    r0, [r0, #3]

        orr     r1, r1, r2, lsl #8

        orr     r0, r3, r0, lsl #8

        orr     r0, r1, r0, lsl #16

        vdup.32 d16, r0

        vmov    r0, r1, d16

        bx      lr

g_align(void const*):

        vld1.32 {d16[]}, [r0:32]

        vmov    r0, r1, d16

        bx      lr

----

The code I would expect to see for the unaligned variants is:

----

f_unalign(void*, __simd64_uint8_t):

        vmov    d16, r2, r3

        vst1.32 {d16[0]}, [r0] ; same as f_align except for :32 alignment

specifier

        bx      lr

g_unalign(void const*):

        vld1.32 {d16[]}, [r0] ; same here

        vmov    r0, r1, d16

        bx      lr

----

This is especially awkward with the NEON intrinsics because the single-lane

variants of vst1.32 (or .16) can be used to either:

- Store a single 32-bit lane (the intrinsics are OK for this)

- Store contiguous 32 bits (or 16 bits) of a narrower type, which the

intrinsics can't really express, requiring awkward workarounds like the

"U32unalign" construction above.</pre>

        </div>

      </p>

      <hr>

      <span>You are receiving this mail because:</span>

      <ul>

          <li>You are on the CC list for the bug.</li>

      </ul>

    </body>

</html>