<html>

    <head>

      <base href="https://bugs.llvm.org/">

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW - [AArch64] Vector store of scalars produces sub-optimal code"

   href="https://bugs.llvm.org/show_bug.cgi?id=43460">43460</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>[AArch64] Vector store of scalars produces sub-optimal code

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>libraries

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>trunk

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>All

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>enhancement

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>Backend: AArch64

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>florian_hahn@apple.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>arnaud.degrandmaison@arm.com, llvm-bugs@lists.llvm.org, peter.smith@linaro.org, Ties.Stuij@arm.com

          </td>

        </tr></table>

      <p>

        <div>

        <pre>It looks like we fail to generate optimal code when a vector store writes a

vector assembled from scalars. Consider the examples below (also at

<a href="https://godbolt.org/z/hpv4S9">https://godbolt.org/z/hpv4S9</a>). I am not sure how common these cases actually

are; I just stumbled across this while looking at

<a href="http://lists.llvm.org/pipermail/llvm-dev/2019-September/135432.html">http://lists.llvm.org/pipermail/llvm-dev/2019-September/135432.html</a>.

define void @const_vec(<2 x i32>* %c)  {

  store <2 x i32> <i32 2, i32 3>, <2 x i32>* %c, align 16

  ret void

}

define void @const_split(<4 x i32>* %c)  {

entry:

  %0 = getelementptr inbounds <4 x i32>, <4 x i32>* %c, i64 0, i64 0

  store i32 1, i32* %0, align 4

  %1 = getelementptr <4 x i32>, <4 x i32>* %c, i64 0, i64 1

  store i32 2, i32* %1, align 4

  ret void

}

With llc -O3 -mtriple=aarch64, we generate the assembly below. For the vector

version, we miss that the constant can be materialized with mov + movk (as in

const_split) and instead load it from a constant pool in memory.

.LCPI0_0:

  .word 2 // 0x2

  .word 3 // 0x3

const_vec: // @const_vec

  adrp x8, .LCPI0_0

  ldr d0, [x8, :lo12:.LCPI0_0]

  str d0, [x0]

  ret

const_split: // @const_split

  mov x8, #1

  movk x8, #2, lsl #32

  str x8, [x0]

  ret
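
To spell out why mov + movk is enough here: the two adjacent i32 stores in
const_split are merged into one 64-bit store, and the merged value is simply
the two constants packed side by side. The Python sketch below is only an
illustration, not LLVM code. The helper names (mov, movk, pack_le_i32_pair)
are made up for this comment, it models only the 16-bit-immediate forms seen
in the listing above, and it assumes a little-endian target.

```python
# Illustrative model only (not LLVM code): how `mov x8, #1` followed by
# `movk x8, #2, lsl #32` builds the 64-bit value stored by const_split.
# Helper names are hypothetical; a little-endian target is assumed.

MASK64 = (1 << 64) - 1


def mov(imm16: int) -> int:
    """Model `mov xN, #imm16`: set the register to a 16-bit immediate."""
    assert 0 <= imm16 <= 0xFFFF
    return imm16


def movk(reg: int, imm16: int, shift: int) -> int:
    """Model `movk xN, #imm16, lsl #shift`: replace one 16-bit field, keep the rest."""
    assert 0 <= imm16 <= 0xFFFF and shift in (0, 16, 32, 48)
    cleared = reg & ~(0xFFFF << shift) & MASK64
    return cleared | (imm16 << shift)


def pack_le_i32_pair(elt0: int, elt1: int) -> int:
    """64-bit value whose little-endian bytes are elt0 followed by elt1."""
    data = elt0.to_bytes(4, "little") + elt1.to_bytes(4, "little")
    return int.from_bytes(data, "little")


# const_split: i32 1 at offset 0, i32 2 at offset 4.
x8 = movk(mov(1), 2, 32)
assert x8 == pack_le_i32_pair(1, 2) == 0x0000000200000001

# const_vec stores the constants 2 and 3; the same two-instruction
# pattern produces the bits that are currently loaded from .LCPI0_0.
d0_bits = movk(mov(2), 3, 32)
assert d0_bits == pack_le_i32_pair(2, 3)
```

For const_vec the value would still have to be stored from a general-purpose
register (as const_split does) or moved into d0; the sketch only shows that
the constant itself fits the mov/movk pattern.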

When storing two arbitrary i32 values, the vector version needs an extra fmov

and mov (lane insert) compared to the single stp of the split version.

define void @var_vec_2(<2 x i32>* %c, i32 %a, i32 %b)  {

  %ins1 = insertelement <2 x i32> undef, i32 %a, i32 0

  %ins2 = insertelement <2 x i32> %ins1, i32 %b, i32 1

  store <2 x i32> %ins2, <2 x i32>* %c, align 16

  ret void

}

define void @var_split(<4 x i32>* %c, i32 %a, i32 %b)  {

entry:

  %0 = getelementptr inbounds <4 x i32>, <4 x i32>* %c, i64 0, i64 0

  store i32 %a, i32* %0, align 4

  %1 = getelementptr <4 x i32>, <4 x i32>* %c, i64 0, i64 1

  store i32 %b, i32* %1, align 4

  ret void

}

var_vec_2: // @var_vec_2

  fmov s0, w1

  mov v0.s[1], w2

  str d0, [x0]

  ret

var_split: // @var_split

  stp w1, w2, [x0]

  ret</pre>
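        <pre>As a sanity check that the var_vec_2 and var_split sequences are
interchangeable, the sketch below models both stores on a byte buffer and
compares the result. It is an illustration only: the helper names are made up
for this comment, the register file is not modelled, and a little-endian
target is assumed.

```python
# Illustrative model only: both sequences write the same 8 bytes at [x0].
# Helper names are hypothetical; a little-endian target is assumed.


def stp_w(mem: bytearray, addr: int, w_a: int, w_b: int) -> None:
    """Model `stp wA, wB, [x0]`: two adjacent 32-bit stores."""
    mem[addr:addr + 4] = (w_a & 0xFFFFFFFF).to_bytes(4, "little")
    mem[addr + 4:addr + 8] = (w_b & 0xFFFFFFFF).to_bytes(4, "little")


def str_d_from_lanes(mem: bytearray, addr: int, lane0: int, lane1: int) -> None:
    """Model `fmov s0, w1; mov v0.s[1], w2; str d0, [x0]`."""
    # Lane 0 occupies the low 32 bits of d0, lane 1 the high 32 bits.
    d0 = (lane0 & 0xFFFFFFFF) | ((lane1 & 0xFFFFFFFF) << 32)
    mem[addr:addr + 8] = d0.to_bytes(8, "little")


split_mem, vec_mem = bytearray(16), bytearray(16)
stp_w(split_mem, 0, 0x11223344, 0xAABBCCDD)
str_d_from_lanes(vec_mem, 0, 0x11223344, 0xAABBCCDD)
assert split_mem == vec_mem
```

Since the memory effect is identical, the one-instruction stp form looks like
the better lowering for the vector case as well.</pre>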

        </div>

      </p>


    </body>

</html>