<html>
<head>
<base href="https://llvm.org/bugs/" />
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW --- - Inefficient code for fp16 vectors"
href="https://llvm.org/bugs/show_bug.cgi?id=27222">27222</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>Inefficient code for fp16 vectors
</td>
</tr>
<tr>
<th>Product</th>
<td>new-bugs
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>Linux
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>normal
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>new bugs
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>pirama@google.com
</td>
</tr>
<tr>
<th>CC</th>
<td>llvm-bugs@lists.llvm.org, srhines@google.com
</td>
</tr>
<tr>
<th>Classification</th>
<td>Unclassified
</td>
</tr></table>
<p>
<div>
<pre>We generate inefficient code for half vectors on some architectures. Consider
the following IR:

define void @add_h(<4 x half>* %a, <4 x half>* %b) {
entry:
  %x = load <4 x half>, <4 x half>* %a, align 8
  %y = load <4 x half>, <4 x half>* %b, align 8
  %0 = fadd <4 x half> %x, %y
  store <4 x half> %0, <4 x half>* %a
  ret void
}

LLVM currently splits and scalarizes these vectors. In other words, it breaks
the <4 x half> into four individual half values and operates on each of them
separately. This prevents the backend from selecting vector load and vector
conversion instructions. The generated code has repeated 16-bit loads,
conversions to fp32, additions, conversions back to fp16, and 16-bit stores.
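
Conceptually, the scalarized form corresponds to IR like the following (a
hand-written illustration of the per-lane splitting, not actual compiler
output; only lane 0 is shown):

  %x0 = extractelement <4 x half> %x, i32 0   ; lane 0 of %x
  %y0 = extractelement <4 x half> %y, i32 0   ; lane 0 of %y
  %x0.f32 = fpext half %x0 to float           ; widen fp16 -> fp32
  %y0.f32 = fpext half %y0 to float
  %sum0 = fadd float %x0.f32, %y0.f32         ; scalar fp32 add
  %r0 = fptrunc float %sum0 to half           ; narrow fp32 -> fp16
  ; ...the same sequence is repeated for lanes 1, 2 and 3...
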
Here's the code generated for ARM32:
ldrh r4, [r1, #6]
ldrh r3, [r0, #6]
ldrh r12, [r1]
ldrh r2, [r0, #4]
ldrh lr, [r0, #2]
vmov s0, r4
ldrh r4, [r1, #2]
ldrh r1, [r1, #4]
vmov s2, r3
ldrh r3, [r0]
vmov s6, r2
vmov s10, lr
vmov s12, r12
vcvtb.f32.f16 s0, s0
vcvtb.f32.f16 s2, s2
vadd.f32 s0, s2, s0
vmov s4, r1
vmov s8, r4
vmov s14, r3
vcvtb.f32.f16 s4, s4
vcvtb.f32.f16 s6, s6
vcvtb.f32.f16 s2, s8
vcvtb.f32.f16 s8, s10
vcvtb.f32.f16 s10, s12
vcvtb.f32.f16 s12, s14
vcvtb.f16.f32 s0, s0
vadd.f32 s4, s6, s4
vadd.f32 s2, s8, s2
vadd.f32 s6, s12, s10
vmov r1, s0
vcvtb.f16.f32 s4, s4
vcvtb.f16.f32 s0, s2
vcvtb.f16.f32 s2, s6
strh r1, [r0, #6]
vmov r1, s4
strh r1, [r0, #4]
vmov r1, s0
strh r1, [r0, #2]
vmov r1, s2
strh r1, [r0]

In comparison, the same IR compiles to the following on AArch64:
ldr d0, [x1]
ldr d1, [x0]
fcvtl v0.4s, v0.4h
fcvtl v1.4s, v1.4h
fadd v0.4s, v1.4s, v0.4s
fcvtn v0.4h, v0.4s
str d0, [x0]
ret
.Lfunc_end0:

This happens on architectures whose LLVM backends don't natively support half
(such as x86, x86_64, and ARM32).</pre>
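<p>The two listings above can be regenerated with llc along these lines (a
sketch: fp16-vec.ll is a placeholder file name holding the IR above, and the
exact -mattr needed for the vcvtb forms may differ by subtarget):</p>
<pre># ARM32: +fp16 enables the half-precision conversion instructions (vcvtb)
llc -O2 -mtriple=armv7-linux-gnueabihf -mattr=+fp16 fp16-vec.ll -o -

# AArch64: fcvtl/fcvtn are available by default
llc -O2 -mtriple=aarch64-linux-gnu fp16-vec.ll -o -</pre>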
</div>
</p>
<hr>
<span>You are receiving this mail because:</span>
<ul>
<li>You are on the CC list for the bug.</li>
</ul>
</body>
</html>