<html>
    <head>
      <base href="https://llvm.org/bugs/" />
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW --- - Support __fp16 vectors"
   href="https://llvm.org/bugs/show_bug.cgi?id=23305">23305</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>Support __fp16 vectors
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>clang
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>-New Bugs
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedclangbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>ahmed.bougacha@gmail.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvmbugs@cs.uiuc.edu
          </td>
        </tr>

        <tr>
          <th>Classification</th>
          <td>Unclassified
          </td>
        </tr></table>
      <p>
        <div>
        <pre>__fp16 is a storage-only type, and there are two CodeGen variants:
  - soften to i16, promote using llvm.convert.to/from.fp16 (e.g., X86)
  - when LangOptions::NativeHalfType or HalfArgsAndReturns is set, use the LLVM
    "half" type, promote using fpext/fptrunc (e.g., AArch64)

In both cases, we don't do the right thing for vectors.

On X86, this:

    typedef __fp16 __attribute__((__ext_vector_type__(4))) v4f16;

    void foo(v4f16 *a, v4f16 *b, v4f16 *c) {
      *c = *a + *b;
    }

generates an integer add on the raw i16 bit patterns, which is very broken:

      %3 = add <4 x i16> %1, %2


This is because Sema::UsualUnaryConversions doesn't apply to VectorTypes (see
Sema::CheckVectorOperands), so we never try to promote to v4f32 (as we would
promote __fp16 to f32).
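
To make the breakage concrete, here is a small standalone C sketch (not clang
code) of what that integer add does to a single lane.  half_bits_to_float and
lane_add_as_i16 are hypothetical helper names, and the decoder assumes IEEE
binary16/binary32 encodings and leaves inf/NaN out of scope:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode an IEEE binary16 bit pattern into a float.
   Assumes float is IEEE binary32; inf/NaN are not handled in this sketch. */
float half_bits_to_float(uint16_t h) {
  uint32_t sign = (uint32_t)(h >> 15) & 1u;
  uint32_t expo = (uint32_t)(h >> 10) & 0x1fu;
  uint32_t mant = h & 0x3ffu;
  assert(expo != 0x1fu && "inf/NaN are out of scope for this sketch");
  float mag;
  if (expo == 0) {
    mag = (float)mant * 0x1p-24f;  /* zero and subnormals */
  } else {
    /* Rebias the exponent (15 -> 127) and widen the mantissa (10 -> 23 bits). */
    uint32_t bits = ((expo + 112u) << 23) | (mant << 13);
    memcpy(&mag, &bits, sizeof mag);
  }
  return sign ? -mag : mag;
}

/* What "add <4 x i16>" computes per lane: a wrapping 16-bit integer add of
   the two encodings, with no floating-point arithmetic involved. */
uint16_t lane_add_as_i16(uint16_t a, uint16_t b) {
  return (uint16_t)(a + b);
}
```

With both lanes holding 1.0 (encoded as 0x3C00), the integer add yields 0x7800,
which decodes to 32768.0 rather than the expected 2.0 (0x4000).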


Even if we decide to reject that code and never do the implicit promotion, the
alternative is also broken:

    typedef __fp16 __attribute__((__ext_vector_type__(4))) v4f16;
    typedef float __attribute__((__ext_vector_type__(4))) v4f32;

    void foo(v4f16 *a, v4f16 *b, v4f32 *c) {
      *c = __builtin_convertvector(*a, v4f32);
    }

This generates an unsigned-integer-to-float conversion of the raw i16 bits:

      %2 = uitofp <4 x i16> %1 to <4 x float>
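
As a rough standalone illustration (again not clang code), the following C
sketch contrasts the per-lane effect of that uitofp with a conversion that
decodes the half value first.  decode_half, convert_as_uitofp and
convert_as_fpext are hypothetical names; IEEE binary16/binary32 encodings are
assumed and inf/NaN are not handled:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: value encoded by an IEEE binary16 bit pattern.
   Assumes float is IEEE binary32; inf/NaN are out of scope here. */
float decode_half(uint16_t h) {
  uint32_t sign = (uint32_t)(h >> 15) & 1u;
  uint32_t expo = (uint32_t)(h >> 10) & 0x1fu;
  uint32_t mant = h & 0x3ffu;
  assert(expo != 0x1fu && "inf/NaN are out of scope for this sketch");
  float mag;
  if (expo == 0) {
    mag = (float)mant * 0x1p-24f;  /* zero and subnormals */
  } else {
    uint32_t bits = ((expo + 112u) << 23) | (mant << 13);
    memcpy(&mag, &bits, sizeof mag);
  }
  return sign ? -mag : mag;
}

/* Per-lane effect of the emitted uitofp: the encoding itself is treated as
   an unsigned integer and converted. */
void convert_as_uitofp(const uint16_t in[4], float out[4]) {
  for (int i = 0; i < 4; ++i)
    out[i] = (float)in[i];
}

/* Per-lane effect of an fpext-style conversion: the encoded value is kept. */
void convert_as_fpext(const uint16_t in[4], float out[4]) {
  for (int i = 0; i < 4; ++i)
    out[i] = decode_half(in[i]);
}
```

For a lane holding 1.0 (0x3C00), the uitofp-style conversion produces 15360.0,
while the fpext-style conversion produces 1.0.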


Even when "half" is used instead of i16 (AArch64, or after we migrate away from
the convert intrinsics), we generate IR without the promotion:

      %3 = fadd <4 x half> %1, %2

This relies on the backend to do the promotion, which has slightly different
semantics, because LLVM works at the instruction level and clang at the
expression level.  Consider:

    void foo(v4f16 *a, v4f16 *b, v4f16 *c) {
      *c = (*a + *b) + *c;
    }

Doing the promotion in clang means the intermediate result is a v4f32.  Doing
it in LLVM means the intermediate result is truncated back to v4f16, before
being extended again to v4f32.
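
The difference can be sketched for one lane in plain C, modelling half values
as floats that are rounded to half precision where a truncation would occur.
round_to_half and the two add3_* functions are hypothetical names; the sketch
assumes IEEE binary32 floats, round-to-nearest-even, and values inside half's
normal range (overflow and subnormals are not modelled):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: round a float to the nearest value with an 11-bit
   significand (as in IEEE binary16), ties to even, returned as a float. */
float round_to_half(float x) {
  float ax = x < 0.0f ? -x : x;
  assert(ax == 0.0f || (ax >= 0x1p-14f && ax <= 65504.0f));
  uint32_t u;
  memcpy(&u, &x, sizeof u);
  uint32_t lsb = (u >> 13) & 1u;  /* lowest mantissa bit that survives */
  u += 0xFFFu + lsb;              /* round to nearest, ties to even */
  u &= ~0x1FFFu;                  /* drop the 13 extra mantissa bits */
  memcpy(&x, &u, sizeof x);
  return x;
}

/* Expression-level promotion: (a + b) + c is evaluated in float and
   truncated to half once, at the store. */
float add3_promote_expression(float a, float b, float c) {
  return round_to_half((a + b) + c);
}

/* Instruction-level promotion: each fadd is extended, added, and truncated
   back to half before the next one. */
float add3_promote_each_op(float a, float b, float c) {
  float t = round_to_half(a + b);
  return round_to_half(t + c);
}
```

With a = 2048, b = 1 and c = 1 (all exactly representable in half), the
expression-level version returns 2050, while the per-instruction version
returns 2048, because 2049 is not representable and rounds back to 2048 twice.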

This can give different results, and it's probably best to mirror the scalar
clang behavior of promoting entire expressions.</pre>
        </div>
      </p>
    </body>
</html>