[LLVMbugs] [Bug 23305] New: Support __fp16 vectors

Tue Apr 21 15:22:49 PDT 2015

https://llvm.org/bugs/show_bug.cgi?id=23305

            Bug ID: 23305
           Summary: Support __fp16 vectors
           Product: clang
           Version: trunk
          Hardware: PC
                OS: All
            Status: NEW
          Severity: normal
          Priority: P
         Component: -New Bugs
          Assignee: unassignedclangbugs at nondot.org
          Reporter: ahmed.bougacha at gmail.com
                CC: llvmbugs at cs.uiuc.edu
    Classification: Unclassified

__fp16 is a storage-only type, and there are two CodeGen variants:
  - soften to i16, promote using llvm.convert.to/from.fp16 (e.g., X86)
  - when LangOptions::NativeHalfType or HalfArgsAndReturns is set, use the LLVM
"half" type, promote using fpext/fptrunc (e.g., AArch64)

In both cases, we don't do the right thing for arithmetic and conversions on
__fp16 vectors.

On X86, this:

    typedef __fp16 __attribute__((__ext_vector_type__(4))) v4f16;

    void foo(v4f16 *a, v4f16 *b, v4f16 *c) {
      *c = *a + *b;
    }

generates this clearly broken IR, an integer add on the raw i16 bit patterns:

      %3 = add <4 x i16> %1, %2

This is because Sema::UsualUnaryConversions is not applied to vector types (see
Sema::CheckVectorOperands), so we never try to promote to v4f32 (as we would
promote a scalar __fp16 to float).

Even if we decide to reject that code and never do the implicit promotion, the
alternative (an explicit conversion) is also broken:

    typedef __fp16 __attribute__((__ext_vector_type__(4))) v4f16;
    typedef float __attribute__((__ext_vector_type__(4))) v4f32;

    void foo(v4f16 *a, v4f16 *b, v4f16 *c) {
      *c = __builtin_convertvector(*a, v4f32);
    }

Generates a uitofp, which interprets the raw i16 bit patterns as unsigned
integers:

      %2 = uitofp <4 x i16> %1 to <4 x float>

Even when "half" is used instead of i16 (AArch64, or after we migrate away from
the convert intrinsics), we generate IR without the promotion:

      %3 = fadd <4 x half> %1, %2

This relies on the backend to do the promotion.
However, this has slightly different semantics, because LLVM works at the
instruction level and clang at the expression level.  Consider:

    void foo(v4f16 *a, v4f16 *b, v4f16 *c) {
      *c = (*a + *b) + *c;
    }

Doing the promotion in clang means the intermediate result is a v4f32.  Doing
it in LLVM means the intermediate result is truncated back to v4f16, before
being extended again to v4f32.

This can give different results, and it's probably best to mirror the scalar
clang behavior of promoting entire expressions.
