[PATCH] D40112: [CodeGen][X86] Fix handling of __fp16 vectors

Akira Hatanaka via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Nov 15 16:46:49 PST 2017


ahatanak created this revision.
Herald added a subscriber: javed.absar.

IRGen for __fp16 vectors on X86 is currently completely broken. For example when the following code is compiled:

  half4 hv0, hv1, hv2; // these are vectors of __fp16.
  
  void foo221() {
    hv0 = hv1 + hv2;
  }

clang generates the following IR, in which two i16 values are added:

  @hv1 = common global <4 x i16> zeroinitializer, align 8
  @hv2 = common global <4 x i16> zeroinitializer, align 8
  @hv0 = common global <4 x i16> zeroinitializer, align 8
  
  define void @foo221() {
  entry:
    %0 = load <4 x i16>, <4 x i16>* @hv1, align 8
    %1 = load <4 x i16>, <4 x i16>* @hv2, align 8
    %add = add <4 x i16> %0, %1
    store <4 x i16> %add, <4 x i16>* @hv0, align 8
    ret void
  }

To fix IRGen for __fp16 vectors, this patch uses the code committed in r314056, which modified clang to promote and truncate __fp16 vectors to and from float vectors in the AST. Also, as the first step toward doing away with the __fp16 conversion intrinsics such as @llvm.convert.to.fp16 (see http://lists.llvm.org/pipermail/llvm-dev/2014-July/074689.html),  I made changes to IRGen for __fp16 scalars so that fpext/fptrunc instructions are emitted instead of the __fp16 conversion intrinsics IRGen currently emits. This fixes another IRGen bug where a short value is assigned to an __fp16 variable without any integer-to-floating-point conversion, as shown in the following example:

C code
------

  __fp16 a;
  short b;
  
  void foo1() {
    a = b;
  }



generated IR
------------

  @b = common global i16 0, align 2
  @a = common global i16 0, align 2
  
  define void @foo1() #0 {
  entry:
    %0 = load i16, i16* @b, align 2
    store i16 %0, i16* @a, align 2
    ret void
  }

I haven't spent too much time inspecting the code the X86 backend emits, but the code I've seen so far seems at least functionally correct (although it doesn't look very efficient since the backend scalarizes __fp16 vectors).


https://reviews.llvm.org/D40112

Files:
  include/clang/Basic/TargetInfo.h
  lib/Basic/Targets/AArch64.h
  lib/Basic/Targets/ARM.h
  lib/Basic/Targets/X86.h
  lib/CodeGen/CGExprConstant.cpp
  lib/CodeGen/CGExprScalar.cpp
  lib/CodeGen/CodeGenTypes.cpp
  lib/Sema/SemaExpr.cpp
  test/CodeGen/fp16-ops.c
  test/CodeGen/fp16vec-ops.c
  test/CodeGenCXX/float16-declarations.cpp
  test/CodeGenCXX/fp16-mangle.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D40112.123100.patch
Type: text/x-patch
Size: 19716 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20171116/02fb5475/attachment-0001.bin>


More information about the cfe-commits mailing list