[PATCH] [AArch64] Fix failure to select f16_to_f32 and f32_to_f16

Wed Jan 15 02:40:01 PST 2014

Hi t.p.northover,

Hi Tim and reviewers,

This patch fixes the failure to select f16_to_f32 and f32_to_f16 nodes, which should be matched to fcvt instructions.
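
For reviewers who want a reference for what the f16_to_f32 node is expected to compute, here is a minimal sketch of the half-to-single conversion in plain C++. It is not code from this patch; the name halfBitsToFloat is hypothetical, and the function only models the IEEE 754 binary16 to binary32 widening that fcvt performs on the bit pattern.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical reference model (not an LLVM API): widen an IEEE 754
// binary16 bit pattern to a binary32 value.
float halfBitsToFloat(uint16_t h) {
  uint32_t sign = static_cast<uint32_t>(h & 0x8000u) << 16;
  uint32_t exp = (h >> 10) & 0x1Fu;   // 5-bit biased exponent (bias 15)
  uint32_t mant = h & 0x3FFu;         // 10-bit fraction
  uint32_t bits;

  if (exp == 0x1Fu) {
    // Infinity or NaN: all-ones exponent, fraction moved to the top bits.
    bits = sign | 0x7F800000u | (mant << 13);
  } else if (exp != 0) {
    // Normal number: rebias the exponent from 15 to 127 (add 112).
    bits = sign | ((exp + 112u) << 23) | (mant << 13);
  } else if (mant == 0) {
    // Signed zero.
    bits = sign;
  } else {
    // Subnormal half: normalize so the implicit leading one appears.
    unsigned shift = 0;
    while ((mant & 0x400u) == 0) {
      mant <<= 1;
      ++shift;
    }
    mant &= 0x3FFu;
    bits = sign | ((113u - shift) << 23) | (mant << 13);
  }

  float f;
  std::memcpy(&f, &bits, sizeof f);   // reinterpret the bits as a float
  return f;
}
```

With this model, the initial value of @x in the test (12902, i.e. 0x3266) widens to 0.199951171875.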

When llc runs at -O0, a SUBREG_TO_REG node from FPR16 to FPR32 is lowered into a copy from FPR16 to FPR16. This triggers an assertion failure in copyPhysReg, because it has no code to copy between two FPR16 registers. So I also added code that implements this copy, to avoid the failure at -O0.
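
The idea behind the new copy can be illustrated with a toy model, assuming the H view of a register is the low 16 bits of its S view. ToyFPRegs, fmovS and copyH are hypothetical names used only for this sketch; they are not LLVM or AArch64 backend APIs.

```cpp
#include <array>
#include <cstdint>

// Toy register file: each FP register is modelled by its 32-bit S view.
// The H view of register n is assumed to be the low 16 bits of the S view.
struct ToyFPRegs {
  std::array<uint32_t, 32> s{};

  uint16_t readH(unsigned n) const {
    return static_cast<uint16_t>(s[n] & 0xFFFFu);
  }

  // Stand-in for "fmov Sd, Sn": copies all 32 bits.
  void fmovS(unsigned dst, unsigned src) { s[dst] = s[src]; }

  // An H-to-H copy expressed as a copy of the matching S registers.
  // The low 16 bits arrive intact; the upper 16 bits of dst are
  // overwritten too, which this model treats as don't-care.
  void copyH(unsigned dst, unsigned src) { fmovS(dst, src); }
};
```

Copying the wider register is enough here because the 16-bit value of interest always travels with it.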

Please review.

Thanks,
-Hao

http://llvm-reviews.chandlerc.com/D2551

Files:
  lib/Target/AArch64/AArch64InstrInfo.cpp
  lib/Target/AArch64/AArch64InstrNEON.td
  test/CodeGen/AArch64/fp16_fp32_convert.ll

Index: lib/Target/AArch64/AArch64InstrInfo.cpp
===================================================================
--- lib/Target/AArch64/AArch64InstrInfo.cpp
+++ lib/Target/AArch64/AArch64InstrInfo.cpp
@@ -132,6 +132,16 @@
         .addImm(16);
       return;
     }
+  } else if (AArch64::FPR16RegClass.contains(DestReg, SrcReg)) {
+    // Copy between two FPR16 registers via their FPR32 super-registers.
+    const TargetRegisterInfo *TRI = &getRegisterInfo();
+    unsigned Dst = TRI->getMatchingSuperReg(DestReg, AArch64::sub_16,
+                                            &AArch64::FPR32RegClass);
+    unsigned Src = TRI->getMatchingSuperReg(SrcReg, AArch64::sub_16,
+                                            &AArch64::FPR32RegClass);
+    BuildMI(MBB, I, DL, get(AArch64::FMOVss), Dst)
+      .addReg(Src);
+    return;
   } else {
     CopyPhysRegTuple(MBB, I, DL, DestReg, SrcReg);
     return;
Index: lib/Target/AArch64/AArch64InstrNEON.td
===================================================================
--- lib/Target/AArch64/AArch64InstrNEON.td
+++ lib/Target/AArch64/AArch64InstrNEON.td
@@ -8848,6 +8848,12 @@
 // Patterns for handling half-precision values
 //
 
+// Convert between f16 value and f32 value
+def : Pat<(f32 (f16_to_f32 (i32 GPR32:$Rn))),
+          (FCVTsh (EXTRACT_SUBREG (FMOVsw $Rn), sub_16))>;
+def : Pat<(i32 (f32_to_f16 (f32 FPR32:$Rn))),
+          (FMOVws (SUBREG_TO_REG (i64 0), (f16 (FCVThs $Rn)), sub_16))>;
+
 // Convert f16 value coming in as i16 value to f32
 def : Pat<(f32 (f16_to_f32 (i32 (and (i32 GPR32:$Rn), 65535)))),
           (FCVTsh (EXTRACT_SUBREG (FMOVsw GPR32:$Rn), sub_16))>;
Index: test/CodeGen/AArch64/fp16_fp32_convert.ll
===================================================================
--- /dev/null
+++ test/CodeGen/AArch64/fp16_fp32_convert.ll
@@ -0,0 +1,27 @@
+; RUN: llc -march=aarch64 -mattr=+neon < %s | FileCheck %s
+; RUN: llc -march=aarch64 -mattr=+neon -O0 < %s
+
+@x = global i16 12902
+@y = global i16 0
+
+; Check that f16_to_f32 and f32_to_f16 can be selected correctly.
+; If using -O0, there will be a copy of two FPR16 registers in copyPhysReg.
+; Make sure the compiler won't crash.
+define void @foo() nounwind {
+; CHECK-LABEL: foo:
+entry:
+  %0 = load i16* @x, align 2
+  %1 = load i16* @y, align 2
+  %2 = tail call float @llvm.convert.from.fp16(i16 %0)
+; CHECK: fcvt {{s[0-9]+}}, {{h[0-9]+}}
+  %3 = tail call float @llvm.convert.from.fp16(i16 %1)
+; CHECK: fcvt {{s[0-9]+}}, {{h[0-9]+}}
+  %4 = fadd float %2, %3
+  %5 = tail call i16 @llvm.convert.to.fp16(float %4)
+; CHECK: fcvt {{h[0-9]+}}, {{s[0-9]+}}
+  store i16 %5, i16* @x, align 2
+  ret void
+}
+
+declare float @llvm.convert.from.fp16(i16) nounwind readnone
+declare i16 @llvm.convert.to.fp16(float) nounwind readnone
\ No newline at end of file