[PATCH] [AArch64] Fix the problem that f16_to_f32 and f32_to_f16 can't be selected
Hao Liu
Hao.Liu at arm.com
Wed Jan 15 02:40:01 PST 2014
Hi t.p.northover,
Hi Tim and reviewers,
This patch fixes the failure to select f16_to_f32 and f32_to_f16 nodes. Both are now matched to fcvt instructions.
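As a reference for what the selected fcvt is expected to compute, here is a minimal C++ sketch of widening an IEEE 754 half-precision bit pattern to a float. The helper name halfBitsToFloat is hypothetical and is not part of LLVM; it only illustrates the semantics of f16_to_f32 on an integer operand.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not LLVM code): widen an IEEE 754 binary16 bit
// pattern to the binary32 value it represents.
float halfBitsToFloat(std::uint16_t h) {
  std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
  std::uint32_t exp = (h >> 10) & 0x1Fu;
  std::uint32_t frac = h & 0x3FFu;
  std::uint32_t bits;
  if (exp == 0x1Fu) {
    // Infinity or NaN: all-ones exponent, keep the payload bits.
    bits = sign | 0x7F800000u | (frac << 13);
  } else if (exp != 0) {
    // Normal number: rebias the exponent from 15 to 127 (add 112).
    bits = sign | ((exp + 112u) << 23) | (frac << 13);
  } else if (frac == 0) {
    // Signed zero.
    bits = sign;
  } else {
    // Subnormal half: normalize so the implicit leading bit is set.
    std::uint32_t shift = 0;
    while ((frac & 0x400u) == 0) {
      frac <<= 1;
      ++shift;
    }
    frac &= 0x3FFu;
    bits = sign | ((113u - shift) << 23) | (frac << 13);
  }
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}
```

With this sketch, the test's initializer 12902 (0x3266) decodes to 0.199951171875.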
When llc runs at -O0, a SUBREG_TO_REG node from FPR16 to FPR32 is lowered into a copy from FPR16 to FPR16. This triggers an assertion failure in copyPhysReg, because there is no code for copying between two FPR16 registers. So I also added code that implements this copy, which avoids the failure at -O0.
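The idea behind the new copy can be shown with a toy model in which each 16-bit H register is the low half of the 32-bit S register with the same index. The names ToyFPRFile, superOfH, readH and copyH are hypothetical and only stand in for the real register info. The sketch assumes the upper 16 bits of the destination S register are not live, so overwriting them is harmless.

```cpp
#include <array>
#include <cstdint>

// Toy register file (illustrative only): S[n] is a 32-bit register and
// H[n] aliases its low 16 bits.
struct ToyFPRFile {
  std::array<std::uint32_t, 32> S{};

  // Read the 16-bit sub-register H[n].
  std::uint16_t readH(unsigned n) const {
    return static_cast<std::uint16_t>(S[n] & 0xFFFFu);
  }

  // Hypothetical stand-in for looking up the FPR32 super-register of
  // H[n]: in this model it is simply the S register with the same index.
  static unsigned superOfH(unsigned n) { return n; }

  // Copy H[dst] <- H[src] by moving the whole containing S register.
  // The upper 16 bits of S[dst] are overwritten as well.
  void copyH(unsigned dst, unsigned src) {
    S[superOfH(dst)] = S[superOfH(src)];
  }
};
```

After copyH(5, 3), readH(5) returns the same 16 bits as readH(3), which is all an FPR16 copy has to guarantee.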
Review, please.
Thanks,
-Hao
http://llvm-reviews.chandlerc.com/D2551
Files:
lib/Target/AArch64/AArch64InstrInfo.cpp
lib/Target/AArch64/AArch64InstrNEON.td
test/CodeGen/AArch64/fp16_fp32_convert.ll
Index: lib/Target/AArch64/AArch64InstrInfo.cpp
===================================================================
--- lib/Target/AArch64/AArch64InstrInfo.cpp
+++ lib/Target/AArch64/AArch64InstrInfo.cpp
@@ -132,6 +132,16 @@
.addImm(16);
return;
}
+ } else if (AArch64::FPR16RegClass.contains(DestReg, SrcReg)) {
+ // Copy between two FPR16 registers via their FPR32 super-registers.
+ const TargetRegisterInfo *TRI = &getRegisterInfo();
+ unsigned Dst = TRI->getMatchingSuperReg(DestReg, AArch64::sub_16,
+ &AArch64::FPR32RegClass);
+ unsigned Src = TRI->getMatchingSuperReg(SrcReg, AArch64::sub_16,
+ &AArch64::FPR32RegClass);
+ BuildMI(MBB, I, DL, get(AArch64::FMOVss), Dst)
+ .addReg(Src);
+ return;
} else {
CopyPhysRegTuple(MBB, I, DL, DestReg, SrcReg);
return;
Index: lib/Target/AArch64/AArch64InstrNEON.td
===================================================================
--- lib/Target/AArch64/AArch64InstrNEON.td
+++ lib/Target/AArch64/AArch64InstrNEON.td
@@ -8848,6 +8848,12 @@
// Patterns for handling half-precision values
//
+// Convert between f16 value and f32 value
+def : Pat<(f32 (f16_to_f32 (i32 GPR32:$Rn))),
+ (FCVTsh (EXTRACT_SUBREG (FMOVsw $Rn), sub_16))>;
+def : Pat<(i32 (f32_to_f16 (f32 FPR32:$Rn))),
+ (FMOVws (SUBREG_TO_REG (i64 0), (f16 (FCVThs $Rn)), sub_16))>;
+
// Convert f16 value coming in as i16 value to f32
def : Pat<(f32 (f16_to_f32 (i32 (and (i32 GPR32:$Rn), 65535)))),
(FCVTsh (EXTRACT_SUBREG (FMOVsw GPR32:$Rn), sub_16))>;
Index: test/CodeGen/AArch64/fp16_fp32_convert.ll
===================================================================
--- /dev/null
+++ test/CodeGen/AArch64/fp16_fp32_convert.ll
@@ -0,0 +1,27 @@
+; RUN: llc -march=aarch64 -mattr=+neon < %s | FileCheck %s
+; RUN: llc -march=aarch64 -mattr=+neon -O0 < %s
+
+@x = global i16 12902
+@y = global i16 0
+
+; Check that f16_to_f32 and f32_to_f16 can be selected correctly.
+; If using -O0, there will be a copy of two FPR16 registers in copyPhysReg.
+; Make sure the compiler won't crash.
+define void @foo() nounwind {
+; CHECK-LABEL: foo:
+entry:
+ %0 = load i16* @x, align 2
+ %1 = load i16* @y, align 2
+ %2 = tail call float @llvm.convert.from.fp16(i16 %0)
+; CHECK: fcvt {{s[0-9]+}}, {{h[0-9]+}}
+ %3 = tail call float @llvm.convert.from.fp16(i16 %1)
+; CHECK: fcvt {{s[0-9]+}}, {{h[0-9]+}}
+ %4 = fadd float %2, %3
+ %5 = tail call i16 @llvm.convert.to.fp16(float %4)
+; CHECK: fcvt {{h[0-9]+}}, {{s[0-9]+}}
+ store i16 %5, i16* @x, align 2
+ ret void
+}
+
+declare float @llvm.convert.from.fp16(i16) nounwind readnone
+declare i16 @llvm.convert.to.fp16(float) nounwind readnone
\ No newline at end of file
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D2551.1.patch
Type: text/x-patch
Size: 2817 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140115/e4cea80c/attachment.bin>