[llvm] [AArch64][CodeGen] Fix crash when fptrunc returns fp16 with +nofp attr (PR #81724)
Nashe Mncube via llvm-commits
llvm-commits at lists.llvm.org
Tue Feb 20 06:29:13 PST 2024
================
@@ -0,0 +1,20 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
+; RUN: llc -mtriple=aarch64 -mattr=-fp-armv8 -o - %s | FileCheck %s
+
+define half @f2h(float %a) {
+; CHECK-LABEL: f2h:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: str x30, [sp, #-16]! // 8-byte Folded Spill
+; CHECK-NEXT: .cfi_def_cfa_offset 16
+; CHECK-NEXT: .cfi_offset w30, -16
+; CHECK-NEXT: bl __gnu_f2h_ieee
+; CHECK-NEXT: and w0, w0, #0xffff
+; CHECK-NEXT: bl __gnu_h2f_ieee
+; CHECK-NEXT: bl __gnu_f2h_ieee
----------------
nasherm wrote:
I've done a bit of investigation. Our DAG looks like the following:
```
Optimized lowered selection DAG: %bb.0 'f2h:entry'
SelectionDAG has 11 nodes:
t0: ch,glue = EntryToken
t2: i32,ch = CopyFromReg t0, Register:i32 %0
t3: f32 = bitcast t2
t5: f16 = fp_round t3, TargetConstant:i64<0>
t6: i16 = bitcast t5
t7: i32 = any_extend t6
t9: ch,glue = CopyToReg t0, Register:i32 $w0, t7
t10: ch = AArch64ISD::RET_GLUE t9, Register:i32 $w0, t9:1
```
Then after some type legalization we get the following:
```
Legalized selection DAG: %bb.0 'f2h:entry'
SelectionDAG has 27 nodes:
t0: ch,glue = EntryToken
t36: ch,glue = callseq_start t0, TargetConstant:i64<0>, TargetConstant:i64<0>
t2: i32,ch = CopyFromReg t0, Register:i32 %0
t38: ch,glue = CopyToReg t36, Register:i32 $w0, t2
t39: ch,glue = AArch64ISD::CALL t38, TargetExternalSymbol:i64'__gnu_f2h_ieee', Register:i32 $w0, RegisterMask:Untyped, t38:1
t40: ch,glue = callseq_end t39, TargetConstant:i64<0>, TargetConstant:i64<0>, t39:1
t17: ch,glue = callseq_start t0, TargetConstant:i64<0>, TargetConstant:i64<0>
t41: i32,ch,glue = CopyFromReg t40, Register:i32 $w0, t40:1
t35: i32 = and t41, Constant:i32<65535>
t20: ch,glue = CopyToReg t17, Register:i32 $w0, t35
t23: ch,glue = AArch64ISD::CALL t20, TargetExternalSymbol:i64'__gnu_h2f_ieee', Register:i32 $w0, RegisterMask:Untyped, t20:1
t24: ch,glue = callseq_end t23, TargetConstant:i64<0>, TargetConstant:i64<0>, t23:1
t27: ch,glue = callseq_start t0, TargetConstant:i64<0>, TargetConstant:i64<0>
t25: i32,ch,glue = CopyFromReg t24, Register:i32 $w0, t24:1
t29: ch,glue = CopyToReg t27, Register:i32 $w0, t25
t31: ch,glue = AArch64ISD::CALL t29, TargetExternalSymbol:i64'__gnu_f2h_ieee', Register:i32 $w0, RegisterMask:Untyped, t29:1
t32: ch,glue = callseq_end t31, TargetConstant:i64<0>, TargetConstant:i64<0>, t31:1
t33: i32,ch,glue = CopyFromReg t32, Register:i32 $w0, t32:1
t9: ch,glue = CopyToReg t0, Register:i32 $w0, t33
t10: ch = AArch64ISD::RET_GLUE t9, Register:i32 $w0, t9:1
```
It appears that the generic legalizer is doing some extra work: it converts float->half->float before the final float->half. The first float->half call (`__gnu_f2h_ieee`) comes before the `fp_round`, preparing its operand, and its result is (unnecessarily) masked down to the low 16 bits by the `and` with 65535. The half->float call (`__gnu_h2f_ieee`) then deals with the result of `fp_round` and is a promotion. The last float->half call is the one that puzzles me, as I'd expect a single float->half call to be the only one necessary.
I think this could be avoided by writing a custom legalization for `fp_round` within `AArch64ISelLowering.cpp`, similar to my initial patches, but that would only work around the problem rather than address the root cause.
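As a rough sanity check that the extra round trip is redundant rather than wrong, the legalized call sequence can be modelled outside the compiler. The sketch below is illustrative only: `f2h_ieee` and `h2f_ieee` are hypothetical Python stand-ins for `__gnu_f2h_ieee` and `__gnu_h2f_ieee`, built on the `struct` module's IEEE half-precision format, and the inputs are Python doubles rather than true `float` values, which is an assumption of the model.

```python
import struct


def f2h_ieee(x: float) -> int:
    """Hypothetical stand-in for __gnu_f2h_ieee: round x to IEEE half
    and return the 16-bit pattern as an integer."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def h2f_ieee(bits: int) -> float:
    """Hypothetical stand-in for __gnu_h2f_ieee: widen a half bit pattern.
    Only the low 16 bits of the argument are used."""
    return struct.unpack("<e", struct.pack("<H", bits & 0xFFFF))[0]


def legalized_f2h(x: float) -> int:
    """Mirror the three libcalls seen in the legalized DAG."""
    h = f2h_ieee(x)     # bl __gnu_f2h_ieee
    h &= 0xFFFF         # and w0, w0, #0xffff
    f = h2f_ieee(h)     # bl __gnu_h2f_ieee
    return f2h_ieee(f)  # bl __gnu_f2h_ieee


# Finite sample values that fit in half precision, including one that
# rounds to a half subnormal (1e-7) and the largest finite half (65504.0).
SAMPLES = [0.0, 1.0, -2.5, 0.1, 1e-7, 65504.0]


def redundant_for_samples() -> bool:
    """True if the three-call sequence matches a single float->half call."""
    return all(legalized_f2h(x) == f2h_ieee(x) for x in SAMPLES)
```

For these samples the three-call sequence and a single conversion give the same bit pattern, because widening a half to float is exact and rounding it back cannot change it. That matches the expectation that one `__gnu_f2h_ieee` call should be enough.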
https://github.com/llvm/llvm-project/pull/81724