[llvm] [AArch64][CodeGen] Fix crash when fptrunc returns fp16 with +nofp attr (PR #81724)

Tue Feb 20 06:29:13 PST 2024

================
@@ -0,0 +1,20 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
+; RUN: llc -mtriple=aarch64 -mattr=-fp-armv8 -o - %s | FileCheck %s
+
+define half @f2h(float %a) {
+; CHECK-LABEL: f2h:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
+; CHECK-NEXT:    .cfi_def_cfa_offset 16
+; CHECK-NEXT:    .cfi_offset w30, -16
+; CHECK-NEXT:    bl __gnu_f2h_ieee
+; CHECK-NEXT:    and w0, w0, #0xffff
+; CHECK-NEXT:    bl __gnu_h2f_ieee
+; CHECK-NEXT:    bl __gnu_f2h_ieee
----------------
nasherm wrote:

I've done a bit of investigation. Before type legalization, our DAG looks like the following:
```
Optimized lowered selection DAG: %bb.0 'f2h:entry'                                                                   
SelectionDAG has 11 nodes:                                                                                           
  t0: ch,glue = EntryToken                                                                                           
            t2: i32,ch = CopyFromReg t0, Register:i32 %0                                                             
          t3: f32 = bitcast t2                                                                                       
        t5: f16 = fp_round t3, TargetConstant:i64<0>                                                                 
      t6: i16 = bitcast t5                                                                                           
    t7: i32 = any_extend t6                                                                                          
  t9: ch,glue = CopyToReg t0, Register:i32 $w0, t7                                                                   
  t10: ch = AArch64ISD::RET_GLUE t9, Register:i32 $w0, t9:1    
```

Then after some type legalization we get the following:

```
Legalized selection DAG: %bb.0 'f2h:entry'                                                                           
SelectionDAG has 27 nodes:                                                                                           
  t0: ch,glue = EntryToken                                                                                           
    t36: ch,glue = callseq_start t0, TargetConstant:i64<0>, TargetConstant:i64<0>                                    
    t2: i32,ch = CopyFromReg t0, Register:i32 %0                                                                     
  t38: ch,glue = CopyToReg t36, Register:i32 $w0, t2                                                                 
  t39: ch,glue = AArch64ISD::CALL t38, TargetExternalSymbol:i64'__gnu_f2h_ieee', Register:i32 $w0, RegisterMask:Untyped, t38:1
  t40: ch,glue = callseq_end t39, TargetConstant:i64<0>, TargetConstant:i64<0>, t39:1
    t17: ch,glue = callseq_start t0, TargetConstant:i64<0>, TargetConstant:i64<0>
      t41: i32,ch,glue = CopyFromReg t40, Register:i32 $w0, t40:1
    t35: i32 = and t41, Constant:i32<65535>
  t20: ch,glue = CopyToReg t17, Register:i32 $w0, t35
  t23: ch,glue = AArch64ISD::CALL t20, TargetExternalSymbol:i64'__gnu_h2f_ieee', Register:i32 $w0, RegisterMask:Untyped, t20:1
  t24: ch,glue = callseq_end t23, TargetConstant:i64<0>, TargetConstant:i64<0>, t23:1
    t27: ch,glue = callseq_start t0, TargetConstant:i64<0>, TargetConstant:i64<0>
    t25: i32,ch,glue = CopyFromReg t24, Register:i32 $w0, t24:1
  t29: ch,glue = CopyToReg t27, Register:i32 $w0, t25
  t31: ch,glue = AArch64ISD::CALL t29, TargetExternalSymbol:i64'__gnu_f2h_ieee', Register:i32 $w0, RegisterMask:Untyped, t29:1
  t32: ch,glue = callseq_end t31, TargetConstant:i64<0>, TargetConstant:i64<0>, t31:1
    t33: i32,ch,glue = CopyFromReg t32, Register:i32 $w0, t32:1
  t9: ch,glue = CopyToReg t0, Register:i32 $w0, t33
  t10: ch = AArch64ISD::RET_GLUE t9, Register:i32 $w0, t9:1
```

It appears that the generic legalizer is doing some extra work where it converts float->half->float. The first float->half conversion happens before the `fp_round`: it prepares the operand and (unnecessarily) masks the result down to its low 16 bits (the `and` with 65535). The half->float conversion that follows deals with the result of the `fp_round` and is a promotion. The last float->half is the one that puzzles me, as I'd expect it to be the only call necessary.
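
To illustrate why the extra round trip looks redundant, here is a rough Python model of the three calls in the legalized DAG above. It is only a sketch: `struct`'s `e` format stands in for `__gnu_f2h_ieee` and `__gnu_h2f_ieee` (an assumption for illustration, not how the runtime library implements them), Python floats are doubles rather than `float`, and `legalized_f2h` is a hypothetical name.

```python
import struct

def f2h(x: float) -> int:
    """Stand-in for __gnu_f2h_ieee: round to IEEE half, return the 16-bit pattern."""
    return struct.unpack("<H", struct.pack("<e", x))[0]

def h2f(bits: int) -> float:
    """Stand-in for __gnu_h2f_ieee: widen a 16-bit half pattern (exact)."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]

def legalized_f2h(a: float) -> int:
    """Mirror the call sequence from the legalized DAG (hypothetical helper)."""
    r = f2h(a)     # first  bl __gnu_f2h_ieee
    r &= 0xFFFF    # and w0, w0, #0xffff
    f = h2f(r)     # bl __gnu_h2f_ieee
    return f2h(f)  # second bl __gnu_f2h_ieee

# For finite inputs in half range, the sequence matches a single conversion.
for a in (1.0, 0.1, -2.5, 65504.0, 1e-7):
    assert legalized_f2h(a) == f2h(a)
```

In this model the half->float->half pair is an identity on the 16-bit value, which is consistent with expecting a single float->half call to be enough. NaN payloads and out-of-range inputs are not covered here.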

I think this could be avoided by writing a custom legalization for `fp_round` within `AArch64ISelLowering.cpp`, similar to my initial patches, but obviously that would only work around the issue rather than address the root cause.
 

https://github.com/llvm/llvm-project/pull/81724