[llvm] [LoongArch] Pass 'half' in the lower 16 bits of an f32 value with F/D ABI (PR #109368)

Tue Sep 24 06:15:58 PDT 2024

================
@@ -1354,6 +1358,40 @@ SDValue LoongArchTargetLowering::lowerVECTOR_SHUFFLE(SDValue Op,
   return SDValue();
 }
 
+SDValue LoongArchTargetLowering::lowerFP_TO_FP16(SDValue Op,
+                                                 SelectionDAG &DAG) const {
+  // Custom lower to ensure the libcall return is passed in an FPR on hard
+  // float ABIs.
+  SDLoc DL(Op);
+  MakeLibCallOptions CallOptions;
+  SDValue Op0 = Op.getOperand(0);
+  SDValue Chain = SDValue();
+  RTLIB::Libcall LC = RTLIB::getFPROUND(Op0.getValueType(), MVT::f16);
+  SDValue Res;
+  std::tie(Res, Chain) =
+      makeLibCall(DAG, LC, MVT::f32, Op0, CallOptions, DL, Chain);
+  if (Subtarget.is64Bit())
+    return DAG.getNode(LoongArchISD::MOVFR2GR_S_LA64, DL, MVT::i64, Res);
+  return DAG.getBitcast(MVT::i32, Res);
+}
+
+SDValue LoongArchTargetLowering::lowerFP16_TO_FP(SDValue Op,
+                                                 SelectionDAG &DAG) const {
+  // Custom lower to ensure the libcall argument is passed in an FPR on hard
+  // float ABIs.
----------------
heiher wrote:

For this case:

```llvm
define float @test_fpextend_float(half %a) nounwind {
  %r = fpext half %a to float
  ret float %r
}
```
Debug log (without FP16_TO_FP custom lowering):

```
Optimized lowered selection DAG: %bb.0 'test_fpextend_float:'
SelectionDAG has 10 nodes:
  t0: ch,glue = EntryToken
            t2: f32,ch = CopyFromReg t0, Register:f32 %0
          t3: i32 = bitcast t2
        t4: i16 = truncate t3
      t5: f16 = bitcast t4
    t6: f32 = fp_extend t5
  t8: ch,glue = CopyToReg t0, Register:f32 $f0, t6
  t9: ch = LoongArchISD::RET t8, Register:f32 $f0, t8:1

Type-legalized selection DAG: %bb.0 'test_fpextend_float:'
SelectionDAG has 10 nodes:
  t0: ch,glue = EntryToken
          t2: f32,ch = CopyFromReg t0, Register:f32 %0
        t10: i64 = LoongArchISD::MOVFR2GR_S_LA64 t2
      t14: i64 = and t10, Constant:i64<65535>
    t12: f32 = fp16_to_fp t14
  t8: ch,glue = CopyToReg t0, Register:f32 $f0, t12
  t9: ch = LoongArchISD::RET t8, Register:f32 $f0, t8:1

Optimized type-legalized selection DAG: %bb.0 'test_fpextend_float:'
SelectionDAG has 8 nodes:
  t0: ch,glue = EntryToken
        t2: f32,ch = CopyFromReg t0, Register:f32 %0
      t10: i64 = LoongArchISD::MOVFR2GR_S_LA64 t2
    t15: f32 = fp16_to_fp t10
  t8: ch,glue = CopyToReg t0, Register:f32 $f0, t15
  t9: ch = LoongArchISD::RET t8, Register:f32 $f0, t8:1

Legalized selection DAG: %bb.0 'test_fpextend_float:'
SelectionDAG has 8 nodes:
  t0: ch,glue = EntryToken
      t2: f32,ch = CopyFromReg t0, Register:f32 %0
    t10: i64 = LoongArchISD::MOVFR2GR_S_LA64 t2
  t18: ch,glue = CopyToReg t0, Register:i64 $r4, t10
  t20: ch,glue = LoongArchISD::TAIL t18, TargetExternalSymbol:i64'__gnu_h2f_ieee' [TF=2], Register:i64 $r4, t18:1
```

In joinRegisterPartsIntoValue, the fp16 value (wrapped as f32) in the parameter register (an FPR) is first bitcast to i32, truncated to i16, and then bitcast to f16. The software promotion of FP_EXTEND then generates FP16_TO_FP. By that point the source operand of FP16_TO_FP is already an i64 and has lost its association with the fp16 type, so the `__gnu_h2f_ieee` libcall argument ends up in a GPR (`$r4` in the legalized DAG) rather than an FPR. It seems the target calling convention cannot handle passing it through the FPR in this case? :sob:
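
For readers less familiar with the bitcast/truncate chain, here is a rough standalone C++ model of the bit-level effect only. It is a sketch, not the LLVM code: the helper names (`bitcastF32ToI32`, `wrapHalfInF32`, `unwrapHalfFromF32`) are hypothetical, and the value placed in the upper 16 bits is an assumption made purely for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: models "i32 = bitcast f32" by copying the raw bits.
inline uint32_t bitcastF32ToI32(float F) {
  uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof(Bits));
  return Bits;
}

// Hypothetical helper: models "f32 = bitcast i32".
inline float bitcastI32ToF32(uint32_t Bits) {
  float F;
  std::memcpy(&F, &Bits, sizeof(F));
  return F;
}

// Models the caller side: the 16 half bits sit in the low half of an f32.
// UpperBits is an illustrative assumption, not what the ABI mandates.
inline float wrapHalfInF32(uint16_t HalfBits, uint16_t UpperBits) {
  return bitcastI32ToF32((static_cast<uint32_t>(UpperBits) << 16) | HalfBits);
}

// Models the callee side seen in the DAG: bitcast to i32, truncate to i16.
// The result is the raw half encoding, now living in an integer.
inline uint16_t unwrapHalfFromF32(float Wrapped) {
  return static_cast<uint16_t>(bitcastF32ToI32(Wrapped));
}

// Usage: the encoding of half 1.0 (0x3C00) survives whatever the upper bits are.
inline void demoRoundTrip() {
  assert(unwrapHalfFromF32(wrapHalfInF32(0x3C00, 0xFFFF)) == 0x3C00);
}
```

The point of the model is that after `unwrapHalfFromF32` the value is just an integer, which mirrors why the later FP16_TO_FP node only sees an integer operand. `std::memcpy` is used for the bitcasts to avoid undefined behavior from pointer-based type punning.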

https://github.com/llvm/llvm-project/pull/109368