[llvm] [LoongArch] Custom legalizing ConstantFP to avoid float loads (PR #158050)

Sun Sep 14 19:47:36 PDT 2025

================
@@ -549,10 +575,66 @@ SDValue LoongArchTargetLowering::LowerOperation(SDValue Op,
   case ISD::VECREDUCE_UMAX:
   case ISD::VECREDUCE_UMIN:
     return lowerVECREDUCE(Op, DAG);
+  case ISD::ConstantFP:
+    return lowerConstantFP(Op, DAG);
   }
   return SDValue();
 }
 
+SDValue LoongArchTargetLowering::lowerConstantFP(SDValue Op,
+                                                 SelectionDAG &DAG) const {
+  EVT VT = Op.getValueType();
+  ConstantFPSDNode *CFP = cast<ConstantFPSDNode>(Op);
+  const APFloat &FPVal = CFP->getValueAPF();
+  SDLoc DL(CFP);
+
+  assert((VT == MVT::f32 && Subtarget.hasBasicF()) ||
+         (VT == MVT::f64 && Subtarget.hasBasicD()));
+
+  // If value is 0.0 or -0.0, just ignore it.
+  if (FPVal.isZero())
+    return SDValue();
+
+  // If lsx enabled, use cheaper 'vldi' instruction if possible.
+  if (isFPImmVLDILegal(FPVal, VT))
+    return SDValue();
+
+  // Construct as integer, and move to float register.
+  APInt INTVal = FPVal.bitcastToAPInt();
+  switch (VT.getSimpleVT().SimpleTy) {
+  default:
+    llvm_unreachable("Unexpected floating point type!");
+    break;
+  case MVT::f32: {
+    SDValue NewVal = DAG.getConstant(INTVal, DL, MVT::i32);
+    if (Subtarget.is64Bit())
+      NewVal = DAG.getNode(ISD::ZERO_EXTEND, DL, MVT::i64, NewVal);
+    return DAG.getNode(Subtarget.is64Bit() ? LoongArchISD::MOVGR2FR_W_LA64
+                                           : LoongArchISD::MOVGR2FR_W,
+                       DL, VT, NewVal);
+  }
+  case MVT::f64: {
+    // If more than MaterializeFPImmInsNum instructions will be used to
+    // generate the INTVal, fallback to use floating point load from the
+    // constant pool.
+    auto Seq = LoongArchMatInt::generateInstSeq(INTVal.getSExtValue());
+    if (Seq.size() > MaterializeFPImmInsNum && !FPVal.isExactlyValue(+1.0))
----------------
heiher wrote:

> `f32` requires at most two instructions plus a `movgr2fr.w`, which may always be cheaper than loading from the constant pool. But if we also want this option to control how `f32` is handled, it can be applied there too. Do you think that is necessary?

The `MaterializeFPImmInsNum` option covers a range of 0 to 4 instructions, and `f32` also falls within this range. Unless it is explicitly documented, making `f32` a special case would be confusing.
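
To make the `f32` instruction counts concrete, here is a small standalone C++ model. It is an assumption for illustration, not LLVM's `LoongArchMatInt`. It counts the integer instructions needed to put an `f32` bit pattern in a GPR (`ori`, `addi.w`, or `lu12i.w` optionally followed by `ori`) and adds one `movgr2fr.w`. The helper names `matInt32Count` and `f32MaterializeCost` are hypothetical, and the sketch assumes `movgr2fr.w` reads only the low 32 bits of the source GPR.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Simplified model (not LLVM's LoongArchMatInt) of the integer instructions
// needed to place a 32-bit pattern in the low 32 bits of a GPR.
static int matInt32Count(uint32_t Bits) {
  const uint32_t Hi20 = Bits >> 12;
  const uint32_t Lo12 = Bits & 0xFFF;
  if (Hi20 == 0)
    return 1; // ori $rd, $zero, Lo12
  if (Hi20 == 0xFFFFF && (Lo12 & 0x800))
    return 1; // addi.w $rd, $zero, simm12 (value is a sign-extended 12-bit)
  return Lo12 != 0 ? 2 : 1; // lu12i.w [+ ori]
}

// Integer sequence plus one movgr2fr.w to move the bits into an FPR.
static int f32MaterializeCost(float F) {
  uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof(Bits));
  // The diff leaves +0.0 and -0.0 to the default lowering.
  assert(static_cast<uint32_t>(Bits << 1) != 0 && "zero is not handled here");
  return matInt32Count(Bits) + 1;
}
```

Under this model, no `f32` needs more than two integer instructions (three including the move). Any `MaterializeFPImmInsNum` value of 2 or more would therefore always materialize `f32`, while 0 or 1 would send patterns such as `0.1f` (`0x3DCCCCCD`) to the constant pool.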

In addition, should `MaterializeFPImmInsNum` include the instruction count for `movgr2fr[h]`?
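
The two points above can be sketched as a single budget check shared by `f32` and `f64` on LA64, with a switch that decides whether the final `movgr2fr.w`/`movgr2fr.d` counts toward the limit. `FPImmPolicy`, `withinBudget`, and the other helpers are hypothetical. The 64-bit counter is a simplified model of a `lu12i.w`/`ori`/`lu32i.d`/`lu52i.d` sequence and may diverge from `LoongArchMatInt::generateInstSeq`. The diff's `+1.0` exception is not modelled.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Sign-extend the low `Bits` bits of `X` (1 <= Bits <= 63).
static int64_t signExtend(uint64_t X, unsigned Bits) {
  const uint64_t Mask = (1ULL << Bits) - 1;
  const uint64_t Sign = 1ULL << (Bits - 1);
  return static_cast<int64_t>(((X & Mask) ^ Sign) - Sign);
}

// Low 32 bits: ori, addi.w, or lu12i.w [+ ori] (simplified model).
static int matInt32Count(uint32_t Bits) {
  const uint32_t Hi20 = Bits >> 12, Lo12 = Bits & 0xFFF;
  if (Hi20 == 0)
    return 1; // ori
  if (Hi20 == 0xFFFFF && (Lo12 & 0x800))
    return 1; // addi.w
  return Lo12 != 0 ? 2 : 1; // lu12i.w [+ ori]
}

// Full 64-bit pattern: the low part, then lu32i.d / lu52i.d only when bits
// 51:32 / 63:52 are not already the sign extension of the bits below them.
static int matInt64Count(uint64_t Val) {
  const uint64_t Highest12 = (Val >> 52) & 0xFFF;
  const uint64_t Higher20 = (Val >> 32) & 0xFFFFF;
  const uint64_t Hi20 = (Val >> 12) & 0xFFFFF;
  if (Highest12 != 0 && signExtend(Val, 52) == 0)
    return 1; // lu52i.d $rd, $zero, Highest12
  int N = matInt32Count(static_cast<uint32_t>(Val));
  if (signExtend(Hi20 >> 19, 1) != signExtend(Higher20, 20))
    ++N; // lu32i.d
  if (signExtend(Higher20 >> 19, 1) != signExtend(Highest12, 12))
    ++N; // lu52i.d
  return N;
}

// Hypothetical stand-in for MaterializeFPImmInsNum plus a move-counting flag.
struct FPImmPolicy {
  int Threshold;  // maximum number of instructions, 0 to 4
  bool CountMove; // whether movgr2fr.{w,d} counts toward Threshold
};

static bool withinBudget(int IntInsts, const FPImmPolicy &P) {
  assert(P.Threshold >= 0 && P.Threshold <= 4 && "option range is 0-4");
  return IntInsts + (P.CountMove ? 1 : 0) <= P.Threshold;
}

// The same policy is applied to f32 and f64; callers pass nonzero values.
static bool shouldMaterializeF32(float F, const FPImmPolicy &P) {
  uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof(Bits));
  return withinBudget(matInt32Count(Bits), P);
}

static bool shouldMaterializeF64(double D, const FPImmPolicy &P) {
  uint64_t Bits;
  std::memcpy(&Bits, &D, sizeof(Bits));
  return withinBudget(matInt64Count(Bits), P);
}
```

For example, with a threshold of 4, `0.1` (`0x3FB999999999999A`, four integer instructions in this model) is materialized when the move is not counted. When the move is counted, it falls back to the constant pool.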

> `la32` should perhaps always load non-zero `f64` values from the constant pool, because custom lowering them does not seem to bring any benefit. What do you think?

I’d prefer this to apply to both LA32 and LA64. If we count all instructions, how about using different thresholds for LA32 and LA64 (when a single value doesn’t work for both)?
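
Counting every instruction gives different totals for the same `f64` on the two targets. The sketch below assumes that LA32 builds the value from two 32-bit halves: `movgr2fr.w` for the low half and `movgr2frh.w` for the high half, with a zero half read directly from `$zero`. It assumes LA64 builds the full 64-bit pattern in one GPR and uses `movgr2fr.d`. `TargetThresholds` and the other helpers are hypothetical, the integer counts are a simplified model, and the threshold values are illustrative only.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Sign-extend the low `Bits` bits of `X` (1 <= Bits <= 63).
static int64_t signExtend(uint64_t X, unsigned Bits) {
  const uint64_t Mask = (1ULL << Bits) - 1;
  const uint64_t Sign = 1ULL << (Bits - 1);
  return static_cast<int64_t>(((X & Mask) ^ Sign) - Sign);
}

// One 32-bit half on LA32 (simplified model).
static int matInt32Count(uint32_t Bits) {
  if (Bits == 0)
    return 0; // assumed: the move reads $zero directly
  const uint32_t Hi20 = Bits >> 12, Lo12 = Bits & 0xFFF;
  if (Hi20 == 0)
    return 1; // ori
  if (Hi20 == 0xFFFFF && (Lo12 & 0x800))
    return 1; // addi.w
  return Lo12 != 0 ? 2 : 1; // lu12i.w [+ ori]
}

// Full 64-bit pattern on LA64 (simplified model).
static int matInt64Count(uint64_t Val) {
  const uint64_t Highest12 = (Val >> 52) & 0xFFF;
  const uint64_t Higher20 = (Val >> 32) & 0xFFFFF;
  const uint64_t Hi20 = (Val >> 12) & 0xFFFFF;
  const uint64_t Lo12 = Val & 0xFFF;
  if (Highest12 != 0 && signExtend(Val, 52) == 0)
    return 1; // lu52i.d $rd, $zero, Highest12
  int N;
  if (Hi20 == 0)
    N = 1; // ori (also initializes $rd for lu32i.d)
  else if (Hi20 == 0xFFFFF && (Lo12 & 0x800))
    N = 1; // addi.w
  else
    N = Lo12 != 0 ? 2 : 1; // lu12i.w [+ ori]
  if (signExtend(Hi20 >> 19, 1) != signExtend(Higher20, 20))
    ++N; // lu32i.d
  if (signExtend(Higher20 >> 19, 1) != signExtend(Highest12, 12))
    ++N; // lu52i.d
  return N;
}

static uint64_t bitsOf(double D) {
  uint64_t B;
  std::memcpy(&B, &D, sizeof(B));
  return B;
}

// LA32: low half -> movgr2fr.w, high half -> movgr2frh.w.
static int f64CostLA32(double D) {
  const uint64_t B = bitsOf(D);
  return matInt32Count(static_cast<uint32_t>(B)) +
         matInt32Count(static_cast<uint32_t>(B >> 32)) + 2;
}

// LA64: full pattern in one GPR, then movgr2fr.d.
static int f64CostLA64(double D) { return matInt64Count(bitsOf(D)) + 1; }

// Hypothetical per-target limits; both count the moves.
struct TargetThresholds {
  int LA32;
  int LA64;
};

static bool shouldMaterializeF64(double D, bool Is64Bit,
                                 const TargetThresholds &T) {
  assert(bitsOf(D) << 1 != 0 && "+/-0.0 is left to the default lowering");
  return Is64Bit ? f64CostLA64(D) <= T.LA64 : f64CostLA32(D) <= T.LA32;
}
```

In this model, `0.1` costs six instructions on LA32 and five on LA64, and `2.0` costs three and two. A single all-inclusive threshold of 5 would therefore materialize `0.1` on LA64 but not on LA32.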

https://github.com/llvm/llvm-project/pull/158050