[llvm] [RISCV][ISelLowering] Use Zicond for FP selects on Zfinx/Zdinx (PR #169299)

Wed Nov 26 01:34:00 PST 2025

================
@@ -9555,6 +9555,54 @@ SDValue RISCVTargetLowering::lowerSELECT(SDValue Op, SelectionDAG &DAG) const {
   if (SDValue V = lowerSelectToBinOp(Op.getNode(), DAG, Subtarget))
     return V;
 
+  // When there is no cost for GPR <-> FGPR, we can use zicond select for
+  // floating value when CondV is int type
+  bool FPinGPR = Subtarget.hasStdExtZfinx();
+
+  // We can handle FGPR without spliting into hi/lo parts
+  bool FitsInGPR = TypeSize::isKnownLE(VT.getSizeInBits(),
+                                       Subtarget.getXLenVT().getSizeInBits());
+
+  bool UseZicondForFPSel = Subtarget.hasStdExtZicond() && FPinGPR &&
+                           VT.isFloatingPoint() && FitsInGPR;
+
+  if (UseZicondForFPSel) {
+    MVT XLenIntVT = Subtarget.getXLenVT();
+
+    auto CastToInt = [&](SDValue V) -> SDValue {
+      // Treat +0.0 as integer 0 to enable single 'czero' instruction
+      // generation.
+      if (auto *CFP = dyn_cast<ConstantFPSDNode>(V)) {
+        if (CFP->isZero() && !CFP->isNegative())
+          return DAG.getConstant(0, DL, XLenIntVT);
+      }
----------------
fennecJ wrote:

I tried your `select_i1_half_0_add` case, and the result is correctly NaN-boxed at the return. I'll add the test result to the next commit.
```asm
select_i1_half_0_add:
# %bb.0: # %entry
  addi sp, sp, -16
  sd ra, 8(sp) # 8-byte Folded Spill
  # kill: def $x11_w killed $x11_w def $x11
  andi a0, a0, 1
  czero.eqz a0, a1, a0
  # kill: def $x10_w killed $x10_w killed $x10
  call __extendhfsf2
  lui a1, 260096
  fadd.s a0, a0, a1
  call __truncsfhf2
  # kill: def $x10_w killed $x10_w def $x10
  lui a1, 1048560       # a1= 0xFFFF0000
  or a0, a0, a1         # nan-boxing !!
  # kill: def $x10_w killed $x10_w killed $x10
  ld ra, 8(sp) # 8-byte Folded Reload
  addi sp, sp, 16
  ret
```
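For reference, the arithmetic behind the `lui`/`or` pair can be sketched in C++. This is only an illustrative model assuming RV64; the helper names `luiRV64` and `nanBoxHalf` are hypothetical and are not LLVM APIs.

```cpp
#include <cstdint>

// Hypothetical model of RV64 `lui rd, imm20`: the immediate lands in
// bits 31:12 and the 32-bit result is sign-extended to 64 bits.
constexpr uint64_t luiRV64(uint32_t imm20) {
  return static_cast<uint64_t>(
      static_cast<int64_t>(static_cast<int32_t>(imm20 << 12)));
}

// Hypothetical model of the `lui a1, 1048560; or a0, a0, a1` pair:
// set every bit above the low 16 bits of a half value to 1.
constexpr uint64_t nanBoxHalf(uint64_t halfBits) {
  return halfBits | luiRV64(1048560);
}

// 1048560 == 0xFFFF0, so the low 32 bits of a1 are 0xFFFF0000.
static_assert(luiRV64(1048560) == 0xFFFFFFFFFFFF0000ULL);
// 260096 == 0x3F800, so `lui a1, 260096` materializes 1.0f (0x3F800000).
static_assert(luiRV64(260096) == 0x3F800000ULL);
// Half 1.0 is 0x3C00; after boxing all upper bits are set.
static_assert(nanBoxHalf(0x3C00) == 0xFFFFFFFFFFFF3C00ULL);
```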

Regarding your question about whether this optimization results in incorrect NaN-boxing:

I initially shared the assumption that explicit NaN-boxing was required here. However, after re-checking the Zfinx/Zdinx/Zhinx specification[1], I found that these extensions do not require NaN-boxing of operands. Instead, they rely on sign extension.

According to **Section 1 (Processing of Narrower Values)**:
> "Floating-point operands of width w < XLEN bits occupy bits w-1:0 of an x register. Floating-point operations on w-bit operands **ignore operand bits XLEN-1: w**."
> "Floating-point operations that produce w < XLEN-bit results **fill bits XLEN-1: w with copies of bit w-1 (the sign bit)**."

Since my optimization targets only +0.0 (sign bit 0), the all-zero integer is already the sign-extended form of the value. It is safe as an input operand (the high bits are ignored) and valid as a result.
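To make the sign-extension argument concrete, here is a small C++ sketch of the rule quoted above, assuming XLEN = 64. The helper `signExtendToXLen` is hypothetical (not an LLVM or spec-defined function); it shows why +0.0 maps to integer 0 while -0.0 does not, which is what the `!CFP->isNegative()` check guards.

```cpp
#include <cstdint>

// Hypothetical helper: fill bits 63:w with copies of bit w-1,
// mirroring the Zfinx rule for w < XLEN results. Assumes 0 < w < 64.
constexpr uint64_t signExtendToXLen(uint64_t bits, unsigned w) {
  const uint64_t mask = (uint64_t{1} << w) - 1;
  const uint64_t sign = uint64_t{1} << (w - 1);
  const uint64_t low = bits & mask;
  return (low & sign) ? (low | ~mask) : low;
}

// +0.0 has an all-zero encoding, so its sign-extended form is integer 0.
static_assert(signExtendToXLen(0x0000, 16) == 0);      // half +0.0
static_assert(signExtendToXLen(0x00000000, 32) == 0);  // float +0.0

// -0.0 has the sign bit set, so its sign-extended form is not 0.
static_assert(signExtendToXLen(0x8000, 16) == 0xFFFFFFFFFFFF8000ULL);
static_assert(signExtendToXLen(0x80000000, 32) == 0xFFFFFFFF80000000ULL);
```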

As for the ABI[2], the clause regarding NaN-boxing specifies:
> "When a floating-point argument narrower than FLEN bits is passed in a **floating-point register**, it is 1-extended (NaN-boxed) to FLEN bits."

Zfinx has no floating-point registers (it uses `x` registers), and the unprivileged spec mandates sign extension for narrower results in `x` registers. For +0.0, zero extension and sign extension coincide, so providing an all-zero register is safe. Even if a specific calling convention requires boxing at call boundaries, the backend's `splitValueIntoRegisterParts` handles that explicitly.

Therefore, I believe keeping the `+0.0` optimization at its current location is safe and yields the best code generation (a single `czero`).
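As a sketch of why a zero arm collapses to a single instruction, the Zicond semantics can be modeled in C++. The function names below are hypothetical stand-ins for `czero.eqz` and `czero.nez`, and the general-select expansion is shown only as one common form, not necessarily the exact sequence this patch emits.

```cpp
#include <cstdint>

// Hypothetical model of `czero.eqz rd, rs1, rs2`: rd = (rs2 == 0) ? 0 : rs1.
constexpr uint64_t czeroEqz(uint64_t rs1, uint64_t rs2) {
  return rs2 == 0 ? 0 : rs1;
}

// Hypothetical model of `czero.nez rd, rs1, rs2`: rd = (rs2 != 0) ? 0 : rs1.
constexpr uint64_t czeroNez(uint64_t rs1, uint64_t rs2) {
  return rs2 != 0 ? 0 : rs1;
}

// A general select needs both halves plus an OR.
constexpr uint64_t selectGeneral(uint64_t c, uint64_t t, uint64_t f) {
  return czeroEqz(t, c) | czeroNez(f, c);
}

// When the false arm is integer 0 (the +0.0 case), the czero.nez half is
// always 0, so a single czero.eqz gives the same result.
static_assert(selectGeneral(1, 0x3C00, 0) == czeroEqz(0x3C00, 1));
static_assert(selectGeneral(0, 0x3C00, 0) == czeroEqz(0x3C00, 0));
static_assert(czeroEqz(0x3C00, 1) == 0x3C00);
static_assert(czeroEqz(0x3C00, 0) == 0);
```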

[1] - [riscv unpriv zfinx](https://docs.riscv.org/reference/isa/unpriv/zfinx.html)
[2] - [riscv-elf-psabi-doc](https://github.com/riscv-non-isa/riscv-elf-psabi-doc/releases/tag/v1.0)

https://github.com/llvm/llvm-project/pull/169299