[llvm] [AArch64] Codegen for new SCVTF/UCVTF variants (FEAT_FPRCVT) (PR #123767)

Thu Jan 23 02:57:14 PST 2025

================
@@ -5511,6 +5511,15 @@ multiclass IntegerToFPSIMDScalar<bits<2> rmode, bits<3> opcode, string asm, SDPa
     let Inst{31} = 1; // 64-bit FPR flag
     let Inst{23-22} = 0b00; // 32-bit FPR flag
   }
+
+  def : Pat<(f16 (any_fpround (f32 (op (i32 FPR32:$Rn))))),
+          (!cast<Instruction>(NAME # HSr) $Rn)>;
+  def : Pat<(f64 (op (i32 (extractelt (v4i32 V128:$Rn), (i64 0))))),
----------------
SpencerAbson wrote:

Hi, I think you are right to restrict this to lane zero.

The benefit of the codegen here is that we do not have to perform an explicit vector extraction (the use of [UMOV](https://developer.arm.com/documentation/ddi0602/2024-12/SIMD-FP-Instructions/UMOV--Unsigned-move-vector-element-to-general-purpose-register-) above), because lane 0 of the vector register occupies the same bits as the associated scalar FPR, which is a valid operand for these new instructions. If we did perform the extraction, we might as well use the SCVTF/UCVTF variants that operate on GPRs (as above). This reasoning is invalid for any element other than the least-significant (lowest) one, since only lane 0 aliases the scalar FPR.
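
To make the lane-zero argument concrete, here is a rough toy model in C++. It is not LLVM code: `VReg`, `scalarView`, `extractLane`, `convertFromFPR`, `convertFromLane` and `patternApplies` are all hypothetical names made up for illustration, and the model assumes only that the scalar FPR view is the low 32 bits of the vector register.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy model (hypothetical, not LLVM code): a 128-bit vector register
// viewed as four 32-bit lanes, with lane 0 in the least-significant bits.
struct VReg {
  std::array<std::int32_t, 4> lanes;
};

// Scalar FPR view of the same register: just its low 32 bits, i.e. lane 0.
inline std::int32_t scalarView(const VReg &v) { return v.lanes[0]; }

// Explicit element extraction, standing in for a UMOV-style lane move.
inline std::int32_t extractLane(const VReg &v, unsigned idx) {
  assert(idx < 4 && "lane index out of range");
  return v.lanes[idx];
}

// Stand-in for the FPR-operand convert: reads the scalar view directly,
// so no lane move is needed.
inline double convertFromFPR(const VReg &v) {
  return static_cast<double>(scalarView(v));
}

// Stand-in for "extract the lane, then convert from a GPR".
inline double convertFromLane(const VReg &v, unsigned idx) {
  return static_cast<double>(extractLane(v, idx));
}

// The fold is only sound when the extracted lane is the one that aliases
// the scalar view.
inline bool patternApplies(unsigned idx) { return idx == 0; }
```

With lanes `{7, -3, 100, 42}`, `convertFromFPR` agrees with `convertFromLane(v, 0)` but not with `convertFromLane(v, 1)`, which is the property that negative tests for non-zero indices would guard.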

I think what Carol might be trying to say is that, once the tests have been changed to use `extractelement` rather than the reduction intrinsics, we should add negative tests that show the pattern does not apply when the index argument to `extractelt` is anything other than zero. @CarolineConcatto please correct me if I've misunderstood.

This pattern applies well to reduction intrinsics because they are actually modeled as returning a vector, then immediately extracting the bottom element! (See `getReductionSDNode`).

Thanks again for all your work.

https://github.com/llvm/llvm-project/pull/123767