[PATCH] D153207: [AArch64] Add patterns for scalar FMUL, FMULX

OverMighty via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jun 29 05:17:50 PDT 2023


overmighty added a comment.

Thanks for the suggestions. If you don't have any more and this still looks good to you, please commit this as "OverMighty <its.overmighty at gmail.com>". Thank you. :)



================
Comment at: llvm/lib/Target/AArch64/AArch64InstrInfo.td:4432
 
+multiclass FPScalarFromIndexedLane0Patterns<string inst, string inst_f16_suffix,
+                                            string inst_f32_suffix,
----------------
dmgreen wrote:
> overmighty wrote:
> > overmighty wrote:
> > > I wasn't sure where to put this.
> > Also, I assume this multiclass needs a better name that mentions the number of operands of the instructions. Scalar FMUL invokes `TwoOperandFPData` but scalar FMULX invokes `SIMDFPThreeScalar`, so I'm not sure whether something like `TwoOperandFPDataFromLane0Patterns` would be a better name.
> Perhaps move it next to the defm for FPScalarFromIndexedLane0Patterns?
> 
> Would it expect to be used for anything other than mul/mulx? If not, perhaps include Mul in the name instead of Scalar.
I don't expect it to be used for any instructions other than FMUL and FMULX without some changes. I'm not sure about removing `Scalar` from the name though, especially if `Indexed` stays.
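For readers following the thread, here is a rough sketch of the general shape of such a multiclass. The names, operand lists, and suffix handling are illustrative guesses based on this discussion, not the actual patch contents:

```
// Hypothetical sketch (not the patch itself): rewrite
//   (op scalar, (extractelt vector, 0))
// into the plain scalar instruction, avoiding the indexed form.
multiclass FPScalarFromIndexedLane0Patterns<string inst,
                                            string inst_f16_suffix,
                                            string inst_f32_suffix,
                                            string inst_f64_suffix,
                                            SDPatternOperator OpNode,
                                            list<Predicate> preds = []> {
  let Predicates = !listconcat(preds, [HasFullFP16]) in
  def : Pat<(f16 (OpNode (f16 FPR16:$Rn),
                         (f16 (extractelt (v8f16 V128:$Rm), (i64 0))))),
            (!cast<Instruction>(inst # inst_f16_suffix) FPR16:$Rn,
               (f16 (EXTRACT_SUBREG V128:$Rm, hsub)))>;
  let Predicates = preds in {
    def : Pat<(f32 (OpNode (f32 FPR32:$Rn),
                           (f32 (extractelt (v4f32 V128:$Rm), (i64 0))))),
              (!cast<Instruction>(inst # inst_f32_suffix) FPR32:$Rn,
                 (f32 (EXTRACT_SUBREG V128:$Rm, ssub)))>;
    def : Pat<(f64 (OpNode (f64 FPR64:$Rn),
                           (f64 (extractelt (v2f64 V128:$Rm), (i64 0))))),
              (!cast<Instruction>(inst # inst_f64_suffix) FPR64:$Rn,
                 (f64 (EXTRACT_SUBREG V128:$Rm, dsub)))>;
  }
}
```

Each instantiation maps "scalar op applied to a lane-0 extract" onto the scalar instruction for f16/f32/f64, with the f16 patterns additionally guarded by `HasFullFP16`.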


================
Comment at: llvm/lib/Target/AArch64/AArch64InstrInfo.td:5231
+// Match scalar FMULX instead of indexed FMULX when extracting lane 0.
+let Predicates = [HasNEON, HasFullFP16] in {
+def : Pat<(f16 (int_aarch64_neon_fmulx
----------------
dmgreen wrote:
> overmighty wrote:
> > dmgreen wrote:
> > > I don't think this needs NEON according to the reference I was looking at. Just FP.
> > > The patterns for FMUL are very similar to the patterns for FMULX16. Could they share the same multiclass for the patterns?
> > Can you confirm that it doesn't need Neon?
> > 
> > I couldn't find an AArch64 equivalent of the Intel 64 manual, which lists CPUID feature flags alongside instructions, but a few things suggest that even scalar FMULX needs Neon:
> > 
> > - The Arm Cortex/Neoverse Software Optimization Guides that I read listed FMULX under the "ASIMD FP multiply" instruction group (and under SVE "Floating point multiply" too) but not under regular "FP multiply" where FMUL appears.
> > - I'm not sure how much it's worth, but FMULX is defined here with an intrinsic that contains `neon` in its name as its `OpNode`, its predicate is `HasNEONorSME`, and `SIMDFPThreeScalar` defaults to `HasNEON`.
> > 
> > However, this page lists FMULX as an instruction that has been added as part of floating-point changes in AArch64 and not Neon changes: https://developer.arm.com/documentation/den0024/a/AArch64-Floating-point-and-NEON/New-features-for-NEON-and-Floating-point-in-AArch64?lang=en.
> The definitive source will be the pseudocode in the Arm Architecture Reference Manual. For mulx it has:
> ```
> if elements == 1 then
>   CheckFPEnabled64();
> else
>   CheckFPAdvSIMDEnabled64(); 
> ```
> In this case, if the instruction definition already uses HasNEONorSME then it's fine to keep the same thing for the new patterns. We can fix the two in another patch at some point. It likely doesn't matter much at the moment: I don't think there is a way to generate mulx intrinsics without Neon being enabled.
> 
> This could be moved down below the other instruction definitions.
By the way, this pseudocode from the Arm Architecture Reference Manual differs from the pseudocode found on the following web pages, where only `CheckFPAdvSIMDEnabled64` is called:

- https://developer.arm.com/architectures/instruction-sets/intrinsics/vmulxs_f32
- https://developer.arm.com/documentation/ddi0596/2021-03/SIMD-FP-Instructions/FMULX--Floating-point-Multiply-extended-?lang=en
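To make the predicate discussion concrete, the pattern being guarded here looks roughly like this. This is a sketch with assumed instruction and register-class names (e.g. `FMULX16` for the f16 scalar FMULX); the actual patch may differ:

```
// Hypothetical sketch: when both fmulx operands are lane-0 extracts, select
// the scalar FMULX (here FMULX16 for f16) instead of the indexed variant.
let Predicates = [HasNEON, HasFullFP16] in
def : Pat<(f16 (int_aarch64_neon_fmulx
                 (f16 (extractelt (v8f16 V128:$Rn), (i64 0))),
                 (f16 (extractelt (v8f16 V128:$Rm), (i64 0))))),
          (FMULX16 (f16 (EXTRACT_SUBREG V128:$Rn, hsub)),
                   (f16 (EXTRACT_SUBREG V128:$Rm, hsub)))>;
```

Whether the `let Predicates` line should require `HasNEON` (versus plain FP, or `HasNEONorSME` to match the instruction definition) is exactly the question raised above.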


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153207/new/

https://reviews.llvm.org/D153207
