[PATCH] D153207: [AArch64] Add patterns for scalar FMUL, FMULX

Wed Jun 28 05:02:18 PDT 2023

dmgreen accepted this revision.
dmgreen added a comment.
This revision is now accepted and ready to land.

Thanks for the changes. With a few extra Nits and suggestions, this LGTM.

================
Comment at: llvm/lib/Target/AArch64/AArch64InstrInfo.td:4432

+multiclass FPScalarFromIndexedLane0Patterns<string inst, string inst_f16_suffix,
+                                            string inst_f32_suffix,
----------------
overmighty wrote:
> overmighty wrote:
> > I wasn't sure where to put this.
> Also, I assume this multiclass needs a better name that mentions the number of operands for the instructions. Scalar FMUL invokes `TwoOperandFPData` but scalar FMULX invokes `SIMDFPThreeScalar`, I'm not sure if something like `TwoOperandFPDataFromLane0Patterns` would be a better name.
Perhaps move it next to the defm for FPScalarFromIndexedLane0Patterns?

Would it expect to be used for anything other than mul/mulx? If not, perhaps include Mul in the name instead of Scalar.

================
Comment at: llvm/lib/Target/AArch64/AArch64InstrInfo.td:5231
+// Match scalar FMULX instead of indexed FMULX when extracting lane 0.
+let Predicates = [HasNEON, HasFullFP16] in {
+def : Pat<(f16 (int_aarch64_neon_fmulx
----------------
overmighty wrote:
> dmgreen wrote:
> > I don't think this needs NEON according to the reference I was looking at. Just FP.
> > The patterns for FMUL are very similar to the patterns for FMULX16. Could they share the same multiclass for the patterns?
> Can you confirm that it doesn't need Neon?
> 
> I couldn't find some sort of AArch64 equivalent to the Intel 64 manual which lists CPUID feature flags along instructions, but a few things suggest that even scalar FMULX needs Neon:
> 
> - The Arm Cortex/Neoverse Software Optimization Guides that I read listed FMULX under the "ASIMD FP multiply" instruction group (and under SVE "Floating point multiply" too) but not under regular "FP multiply" where FMUL appears.
> - I'm not sure what it's worth but FMULX is defined here with an intrinsic that contains `neon` in its name as `OpNode`, the predicate is `HasNEONorSME`, and `SIMDFPThreeScalar` defaults to `HasNEON`.
> 
> However, this page lists FMULX as an instruction that has been added as part of floating-point changes in AArch64 and not Neon changes: https://developer.arm.com/documentation/den0024/a/AArch64-Floating-point-and-NEON/New-features-for-NEON-and-Floating-point-in-AArch64?lang=en.
The definitive source will be the pseudocode in the Arm Architecture Reference Manual. For mulx it has:
```
if elements == 1 then
  CheckFPEnabled64();
else
  CheckFPAdvSIMDEnabled64(); 
```
In this case, if the instruction definition already uses HasNEONorSME then its fine to keep the same thing for the new patterns. We can fix the two in another patch at some point. It likely doesn't matter a lot at the moment, I don't think there is a way to generate mulx intrinsics without Neon being enabled.

This could be moved down below the other instruction definitions.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153207/new/

https://reviews.llvm.org/D153207