[PATCH] D153207: [AArch64] Add patterns for scalar FMUL, FMULX

Thu Jun 29 21:03:36 PDT 2023

dmgreen added a comment.

Thanks. Yeah will do.

================
Comment at: llvm/lib/Target/AArch64/AArch64InstrInfo.td:5231
+// Match scalar FMULX instead of indexed FMULX when extracting lane 0.
+let Predicates = [HasNEON, HasFullFP16] in {
+def : Pat<(f16 (int_aarch64_neon_fmulx
----------------
overmighty wrote:
> dmgreen wrote:
> > overmighty wrote:
> > > dmgreen wrote:
> > > > I don't think this needs NEON according to the reference I was looking at. Just FP.
> > > > The patterns for FMUL are very similar to the patterns for FMULX16. Could they share the same multiclass for the patterns?
> > > Can you confirm that it doesn't need Neon?
> > > 
> > > I couldn't find some sort of AArch64 equivalent to the Intel 64 manual which lists CPUID feature flags along instructions, but a few things suggest that even scalar FMULX needs Neon:
> > > 
> > > - The Arm Cortex/Neoverse Software Optimization Guides that I read listed FMULX under the "ASIMD FP multiply" instruction group (and under SVE "Floating point multiply" too) but not under regular "FP multiply" where FMUL appears.
> > > - I'm not sure what it's worth but FMULX is defined here with an intrinsic that contains `neon` in its name as `OpNode`, the predicate is `HasNEONorSME`, and `SIMDFPThreeScalar` defaults to `HasNEON`.
> > > 
> > > However, this page lists FMULX as an instruction that has been added as part of floating-point changes in AArch64 and not Neon changes: https://developer.arm.com/documentation/den0024/a/AArch64-Floating-point-and-NEON/New-features-for-NEON-and-Floating-point-in-AArch64?lang=en.
> > The definitive source will be the pseudocode in the Arm Architecture Reference Manual. For mulx it has:
> > ```
> > if elements == 1 then
> >   CheckFPEnabled64();
> > else
> >   CheckFPAdvSIMDEnabled64(); 
> > ```
> > In this case, if the instruction definition already uses HasNEONorSME then its fine to keep the same thing for the new patterns. We can fix the two in another patch at some point. It likely doesn't matter a lot at the moment, I don't think there is a way to generate mulx intrinsics without Neon being enabled.
> > 
> > This could be moved down below the other instruction definitions.
> By the way, this pseudocode from the Arm Architecture Reference Manual differs from the one found in the following web pages, where only `CheckFPAdvSIMDEnabled64` is called:
> 
> - https://developer.arm.com/architectures/instruction-sets/intrinsics/vmulxs_f32
> - https://developer.arm.com/documentation/ddi0596/2021-03/SIMD-FP-Instructions/FMULX--Floating-point-Multiply-extended-?lang=en
I see. That was a change made when SME was added to the architecture.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153207/new/

https://reviews.llvm.org/D153207