[PATCH] D155432: [AArch64][SME] Use `fmov` instead of NEON `movi` for FP value.

Tue Jul 18 06:12:43 PDT 2023

sdesmalen added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp:1333
   if (STI->hasZeroCycleZeroingFP() && !STI->hasZeroCycleZeroingFPWorkaround() &&
-      STI->hasNEON()) {
+      STI->isNeonAvailable()) {
     // Convert H/S register to corresponding D register
----------------
hassnaa-arm wrote:
> hassnaa-arm wrote:
> > conceptually what is the difference between hasNEON and isNeonAvailable() ?
> > I saw the implementation of isNeonAvailable which is the opposite of hasNEON or the opposite of streaming mode. 
> > But I don't understand how the implementation of isNeonAvailable() represents its name.
> What is the difference between this class 'AArch64AsmPrinter' and tblgen files ?
> Another thing, I see that in the tblgen files there are some conditions about generating code compatible with specific features, so what is the difference between handling the compatible generated code in the tblgen files and handling it here ?
> conceptually what is the difference between hasNEON and isNeonAvailable() ?
`isNeonAvailable()` used to be `!forceStreamingCompatibleSVE()`, see ec6af93d0249d03a5babd547e072e4de3a2b5e48.
Basically, if the function is a streaming function, or a streaming-compatible function, then Neon is not available.

I could have named it `isNeonAvailableAtRuntime`, but `isNeonAvailable` was shorter.

`hasNEON` has more to do with which target features we compile for. So even when we target a core that has NEON instructions, we may not be able to use them in the given _runtime_ mode.
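
To make the distinction concrete, here is a minimal standalone sketch. `ToySubtarget` and its two fields are hypothetical and are not the real `AArch64Subtarget`; combining the check with `hasNEON()` is also an assumption of the sketch. It only models the idea that one query describes the compile-time target feature, while the other also accounts for the runtime mode of the function.

```cpp
#include <cassert>

// Hypothetical stand-in for a subtarget; NOT the real AArch64Subtarget.
struct ToySubtarget {
  bool HasNEONFeature;        // target feature: the core implements NEON
  bool StreamingOrCompatible; // function is streaming or streaming-compatible

  // What we compile for: does the target have NEON instructions at all?
  bool hasNEON() const { return HasNEONFeature; }

  // May NEON be used in this function's runtime mode?
  // (Assumption of this sketch: requires the feature AND a non-streaming,
  // non-streaming-compatible function.)
  bool isNeonAvailable() const {
    return hasNEON() && !StreamingOrCompatible;
  }
};
```

For a streaming function on a NEON-capable core, `hasNEON()` stays true in this model while `isNeonAvailable()` becomes false, which is the case the patch cares about.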

> What is the difference between this class 'AArch64AsmPrinter' and tblgen files ?
AsmPrinter emits the *actual* instruction for the given MachineInstr, which is an intermediate representation of either a pseudo or a real instruction. In this case, FMOVS0 is a pseudo node that gets expanded here to the appropriate instruction.
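
A simplified sketch of that expansion decision follows. The `Toy*` types and `expandFMov0` are hypothetical names for illustration, not LLVM's API, and the sketch ignores details such as the FullFP16 handling for H registers. It only mirrors the condition quoted above: use the NEON `movi` form when zero-cycle zeroing applies and NEON is available, otherwise fall back to an `fmov` from the zero register.

```cpp
#include <cassert>

// Hypothetical enums; the real code works on MachineInstr/MCInst opcodes.
enum class ToyPseudo { FMOVH0, FMOVS0, FMOVD0 };
enum class ToyReal { MOVID, FMOVWHr, FMOVWSr, FMOVXDr };

// Hypothetical bundle of the three subtarget queries used in the condition.
struct ToyFlags {
  bool ZeroCycleZeroingFP;
  bool ZeroCycleZeroingFPWorkaround;
  bool NeonAvailable;
};

// Pick the real instruction that a zeroing pseudo would be printed as.
inline ToyReal expandFMov0(ToyPseudo P, const ToyFlags &F) {
  // NEON path: only when zeroing is free, no workaround is needed, and
  // NEON may be used in the current runtime mode.
  if (F.ZeroCycleZeroingFP && !F.ZeroCycleZeroingFPWorkaround &&
      F.NeonAvailable)
    return ToyReal::MOVID;

  // Fallback: move from the integer zero register with fmov.
  switch (P) {
  case ToyPseudo::FMOVH0:
    return ToyReal::FMOVWHr;
  case ToyPseudo::FMOVS0:
    return ToyReal::FMOVWSr;
  case ToyPseudo::FMOVD0:
    return ToyReal::FMOVXDr;
  }
  return ToyReal::FMOVXDr; // not reached for valid enum values
}
```

With `NeonAvailable` set to false, as for a streaming or streaming-compatible function, every pseudo in this model takes the `fmov` fallback even when zero-cycle zeroing is otherwise enabled.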

> Another thing, I see that in the tblgen files there are some conditions about generating code compatible with specific features, so what is the difference between handling the compatible generated code in the tblgen files and handling it here ?
In TableGen we define the instructions (with their encodings) and have patterns to map to those instructions, or to a pseudo node. I'm not entirely sure why they took the approach here of mapping to a pseudo node, rather than mapping directly to the appropriate instruction using a pattern. Perhaps the simpler pseudo node has some benefits, but I don't see much code that uses its simpler representation. There is code in InstrInfo to say that an `FMOV[HSD]0` is cheap, and also a change in the scheduler specific to `FMOV[HSD]0`.

The purpose of this patch is to fix the issue, so I didn't touch the previous design choices on how this is represented.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155432/new/

https://reviews.llvm.org/D155432