[PATCH] D105575: [AArch64][SME] Add zero instruction

Tue Jul 20 08:28:06 PDT 2021

bryanpkc added a comment.

> Hi @bryanpkc, thanks for the comments. Do you also have an implementation for this?

Yes, we implemented the SME instructions internally. Most of our code is very similar to the patches you have upstreamed, but `ZERO` was tricky and our approach differs quite a bit.

================
Comment at: llvm/lib/Target/AArch64/AArch64RegisterInfo.td:1373
+def MatrixTileList32 : MatrixTileListOperand<"MPR32", 32, (ops MPR32)>;
+def MatrixTileList64 : MatrixTileListOperand<"MPR64", 64, (ops MPR64)>;
+
----------------
c-rhodes wrote:
> bryanpkc wrote:
> > Does this restrict a matrix tile list to contain only tiles of the same element type? The ISA documentation doesn't impose such a restriction, so I think it would be legal to write something like `zero { za0.s, za5.d }`. Did you consider supporting matrix tile lists of mixed element types?
> > Does this restrict a matrix tile list to contain only tiles of the same element type? The ISA documentation doesn't impose such a restriction, so I think it would be legal to write something like `zero { za0.s, za5.d }`. Did you consider supporting matrix tile lists of mixed element types?
> 
> Yeah the element types must be the same. The ISA docs don't explicitly impose that restriction, but the actual instruction takes a list of 64-bit element types, the other types are really aliases.
I understand that the bits in the immediate operand refer to .d tiles, but it seems useful to allow a programmer to keep referring to the register operands as .h and .d (for example), if that's how the registers are used after they are zeroed.

================
Comment at: llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp:3876
+    if (RI->getEncodingValue(Reg) <= (RI->getEncodingValue(PrevReg))) {
+      Error(Loc, "registers must be sequential");
+      return MatchOperand_ParseFail;
----------------
c-rhodes wrote:
> bryanpkc wrote:
> > Is this specified in the ISA documentation? It isn't obvious that the tiles in the list must be ordered.
> > Is this specified in the ISA documentation? It isn't obvious that the tiles in the list must be ordered.
> 
> No, but I'm not sure there's a good reason for the tiles to not be in order?
IMO the assembler should be more forgiving here, e.g. when the user programmatically generates the assembly code and forgets to sort the registers. We may also want to be consistent with the ARM backend, which produces a warning, not an error, in a similar situation:
```
        .text
foo.s:1:14: warning: register list not in ascending order
stm sp!, {r5,r3,r0}
             ^
foo.s:1:17: warning: register list not in ascending order
stm sp!, {r5,r3,r0}
                ^
        stm     sp!, {r0, r3, r5}
```

================
Comment at: llvm/lib/Target/AArch64/SMEInstrFormats.td:661
+class sme_zero_inst<string mnemonic>
+    : I<(outs MatrixTileList64:$imm), (ins),
+        mnemonic, "\t$imm", "", []>, Sched<[]> {
----------------
c-rhodes wrote:
> bryanpkc wrote:
> > Wouldn't it be better for the instruction to accept an output `RegisterOperand` that covers all possible tile types, which would allow register allocation to assign the appropriate tile registers? Doing so would also allow us to distinguish between `zero { za0.d, za4.d }` and `zero { za0.s }` when we need to parse assembly code back into machine IR with correct register semantics.
> > Wouldn't it be better for the instruction to accept an output `RegisterOperand` that covers all possible tile types, which would allow register allocation to assign the appropriate tile registers? Doing so would also allow us to distinguish between `zero { za0.d, za4.d }` and `zero { za0.s }` when we need to parse assembly code back into machine IR with correct register semantics.
> 
> I've not given that a great deal of thought yet to be honest, our focus is on MC layer support at the moment, may revisit this in the future.
We also went with an immediate operand at first, but eventually replaced it with a register operand, mainly to allow register allocation to work.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105575/new/

https://reviews.llvm.org/D105575