[PATCH] D105575: [AArch64][SME] Add zero instruction

Cullen Rhodes via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jul 20 06:59:30 PDT 2021


c-rhodes added a comment.

In D105575#2886914 <https://reviews.llvm.org/D105575#2886914>, @david-arm wrote:

> Hi @c-rhodes, I'm only part-way through the patch, but here are some minor comments I have so far!

Thanks for the comments Dave, I've not responded to them all yet but I'll update the patch shortly to address them.

In D105575#2887920 <https://reviews.llvm.org/D105575#2887920>, @bryanpkc wrote:

> Hi @c-rhodes, we had done some design work for the `ZERO` instruction, and it is interesting to see your implementation. I have some questions about the code, based on my understanding of the ISA.

Hi @bryanpkc, thanks for the comments. Do you also have an implementation for this?



================
Comment at: llvm/lib/Target/AArch64/AArch64RegisterInfo.td:1373
+def MatrixTileList32 : MatrixTileListOperand<"MPR32", 32, (ops MPR32)>;
+def MatrixTileList64 : MatrixTileListOperand<"MPR64", 64, (ops MPR64)>;
+
----------------
bryanpkc wrote:
> Does this restrict a matrix tile list to contain only tiles of the same element type? The ISA documentation doesn't impose such a restriction, so I think it would be legal to write something like `zero { za0.s, za5.d }`. Did you consider supporting matrix tile lists of mixed element types?
> Does this restrict a matrix tile list to contain only tiles of the same element type? The ISA documentation doesn't impose such a restriction, so I think it would be legal to write something like `zero { za0.s, za5.d }`. Did you consider supporting matrix tile lists of mixed element types?

Yeah the element types must be the same. The ISA docs don't explicitly impose that restriction, but the actual instruction takes a list of 64-bit element types, the other types are really aliases.


================
Comment at: llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp:3876
+    if (RI->getEncodingValue(Reg) <= (RI->getEncodingValue(PrevReg))) {
+      Error(Loc, "registers must be sequential");
+      return MatchOperand_ParseFail;
----------------
bryanpkc wrote:
> Is this specified in the ISA documentation? It isn't obvious that the tiles in the list must be ordered.
> Is this specified in the ISA documentation? It isn't obvious that the tiles in the list must be ordered.

No, but I'm not sure there's a good reason for the tiles to not be in order?


================
Comment at: llvm/lib/Target/AArch64/SMEInstrFormats.td:661
+class sme_zero_inst<string mnemonic>
+    : I<(outs MatrixTileList64:$imm), (ins),
+        mnemonic, "\t$imm", "", []>, Sched<[]> {
----------------
bryanpkc wrote:
> Wouldn't it be better for the instruction to accept an output `RegisterOperand` that covers all possible tile types, which would allow register allocation to assign the appropriate tile registers? Doing so would also allow us to distinguish between `zero { za0.d, za4.d }` and `zero { za0.s }` when we need to parse assembly code back into machine IR with correct register semantics.
> Wouldn't it be better for the instruction to accept an output `RegisterOperand` that covers all possible tile types, which would allow register allocation to assign the appropriate tile registers? Doing so would also allow us to distinguish between `zero { za0.d, za4.d }` and `zero { za0.s }` when we need to parse assembly code back into machine IR with correct register semantics.

I've not given that a great deal of thought yet to be honest, our focus is on MC layer support at the moment, may revisit this in the future.


================
Comment at: llvm/lib/Target/AArch64/SMEInstrFormats.td:688
+  def : InstAlias<"zero\t\\{za1.s,za2.s\\}", (!cast<Instruction>(NAME) 0b01100110), 1>;
+  def : InstAlias<"zero\t\\{za2.s,za3.s\\}", (!cast<Instruction>(NAME) 0b11001100), 1>;
+  def : InstAlias<"zero\t\\{za0.s,za1.s,za2.s\\}", (!cast<Instruction>(NAME) 0b01110111), 1>;
----------------
bryanpkc wrote:
> david-arm wrote:
> > Are we missing ones for {za1.s,za3.s} and {z0.s,za2.s}?
> Why are these implemented as `InstAlias`? Can they not be parsed by `AArch64AsmParser::tryParseMatrixTileList`?
> Are we missing ones for {za1.s,za3.s} and {z0.s,za2.s}?

These aliases are for the preferred disassembly, the constraint is the instruction uses the shortest list of tile names that represent the encoded immediate mask. The parsed tile list gets converted (not necessary if input is .D tiles) to 64-bit tiles and then encoded as an 8-bit mask, a bit for each tile (za0.d-za7.d).

For `{za1.s,za3.s}`, the mapping is:
  
  za1.s -> {za1.d, za5.d}
  za3.s -> {za3.d, za7.d}
        -> {za1.d, za3.d, za5.d, za7.d} (mask: 10101010)

the shortest possible tile list for this mask is `{za1.h}` and that's defined above. Same principle applies for `{z0.s,za2.s}`.

For reference,  the following table describes all possible aliases:

```
    The following table describes all possible aliases:

              mask                                                 in                 preferred
    0   0b11111111                                          zero {za}                 zero {za}
    1   0b11111111                                       zero {za0.b}                 zero {za}
    2   0b01010101                                       zero {za0.h}              zero {za0.h}
    3   0b10101010                                       zero {za1.h}              zero {za1.h}
    4   0b11111111                                 zero {za0.h,za1.h}                 zero {za}
    5   0b00010001                                       zero {za0.s}              zero {za0.s}
    6   0b00100010                                       zero {za1.s}              zero {za1.s}
    7   0b01000100                                       zero {za2.s}              zero {za2.s}
    8   0b10001000                                       zero {za3.s}              zero {za3.s}
    9   0b00110011                                 zero {za0.s,za1.s}        zero {za0.s,za1.s}
    10  0b01010101                                 zero {za0.s,za2.s}              zero {za0.h}
    11  0b10011001                                 zero {za0.s,za3.s}        zero {za0.s,za3.s}
    12  0b01100110                                 zero {za1.s,za2.s}        zero {za1.s,za2.s}
    13  0b10101010                                 zero {za1.s,za3.s}              zero {za1.h}
    14  0b11001100                                 zero {za2.s,za3.s}        zero {za2.s,za3.s}
    15  0b01110111                           zero {za0.s,za1.s,za2.s}  zero {za0.s,za1.s,za2.s}
    16  0b10111011                           zero {za0.s,za1.s,za3.s}  zero {za0.s,za1.s,za3.s}
    17  0b11011101                           zero {za0.s,za2.s,za3.s}  zero {za0.s,za2.s,za3.s}
    18  0b11101110                           zero {za1.s,za2.s,za3.s}  zero {za1.s,za2.s,za3.s}
    19  0b11111111                     zero {za0.s,za1.s,za2.s,za3.s}                 zero {za}
    20  0b11111111  zero {za0.d,za1.d,za2.d,za3.d,za4.d,za5.d,za6....                 zero {za}
    21  0b01010101                     zero {za0.d,za2.d,za4.d,za6.d}              zero {za0.h}
    22  0b10101010                     zero {za1.d,za3.d,za5.d,za7.d}              zero {za1.h}
    23  0b00010001                                 zero {za0.d,za4.d}              zero {za0.s}
    24  0b00100010                                 zero {za1.d,za5.d}              zero {za1.s}
    25  0b01000100                                 zero {za2.d,za6.d}              zero {za2.s}
    26  0b10001000                                 zero {za3.d,za7.d}              zero {za3.s}
    27  0b00110011                     zero {za0.d,za1.d,za4.d,za5.d}        zero {za0.s,za1.s}
    28  0b10011001                     zero {za0.d,za3.d,za4.d,za7.d}        zero {za0.s,za3.s}
    29  0b01100110                     zero {za1.d,za2.d,za5.d,za6.d}        zero {za1.s,za2.s}
    30  0b11001100                     zero {za2.d,za3.d,za6.d,za7.d}        zero {za2.s,za3.s}
    31  0b01110111         zero {za0.d,za1.d,za2.d,za4.d,za5.d,za6.d}  zero {za0.s,za1.s,za2.s}
    32  0b10111011         zero {za0.d,za1.d,za3.d,za4.d,za5.d,za7.d}  zero {za0.s,za1.s,za3.s}
    33  0b11011101         zero {za0.d,za2.d,za3.d,za4.d,za6.d,za7.d}  zero {za0.s,za2.s,za3.s}
    34  0b11101110         zero {za1.d,za2.d,za3.d,za5.d,za6.d,za7.d}  zero {za1.s,za2.s,za3.s}
```




================
Comment at: llvm/lib/Target/AArch64/SMEInstrFormats.td:688
+  def : InstAlias<"zero\t\\{za1.s,za2.s\\}", (!cast<Instruction>(NAME) 0b01100110), 1>;
+  def : InstAlias<"zero\t\\{za2.s,za3.s\\}", (!cast<Instruction>(NAME) 0b11001100), 1>;
+  def : InstAlias<"zero\t\\{za0.s,za1.s,za2.s\\}", (!cast<Instruction>(NAME) 0b01110111), 1>;
----------------
c-rhodes wrote:
> bryanpkc wrote:
> > david-arm wrote:
> > > Are we missing ones for {za1.s,za3.s} and {z0.s,za2.s}?
> > Why are these implemented as `InstAlias`? Can they not be parsed by `AArch64AsmParser::tryParseMatrixTileList`?
> > Are we missing ones for {za1.s,za3.s} and {z0.s,za2.s}?
> 
> These aliases are for the preferred disassembly, the constraint is the instruction uses the shortest list of tile names that represent the encoded immediate mask. The parsed tile list gets converted (not necessary if input is .D tiles) to 64-bit tiles and then encoded as an 8-bit mask, a bit for each tile (za0.d-za7.d).
> 
> For `{za1.s,za3.s}`, the mapping is:
>   
>   za1.s -> {za1.d, za5.d}
>   za3.s -> {za3.d, za7.d}
>         -> {za1.d, za3.d, za5.d, za7.d} (mask: 10101010)
> 
> the shortest possible tile list for this mask is `{za1.h}` and that's defined above. Same principle applies for `{z0.s,za2.s}`.
> 
> For reference,  the following table describes all possible aliases:
> 
> ```
>     The following table describes all possible aliases:
> 
>               mask                                                 in                 preferred
>     0   0b11111111                                          zero {za}                 zero {za}
>     1   0b11111111                                       zero {za0.b}                 zero {za}
>     2   0b01010101                                       zero {za0.h}              zero {za0.h}
>     3   0b10101010                                       zero {za1.h}              zero {za1.h}
>     4   0b11111111                                 zero {za0.h,za1.h}                 zero {za}
>     5   0b00010001                                       zero {za0.s}              zero {za0.s}
>     6   0b00100010                                       zero {za1.s}              zero {za1.s}
>     7   0b01000100                                       zero {za2.s}              zero {za2.s}
>     8   0b10001000                                       zero {za3.s}              zero {za3.s}
>     9   0b00110011                                 zero {za0.s,za1.s}        zero {za0.s,za1.s}
>     10  0b01010101                                 zero {za0.s,za2.s}              zero {za0.h}
>     11  0b10011001                                 zero {za0.s,za3.s}        zero {za0.s,za3.s}
>     12  0b01100110                                 zero {za1.s,za2.s}        zero {za1.s,za2.s}
>     13  0b10101010                                 zero {za1.s,za3.s}              zero {za1.h}
>     14  0b11001100                                 zero {za2.s,za3.s}        zero {za2.s,za3.s}
>     15  0b01110111                           zero {za0.s,za1.s,za2.s}  zero {za0.s,za1.s,za2.s}
>     16  0b10111011                           zero {za0.s,za1.s,za3.s}  zero {za0.s,za1.s,za3.s}
>     17  0b11011101                           zero {za0.s,za2.s,za3.s}  zero {za0.s,za2.s,za3.s}
>     18  0b11101110                           zero {za1.s,za2.s,za3.s}  zero {za1.s,za2.s,za3.s}
>     19  0b11111111                     zero {za0.s,za1.s,za2.s,za3.s}                 zero {za}
>     20  0b11111111  zero {za0.d,za1.d,za2.d,za3.d,za4.d,za5.d,za6....                 zero {za}
>     21  0b01010101                     zero {za0.d,za2.d,za4.d,za6.d}              zero {za0.h}
>     22  0b10101010                     zero {za1.d,za3.d,za5.d,za7.d}              zero {za1.h}
>     23  0b00010001                                 zero {za0.d,za4.d}              zero {za0.s}
>     24  0b00100010                                 zero {za1.d,za5.d}              zero {za1.s}
>     25  0b01000100                                 zero {za2.d,za6.d}              zero {za2.s}
>     26  0b10001000                                 zero {za3.d,za7.d}              zero {za3.s}
>     27  0b00110011                     zero {za0.d,za1.d,za4.d,za5.d}        zero {za0.s,za1.s}
>     28  0b10011001                     zero {za0.d,za3.d,za4.d,za7.d}        zero {za0.s,za3.s}
>     29  0b01100110                     zero {za1.d,za2.d,za5.d,za6.d}        zero {za1.s,za2.s}
>     30  0b11001100                     zero {za2.d,za3.d,za6.d,za7.d}        zero {za2.s,za3.s}
>     31  0b01110111         zero {za0.d,za1.d,za2.d,za4.d,za5.d,za6.d}  zero {za0.s,za1.s,za2.s}
>     32  0b10111011         zero {za0.d,za1.d,za3.d,za4.d,za5.d,za7.d}  zero {za0.s,za1.s,za3.s}
>     33  0b11011101         zero {za0.d,za2.d,za3.d,za4.d,za6.d,za7.d}  zero {za0.s,za2.s,za3.s}
>     34  0b11101110         zero {za1.d,za2.d,za3.d,za5.d,za6.d,za7.d}  zero {za1.s,za2.s,za3.s}
> ```
> 
> 
> Why are these implemented as `InstAlias`? Can they not be parsed by `AArch64AsmParser::tryParseMatrixTileList`?

See my comment below regarding aliases. To expand on the parsing a bit, `tryParseMatrixTileList` will parse tile lists with 8/16/32 or 64-bit element types, the non 64-bit types are treated as aliases and get converted to .D tiles, then encoded as an 8-bit mask.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105575/new/

https://reviews.llvm.org/D105575



More information about the llvm-commits mailing list