[PATCH] D104236: [AArch64] Add a TableGen pattern to generate uaddlv from uaddlp and addv

Wed Jun 16 08:14:46 PDT 2021

jaykang10 added a comment.

In D104236#2821939 <https://reviews.llvm.org/D104236#2821939>, @dmgreen wrote:

> Thanks. I would expect 2 more types I think. v2i32 and v2i64?  It may be better to create a new multiclass like SIMDAcrossLanesIntrinsic.

The TableGen definition of UADDLV is as below.

  defm UADDLV  : SIMDAcrossLanesHSD<1, 0b00011, "uaddlv">;
  ...
  multiclass SIMDAcrossLanesHSD<bit U, bits<5> opcode, string asm> {
    def v8i8v  : BaseSIMDAcrossLanes<0, U, 0b00, opcode, FPR16, V64,
                                     asm, ".8b", []>;
    def v16i8v : BaseSIMDAcrossLanes<1, U, 0b00, opcode, FPR16, V128, 
                                     asm, ".16b", []>;
    def v4i16v : BaseSIMDAcrossLanes<0, U, 0b01, opcode, FPR32, V64,
                                     asm, ".4h", []>;
    def v8i16v : BaseSIMDAcrossLanes<1, U, 0b01, opcode, FPR32, V128, 
                                     asm, ".8h", []>;
    def v4i32v : BaseSIMDAcrossLanes<1, U, 0b10, opcode, FPR64, V128, 
                                     asm, ".4s", []>;
  }

Therefore, it supports the following pairs of input and output types.

  v8i8   ==>  i16
  v16i8  ==>  i16
  v4i16  ==>  i32
  v8i16  ==>  i32
  v4i32  ==>  i64
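
As a rough illustration of why each result is a scalar of twice the element width, here is a minimal Python model of the across-lanes unsigned long add. The function name `uaddlv` and the list-of-ints representation are assumptions made for this sketch; it is not LLVM code.

```python
def uaddlv(lanes, elem_bits):
    """Model of UADDLV: zero-extend every lane and sum into one scalar
    that is 2 * elem_bits wide (hypothetical helper, not an LLVM API)."""
    # Every lane must be a valid unsigned value of the element width.
    assert all(0 <= x < (1 << elem_bits) for x in lanes)
    total = sum(lanes)
    # For the lane counts listed above, the sum fits in the widened scalar.
    assert total < (1 << (2 * elem_bits))
    return total


# Worst cases for v16i8 ==> i16 and v4i32 ==> i64.
worst_v16i8 = uaddlv([255] * 16, 8)
worst_v4i32 = uaddlv([2**32 - 1] * 4, 32)
```

Even with every lane at its maximum value, the sum stays below the limit of the doubled width for the five supported vector types.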

From the TableGen definition of UADDLP, I think the AArch64uaddlp node supports the following input and output types.

  v8i8 ==> v4i16
  v16i8 ==> v8i16
  v4i16 ==> v2i32
  v8i16 ==> v4i32
  v2i32 ==> v1i64
  v4i32 ==> v2i64
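
To make the lane halving and widening concrete, the following Python sketch models the pairwise unsigned long add on a list of lanes. The helper name `uaddlp` and the sample values are illustrative assumptions only.

```python
def uaddlp(lanes, elem_bits):
    """Model of UADDLP: add each adjacent pair of zero-extended lanes,
    producing half as many lanes at twice the width (hypothetical helper)."""
    assert len(lanes) % 2 == 0
    assert all(0 <= x < (1 << elem_bits) for x in lanes)
    return [lanes[i] + lanes[i + 1] for i in range(0, len(lanes), 2)]


# v8i8 ==> v4i16: eight 8-bit lanes become four 16-bit lanes.
v8i8 = [255, 255, 1, 2, 3, 4, 5, 6]
v4i16 = uaddlp(v8i8, 8)
```

The pairwise sums keep the overall total unchanged, so adding up the lanes of the result gives the same value as adding up all the original lanes.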

The TableGen definition of UADDLV does not support v2i32, so I did not add a pattern for it.

Next, I think AArch64uaddv supports the following input types.

  v8i8
  v16i8
  v4i16
  v8i16
  v4i32

Among the output types of AArch64uaddlp, only v4i16, v8i16, and v4i32 are also supported as input types of AArch64uaddv, so I have added patterns for these types.
I may have made a mistake. If anything above looks wrong, please let me know.
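
The selection above can be cross-checked mechanically. This Python sketch copies the three type lists from this comment into tables and keeps only the chains where the uaddlp output is a valid uaddv input and the original vector type has a UADDLV form. The table and function names are made up for the sketch.

```python
# Type tables copied from the lists in this comment (not read from LLVM).
UADDLP = {
    "v8i8": "v4i16", "v16i8": "v8i16",
    "v4i16": "v2i32", "v8i16": "v4i32",
    "v2i32": "v1i64", "v4i32": "v2i64",
}
UADDV_INPUTS = {"v8i8", "v16i8", "v4i16", "v8i16", "v4i32"}
UADDLV = {
    "v8i8": "i16", "v16i8": "i16",
    "v4i16": "i32", "v8i16": "i32",
    "v4i32": "i64",
}


def foldable_chains():
    """Return (uaddlp input, uaddlp output, uaddlv result) triples where
    uaddv(uaddlp(x)) could be replaced by a single uaddlv(x)."""
    return [
        (src, mid, UADDLV[src])
        for src, mid in UADDLP.items()
        if mid in UADDV_INPUTS and src in UADDLV
    ]


chains = foldable_chains()
```

Under these tables, exactly three chains remain, with intermediate types v4i16, v8i16, and v4i32.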

> It's a shame the types needed are a little different, else the same multiclass could be used for both the new and the old opcodes. That would add both the base pattern and the insert/extract pattern, although I'm not entirely sure that both are needed.

Hmm, I thought the multiclass was not suitable for our case, because our case has different input/output types and needs a combination of multiple Opnodes.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104236/new/

https://reviews.llvm.org/D104236