[PATCH] D104236: [AArch64] Add a TableGen pattern to generate uaddlv from uaddlp and addv
JinGu Kang via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jun 16 08:14:46 PDT 2021
jaykang10 added a comment.
In D104236#2821939 <https://reviews.llvm.org/D104236#2821939>, @dmgreen wrote:
> Thanks. I would expect 2 more types I think. v2i32 and v2i64? It may be better to create a new multiclass like SIMDAcrossLanesIntrinsic.
The TableGen definition of UADDLV is as follows.
defm UADDLV : SIMDAcrossLanesHSD<1, 0b00011, "uaddlv">;
...
multiclass SIMDAcrossLanesHSD<bit U, bits<5> opcode, string asm> {
  def v8i8v : BaseSIMDAcrossLanes<0, U, 0b00, opcode, FPR16, V64,
                                  asm, ".8b", []>;
  def v16i8v : BaseSIMDAcrossLanes<1, U, 0b00, opcode, FPR16, V128,
                                   asm, ".16b", []>;
  def v4i16v : BaseSIMDAcrossLanes<0, U, 0b01, opcode, FPR32, V64,
                                   asm, ".4h", []>;
  def v8i16v : BaseSIMDAcrossLanes<1, U, 0b01, opcode, FPR32, V128,
                                   asm, ".8h", []>;
  def v4i32v : BaseSIMDAcrossLanes<1, U, 0b10, opcode, FPR64, V128,
                                   asm, ".4s", []>;
}
Therefore, it supports the following pairs of input and output types.
v8i8 ==> i16
v16i8 ==> i16
v4i16 ==> i32
v8i16 ==> i32
v4i32 ==> i64
From the TableGen definition of UADDLP, I think the AArch64uaddlp node supports the following input and output types.
v8i8 ==> v4i16
v16i8 ==> v8i16
v4i16 ==> v2i32
v8i16 ==> v4i32
v2i32 ==> v1i64
v4i32 ==> v2i64
The TableGen definition of UADDLV does not support v2i32, so I did not add a pattern for it.
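As a rough illustration of why addv(uaddlp(x)) can be replaced by uaddlv(x), the lane-level behaviour can be modelled in a few lines of Python. This is only a sketch under stated assumptions: vectors are plain lists of unsigned integers, and the function names simply mirror the instruction mnemonics. It is not LLVM code and not part of the patch.

```python
# Rough lane-level model of the operations discussed in this thread.
# Assumption: a vector is a list of unsigned ints, `bits` is the lane width.

def uaddlp(lanes, bits):
    """Pairwise widening add: adjacent lanes are summed into lanes of 2*bits."""
    assert len(lanes) % 2 == 0
    assert all(0 <= x < (1 << bits) for x in lanes)
    # Each pair sum fits in 2*bits, so no masking is needed here.
    return [lanes[i] + lanes[i + 1] for i in range(0, len(lanes), 2)]

def addv(lanes, bits):
    """Across-lanes add: the scalar result wraps at the lane width."""
    return sum(lanes) & ((1 << bits) - 1)

def uaddlv(lanes, bits):
    """Across-lanes widening add: the scalar result is 2*bits wide."""
    assert all(0 <= x < (1 << bits) for x in lanes)
    return sum(lanes) & ((1 << (2 * bits)) - 1)

def fold_matches(lanes, bits):
    """True if addv(uaddlp(x)) gives the same value as uaddlv(x)."""
    return addv(uaddlp(lanes, bits), 2 * bits) == uaddlv(lanes, bits)

# Worst-case inputs (all lanes at their maximum) for three source types.
worst_v8i8 = [0xFF] * 8
worst_v16i8 = [0xFF] * 16
worst_v8i16 = [0xFFFF] * 8

print(fold_matches(worst_v8i8, 8),
      fold_matches(worst_v16i8, 8),
      fold_matches(worst_v8i16, 16))
```

In this model the widened sum always fits in the doubled lane width for these vector sizes, which is why the two-instruction sequence and the single uaddlv agree.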
Next, I think AArch64uaddv supports the following input types.
v8i8
v16i8
v4i16
v8i16
v4i32
Among the output types of AArch64uaddlp, v4i16, v8i16 and v4i32 are also supported as input types of AArch64uaddv, so I have added patterns for these types.
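The type matching described above can be reproduced mechanically. The following Python sketch is only an illustration: the three tables are transcribed from the lists in this comment, the helper name `foldable` is hypothetical, and the instruction names are formed by joining the `defm` name UADDLV with the `def` names from the multiclass shown earlier.

```python
# Type tables transcribed from the lists in this comment (not read from LLVM).
UADDLP = {  # AArch64uaddlp: input type -> output type
    "v8i8": "v4i16",
    "v16i8": "v8i16",
    "v4i16": "v2i32",
    "v8i16": "v4i32",
    "v2i32": "v1i64",
    "v4i32": "v2i64",
}
UADDV_INPUTS = {"v8i8", "v16i8", "v4i16", "v8i16", "v4i32"}
UADDLV = {  # UADDLV: input type -> scalar output type
    "v8i8": "i16",
    "v16i8": "i16",
    "v4i16": "i32",
    "v8i16": "i32",
    "v4i32": "i64",
}

def foldable():
    """List (source, intermediate, instruction, result) rows where the
    uaddlp output is a valid uaddv input and UADDLV accepts the source."""
    rows = []
    for src, mid in UADDLP.items():
        if mid in UADDV_INPUTS and src in UADDLV:
            rows.append((src, mid, "UADDLV" + src + "v", UADDLV[src]))
    return rows

for row in foldable():
    print(row)
```

Under these tables, the only surviving source types are v8i8, v16i8 and v8i16, whose uaddlp outputs are the v4i16, v8i16 and v4i32 types named above.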
I may have made a mistake. If anything above looks wrong, please let me know.
> It's a shame the types needed are a little different, else the same multiclass could be used for both the new and the old opcodes. That would add both the base pattern and the insert/extract pattern, although I'm not entirely sure that both are needed.
I thought the multiclass would not be suitable here, because our case has different input and output types and needs a combination of multiple Opnodes.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D104236/new/
https://reviews.llvm.org/D104236