[PATCH] D152205: [Aarch64][SVE]SVE2] Enable tbl, tbl2 for shuffle lowering for fixed vector types.

Mon Oct 2 05:27:56 PDT 2023

sdesmalen added inline comments.

================
Comment at: llvm/test/CodeGen/AArch64/sve-fixed-length-vector-shuffle-tbl.ll:2
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mattr=+sve -force-streaming-compatible-sve < %s | FileCheck %s -check-prefixes=CHECK,CHECK_SVE
+; RUN: llc -mattr=+sve2 -force-streaming-compatible-sve -aarch64-sve-vector-bits-min=128 -aarch64-sve-vector-bits-max=128  < %s | FileCheck %s -check-prefixes=CHECK,SVE2_128
----------------
Can you remove this RUN line? This RUN line is identical to the last RUN line.

================
Comment at: llvm/test/CodeGen/AArch64/sve-fixed-length-vector-shuffle-tbl.ll:3
+; RUN: llc -mattr=+sve -force-streaming-compatible-sve < %s | FileCheck %s -check-prefixes=CHECK,CHECK_SVE
+; RUN: llc -mattr=+sve2 -force-streaming-compatible-sve -aarch64-sve-vector-bits-min=128 -aarch64-sve-vector-bits-max=128  < %s | FileCheck %s -check-prefixes=CHECK,SVE2_128
+; RUN: llc -mattr=+sve2 -force-streaming-compatible-sve -aarch64-sve-vector-bits-min=256 -aarch64-sve-vector-bits-max=256  < %s | FileCheck %s -check-prefixes=CHECK,SVE2_256
----------------
Can you stick with one vector width (e.g. min==max==128)? I don't think the line below (min==max==256) is testing anything that the current RUN line isn't already testing when `-force-stremaing-compatible` is enabled.

================
Comment at: llvm/test/CodeGen/AArch64/sve-fixed-length-vector-shuffle-tbl.ll:110
+  %op2 = load <8 x i8>, ptr %b
+  %1 = shufflevector <8 x i8> %op1, <8 x i8> %op2, <8 x i32> <i32 0, i32 7, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
+  ret <8 x i8> %1
----------------
Please rename this to something like ` shuffle_index_indices_from_op1` to make it clear the mask takes all the lanes from `%op1`.

================
Comment at: llvm/test/CodeGen/AArch64/sve-fixed-length-vector-shuffle-tbl.ll:211
+  %op2 = load <8 x i8>, ptr %b
+  %1 = shufflevector <8 x i8> %op1, <8 x i8> %op2, <8 x i32> <i32 8, i32 9, i32 9, i32 11, i32 12, i32 15, i32 14, i32 15>
+  ret <8 x i8> %1
----------------
Please rename this to something like  `shuffle_index_indices_from_op2` to make it clear the mask takes all the lanes from `%op2`.

================
Comment at: llvm/test/CodeGen/AArch64/sve-fixed-length-vector-shuffle-tbl.ll:345
+  %op2 = load <8 x i8>, ptr %b
+  %1 = shufflevector <8 x i8> %op1, <8 x i8> %op2, <8 x i32> <i32 1, i32 9, i32 10, i32 11, i32 12, i32 12, i32 14, i32 15>
+  ret <8 x i8> %1
----------------
Please rename this to something like  `shuffle_index_indices_from_both_ops` to make it clear the mask takes all the lanes from both `%op1` and `%op2`.

================
Comment at: llvm/test/CodeGen/AArch64/sve-fixed-length-vector-shuffle-tbl.ll:349
+
+define <8 x i8> @shuffle_index_size_unacceptable_op_poison(ptr %a, ptr %b) {
+; CHECK-LABEL: shuffle_index_size_unacceptable_op_poison:
----------------
Please remove the `acceptable` and `_unacceptable` from the names. If something cannot use the TBL instruction, then prepend the name with `negative_`.

================
Comment at: llvm/test/CodeGen/AArch64/sve-fixed-length-vector-shuffle-tbl.ll:380
+  %op2 = load <8 x i8>, ptr %b
+  %1 = shufflevector <8 x i8> %op1, <8 x i8> %op2, <8 x i32> <i32 1, i32 9, i32 10, i32 11, i32 12, i32 12, i32 14, i32 poison>
+  ret <8 x i8> %1
----------------
As I said [[https://reviews.llvm.org/D152205?id=556773#inline-1548136|here]], one of the indices being `poison` should not be a reason not to use vector instructions. This case can be handled by the algorithm.

================
Comment at: llvm/test/CodeGen/AArch64/sve-fixed-length-vector-shuffle-tbl.ll:384
+
+define <8 x i8> @shuffle_index_size_unacceptable_op_both_maxhw(ptr %a, ptr %b) #0 {
+; CHECK-LABEL: shuffle_index_size_unacceptable_op_both_maxhw:
----------------
Please inline `"target-features"="+sve2" vscale_range(16,16)` here rather than using `#0`, as that makes the test a bit easier to read.

================
Comment at: llvm/test/CodeGen/AArch64/sve-fixed-length-vector-shuffle-tbl.ll:384
+
+define <8 x i8> @shuffle_index_size_unacceptable_op_both_maxhw(ptr %a, ptr %b) #0 {
+; CHECK-LABEL: shuffle_index_size_unacceptable_op_both_maxhw:
----------------
sdesmalen wrote:
> Please inline `"target-features"="+sve2" vscale_range(16,16)` here rather than using `#0`, as that makes the test a bit easier to read.
Can you add a comment that clarifies why this function doesn't use TBL?

================
Comment at: llvm/test/CodeGen/AArch64/sve-fixed-length-vector-shuffle-tbl.ll:446
+;
+; SVE2_128-LABEL: shuffle_doublemask_size_unacceptable_form:
+; SVE2_128:       // %bb.0:
----------------
This test is named `_unacceptable_form`, yet it uses TBL without issues. So this is not an "unacceptable form"? :) 

================
Comment at: llvm/test/CodeGen/AArch64/sve-fixed-length-vector-shuffle-tbl.ll:702
+  %op2 = load <8 x i8>, ptr %b
+  %1 = shufflevector <8 x i8> %op1, <8 x i8> %op2, <16 x i32> <i32 1, i32 7, i32 15, i32 3, i32 0, i32 0, i32 0, i32 4, i32 1, i32 7, i32 15, i32 3, i32 0, i32 0, i32 0, i32 4>
+  ret <16 x i8> %1
----------------
How is this test different from `shuffle_index_size_acceptable_op_both`, other than it having a slightly different mask?

================
Comment at: llvm/test/CodeGen/AArch64/sve-fixed-length-vector-shuffle-tbl.ll:833
+  %op2 = load <16 x i8>, ptr %b
+  %1 = shufflevector <16 x i8> %op1, <16 x i8> %op2, <8 x i32> <i32 1, i32 7, i32 31, i32 3, i32 0, i32 0, i32 0, i32 4>
+  ret <8 x i8> %1
----------------
How is this test different from `shuffle_index_size_acceptable_op_both`, other than it having a slightly different mask and a wider return type?

================
Comment at: llvm/test/CodeGen/AArch64/sve-fixed-length-vector-shuffle-tbl.ll:889
+  %op2 = load <8 x i8>, ptr %b
+  %1 = shufflevector <8 x i8> %op1, <8 x i8> %op2, <8 x i32> <i32 8, i32 9, i32 9, i32 11, i32 12, i32 15, i32 14, i32 15>
+  ret <8 x i8> %1
----------------
I don't think this is a good test, because this test is relying on a DAGCombine that swaps `%op1` and `%op2`. I think you can just remove it.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152205/new/

https://reviews.llvm.org/D152205