[PATCH] D135564: [AArch64-SVE]: Force generating code compatible to streaming mode for (masked/extending/truncating) load/store

Thu Oct 20 06:13:24 PDT 2022

sdesmalen added a comment.

Hi @hassnaa-arm, I think you have the tests the wrong way around. The tests from D136147 <https://reviews.llvm.org/D136147> should be part of //this// patch, because this is the patch where you're implementing the lowering of the operations you're testing in D136147 <https://reviews.llvm.org/D136147>.
After this patch, you get the masked/truncating/extending load/store operations "for free", so the tests for those operations could be moved to a separate test-only patch like D136147 <https://reviews.llvm.org/D136147>.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:1394
+      for (MVT VT : {MVT::v8i8, MVT::v16i8, MVT::v4i16, MVT::v8i16, MVT::v2i32,
+                     MVT::v4i32, MVT::v1i64, MVT::v2i64})
+        addTypeForStreamingSVE(VT);
----------------
It should be able to use standard scalar instructions for v1i64 in streaming-compatible mode, so this one can be removed from the list.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:1397
+
+      for (MVT VT : {MVT::v4f16, MVT::v8f16, MVT::v2f32, MVT::v4f32, MVT::v1f64,
+                     MVT::v2f64})
----------------
Most scalar FP operations are valid in streaming mode, so we probably don't need to do anything custom for this type.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:1609-1611
+  setOperationAction(ISD::ANY_EXTEND, VT, Custom);
+  setOperationAction(ISD::ZERO_EXTEND, VT, Custom);
+  setOperationAction(ISD::SIGN_EXTEND, VT, Custom);
----------------
nit: Perhaps it doesn't lead to an error, but these operations only operate on integers, so should be guarded by:

  if (VT.isInteger()) { ... }

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:12264

-  // If this is extracting the upper 64-bits of a 128-bit vector, we match
-  // that directly.
-  if (Size == 64 && Idx * InVT.getScalarSizeInBits() == 64 &&
-      InVT.getSizeInBits() == 128)
-    return Op;
+  if (!Subtarget->forceStreamingCompatibleSVE()) {
+    // If this is extracting the upper 64-bits of a 128-bit vector, we match
----------------
nit: rather than wrapping this in another condition, can you just add it to the existing condition with `&& !Subtarget->forceStreamingCompatibleSVE()` ?

================
Comment at: llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td:3035-3036
   // Extract element from vector with immediate index that's within the bottom 128-bits.
-  let AddedComplexity = 1 in {
-  def : Pat<(i32 (vector_extract (nxv16i8 ZPR:$vec), VectorIndexB:$index)),
-            (i32 (UMOVvi8 (v16i8 (EXTRACT_SUBREG ZPR:$vec, zsub)), VectorIndexB:$index))>;
-  def : Pat<(i32 (vector_extract (nxv8i16 ZPR:$vec), VectorIndexH:$index)),
-            (i32 (UMOVvi16 (v8i16 (EXTRACT_SUBREG ZPR:$vec, zsub)), VectorIndexH:$index))>;
-  def : Pat<(i32 (vector_extract (nxv4i32 ZPR:$vec), VectorIndexS:$index)),
-            (i32 (UMOVvi32 (v4i32 (EXTRACT_SUBREG ZPR:$vec, zsub)), VectorIndexS:$index))>;
-  def : Pat<(i64 (vector_extract (nxv2i64 ZPR:$vec), VectorIndexD:$index)),
-            (i64 (UMOVvi64 (v2i64 (EXTRACT_SUBREG ZPR:$vec, zsub)), VectorIndexD:$index))>;
-  }
-
-  def : Pat<(sext_inreg (vector_extract (nxv16i8 ZPR:$vec), VectorIndexB:$index), i8),
-            (i32 (SMOVvi8to32 (v16i8 (EXTRACT_SUBREG ZPR:$vec, zsub)), VectorIndexB:$index))>;
-  def : Pat<(sext_inreg (anyext (vector_extract (nxv16i8 ZPR:$vec), VectorIndexB:$index)), i8),
-            (i64 (SMOVvi8to64 (v16i8 (EXTRACT_SUBREG ZPR:$vec, zsub)), VectorIndexB:$index))>;
+  let Predicates = [NotInStreamingSVEMode] in {
+    let AddedComplexity = 1 in {
+    def : Pat<(i32 (vector_extract (nxv16i8 ZPR:$vec), VectorIndexB:$index)),
----------------
nit: Can you change this into:

  let Predicates = [NotInStreamingSVEMode], AddedComplexity = 1 in {
  def : Pat<...>
  ..
  }

  let Predicates = [NotInStreamingSVEMode] in {
  def : Pat<..>
  ...
  }

Rather than indenting?

================
Comment at: llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-masked-store.ll:312
+
+define void @masked_store_v32f32(<32 x float>* %dst, <32 x i1> %mask) #0 {
+; CHECK-LABEL: masked_store_v32f32:
----------------
Can you remove all tests that are larger than "twice the size" of a 128bit vector (v32f32 is 8x the size, I'm not sure what value that adds for the testing of this functionality)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D135564/new/

https://reviews.llvm.org/D135564