[PATCH] D136858: [AArch64-SVE]: Force generating code compatible to streaming mode for sve-fixed-length tests.

Wed Nov 16 10:11:10 PST 2022

sdesmalen added a comment.

Hi @hassnaa-arm, I've not been able to go through the entire patch yet, but I think it makes sense to split it up to make the changes a bit easier to review. I've left some comments to suggest how to split it up and also some comments on the code-changes itself.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:1664
+    setCondCodeAction(ISD::SETOLT, VT, Expand);
+    setCondCodeAction(ISD::SETLT, VT, Expand);
+    setCondCodeAction(ISD::SETOLE, VT, Expand);
----------------
SETLT and SETLE should not be in this list, because they have a 1-1 mapping with instructions.
Most of the other nodes need expanding into two nodes (one for ordered/unordered and one for LE/LT), with SETO need expanding to `not(unordered)`.

It would also be nice if these changes could be moved to a separate patch.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:1674-1682
+  setTruncStoreAction(MVT::v2f32, MVT::v2f16, Custom);
+  setTruncStoreAction(MVT::v4f32, MVT::v4f16, Custom);
+  setTruncStoreAction(MVT::v8f32, MVT::v8f16, Custom);
+  setTruncStoreAction(MVT::v1f64, MVT::v1f16, Custom);
+  setTruncStoreAction(MVT::v2f64, MVT::v2f16, Custom);
+  setTruncStoreAction(MVT::v4f64, MVT::v4f16, Custom);
+  setTruncStoreAction(MVT::v1f64, MVT::v1f32, Custom);
----------------
These actions are unrelated to `VT` as passed into the function, so they can be moved out of this function.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:3840

-  if (useSVEForFixedLengthVectorVT(SrcVT))
+  if (useSVEForFixedLengthVectorVT(SrcVT,
+                                   Subtarget->forceStreamingCompatibleSVE()))
----------------
It would be nice if you could split up your patch such that each such a code change, lives in its own patch with a set of corresponding tests. That makes the patch a bit more manageable to review.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:12772

-  if (useSVEForFixedLengthVectorVT(Op.getOperand(0).getValueType()))
+  if (useSVEForFixedLengthVectorVT(Op.getOperand(0).getValueType(),
+                                   Subtarget->forceStreamingCompatibleSVE()))
----------------
Please move this change and corresponding tests to a separate patch.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:14066
   bool UseScalable;
-  if (!Subtarget->hasNEON() ||
+  if (Subtarget->forceStreamingCompatibleSVE() ||
+      !Subtarget->hasNEON() ||
----------------
Rather than adding this condition here, you can add the condition to `isLegalInterleavedAccessType` like this:
```-  if (Subtarget->useSVEForFixedLengthVectors() &&
-      (VecSize % Subtarget->getMinSVEVectorSizeInBits() == 0 ||
-       (VecSize < Subtarget->getMinSVEVectorSizeInBits() &&
-        isPowerOf2_32(NumElements) && VecSize > 128))) {
+  if (Subtarget->forceStreamingCompatibleSVE() ||
+      (Subtarget->useSVEForFixedLengthVectors() &&
+       (VecSize % Subtarget->getMinSVEVectorSizeInBits() == 0 ||
+        (VecSize < Subtarget->getMinSVEVectorSizeInBits() &&
+         isPowerOf2_32(NumElements) && VecSize > 128)))) {
```

When you add `vscale_range(1,16)` to the attributes of the test file, you will get the code you expect.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:12931
       useSVEForFixedLengthVectorVT(
           SrcVT, OverrideNEON && Subtarget->useSVEForFixedLengthVectors())) {

----------------
sdesmalen wrote:
> Instead of updating the OverrideNEON variable, I suspect that you actually want to do something like this:
> 
>   if (SrcVT.isScalableVector() ||
>       useSVEForFixedLengthVectorVT(
>           SrcVT, OverrideNEON && Subtarget->useSVEForFixedLengthVectors()) ||
>       useSVEForFixedLengthVectorVT(
>           SrcVT, Subtarget->forceStreamingCompatibleSVE()))
> 
> so to not alter the behaviour for non-streaming fixed-length vectors. This avoids the regressions in sve-fixed-length-int-reduce.ll where the SVE variants require an additional predicate (whereas the NEON reduction operations are unpredicated and thus only 1 instruction).
This can actually be simplified to:
```
bool OverrideNEON = Subtarget->forceStreamingCompatibleSVE() ||
                    (Subtarget->useSVEForFixedLengthVectors() &&
                     (Op.getOpcode() == ISD::VECREDUCE_AND ||
                      Op.getOpcode() == ISD::VECREDUCE_OR ||
                      Op.getOpcode() == ISD::VECREDUCE_XOR ||
                      Op.getOpcode() == ISD::VECREDUCE_FADD ||
                      (Op.getOpcode() != ISD::VECREDUCE_ADD &&
                       SrcVT.getVectorElementType() == MVT::i64)));
if (SrcVT.isScalableVector() ||
    useSVEForFixedLengthVectorVT(SrcVT, OverrideNEON)) {```

It would be nice if you could move this change, the changes in `addTypeForStreamingSVE` and corresponding reduction tests to a separate patch.

================
Comment at: llvm/lib/Target/AArch64/AArch64InstrInfo.cpp:3656
       AArch64::FPR128RegClass.contains(SrcReg)) {
-    if (Subtarget.hasNEON()) {
+    if (Subtarget.forceStreamingCompatibleSVE()) {
+      BuildMI(MBB, I, DL, get(AArch64::ORR_ZZZ))
----------------
Can you move this change to a separate patch and test it with something very simple, such as:

```define fp128 @test_streaming_compatible_register_mov(fp128 %q0, fp128 %q1) {
; CHECK-LABEL: test_streaming_compatible_register_mov:
; CHECK:       // %bb.0:
; CHECK-NEXT:    mov     z0.d, z1.d
; CHECK-NEXT:    ret
  ret fp128 %y
}```

================
Comment at: llvm/test/CodeGen/AArch64/-streaming-mode-fixed-length-fp-arith.ll:1
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -force-streaming-compatible-sve < %s | FileCheck %s
----------------
The name of this file is incorrect, it should start with `sve-` instead of `-`

================
Comment at: llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-ld2-alloca.ll:20
+  %strided.vec = shufflevector <8 x double> %load, <8 x double> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
+  store <8 x double> zeroinitializer, ptr %ptr
+  ret void
----------------
This test seems broken, because it's not using the result `%strided.vec` from the shufflevector, which as we can see from the assembly causes the load + shuffle to be removed entirely.

I've fixed `sve-fixed-ld2-alloca.ll` in c2600244fc14, can you update this test accordingly?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136858/new/

https://reviews.llvm.org/D136858