[PATCH] D95363: [SVE][LoopVectorize] Add support for scalable vectorization of loops with vector reverse

Mon Jan 25 08:46:37 PST 2021

sdesmalen added inline comments.

================
Comment at: llvm/include/llvm/IR/IRBuilder.h:2538
+        M, Intrinsic::experimental_vector_reverse, Ty);
+    return Insert(CallInst::Create(F, V), Name);
+  }
----------------
It would be nice if the IRBuilder interface supports both the scalable and fixed-width case, i.e. for scalable vectors it creates the intrinsic, for fixed-width vectors it creates a shufflevector.
Then in the LoopVectorizer it can just call `CreateVectorReverse(Vec)` and get the same behaviour.

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:451-456
+  // Avoid to call BasicTTImpl for scalable vector and then assert
+  // ATM in the BasicTTImpl the cost for ISD that accepts scalable
+  // vector is 0
+  // but not all of them accept scalable vector.
+  if (isa<ScalableVectorType>(Src))
+    return 0;
----------------
I don't think you should return just some random value here. Is there anything in broken in BasicTTIImpl that needs fixing?

If you run into any asserts caused by the cast from FP -> Int, you can simplify the test so that it doesn't need that cast (and then fix this up in a separate patch)

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:1190
+  // for TTI::SK_Reverse. Fixed vector types keep using shuffle vector
+  if (Kind == TTI::SK_Reverse && isa<ScalableVectorType>(Tp)) {
+    std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, Tp);
----------------
Can you instead add cases to the table?

e.g.

  { TTI::SK_Reverse, MVT::nxv16i8, 1 },
  { TTI::SK_Reverse, MVT::nxv8i16, 1 },
   ...

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2800
     if (Reverse) {
-      assert(!VF.isScalable() &&
-             "Reversing vectors is not yet supported for scalable vectors.");
-
+      unsigned NumElts = VF.isScalable()
+                             ? VectorLoopValueMap.getNumCachedLanes()
----------------
`NumElts` can't be `unsigned` here, because it has to be a value that's evaluated at runtime.
This means it needs to multiply `VF.getKnownMinValue` by the runtime value of VScale, making `NumElts` a `Value *`.

For the first GEP you can also do the arithmetic on a pointer of the vector type, and index by `-Part`, e.g.
  getelementptr <vscale x 8 x double>, <vscale x 8 x double> *%ptr, i64 -<part>
<=>
  %vscale = call i64 @llvm.vscale()
  %part = mul i64 %vscale, 8
  getelementptr double, double *%ptr, i64 %part
The second GEP indexes a specific element, so that would still need to be a getelementptr on a `double*`.

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/sve-vector-reverse.ll:23
+; CHECK-NEXT:  %[[REVERSE6:.*]] = call <vscale x 8 x double> @llvm.experimental.vector.reverse.nxv8f64(<vscale x 8 x double> %[[FADD]])
+; CHECK-NEXT:  %{{.*}} = getelementptr inbounds double, double* %{{.*}}, i64 %{{.*}}
+; CHECK-NEXT:  %[[CAST:.*]] = bitcast double* %{{.*}} to <vscale x 8 x double>*
----------------
It is difficult to verify what this code is generating, and the GEPs here are actually really important to get right, so can you just check for explicit values?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95363/new/

https://reviews.llvm.org/D95363