[PATCH] D72808: [mlir] [VectorOps] Lowering of vector.extract_slices to LLVM IR

Thu Jan 16 06:23:39 PST 2020

nicolasvasilache added a comment.

Please provide much more explanation in the commit message because the progressive lowering behavior may be unexpected for most people.

Nothing profound in my comments, just some things I think we should reorganize to keep our house in order.
This looks great, thanks Aart for pushing this forward!

================
Comment at: mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp:900

+class VectorTupleGetOpConversion : public LLVMOpLowering {
+public:
----------------
This looks more like a rewrite / folding pattern than a conversion to me (i.e. the types don't change).
I would prob go for a folding pattern in VectorOps.cpp but we also need to make sure it is accessible and  
In any case please document with a before / after IR example.

================
Comment at: mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp:901
+class VectorTupleGetOpConversion : public LLVMOpLowering {
+public:
+  explicit VectorTupleGetOpConversion(MLIRContext *context,
----------------
Well on second inspection you already have that in the VectorToVectorOps conversion.
So please don't duplicate here and just make the vectorToVector canonicalization patterns "insert" the "foldingPatterns".

Structurally this is also an indication that VectorToLLVM should start including all the VectorToVector rewrites so that things compose properly.

================
Comment at: mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp:911
+    auto tupleGetOp = cast<vector::TupleGetOp>(op);
+    if (auto tupleOp = dyn_cast_or_null<TupleOp>(operands[0].getDefiningOp())) {
+      rewriter.replaceOp(tupleGetOp,
----------------
We should `assert(operands[0].getDefiningOp` is non-null.
In the current design, tuples do not pass function boundaries and do not lower to LLVMIR so we want to fail hard so use cases pop up and we can reevaluate the decision with concrete evidence.

Please also make sure the message is informative enough to surface the context I just explained.

At that point, the whole things should look like:
```
assert(...);
rewriter.replaceOp(cast<...>);
return matchSuccess(); 
```

i.e. no need to dyn_cast + check, this is guaranteed by the pattern rewrite infra.

================
Comment at: mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp:920
+
+class VectorTupleOpConversion : public LLVMOpLowering {
+public:
----------------
Doc please, with before / after IR.

================
Comment at: mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp:930
+                  ConversionPatternRewriter &rewriter) const override {
+    // TODO(ajcbik): bit of a hack for now; if not all uses are
+    //               folded into a matching TupleGetOp, erasing
----------------
Yes that is hacky indeed :)

You could just return failure if the uses are not empty:
```
if (!op->getResult()->use_empty())
  return matchFailure();
```

The pattern will fail to apply until all canonicalizations occured at which point it will apply.
This will also error out gracefully by using the pattern rewrite infrastructure and fail a "full conversion" if things don't canonicalize.

================
Comment at: mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp:991

+/// Progressive lowering of ExtractSlicesOp to tuple of StridedSliceOp.
+class VectorExtractSlicesOpConversion
----------------
Very cool, yay intuition :) !

Same comments as above apply still: bit more doc with before/after IR.
I may not have set a good example myself here in the past, sorry about that! 
As the number of patterns grow we want short and intuitive examples to just get what these do without going deep in code.

Also, this should go through VectorToVector patterns and be included in the VectorToLLVM patterns (e.g. what if the target is not LLVM, some of our friends at Xilinx seem to be in this case for example). 

================
Comment at: mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp:1015
+
+    // Compute the strides between slices in each dimension.
+    SmallVector<int64_t, 4> sliceStrides(rank);
----------------
Strides is probably become overloaded, maybe we should rename `strided_slice` and `insert_strided_slice` as Insert/ExtractSlice and these ops as Insert/ExtractSliceTuple or something along those lines?
In particular the `stride` attribute is not really a stride but a `step`.

In any case, we should have help functions for computing this.
Could you please reuse/refactor/adapt the first functions in VectorTransforms.cpp and move the to VectorOps/Utils.h as appropriate?
Thanks!

================
Comment at: mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp:1026
+    for (int64_t i = 0; i < tupleSize; ++i) {
+      // Compute the vector offsets by de-linearizing the index.
+      SmallVector<int64_t, 4> vectorOffsets(rank);
----------------
We should have help functions for computing this.
Could you please reuse/refactor/adapt the first functions in VectorTransforms.cpp and move the to VectorOps/Utils.h as appropriate?
Thanks!

================
Comment at: mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp:1046
+
+    Value tuple = rewriter.create<vector::TupleOp>(loc, tupleType, tupleValues);
+    rewriter.replaceOp(op, tuple);
----------------
Great, hooray for progressive lowering and pattern/canonicalization compositions ! 

================
Comment at: mlir/lib/Dialect/VectorOps/VectorOps.cpp:1701
+
+void TupleGetOp::getCanonicalizationPatterns(OwningRewritePatternList &results,
+                                             MLIRContext *context) {
----------------
This can be called directly from other getXXXPatterns that want to use it, no need to duplicate the pattern.

================
Comment at: mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir:532
+}
+// CHECK-LABEL: extract_strides(%arg0: !llvm<"[3 x <3 x float>]">)
+//      CHECK: %[[s0:.*]] = llvm.mlir.constant(dense<0.000000e+00> : vector<1x1xf32>) : !llvm<"[1 x <1 x float>]">
----------------
This should be made into a VectorToVector test, that more directly tests "extract_slices" -> "strided_slices" + the removal of the tuple op.
Atm it tests too much at a distance and involves multiple patterns coming together.
Testing those patterns coming together could be done in a separate test pass where canonicalization, DCE and pattern interaction would kick in too and many things would get simplified.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72808/new/

https://reviews.llvm.org/D72808