[PATCH] D77067: [mlir][Linalg] Create a tool to generate named Linalg ops from a Tensor Comprehensions-like specification.

Mon Apr 6 20:11:04 PDT 2020

nicolasvasilache marked 65 inline comments as done.
nicolasvasilache added inline comments.

================
Comment at: mlir/test/mlir-linalg-ods-gen/test-linalg-ods-gen.tc:5
+
+// RUN: mlir-linalg-ods-gen %s -gen-ods-decl=1 -o %t \
+// RUN: && cat %S/../../include/mlir/Dialect/Linalg/IR/LinalgStructuredOps.td %t \
----------------
ftynse wrote:
> nicolasvasilache wrote:
> > silvas wrote:
> > > layering-wise would prefer to not test this here. If needed, we can add a separate test elsewhere that does this .td -> .inc file check. Strictly speaking what ends up in the .inc file is not really the concern of this component, only the contents of the .td file.
> > > 
> > This was suggested by @ftynse to show the ODS is valid and how it connects to tblgen by mirroring this test: https://github.com/llvm/llvm-project/blob/master/mlir/test/mlir-tblgen/llvm-intrinsics.td#L11.
> > 
> > I am fine either way, would just like consensus on this before reverting back to the previous state.
> > Please reopen if you feel strongly about this.
> > @ftynse any strong opinion?
> I'm not sure I understand what is the concern here? The `ODS` check verifies the content of the produced .td file, _not_ the result of feeding that .td file to `mlir-tblgen -gen-op-defs`, which is indeed a separate concern. The `IMPL` check verifies the implementations of methods that are declared in the `.td` file and there is simply no other place where we can verify them.
> 
> The staging here is:
> 1a. mlir-linalg-ods-gen -gen-ods-decl %this_file% > ods.td
> 1b. mlir-linalg-ods-gen -gen-impl %this_file% > impl.cc
> 2a. mlir-tblgen -gen-op-decl ods.td > ods.h
> 2b. mlir-tblgen -gen-op-decl ods.td > ods.cc
> 3. include impl.cc and ods.cc into the implementation file; and ods.h into the header file.
> 
> @nicolasvasilache the test you referenced also has `RUN` lines making sure `mlir-tblgen` can consume what the first stage produces. Consider adding them here as well. This could help detect cases of ODS syntax change (the simple syntactic test passes, but not the piping check). That's why there is only a trivial check to make sure FileCheck eats something.
```
I'm not sure I understand what is the concern here? 
...
Consider adding them here as well.
```

That's precisely what the concern was IIUC, piping through mlir-tblgen (see previous snapshot that I updated improperly https://reviews.llvm.org/D77067?id=254251).

Restored that part of the test.

================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:288
+                         .Case("ceildiv", Token::kw_ceildiv)
+                         .Case("mod", Token::kw_mod)
+                         .Default(Token::id);
----------------
ftynse wrote:
> Missed `select` keyword. We could have some macro magic to make sure modifying the list of tokens also handles them in the lexer.
Rather than continue duplicating here, MLIR should expose the lexer and parser and everyone's life will be better.

================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:294
+
+Token Lexer::lexInteger(const char *tokStart) {
+  // Match the rest of the identifier regex: [0-9a-zA-Z_\-]*
----------------
ftynse wrote:
> The code in getUInt64IntegerValue seems to support hex integers, but this clearly does not.
Yes, I am trimming liberally until MLIR exposes its lexer and parser at which point all this can disappear.

================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:859
+
+  static bool classof(const Expression *e) {
+    return e->kind == Kind::TensorUse;
----------------
ftynse wrote:
> If you use LLVM-style type system, you would normally want to avoid virtual functions...
why ? https://llvm.org/docs/HowToSetUpLLVMStyleRTTI.html#basic-setup shows it's perfectly fine to use abstract base classes and LLVM RTTI.

================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:910
+        return false;
+    for (unsigned i = 0, e = reductionDimensions.size(); i < e; ++i)
+      if (reductionDimensions[i] != other.reductionDimensions[i])
----------------
ftynse wrote:
> Do you care about the order of reduction dimensions?
you should otherwise your computation is non-deterministic

================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:927
+  StringRef opId;
+  SmallVector<std::unique_ptr<Expression>, 4> expressions;
+  SetVector<unsigned> reductionDimensions;
----------------
ftynse wrote:
> Do you actually need pointers? Can't we just store `Expression`s as is, eventually with appropriate move semantics to avoid extra copies?
It's this or uniqu'ing, underlying storage, placement new etc etc.
I went for the simple solution.
When we have strong data that we need to scale much more we can revisit. 

================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:928
+  SmallVector<std::unique_ptr<Expression>, 4> expressions;
+  SetVector<unsigned> reductionDimensions;
+};
----------------
ftynse wrote:
> Why SetVector? In TC, we wouldn't care about the order of reduction dimensions.
reductions loops don't commute in FP land

================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1135
+  for (unsigned i = 0, e = tensors.size(); i < e; ++i)
+    if (tensors[i].name == id)
+      return tensors[i];
----------------
ftynse wrote:
> Have you considered storing tensors in an llvm::StringMap indexed by name instead of doing linear lookups every time?
I need 2 extra maps and really don't anticipate a single named op to ever to a point where this would matter.
Of course if proven otherwise I'm happy to reconsider.

================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1192
+  if (failed(parser.parseToken(Token::id, "expected a name id")) ||
+      failed(parser.parseToken(Token::colon, "expected colon")))
+    return failure();
----------------
ftynse wrote:
> Could you just have a default message `expected %tokenname%` instead of having a similar string everywhere
I'm reluctant to invest more in duplicating something that should be exposed by core in a later NFC revision. 

================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1373
+          parser.emitError(
+              "Unexpected multi-read of a tensor with different accesses");
+          failed = true;
----------------
ftynse wrote:
> [Not for this commit]: I would rather have the parser accept the correct syntax, and have a separate check that implements "semantic" rules.
Agreed, there are a few other things for follwups too, thanks!

================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1430
+  // newly encountered id in a tensor shape expression is treated as a new
+  // symbolicc. At this point, all tensors have been parsed and all the symbols
+  // that could be discovered eagerly are now known. Resize all AffineMaps to
----------------
ftynse wrote:
> typo: "symbolicc"
it's a faster `symbolcc`

================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1490
+      let extraClassDeclaration = [{{
+        llvm::Optional<SmallVector<StringRef, 8>> referenceIterators();
+        llvm::Optional<SmallVector<AffineMap, 8>> referenceIndexingMaps();
----------------
ftynse wrote:
> Why is the result optional?
this is what the ODS currently is because of manual "named ops", will be cleaned later.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77067/new/

https://reviews.llvm.org/D77067