[PATCH] D77067: [mlir][Linalg] Create a tool to generate named Linalg ops from a Tensor Comprehensions-like specification.

Alex Zinenko via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Apr 6 04:50:08 PDT 2020


ftynse added a comment.

It would be great to share some parts with the main parser, for example affine expression parsing. I think we can pretty much have `parseAffineExpr(StringRef)` declared in a private header and use it here, possibly with some semantic post-checks, e.g. that the expression does not involve SSA values.
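
To sketch what that could look like (purely hypothetical, this declaration does not exist in MLIR today; the identifier-binding scheme is an assumption for illustration):

```
// Hypothetical private parser header.
#include "mlir/IR/AffineExpr.h"
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/StringRef.h"

namespace mlir {
class MLIRContext;
namespace detail {
/// Parses `str` as a standalone affine expression. Identifiers listed in
/// `dimNames` / `symbolNames` are bound, in order, to dimensions / symbols.
/// Returns a null AffineExpr and emits a diagnostic on error.
AffineExpr parseAffineExpr(llvm::StringRef str,
                           llvm::ArrayRef<llvm::StringRef> dimNames,
                           llvm::ArrayRef<llvm::StringRef> symbolNames,
                           MLIRContext *context);
} // namespace detail
} // namespace mlir
```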



================
Comment at: mlir/include/mlir/IR/AffineExpr.h:222
 
-raw_ostream &operator<<(raw_ostream &os, AffineExpr &expr);
+raw_ostream &operator<<(raw_ostream &os, const AffineExpr &expr);
 
----------------
Nit: AffineExpr is a value type, can't we just pass it by value?


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:86
+    // Keywords.
+    keyword_start,
+    kw_def,
----------------
This makes it look like there is an actual `start` token... How about `FIRST_KEYWORD = kw_def`, `LAST_KEYWORD = kw_select` instead?


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:135
+
+/// This class implements a simple lexer for operation assembly format strings.
+class Lexer {
----------------
Copy-pasted comment, this is not an "operation assembly" format string.


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:288
+                         .Case("ceildiv", Token::kw_ceildiv)
+                         .Case("mod", Token::kw_mod)
+                         .Default(Token::id);
----------------
Missed the `select` keyword. We could have some macro magic to make sure that modifying the list of tokens also updates the lexer; see the sketch below.
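
For instance (a minimal sketch; the macro and enum names are made up, and the keyword spellings follow the patch but the exact list may differ):

```
#include "llvm/ADT/StringRef.h"
#include "llvm/ADT/StringSwitch.h"

// Define the keyword spellings once.
#define TC_TOK_KEYWORDS(X) X(def) X(floordiv) X(ceildiv) X(mod) X(select)

// Expand the list into the token enum...
enum class Kind {
#define TC_TOK_KEYWORD(name) kw_##name,
  TC_TOK_KEYWORDS(TC_TOK_KEYWORD)
#undef TC_TOK_KEYWORD
  id,
};

// ...and into the lexer's keyword switch, so the two cannot get out of sync.
static Kind lexKeywordOrId(llvm::StringRef spelling) {
  return llvm::StringSwitch<Kind>(spelling)
#define TC_TOK_KEYWORD(name) .Case(#name, Kind::kw_##name)
      TC_TOK_KEYWORDS(TC_TOK_KEYWORD)
#undef TC_TOK_KEYWORD
      .Default(Kind::id);
}
```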


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:294
+
+Token Lexer::lexInteger(const char *tokStart) {
+  // Match the rest of the identifier regex: [0-9a-zA-Z_\-]*
----------------
The code in getUInt64IntegerValue seems to support hex integers, but this clearly does not.


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:371
+  LogicalResult
+  parseCommaSeparatedList(const std::function<ParseResult()> &parseElement) {
+    // Non-empty case starts with an element.
----------------
Nit: use llvm::function_ref if you don't store the argument.


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:396
+    if (failed(parseCommaSeparatedList(parseElement)) ||
+        failed(parseToken(rightToken, "expected ',' or proper token")))
+      return failure();
----------------
"proper token" is unclear as error message


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:545
+  case Token::plus:
+    parser.consumeToken(Token::plus);
+    return AffineLowPrecOp::Add;
----------------
Nit: the argument to `consumeToken` here and below is redundant; the case expression just above already ensures that the token is of the right kind.


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:840
+  virtual ~Expression() {}
+  virtual bool equals(const Expression &e) const = 0;
+  operator bool() const { return kind != Kind::Uninitialized; }
----------------
This should be trivial to implement without resorting to virtual functions, by dispatching on `kind` and using static_cast, e.g. as in the sketch below.
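
A sketch, assuming the `Kind` values and derived classes from the patch plus per-class `operator==` (names are taken from the surrounding snippets):

```
bool Expression::equals(const Expression &other) const {
  if (kind != other.kind)
    return false;
  switch (kind) {
  case Kind::Uninitialized:
    return true;
  case Kind::TensorUse:
    return static_cast<const TensorUse &>(*this) ==
           static_cast<const TensorUse &>(other);
  case Kind::TensorExpr:
    return static_cast<const TensorExpr &>(*this) ==
           static_cast<const TensorExpr &>(other);
  }
  llvm_unreachable("unexpected expression kind");
}
```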


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:849
+///   affine-expr-list ::= affine-expr (`,` affine-expr)*
+///   tensor-use ::= tensor-id `(` `)` |
+///                  tensor-id `(` affine-expr-list `)`
----------------
Nit: tensor-id is not defined


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:855
+  TensorUse() : TensorUse("", AffineMap()) {}
+  TensorUse(const TensorUse &use) : TensorUse(use.tensorId, use.indexingMap) {}
+  TensorUse(StringRef name, AffineMap map)
----------------
Nit: `= default` would also work


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:859
+
+  static bool classof(const Expression *e) {
+    return e->kind == Kind::TensorUse;
----------------
If you use the LLVM-style type system, you would normally want to avoid virtual functions...


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:873
+
+  /// Visitation function. Performs `PreOrder` traversal and applies `callback`
+  /// on each node.
----------------
Nit: given that `PreOrder` is a boolean template parameter, I am not sure what "performs `PreOrder` traversal" means when the parameter is false. Post-order? In-order? Compilation error?


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:893
+  TensorExpr(StringRef name,
+             SmallVector<std::unique_ptr<Expression>, 4> &&exprs,
+             ArrayRef<unsigned> reductionDims)
----------------
Would MutableArrayRef work instead of hardcoding SmallVector with a given size?


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:910
+        return false;
+    for (unsigned i = 0, e = reductionDimensions.size(); i < e; ++i)
+      if (reductionDimensions[i] != other.reductionDimensions[i])
----------------
Do you care about the order of reduction dimensions?


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:927
+  StringRef opId;
+  SmallVector<std::unique_ptr<Expression>, 4> expressions;
+  SetVector<unsigned> reductionDimensions;
----------------
Do you actually need pointers? Can't we just store `Expression`s as is, possibly with appropriate move semantics to avoid extra copies?


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:928
+  SmallVector<std::unique_ptr<Expression>, 4> expressions;
+  SetVector<unsigned> reductionDimensions;
+};
----------------
Why SetVector? In TC, we wouldn't care about the order of reduction dimensions.


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:940-941
+  /// Uses the AffineParser to parse the affine exprs used in a tensor
+  /// definition. All identifiers are interpreted as symbols, new symbols are
+  /// added eagerly.
+  SmallVector<AffineExpr, 4>
----------------
And what happens if the discovery mode is not Symbols?


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:950
+  ///   affine-expr-list ::= affine-expr (`,` affine-expr )*
+  ///   tensor-typedef ::= type `(` `)`                  |
+  ///                      type `(` affine-expr-list `)`
----------------
Nit: something went wrong with formatting here: `|` ran away to the right.  I personally prefer something like

```
foo ::= token token
        continuation line of the same rule
      | another rule
```


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:974
+  ///
+  ///   op-spec ::= id `<` reduction-dims-list `>` |
+  ///               id
----------------
Nit: this comment repeats the one on `struct TensorExpr`. I am worried about them getting out of sync if the syntax evolves. My recommendation would be to keep the syntax in a single comment (preferably on the implementation of this method) and just refer to it from the other comments.


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:981
+  LogicalResult parseExpression(TensorUse currentDefinition,
+                                std::unique_ptr<Expression> *result,
+                                ComprehensionParsingState &state);
----------------
Nit: why pass by-pointer rather than by-reference?


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:989
+  ///   comprehension ::= tensor-def-list `=` tensor-expr-list `;`
+  LogicalResult parseOneComprehension(llvm::raw_ostream &os,
+                                      StringRef cppOpName,
----------------
Why does a parsing function accept an _output_ stream?


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:994
+
+  /// Parse and print the information for a TC def.
+  ///
----------------
Please document what it prints.


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1050
+namespace llvm {
+template <> struct DenseMapInfo<TensorExpr> {
+  static TensorExpr getEmptyKey() { return TensorExpr("", {}, {}); }
----------------
Do you actually need this? I only see `DenseMap<TensorExpr *>`, which should be using a generic pointer-based map implementation.


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1053
+  static TensorExpr getTombstoneKey() {
+    return TensorExpr("__tombstone__", {}, {});
+  }
----------------
How about `DenseMapInfo<StringRef>::getTombstoneKey()` instead?


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1066
+  static TensorUse getTombstoneKey() {
+    return TensorUse("__tombstone__", AffineMap());
+  }
----------------
Same as above; AffineMap also has a DenseMapInfo specialization if I'm not mistaken, so both keys could be delegated.
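
For example (a sketch; it assumes the `tensorId`/`indexingMap` members and the `TensorUse(StringRef, AffineMap)` constructor shown earlier in the patch):

```
#include "mlir/IR/AffineMap.h"
#include "llvm/ADT/DenseMapInfo.h"
#include "llvm/ADT/Hashing.h"
#include "llvm/ADT/StringRef.h"

namespace llvm {
template <> struct DenseMapInfo<TensorUse> {
  static TensorUse getEmptyKey() {
    return TensorUse(DenseMapInfo<StringRef>::getEmptyKey(),
                     DenseMapInfo<mlir::AffineMap>::getEmptyKey());
  }
  static TensorUse getTombstoneKey() {
    return TensorUse(DenseMapInfo<StringRef>::getTombstoneKey(),
                     DenseMapInfo<mlir::AffineMap>::getTombstoneKey());
  }
  static unsigned getHashValue(const TensorUse &use) {
    return hash_combine(
        DenseMapInfo<StringRef>::getHashValue(use.tensorId),
        DenseMapInfo<mlir::AffineMap>::getHashValue(use.indexingMap));
  }
  static bool isEqual(const TensorUse &lhs, const TensorUse &rhs) {
    return DenseMapInfo<StringRef>::isEqual(lhs.tensorId, rhs.tensorId) &&
           DenseMapInfo<mlir::AffineMap>::isEqual(lhs.indexingMap,
                                                  rhs.indexingMap);
  }
};
} // namespace llvm
```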


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1083
+template <typename Lambda, bool PreOrder>
+void visit(Expression &expr, Lambda callback) {
+  switch (expr.kind) {
----------------
Nit: document how the visitation behaves if the callback mutates the visited object


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1126
+                              AffineMap map, bool isOutput) {
+  tensors.push_back(
+      RegisteredTensor{tensorId, tensorType, map, isOutput, AffineMap()});
----------------
Nit: would emplace_back work?


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1135
+  for (unsigned i = 0, e = tensors.size(); i < e; ++i)
+    if (tensors[i].name == id)
+      return tensors[i];
----------------
Have you considered storing tensors in an llvm::StringMap indexed by name instead of doing linear lookups every time?
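
A minimal sketch of what that could look like (member and function names are assumptions):

```
#include "llvm/ADT/StringMap.h"
#include "llvm/ADT/StringRef.h"

llvm::StringMap<RegisteredTensor> registeredTensors;

// O(1) average-time lookup instead of scanning the vector on every query.
RegisteredTensor *lookupTensor(llvm::StringRef id) {
  auto it = registeredTensors.find(id);
  return it == registeredTensors.end() ? nullptr : &it->second;
}
```

If the positional order still matters (e.g. for `orderedTensorArgs`), the map could store indices into the existing vector instead of the tensors themselves.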


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1147
+
+bool TCParser::isaRegisteredTensor(StringRef id) {
+  for (unsigned i = 0, e = tensors.size(); i < e; ++i)
----------------
Naming nit: `isa` is widely used for downcasting, this is just a lookup; prefer `is`.


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1172
+          expr = getAffineSymbolExpr(symbols.size(), parser.context);
+          symbols.push_back(std::make_pair(sRef, expr));
+        } else if (discoveryMode == EagerDiscoveryMode::Dimensions) {
----------------
Would emplace_back work?


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1192
+  if (failed(parser.parseToken(Token::id, "expected a name id")) ||
+      failed(parser.parseToken(Token::colon, "expected colon")))
+    return failure();
----------------
Could you just have a default message like `expected %tokenname%` instead of having a similar string everywhere?


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1196
+  StringRef tensorType = parser.curToken.getSpelling();
+  if (failed(parser.parseToken(Token::id, "expected a type id")))
+    return failure();
----------------
It looks like it would parse just about any id. "expected a type id" sounds a bit misleading because "type id" is not a production rule, and there's no additional check that the id is actually a type.


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1201
+  auto exprs = parseAffineExprs(EagerDiscoveryMode::Symbols, emptyDims);
+  assert(emptyDims.empty());
+  AffineMap map =
----------------
Nit: add a message to the assertion. Also, are we sure this can never happen?


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1280
+  };
+  if (failed(parser.parseToken(Token::l_paren, "expected `(`")) ||
+      failed(
----------------
Ultra-nit: we tend to use single quotes rather than backticks in error messages


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1282
+      failed(
+          parser.parseCommaSeparatedListUntil(Token::r_paren, parseExpr, true)))
+    return failure();
----------------
Nit: `/*allowEmptyList=*/true`


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1314
+  };
+  if (failed(parser.parseCommaSeparatedListUntil(Token::equal, parseUse, true)))
+    return failure();
----------------
`/*allowEmptyList=*/true`


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1322
+    std::unique_ptr<Expression> expr;
+    if (failed(parseExpression(definitions[idx++], &expr, state)))
+      return failure();
----------------
This may crash if there are fewer definitions on the LHS than expressions on the RHS.
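
E.g., a guard before the access (a sketch reusing the names from the quoted snippet):

```
    if (idx >= definitions.size()) {
      parser.emitError("more expressions on the RHS than definitions on the LHS");
      return failure();
    }
```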


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1327
+  };
+  if (failed(parser.parseCommaSeparatedListUntil(Token::semicolon, parseExpr,
+                                                 true)))
----------------
`/*allowEmptyList=*/true`


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1332
+  // 3. Postprocess.
+  // 3.a. Normalize all maps to the proper dimCount and symbolCount.
+  SmallVector<TensorUse, 4> allUses;
----------------
`dimCount` and `symbolCount` make the comment look outdated; is it?


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1352
+    if (seenDefs.count(def.tensorId) > 0) {
+      parser.emitError("Unexpected multi-write with different indexings to a "
+                       "single tensor");
----------------
Did you actually check that the indexings are different? The code only seems to check for a repeated tensor id.


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1365
+        return;
+      if (auto *pUse = dyn_cast<TensorUse>(&e)) {
+        auto &use = *pUse;
----------------
Nit: I'd use early return here


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1373
+          parser.emitError(
+              "Unexpected multi-read of a tensor with different accesses");
+          failed = true;
----------------
[Not for this commit]: I would rather have the parser accept the correct syntax, and have a separate check that implements "semantic" rules.


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1414
+  if (failed(parser.parseCommaSeparatedListUntil(Token::r_paren, parseInputDef,
+                                                 false)))
+    return failure();
----------------
`/*allowEmptyList=*/false`


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1425
+  if (failed(parser.parseCommaSeparatedListUntil(Token::r_paren, parseOutputDef,
+                                                 false)))
+    return failure();
----------------
`/*allowEmptyList=*/false`


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1430
+  // newly encountered id in a tensor shape expression is treated as a new
+  // symbolicc. At this point, all tensors have been parsed and all the symbols
+  // that could be discovered eagerly are now known. Resize all AffineMaps to
----------------
typo: "symbolicc"


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1490
+      let extraClassDeclaration = [{{
+        llvm::Optional<SmallVector<StringRef, 8>> referenceIterators();
+        llvm::Optional<SmallVector<AffineMap, 8>> referenceIndexingMaps();
----------------
Why is the result optional?


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1560
+  std::string mapsStr;
+  llvm::raw_string_ostream ss2(mapsStr);
+  SmallVector<TensorUse, 4> orderedUses(state.orderedTensorArgs.size());
----------------
Nit: could we use more meaningful names than `ss2`?


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1626
+  interleaveComma(state.orderedTensorArgs, ss,
+                  [&](decltype(state.orderedTensorArgs.begin())::value_type) {
+                    ss << "_" << idx << "(args[" << idx << "])";
----------------
C++14 supports `auto` for lambda arguments


================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1645
+
+  ss.flush();
+  ss2.flush();
----------------
Alternatively, you could use `ss.str()` instead of `valueHandleStr` below.
Also, consider better names than ss, ss2, ss3. One `ss` is acceptable in a short function, but here it's really tricky to keep in mind which stream is associated with which string.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77067/new/

https://reviews.llvm.org/D77067




