[PATCH] D77067: [mlir][Linalg] Create a tool to generate named Linalg ops from a Tensor Comprehensions-like specification.
Alex Zinenko via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Apr 6 04:50:08 PDT 2020
ftynse added a comment.
It would be great to share some parts with the main parser, for example affine expression parsing. I think we can pretty much have `parseAffineExpr(StringRef)` declared in a private header and use it here, possibly with some semantic post-checks on the expression not involving, e.g., SSA values.
================
Comment at: mlir/include/mlir/IR/AffineExpr.h:222
-raw_ostream &operator<<(raw_ostream &os, AffineExpr &expr);
+raw_ostream &operator<<(raw_ostream &os, const AffineExpr &expr);
----------------
Nit: AffineExpr is a value-type, can't we just pass it by-value ?
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:86
+ // Keywords.
+ keyword_start,
+ kw_def,
----------------
This makes it look like it's a token `start`... How about `FIRST_KEYWORD = kw_def`, `LAST_KEYWORD=kw_select`?
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:135
+
+/// This class implements a simple lexer for operation assembly format strings.
+class Lexer {
----------------
Copy-pasta comment, this is not "operation assembly"
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:288
+ .Case("ceildiv", Token::kw_ceildiv)
+ .Case("mod", Token::kw_mod)
+ .Default(Token::id);
----------------
Missed `select` keyword. We could have some macro magic to make sure modifying the list of tokens also handles them in the lexer.
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:294
+
+Token Lexer::lexInteger(const char *tokStart) {
+ // Match the rest of the identifier regex: [0-9a-zA-Z_\-]*
----------------
The code in getUInt64IntegerValue seems to support hex integers, but this clearly does not.
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:371
+ LogicalResult
+ parseCommaSeparatedList(const std::function<ParseResult()> &parseElement) {
+ // Non-empty case starts with an element.
----------------
Nit: llvm::function_ref if you don't store the argument
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:396
+ if (failed(parseCommaSeparatedList(parseElement)) ||
+ failed(parseToken(rightToken, "expected ',' or proper token")))
+ return failure();
----------------
"proper token" is unclear as error message
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:545
+ case Token::plus:
+ parser.consumeToken(Token::plus);
+ return AffineLowPrecOp::Add;
----------------
Nit: the operand in `consumeToken` here and below is redundant, the case-expression just above ensures that the token is of the right kind.
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:840
+ virtual ~Expression() {}
+ virtual bool equals(const Expression &e) const = 0;
+ operator bool() const { return kind != Kind::Uninitialized; }
----------------
This should be trivial to implement without resorting to virtual functions, dispatching on `kind` and using static_cast.
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:849
+/// affine-expr-list ::= affine-expr (`,` affine-expr)*
+/// tensor-use ::= tensor-id `(` `)` |
+/// tensor-id `(` affine-expr-list `)`
----------------
Nit: tensor-id is not defined
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:855
+ TensorUse() : TensorUse("", AffineMap()) {}
+ TensorUse(const TensorUse &use) : TensorUse(use.tensorId, use.indexingMap) {}
+ TensorUse(StringRef name, AffineMap map)
----------------
Nit: `= default` would also work
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:859
+
+ static bool classof(const Expression *e) {
+ return e->kind == Kind::TensorUse;
----------------
If you use LLVM-style type system, you would normally want to avoid virtual functions...
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:873
+
+ /// Visitation function. Performs `PreOrder` traversal and applies `callback`
+ /// on each node.
----------------
Nit: given that `PreOrder` is a boolean template parameter, I am not sure what "perfoms `PreOrder` traversal` means when the parameter is false. Post-order? In-order? Compilation error?
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:893
+ TensorExpr(StringRef name,
+ SmallVector<std::unique_ptr<Expression>, 4> &&exprs,
+ ArrayRef<unsigned> reductionDims)
----------------
Would MutableArrayRef work instead of hardcoding SmallVector with a given size?
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:910
+ return false;
+ for (unsigned i = 0, e = reductionDimensions.size(); i < e; ++i)
+ if (reductionDimensions[i] != other.reductionDimensions[i])
----------------
Do you care about the order of reduction dimensions?
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:927
+ StringRef opId;
+ SmallVector<std::unique_ptr<Expression>, 4> expressions;
+ SetVector<unsigned> reductionDimensions;
----------------
Do you actually need pointers? Can't we just store `Expression`s as is, eventually with appropriate move semantics to avoid extra copies?
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:928
+ SmallVector<std::unique_ptr<Expression>, 4> expressions;
+ SetVector<unsigned> reductionDimensions;
+};
----------------
Why SetVector? In TC, we wouldn't care about the order of reduction dimensions.
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:940-941
+ /// Uses the AffineParser to parse the affine exprs used in a tensor
+ /// definition. All identifiers are interpreted as symbols, new symbols are
+ /// added eagerly.
+ SmallVector<AffineExpr, 4>
----------------
And what if discovery mode != symbols ?
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:950
+ /// affine-expr-list ::= affine-expr (`,` affine-expr )*
+ /// tensor-typedef ::= type `(` `)` |
+ /// type `(` affine-expr-list `)`
----------------
Nit: something went wrong with formatting here: `|` ran away to the right. I personally prefer something like
```
foo ::= token token
continuation line of the same rule
| another rule
```
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:974
+ ///
+ /// op-spec ::= id `<` reduction-dims-list `>` |
+ /// id
----------------
Nit: this comment repeats the comment on `struct TensorExpr`. I am worried about it getting out of sync if the syntax evolves. My recommendation would be to only keep the syntax in a single comment (preferably, the implementation of this method), and just refer to that from the other comments.
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:981
+ LogicalResult parseExpression(TensorUse currentDefinition,
+ std::unique_ptr<Expression> *result,
+ ComprehensionParsingState &state);
----------------
Nit: why pass by-pointer rather than by-reference?
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:989
+ /// comprehension ::= tensor-def-list `=` tensor-expr-list `;`
+ LogicalResult parseOneComprehension(llvm::raw_ostream &os,
+ StringRef cppOpName,
----------------
Why does a parsing function accept an _output_ stream?
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:994
+
+ /// Parse and print the information for a TC def.
+ ///
----------------
Plz document what does it print
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1050
+namespace llvm {
+template <> struct DenseMapInfo<TensorExpr> {
+ static TensorExpr getEmptyKey() { return TensorExpr("", {}, {}); }
----------------
Do you actually need this? I only see `DenseMap<TensorExpr *>`, which should be using a generic pointer-based map implementation.
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1053
+ static TensorExpr getTombstoneKey() {
+ return TensorExpr("__tombstone__", {}, {});
+ }
----------------
How about `DenseMapInfo<StringRef>::getTombstoneKey()` instead?
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1066
+ static TensorUse getTombstoneKey() {
+ return TensorUse("__tombstone__", AffineMap());
+ }
----------------
Same as above, AffineMap also has a DenseMapInfo if I'm not mistaken.
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1083
+template <typename Lambda, bool PreOrder>
+void visit(Expression &expr, Lambda callback) {
+ switch (expr.kind) {
----------------
Nit: document how the visitation behaves if the callback mutates the visited object
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1126
+ AffineMap map, bool isOutput) {
+ tensors.push_back(
+ RegisteredTensor{tensorId, tensorType, map, isOutput, AffineMap()});
----------------
Nit: would emplace_back work?
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1135
+ for (unsigned i = 0, e = tensors.size(); i < e; ++i)
+ if (tensors[i].name == id)
+ return tensors[i];
----------------
Have you considered storing tensors in an llvm::StringMap indexed by name instead of doing linear lookups every time?
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1147
+
+bool TCParser::isaRegisteredTensor(StringRef id) {
+ for (unsigned i = 0, e = tensors.size(); i < e; ++i)
----------------
Naming nit: `isa` is widely used for downcasting, this is just a lookup; prefer `is`.
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1172
+ expr = getAffineSymbolExpr(symbols.size(), parser.context);
+ symbols.push_back(std::make_pair(sRef, expr));
+ } else if (discoveryMode == EagerDiscoveryMode::Dimensions) {
----------------
Would emplace_back work?
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1192
+ if (failed(parser.parseToken(Token::id, "expected a name id")) ||
+ failed(parser.parseToken(Token::colon, "expected colon")))
+ return failure();
----------------
Could you just have a default message `expected %tokenname%` instead of having a similar string everywhere
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1196
+ StringRef tensorType = parser.curToken.getSpelling();
+ if (failed(parser.parseToken(Token::id, "expected a type id")))
+ return failure();
----------------
It looks like it would parse just about any id. "expected a type id" sounds a bit misleading because "type id" is not a production rule, and there's no additional check on the id somehow being a type.
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1201
+ auto exprs = parseAffineExprs(EagerDiscoveryMode::Symbols, emptyDims);
+ assert(emptyDims.empty());
+ AffineMap map =
----------------
Nit: add a description in the assertion. Also, are we sure this can never happen?
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1280
+ };
+ if (failed(parser.parseToken(Token::l_paren, "expected `(`")) ||
+ failed(
----------------
Ultra-nit: we tend to use single quotes rather than backticks in error messages
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1282
+ failed(
+ parser.parseCommaSeparatedListUntil(Token::r_paren, parseExpr, true)))
+ return failure();
----------------
Nit: `/*allowEmptyList=*/true`
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1314
+ };
+ if (failed(parser.parseCommaSeparatedListUntil(Token::equal, parseUse, true)))
+ return failure();
----------------
`/*allowEmptyList=*/true`
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1322
+ std::unique_ptr<Expression> expr;
+ if (failed(parseExpression(definitions[idx++], &expr, state)))
+ return failure();
----------------
This may crash if you have less LHS declarations than RHS definitions.
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1327
+ };
+ if (failed(parser.parseCommaSeparatedListUntil(Token::semicolon, parseExpr,
+ true)))
----------------
`/*allowEmptyList=*/true`
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1332
+ // 3. Postprocess.
+ // 3.a. Normalize all maps to the proper dimCount and symbolCount.
+ SmallVector<TensorUse, 4> allUses;
----------------
`dimCount` and `symbolCount` make the comment look outdated, is it?
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1352
+ if (seenDefs.count(def.tensorId) > 0) {
+ parser.emitError("Unexpected multi-write with different indexings to a "
+ "single tensor");
----------------
Did you check that indexings were different?
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1365
+ return;
+ if (auto *pUse = dyn_cast<TensorUse>(&e)) {
+ auto &use = *pUse;
----------------
Nit: I'd use early return here
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1373
+ parser.emitError(
+ "Unexpected multi-read of a tensor with different accesses");
+ failed = true;
----------------
[Not for this commit]: I would rather have the parser accept the correct syntax, and have a separate check that implements "semantic" rules.
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1414
+ if (failed(parser.parseCommaSeparatedListUntil(Token::r_paren, parseInputDef,
+ false)))
+ return failure();
----------------
`/*allowEmptyList=*/true`
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1425
+ if (failed(parser.parseCommaSeparatedListUntil(Token::r_paren, parseOutputDef,
+ false)))
+ return failure();
----------------
`/*allowEmptyList=*/true`
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1430
+ // newly encountered id in a tensor shape expression is treated as a new
+ // symbolicc. At this point, all tensors have been parsed and all the symbols
+ // that could be discovered eagerly are now known. Resize all AffineMaps to
----------------
typo: "symbolicc"
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1490
+ let extraClassDeclaration = [{{
+ llvm::Optional<SmallVector<StringRef, 8>> referenceIterators();
+ llvm::Optional<SmallVector<AffineMap, 8>> referenceIndexingMaps();
----------------
Why is the result optional?
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1560
+ std::string mapsStr;
+ llvm::raw_string_ostream ss2(mapsStr);
+ SmallVector<TensorUse, 4> orderedUses(state.orderedTensorArgs.size());
----------------
Nit: could we use more meaningful names than `ss2`?
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1626
+ interleaveComma(state.orderedTensorArgs, ss,
+ [&](decltype(state.orderedTensorArgs.begin())::value_type) {
+ ss << "_" << idx << "(args[" << idx << "])";
----------------
C++14 supports `auto` for lambda arguments
================
Comment at: mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp:1645
+
+ ss.flush();
+ ss2.flush();
----------------
Alternatively, you could use `ss.str()` instead of `valueHandleStr` below.
Also, consider better names than ss, ss2, ss3. One `ss` is acceptable in a short function, but here it's really tricky to keep in mind which stream is associated with which string.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D77067/new/
https://reviews.llvm.org/D77067
More information about the llvm-commits
mailing list