[Mlir-commits] [mlir] mlir/Presburger: contribute a free-standing parser (PR #94916)

Mon Jul 8 08:11:34 PDT 2024

artagnon wrote:

I had some time to think about the parser, and here are my notes.

The parser is a garden-variety recursive-descent parser with no lookahead. This means that all it can do when it encounters an operation token is to call the overload for the operation on lhs and rhs subexpressions.
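To make that concrete, here is a minimal sketch of what "call the overload on lhs and rhs" looks like. It is not the code in the PR: `Linear` and `SumParser` are names made up for illustration, and the grammar is cut down to sums of `x`, integer literals, and parenthesised sums.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical value type: the linear form coeffX * x + constant.
struct Linear {
  long coeffX = 0;
  long constant = 0;
};

// The overload the parser calls when it meets a '+' token.
Linear operator+(const Linear &lhs, const Linear &rhs) {
  return {lhs.coeffX + rhs.coeffX, lhs.constant + rhs.constant};
}

// Hypothetical recursive-descent parser that only inspects the current token.
class SumParser {
public:
  explicit SumParser(std::string text) : src(std::move(text)) {}

  // sum ::= primary ('+' primary)*
  Linear parseSum() {
    Linear lhs = parsePrimary();
    while (peek() == '+') {
      ++pos;
      Linear rhs = parsePrimary();
      lhs = lhs + rhs; // all the parser can do: combine the two values
    }
    return lhs;
  }

private:
  // primary ::= 'x' | integer | '(' sum ')'
  Linear parsePrimary() {
    char c = peek();
    if (c == 'x') {
      ++pos;
      return {1, 0};
    }
    if (c == '(') {
      ++pos;
      Linear inner = parseSum();
      assert(peek() == ')' && "expected ')'");
      ++pos;
      return inner;
    }
    assert(std::isdigit(static_cast<unsigned char>(c)) && "expected a primary");
    long value = 0;
    while (pos < src.size() &&
           std::isdigit(static_cast<unsigned char>(src[pos])))
      value = value * 10 + (src[pos++] - '0');
    return {0, value};
  }

  // Skips spaces and returns the current character, or '\0' at end of input.
  char peek() {
    while (pos < src.size() && src[pos] == ' ')
      ++pos;
    return pos < src.size() ? src[pos] : '\0';
  }

  std::string src;
  std::size_t pos = 0;
};
```

By the time `operator+` runs, the parser has already thrown away where each operand came from. That is fine for a plain linear form like this one, and it is exactly what becomes a problem once divs enter the picture.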

To take a simple parsing problem, consider:

```
  (x + 2) floordiv 3
```

The parser sees x. What is x? Is it an IntegerRelation?

```cpp
struct IntegerRelation {
  PresburgerSpace space;
  IntMatrix equalities;
  IntMatrix inequalities;
};
```

If so, is it an equality or an inequality?

Is it a MultiAffineFunction?

```cpp
struct MultiAffineFunction {
  PresburgerSpace space;
  IntMatrix output;
  DivisionRepr divs;
};
```

If so, is x the output, or is it a div?

Let's assume that it's an output, and let's continue parsing with this MultiAffineFunction that has an IntMatrix with one row and one column.

We encounter 2, which we handle the same way as x, and x + 2 is a MultiAffineFunction with output having two columns and one row.

Now, we encounter the floordiv. We now need to construct a fresh MultiAffineFunction with a fresh PresburgerSpace that has numDivs = 0 + 1.

In the current parser, this is represented as:

```
{
  .linearDividend = {0, 2} // vector
  .divisor = 3
}
```

This is much more efficient and less wasteful: the tree is built with move semantics and std::unique_ptr, so that there is only ever one root of the parse tree.

Let us set aside the fact that building a MultiAffineFunction piece-wise is wasteful, and proceed to a more complicated example:

```
  (x floordiv 2 + y floordiv 3) floordiv 4
```

x floordiv 2 and y floordiv 3 are MultiAffineFunctions with single divs in them. When you add them, let's assume that you extract the two divs, and create a fresh MultiAffineFunction with two divs (this is a vector with a bunch of numbers).

Now, when we encounter floordiv 4, how do we disambiguate between this expression and:

```
  (x floordiv 2 floordiv 3) floordiv 4
```

In both cases, all we have is a MultiAffineFunction with two divs, which are just a bunch of numbers. Digging this information out of the numbers is quite non-trivial, and we're working with matrices of numbers instead of first-class types.
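For contrast, here is a minimal sketch of a first-class expression tree in which the two expressions cannot be confused. The `Expr` type and the helpers `var`, `add`, `floorDiv`, `countDivs` and `divDepth` are hypothetical names for illustration, not the types used in the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical parse-tree node; each node uniquely owns its children.
struct Expr {
  enum Kind { Var, Add, FloorDiv } kind = Var;
  std::string name;               // Var: variable name
  long divisor = 1;               // FloorDiv: the divisor
  std::unique_ptr<Expr> lhs, rhs; // Add: both operands; FloorDiv: lhs only
};
using ExprPtr = std::unique_ptr<Expr>;

ExprPtr var(std::string name) {
  auto e = std::make_unique<Expr>();
  e->name = std::move(name);
  return e;
}

ExprPtr add(ExprPtr lhs, ExprPtr rhs) {
  auto e = std::make_unique<Expr>();
  e->kind = Expr::Add;
  e->lhs = std::move(lhs);
  e->rhs = std::move(rhs);
  return e;
}

ExprPtr floorDiv(ExprPtr dividend, long divisor) {
  assert(divisor > 0 && "divisor must be positive");
  auto e = std::make_unique<Expr>();
  e->kind = Expr::FloorDiv;
  e->divisor = divisor;
  e->lhs = std::move(dividend);
  return e;
}

// Total number of floordiv nodes in the tree.
int countDivs(const Expr &e) {
  int n = e.kind == Expr::FloorDiv ? 1 : 0;
  if (e.lhs)
    n += countDivs(*e.lhs);
  if (e.rhs)
    n += countDivs(*e.rhs);
  return n;
}

// Largest number of floordiv nodes on any root-to-leaf path.
int divDepth(const Expr &e) {
  switch (e.kind) {
  case Expr::Var:
    return 0;
  case Expr::Add:
    return std::max(divDepth(*e.lhs), divDepth(*e.rhs));
  case Expr::FloorDiv:
    return 1 + divDepth(*e.lhs);
  }
  return 0;
}
```

Building `floorDiv(add(floorDiv(var("x"), 2), floorDiv(var("y"), 3)), 4)` and `floorDiv(floorDiv(floorDiv(var("x"), 2), 3), 4)` gives two trees with three floordiv nodes each, but `divDepth` is 2 for the first and 3 for the second. The nesting is carried by the shape of the tree, so nothing has to be recovered from numbers.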

Moreover, consider the output itself in an expanded example:

```
  x + y + 2 * (x floordiv 2 + y floordiv 3) floordiv 4
```

When we parse x + y, the output is straightforward: {1, 1}. However, when we get to the div with a multiplication factor, it becomes more complicated, because the following numbers need to be appended: {0, 0, 2}. Basically, as we parse each nested div, we need to add 0 to the output (but how do we know that it's nested in a recursive descent parser?), and when we parse the outermost div, we need to add the multiplication factor 2 (again, how do we know that it's the outermost div?). It's not impossible though: when we parse the subexpression:

```
  (x floordiv 2 + y floordiv 3) floordiv 4
```

We get a MultiAffineFunction with three divs, but it's non-trivial to figure out the nesting structure: all we have are a bunch of numbers, as opposed to first-class types in the current parser. This is why the current parser necessitates the use of a simple flattener that visits nested expressions, and fills in the correct output.
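As a rough sketch of the flattener idea (this is not the flattener in the PR; `Expr`, `node`, `Row`, `Div` and `Flattener` are hypothetical names), a post-order walk over a parse tree can give every floordiv its own column and fill in the output row. I'm assuming the column layout [vars..., divs..., constant], and that the expression above is grouped as 2 * ((x floordiv 2 + y floordiv 3) floordiv 4).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical parse-tree node. `value` is the variable index (Var), the
// constant (Const), the factor (Mul) or the divisor (FloorDiv).
struct Expr {
  enum Kind { Var, Const, Add, Mul, FloorDiv } kind = Const;
  long value = 0;
  std::unique_ptr<Expr> lhs, rhs; // Add: both; Mul and FloorDiv: lhs only
};
using ExprPtr = std::unique_ptr<Expr>;

ExprPtr node(Expr::Kind kind, long value, ExprPtr lhs = nullptr,
             ExprPtr rhs = nullptr) {
  auto e = std::make_unique<Expr>();
  e->kind = kind;
  e->value = value;
  e->lhs = std::move(lhs);
  e->rhs = std::move(rhs);
  return e;
}

using Row = std::vector<long>; // assumed columns: [vars..., divs..., constant]

struct Div {
  Row dividend;
  long divisor;
};

struct Flattener {
  std::size_t numVars, numDivs;
  std::vector<Div> divs; // filled in post-order, innermost divs first

  static std::size_t countDivs(const Expr &e) {
    std::size_t n = e.kind == Expr::FloorDiv ? 1 : 0;
    if (e.lhs)
      n += countDivs(*e.lhs);
    if (e.rhs)
      n += countDivs(*e.rhs);
    return n;
  }

  Flattener(std::size_t vars, const Expr &root)
      : numVars(vars), numDivs(countDivs(root)) {}

  // Returns the coefficient row of `e`, registering every div it contains.
  Row visit(const Expr &e) {
    Row row(numVars + numDivs + 1, 0);
    switch (e.kind) {
    case Expr::Var:
      assert(e.value >= 0 && static_cast<std::size_t>(e.value) < numVars);
      row[static_cast<std::size_t>(e.value)] = 1;
      break;
    case Expr::Const:
      row.back() = e.value;
      break;
    case Expr::Add: {
      Row l = visit(*e.lhs);
      Row r = visit(*e.rhs);
      for (std::size_t i = 0; i < row.size(); ++i)
        row[i] = l[i] + r[i];
      break;
    }
    case Expr::Mul:
      row = visit(*e.lhs);
      for (long &c : row)
        c *= e.value;
      break;
    case Expr::FloorDiv: {
      Row dividend = visit(*e.lhs); // nested divs get their columns first
      divs.push_back({std::move(dividend), e.value});
      row[numVars + divs.size() - 1] = 1; // refer to this div's own column
      break;
    }
    }
    return row;
  }
};
```

With x and y as variables 0 and 1, visiting the tree for the expression gives the output row {1, 1, 0, 0, 2, 0}: {1, 1} for x + y, then {0, 0, 2} for the three divs, then a constant of 0. The outermost div is recorded with dividend {0, 0, 1, 1, 0, 0} and divisor 4. The flattener knows which div is nested and which is outermost only because it is walking a tree.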

The conclusion is that it's very difficult to parse directly into a MultiAffineFunction (although not impossible), and we'd be putting ourselves through pain for no gain. Apart from the fact that it's wasteful, taking this approach would create more supporting code that inherits from MultiAffineFunction and extends the API.

https://github.com/llvm/llvm-project/pull/94916