[PATCH] D80860: Exact integer emptiness checks for FlatAffineConstraints

Wed Jun 3 15:29:28 PDT 2020

ftynse requested changes to this revision.
ftynse added a comment.
This revision now requires changes to proceed.

First of all, thanks for your contribution! This is an important and hard piece of polyhedral machinery.

The sheer size of the patch is huge and this will take some time to review. I only looked at the +/-style and simple issues and did not go into details of the algorithm itself. It would be helpful to know if you ran some code coverage tools on it.

Here is a list of major observations, which I did not always repeat:

- I am concerned by 32 as default number of stack elements in SmallVector, do you have evidence for this specific number?
- Please drop most of `llvm::` prefixes, common classes are re-exported in the `mlir::` namespace.
- In general, all top-level entities as well as class methods must have documentation.
- Consider making the API surface, both public and private, of classes smaller and using auxiliary functions with static linkage instead; same goes for multiple in-line lambdas that capture the world by-reference. These functions only need to be methods if they access class state, everything else can take that state as (const when possible) arguments and be cleaner about mutating it.
- Some additional documentation on the practical implementation details is necessary, in particular, the ownership model on the tableau and the bitwise negation in indexing.

Please bear with us.

================
Comment at: mlir/include/mlir/Analysis/AffineStructures.h:150
+  /// Returns such a point if one exists, or an empty llvm::Optional otherwise.
+  llvm::Optional<llvm::SmallVector<int64_t, 32>> findIntegerSample() const;
+
----------------
Multiple common LLVM ADT structures are re-exported in `mlir` namespace, see the full list here https://github.com/llvm/llvm-project/blob/master/mlir/include/mlir/Support/LLVM.h. We generally avoid prefixing these with `llvm::` or `mlir::` for brevity.

================
Comment at: mlir/include/mlir/Analysis/Presburger/Fraction.h:17
+
+#include <stdint.h>
+
----------------
Prefer <cstdint> in C++ code for namespacing reasons

================
Comment at: mlir/include/mlir/Analysis/Presburger/Fraction.h:21
+
+struct Fraction {
+  Fraction() : num(0), den(1) {}
----------------
This needs a doc

================
Comment at: mlir/include/mlir/Analysis/Presburger/Fraction.h:33
+inline int compare(Fraction x, Fraction y) {
+  int64_t diff = x.num * y.den - y.num * x.den;
+  if (diff > 0)
----------------
This may still be subject to overflows. I'm fine with documenting this behavior somewhere and leaving as is, but we need to be aware of the problem. Have you considered performance implications of basic Fraction on APInt instead of int64?

================
Comment at: mlir/include/mlir/Analysis/Presburger/Matrix.h:66
+  unsigned nColumns;
+  std::vector<std::vector<INT>> data;
+};
----------------
arjunp wrote:
> arjunp wrote:
> > Kayjukh wrote:
> > > arjunp wrote:
> > > > Kayjukh wrote:
> > > > > Why not store the matrix data in a one-dimensional array?
> > > > This makes the code simpler. In that case, resizing would be more expensive, and resizing is not uncommon, so it's unclear to me if performance considerations weigh in favour of having it be one-dimensional.
> > > If resizing is a common operation then I agree that it can make sense to keep nested `SmallVector`s. If there is a noticeable performance impact in can always be fixed in a follow-up.
> > In fact, resizing wouldn't be a problem because we only resize vertically, i.e. we only ever change the number of columns. But another operation that's used frequently is `swapRows` which should also be faster under this representation, so I think we can still keep it this way for now.
> Sorry, I meant "we only ever change the number of rows".
This rationale should be documented in the code. The interplay between resizes and cache behavior in case of consecutive reads (which I suppose are much more common) is non-trivial, especially if we allow `reserve`-ing space. I'm also slightly concerned about using 8k bytes on stack by default, we rarely use more than 8 stack elements in small vectors in MLIR.

================
Comment at: mlir/include/mlir/Analysis/Presburger/Matrix.h:148
+void Matrix<INT>::print(llvm::raw_ostream &os) const {
+  os << "Dumping matrix, rows = " << getNumRows()
+     << ", columns: " << getNumColumns() << '\n';
----------------
arjunp wrote:
> Kayjukh wrote:
> > This seems to be a very specific way to print a matrix. Do you need the `print` method or would `dump` be enough?
> > Also, I would probably not include the `Dumping matrix` part.
> It seemed like `print` was generally declared whenever `dump` was. Should it be removed?
Please keep the print. It allows one to print objects into streams different than llvm::errs().

================
Comment at: mlir/include/mlir/Analysis/Presburger/Matrix.h:35
+
+  static Matrix identity(unsigned dimension);
+
----------------
All public methods should have at least some documentation.

================
Comment at: mlir/include/mlir/Analysis/Presburger/Simplex.h:126
+    bool restricted;
+    unsigned pos;
+  };
----------------
Kayjukh wrote:
> nit: Reorder members by decreasing bit-width to limit padding.
Extra points: use a bit field or a uint8_t for flags

================
Comment at: mlir/include/mlir/Analysis/Presburger/Simplex.h:36-40
+/// the number of variables. Each row and column except the first two columns
+/// correspond to an unknown -- either a constraint or a variable.
+///
+/// Every row is a representation of the unknown x corresponding to that row in
+/// terms of the unknowns in column positions u_1, u_2, ... u_m. If the row
----------------
This comment lost me about what are unknowns, variables and constraints, and what is the relation between them, even though I know the theory behind this.

================
Comment at: mlir/include/mlir/Analysis/Presburger/Simplex.h:46
+/// the ith variable is the unknown being addressed. If the index is negative,
+/// then a constraint is being addressed, having index ~i.
+///
----------------
Could you please elaborate a little about this bit-completent indexing, e.g. that -1 gets transformed into 0 and -2 into 1 and so on, sparing you an addition. This is a clever trick, but clever tricks are very expensive to maintain 

================
Comment at: mlir/include/mlir/Analysis/Presburger/Simplex.h:55
+  enum class Direction { Up, Down };
+  enum class UndoOp { Deallocate, UnmarkEmpty };
+
----------------
Nit: suffix Op is pervasively used for IR Operations, please consider a different name

================
Comment at: mlir/include/mlir/Analysis/Presburger/Simplex.h:95
+  ///
+  /// Returns a {num, den} pair denoting the optimum, or a null value if no
+  /// optimum exists, i.e., if the expression is unbounded in this direction.
----------------
Nit: null value -> None

================
Comment at: mlir/include/mlir/Analysis/Presburger/Simplex.h:112
+  /// Returns the current sample point if it is integral. Otherwise, returns an
+  /// empty llvm::Optional.
+  llvm::Optional<llvm::SmallVector<int64_t, 32>>
----------------
Nit: empty llvm::Optional -> None, here and below

================
Comment at: mlir/include/mlir/Analysis/Presburger/Simplex.h:130
+        : pos(oPos), ownsRow(oOwnsRow), restricted(oRestricted) {}
+    Unknown() : Unknown(false, false, -1) {}
+    unsigned pos;
----------------
Nit: -1u (or UINT_MAX if you prefer)

================
Comment at: mlir/include/mlir/Analysis/Presburger/Simplex.h:132
+    unsigned pos;
+    bool ownsRow;
+    bool restricted;
----------------
Please, provide documentation about this class. Especially the ownership model and how it is maintained.

================
Comment at: mlir/include/mlir/Analysis/Presburger/Simplex.h:186
+  /// sample value, false otherwise.
+  bool restoreRow(Unknown &u);
+
----------------
Prefer mlir::LogicalResult for success/failure conditions to avoid deadling with LLVM's changing mapping of boolean values to success status

================
Comment at: mlir/include/mlir/Analysis/Presburger/Simplex.h:202
+
+  /// Returns true value is positive and direction is Direction::UP, or if
+  /// value is negative and direction is Direction::DOWN. Returns false
----------------
Nit: true _if_ value is positive

================
Comment at: mlir/include/mlir/Analysis/Presburger/Simplex.h:217
+  /// basis reduction.
+  void reduceBasis(Matrix &basis, unsigned level);
+
----------------
General design question: how much of these functions need to be private methods, i.e. have access to class members, as opposed to be just free functions, static to the compilation unit, that actual methods can call ?

================
Comment at: mlir/include/mlir/Analysis/Presburger/Simplex.h:236
+  /// These hold the indexes of the unknown at a given row or column position.
+  llvm::SmallVector<int, 32> rowVar, colVar;
+
----------------
Why is this `int` when pretty much everything else uses either unsigned or int64_t?

================
Comment at: mlir/lib/Analysis/Presburger/Matrix.cpp:18
+  Matrix matrix(dimension, dimension);
+  for (size_t i = 0; i < dimension; ++i)
+    matrix(i, i) = 1;
----------------
unsigned is used everywhere else

================
Comment at: mlir/lib/Analysis/Presburger/Matrix.cpp:76
+    for (int64_t elem : row)
+      os << elem << " ";
+    os << '\n';
----------------
nit: use single quotes for one character

================
Comment at: mlir/lib/Analysis/Presburger/Simplex.cpp:1
+//===- Simplex.cpp - MLIR Simplex Class -------------------------*- C++ -*-===//
+//
----------------
Drop "C++" in .cpp files, it's only necessary for headers (.h) so that tooling can tell between C and C++ that share the extension

================
Comment at: mlir/lib/Analysis/Presburger/Simplex.cpp:14
+#include <algorithm>
+#include <cstdio>
+
----------------
Is this include necessary?

================
Comment at: mlir/lib/Analysis/Presburger/Simplex.cpp:416
+        row = *maybeRow;
+      else {
+        // The loop doesn't find a pivot row only if the column has zero
----------------
Please use braces symmetrically, i.e. use them if at least one if/else block needs them

================
Comment at: mlir/lib/Analysis/Presburger/Simplex.cpp:533
+  auto appendRowFromB = [&](unsigned row) {
+    result.tableau(result.nRow, 0) = a.tableau(row, 0);
+    result.tableau(result.nRow, 1) = a.tableau(row, 1);
----------------
Why does this copy the first two columns from A?

================
Comment at: mlir/lib/Analysis/Presburger/Simplex.cpp:562
+      sample.push_back(0);
+    else {
+      if (tableau(u.pos, 1) % tableau(u.pos, 0) != 0)
----------------
Please use braces symmetrically; here and below

================
Comment at: mlir/lib/Analysis/Presburger/Simplex.cpp:590
+  Fraction computeWidthAndDuals(llvm::ArrayRef<int64_t> dir,
+                                llvm::SmallVector<int64_t, 32> &dual,
+                                int64_t &dualDenom) {
----------------
Use SmallVectorImpl with references so you don't need to hardcode the size

================
Comment at: mlir/lib/Analysis/Presburger/Simplex.cpp:606
+        // The dual variable is the negative of the row coefficient.
+        dual.push_back(-simplex.tableau(row, simplex.con[i].pos));
+      }
----------------
Nit: consider a ternary expression here

================
Comment at: mlir/lib/Analysis/Presburger/Simplex.cpp:613
+
+  // Remove the last equality that was added through addEqualityForDirection.
+  void removeLastEquality() {
----------------
This class needs a better description of what are the invariants on the snapshot stack. This method rolls back one snapshot, which happens to be be before an equality was added, but without proper documentation, there is a risk this might be broken in future.

================
Comment at: mlir/lib/Analysis/Presburger/Simplex.cpp:627
+    llvm::SmallVector<int64_t, 32> coeffs;
+    for (unsigned i = 0; i < dir.size(); i++)
+      coeffs.emplace_back(dir[i]);
----------------
Don't evaluate the exit condition on every iteration, use

for (unsigned i = 0, e = dir.size(); i < e; ++i)

================
Comment at: mlir/lib/Analysis/Presburger/Simplex.cpp:696
+  GBRSimplex gbrSimplex(*this);
+  llvm::SmallVector<Fraction, 32> f;
+  llvm::SmallVector<int64_t, 32> alpha;
----------------
Please consider a lengthier name for this and all derived objects, e.g. fAtI

================
Comment at: mlir/lib/Analysis/Presburger/Simplex.cpp:757
+      assert(i + 1 >= level + f.size() && "We don't know alpha_i but we know "
+                                          "F_{i+1}, this should never happen");
+      // We don't know alpha for our level, so let's find it.
----------------
Nit: asserts are for things that should never happen, no need to mention it in the description

================
Comment at: mlir/lib/Analysis/Presburger/Simplex.cpp:833
+
+  auto getBounds = [&]() -> std::pair<int64_t, int64_t> {
+    int64_t minRoundedUp;
----------------
This function would be cleaner if it took `opt` as argument and did not capture the world by-reference. It could then also go outside as a helper with static linkage

================
Comment at: mlir/lib/Analysis/Presburger/Simplex.cpp:866
+    addEquality(basisCoeffVector);
+    if (auto opt = findIntegerSampleRecursively(basis, level + 1))
+      return *opt;
----------------
MLIR generally discourages recursive functions on anything that runs on pieces of IR, stack overflows in the compiler are bad. Any chance of making this iterative? Otherwise, please provide some textual argument on how fast the recursion stops.

================
Comment at: mlir/lib/Analysis/Presburger/Simplex.cpp:882
+void Simplex::print(llvm::raw_ostream &os) const {
+  os << "Dumping Simplex, rows = " << nRow << ", columns = " << nCol << "\n";
+  if (empty)
----------------
Nit: `print` may be used outside `dump`, so I'd abstain from refering to "Dumping" here

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80860/new/

https://reviews.llvm.org/D80860