[PATCH] D124750: [MLIR] Add a utility to sort the operands of commutative ops

Thu Jun 30 08:28:27 PDT 2022

Mogball added inline comments.

================
Comment at: mlir/lib/Transforms/Utils/CommutativityUtils.cpp:329-353
+  assert(frontPosition >= 0 && frontPosition < bfsOfOperands.size() &&
+         "`frontPosition` should be valid");
+  unsigned positionOfOperandToShift;
+  bool foundOperandToShift = false;
+  for (auto &indexedBfsOfOperand : llvm::enumerate(bfsOfOperands)) {
+    std::unique_ptr<OperandBFS> &bfsOfOperand = indexedBfsOfOperand.value();
+    if (bfsOfOperand->isSorted)
----------------
srishti-pm wrote:
> Mogball wrote:
> > srishti-pm wrote:
> > > Mogball wrote:
> > > > There is no way you need this much code. A `std::swap` between the current operand and the first unsorted position should be enough.
> > > If I do a swap, the sorting will no longer be stable and I believe that there was a discussion that concluded with the fact that "we want stable sorting".
> > That's true, but shifting like this is very slow as well. At this point, you might want to give `std::stable_sort` with a custom comparator that does extra BFS iterations on demand a try.
> So, this is what I think:
> 
> The number of commutative operands is not expected to be huge. So, we can afford to do shifting. In most cases, we wouldn't have to shift more than 1 or 2 positions. But, the custom comparator might cost us a lot, considering that each BFS could potentially be very large, especially for deep learning models. So, doing the BFS traversals again and again for the same operand, even though caching will be involved, doesn't sound like a good idea to me.
> 
> What are your views?
Very rough estimate: on its own, this function is N. Finding the smallest key is N, and then finding all matching elements is N. This function is called for each operand that needs to be moved, but the number of such operands decreases. So the sort itself averages out to be 3N^2 iterations over the operand list.

Now for traversals, doing BFS on demand inside the comparator doesn't mean it has to restart every time. It would do extra iterations on top of existing iteration results only when needed to break ties. In your case, you do an extra iteration of BFS for all operands if the current smallest key is identical, not just for the ones needed. It's hard to estimate the number of iterations of BFS, but certainly it's more in your case. Using `std::stable_sort` would also bring the complexity down to N logN

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124750/new/

https://reviews.llvm.org/D124750