[PATCH] D61144: [LoopIdiomRecognize] BCmp loop idiom recognition

Fri May 31 12:32:41 PDT 2019

lebedev.ri marked 2 inline comments as done.
lebedev.ri added a comment.

Thank you for taking a look!

I think it is unproductive for me to reply/fix until there is a review with the entire change in mind, not parts of it.

================
Comment at: lib/Transforms/Scalar/LoopIdiomRecognize.cpp:1924
+  if (!match(LoopHeaderBB->getTerminator(),
+             m_Br(m_CombineAnd(m_ICmp(BCmpPred, m_Value(), m_Value()),
+                               m_Value(BCmpValue)),
----------------
reames wrote:
> Using m_CombineAnd like this is correct, but I found quite confusing.  You might want to consider pulling out a helper function and using early return matching instead.
> 
> e.g.
> struct MatchedBCmpCompairsonState {...};
> bool matchBCmpComparison(Instruction *Term, MatchState&)
> 
I'm sorry, i do not understand what you are saying here.

This for sure will need refactoring when adding memcmp support,
but until then i'm not yet seeing how to rewrite this
while improving readability at the same time.

================
Comment at: lib/Transforms/Scalar/LoopIdiomRecognize.cpp:2020
+    for (const PHINode &PHI : ExitBB->phis()) {
+      for (const BasicBlock *LoopBB :
+           make_filter_range(PHI.blocks(), [this](BasicBlock *PredecessorBB) {
----------------
reames wrote:
> If the loop has unique exits, then each phi must have a single predecessor which must be in the loop.  LoopSimplify creates unique exits, so I'd just bail on the non-unique case.
Similarly, i do not understand what you are saying here.
You have read the comment just before these loops?
```
  // No loop instructions must be used outside of the loop. 
```
It doesn't matter whether we are in LCSSA form or not (we are),
whether we have dedicated exits or not (we do),
whether we have unique exits or not (that's the blocks we are checking here, so we do?).

The check is **specifically** that no instructions "other than the latches" are used outside of the loop.
As in, that we can fully erase the entire loop and replace it with a single `bcmp` call.
It's great the LCSSA form allows us to only check the PHI nodes, but i'm not sure how else
the check i'm looking for can be performed.

This is tested with `@loop_instruction_used_in_phi_node_outside_loop` in `bcmp-negative-tests.ll`.

================
Comment at: lib/Transforms/Scalar/LoopIdiomRecognize.cpp:2036
+  }
+  // Similarly, the loop should not have any other observable side-effects
+  // other than the final comparison result.
----------------
reames wrote:
> Given the base of each object and the length, you can form a MemoryLocation for the entire portion being read from both arrays.  Given that, a simple alias query can check for modification or ordering.  
> 
> MemoryLocation LocA(Src1, Size);
> for (Instruction &I : instructions(L))
>   if (I aliases LocA) return false;
> 
> 
> Note that if you length is non-constant, this ends up being quite conservative, but so does your current code.
> 
> 
> See also: mayLoopAccessLocation
I think i should wait until there is a full review..
Similarly, i'm not sure what this is about.
The end goal here is completely zap the entire old loop,
to fully delete it out of existence, replace with a single `bcmp` call.
If the loop has any other side-effects aside from those you'd get if you inline `bcmp`,
then the loop can not be transformed.

================
Comment at: lib/Transforms/Scalar/LoopIdiomRecognize.cpp:2127
+      LoadA->getPointerAddressSpace() != 0 || !LoadA->isSimple() ||
+      !LoadB->isSimple()) {
+    StringLiteral L("Unsupported loads in idiom - only support identical, "
----------------
reames wrote:
> lebedev.ri wrote:
> > reames wrote:
> > > Actually, you do handle volatile.  Move this closer to the def for readability would you?
> > The important difference is - if i bailout at the very beginning, no further analysis will be done.
> > But if i bailout here, then if this fires then one who sees this diagnostic may know that
> > the issue here isn't the volatile loads themselves, but rather the missing codegen.
> > As in, all other legality checks have passed here.
> > 
> > With that info, the comment still stands?
> Really not sure what that gives the user?  If the loop contains a volatile or ordered load, we're done.  The transform isn't going to run, no matter what other legality predicates are met.
(note the `OptimizationRemarkMissed` and the `LLVM_DEBUG` printout?)

================
Comment at: test/Transforms/LoopIdiom/bcmp-basic.ll:2
 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
-; RUN: opt -loop-idiom < %s -S | FileCheck %s
+; RUN: opt -loop-idiom -verify -verify-each -verify-dom-info -verify-loop-info < %s -S | FileCheck %s

----------------
reames wrote:
> lebedev.ri wrote:
> > reames wrote:
> > > Unrelated test change.  Remove please.
> > How is this unrelated? This is one of the dedicated test files for this diff,
> > and that change is showing that the patch does not anger any of these verifiers.
> Presumably it didn't anger any of the verifiiers before, so the change can be checked in and doesn't need to be part of this diff.
> Presumably it didn't anger any of the verifiiers before

But that is precisely the point here, those options aren't free, why have them in-tree for
who knows how long (until this change lands, which is far from given) until then?

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61144/new/

https://reviews.llvm.org/D61144