[PATCH] D61144: [LoopIdiomRecognize] BCmp loop idiom recognition

Thu Aug 22 14:24:35 PDT 2019

courbet added inline comments.

================
Comment at: lib/Transforms/Scalar/LoopIdiomRecognize.cpp:1924
+  if (!match(LoopHeaderBB->getTerminator(),
+             m_Br(m_CombineAnd(m_ICmp(BCmpPred, m_Value(), m_Value()),
+                               m_Value(BCmpValue)),
----------------
lebedev.ri wrote:
> reames wrote:
> > Using m_CombineAnd like this is correct, but I found quite confusing.  You might want to consider pulling out a helper function and using early return matching instead.
> > 
> > e.g.
> > struct MatchedBCmpCompairsonState {...};
> > bool matchBCmpComparison(Instruction *Term, MatchState&)
> > 
> I'm sorry, i do not understand what you are saying here.
> 
> This for sure will need refactoring when adding memcmp support,
> but until then i'm not yet seeing how to rewrite this
> while improving readability at the same time.
I think what was meant is: If you refactor the matches to separate functions, the control flow for this function becomes clearer:

```

struct CmpLoopStructure {
  Value *BCmpValue, *LatchCmpValue;
  BasicBlock *HeaderBrEqualBB, *HeaderBrUnequalBB;
  BasicBlock *LatchBrFinishBB, *LatchBrContinueBB;
};

CmpLoopStructure CmpLoop;

if (!matchCmpLoopStructure(LoopHeaderBB, CmpLoop)) {
  return false;
}

struct CmpOfLoads {
  ICmpInst::Predicate BCmpPred;
  Value *LoadSrcA, *LoadSrcB;
  Value *LoadA_, *LoadB_;
}

if (!matchCmpOfLoads(BcmpLoop.BcmpValue, CmpOfLoads)) {
  return false;
}

```

================
Comment at: lib/Transforms/Scalar/LoopIdiomRecognize.cpp:1947
+  // they would have been transformed into a load of bitcast.
+  // FIXME: can be memcmp()/bcmp() too.
+  if (!match(
----------------
I don't find the comment very clear. Maybe rephrase as: "FIXME: memcmp()/bcmp() calls have the same  semantics as icmp. Match them too."

================
Comment at: lib/Transforms/Scalar/LoopIdiomRecognize.cpp:1953
+                 m_CombineAnd(m_Load(m_Value(LoadSrcB)), m_Value(LoadB_)))) ||
+      !ICmpInst::isEquality(BCmpPred)) {
+    LLVM_DEBUG(dbgs() << "Loop header icmp did not match bcmp pattern.\n");
----------------
This is redundant with the previous check. Do you want to remove it from the previous one (this will make handling `memcmp()` easier in the future).

================
Comment at: lib/Transforms/Scalar/LoopIdiomRecognize.cpp:1961
+
+  // Be vary, comparisons can be inverted, canonicalize order.
+  // If this 'element' comparison passed, we expect to proceed to the next elt.
----------------
Typo: "Be vary"

================
Comment at: lib/Transforms/Scalar/LoopIdiomRecognize.cpp:1986
+
+  BCmpInst = cast<ICmpInst>(BCmpValue);        // FIXME: is there no
+  LatchCmpInst = cast<CmpInst>(LatchCmpValue); // way to combine
----------------
If you refactor as mentioned above, you can do the casts in the subfunctions. That being said I don't think the framework allows the cast during the match, which would be cool.

================
Comment at: lib/Transforms/Scalar/LoopIdiomRecognize.cpp:2129
+  // we'd want to compare more bytes than could be represented by size_t, But
+  // allocation functions also take size_t. So how'd you produce such buffer?
+  NBytes = SE->getMulExpr(
----------------
lebedev.ri wrote:
> nikic wrote:
> > Is this kind reasoning legal on the level of LLVM IR? Especially if no inbounds GEPs are involved, aren't we just dealing in raw memory and there isn't necessary any object with representable size involved?
> > 
> > (Context: Wondering about this in https://reviews.llvm.org/D61934#inline-559489 and your comment seems like a plausible explanation.)
> I don't have an answer.
> The obvious alternative would be to do saturating multiplication, i guess?
Saturating multiplication would actually change the semantics of the code. The memset loop idiom already has this issue actually.
I don't think the explanation is convincing because we could indeed be loading from raw memory. I guess the only thing that is saving us right now is that it's unlikely that people are memset/memcmp-ing more than size_t-max bytes of memory in real life. But the optimization is buggy :(

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61144/new/

https://reviews.llvm.org/D61144