[PATCH] D28200: [BypassSlowDivision] Do not bypass division of hash-like values

Sun Mar 5 11:01:23 PST 2017

jlebar requested changes to this revision.
jlebar added a subscriber: hfinkel.
jlebar added a comment.
This revision now requires changes to proceed.

> The reason is that opt -codegenprapare doesn't run BypassSlowDivision for X86. So, I have to run the whole X86 backend to check that the division is not bypassed.

We can't make it an NVPTX test like we did the last time we had this problem?

> I believe that the new version of the heuristic is much more robust than the previous one. However, a naive implementation didn't work initially. It turned out that hash values weren't recognized because of PHI nodes with undef incoming values generated by loop-unroll. I had to amend the patch to work around the problem.

It doesn't look like you added any "unit" tests to cover this changed behavior.  Was that intentional?

> Previously you suggested manually running the testcase through opt -O3 ... and checking that in. This wouldn't work either. Some future change in the middle-end will break the pattern recognition in codegenprepare but we never find it out because our codegen input stays the same.

Sure.  It's also true that clang might change how it emits this code, or that you might have a front-end that isn't clang and also emits this code differently.

Ultimately I'm not comfortable LGTM'ing this patch so long as it contains this -O3 test.  This is very different than how I understood we write tests.  But that might just be my ignorance speaking; what you're trying to do may in fact be perfectly fine; I really don't know enough to say for sure.  Let's get a second opinion?  If @hfinkel or someone thinks this is the right approach, I'm happy.

================
Comment at: lib/Transforms/Utils/BypassSlowDivision.cpp:85
   typedef DenseMap<unsigned, unsigned> BypassWidthsTy;
+  typedef SmallSet<Value *, 4> VisitedTy;
 }
----------------
I would prefer `VisitedSetTy` or something like that.

================
Comment at: lib/Transforms/Utils/BypassSlowDivision.cpp:85
   typedef DenseMap<unsigned, unsigned> BypassWidthsTy;
+  typedef SmallSet<Value *, 4> VisitedTy;
 }
----------------
jlebar wrote:
> I would prefer `VisitedSetTy` or something like that.
SmallSet<T*> is a typedef for SmallPtrSet.  There are 3x as many mentions of SmallPtrSet as SmallSet in LLVM (that's counting all SmallSets, not just SmallSets of pointers), so I think we should call this SmallPtrSet (and change the #include).

================
Comment at: lib/Transforms/Utils/BypassSlowDivision.cpp:229
+    if (Visited.count(V))
+      return true;
+    Visited.insert(V);
----------------
Can we add a comment explaining why we return true here?

================
Comment at: lib/Transforms/Utils/BypassSlowDivision.cpp:230
+      return true;
+    Visited.insert(V);
+    return llvm::all_of(cast<PHINode>(I)->incoming_values(), [&](Value *V) {
----------------
Sounds like we should call this VisitedPhis?

================
Comment at: lib/Transforms/Utils/BypassSlowDivision.cpp:231
+    Visited.insert(V);
+    return llvm::all_of(cast<PHINode>(I)->incoming_values(), [&](Value *V) {
+      // Loop-unroll sometimes generates PHI-nodes with Undef incoming values.
----------------
Why the change from any_of to all_of?  You said earlier:

> This way it would also be safe (no false positives) to increase the search depth and probably make it unlimited.

I am not sure what you mean by this.  Are you making two claims about this change:

 a. It's "safe" in that it reduces our false positives, and
 b. It lets us safely increase the search depth to infinity?

I don't see how the switch to any_of changes anything with respect to the cost of the recursion.  I agree that pretty much any sane code is not going to have deep recursion here.  But that's not what I'm worried about.  I'm worried about some crazy code that causes us to have quadratic behavior.  *That's* what the depth limit is there to protect against.

With respect to (a) I see "safety" opposite of how you've described it.  If we declare that something is hash-like, we won't optimize it, and not optimizing is the "safe" thing to do.  Therefore the safe/conservative choice is to declare something as "maybe hash-like".  Therefore any_of is safer / more conservative than all_of.

I do have a major safety concern here, but it's about how isHashLikeValue is being used in getValueRange.  Regardless of whether we use any_of or all_of here, it is *not* safe to return VALRNG_LONG if isHashLikeValue(V).  If we return VALRNG_LONG, that is a *promise* that the value is long, and we take advantage of this explicit promise.  isHashLikeValue is *not* a promise that the value is long; it's just a heuristic.  So that whole thing needs to change.  Please also add tests to catch this tricky bug.

(Perhaps you were accidentally working around this bug in getValueRange by using all_of here and that's the source of this confusion.)

https://reviews.llvm.org/D28200