[PATCH] D28200: [BypassSlowDivision] Do not bypass division of hash-like values

Sat Mar 4 15:04:17 PST 2017

n.bozhenov added a comment.

>> That wouldn't make much sense. The recognized patterns are already checked with simpler tests I have put into bypass-slow-div-special-cases.ll. The purpose of this particular file is to make sure that the patterns we recognize are indeed the patterns produced by the middle-end at the maximal optimization level (I have removed the explicit unroll-count option).
> 
> It sounds like there's an implicit "on this particular testcase" at the end of this sentence. "We are checking that we recognize the patterns produced by the middle-end at -O3 *on this particular testcase*."

Of course, on this particular testcase. But it is not some arbitrary testcase. While working on this patch I wanted to come up with an heuristic to recognize the most important hashing algorithms. And FNV algorithms are definitely very important algorithms. For example, they are used to calculate hashes for strings in both libstdc++ and libc++.

Unfortunately, the implemented heuristic is still incapable of disabling division bypassing in std::unordered_map<std::string, whatever>. The problem is that the calculated hash value is put to memory before division. So, while I'm happy with XOR and MUL heuristics, I'm not quite satisfied with the PHI heuristic.

>> The problem this test tries to solve is that the PHI-heuristic in isHashLikeValue is quite fragile. For example, it could be broken if some loop optimization (unrolling/vectorization/whatever) would produce slightly different code with an additional instruction between XOR and REM. This test verifies that there are no such additional instructions and the code is correctly recognized as hash-like.
> 
> It seems to me that if you think the code is fragile as-is, we should try to make it less fragile, so that a test targeting one specific testcase isn't necessary. For instance, we could increase the search depth some? This would obviate the need for an uber-specific, full-pipeline testcase. It would also keep us honest, in that we're not optimizing for one particular chunk of code.

Just increasing the search depth doesn't sound right to me because we would be at risk of having many false positive results.

After thinking it over once again, my idea now is to change the PHI heuristic. I believe a better approach would be to test all values instead of any_of and to use a check like

  return all_of(P->incoming_values(), [&](Value *V) {
    return getValueRange(V) == VALRNG_LONG;
  });

This way it would also be safe (no false positives) to increase the search depth and probably make it unlimited.

I will try to prepare a new patch and upload it by tomorrow.

https://reviews.llvm.org/D28200