[PATCH] D43256: [MBP] Move a latch block with conditional exit and multi predecessors to top of loop

Tue Jul 23 05:04:25 PDT 2019

ebrevnov added a comment.

Here is a C++ equivalent of my original code (which is actually a Java application) that you can use to reproduce the problem.

> clang++ -c -O2 floatmin.cpp  -march=skylake

  extern float a[];
  extern float b[];
  extern float c[];

  bool foo(int M, bool flag) {
    for (int i = 0; i < M; i++) {
      float x = a[i];
      float y = b[i];
      float min;
      if (x != x) {
        min = x;   // a[i] is NaN
      }
      else if (y == 0.0f) {
        goto fail;
      }
      else {
        min = (x <= y) ? x : y;
      }
      c[i] = min;
    }

    return true;
  fail:
    return false;
  }

With the C++ reproducer I measure only about a 9% slowdown. In this case CPI is identical (for the original test case I still don't know the root cause of the CPI difference), so the whole slowdown comes from the increased path length due to one extra jump.
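
To time the loop, something like the following driver could be used. This is only a sketch: the array size kN, the fill pattern, and the helpers fill_inputs and time_foo are assumptions made for illustration and are not part of the original test case. foo is repeated here only so the block compiles on its own; for a real measurement it is probably better to keep foo in floatmin.cpp and link the driver against it, so that the compiler cannot inline the loop into the timing code.

```cpp
#include <chrono>

// Hypothetical array size; the original test case does not state one.
constexpr int kN = 4096;

float a[kN];
float b[kN];
float c[kN];

// Same loop as in the reproducer, repeated so this block is self-contained.
bool foo(int M, bool flag) {
  for (int i = 0; i < M; i++) {
    float x = a[i];
    float y = b[i];
    float min;
    if (x != x) {
      min = x;   // a[i] is NaN
    }
    else if (y == 0.0f) {
      goto fail;
    }
    else {
      min = (x <= y) ? x : y;
    }
    c[i] = min;
  }

  return true;
fail:
  return false;
}

// Assumed input pattern: no NaN in a[] and no zero in b[], so every
// iteration takes the min path and the loop never leaves through 'fail'.
void fill_inputs() {
  for (int i = 0; i < kN; i++) {
    a[i] = static_cast<float>(i % 7) + 1.0f;
    b[i] = static_cast<float>(i % 5) + 1.0f;
  }
}

// Calls foo 'reps' times and returns the elapsed wall-clock time in seconds.
// all_ok is set to false if any call returned false.
double time_foo(int reps, bool &all_ok) {
  all_ok = true;
  auto start = std::chrono::steady_clock::now();
  for (int r = 0; r < reps; r++) {
    if (!foo(kN, false))
      all_ok = false;
  }
  auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
}
```

Calling fill_inputs() once and then time_foo() from main, with the binary built using the same flags as above, once with and once without the patch, should give two timings that can be compared. The absolute numbers will depend on the machine and on the chosen kN and repetition count.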

With this reproducer in hand you can gather profile data if needed, but that's a separate story. I don't think we can afford such a regression when a profile is not available.
You could probably assume the worst case when no profile is available, but I believe that won't help here: the root cause is that the heuristic just doesn't take the extra jump into account.

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D43256/new/

https://reviews.llvm.org/D43256