[PATCH] D52177: [InstCombine] Fold ~A - Min/Max(~A, O) -> Max/Min(A, ~O) - A

Sanjay Patel via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Oct 15 07:28:58 PDT 2018


spatel added a comment.

In https://reviews.llvm.org/D52177#1262757, @dmgreen wrote:

> Um, in terms of fixes, I guess our alternatives are to make this instcombine dependent on the target, or to try to undo it somehow in the backend. My understanding is that the first option is not generally done in instcombine. Can someone give me some history there? Was it something that was decided as a rule, or just something that never came up? (I guess also that, in this case, not supporting this fold because it happens to cause the extra subs in a random test isn't a great reason to not do it.)


Instcombine is an early IR canonicalization pass. Its purpose is to reduce logically equivalent IR sequences to a common form (usually the minimal instruction count). Often, the canonical/minimal IR form also happens to be the best-performing form for a target, but there's no guarantee of that; it's the backend's job to transform the code for better perf based on target capabilities. So as long as this patch reduces the IR correctly, I don't think it's at fault (although we do sometimes temporarily revert to avoid regressions while we fix the later passes).

As always, consider the alternate scenario where the benchmark source was already written in the logically equivalent form that this patch creates. The perf problem would already exist for that hypothetical benchmark, independent of this patch, so we have to deal with it some other way.
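
For anyone reading along, here's a minimal standalone sketch of the identity behind the fold (my own illustration, not taken from the patch's tests), assuming two's-complement wrap-around. It also hints at why the loop quoted below is affected: on an 8-bit value, 255 - x is exactly ~x.

  /* Sketch of the identity (illustrative only, not from the patch's tests):
   *   ~a - min(~a, o)
   *     == ~a - ~max(a, ~o)              // min(~x, ~y) == ~max(x, y)
   *     == (-a - 1) - (-max(a, ~o) - 1)  // ~x == -x - 1
   *     == max(a, ~o) - a
   */
  #include <assert.h>
  
  static int imin(int x, int y) { return x < y ? x : y; }
  static int imax(int x, int y) { return x > y ? x : y; }
  
  int main(void) {
    for (int a = -8; a <= 8; a++)
      for (int o = -8; o <= 8; o++)
        assert(~a - imin(~a, o) == imax(a, ~o) - a);
    return 0;
  }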

Looking back at https://bugs.llvm.org/show_bug.cgi?id=35717#c14 ... is this the source for the loop that is causing problems:

  void rgb_to_cmyk(char * restrict A, char * restrict B, unsigned I) {
      for (int i = 0; i < I; i++) {
        char xc = *A++;
        char xm = *A++;
        char xy = *A++;
  
        xc = 255-xc;
        xm = 255-xm;
        xy = 255-xy;
  
        char xk;
        if (xc < xm)
          xk = xc < xy ? xc : xy;
        else
          xk = xm < xy ? xm : xy;
  
        xc = xc - xk;
        xm = xm - xk;
        xy = xy - xk;
  
        *B++ = xk;
        *B++ = xc;
        *B++ = xm;
        *B++ = xy;
      }
  }

That vectorizes for an AVX2 target, but not for AVX1 or earlier, so we should check whether the vectorizer cost model is behaving as expected before dealing with the backend problems?
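
For reference (and to make the earlier point about the extra subs concrete), here is a rough source-level paraphrase of what the xk/xc/xm/xy computation becomes once the fold rewrites the (255 - x) - xk subtractions. This is my reading of the transform, not a dump of the actual IR; the names max_u8, hi, and body_after_fold are made up for illustration, and the channels are treated as unsigned 8-bit values for simplicity:

  /* Hedged paraphrase of the loop body after the fold; purely illustrative. */
  static unsigned char max_u8(unsigned char a, unsigned char b) {
    return a > b ? a : b;
  }
  
  static void body_after_fold(unsigned char c, unsigned char m, unsigned char y,
                              unsigned char out[4]) {
    unsigned char hi = max_u8(c, max_u8(m, y));
    out[0] = 255 - hi; /* xk: min of the inverted values == ~max of the originals */
    out[1] = hi - c;   /* xc: was (255 - c) - xk */
    out[2] = hi - m;   /* xm: was (255 - m) - xk */
    out[3] = hi - y;   /* xy: was (255 - y) - xk */
  }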


Repository:
  rL LLVM

https://reviews.llvm.org/D52177
