[PATCH] D129650: [InstCombine] change conditions for transform of sub to xor

Fri Jul 15 05:56:50 PDT 2022

spatel added a comment.

In D129650#3654287 <https://reviews.llvm.org/D129650#3654287>, @nikic wrote:

> At a high level, I find the notion of disabling folds to preserve nowrap flags somewhat concerning. Doing this means that an improvement in one place (which infers more nowrap flags) can disable a later optimization, because it now considers preserving those flags more important. We sometimes do this for correctness reasons (mainly in that an operation with nowrap flags cannot be guaranteed-not-poison), but do we have any precedent in InstCombine for doing this specifically to preserve the flags?

I don't know of any other cases where we avoid a transform based on no-wrap. AFAIK, we have always leaned towards doing a transform to a simpler op even if it meant losing no-wrap.

> In this specific case, I think we should question the value of the entire transform, at least on the IR level. Is there some specific motivation for it (i.e. some chain of follow-on transforms it enables?) or are we just doing it because we can? Our KnownBits handling of sub is optimal, so for computeKnownBits() I believe it shouldn't matter whether the instruction is a sub or a xor (in cases where we convert one into the other based on known bits only). However, for ConstantRange for example we also have an optimal sub implementation, but xor is heavily approximated. Similarly SCEV does not understand xor, but understands sub. So I question the premise here that we want to prefer xor over sub as a canonical form. I believe on average, our support for sub is better (the only xor variant that has decently universal support is xor by minus one, aka not).

We have been canonicalizing away from sub recently, and I assumed xor was better than sub for bit-tracking, so I didn't consider the opposite transform.

I tracked down the original fold to:
https://github.com/llvm/llvm-project/commit/010337c8385b919f
...so the motivation was really about codegen, but we've also generally assumed that bitwise ops are a better canonicalization than math.

The fold was moved to InstCombiner::visitSub() with:
D9417 <https://reviews.llvm.org/D9417> / https://github.com/llvm/llvm-project/commit/ec6833420fa3c6f44c0e464ed2cbc6c483f37781
...and that brings us to the recent history:

1. I noticed that we could use the transform in SDAG, so I proposed to copy the InstCombine: D128080 <https://reviews.llvm.org/D128080>
2. Then, I saw that x86 had a stronger version of the fold, so I made that a generic SDAG fold: D128123 <https://reviews.llvm.org/D128123>
3. There was feedback that we could relax the constraints a bit more, so: d0eec5f7e787 <https://reviews.llvm.org/rGd0eec5f7e7874ecfdface473cd4718ac5e6e629e>
4. I figured if we're doing this fold in both places, we should make it consistent: 79bb915fb60b <https://reviews.llvm.org/rG79bb915fb60b2cd220d89e3bb54f67abb8cdb7bd>

So it just accumulated to current state.
This reminds me of turning 'add' into 'or' with no common bits set (no carries) - that led to a long series of work-arounds to match new patterns with 'or'. 
I'm not sure what (if any) SCEV changes were made to accommodate that transform, but it makes sense to me if we want to avoid that with sub->xor.

I think this patch is going to effectively prevent the transform in almost all cases because we seem to infer no-wrap on these patterns effectively. We could commit this, wait a bit to see if there's any fallout, then remove the transform completely.

That raises the question: do we want to canonicalize xor to sub? Maybe we just ignore that and live with both possibilities.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D129650/new/

https://reviews.llvm.org/D129650