[PATCH] D136582: [InstCombine] fold `sub + and` pattern with specific const value

Mon Oct 31 13:59:00 PDT 2022

bcl5980 added a comment.

In D136582#3896480 <https://reviews.llvm.org/D136582#3896480>, @spatel wrote:

> Please pre-commit the baseline tests, so we will see the diffs here.
>
> I'm still not sure if we want to add some more general transforms in addition to or instead of this fold.
>
> Should we have the subtract version of this patch for consistency?
> D130080 <https://reviews.llvm.org/D130080>
>
> In case it was not clear, that was a transform I suggested earlier. Depending where it is added, I think the patch would look something like this:
>
>   diff --git a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp b/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
>   index c3dabc2d4a07..e7ae1a9be39c 100644
>   --- a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
>   +++ b/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
>   @@ -2035,6 +2035,18 @@ Instruction *InstCombinerImpl::visitAnd(BinaryOperator &I) {
>                                    ConstantInt::getNullValue(Ty));
>        }
>    
>   +    // If all bits affected by a sub are included in a high-bit-mask, do the
>   +    // mask op before the adjusted sub. Example:
>   +    // (0x0f - X) & 0xf8 --> 0x08 - (X & 0xf8)
>   +    const APInt *SubC;
>   +    if (C->isNegatedPowerOf2() &&
>   +        match(Op0, m_OneUse(m_Sub(m_APInt(SubC), m_Value(X)))) &&
>   +        (~*C).isSubsetOf(*SubC)) {
>   +      Value *NewAnd = Builder.CreateAnd(X, *C);
>   +      Constant *NewSubC = ConstantInt::get(Ty, *C & *SubC);
>   +      return BinaryOperator::CreateSub(NewSubC, NewAnd);
>   +    }
>   +
>        Constant *C1, *C2;
>        const APInt *C3 = C;
>        Value *X;
>
> I didn't see any immediate failures in regression tests with that patch applied.

I guess D130080 <https://reviews.llvm.org/D130080>'s motivation is AMDGPU's load/store instruction have very strong address pattern that they prefer `add` close to `load`/`store`. 
But for the `sub`, as far as I know they can't get any benifit from it. Actually I think most of the backend can't support `sub` in `load`/`store`.
I prefer to keep `sub`+`and` except we make sure the `sub` can be optimized.

I can update the code you paste here but this code doesn't consider the nuw flag so it miss some cases and if we add constant shrink later, it works even worst I think.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136582/new/

https://reviews.llvm.org/D136582