[llvm] 70472f3 - [Reassociate] Convert `add`-like `or`'s into an `add`'s to allow reassociation

Tue Nov 17 22:05:58 PST 2020

Hi Roman,

We are seeing performance regressions in multiple benchmarks because
of the patch. Here is a reproducible below. Note that
https://github.com/llvm/llvm-project/commit/93f3d7f7b3a902ad9e90ef0d9bf147582a2cf620
doesn't fix the regression.

================ 1.ll ================
define i64 @foo(i64 %0, i64* nonnull align 8 dereferenceable(8) %1)
local_unnamed_addr #11 align 32 {
  %3 = getelementptr inbounds i64, i64* %1, i64 1
  %4 = bitcast i64* %3 to i8*
  %5 = getelementptr inbounds i8, i8* %4, i64 -4
  %6 = bitcast i8* %5 to i32*
  %7 = load i32, i32* %6, align 4
  %8 = zext i32 %7 to i64
  %9 = shl nuw i64 %8, 32
  %10 = bitcast i64* %1 to i32*
  %11 = load i32, i32* %10, align 8
  %12 = zext i32 %11 to i64
  %13 = or i64 %9, %12               <===
  %14 = add i64 %13, %0            <===
  %15 = zext i64 %14 to i128
  %16 = mul nuw nsw i128 %15, 8192506886679785011
  %17 = lshr i128 %16, 64
  %18 = xor i128 %17, %16
  %19 = trunc i128 %18 to i64
  ret i64 %19
}
===================================
Without the reassociate change, the output from "opt --reassociate -S
1.ll" is the same as 1.ll.

With the reassociate change, opt --reassociate -S 1.ll > 2.ll

================ 2.ll ================
define i64 @foo(i64 %0, i64* nonnull align 8 dereferenceable(8) %1)
local_unnamed_addr align 32 {
  %3 = getelementptr inbounds i64, i64* %1, i64 1
  %4 = bitcast i64* %3 to i8*
  %5 = getelementptr inbounds i8, i8* %4, i64 -4
  %6 = bitcast i8* %5 to i32*
  %7 = load i32, i32* %6, align 4
  %8 = zext i32 %7 to i64
  %9 = shl nuw i64 %8, 32
  %10 = bitcast i64* %1 to i32*
  %11 = load i32, i32* %10, align 8
  %12 = zext i32 %11 to i64
  %13 = add i64 %12, %0            <====
  %14 = add i64 %13, %9            <====
  %15 = zext i64 %14 to i128
  %16 = mul nuw nsw i128 %15, 8192506886679785011
  %17 = lshr i128 %16, 64
  %18 = xor i128 %17, %16
  %19 = trunc i128 %18 to i64
  ret i64 %19
}
===============================

In 1.ll, there is the snippet:
    %13 = or i64 %9, %12
    %14 = add i64 %13, %0
and it is doing "%9 + %12 + %0"

In 2.ll, there is the snippet:
    %13 = add i64 %12, %0
    %14 = add i64 %13, %9
and it is doing "%12 + %0 + %9"

Note %9 + %12 can be simplified in later phase, but %12 + %0 + %9
cannot because %9 and %12 are not added directly after reassociation.
We can see llc result for 2.ll has more instructions than that for
1.ll.

llc -O2 < 1.ll
 foo:                                    # @foo
        .cfi_startproc
# %bb.0:
        movq    %rdi, %rax
        addq    (%rsi), %rax
        movabsq $8192506886679785011, %rcx      # imm = 0x71B1A19B907D6E33
        mulq    %rcx
        xorq    %rdx, %rax
        retq
.Lfunc_end0:
        .size   foo, .Lfunc_end0-foo
        .cfi_endproc

llc -O2 < 2.ll
foo:                                    # @foo
        .cfi_startproc
# %bb.0:
        movl    (%rsi), %eax
        movl    4(%rsi), %ecx
        shlq    $32, %rcx
        addq    %rdi, %rax
        addq    %rcx, %rax
        movabsq $8192506886679785011, %rcx      # imm = 0x71B1A19B907D6E33
        mulq    %rcx
        xorq    %rdx, %rax
        retq
.Lfunc_end0:
        .size   foo, .Lfunc_end0-foo
        .cfi_endproc

On Tue, Nov 3, 2020 at 12:53 PM Roman Lebedev via llvm-commits
<llvm-commits at lists.llvm.org> wrote:
>
> On Tue, Nov 3, 2020 at 11:34 PM Nikita Popov <nikita.ppv at gmail.com> wrote:
> >
> > On Tue, Nov 3, 2020 at 8:31 PM Roman Lebedev via llvm-commits <llvm-commits at lists.llvm.org> wrote:
> >>
> >>
> >> Author: Roman Lebedev
> >> Date: 2020-11-03T22:30:51+03:00
> >> New Revision: 70472f34b289d663a9f70740dfc9ae32fc89e077
> >>
> >> URL: https://github.com/llvm/llvm-project/commit/70472f34b289d663a9f70740dfc9ae32fc89e077
> >> DIFF: https://github.com/llvm/llvm-project/commit/70472f34b289d663a9f70740dfc9ae32fc89e077.diff
> >>
> >> LOG: [Reassociate] Convert `add`-like `or`'s into an `add`'s to allow reassociation
> >>
> >> InstCombine is quite aggressive in doing the opposite transform,
> >> folding `add` of operands with no common bits set into an `or`,
> >> and that not many things support that new pattern..
> >>
> >> In this case, teaching Reassociate about it is easy,
> >> there's preexisting art for `sub`/`shl`:
> >> just convert such an `or` into an `add`:
> >> https://rise4fun.com/Alive/Xlyv
> >>
> >> Added:
> >>
> >>
> >> Modified:
> >>     llvm/lib/Transforms/Scalar/Reassociate.cpp
> >>     llvm/test/Transforms/Reassociate/add-like-or.ll
> >>
> >> Removed:
> >>
> >>
> >>
> >> ################################################################################
> >> diff  --git a/llvm/lib/Transforms/Scalar/Reassociate.cpp b/llvm/lib/Transforms/Scalar/Reassociate.cpp
> >> index ba7f367267fe..164971c916d4 100644
> >> --- a/llvm/lib/Transforms/Scalar/Reassociate.cpp
> >> +++ b/llvm/lib/Transforms/Scalar/Reassociate.cpp
> >> @@ -920,6 +920,24 @@ static Value *NegateValue(Value *V, Instruction *BI,
> >>    return NewNeg;
> >>  }
> >>
> >> +/// If we have (X|Y), and iff X and Y have no common bits set,
> >> +/// transform this into (X+Y) to allow arithmetics reassociation.
> >> +static BinaryOperator *ConvertOrWithNoCommonBitsToAdd(Instruction *Or) {
> >> +  // Convert an or into an add.
> >> +  BinaryOperator *New =
> >> +      CreateAdd(Or->getOperand(0), Or->getOperand(1), "", Or, Or);
> >> +  New->setHasNoSignedWrap();
> >> +  New->setHasNoUnsignedWrap();
> >> +  New->takeName(Or);
> >> +
> >> +  // Everyone now refers to the add instruction.
> >> +  Or->replaceAllUsesWith(New);
> >> +  New->setDebugLoc(Or->getDebugLoc());
> >> +
> >> +  LLVM_DEBUG(dbgs() << "Converted or into an add: " << *New << '\n');
> >> +  return New;
> >> +}
> >> +
> >>  /// Return true if we should break up this subtract of X-Y into (X + -Y).
> >>  static bool ShouldBreakUpSubtract(Instruction *Sub) {
> >>    // If this is a negation, we can't split it up!
> >> @@ -2116,6 +2134,18 @@ void ReassociatePass::OptimizeInst(Instruction *I) {
> >>    if (I->getType()->isIntegerTy(1))
> >>      return;
> >>
> >> +  // If this is a bitwise or instruction of operands
> >> +  // with no common bits set, convert it to X+Y.
> >> +  if (I->getOpcode() == Instruction::Or &&
> >> +      haveNoCommonBitsSet(I->getOperand(0), I->getOperand(1),
> >> +                          I->getModule()->getDataLayout(), /*AC=*/nullptr, I,
> >> +                          /*DT=*/nullptr)) {
> >> +    Instruction *NI = ConvertOrWithNoCommonBitsToAdd(I);
> >> +    RedoInsts.insert(I);
> >> +    MadeChange = true;
> >> +    I = NI;
> >> +  }
> >> +
> >>    // If this is a subtract instruction which is not already in negate form,
> >>    // see if we can convert it to X+-Y.
> >>    if (I->getOpcode() == Instruction::Sub) {
> >>
> >> diff  --git a/llvm/test/Transforms/Reassociate/add-like-or.ll b/llvm/test/Transforms/Reassociate/add-like-or.ll
> >> index 25f2c0563da9..13477ad5a491 100644
> >> --- a/llvm/test/Transforms/Reassociate/add-like-or.ll
> >> +++ b/llvm/test/Transforms/Reassociate/add-like-or.ll
> >> @@ -17,7 +17,7 @@ define i32 @test1(i32 %a, i32 %b) {
> >>  define i32 @test2(i32 %x) {
> >>  ; CHECK-LABEL: @test2(
> >>  ; CHECK-NEXT:    [[X_NUMLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[X:%.*]], i1 true), [[RNG0:!range !.*]]
> >> -; CHECK-NEXT:    [[RES:%.*]] = or i32 [[X_NUMLZ]], -32
> >> +; CHECK-NEXT:    [[RES:%.*]] = add nuw nsw i32 [[X_NUMLZ]], -32
> >>  ; CHECK-NEXT:    ret i32 [[RES]]
> >>  ;
> >>    %x.numlz = tail call i32 @llvm.ctlz.i32(i32 %x, i1 true), !range !0
> >> @@ -29,9 +29,8 @@ define i32 @test2(i32 %x) {
> >>  define i32 @test3(i32 %x, i32 %bit) {
> >>  ; CHECK-LABEL: @test3(
> >>  ; CHECK-NEXT:    [[X_NUMLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[X:%.*]], i1 true), [[RNG0]]
> >> -; CHECK-NEXT:    [[ZERO_MINUS_X_NUMACTIVEBITS:%.*]] = or i32 [[X_NUMLZ]], -32
> >> -; CHECK-NEXT:    [[BIT_PLUS_ONE:%.*]] = add i32 [[BIT:%.*]], 1
> >> -; CHECK-NEXT:    [[RES:%.*]] = add i32 [[BIT_PLUS_ONE]], [[ZERO_MINUS_X_NUMACTIVEBITS]]
> >> +; CHECK-NEXT:    [[BIT_PLUS_ONE:%.*]] = add i32 [[BIT:%.*]], -31
> >> +; CHECK-NEXT:    [[RES:%.*]] = add i32 [[BIT_PLUS_ONE]], [[X_NUMLZ]]
> >>  ; CHECK-NEXT:    ret i32 [[RES]]
> >>  ;
> >>    %x.numlz = tail call i32 @llvm.ctlz.i32(i32 %x, i1 true), !range !0
> >
> >
> > Not familiar with this transform, so possibly stupid question:
>
> > The way I'm reading this change, you are unconditionally reverting the InstCombine canonicalization.
> Correct.
>
> > The other transform you mention (shl -> mul) first checks whether doing this will result in reassociation opportunities.
>
> > I'm wondering if the same shouldn't be done here as well?
>
> As far as i can tell, either the reassociation will happen, or
> instcombine will uglify that back:
> https://github.com/llvm/llvm-project/blob/88241ffb5636ebc0579d3ab8eeec78446a769c54/llvm/test/Other/opt-O3-pipeline.ll#L107-L132
> In fact, i recall adding some transforms to instcombine to do just that.
>
> Unless doing it eagerly in Reassociate causes trouble for Reassociate itself
> (https://github.com/llvm/llvm-project/commit/8db41e9dbd141d9904da6e29d098a49eed933400),
> i'm not really sure there's much difference in doing the
> 'profitability' check before or after..
>
> I believe, another example of that question is SROA's AggLoadStoreRewriter.
>
> > Regards,
> > Nikita
> Roman
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits