[llvm] 70472f3 - [Reassociate] Convert `add`-like `or`'s into an `add`'s to allow reassociation

Tue Nov 17 22:33:15 PST 2020

I understand the problem could mainly be caused by the imperfect
heuristic in reassociation instead of the patch. The patch just
creates some churn and exposes the issue. However, since it affects
the performance of our multiple benchmarks, if some fix or workaround
can be found, that would be great. If not, could you consider
reverting the patch while looking for a solution?

Thanks,
Wei.

On Tue, Nov 17, 2020 at 10:05 PM Wei Mi <wmi at google.com> wrote:
>
> Hi Roman,
>
> We are seeing performance regressions in multiple benchmarks because
> of the patch. Here is a reproducible below. Note that
> https://github.com/llvm/llvm-project/commit/93f3d7f7b3a902ad9e90ef0d9bf147582a2cf620
> doesn't fix the regression.
>
> ================ 1.ll ================
> define i64 @foo(i64 %0, i64* nonnull align 8 dereferenceable(8) %1)
> local_unnamed_addr #11 align 32 {
>   %3 = getelementptr inbounds i64, i64* %1, i64 1
>   %4 = bitcast i64* %3 to i8*
>   %5 = getelementptr inbounds i8, i8* %4, i64 -4
>   %6 = bitcast i8* %5 to i32*
>   %7 = load i32, i32* %6, align 4
>   %8 = zext i32 %7 to i64
>   %9 = shl nuw i64 %8, 32
>   %10 = bitcast i64* %1 to i32*
>   %11 = load i32, i32* %10, align 8
>   %12 = zext i32 %11 to i64
>   %13 = or i64 %9, %12               <===
>   %14 = add i64 %13, %0            <===
>   %15 = zext i64 %14 to i128
>   %16 = mul nuw nsw i128 %15, 8192506886679785011
>   %17 = lshr i128 %16, 64
>   %18 = xor i128 %17, %16
>   %19 = trunc i128 %18 to i64
>   ret i64 %19
> }
> ===================================
> Without the reassociate change, the output from "opt --reassociate -S
> 1.ll" is the same as 1.ll.
>
> With the reassociate change, opt --reassociate -S 1.ll > 2.ll
>
> ================ 2.ll ================
> define i64 @foo(i64 %0, i64* nonnull align 8 dereferenceable(8) %1)
> local_unnamed_addr align 32 {
>   %3 = getelementptr inbounds i64, i64* %1, i64 1
>   %4 = bitcast i64* %3 to i8*
>   %5 = getelementptr inbounds i8, i8* %4, i64 -4
>   %6 = bitcast i8* %5 to i32*
>   %7 = load i32, i32* %6, align 4
>   %8 = zext i32 %7 to i64
>   %9 = shl nuw i64 %8, 32
>   %10 = bitcast i64* %1 to i32*
>   %11 = load i32, i32* %10, align 8
>   %12 = zext i32 %11 to i64
>   %13 = add i64 %12, %0            <====
>   %14 = add i64 %13, %9            <====
>   %15 = zext i64 %14 to i128
>   %16 = mul nuw nsw i128 %15, 8192506886679785011
>   %17 = lshr i128 %16, 64
>   %18 = xor i128 %17, %16
>   %19 = trunc i128 %18 to i64
>   ret i64 %19
> }
> ===============================
>
> In 1.ll, there is the snippet:
>     %13 = or i64 %9, %12
>     %14 = add i64 %13, %0
> and it is doing "%9 + %12 + %0"
>
> In 2.ll, there is the snippet:
>     %13 = add i64 %12, %0
>     %14 = add i64 %13, %9
> and it is doing "%12 + %0 + %9"
>
> Note %9 + %12 can be simplified in later phase, but %12 + %0 + %9
> cannot because %9 and %12 are not added directly after reassociation.
> We can see llc result for 2.ll has more instructions than that for
> 1.ll.
>
> llc -O2 < 1.ll
>  foo:                                    # @foo
>         .cfi_startproc
> # %bb.0:
>         movq    %rdi, %rax
>         addq    (%rsi), %rax
>         movabsq $8192506886679785011, %rcx      # imm = 0x71B1A19B907D6E33
>         mulq    %rcx
>         xorq    %rdx, %rax
>         retq
> .Lfunc_end0:
>         .size   foo, .Lfunc_end0-foo
>         .cfi_endproc
>
> llc -O2 < 2.ll
> foo:                                    # @foo
>         .cfi_startproc
> # %bb.0:
>         movl    (%rsi), %eax
>         movl    4(%rsi), %ecx
>         shlq    $32, %rcx
>         addq    %rdi, %rax
>         addq    %rcx, %rax
>         movabsq $8192506886679785011, %rcx      # imm = 0x71B1A19B907D6E33
>         mulq    %rcx
>         xorq    %rdx, %rax
>         retq
> .Lfunc_end0:
>         .size   foo, .Lfunc_end0-foo
>         .cfi_endproc
>
>
>
> On Tue, Nov 3, 2020 at 12:53 PM Roman Lebedev via llvm-commits
> <llvm-commits at lists.llvm.org> wrote:
> >
> > On Tue, Nov 3, 2020 at 11:34 PM Nikita Popov <nikita.ppv at gmail.com> wrote:
> > >
> > > On Tue, Nov 3, 2020 at 8:31 PM Roman Lebedev via llvm-commits <llvm-commits at lists.llvm.org> wrote:
> > >>
> > >>
> > >> Author: Roman Lebedev
> > >> Date: 2020-11-03T22:30:51+03:00
> > >> New Revision: 70472f34b289d663a9f70740dfc9ae32fc89e077
> > >>
> > >> URL: https://github.com/llvm/llvm-project/commit/70472f34b289d663a9f70740dfc9ae32fc89e077
> > >> DIFF: https://github.com/llvm/llvm-project/commit/70472f34b289d663a9f70740dfc9ae32fc89e077.diff
> > >>
> > >> LOG: [Reassociate] Convert `add`-like `or`'s into an `add`'s to allow reassociation
> > >>
> > >> InstCombine is quite aggressive in doing the opposite transform,
> > >> folding `add` of operands with no common bits set into an `or`,
> > >> and that not many things support that new pattern..
> > >>
> > >> In this case, teaching Reassociate about it is easy,
> > >> there's preexisting art for `sub`/`shl`:
> > >> just convert such an `or` into an `add`:
> > >> https://rise4fun.com/Alive/Xlyv
> > >>
> > >> Added:
> > >>
> > >>
> > >> Modified:
> > >>     llvm/lib/Transforms/Scalar/Reassociate.cpp
> > >>     llvm/test/Transforms/Reassociate/add-like-or.ll
> > >>
> > >> Removed:
> > >>
> > >>
> > >>
> > >> ################################################################################
> > >> diff  --git a/llvm/lib/Transforms/Scalar/Reassociate.cpp b/llvm/lib/Transforms/Scalar/Reassociate.cpp
> > >> index ba7f367267fe..164971c916d4 100644
> > >> --- a/llvm/lib/Transforms/Scalar/Reassociate.cpp
> > >> +++ b/llvm/lib/Transforms/Scalar/Reassociate.cpp
> > >> @@ -920,6 +920,24 @@ static Value *NegateValue(Value *V, Instruction *BI,
> > >>    return NewNeg;
> > >>  }
> > >>
> > >> +/// If we have (X|Y), and iff X and Y have no common bits set,
> > >> +/// transform this into (X+Y) to allow arithmetics reassociation.
> > >> +static BinaryOperator *ConvertOrWithNoCommonBitsToAdd(Instruction *Or) {
> > >> +  // Convert an or into an add.
> > >> +  BinaryOperator *New =
> > >> +      CreateAdd(Or->getOperand(0), Or->getOperand(1), "", Or, Or);
> > >> +  New->setHasNoSignedWrap();
> > >> +  New->setHasNoUnsignedWrap();
> > >> +  New->takeName(Or);
> > >> +
> > >> +  // Everyone now refers to the add instruction.
> > >> +  Or->replaceAllUsesWith(New);
> > >> +  New->setDebugLoc(Or->getDebugLoc());
> > >> +
> > >> +  LLVM_DEBUG(dbgs() << "Converted or into an add: " << *New << '\n');
> > >> +  return New;
> > >> +}
> > >> +
> > >>  /// Return true if we should break up this subtract of X-Y into (X + -Y).
> > >>  static bool ShouldBreakUpSubtract(Instruction *Sub) {
> > >>    // If this is a negation, we can't split it up!
> > >> @@ -2116,6 +2134,18 @@ void ReassociatePass::OptimizeInst(Instruction *I) {
> > >>    if (I->getType()->isIntegerTy(1))
> > >>      return;
> > >>
> > >> +  // If this is a bitwise or instruction of operands
> > >> +  // with no common bits set, convert it to X+Y.
> > >> +  if (I->getOpcode() == Instruction::Or &&
> > >> +      haveNoCommonBitsSet(I->getOperand(0), I->getOperand(1),
> > >> +                          I->getModule()->getDataLayout(), /*AC=*/nullptr, I,
> > >> +                          /*DT=*/nullptr)) {
> > >> +    Instruction *NI = ConvertOrWithNoCommonBitsToAdd(I);
> > >> +    RedoInsts.insert(I);
> > >> +    MadeChange = true;
> > >> +    I = NI;
> > >> +  }
> > >> +
> > >>    // If this is a subtract instruction which is not already in negate form,
> > >>    // see if we can convert it to X+-Y.
> > >>    if (I->getOpcode() == Instruction::Sub) {
> > >>
> > >> diff  --git a/llvm/test/Transforms/Reassociate/add-like-or.ll b/llvm/test/Transforms/Reassociate/add-like-or.ll
> > >> index 25f2c0563da9..13477ad5a491 100644
> > >> --- a/llvm/test/Transforms/Reassociate/add-like-or.ll
> > >> +++ b/llvm/test/Transforms/Reassociate/add-like-or.ll
> > >> @@ -17,7 +17,7 @@ define i32 @test1(i32 %a, i32 %b) {
> > >>  define i32 @test2(i32 %x) {
> > >>  ; CHECK-LABEL: @test2(
> > >>  ; CHECK-NEXT:    [[X_NUMLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[X:%.*]], i1 true), [[RNG0:!range !.*]]
> > >> -; CHECK-NEXT:    [[RES:%.*]] = or i32 [[X_NUMLZ]], -32
> > >> +; CHECK-NEXT:    [[RES:%.*]] = add nuw nsw i32 [[X_NUMLZ]], -32
> > >>  ; CHECK-NEXT:    ret i32 [[RES]]
> > >>  ;
> > >>    %x.numlz = tail call i32 @llvm.ctlz.i32(i32 %x, i1 true), !range !0
> > >> @@ -29,9 +29,8 @@ define i32 @test2(i32 %x) {
> > >>  define i32 @test3(i32 %x, i32 %bit) {
> > >>  ; CHECK-LABEL: @test3(
> > >>  ; CHECK-NEXT:    [[X_NUMLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[X:%.*]], i1 true), [[RNG0]]
> > >> -; CHECK-NEXT:    [[ZERO_MINUS_X_NUMACTIVEBITS:%.*]] = or i32 [[X_NUMLZ]], -32
> > >> -; CHECK-NEXT:    [[BIT_PLUS_ONE:%.*]] = add i32 [[BIT:%.*]], 1
> > >> -; CHECK-NEXT:    [[RES:%.*]] = add i32 [[BIT_PLUS_ONE]], [[ZERO_MINUS_X_NUMACTIVEBITS]]
> > >> +; CHECK-NEXT:    [[BIT_PLUS_ONE:%.*]] = add i32 [[BIT:%.*]], -31
> > >> +; CHECK-NEXT:    [[RES:%.*]] = add i32 [[BIT_PLUS_ONE]], [[X_NUMLZ]]
> > >>  ; CHECK-NEXT:    ret i32 [[RES]]
> > >>  ;
> > >>    %x.numlz = tail call i32 @llvm.ctlz.i32(i32 %x, i1 true), !range !0
> > >
> > >
> > > Not familiar with this transform, so possibly stupid question:
> >
> > > The way I'm reading this change, you are unconditionally reverting the InstCombine canonicalization.
> > Correct.
> >
> > > The other transform you mention (shl -> mul) first checks whether doing this will result in reassociation opportunities.
> >
> > > I'm wondering if the same shouldn't be done here as well?
> >
> > As far as i can tell, either the reassociation will happen, or
> > instcombine will uglify that back:
> > https://github.com/llvm/llvm-project/blob/88241ffb5636ebc0579d3ab8eeec78446a769c54/llvm/test/Other/opt-O3-pipeline.ll#L107-L132
> > In fact, i recall adding some transforms to instcombine to do just that.
> >
> > Unless doing it eagerly in Reassociate causes trouble for Reassociate itself
> > (https://github.com/llvm/llvm-project/commit/8db41e9dbd141d9904da6e29d098a49eed933400),
> > i'm not really sure there's much difference in doing the
> > 'profitability' check before or after..
> >
> > I believe, another example of that question is SROA's AggLoadStoreRewriter.
> >
> > > Regards,
> > > Nikita
> > Roman
> > _______________________________________________
> > llvm-commits mailing list
> > llvm-commits at lists.llvm.org
> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits