[llvm] 70472f3 - [Reassociate] Convert `add`-like `or`'s into an `add`'s to allow reassociation

Roman Lebedev via llvm-commits llvm-commits at lists.llvm.org
Wed Nov 18 06:56:53 PST 2020


The motivational pattern is fixed with 34ff90ad5d7d17a464bff9abbad70229c693a551.
Let me know which other IR patterns, if any, are still regressed.

Roman.

On Wed, Nov 18, 2020 at 1:56 PM Roman Lebedev <lebedev.ri at gmail.com> wrote:
>
> Hi.
>
> Are all affected patterns like this?
> The problem here is: https://alive2.llvm.org/ce/z/_s5qYy
> I.e. we break apart load widening idiom.
>
> There are two fix directions here: special-case this particular pattern
> as being unprofitable, which is trivial to do;
> and actually finally introduce load widening pass :)
>
> I'll look into the former, this should be reasonably easy..
>
>
> Roman
>
> On Wed, Nov 18, 2020 at 9:33 AM Wei Mi <wmi at google.com> wrote:
> >
> > I understand the problem could mainly be caused by the imperfect
> > heuristic in reassociation instead of the patch. The patch just
> > creates some churn and exposes the issue. However, since it affects
> > the performance of our multiple benchmarks, if some fix or workaround
> > can be found, that would be great. If not, could you consider
> > reverting the patch while looking for a solution?
> >
> > Thanks,
> > Wei.
> >
> > On Tue, Nov 17, 2020 at 10:05 PM Wei Mi <wmi at google.com> wrote:
> > >
> > > Hi Roman,
> > >
> > > We are seeing performance regressions in multiple benchmarks because
> > > of the patch. Here is a reproducible below. Note that
> > > https://github.com/llvm/llvm-project/commit/93f3d7f7b3a902ad9e90ef0d9bf147582a2cf620
> > > doesn't fix the regression.
> > >
> > > ================ 1.ll ================
> > > define i64 @foo(i64 %0, i64* nonnull align 8 dereferenceable(8) %1)
> > > local_unnamed_addr #11 align 32 {
> > >   %3 = getelementptr inbounds i64, i64* %1, i64 1
> > >   %4 = bitcast i64* %3 to i8*
> > >   %5 = getelementptr inbounds i8, i8* %4, i64 -4
> > >   %6 = bitcast i8* %5 to i32*
> > >   %7 = load i32, i32* %6, align 4
> > >   %8 = zext i32 %7 to i64
> > >   %9 = shl nuw i64 %8, 32
> > >   %10 = bitcast i64* %1 to i32*
> > >   %11 = load i32, i32* %10, align 8
> > >   %12 = zext i32 %11 to i64
> > >   %13 = or i64 %9, %12               <===
> > >   %14 = add i64 %13, %0            <===
> > >   %15 = zext i64 %14 to i128
> > >   %16 = mul nuw nsw i128 %15, 8192506886679785011
> > >   %17 = lshr i128 %16, 64
> > >   %18 = xor i128 %17, %16
> > >   %19 = trunc i128 %18 to i64
> > >   ret i64 %19
> > > }
> > > ===================================
> > > Without the reassociate change, the output from "opt --reassociate -S
> > > 1.ll" is the same as 1.ll.
> > >
> > > With the reassociate change, opt --reassociate -S 1.ll > 2.ll
> > >
> > > ================ 2.ll ================
> > > define i64 @foo(i64 %0, i64* nonnull align 8 dereferenceable(8) %1)
> > > local_unnamed_addr align 32 {
> > >   %3 = getelementptr inbounds i64, i64* %1, i64 1
> > >   %4 = bitcast i64* %3 to i8*
> > >   %5 = getelementptr inbounds i8, i8* %4, i64 -4
> > >   %6 = bitcast i8* %5 to i32*
> > >   %7 = load i32, i32* %6, align 4
> > >   %8 = zext i32 %7 to i64
> > >   %9 = shl nuw i64 %8, 32
> > >   %10 = bitcast i64* %1 to i32*
> > >   %11 = load i32, i32* %10, align 8
> > >   %12 = zext i32 %11 to i64
> > >   %13 = add i64 %12, %0            <====
> > >   %14 = add i64 %13, %9            <====
> > >   %15 = zext i64 %14 to i128
> > >   %16 = mul nuw nsw i128 %15, 8192506886679785011
> > >   %17 = lshr i128 %16, 64
> > >   %18 = xor i128 %17, %16
> > >   %19 = trunc i128 %18 to i64
> > >   ret i64 %19
> > > }
> > > ===============================
> > >
> > > In 1.ll, there is the snippet:
> > >     %13 = or i64 %9, %12
> > >     %14 = add i64 %13, %0
> > > and it is doing "%9 + %12 + %0"
> > >
> > > In 2.ll, there is the snippet:
> > >     %13 = add i64 %12, %0
> > >     %14 = add i64 %13, %9
> > > and it is doing "%12 + %0 + %9"
> > >
> > > Note %9 + %12 can be simplified in later phase, but %12 + %0 + %9
> > > cannot because %9 and %12 are not added directly after reassociation.
> > > We can see llc result for 2.ll has more instructions than that for
> > > 1.ll.
> > >
> > > llc -O2 < 1.ll
> > >  foo:                                    # @foo
> > >         .cfi_startproc
> > > # %bb.0:
> > >         movq    %rdi, %rax
> > >         addq    (%rsi), %rax
> > >         movabsq $8192506886679785011, %rcx      # imm = 0x71B1A19B907D6E33
> > >         mulq    %rcx
> > >         xorq    %rdx, %rax
> > >         retq
> > > .Lfunc_end0:
> > >         .size   foo, .Lfunc_end0-foo
> > >         .cfi_endproc
> > >
> > > llc -O2 < 2.ll
> > > foo:                                    # @foo
> > >         .cfi_startproc
> > > # %bb.0:
> > >         movl    (%rsi), %eax
> > >         movl    4(%rsi), %ecx
> > >         shlq    $32, %rcx
> > >         addq    %rdi, %rax
> > >         addq    %rcx, %rax
> > >         movabsq $8192506886679785011, %rcx      # imm = 0x71B1A19B907D6E33
> > >         mulq    %rcx
> > >         xorq    %rdx, %rax
> > >         retq
> > > .Lfunc_end0:
> > >         .size   foo, .Lfunc_end0-foo
> > >         .cfi_endproc
> > >
> > >
> > >
> > > On Tue, Nov 3, 2020 at 12:53 PM Roman Lebedev via llvm-commits
> > > <llvm-commits at lists.llvm.org> wrote:
> > > >
> > > > On Tue, Nov 3, 2020 at 11:34 PM Nikita Popov <nikita.ppv at gmail.com> wrote:
> > > > >
> > > > > On Tue, Nov 3, 2020 at 8:31 PM Roman Lebedev via llvm-commits <llvm-commits at lists.llvm.org> wrote:
> > > > >>
> > > > >>
> > > > >> Author: Roman Lebedev
> > > > >> Date: 2020-11-03T22:30:51+03:00
> > > > >> New Revision: 70472f34b289d663a9f70740dfc9ae32fc89e077
> > > > >>
> > > > >> URL: https://github.com/llvm/llvm-project/commit/70472f34b289d663a9f70740dfc9ae32fc89e077
> > > > >> DIFF: https://github.com/llvm/llvm-project/commit/70472f34b289d663a9f70740dfc9ae32fc89e077.diff
> > > > >>
> > > > >> LOG: [Reassociate] Convert `add`-like `or`'s into an `add`'s to allow reassociation
> > > > >>
> > > > >> InstCombine is quite aggressive in doing the opposite transform,
> > > > >> folding `add` of operands with no common bits set into an `or`,
> > > > >> and that not many things support that new pattern..
> > > > >>
> > > > >> In this case, teaching Reassociate about it is easy,
> > > > >> there's preexisting art for `sub`/`shl`:
> > > > >> just convert such an `or` into an `add`:
> > > > >> https://rise4fun.com/Alive/Xlyv
> > > > >>
> > > > >> Added:
> > > > >>
> > > > >>
> > > > >> Modified:
> > > > >>     llvm/lib/Transforms/Scalar/Reassociate.cpp
> > > > >>     llvm/test/Transforms/Reassociate/add-like-or.ll
> > > > >>
> > > > >> Removed:
> > > > >>
> > > > >>
> > > > >>
> > > > >> ################################################################################
> > > > >> diff  --git a/llvm/lib/Transforms/Scalar/Reassociate.cpp b/llvm/lib/Transforms/Scalar/Reassociate.cpp
> > > > >> index ba7f367267fe..164971c916d4 100644
> > > > >> --- a/llvm/lib/Transforms/Scalar/Reassociate.cpp
> > > > >> +++ b/llvm/lib/Transforms/Scalar/Reassociate.cpp
> > > > >> @@ -920,6 +920,24 @@ static Value *NegateValue(Value *V, Instruction *BI,
> > > > >>    return NewNeg;
> > > > >>  }
> > > > >>
> > > > >> +/// If we have (X|Y), and iff X and Y have no common bits set,
> > > > >> +/// transform this into (X+Y) to allow arithmetics reassociation.
> > > > >> +static BinaryOperator *ConvertOrWithNoCommonBitsToAdd(Instruction *Or) {
> > > > >> +  // Convert an or into an add.
> > > > >> +  BinaryOperator *New =
> > > > >> +      CreateAdd(Or->getOperand(0), Or->getOperand(1), "", Or, Or);
> > > > >> +  New->setHasNoSignedWrap();
> > > > >> +  New->setHasNoUnsignedWrap();
> > > > >> +  New->takeName(Or);
> > > > >> +
> > > > >> +  // Everyone now refers to the add instruction.
> > > > >> +  Or->replaceAllUsesWith(New);
> > > > >> +  New->setDebugLoc(Or->getDebugLoc());
> > > > >> +
> > > > >> +  LLVM_DEBUG(dbgs() << "Converted or into an add: " << *New << '\n');
> > > > >> +  return New;
> > > > >> +}
> > > > >> +
> > > > >>  /// Return true if we should break up this subtract of X-Y into (X + -Y).
> > > > >>  static bool ShouldBreakUpSubtract(Instruction *Sub) {
> > > > >>    // If this is a negation, we can't split it up!
> > > > >> @@ -2116,6 +2134,18 @@ void ReassociatePass::OptimizeInst(Instruction *I) {
> > > > >>    if (I->getType()->isIntegerTy(1))
> > > > >>      return;
> > > > >>
> > > > >> +  // If this is a bitwise or instruction of operands
> > > > >> +  // with no common bits set, convert it to X+Y.
> > > > >> +  if (I->getOpcode() == Instruction::Or &&
> > > > >> +      haveNoCommonBitsSet(I->getOperand(0), I->getOperand(1),
> > > > >> +                          I->getModule()->getDataLayout(), /*AC=*/nullptr, I,
> > > > >> +                          /*DT=*/nullptr)) {
> > > > >> +    Instruction *NI = ConvertOrWithNoCommonBitsToAdd(I);
> > > > >> +    RedoInsts.insert(I);
> > > > >> +    MadeChange = true;
> > > > >> +    I = NI;
> > > > >> +  }
> > > > >> +
> > > > >>    // If this is a subtract instruction which is not already in negate form,
> > > > >>    // see if we can convert it to X+-Y.
> > > > >>    if (I->getOpcode() == Instruction::Sub) {
> > > > >>
> > > > >> diff  --git a/llvm/test/Transforms/Reassociate/add-like-or.ll b/llvm/test/Transforms/Reassociate/add-like-or.ll
> > > > >> index 25f2c0563da9..13477ad5a491 100644
> > > > >> --- a/llvm/test/Transforms/Reassociate/add-like-or.ll
> > > > >> +++ b/llvm/test/Transforms/Reassociate/add-like-or.ll
> > > > >> @@ -17,7 +17,7 @@ define i32 @test1(i32 %a, i32 %b) {
> > > > >>  define i32 @test2(i32 %x) {
> > > > >>  ; CHECK-LABEL: @test2(
> > > > >>  ; CHECK-NEXT:    [[X_NUMLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[X:%.*]], i1 true), [[RNG0:!range !.*]]
> > > > >> -; CHECK-NEXT:    [[RES:%.*]] = or i32 [[X_NUMLZ]], -32
> > > > >> +; CHECK-NEXT:    [[RES:%.*]] = add nuw nsw i32 [[X_NUMLZ]], -32
> > > > >>  ; CHECK-NEXT:    ret i32 [[RES]]
> > > > >>  ;
> > > > >>    %x.numlz = tail call i32 @llvm.ctlz.i32(i32 %x, i1 true), !range !0
> > > > >> @@ -29,9 +29,8 @@ define i32 @test2(i32 %x) {
> > > > >>  define i32 @test3(i32 %x, i32 %bit) {
> > > > >>  ; CHECK-LABEL: @test3(
> > > > >>  ; CHECK-NEXT:    [[X_NUMLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[X:%.*]], i1 true), [[RNG0]]
> > > > >> -; CHECK-NEXT:    [[ZERO_MINUS_X_NUMACTIVEBITS:%.*]] = or i32 [[X_NUMLZ]], -32
> > > > >> -; CHECK-NEXT:    [[BIT_PLUS_ONE:%.*]] = add i32 [[BIT:%.*]], 1
> > > > >> -; CHECK-NEXT:    [[RES:%.*]] = add i32 [[BIT_PLUS_ONE]], [[ZERO_MINUS_X_NUMACTIVEBITS]]
> > > > >> +; CHECK-NEXT:    [[BIT_PLUS_ONE:%.*]] = add i32 [[BIT:%.*]], -31
> > > > >> +; CHECK-NEXT:    [[RES:%.*]] = add i32 [[BIT_PLUS_ONE]], [[X_NUMLZ]]
> > > > >>  ; CHECK-NEXT:    ret i32 [[RES]]
> > > > >>  ;
> > > > >>    %x.numlz = tail call i32 @llvm.ctlz.i32(i32 %x, i1 true), !range !0
> > > > >
> > > > >
> > > > > Not familiar with this transform, so possibly stupid question:
> > > >
> > > > > The way I'm reading this change, you are unconditionally reverting the InstCombine canonicalization.
> > > > Correct.
> > > >
> > > > > The other transform you mention (shl -> mul) first checks whether doing this will result in reassociation opportunities.
> > > >
> > > > > I'm wondering if the same shouldn't be done here as well?
> > > >
> > > > As far as i can tell, either the reassociation will happen, or
> > > > instcombine will uglify that back:
> > > > https://github.com/llvm/llvm-project/blob/88241ffb5636ebc0579d3ab8eeec78446a769c54/llvm/test/Other/opt-O3-pipeline.ll#L107-L132
> > > > In fact, i recall adding some transforms to instcombine to do just that.
> > > >
> > > > Unless doing it eagerly in Reassociate causes trouble for Reassociate itself
> > > > (https://github.com/llvm/llvm-project/commit/8db41e9dbd141d9904da6e29d098a49eed933400),
> > > > i'm not really sure there's much difference in doing the
> > > > 'profitability' check before or after..
> > > >
> > > > I believe, another example of that question is SROA's AggLoadStoreRewriter.
> > > >
> > > > > Regards,
> > > > > Nikita
> > > > Roman
> > > > _______________________________________________
> > > > llvm-commits mailing list
> > > > llvm-commits at lists.llvm.org
> > > > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits


More information about the llvm-commits mailing list