[llvm] r354298 - [CGP] form usub with overflow from sub+icmp
Sanjay Patel via llvm-commits
llvm-commits at lists.llvm.org
Wed Apr 24 08:23:10 PDT 2019
As I'm rummaging around here in CGP, I see another possibility:
We have a set of "RemovedInsts", populated by another transform, that enables
delayed deletion. I think we can have the overflow transforms use that: we
would RAUW, but not actually remove/delete the original cmp/add/sub when
doing the overflow transforms. That would mean we don't need to mark the DT
as modified at all (we currently mark the DT as modified to avoid using a
stale instruction iterator).
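A rough sketch of what I have in mind (assuming replaceMathCmpWithIntrinsic
gets access to CGP's existing RemovedInsts member; exact mechanics, like
unlinking from the parent block, are left to the draft):

  // In replaceMathCmpWithIntrinsic(), after building the intrinsic:
  BO->replaceAllUsesWith(Math);
  Cmp->replaceAllUsesWith(OV);
  // Don't erase yet; queue on the existing RemovedInsts set, which is
  // already freed at the end of the current CGP iteration, so the caller's
  // instruction iterator never goes stale and ModifiedDT can stay false:
  RemovedInsts.insert(BO);
  RemovedInsts.insert(Cmp);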
I'll post a draft of that change, so you can try that with your motivating
tests.
On Wed, Apr 24, 2019 at 8:42 AM Sanjay Patel <spatel at rotateright.com> wrote:
> Hi Yevgeny,
>
> I'm curious what size/mix of IR makes this a perf problem. That is, how
> many overflow intrinsics and how many instructions/blocks are in the
> problem function? If you can share an example file, that would be great.
>
> You're correct that the CFG is not modified, so DomTreeUpdater might be a
> solution. But I had not seen DTU until just now, so if anyone with
> knowledge of that class has suggestions about how to use it here, I'd be
> grateful. :)
>
> On Tue, Apr 23, 2019 at 9:42 PM Yevgeny Rouban <yevgeny.rouban at azul.com> wrote:
>
>> Hello Sanjay.
>>
>>
>>
>> We have all these changes integrated but still suffer from significant
>> performance degradation.
>>
>> I’d like to mention that the NFCI change
>> http://llvm.org/viewvc/llvm-project?view=revision&revision=354689 seems
>> to compound the problem, as it adds DomTree building not only to
>> combineToUSubWithOverflow() but also to combineToUAddWithOverflow().
>>
>>
>>
>> The methods combineToUSubWithOverflow() and combineToUAddWithOverflow()
>> seem to keep the CFG intact, so I believe they should not trigger a DT
>> rebuild. In other words, they use the DT but do not change it.
>>
>> If I’m not right and the DT does need to be rebuilt, is it possible to use
>> DomTreeUpdater, which should avoid the compile-time cost in these cases?
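>>
>> For reference, a minimal sketch of the lazy style (assuming
>> DomTreeUpdater's current interface; PredBB/SuccBB are placeholder blocks):
>>
>> DomTreeUpdater DTU(DT, DomTreeUpdater::UpdateStrategy::Lazy);
>> // A transform that only rewrites instructions queues nothing at all; one
>> // that changes CFG edges queues updates instead of forcing a recompute:
>> DTU.applyUpdates({{DominatorTree::Delete, PredBB, SuccBB}});
>> // The tree is recalculated only when a consumer actually asks for it:
>> DominatorTree &LazyDT = DTU.getDomTree();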
>>
>>
>>
>> Thanks.
>>
>> -Yevgeny Rouban
>>
>>
>>
>> From: Sanjay Patel <spatel at rotateright.com>
>> Sent: Wednesday, April 24, 2019 1:12 AM
>> To: Philip Reames <listmail at philipreames.com>
>> Cc: Teresa Johnson <tejohnson at google.com>; Guozhi Wei <carrot at google.com>;
>> llvm-commits <llvm-commits at lists.llvm.org>; Yevgeny Rouban
>> <yevgeny.rouban at azul.com>
>> Subject: Re: [llvm] r354298 - [CGP] form usub with overflow from sub+icmp
>>
>>
>>
>> Philip,
>>
>> Thanks for letting me know. For reference, here are possibly relevant
>> changes to CGP that came after this commit:
>>
>> https://reviews.llvm.org/D58995 (rL355512)
>>
>> https://reviews.llvm.org/D59139 (rL355751)
>>
>> https://reviews.llvm.org/D59696 (rL356937)
>>
>> https://reviews.llvm.org/D59889 (rL357111)
>>
>>
>>
>>
>>
>> On Tue, Apr 23, 2019 at 11:13 AM Philip Reames <listmail at philipreames.com> wrote:
>>
>> Sanjay,
>>
>> We are also seeing fallout from this change: a fairly widely felt
>> compile-time regression that appears to be triggered by it. The operating
>> theory I've heard is that the use of the dom tree is forcing many more
>> rebuilds of a previously invalidated tree. Yevgeny (CCd)
>> can provide more information; he's worked around the problem in our
>> downstream tree and can share his analysis.
>>
>> Philip
>>
>> On 3/13/19 9:54 PM, Teresa Johnson via llvm-commits wrote:
>>
>> Hi Sanjay,
>>
>>
>>
>> Unfortunately we are having some additional problems with this patch. One
>> is a compiler assertion, which goes away after r355823 (though, since that
>> patch just added a heuristic guard on the transformation, the bug is likely
>> just hidden). I filed https://bugs.llvm.org/show_bug.cgi?id=41064 for that
>> one.
>>
>>
>>
>> The other is a performance slowdown. Carrot, who is copied here, can send
>> you more info about that.
>>
>>
>>
>> Thanks,
>>
>> Teresa
>>
>>
>>
>> On Mon, Feb 18, 2019 at 3:32 PM Sanjay Patel via llvm-commits <llvm-commits at lists.llvm.org> wrote:
>>
>> Author: spatel
>> Date: Mon Feb 18 15:33:05 2019
>> New Revision: 354298
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=354298&view=rev
>> Log:
>> [CGP] form usub with overflow from sub+icmp
>>
>> The motivating x86 cases for forming the intrinsic are shown in PR31754
>> and PR40487:
>> https://bugs.llvm.org/show_bug.cgi?id=31754
>> https://bugs.llvm.org/show_bug.cgi?id=40487
>> ..and those are shown in the IR test file and x86 codegen file.
>>
>> Matching the usubo pattern is harder than matching uaddo because we have 2
>> independent values rather than a def-use chain.
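>>
>> To illustrate (a minimal pair in the spirit of the tests below), the uaddo
>> pattern hangs off a def-use edge, while the usubo pattern only shares
>> operands:
>>
>> %a = add i64 %x, %y
>> %ov = icmp ult i64 %a, %x ; the cmp uses the add result
>>
>> %s = sub i64 %x, %y
>> %ov = icmp ult i64 %x, %y ; the cmp does not use %s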
>>
>> This adds a TLI hook that should preserve the existing behavior for uaddo
>> formation but disables usubo formation by default. Only x86 overrides that
>> setting for now, although other targets will likely benefit from forming
>> usubo too.
>>
>> Differential Revision: https://reviews.llvm.org/D57789
>>
>> Modified:
>> llvm/trunk/include/llvm/CodeGen/TargetLowering.h
>> llvm/trunk/lib/CodeGen/CodeGenPrepare.cpp
>> llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>> llvm/trunk/lib/Target/X86/X86ISelLowering.h
>> llvm/trunk/test/CodeGen/X86/cgp-usubo.ll
>> llvm/trunk/test/CodeGen/X86/lsr-loop-exit-cond.ll
>> llvm/trunk/test/Transforms/CodeGenPrepare/X86/overflow-intrinsics.ll
>>
>> Modified: llvm/trunk/include/llvm/CodeGen/TargetLowering.h
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/CodeGen/TargetLowering.h?rev=354298&r1=354297&r2=354298&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/include/llvm/CodeGen/TargetLowering.h (original)
>> +++ llvm/trunk/include/llvm/CodeGen/TargetLowering.h Mon Feb 18 15:33:05 2019
>> @@ -2439,6 +2439,23 @@ public:
>> return false;
>> }
>>
>> + /// Try to convert math with an overflow comparison into the corresponding
>> + /// DAG node operation. Targets may want to override this independently of
>> + /// whether the operation is legal/custom for the given type because it may
>> + /// obscure matching of other patterns.
>> + virtual bool shouldFormOverflowOp(unsigned Opcode, EVT VT) const {
>> + // TODO: The default logic is inherited from code in CodeGenPrepare.
>> + // The opcode should not make a difference by default?
>> + if (Opcode != ISD::UADDO)
>> + return false;
>> +
>> + // Allow the transform as long as we have an integer type that is not
>> + // obviously illegal and unsupported.
>> + if (VT.isVector())
>> + return false;
>> + return VT.isSimple() || !isOperationExpand(Opcode, VT);
>> + }
>> +
>> // Return true if it is profitable to use a scalar input to a BUILD_VECTOR
>> // even if the vector itself has multiple uses.
>> virtual bool aggressivelyPreferBuildVectorSources(EVT VecVT) const {
>>
>> Modified: llvm/trunk/lib/CodeGen/CodeGenPrepare.cpp
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/CodeGenPrepare.cpp?rev=354298&r1=354297&r2=354298&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/lib/CodeGen/CodeGenPrepare.cpp (original)
>> +++ llvm/trunk/lib/CodeGen/CodeGenPrepare.cpp Mon Feb 18 15:33:05 2019
>> @@ -1162,9 +1162,18 @@ static bool OptimizeNoopCopyExpression(C
>> static void replaceMathCmpWithIntrinsic(BinaryOperator *BO, CmpInst *Cmp,
>> Instruction *InsertPt,
>> Intrinsic::ID IID) {
>> + Value *Arg0 = BO->getOperand(0);
>> + Value *Arg1 = BO->getOperand(1);
>> +
>> + // We allow matching the canonical IR (add X, C) back to (usubo X, -C).
>> + if (BO->getOpcode() == Instruction::Add &&
>> + IID == Intrinsic::usub_with_overflow) {
>> + assert(isa<Constant>(Arg1) && "Unexpected input for usubo");
>> + Arg1 = ConstantExpr::getNeg(cast<Constant>(Arg1));
>> + }
>> +
>> IRBuilder<> Builder(InsertPt);
>> - Value *MathOV = Builder.CreateBinaryIntrinsic(IID, BO->getOperand(0),
>> - BO->getOperand(1));
>> + Value *MathOV = Builder.CreateBinaryIntrinsic(IID, Arg0, Arg1);
>> Value *Math = Builder.CreateExtractValue(MathOV, 0, "math");
>> Value *OV = Builder.CreateExtractValue(MathOV, 1, "ov");
>> BO->replaceAllUsesWith(Math);
>> @@ -1182,13 +1191,8 @@ static bool combineToUAddWithOverflow(Cm
>> if (!match(Cmp, m_UAddWithOverflow(m_Value(A), m_Value(B),
>> m_BinOp(Add))))
>> return false;
>>
>> - // Allow the transform as long as we have an integer type that is not
>> - // obviously illegal and unsupported.
>> - Type *Ty = Add->getType();
>> - if (!isa<IntegerType>(Ty))
>> - return false;
>> - EVT CodegenVT = TLI.getValueType(DL, Ty);
>> - if (!CodegenVT.isSimple() && TLI.isOperationExpand(ISD::UADDO, CodegenVT))
>> + if (!TLI.shouldFormOverflowOp(ISD::UADDO,
>> + TLI.getValueType(DL, Add->getType())))
>> return false;
>>
>> // We don't want to move around uses of condition values this late, so we
>> @@ -1210,6 +1214,64 @@ static bool combineToUAddWithOverflow(Cm
>> return true;
>> }
>>
>> +static bool combineToUSubWithOverflow(CmpInst *Cmp, const TargetLowering &TLI,
>> + const DataLayout &DL, bool &ModifiedDT) {
>> + // Convert (A u> B) to (A u< B) to simplify pattern matching.
>> + Value *A = Cmp->getOperand(0), *B = Cmp->getOperand(1);
>> + ICmpInst::Predicate Pred = Cmp->getPredicate();
>> + if (Pred == ICmpInst::ICMP_UGT) {
>> + std::swap(A, B);
>> + Pred = ICmpInst::ICMP_ULT;
>> + }
>> + // Convert special-case: (A == 0) is the same as (A u< 1).
>> + if (Pred == ICmpInst::ICMP_EQ && match(B, m_ZeroInt())) {
>> + B = ConstantInt::get(B->getType(), 1);
>> + Pred = ICmpInst::ICMP_ULT;
>> + }
>> + if (Pred != ICmpInst::ICMP_ULT)
>> + return false;
>> +
>> + // Walk the users of a variable operand of a compare looking for a subtract or
>> + // add with that same operand. Also match the 2nd operand of the compare to
>> + // the add/sub, but that may be a negated constant operand of an add.
>> + Value *CmpVariableOperand = isa<Constant>(A) ? B : A;
>> + BinaryOperator *Sub = nullptr;
>> + for (User *U : CmpVariableOperand->users()) {
>> + // A - B, A u< B --> usubo(A, B)
>> + if (match(U, m_Sub(m_Specific(A), m_Specific(B)))) {
>> + Sub = cast<BinaryOperator>(U);
>> + break;
>> + }
>> +
>> + // A + (-C), A u< C (canonicalized form of (sub A, C))
>> + const APInt *CmpC, *AddC;
>> + if (match(U, m_Add(m_Specific(A), m_APInt(AddC))) &&
>> + match(B, m_APInt(CmpC)) && *AddC == -(*CmpC)) {
>> + Sub = cast<BinaryOperator>(U);
>> + break;
>> + }
>> + }
>> + if (!Sub)
>> + return false;
>> +
>> + if (!TLI.shouldFormOverflowOp(ISD::USUBO,
>> + TLI.getValueType(DL, Sub->getType())))
>> + return false;
>> +
>> + // Pattern matched and profitability checked. Check dominance to determine the
>> + // insertion point for an intrinsic that replaces the subtract and compare.
>> + DominatorTree DT(*Sub->getFunction());
>> + bool SubDominates = DT.dominates(Sub, Cmp);
>> + if (!SubDominates && !DT.dominates(Cmp, Sub))
>> + return false;
>> + Instruction *InPt = SubDominates ? cast<Instruction>(Sub)
>> + : cast<Instruction>(Cmp);
>> + replaceMathCmpWithIntrinsic(Sub, Cmp, InPt, Intrinsic::usub_with_overflow);
>> + // Reset callers - do not crash by iterating over a dead instruction.
>> + ModifiedDT = true;
>> + return true;
>> +}
>> +
>> /// Sink the given CmpInst into user blocks to reduce the number of virtual
>> /// registers that must be created and coalesced. This is a clear win except on
>> /// targets with multiple condition code registers (PowerPC), where it might
>> @@ -1276,14 +1338,17 @@ static bool sinkCmpExpression(CmpInst *C
>> return MadeChange;
>> }
>>
>> -static bool optimizeCmpExpression(CmpInst *Cmp, const TargetLowering &TLI,
>> - const DataLayout &DL) {
>> +static bool optimizeCmp(CmpInst *Cmp, const TargetLowering &TLI,
>> + const DataLayout &DL, bool &ModifiedDT) {
>> if (sinkCmpExpression(Cmp, TLI))
>> return true;
>>
>> if (combineToUAddWithOverflow(Cmp, TLI, DL))
>> return true;
>>
>> + if (combineToUSubWithOverflow(Cmp, TLI, DL, ModifiedDT))
>> + return true;
>> +
>> return false;
>> }
>>
>> @@ -6770,8 +6835,8 @@ bool CodeGenPrepare::optimizeInst(Instru
>> return false;
>> }
>>
>> - if (CmpInst *CI = dyn_cast<CmpInst>(I))
>> - if (TLI && optimizeCmpExpression(CI, *TLI, *DL))
>> + if (auto *Cmp = dyn_cast<CmpInst>(I))
>> + if (TLI && optimizeCmp(Cmp, *TLI, *DL, ModifiedDT))
>> return true;
>>
>> if (LoadInst *LI = dyn_cast<LoadInst>(I)) {
>>
>> Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=354298&r1=354297&r2=354298&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
>> +++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Mon Feb 18 15:33:05 2019
>> @@ -4934,6 +4934,13 @@ bool X86TargetLowering::shouldScalarizeB
>> return isOperationLegalOrCustomOrPromote(VecOp.getOpcode(), ScalarVT);
>> }
>>
>> +bool X86TargetLowering::shouldFormOverflowOp(unsigned Opcode, EVT VT) const {
>> + // TODO: Allow vectors?
>> + if (VT.isVector())
>> + return false;
>> + return VT.isSimple() || !isOperationExpand(Opcode, VT);
>> +}
>> +
>> bool X86TargetLowering::isCheapToSpeculateCttz() const {
>> // Speculate cttz only if we can directly use TZCNT.
>> return Subtarget.hasBMI();
>>
>> Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.h
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.h?rev=354298&r1=354297&r2=354298&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/lib/Target/X86/X86ISelLowering.h (original)
>> +++ llvm/trunk/lib/Target/X86/X86ISelLowering.h Mon Feb 18 15:33:05 2019
>> @@ -1071,6 +1071,11 @@ namespace llvm {
>> /// supported.
>> bool shouldScalarizeBinop(SDValue) const override;
>>
>> + /// Overflow nodes should get combined/lowered to optimal instructions
>> + /// (they should allow eliminating explicit compares by getting flags from
>> + /// math ops).
>> + bool shouldFormOverflowOp(unsigned Opcode, EVT VT) const override;
>> +
>> bool storeOfVectorConstantIsCheap(EVT MemVT, unsigned NumElem,
>> unsigned AddrSpace) const override {
>> // If we can replace more than 2 scalar stores, there will be a reduction
>>
>> Modified: llvm/trunk/test/CodeGen/X86/cgp-usubo.ll
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/cgp-usubo.ll?rev=354298&r1=354297&r2=354298&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/test/CodeGen/X86/cgp-usubo.ll (original)
>> +++ llvm/trunk/test/CodeGen/X86/cgp-usubo.ll Mon Feb 18 15:33:05 2019
>> @@ -7,8 +7,8 @@ define i1 @usubo_ult_i64(i64 %x, i64 %y,
>> ; CHECK-LABEL: usubo_ult_i64:
>> ; CHECK: # %bb.0:
>> ; CHECK-NEXT: subq %rsi, %rdi
>> -; CHECK-NEXT: movq %rdi, (%rdx)
>> ; CHECK-NEXT: setb %al
>> +; CHECK-NEXT: movq %rdi, (%rdx)
>> ; CHECK-NEXT: retq
>> %s = sub i64 %x, %y
>> store i64 %s, i64* %p
>> @@ -21,9 +21,8 @@ define i1 @usubo_ult_i64(i64 %x, i64 %y,
>> define i1 @usubo_ugt_i32(i32 %x, i32 %y, i32* %p) nounwind {
>> ; CHECK-LABEL: usubo_ugt_i32:
>> ; CHECK: # %bb.0:
>> -; CHECK-NEXT: cmpl %edi, %esi
>> -; CHECK-NEXT: seta %al
>> ; CHECK-NEXT: subl %esi, %edi
>> +; CHECK-NEXT: setb %al
>> ; CHECK-NEXT: movl %edi, (%rdx)
>> ; CHECK-NEXT: retq
>> %ov = icmp ugt i32 %y, %x
>> @@ -39,8 +38,7 @@ define i1 @usubo_ugt_constant_op0_i8(i8
>> ; CHECK: # %bb.0:
>> ; CHECK-NEXT: movb $42, %cl
>> ; CHECK-NEXT: subb %dil, %cl
>> -; CHECK-NEXT: cmpb $42, %dil
>> -; CHECK-NEXT: seta %al
>> +; CHECK-NEXT: setb %al
>> ; CHECK-NEXT: movb %cl, (%rsi)
>> ; CHECK-NEXT: retq
>> %s = sub i8 42, %x
>> @@ -54,10 +52,9 @@ define i1 @usubo_ugt_constant_op0_i8(i8
>> define i1 @usubo_ult_constant_op0_i16(i16 %x, i16* %p) nounwind {
>> ; CHECK-LABEL: usubo_ult_constant_op0_i16:
>> ; CHECK: # %bb.0:
>> -; CHECK-NEXT: movl $43, %ecx
>> -; CHECK-NEXT: subl %edi, %ecx
>> -; CHECK-NEXT: cmpw $43, %di
>> -; CHECK-NEXT: seta %al
>> +; CHECK-NEXT: movw $43, %cx
>> +; CHECK-NEXT: subw %di, %cx
>> +; CHECK-NEXT: setb %al
>> ; CHECK-NEXT: movw %cx, (%rsi)
>> ; CHECK-NEXT: retq
>> %s = sub i16 43, %x
>> @@ -71,11 +68,9 @@ define i1 @usubo_ult_constant_op0_i16(i1
>> define i1 @usubo_ult_constant_op1_i16(i16 %x, i16* %p) nounwind {
>> ; CHECK-LABEL: usubo_ult_constant_op1_i16:
>> ; CHECK: # %bb.0:
>> -; CHECK-NEXT: movl %edi, %ecx
>> -; CHECK-NEXT: addl $-44, %ecx
>> -; CHECK-NEXT: cmpw $44, %di
>> +; CHECK-NEXT: subw $44, %di
>> ; CHECK-NEXT: setb %al
>> -; CHECK-NEXT: movw %cx, (%rsi)
>> +; CHECK-NEXT: movw %di, (%rsi)
>> ; CHECK-NEXT: retq
>> %s = add i16 %x, -44
>> %ov = icmp ult i16 %x, 44
>> @@ -86,9 +81,8 @@ define i1 @usubo_ult_constant_op1_i16(i1
>> define i1 @usubo_ugt_constant_op1_i8(i8 %x, i8* %p) nounwind {
>> ; CHECK-LABEL: usubo_ugt_constant_op1_i8:
>> ; CHECK: # %bb.0:
>> -; CHECK-NEXT: cmpb $45, %dil
>> +; CHECK-NEXT: subb $45, %dil
>> ; CHECK-NEXT: setb %al
>> -; CHECK-NEXT: addb $-45, %dil
>> ; CHECK-NEXT: movb %dil, (%rsi)
>> ; CHECK-NEXT: retq
>> %ov = icmp ugt i8 45, %x
>> @@ -102,11 +96,9 @@ define i1 @usubo_ugt_constant_op1_i8(i8
>> define i1 @usubo_eq_constant1_op1_i32(i32 %x, i32* %p) nounwind {
>> ; CHECK-LABEL: usubo_eq_constant1_op1_i32:
>> ; CHECK: # %bb.0:
>> -; CHECK-NEXT: # kill: def $edi killed $edi def $rdi
>> -; CHECK-NEXT: leal -1(%rdi), %ecx
>> -; CHECK-NEXT: testl %edi, %edi
>> -; CHECK-NEXT: sete %al
>> -; CHECK-NEXT: movl %ecx, (%rsi)
>> +; CHECK-NEXT: subl $1, %edi
>> +; CHECK-NEXT: setb %al
>> +; CHECK-NEXT: movl %edi, (%rsi)
>> ; CHECK-NEXT: retq
>> %s = add i32 %x, -1
>> %ov = icmp eq i32 %x, 0
>> @@ -124,17 +116,14 @@ define i1 @usubo_ult_sub_dominates_i64(i
>> ; CHECK-NEXT: testb $1, %cl
>> ; CHECK-NEXT: je .LBB7_2
>> ; CHECK-NEXT: # %bb.1: # %t
>> -; CHECK-NEXT: movq %rdi, %rax
>> -; CHECK-NEXT: subq %rsi, %rax
>> -; CHECK-NEXT: movq %rax, (%rdx)
>> -; CHECK-NEXT: testb $1, %cl
>> -; CHECK-NEXT: je .LBB7_2
>> -; CHECK-NEXT: # %bb.3: # %end
>> -; CHECK-NEXT: cmpq %rsi, %rdi
>> +; CHECK-NEXT: subq %rsi, %rdi
>> ; CHECK-NEXT: setb %al
>> -; CHECK-NEXT: retq
>> +; CHECK-NEXT: movq %rdi, (%rdx)
>> +; CHECK-NEXT: testb $1, %cl
>> +; CHECK-NEXT: jne .LBB7_3
>> ; CHECK-NEXT: .LBB7_2: # %f
>> ; CHECK-NEXT: movl %ecx, %eax
>> +; CHECK-NEXT: .LBB7_3: # %end
>> ; CHECK-NEXT: retq
>> entry:
>> br i1 %cond, label %t, label %f
>>
>> Modified: llvm/trunk/test/CodeGen/X86/lsr-loop-exit-cond.ll
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/lsr-loop-exit-cond.ll?rev=354298&r1=354297&r2=354298&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/test/CodeGen/X86/lsr-loop-exit-cond.ll (original)
>> +++ llvm/trunk/test/CodeGen/X86/lsr-loop-exit-cond.ll Mon Feb 18 15:33:05 2019
>> @@ -16,11 +16,11 @@ define void @t(i8* nocapture %in, i8* no
>> ; GENERIC-NEXT: movl (%rdx), %eax
>> ; GENERIC-NEXT: movl 4(%rdx), %ebx
>> ; GENERIC-NEXT: decl %ecx
>> -; GENERIC-NEXT: leaq 20(%rdx), %r14
>> +; GENERIC-NEXT: leaq 20(%rdx), %r11
>> ; GENERIC-NEXT: movq _Te0@{{.*}}(%rip), %r9
>> ; GENERIC-NEXT: movq _Te1@{{.*}}(%rip), %r8
>> ; GENERIC-NEXT: movq _Te3@{{.*}}(%rip), %r10
>> -; GENERIC-NEXT: movq %rcx, %r11
>> +; GENERIC-NEXT: movq %rcx, %r14
>> ; GENERIC-NEXT: jmp LBB0_1
>> ; GENERIC-NEXT: .p2align 4, 0x90
>> ; GENERIC-NEXT: LBB0_2: ## %bb1
>> @@ -29,14 +29,13 @@ define void @t(i8* nocapture %in, i8* no
>> ; GENERIC-NEXT: shrl $16, %ebx
>> ; GENERIC-NEXT: movzbl %bl, %ebx
>> ; GENERIC-NEXT: xorl (%r8,%rbx,4), %eax
>> -; GENERIC-NEXT: xorl -4(%r14), %eax
>> +; GENERIC-NEXT: xorl -4(%r11), %eax
>> ; GENERIC-NEXT: shrl $24, %edi
>> ; GENERIC-NEXT: movzbl %bpl, %ebx
>> ; GENERIC-NEXT: movl (%r10,%rbx,4), %ebx
>> ; GENERIC-NEXT: xorl (%r9,%rdi,4), %ebx
>> -; GENERIC-NEXT: xorl (%r14), %ebx
>> -; GENERIC-NEXT: decq %r11
>> -; GENERIC-NEXT: addq $16, %r14
>> +; GENERIC-NEXT: xorl (%r11), %ebx
>> +; GENERIC-NEXT: addq $16, %r11
>> ; GENERIC-NEXT: LBB0_1: ## %bb
>> ; GENERIC-NEXT: ## =>This Inner Loop Header: Depth=1
>> ; GENERIC-NEXT: movzbl %al, %edi
>> @@ -47,16 +46,16 @@ define void @t(i8* nocapture %in, i8* no
>> ; GENERIC-NEXT: movzbl %bpl, %ebp
>> ; GENERIC-NEXT: movl (%r8,%rbp,4), %ebp
>> ; GENERIC-NEXT: xorl (%r9,%rax,4), %ebp
>> -; GENERIC-NEXT: xorl -12(%r14), %ebp
>> +; GENERIC-NEXT: xorl -12(%r11), %ebp
>> ; GENERIC-NEXT: shrl $24, %ebx
>> ; GENERIC-NEXT: movl (%r10,%rdi,4), %edi
>> ; GENERIC-NEXT: xorl (%r9,%rbx,4), %edi
>> -; GENERIC-NEXT: xorl -8(%r14), %edi
>> +; GENERIC-NEXT: xorl -8(%r11), %edi
>> ; GENERIC-NEXT: movl %ebp, %eax
>> ; GENERIC-NEXT: shrl $24, %eax
>> ; GENERIC-NEXT: movl (%r9,%rax,4), %eax
>> -; GENERIC-NEXT: testq %r11, %r11
>> -; GENERIC-NEXT: jne LBB0_2
>> +; GENERIC-NEXT: subq $1, %r14
>> +; GENERIC-NEXT: jae LBB0_2
>> ; GENERIC-NEXT: ## %bb.3: ## %bb2
>> ; GENERIC-NEXT: shlq $4, %rcx
>> ; GENERIC-NEXT: andl $-16777216, %eax ## imm = 0xFF000000
>> @@ -99,27 +98,26 @@ define void @t(i8* nocapture %in, i8* no
>> ; ATOM-NEXT: ## kill: def $ecx killed $ecx def $rcx
>> ; ATOM-NEXT: movl (%rdx), %r15d
>> ; ATOM-NEXT: movl 4(%rdx), %eax
>> -; ATOM-NEXT: leaq 20(%rdx), %r14
>> +; ATOM-NEXT: leaq 20(%rdx), %r11
>> ; ATOM-NEXT: movq _Te0@{{.*}}(%rip), %r9
>> ; ATOM-NEXT: movq _Te1@{{.*}}(%rip), %r8
>> ; ATOM-NEXT: movq _Te3@{{.*}}(%rip), %r10
>> ; ATOM-NEXT: decl %ecx
>> -; ATOM-NEXT: movq %rcx, %r11
>> +; ATOM-NEXT: movq %rcx, %r14
>> ; ATOM-NEXT: jmp LBB0_1
>> ; ATOM-NEXT: .p2align 4, 0x90
>> ; ATOM-NEXT: LBB0_2: ## %bb1
>> ; ATOM-NEXT: ## in Loop: Header=BB0_1 Depth=1
>> ; ATOM-NEXT: shrl $16, %eax
>> ; ATOM-NEXT: shrl $24, %edi
>> -; ATOM-NEXT: decq %r11
>> -; ATOM-NEXT: movzbl %al, %ebp
>> +; ATOM-NEXT: movzbl %al, %eax
>> +; ATOM-NEXT: xorl (%r8,%rax,4), %r15d
>> ; ATOM-NEXT: movzbl %bl, %eax
>> ; ATOM-NEXT: movl (%r10,%rax,4), %eax
>> -; ATOM-NEXT: xorl (%r8,%rbp,4), %r15d
>> +; ATOM-NEXT: xorl -4(%r11), %r15d
>> ; ATOM-NEXT: xorl (%r9,%rdi,4), %eax
>> -; ATOM-NEXT: xorl -4(%r14), %r15d
>> -; ATOM-NEXT: xorl (%r14), %eax
>> -; ATOM-NEXT: addq $16, %r14
>> +; ATOM-NEXT: xorl (%r11), %eax
>> +; ATOM-NEXT: addq $16, %r11
>> ; ATOM-NEXT: LBB0_1: ## %bb
>> ; ATOM-NEXT: ## =>This Inner Loop Header: Depth=1
>> ; ATOM-NEXT: movl %eax, %edi
>> @@ -132,15 +130,15 @@ define void @t(i8* nocapture %in, i8* no
>> ; ATOM-NEXT: movzbl %r15b, %edi
>> ; ATOM-NEXT: xorl (%r9,%rbp,4), %ebx
>> ; ATOM-NEXT: movl (%r10,%rdi,4), %edi
>> -; ATOM-NEXT: xorl -12(%r14), %ebx
>> +; ATOM-NEXT: xorl -12(%r11), %ebx
>> ; ATOM-NEXT: xorl (%r9,%rax,4), %edi
>> ; ATOM-NEXT: movl %ebx, %eax
>> -; ATOM-NEXT: xorl -8(%r14), %edi
>> +; ATOM-NEXT: xorl -8(%r11), %edi
>> ; ATOM-NEXT: shrl $24, %eax
>> ; ATOM-NEXT: movl (%r9,%rax,4), %r15d
>> -; ATOM-NEXT: testq %r11, %r11
>> +; ATOM-NEXT: subq $1, %r14
>> ; ATOM-NEXT: movl %edi, %eax
>> -; ATOM-NEXT: jne LBB0_2
>> +; ATOM-NEXT: jae LBB0_2
>> ; ATOM-NEXT: ## %bb.3: ## %bb2
>> ; ATOM-NEXT: shrl $16, %eax
>> ; ATOM-NEXT: shrl $8, %edi
>>
>> Modified:
>> llvm/trunk/test/Transforms/CodeGenPrepare/X86/overflow-intrinsics.ll
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/CodeGenPrepare/X86/overflow-intrinsics.ll?rev=354298&r1=354297&r2=354298&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/test/Transforms/CodeGenPrepare/X86/overflow-intrinsics.ll (original)
>> +++ llvm/trunk/test/Transforms/CodeGenPrepare/X86/overflow-intrinsics.ll Mon Feb 18 15:33:05 2019
>> @@ -175,10 +175,11 @@ define i1 @uaddo_i42_increment_illegal_t
>>
>> define i1 @usubo_ult_i64(i64 %x, i64 %y, i64* %p) {
>> ; CHECK-LABEL: @usubo_ult_i64(
>> -; CHECK-NEXT: [[S:%.*]] = sub i64 [[X:%.*]], [[Y:%.*]]
>> -; CHECK-NEXT: store i64 [[S]], i64* [[P:%.*]]
>> -; CHECK-NEXT: [[OV:%.*]] = icmp ult i64 [[X]], [[Y]]
>> -; CHECK-NEXT: ret i1 [[OV]]
>> +; CHECK-NEXT: [[TMP1:%.*]] = call { i64, i1 } @llvm.usub.with.overflow.i64(i64 [[X:%.*]], i64 [[Y:%.*]])
>> +; CHECK-NEXT: [[MATH:%.*]] = extractvalue { i64, i1 } [[TMP1]], 0
>> +; CHECK-NEXT: [[OV1:%.*]] = extractvalue { i64, i1 } [[TMP1]], 1
>> +; CHECK-NEXT: store i64 [[MATH]], i64* [[P:%.*]]
>> +; CHECK-NEXT: ret i1 [[OV1]]
>> ;
>> %s = sub i64 %x, %y
>> store i64 %s, i64* %p
>> @@ -190,10 +191,11 @@ define i1 @usubo_ult_i64(i64 %x, i64 %y,
>>
>> define i1 @usubo_ugt_i32(i32 %x, i32 %y, i32* %p) {
>> ; CHECK-LABEL: @usubo_ugt_i32(
>> -; CHECK-NEXT: [[OV:%.*]] = icmp ugt i32 [[Y:%.*]], [[X:%.*]]
>> -; CHECK-NEXT: [[S:%.*]] = sub i32 [[X]], [[Y]]
>> -; CHECK-NEXT: store i32 [[S]], i32* [[P:%.*]]
>> -; CHECK-NEXT: ret i1 [[OV]]
>> +; CHECK-NEXT: [[TMP1:%.*]] = call { i32, i1 } @llvm.usub.with.overflow.i32(i32 [[X:%.*]], i32 [[Y:%.*]])
>> +; CHECK-NEXT: [[MATH:%.*]] = extractvalue { i32, i1 } [[TMP1]], 0
>> +; CHECK-NEXT: [[OV1:%.*]] = extractvalue { i32, i1 } [[TMP1]], 1
>> +; CHECK-NEXT: store i32 [[MATH]], i32* [[P:%.*]]
>> +; CHECK-NEXT: ret i1 [[OV1]]
>> ;
>> %ov = icmp ugt i32 %y, %x
>> %s = sub i32 %x, %y
>> @@ -205,10 +207,11 @@ define i1 @usubo_ugt_i32(i32 %x, i32 %y,
>>
>> define i1 @usubo_ugt_constant_op0_i8(i8 %x, i8* %p) {
>> ; CHECK-LABEL: @usubo_ugt_constant_op0_i8(
>> -; CHECK-NEXT: [[S:%.*]] = sub i8 42, [[X:%.*]]
>> -; CHECK-NEXT: [[OV:%.*]] = icmp ugt i8 [[X]], 42
>> -; CHECK-NEXT: store i8 [[S]], i8* [[P:%.*]]
>> -; CHECK-NEXT: ret i1 [[OV]]
>> +; CHECK-NEXT: [[TMP1:%.*]] = call { i8, i1 } @llvm.usub.with.overflow.i8(i8 42, i8 [[X:%.*]])
>> +; CHECK-NEXT: [[MATH:%.*]] = extractvalue { i8, i1 } [[TMP1]], 0
>> +; CHECK-NEXT: [[OV1:%.*]] = extractvalue { i8, i1 } [[TMP1]], 1
>> +; CHECK-NEXT: store i8 [[MATH]], i8* [[P:%.*]]
>> +; CHECK-NEXT: ret i1 [[OV1]]
>> ;
>> %s = sub i8 42, %x
>> %ov = icmp ugt i8 %x, 42
>> @@ -220,10 +223,11 @@ define i1 @usubo_ugt_constant_op0_i8(i8
>>
>> define i1 @usubo_ult_constant_op0_i16(i16 %x, i16* %p) {
>> ; CHECK-LABEL: @usubo_ult_constant_op0_i16(
>> -; CHECK-NEXT: [[S:%.*]] = sub i16 43, [[X:%.*]]
>> -; CHECK-NEXT: [[OV:%.*]] = icmp ult i16 43, [[X]]
>> -; CHECK-NEXT: store i16 [[S]], i16* [[P:%.*]]
>> -; CHECK-NEXT: ret i1 [[OV]]
>> +; CHECK-NEXT: [[TMP1:%.*]] = call { i16, i1 } @llvm.usub.with.overflow.i16(i16 43, i16 [[X:%.*]])
>> +; CHECK-NEXT: [[MATH:%.*]] = extractvalue { i16, i1 } [[TMP1]], 0
>> +; CHECK-NEXT: [[OV1:%.*]] = extractvalue { i16, i1 } [[TMP1]], 1
>> +; CHECK-NEXT: store i16 [[MATH]], i16* [[P:%.*]]
>> +; CHECK-NEXT: ret i1 [[OV1]]
>> ;
>> %s = sub i16 43, %x
>> %ov = icmp ult i16 43, %x
>> @@ -235,10 +239,11 @@ define i1 @usubo_ult_constant_op0_i16(i1
>>
>> define i1 @usubo_ult_constant_op1_i16(i16 %x, i16* %p) {
>> ; CHECK-LABEL: @usubo_ult_constant_op1_i16(
>> -; CHECK-NEXT: [[S:%.*]] = add i16 [[X:%.*]], -44
>> -; CHECK-NEXT: [[OV:%.*]] = icmp ult i16 [[X]], 44
>> -; CHECK-NEXT: store i16 [[S]], i16* [[P:%.*]]
>> -; CHECK-NEXT: ret i1 [[OV]]
>> +; CHECK-NEXT: [[TMP1:%.*]] = call { i16, i1 } @llvm.usub.with.overflow.i16(i16 [[X:%.*]], i16 44)
>> +; CHECK-NEXT: [[MATH:%.*]] = extractvalue { i16, i1 } [[TMP1]], 0
>> +; CHECK-NEXT: [[OV1:%.*]] = extractvalue { i16, i1 } [[TMP1]], 1
>> +; CHECK-NEXT: store i16 [[MATH]], i16* [[P:%.*]]
>> +; CHECK-NEXT: ret i1 [[OV1]]
>> ;
>> %s = add i16 %x, -44
>> %ov = icmp ult i16 %x, 44
>> @@ -248,10 +253,11 @@ define i1 @usubo_ult_constant_op1_i16(i1
>>
>> define i1 @usubo_ugt_constant_op1_i8(i8 %x, i8* %p) {
>> ; CHECK-LABEL: @usubo_ugt_constant_op1_i8(
>> -; CHECK-NEXT: [[OV:%.*]] = icmp ugt i8 45, [[X:%.*]]
>> -; CHECK-NEXT: [[S:%.*]] = add i8 [[X]], -45
>> -; CHECK-NEXT: store i8 [[S]], i8* [[P:%.*]]
>> -; CHECK-NEXT: ret i1 [[OV]]
>> +; CHECK-NEXT: [[TMP1:%.*]] = call { i8, i1 } @llvm.usub.with.overflow.i8(i8 [[X:%.*]], i8 45)
>> +; CHECK-NEXT: [[MATH:%.*]] = extractvalue { i8, i1 } [[TMP1]], 0
>> +; CHECK-NEXT: [[OV1:%.*]] = extractvalue { i8, i1 } [[TMP1]], 1
>> +; CHECK-NEXT: store i8 [[MATH]], i8* [[P:%.*]]
>> +; CHECK-NEXT: ret i1 [[OV1]]
>> ;
>> %ov = icmp ugt i8 45, %x
>> %s = add i8 %x, -45
>> @@ -263,10 +269,11 @@ define i1 @usubo_ugt_constant_op1_i8(i8
>>
>> define i1 @usubo_eq_constant1_op1_i32(i32 %x, i32* %p) {
>> ; CHECK-LABEL: @usubo_eq_constant1_op1_i32(
>> -; CHECK-NEXT: [[S:%.*]] = add i32 [[X:%.*]], -1
>> -; CHECK-NEXT: [[OV:%.*]] = icmp eq i32 [[X]], 0
>> -; CHECK-NEXT: store i32 [[S]], i32* [[P:%.*]]
>> -; CHECK-NEXT: ret i1 [[OV]]
>> +; CHECK-NEXT: [[TMP1:%.*]] = call { i32, i1 } @llvm.usub.with.overflow.i32(i32 [[X:%.*]], i32 1)
>> +; CHECK-NEXT: [[MATH:%.*]] = extractvalue { i32, i1 } [[TMP1]], 0
>> +; CHECK-NEXT: [[OV1:%.*]] = extractvalue { i32, i1 } [[TMP1]], 1
>> +; CHECK-NEXT: store i32 [[MATH]], i32* [[P:%.*]]
>> +; CHECK-NEXT: ret i1 [[OV1]]
>> ;
>> %s = add i32 %x, -1
>> %ov = icmp eq i32 %x, 0
>> @@ -283,14 +290,15 @@ define i1 @usubo_ult_sub_dominates_i64(i
>> ; CHECK-NEXT: entry:
>> ; CHECK-NEXT: br i1 [[COND:%.*]], label [[T:%.*]], label [[F:%.*]]
>> ; CHECK: t:
>> -; CHECK-NEXT: [[S:%.*]] = sub i64 [[X:%.*]], [[Y:%.*]]
>> -; CHECK-NEXT: store i64 [[S]], i64* [[P:%.*]]
>> +; CHECK-NEXT: [[TMP0:%.*]] = call { i64, i1 } @llvm.usub.with.overflow.i64(i64 [[X:%.*]], i64 [[Y:%.*]])
>> +; CHECK-NEXT: [[MATH:%.*]] = extractvalue { i64, i1 } [[TMP0]], 0
>> +; CHECK-NEXT: [[OV1:%.*]] = extractvalue { i64, i1 } [[TMP0]], 1
>> +; CHECK-NEXT: store i64 [[MATH]], i64* [[P:%.*]]
>> ; CHECK-NEXT: br i1 [[COND]], label [[END:%.*]], label [[F]]
>> ; CHECK: f:
>> ; CHECK-NEXT: ret i1 [[COND]]
>> ; CHECK: end:
>> -; CHECK-NEXT: [[OV:%.*]] = icmp ult i64 [[X]], [[Y]]
>> -; CHECK-NEXT: ret i1 [[OV]]
>> +; CHECK-NEXT: ret i1 [[OV1]]
>> ;
>> entry:
>> br i1 %cond, label %t, label %f
>> @@ -319,10 +327,11 @@ define i1 @usubo_ult_cmp_dominates_i64(i
>> ; CHECK: f:
>> ; CHECK-NEXT: ret i1 [[COND]]
>> ; CHECK: end:
>> -; CHECK-NEXT: [[TMP0:%.*]] = icmp ult i64 [[X]], [[Y]]
>> -; CHECK-NEXT: [[S:%.*]] = sub i64 [[X]], [[Y]]
>> -; CHECK-NEXT: store i64 [[S]], i64* [[P:%.*]]
>> -; CHECK-NEXT: ret i1 [[TMP0]]
>> +; CHECK-NEXT: [[TMP0:%.*]] = call { i64, i1 } @llvm.usub.with.overflow.i64(i64 [[X]], i64 [[Y]])
>> +; CHECK-NEXT: [[MATH:%.*]] = extractvalue { i64, i1 } [[TMP0]], 0
>> +; CHECK-NEXT: [[OV1:%.*]] = extractvalue { i64, i1 } [[TMP0]], 1
>> +; CHECK-NEXT: store i64 [[MATH]], i64* [[P:%.*]]
>> +; CHECK-NEXT: ret i1 [[OV1]]
>> ;
>> entry:
>> br i1 %cond, label %t, label %f
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohnson at google.com