[llvm] r354298 - [CGP] form usub with overflow from sub+icmp

Wed Apr 24 09:06:45 PDT 2019

https://reviews.llvm.org/D61075

On Wed, Apr 24, 2019 at 9:23 AM Sanjay Patel <spatel at rotateright.com> wrote:

> As I'm rummaging around here in CGP, I see another possibility:
> We have a set of "RemovedInsts" populated by another transform that
> enables delayed deletion. I think we can have the overflow transforms use
> that. So we would RAUW, but not actually remove/delete the original
> cmp/add/sub when doing the overflow transforms. That would mean we don't
> need to mark the DT as modified in any way (we mark the DT as modified to
> prevent using a stale instruction iterator).
>
> I'll post a draft of that change, so you can try that with your motivating
> tests.
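The delayed-deletion idea quoted above can be pictured with a small standalone C++ sketch. This is a toy model, not CGP code: ToyInst and runWithDelayedDeletion are made-up names, and the RemovedInsts set only borrows the name mentioned in the thread. It assumes the goal is to replace uses during the walk and erase afterwards, so the walk never holds an iterator to an erased instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_set>

// Toy stand-in for an instruction; not an LLVM type.
struct ToyInst {
  std::string Name;
  bool Replaced; // models "all uses were replaced (RAUW)"
};

// Walk the block once, marking matching instructions as replaced and queuing
// them for later removal instead of erasing them while the walk is running.
// Returns the number of instructions erased after the walk.
inline std::size_t runWithDelayedDeletion(std::list<ToyInst> &Block) {
  std::unordered_set<ToyInst *> RemovedInsts; // hypothetical, named after the thread
  for (ToyInst &I : Block) {
    if (I.Name == "cmp" || I.Name == "sub") {
      I.Replaced = true;       // uses would now point at the new intrinsic
      RemovedInsts.insert(&I); // defer the actual erase
    }
  }
  // The walk is finished, so no iterator can be left pointing at a dead node.
  std::size_t Erased = 0;
  for (auto It = Block.begin(); It != Block.end();) {
    if (RemovedInsts.count(&*It)) {
      It = Block.erase(It);
      ++Erased;
    } else {
      ++It;
    }
  }
  assert(Erased == RemovedInsts.size());
  return Erased;
}
```

In this model nothing needs to be flagged as modified during the walk, which is the property the proposal is after.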
>
> On Wed, Apr 24, 2019 at 8:42 AM Sanjay Patel <spatel at rotateright.com>
> wrote:
>
>> Hi Yevgeny,
>>
>> I'm curious what size/mix of IR makes this a perf problem. That is, how
>> many overflow intrinsics and how many instructions/blocks are in the
>> problem function? If you can share an example file, that would be great.
>>
>> You're correct that the CFG is not modified, so DomTreeUpdater might be a
>> solution. But I had not seen DTU until just now, so if anyone with
>> knowledge of that class has suggestions about how to use it here, I'd be
>> grateful. :)
>>
>> On Tue, Apr 23, 2019 at 9:42 PM Yevgeny Rouban <yevgeny.rouban at azul.com>
>> wrote:
>>
>>> Hello Sanjay.
>>>
>>>
>>>
>>> We have all these changes integrated but still suffer from significant
>>> performance degradation.
>>>
>>> I’d like to mention that the NFCI change
>>> http://llvm.org/viewvc/llvm-project?view=revision&revision=354689 seems
>>> to introduce the complexity as it adds DomTree building not only to
>>> combineToUSubWithOverflow but also to combineToUAddWithOverflow().
>>>
>>>
>>>
>>> The methods combineToUSubWithOverflow() and combineToUAddWithOverflow()
>>> seem to be keeping CFG intact. So I believe they should not trigger DT
>>> rebuild. In other words, they use DT but do not change it.
>>>
>>> If I’m not right and DT needs to be rebuilt, is it possible to make use
>>> of DomTreeUpdater, which should save performance in these cases?
>>>
>>>
>>>
>>> Thanks.
>>>
>>> -Yevgeny Rouban
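The rebuild cost raised in the message above can be illustrated with a toy model. This is only a sketch under stated assumptions: LazyAnalysis and countBuilds are hypothetical names, the class is not LLVM's DominatorTree, and a single counter stands in for the cost of a whole-function rebuild. It assumes each transform queries the analysis once and, in the costly variant, marks it stale afterwards.

```cpp
#include <cassert>

// Toy model of a lazily built analysis; not LLVM's DominatorTree.
class LazyAnalysis {
  bool Valid = false;
  unsigned Builds = 0;

public:
  // Rebuild only when the cached result was invalidated.
  void query() {
    if (!Valid) {
      ++Builds; // stands in for an expensive whole-function rebuild
      Valid = true;
    }
  }
  void invalidate() { Valid = false; }
  unsigned builds() const { return Builds; }
};

// Run NumTransforms transforms that each query the analysis. When
// InvalidateAfterEach is true, every transform also marks the analysis stale.
inline unsigned countBuilds(unsigned NumTransforms, bool InvalidateAfterEach) {
  LazyAnalysis DT;
  for (unsigned I = 0; I != NumTransforms; ++I) {
    DT.query();
    if (InvalidateAfterEach)
      DT.invalidate();
  }
  assert(DT.builds() <= NumTransforms);
  return DT.builds();
}
```

With invalidation after every transform, the build count grows with the number of transforms; without it, one build serves all queries.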
>>>
>>>
>>>
>>> *From:* Sanjay Patel <spatel at rotateright.com>
>>> *Sent:* Wednesday, April 24, 2019 1:12 AM
>>> *To:* Philip Reames <listmail at philipreames.com>
>>> *Cc:* Teresa Johnson <tejohnson at google.com>; Guozhi Wei <
>>> carrot at google.com>; llvm-commits <llvm-commits at lists.llvm.org>; Yevgeny
>>> Rouban <yevgeny.rouban at azul.com>
>>> *Subject:* Re: [llvm] r354298 - [CGP] form usub with overflow from
>>> sub+icmp
>>>
>>>
>>>
>>> Philip,
>>>
>>> Thanks for letting me know. For reference, here are possibly relevant
>>> changes to CGP that came after this commit:
>>>
>>> https://reviews.llvm.org/D58995 (rL355512)
>>>
>>> https://reviews.llvm.org/D59139 (rL355751)
>>>
>>> https://reviews.llvm.org/D59696 (rL356937)
>>>
>>> https://reviews.llvm.org/D59889 (rL357111)
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Apr 23, 2019 at 11:13 AM Philip Reames <
>>> listmail at philipreames.com> wrote:
>>>
>>> Sanjay,
>>>
>>> We are also seeing fallout from this change.  We have a relatively
>>> widely felt compile time regression which appears to be triggered by this
>>> change.  The operating theory I've heard is that the use of dom tree is
>>> forcing many more rebuilds of a previously invalidated tree.  Yevgeny (CCd)
>>> can provide more information; he's worked around the problem in our
>>> downstream tree and can share his analysis.
>>>
>>> Philip
>>>
>>> On 3/13/19 9:54 PM, Teresa Johnson via llvm-commits wrote:
>>>
>>> Hi Sanjay,
>>>
>>>
>>>
>>> Unfortunately we are having some additional problems with this patch.
>>> One is a compiler assertion (which goes away after r355823 although since
>>> that patch just added a heuristic guard on the transformation it is likely
>>> just hidden). I filed https://bugs.llvm.org/show_bug.cgi?id=41064 for
>>> that one.
>>>
>>>
>>>
>>> The other is a performance slowdown. Carrot who is copied here can send
>>> you more info about that.
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Teresa
>>>
>>>
>>>
>>> On Mon, Feb 18, 2019 at 3:32 PM Sanjay Patel via llvm-commits <
>>> llvm-commits at lists.llvm.org> wrote:
>>>
>>> Author: spatel
>>> Date: Mon Feb 18 15:33:05 2019
>>> New Revision: 354298
>>>
>>> URL: http://llvm.org/viewvc/llvm-project?rev=354298&view=rev
>>> Log:
>>> [CGP] form usub with overflow from sub+icmp
>>>
>>> The motivating x86 cases for forming the intrinsic are shown in PR31754
>>> and PR40487:
>>> https://bugs.llvm.org/show_bug.cgi?id=31754
>>> https://bugs.llvm.org/show_bug.cgi?id=40487
>>> ..and those are shown in the IR test file and x86 codegen file.
>>>
>>> Matching the usubo pattern is harder than uaddo because we have 2
>>> independent values rather than a def-use.
>>>
>>> This adds a TLI hook that should preserve the existing behavior for
>>> uaddo formation, but disables usubo
>>> formation by default. Only x86 overrides that setting for now although
>>> other targets will likely benefit
>>> by forming usubo too.
>>>
>>> Differential Revision: https://reviews.llvm.org/D57789
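The source patterns that the commit folds into usubo can be checked against a reference definition with a short standalone C++ sketch. It is not the CGP matcher: USubResult, usubo, and patternsAgree are hypothetical names, and a 32-bit width is assumed purely for illustration. The reference computes the difference in 64 bits and treats a negative result as overflow.

```cpp
#include <cassert>
#include <cstdint>

struct USubResult {
  std::uint32_t Math; // wrapped 32-bit difference
  bool Overflow;      // true when the subtraction borrowed
};

// Reference semantics: subtract in 64 bits, then truncate and test the sign.
inline USubResult usubo(std::uint32_t A, std::uint32_t B) {
  const std::int64_t Wide =
      static_cast<std::int64_t>(A) - static_cast<std::int64_t>(B);
  USubResult R{static_cast<std::uint32_t>(Wide), Wide < 0};
  assert(R.Overflow == (A < B)); // a borrow happens exactly when A is below B
  return R;
}

// Returns true when every source pattern agrees with the reference.
inline bool patternsAgree(std::uint32_t A, std::uint32_t B) {
  const USubResult Ref = usubo(A, B);
  // sub A, B ; icmp ult A, B
  const bool SubUlt = (A - B == Ref.Math) && ((A < B) == Ref.Overflow);
  // icmp ugt B, A is the same test with the operands swapped
  const bool SubUgt = ((B > A) == Ref.Overflow);
  // add A, -B ; icmp ult A, B (canonical form of a subtract by a constant)
  const bool AddNeg = (A + (0u - B) == Ref.Math);
  // add A, -1 ; icmp eq A, 0 corresponds to usubo(A, 1)
  const USubResult One = usubo(A, 1);
  const bool EqZero =
      (A + 0xFFFFFFFFu == One.Math) && ((A == 0) == One.Overflow);
  return SubUlt && SubUgt && AddNeg && EqZero;
}
```

The sketch only covers value equivalence; the dominance and profitability checks in the patch are out of its scope.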
>>>
>>> Modified:
>>>     llvm/trunk/include/llvm/CodeGen/TargetLowering.h
>>>     llvm/trunk/lib/CodeGen/CodeGenPrepare.cpp
>>>     llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>>>     llvm/trunk/lib/Target/X86/X86ISelLowering.h
>>>     llvm/trunk/test/CodeGen/X86/cgp-usubo.ll
>>>     llvm/trunk/test/CodeGen/X86/lsr-loop-exit-cond.ll
>>>     llvm/trunk/test/Transforms/CodeGenPrepare/X86/overflow-intrinsics.ll
>>>
>>> Modified: llvm/trunk/include/llvm/CodeGen/TargetLowering.h
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/CodeGen/TargetLowering.h?rev=354298&r1=354297&r2=354298&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/include/llvm/CodeGen/TargetLowering.h (original)
>>> +++ llvm/trunk/include/llvm/CodeGen/TargetLowering.h Mon Feb 18 15:33:05 2019
>>> @@ -2439,6 +2439,23 @@ public:
>>>      return false;
>>>    }
>>>
>>> +  /// Try to convert math with an overflow comparison into the corresponding DAG
>>> +  /// node operation. Targets may want to override this independently of whether
>>> +  /// the operation is legal/custom for the given type because it may obscure
>>> +  /// matching of other patterns.
>>> +  virtual bool shouldFormOverflowOp(unsigned Opcode, EVT VT) const {
>>> +    // TODO: The default logic is inherited from code in CodeGenPrepare.
>>> +    // The opcode should not make a difference by default?
>>> +    if (Opcode != ISD::UADDO)
>>> +      return false;
>>> +
>>> +    // Allow the transform as long as we have an integer type that is not
>>> +    // obviously illegal and unsupported.
>>> +    if (VT.isVector())
>>> +      return false;
>>> +    return VT.isSimple() || !isOperationExpand(Opcode, VT);
>>> +  }
>>> +
>>>    // Return true if it is profitable to use a scalar input to a BUILD_VECTOR
>>>    // even if the vector itself has multiple uses.
>>>    virtual bool aggressivelyPreferBuildVectorSources(EVT VecVT) const {
>>>
>>> Modified: llvm/trunk/lib/CodeGen/CodeGenPrepare.cpp
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/CodeGenPrepare.cpp?rev=354298&r1=354297&r2=354298&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/lib/CodeGen/CodeGenPrepare.cpp (original)
>>> +++ llvm/trunk/lib/CodeGen/CodeGenPrepare.cpp Mon Feb 18 15:33:05 2019
>>> @@ -1162,9 +1162,18 @@ static bool OptimizeNoopCopyExpression(C
>>>  static void replaceMathCmpWithIntrinsic(BinaryOperator *BO, CmpInst *Cmp,
>>>                                          Instruction *InsertPt,
>>>                                          Intrinsic::ID IID) {
>>> +  Value *Arg0 = BO->getOperand(0);
>>> +  Value *Arg1 = BO->getOperand(1);
>>> +
>>> +  // We allow matching the canonical IR (add X, C) back to (usubo X, -C).
>>> +  if (BO->getOpcode() == Instruction::Add &&
>>> +      IID == Intrinsic::usub_with_overflow) {
>>> +    assert(isa<Constant>(Arg1) && "Unexpected input for usubo");
>>> +    Arg1 = ConstantExpr::getNeg(cast<Constant>(Arg1));
>>> +  }
>>> +
>>>    IRBuilder<> Builder(InsertPt);
>>> -  Value *MathOV = Builder.CreateBinaryIntrinsic(IID, BO->getOperand(0),
>>> -                                                BO->getOperand(1));
>>> +  Value *MathOV = Builder.CreateBinaryIntrinsic(IID, Arg0, Arg1);
>>>    Value *Math = Builder.CreateExtractValue(MathOV, 0, "math");
>>>    Value *OV = Builder.CreateExtractValue(MathOV, 1, "ov");
>>>    BO->replaceAllUsesWith(Math);
>>> @@ -1182,13 +1191,8 @@ static bool combineToUAddWithOverflow(Cm
>>>    if (!match(Cmp, m_UAddWithOverflow(m_Value(A), m_Value(B), m_BinOp(Add))))
>>>      return false;
>>>
>>> -  // Allow the transform as long as we have an integer type that is not
>>> -  // obviously illegal and unsupported.
>>> -  Type *Ty = Add->getType();
>>> -  if (!isa<IntegerType>(Ty))
>>> -    return false;
>>> -  EVT CodegenVT = TLI.getValueType(DL, Ty);
>>> -  if (!CodegenVT.isSimple() && TLI.isOperationExpand(ISD::UADDO, CodegenVT))
>>> +  if (!TLI.shouldFormOverflowOp(ISD::UADDO,
>>> +                                TLI.getValueType(DL, Add->getType())))
>>>      return false;
>>>
>>>    // We don't want to move around uses of condition values this late, so we
>>> @@ -1210,6 +1214,64 @@ static bool combineToUAddWithOverflow(Cm
>>>    return true;
>>>  }
>>>
>>> +static bool combineToUSubWithOverflow(CmpInst *Cmp, const TargetLowering &TLI,
>>> +                                      const DataLayout &DL, bool &ModifiedDT) {
>>> +  // Convert (A u> B) to (A u< B) to simplify pattern matching.
>>> +  Value *A = Cmp->getOperand(0), *B = Cmp->getOperand(1);
>>> +  ICmpInst::Predicate Pred = Cmp->getPredicate();
>>> +  if (Pred == ICmpInst::ICMP_UGT) {
>>> +    std::swap(A, B);
>>> +    Pred = ICmpInst::ICMP_ULT;
>>> +  }
>>> +  // Convert special-case: (A == 0) is the same as (A u< 1).
>>> +  if (Pred == ICmpInst::ICMP_EQ && match(B, m_ZeroInt())) {
>>> +    B = ConstantInt::get(B->getType(), 1);
>>> +    Pred = ICmpInst::ICMP_ULT;
>>> +  }
>>> +  if (Pred != ICmpInst::ICMP_ULT)
>>> +    return false;
>>> +
>>> +  // Walk the users of a variable operand of a compare looking for a subtract or
>>> +  // add with that same operand. Also match the 2nd operand of the compare to
>>> +  // the add/sub, but that may be a negated constant operand of an add.
>>> +  Value *CmpVariableOperand = isa<Constant>(A) ? B : A;
>>> +  BinaryOperator *Sub = nullptr;
>>> +  for (User *U : CmpVariableOperand->users()) {
>>> +    // A - B, A u< B --> usubo(A, B)
>>> +    if (match(U, m_Sub(m_Specific(A), m_Specific(B)))) {
>>> +      Sub = cast<BinaryOperator>(U);
>>> +      break;
>>> +    }
>>> +
>>> +    // A + (-C), A u< C (canonicalized form of (sub A, C))
>>> +    const APInt *CmpC, *AddC;
>>> +    if (match(U, m_Add(m_Specific(A), m_APInt(AddC))) &&
>>> +        match(B, m_APInt(CmpC)) && *AddC == -(*CmpC)) {
>>> +      Sub = cast<BinaryOperator>(U);
>>> +      break;
>>> +    }
>>> +  }
>>> +  if (!Sub)
>>> +    return false;
>>> +
>>> +  if (!TLI.shouldFormOverflowOp(ISD::USUBO,
>>> +                                TLI.getValueType(DL, Sub->getType())))
>>> +    return false;
>>> +
>>> +  // Pattern matched and profitability checked. Check dominance to determine the
>>> +  // insertion point for an intrinsic that replaces the subtract and compare.
>>> +  DominatorTree DT(*Sub->getFunction());
>>> +  bool SubDominates = DT.dominates(Sub, Cmp);
>>> +  if (!SubDominates && !DT.dominates(Cmp, Sub))
>>> +    return false;
>>> +  Instruction *InPt = SubDominates ? cast<Instruction>(Sub)
>>> +                                   : cast<Instruction>(Cmp);
>>> +  replaceMathCmpWithIntrinsic(Sub, Cmp, InPt, Intrinsic::usub_with_overflow);
>>> +  // Reset callers - do not crash by iterating over a dead instruction.
>>> +  ModifiedDT = true;
>>> +  return true;
>>> +}
>>> +
>>>  /// Sink the given CmpInst into user blocks to reduce the number of virtual
>>>  /// registers that must be created and coalesced. This is a clear win except on
>>>  /// targets with multiple condition code registers (PowerPC), where it might
>>> @@ -1276,14 +1338,17 @@ static bool sinkCmpExpression(CmpInst *C
>>>    return MadeChange;
>>>  }
>>>
>>> -static bool optimizeCmpExpression(CmpInst *Cmp, const TargetLowering &TLI,
>>> -                                  const DataLayout &DL) {
>>> +static bool optimizeCmp(CmpInst *Cmp, const TargetLowering &TLI,
>>> +                        const DataLayout &DL, bool &ModifiedDT) {
>>>    if (sinkCmpExpression(Cmp, TLI))
>>>      return true;
>>>
>>>    if (combineToUAddWithOverflow(Cmp, TLI, DL))
>>>      return true;
>>>
>>> +  if (combineToUSubWithOverflow(Cmp, TLI, DL, ModifiedDT))
>>> +    return true;
>>> +
>>>    return false;
>>>  }
>>>
>>> @@ -6770,8 +6835,8 @@ bool CodeGenPrepare::optimizeInst(Instru
>>>      return false;
>>>    }
>>>
>>> -  if (CmpInst *CI = dyn_cast<CmpInst>(I))
>>> -    if (TLI && optimizeCmpExpression(CI, *TLI, *DL))
>>> +  if (auto *Cmp = dyn_cast<CmpInst>(I))
>>> +    if (TLI && optimizeCmp(Cmp, *TLI, *DL, ModifiedDT))
>>>        return true;
>>>
>>>    if (LoadInst *LI = dyn_cast<LoadInst>(I)) {
>>>
>>> Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=354298&r1=354297&r2=354298&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
>>> +++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Mon Feb 18 15:33:05 2019
>>> @@ -4934,6 +4934,13 @@ bool X86TargetLowering::shouldScalarizeB
>>>    return isOperationLegalOrCustomOrPromote(VecOp.getOpcode(), ScalarVT);
>>>  }
>>>
>>> +bool X86TargetLowering::shouldFormOverflowOp(unsigned Opcode, EVT VT) const {
>>> +  // TODO: Allow vectors?
>>> +  if (VT.isVector())
>>> +    return false;
>>> +  return VT.isSimple() || !isOperationExpand(Opcode, VT);
>>> +}
>>> +
>>>  bool X86TargetLowering::isCheapToSpeculateCttz() const {
>>>    // Speculate cttz only if we can directly use TZCNT.
>>>    return Subtarget.hasBMI();
>>>
>>> Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.h
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.h?rev=354298&r1=354297&r2=354298&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/lib/Target/X86/X86ISelLowering.h (original)
>>> +++ llvm/trunk/lib/Target/X86/X86ISelLowering.h Mon Feb 18 15:33:05 2019
>>> @@ -1071,6 +1071,11 @@ namespace llvm {
>>>      /// supported.
>>>      bool shouldScalarizeBinop(SDValue) const override;
>>>
>>> +    /// Overflow nodes should get combined/lowered to optimal instructions
>>> +    /// (they should allow eliminating explicit compares by getting flags from
>>> +    /// math ops).
>>> +    bool shouldFormOverflowOp(unsigned Opcode, EVT VT) const override;
>>> +
>>>      bool storeOfVectorConstantIsCheap(EVT MemVT, unsigned NumElem,
>>>                                        unsigned AddrSpace) const override {
>>>        // If we can replace more than 2 scalar stores, there will be a reduction
>>>
>>> Modified: llvm/trunk/test/CodeGen/X86/cgp-usubo.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/cgp-usubo.ll?rev=354298&r1=354297&r2=354298&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/X86/cgp-usubo.ll (original)
>>> +++ llvm/trunk/test/CodeGen/X86/cgp-usubo.ll Mon Feb 18 15:33:05 2019
>>> @@ -7,8 +7,8 @@ define i1 @usubo_ult_i64(i64 %x, i64 %y,
>>>  ; CHECK-LABEL: usubo_ult_i64:
>>>  ; CHECK:       # %bb.0:
>>>  ; CHECK-NEXT:    subq %rsi, %rdi
>>> -; CHECK-NEXT:    movq %rdi, (%rdx)
>>>  ; CHECK-NEXT:    setb %al
>>> +; CHECK-NEXT:    movq %rdi, (%rdx)
>>>  ; CHECK-NEXT:    retq
>>>    %s = sub i64 %x, %y
>>>    store i64 %s, i64* %p
>>> @@ -21,9 +21,8 @@ define i1 @usubo_ult_i64(i64 %x, i64 %y,
>>>  define i1 @usubo_ugt_i32(i32 %x, i32 %y, i32* %p) nounwind {
>>>  ; CHECK-LABEL: usubo_ugt_i32:
>>>  ; CHECK:       # %bb.0:
>>> -; CHECK-NEXT:    cmpl %edi, %esi
>>> -; CHECK-NEXT:    seta %al
>>>  ; CHECK-NEXT:    subl %esi, %edi
>>> +; CHECK-NEXT:    setb %al
>>>  ; CHECK-NEXT:    movl %edi, (%rdx)
>>>  ; CHECK-NEXT:    retq
>>>    %ov = icmp ugt i32 %y, %x
>>> @@ -39,8 +38,7 @@ define i1 @usubo_ugt_constant_op0_i8(i8
>>>  ; CHECK:       # %bb.0:
>>>  ; CHECK-NEXT:    movb $42, %cl
>>>  ; CHECK-NEXT:    subb %dil, %cl
>>> -; CHECK-NEXT:    cmpb $42, %dil
>>> -; CHECK-NEXT:    seta %al
>>> +; CHECK-NEXT:    setb %al
>>>  ; CHECK-NEXT:    movb %cl, (%rsi)
>>>  ; CHECK-NEXT:    retq
>>>    %s = sub i8 42, %x
>>> @@ -54,10 +52,9 @@ define i1 @usubo_ugt_constant_op0_i8(i8
>>>  define i1 @usubo_ult_constant_op0_i16(i16 %x, i16* %p) nounwind {
>>>  ; CHECK-LABEL: usubo_ult_constant_op0_i16:
>>>  ; CHECK:       # %bb.0:
>>> -; CHECK-NEXT:    movl $43, %ecx
>>> -; CHECK-NEXT:    subl %edi, %ecx
>>> -; CHECK-NEXT:    cmpw $43, %di
>>> -; CHECK-NEXT:    seta %al
>>> +; CHECK-NEXT:    movw $43, %cx
>>> +; CHECK-NEXT:    subw %di, %cx
>>> +; CHECK-NEXT:    setb %al
>>>  ; CHECK-NEXT:    movw %cx, (%rsi)
>>>  ; CHECK-NEXT:    retq
>>>    %s = sub i16 43, %x
>>> @@ -71,11 +68,9 @@ define i1 @usubo_ult_constant_op0_i16(i1
>>>  define i1 @usubo_ult_constant_op1_i16(i16 %x, i16* %p) nounwind {
>>>  ; CHECK-LABEL: usubo_ult_constant_op1_i16:
>>>  ; CHECK:       # %bb.0:
>>> -; CHECK-NEXT:    movl %edi, %ecx
>>> -; CHECK-NEXT:    addl $-44, %ecx
>>> -; CHECK-NEXT:    cmpw $44, %di
>>> +; CHECK-NEXT:    subw $44, %di
>>>  ; CHECK-NEXT:    setb %al
>>> -; CHECK-NEXT:    movw %cx, (%rsi)
>>> +; CHECK-NEXT:    movw %di, (%rsi)
>>>  ; CHECK-NEXT:    retq
>>>    %s = add i16 %x, -44
>>>    %ov = icmp ult i16 %x, 44
>>> @@ -86,9 +81,8 @@ define i1 @usubo_ult_constant_op1_i16(i1
>>>  define i1 @usubo_ugt_constant_op1_i8(i8 %x, i8* %p) nounwind {
>>>  ; CHECK-LABEL: usubo_ugt_constant_op1_i8:
>>>  ; CHECK:       # %bb.0:
>>> -; CHECK-NEXT:    cmpb $45, %dil
>>> +; CHECK-NEXT:    subb $45, %dil
>>>  ; CHECK-NEXT:    setb %al
>>> -; CHECK-NEXT:    addb $-45, %dil
>>>  ; CHECK-NEXT:    movb %dil, (%rsi)
>>>  ; CHECK-NEXT:    retq
>>>    %ov = icmp ugt i8 45, %x
>>> @@ -102,11 +96,9 @@ define i1 @usubo_ugt_constant_op1_i8(i8
>>>  define i1 @usubo_eq_constant1_op1_i32(i32 %x, i32* %p) nounwind {
>>>  ; CHECK-LABEL: usubo_eq_constant1_op1_i32:
>>>  ; CHECK:       # %bb.0:
>>> -; CHECK-NEXT:    # kill: def $edi killed $edi def $rdi
>>> -; CHECK-NEXT:    leal -1(%rdi), %ecx
>>> -; CHECK-NEXT:    testl %edi, %edi
>>> -; CHECK-NEXT:    sete %al
>>> -; CHECK-NEXT:    movl %ecx, (%rsi)
>>> +; CHECK-NEXT:    subl $1, %edi
>>> +; CHECK-NEXT:    setb %al
>>> +; CHECK-NEXT:    movl %edi, (%rsi)
>>>  ; CHECK-NEXT:    retq
>>>    %s = add i32 %x, -1
>>>    %ov = icmp eq i32 %x, 0
>>> @@ -124,17 +116,14 @@ define i1 @usubo_ult_sub_dominates_i64(i
>>>  ; CHECK-NEXT:    testb $1, %cl
>>>  ; CHECK-NEXT:    je .LBB7_2
>>>  ; CHECK-NEXT:  # %bb.1: # %t
>>> -; CHECK-NEXT:    movq %rdi, %rax
>>> -; CHECK-NEXT:    subq %rsi, %rax
>>> -; CHECK-NEXT:    movq %rax, (%rdx)
>>> -; CHECK-NEXT:    testb $1, %cl
>>> -; CHECK-NEXT:    je .LBB7_2
>>> -; CHECK-NEXT:  # %bb.3: # %end
>>> -; CHECK-NEXT:    cmpq %rsi, %rdi
>>> +; CHECK-NEXT:    subq %rsi, %rdi
>>>  ; CHECK-NEXT:    setb %al
>>> -; CHECK-NEXT:    retq
>>> +; CHECK-NEXT:    movq %rdi, (%rdx)
>>> +; CHECK-NEXT:    testb $1, %cl
>>> +; CHECK-NEXT:    jne .LBB7_3
>>>  ; CHECK-NEXT:  .LBB7_2: # %f
>>>  ; CHECK-NEXT:    movl %ecx, %eax
>>> +; CHECK-NEXT:  .LBB7_3: # %end
>>>  ; CHECK-NEXT:    retq
>>>  entry:
>>>    br i1 %cond, label %t, label %f
>>>
>>> Modified: llvm/trunk/test/CodeGen/X86/lsr-loop-exit-cond.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/lsr-loop-exit-cond.ll?rev=354298&r1=354297&r2=354298&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/X86/lsr-loop-exit-cond.ll (original)
>>> +++ llvm/trunk/test/CodeGen/X86/lsr-loop-exit-cond.ll Mon Feb 18 15:33:05 2019
>>> @@ -16,11 +16,11 @@ define void @t(i8* nocapture %in, i8* no
>>>  ; GENERIC-NEXT:    movl (%rdx), %eax
>>>  ; GENERIC-NEXT:    movl 4(%rdx), %ebx
>>>  ; GENERIC-NEXT:    decl %ecx
>>> -; GENERIC-NEXT:    leaq 20(%rdx), %r14
>>> +; GENERIC-NEXT:    leaq 20(%rdx), %r11
>>>  ; GENERIC-NEXT:    movq _Te0@{{.*}}(%rip), %r9
>>>  ; GENERIC-NEXT:    movq _Te1@{{.*}}(%rip), %r8
>>>  ; GENERIC-NEXT:    movq _Te3@{{.*}}(%rip), %r10
>>> -; GENERIC-NEXT:    movq %rcx, %r11
>>> +; GENERIC-NEXT:    movq %rcx, %r14
>>>  ; GENERIC-NEXT:    jmp LBB0_1
>>>  ; GENERIC-NEXT:    .p2align 4, 0x90
>>>  ; GENERIC-NEXT:  LBB0_2: ## %bb1
>>> @@ -29,14 +29,13 @@ define void @t(i8* nocapture %in, i8* no
>>>  ; GENERIC-NEXT:    shrl $16, %ebx
>>>  ; GENERIC-NEXT:    movzbl %bl, %ebx
>>>  ; GENERIC-NEXT:    xorl (%r8,%rbx,4), %eax
>>> -; GENERIC-NEXT:    xorl -4(%r14), %eax
>>> +; GENERIC-NEXT:    xorl -4(%r11), %eax
>>>  ; GENERIC-NEXT:    shrl $24, %edi
>>>  ; GENERIC-NEXT:    movzbl %bpl, %ebx
>>>  ; GENERIC-NEXT:    movl (%r10,%rbx,4), %ebx
>>>  ; GENERIC-NEXT:    xorl (%r9,%rdi,4), %ebx
>>> -; GENERIC-NEXT:    xorl (%r14), %ebx
>>> -; GENERIC-NEXT:    decq %r11
>>> -; GENERIC-NEXT:    addq $16, %r14
>>> +; GENERIC-NEXT:    xorl (%r11), %ebx
>>> +; GENERIC-NEXT:    addq $16, %r11
>>>  ; GENERIC-NEXT:  LBB0_1: ## %bb
>>>  ; GENERIC-NEXT:    ## =>This Inner Loop Header: Depth=1
>>>  ; GENERIC-NEXT:    movzbl %al, %edi
>>> @@ -47,16 +46,16 @@ define void @t(i8* nocapture %in, i8* no
>>>  ; GENERIC-NEXT:    movzbl %bpl, %ebp
>>>  ; GENERIC-NEXT:    movl (%r8,%rbp,4), %ebp
>>>  ; GENERIC-NEXT:    xorl (%r9,%rax,4), %ebp
>>> -; GENERIC-NEXT:    xorl -12(%r14), %ebp
>>> +; GENERIC-NEXT:    xorl -12(%r11), %ebp
>>>  ; GENERIC-NEXT:    shrl $24, %ebx
>>>  ; GENERIC-NEXT:    movl (%r10,%rdi,4), %edi
>>>  ; GENERIC-NEXT:    xorl (%r9,%rbx,4), %edi
>>> -; GENERIC-NEXT:    xorl -8(%r14), %edi
>>> +; GENERIC-NEXT:    xorl -8(%r11), %edi
>>>  ; GENERIC-NEXT:    movl %ebp, %eax
>>>  ; GENERIC-NEXT:    shrl $24, %eax
>>>  ; GENERIC-NEXT:    movl (%r9,%rax,4), %eax
>>> -; GENERIC-NEXT:    testq %r11, %r11
>>> -; GENERIC-NEXT:    jne LBB0_2
>>> +; GENERIC-NEXT:    subq $1, %r14
>>> +; GENERIC-NEXT:    jae LBB0_2
>>>  ; GENERIC-NEXT:  ## %bb.3: ## %bb2
>>>  ; GENERIC-NEXT:    shlq $4, %rcx
>>>  ; GENERIC-NEXT:    andl $-16777216, %eax ## imm = 0xFF000000
>>> @@ -99,27 +98,26 @@ define void @t(i8* nocapture %in, i8* no
>>>  ; ATOM-NEXT:    ## kill: def $ecx killed $ecx def $rcx
>>>  ; ATOM-NEXT:    movl (%rdx), %r15d
>>>  ; ATOM-NEXT:    movl 4(%rdx), %eax
>>> -; ATOM-NEXT:    leaq 20(%rdx), %r14
>>> +; ATOM-NEXT:    leaq 20(%rdx), %r11
>>>  ; ATOM-NEXT:    movq _Te0@{{.*}}(%rip), %r9
>>>  ; ATOM-NEXT:    movq _Te1@{{.*}}(%rip), %r8
>>>  ; ATOM-NEXT:    movq _Te3@{{.*}}(%rip), %r10
>>>  ; ATOM-NEXT:    decl %ecx
>>> -; ATOM-NEXT:    movq %rcx, %r11
>>> +; ATOM-NEXT:    movq %rcx, %r14
>>>  ; ATOM-NEXT:    jmp LBB0_1
>>>  ; ATOM-NEXT:    .p2align 4, 0x90
>>>  ; ATOM-NEXT:  LBB0_2: ## %bb1
>>>  ; ATOM-NEXT:    ## in Loop: Header=BB0_1 Depth=1
>>>  ; ATOM-NEXT:    shrl $16, %eax
>>>  ; ATOM-NEXT:    shrl $24, %edi
>>> -; ATOM-NEXT:    decq %r11
>>> -; ATOM-NEXT:    movzbl %al, %ebp
>>> +; ATOM-NEXT:    movzbl %al, %eax
>>> +; ATOM-NEXT:    xorl (%r8,%rax,4), %r15d
>>>  ; ATOM-NEXT:    movzbl %bl, %eax
>>>  ; ATOM-NEXT:    movl (%r10,%rax,4), %eax
>>> -; ATOM-NEXT:    xorl (%r8,%rbp,4), %r15d
>>> +; ATOM-NEXT:    xorl -4(%r11), %r15d
>>>  ; ATOM-NEXT:    xorl (%r9,%rdi,4), %eax
>>> -; ATOM-NEXT:    xorl -4(%r14), %r15d
>>> -; ATOM-NEXT:    xorl (%r14), %eax
>>> -; ATOM-NEXT:    addq $16, %r14
>>> +; ATOM-NEXT:    xorl (%r11), %eax
>>> +; ATOM-NEXT:    addq $16, %r11
>>>  ; ATOM-NEXT:  LBB0_1: ## %bb
>>>  ; ATOM-NEXT:    ## =>This Inner Loop Header: Depth=1
>>>  ; ATOM-NEXT:    movl %eax, %edi
>>> @@ -132,15 +130,15 @@ define void @t(i8* nocapture %in, i8* no
>>>  ; ATOM-NEXT:    movzbl %r15b, %edi
>>>  ; ATOM-NEXT:    xorl (%r9,%rbp,4), %ebx
>>>  ; ATOM-NEXT:    movl (%r10,%rdi,4), %edi
>>> -; ATOM-NEXT:    xorl -12(%r14), %ebx
>>> +; ATOM-NEXT:    xorl -12(%r11), %ebx
>>>  ; ATOM-NEXT:    xorl (%r9,%rax,4), %edi
>>>  ; ATOM-NEXT:    movl %ebx, %eax
>>> -; ATOM-NEXT:    xorl -8(%r14), %edi
>>> +; ATOM-NEXT:    xorl -8(%r11), %edi
>>>  ; ATOM-NEXT:    shrl $24, %eax
>>>  ; ATOM-NEXT:    movl (%r9,%rax,4), %r15d
>>> -; ATOM-NEXT:    testq %r11, %r11
>>> +; ATOM-NEXT:    subq $1, %r14
>>>  ; ATOM-NEXT:    movl %edi, %eax
>>> -; ATOM-NEXT:    jne LBB0_2
>>> +; ATOM-NEXT:    jae LBB0_2
>>>  ; ATOM-NEXT:  ## %bb.3: ## %bb2
>>>  ; ATOM-NEXT:    shrl $16, %eax
>>>  ; ATOM-NEXT:    shrl $8, %edi
>>>
>>> Modified: llvm/trunk/test/Transforms/CodeGenPrepare/X86/overflow-intrinsics.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/CodeGenPrepare/X86/overflow-intrinsics.ll?rev=354298&r1=354297&r2=354298&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/Transforms/CodeGenPrepare/X86/overflow-intrinsics.ll (original)
>>> +++ llvm/trunk/test/Transforms/CodeGenPrepare/X86/overflow-intrinsics.ll Mon Feb 18 15:33:05 2019
>>> @@ -175,10 +175,11 @@ define i1 @uaddo_i42_increment_illegal_t
>>>
>>>  define i1 @usubo_ult_i64(i64 %x, i64 %y, i64* %p) {
>>>  ; CHECK-LABEL: @usubo_ult_i64(
>>> -; CHECK-NEXT:    [[S:%.*]] = sub i64 [[X:%.*]], [[Y:%.*]]
>>> -; CHECK-NEXT:    store i64 [[S]], i64* [[P:%.*]]
>>> -; CHECK-NEXT:    [[OV:%.*]] = icmp ult i64 [[X]], [[Y]]
>>> -; CHECK-NEXT:    ret i1 [[OV]]
>>> +; CHECK-NEXT:    [[TMP1:%.*]] = call { i64, i1 } @llvm.usub.with.overflow.i64(i64 [[X:%.*]], i64 [[Y:%.*]])
>>> +; CHECK-NEXT:    [[MATH:%.*]] = extractvalue { i64, i1 } [[TMP1]], 0
>>> +; CHECK-NEXT:    [[OV1:%.*]] = extractvalue { i64, i1 } [[TMP1]], 1
>>> +; CHECK-NEXT:    store i64 [[MATH]], i64* [[P:%.*]]
>>> +; CHECK-NEXT:    ret i1 [[OV1]]
>>>  ;
>>>    %s = sub i64 %x, %y
>>>    store i64 %s, i64* %p
>>> @@ -190,10 +191,11 @@ define i1 @usubo_ult_i64(i64 %x, i64 %y,
>>>
>>>  define i1 @usubo_ugt_i32(i32 %x, i32 %y, i32* %p) {
>>>  ; CHECK-LABEL: @usubo_ugt_i32(
>>> -; CHECK-NEXT:    [[OV:%.*]] = icmp ugt i32 [[Y:%.*]], [[X:%.*]]
>>> -; CHECK-NEXT:    [[S:%.*]] = sub i32 [[X]], [[Y]]
>>> -; CHECK-NEXT:    store i32 [[S]], i32* [[P:%.*]]
>>> -; CHECK-NEXT:    ret i1 [[OV]]
>>> +; CHECK-NEXT:    [[TMP1:%.*]] = call { i32, i1 } @llvm.usub.with.overflow.i32(i32 [[X:%.*]], i32 [[Y:%.*]])
>>> +; CHECK-NEXT:    [[MATH:%.*]] = extractvalue { i32, i1 } [[TMP1]], 0
>>> +; CHECK-NEXT:    [[OV1:%.*]] = extractvalue { i32, i1 } [[TMP1]], 1
>>> +; CHECK-NEXT:    store i32 [[MATH]], i32* [[P:%.*]]
>>> +; CHECK-NEXT:    ret i1 [[OV1]]
>>>  ;
>>>    %ov = icmp ugt i32 %y, %x
>>>    %s = sub i32 %x, %y
>>> @@ -205,10 +207,11 @@ define i1 @usubo_ugt_i32(i32 %x, i32 %y,
>>>
>>>  define i1 @usubo_ugt_constant_op0_i8(i8 %x, i8* %p) {
>>>  ; CHECK-LABEL: @usubo_ugt_constant_op0_i8(
>>> -; CHECK-NEXT:    [[S:%.*]] = sub i8 42, [[X:%.*]]
>>> -; CHECK-NEXT:    [[OV:%.*]] = icmp ugt i8 [[X]], 42
>>> -; CHECK-NEXT:    store i8 [[S]], i8* [[P:%.*]]
>>> -; CHECK-NEXT:    ret i1 [[OV]]
>>> +; CHECK-NEXT:    [[TMP1:%.*]] = call { i8, i1 } @llvm.usub.with.overflow.i8(i8 42, i8 [[X:%.*]])
>>> +; CHECK-NEXT:    [[MATH:%.*]] = extractvalue { i8, i1 } [[TMP1]], 0
>>> +; CHECK-NEXT:    [[OV1:%.*]] = extractvalue { i8, i1 } [[TMP1]], 1
>>> +; CHECK-NEXT:    store i8 [[MATH]], i8* [[P:%.*]]
>>> +; CHECK-NEXT:    ret i1 [[OV1]]
>>>  ;
>>>    %s = sub i8 42, %x
>>>    %ov = icmp ugt i8 %x, 42
>>> @@ -220,10 +223,11 @@ define i1 @usubo_ugt_constant_op0_i8(i8
>>>
>>>  define i1 @usubo_ult_constant_op0_i16(i16 %x, i16* %p) {
>>>  ; CHECK-LABEL: @usubo_ult_constant_op0_i16(
>>> -; CHECK-NEXT:    [[S:%.*]] = sub i16 43, [[X:%.*]]
>>> -; CHECK-NEXT:    [[OV:%.*]] = icmp ult i16 43, [[X]]
>>> -; CHECK-NEXT:    store i16 [[S]], i16* [[P:%.*]]
>>> -; CHECK-NEXT:    ret i1 [[OV]]
>>> +; CHECK-NEXT:    [[TMP1:%.*]] = call { i16, i1 } @llvm.usub.with.overflow.i16(i16 43, i16 [[X:%.*]])
>>> +; CHECK-NEXT:    [[MATH:%.*]] = extractvalue { i16, i1 } [[TMP1]], 0
>>> +; CHECK-NEXT:    [[OV1:%.*]] = extractvalue { i16, i1 } [[TMP1]], 1
>>> +; CHECK-NEXT:    store i16 [[MATH]], i16* [[P:%.*]]
>>> +; CHECK-NEXT:    ret i1 [[OV1]]
>>>  ;
>>>    %s = sub i16 43, %x
>>>    %ov = icmp ult i16 43, %x
>>> @@ -235,10 +239,11 @@ define i1 @usubo_ult_constant_op0_i16(i1
>>>
>>>  define i1 @usubo_ult_constant_op1_i16(i16 %x, i16* %p) {
>>>  ; CHECK-LABEL: @usubo_ult_constant_op1_i16(
>>> -; CHECK-NEXT:    [[S:%.*]] = add i16 [[X:%.*]], -44
>>> -; CHECK-NEXT:    [[OV:%.*]] = icmp ult i16 [[X]], 44
>>> -; CHECK-NEXT:    store i16 [[S]], i16* [[P:%.*]]
>>> -; CHECK-NEXT:    ret i1 [[OV]]
>>> +; CHECK-NEXT:    [[TMP1:%.*]] = call { i16, i1 } @llvm.usub.with.overflow.i16(i16 [[X:%.*]], i16 44)
>>> +; CHECK-NEXT:    [[MATH:%.*]] = extractvalue { i16, i1 } [[TMP1]], 0
>>> +; CHECK-NEXT:    [[OV1:%.*]] = extractvalue { i16, i1 } [[TMP1]], 1
>>> +; CHECK-NEXT:    store i16 [[MATH]], i16* [[P:%.*]]
>>> +; CHECK-NEXT:    ret i1 [[OV1]]
>>>  ;
>>>    %s = add i16 %x, -44
>>>    %ov = icmp ult i16 %x, 44
>>> @@ -248,10 +253,11 @@ define i1 @usubo_ult_constant_op1_i16(i1
>>>
>>>  define i1 @usubo_ugt_constant_op1_i8(i8 %x, i8* %p) {
>>>  ; CHECK-LABEL: @usubo_ugt_constant_op1_i8(
>>> -; CHECK-NEXT:    [[OV:%.*]] = icmp ugt i8 45, [[X:%.*]]
>>> -; CHECK-NEXT:    [[S:%.*]] = add i8 [[X]], -45
>>> -; CHECK-NEXT:    store i8 [[S]], i8* [[P:%.*]]
>>> -; CHECK-NEXT:    ret i1 [[OV]]
>>> +; CHECK-NEXT:    [[TMP1:%.*]] = call { i8, i1 } @llvm.usub.with.overflow.i8(i8 [[X:%.*]], i8 45)
>>> +; CHECK-NEXT:    [[MATH:%.*]] = extractvalue { i8, i1 } [[TMP1]], 0
>>> +; CHECK-NEXT:    [[OV1:%.*]] = extractvalue { i8, i1 } [[TMP1]], 1
>>> +; CHECK-NEXT:    store i8 [[MATH]], i8* [[P:%.*]]
>>> +; CHECK-NEXT:    ret i1 [[OV1]]
>>>  ;
>>>    %ov = icmp ugt i8 45, %x
>>>    %s = add i8 %x, -45
>>> @@ -263,10 +269,11 @@ define i1 @usubo_ugt_constant_op1_i8(i8
>>>
>>>  define i1 @usubo_eq_constant1_op1_i32(i32 %x, i32* %p) {
>>>  ; CHECK-LABEL: @usubo_eq_constant1_op1_i32(
>>> -; CHECK-NEXT:    [[S:%.*]] = add i32 [[X:%.*]], -1
>>> -; CHECK-NEXT:    [[OV:%.*]] = icmp eq i32 [[X]], 0
>>> -; CHECK-NEXT:    store i32 [[S]], i32* [[P:%.*]]
>>> -; CHECK-NEXT:    ret i1 [[OV]]
>>> +; CHECK-NEXT:    [[TMP1:%.*]] = call { i32, i1 } @llvm.usub.with.overflow.i32(i32 [[X:%.*]], i32 1)
>>> +; CHECK-NEXT:    [[MATH:%.*]] = extractvalue { i32, i1 } [[TMP1]], 0
>>> +; CHECK-NEXT:    [[OV1:%.*]] = extractvalue { i32, i1 } [[TMP1]], 1
>>> +; CHECK-NEXT:    store i32 [[MATH]], i32* [[P:%.*]]
>>> +; CHECK-NEXT:    ret i1 [[OV1]]
>>>  ;
>>>    %s = add i32 %x, -1
>>>    %ov = icmp eq i32 %x, 0
>>> @@ -283,14 +290,15 @@ define i1 @usubo_ult_sub_dominates_i64(i
>>>  ; CHECK-NEXT:  entry:
>>>  ; CHECK-NEXT:    br i1 [[COND:%.*]], label [[T:%.*]], label [[F:%.*]]
>>>  ; CHECK:       t:
>>> -; CHECK-NEXT:    [[S:%.*]] = sub i64 [[X:%.*]], [[Y:%.*]]
>>> -; CHECK-NEXT:    store i64 [[S]], i64* [[P:%.*]]
>>> +; CHECK-NEXT:    [[TMP0:%.*]] = call { i64, i1 } @llvm.usub.with.overflow.i64(i64 [[X:%.*]], i64 [[Y:%.*]])
>>> +; CHECK-NEXT:    [[MATH:%.*]] = extractvalue { i64, i1 } [[TMP0]], 0
>>> +; CHECK-NEXT:    [[OV1:%.*]] = extractvalue { i64, i1 } [[TMP0]], 1
>>> +; CHECK-NEXT:    store i64 [[MATH]], i64* [[P:%.*]]
>>>  ; CHECK-NEXT:    br i1 [[COND]], label [[END:%.*]], label [[F]]
>>>  ; CHECK:       f:
>>>  ; CHECK-NEXT:    ret i1 [[COND]]
>>>  ; CHECK:       end:
>>> -; CHECK-NEXT:    [[OV:%.*]] = icmp ult i64 [[X]], [[Y]]
>>> -; CHECK-NEXT:    ret i1 [[OV]]
>>> +; CHECK-NEXT:    ret i1 [[OV1]]
>>>  ;
>>>  entry:
>>>    br i1 %cond, label %t, label %f
>>> @@ -319,10 +327,11 @@ define i1 @usubo_ult_cmp_dominates_i64(i
>>>  ; CHECK:       f:
>>>  ; CHECK-NEXT:    ret i1 [[COND]]
>>>  ; CHECK:       end:
>>> -; CHECK-NEXT:    [[TMP0:%.*]] = icmp ult i64 [[X]], [[Y]]
>>> -; CHECK-NEXT:    [[S:%.*]] = sub i64 [[X]], [[Y]]
>>> -; CHECK-NEXT:    store i64 [[S]], i64* [[P:%.*]]
>>> -; CHECK-NEXT:    ret i1 [[TMP0]]
>>> +; CHECK-NEXT:    [[TMP0:%.*]] = call { i64, i1 } @llvm.usub.with.overflow.i64(i64 [[X]], i64 [[Y]])
>>> +; CHECK-NEXT:    [[MATH:%.*]] = extractvalue { i64, i1 } [[TMP0]], 0
>>> +; CHECK-NEXT:    [[OV1:%.*]] = extractvalue { i64, i1 } [[TMP0]], 1
>>> +; CHECK-NEXT:    store i64 [[MATH]], i64* [[P:%.*]]
>>> +; CHECK-NEXT:    ret i1 [[OV1]]
>>>  ;
>>>  entry:
>>>    br i1 %cond, label %t, label %f
>>>
>>>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at lists.llvm.org
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Teresa Johnson | Software Engineer | tejohnson at google.com
>>>
>>>
>>>
>>>
>>>