[llvm] r292621 - [RegisterCoalescing] Recommit the patch "Remove partial redundent copy".

Wei Mi via llvm-commits llvm-commits at lists.llvm.org
Fri Jan 27 13:23:47 PST 2017


It is great that the cause was found! Yes, I didn't consider subreg
liveness because I have no experience with targets that use it, but I did
notice that other similar functions in the register coalescing pass, like
adjustCopiesBackFrom and removeCopyByCommutingDef, have code to maintain
subreg liveness. I need to watch Matthias's talk from the LLVM dev meeting
and study this part carefully.
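
For reference, a rough sketch of what a subregister-aware version of the
live range update in removePartialRedundancy might look like (purely
illustrative, reusing the pruneValue/extendToIndices pattern and the names
from the patch quoted below; it is not the actual fix):

    // Hypothetical sketch: mirror the main-range update on each subrange
    // of IntB when the target tracks subregister liveness.
    if (MRI->shouldTrackSubRegLiveness(IntB.reg)) {
      for (LiveInterval::SubRange &S : IntB.subranges()) {
        SmallVector<SlotIndex, 8> SubEndPoints;
        // Prune the value defined by the copy being removed from this
        // subrange, remembering where it used to be live.
        LIS->pruneValue(S, CopyIdx.getRegSlot(), &SubEndPoints);
        // Re-extend the subrange to the points that were live before.
        LIS->extendToIndices(S, SubEndPoints);
      }
    }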

Thank you and Matthias very much for helping with the problem!

Regards,
Wei.



On Fri, Jan 27, 2017 at 12:00 PM, Quentin Colombet <qcolombet at apple.com> wrote:
> Hi Wei,
>
> I found the bug, and Matthias helped me fix it.
>
> The bug occurs on targets that use subregister liveness. The code in this patch that updates the liveness does not take subregisters into account.
> I’m checking whether I can expose the problem on an in-tree target that uses subreg liveness (AMDGPU, Hexagon); otherwise I’ll have to push the fix without a test case.
>
> I may just push the fix now and try to come up with a test case later, so we get the performance improvement back :).
>
> Thanks for your patience.
>
> Cheers,
> -Quentin
>> On Jan 26, 2017, at 3:22 PM, Wei Mi <wmi at google.com> wrote:
>>
>> Thanks for helping with it! Maybe we can add some instrumentation and
>> dump some logs; then I can manually check whether anything is wrong with
>> the live range update after each removePartialRedundancy transformation.
>> Do you think that is doable? If it is OK, I will prepare the
>> instrumentation patch.
>>
>> Thanks,
>> Wei.
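
A minimal sketch of what such an instrumentation patch might look like
(purely illustrative, wrapping the call added in joinCopy and using the
names from the r292621 patch quoted below; the dump would show up under
-debug-only=regalloc):

    if (removePartialRedundancy(CP, *CopyMI)) {
      // Dump the coalesced intervals right after the transformation so the
      // live range update can be checked by hand in the debug output.
      DEBUG(dbgs() << "After removePartialRedundancy:\n"
                   << "\tIntA: " << LIS->getInterval(CP.getSrcReg()) << '\n'
                   << "\tIntB: " << LIS->getInterval(CP.getDstReg()) << '\n');
      return true;
    }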
>>
>> On Thu, Jan 26, 2017 at 2:25 PM, Quentin Colombet <qcolombet at apple.com> wrote:
>>> Thanks Wei.
>>>
>>> I’ve reduced the test case from 3k lines of IR to ~150, but I am still not sure what is going on, i.e., I still don’t have a reproducer for an in-tree target.
>>>
>>> I’ll keep you posted.
>>>
>>> Q.
>>>> On Jan 24, 2017, at 2:14 PM, Wei Mi <wmi at google.com> wrote:
>>>>
>>>> Quentin, sorry for breaking the bot. I will revert the patch and wait for
>>>> you to get a test case for the problem.
>>>>
>>>> On Tue, Jan 24, 2017 at 2:07 PM, Quentin Colombet <qcolombet at apple.com> wrote:
>>>>> Hi Wei,
>>>>>
>>>>> Because of that patch, I am seeing machine verifier failures on some of
>>>>> our bots, related to the update of the live ranges.
>>>>> I can’t share a test case for now, but the errors I am seeing are of the
>>>>> kind:
>>>>> 1. *** Bad machine code: No live segment at def ***
>>>>> 2. *** Bad machine code: No instruction at VNInfo def index ***
>>>>> 3. *** Bad machine code: Live segment doesn't end at a valid instruction ***
>>>>>
>>>>>
>>>>> For #1, it seems we don’t properly extend the live range of the
>>>>> definition of the copy being moved.
>>>>> For #2 and #3, I am guessing we are still using the slot index of the
>>>>> instruction being removed.
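
A rough sketch of the bookkeeping those errors usually point at (again only
an illustration using the names from the patch quoted below, not the actual
fix):

    // #1: the moved copy reads IntA at its new position, so IntA has to be
    // extended to be live at the new copy's slot.
    SlotIndex NewCopyIdx =
        LIS->InsertMachineInstrInMaps(*NewCopyMI).getRegSlot();
    LIS->extendToIndices(IntA, NewCopyIdx);

    // #2/#3: take the old copy out of the slot index maps before erasing
    // it, so no VNInfo def or live segment end keeps pointing at its stale
    // slot.
    LIS->RemoveMachineInstrFromMaps(CopyMI);
    CopyMI.eraseFromParent();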
>>>>>
>>>>> I’ll see how long it takes for me to reduce/fix the problem, but I might
>>>>> have to revert this patch.
>>>>>
>>>>> Cheers,
>>>>> -Quentin
>>>>>
>>>>>
>>>>> On Jan 20, 2017, at 9:38 AM, Wei Mi via llvm-commits
>>>>> <llvm-commits at lists.llvm.org> wrote:
>>>>>
>>>>> Author: wmi
>>>>> Date: Fri Jan 20 11:38:54 2017
>>>>> New Revision: 292621
>>>>>
>>>>> URL: http://llvm.org/viewvc/llvm-project?rev=292621&view=rev
>>>>> Log:
>>>>> [RegisterCoalescing] Recommit the patch "Remove partial redundent copy".
>>>>>
>>>>> The recommit fixes a bug related to the live interval update after the
>>>>> partially redundant copy is moved.
>>>>>
>>>>> The original patch solves the performance problem described in PR27827.
>>>>> Register coalescing sometimes cannot remove a copy because of interference.
>>>>> But if we can find a reverse copy in one of the predecessor blocks of the
>>>>> copy, the copy is partially redundant, and we may partially remove it by
>>>>> moving it to the predecessor block that does not contain the reverse copy.
>>>>>
>>>>> Differential Revision: https://reviews.llvm.org/D28585
>>>>>
>>>>> Added:
>>>>>  llvm/trunk/test/CodeGen/X86/pre-coalesce-2.ll
>>>>>  llvm/trunk/test/CodeGen/X86/pre-coalesce.ll
>>>>>  llvm/trunk/test/CodeGen/X86/pre-coalesce.mir
>>>>> Modified:
>>>>>  llvm/trunk/lib/CodeGen/RegisterCoalescer.cpp
>>>>>
>>>>> Modified: llvm/trunk/lib/CodeGen/RegisterCoalescer.cpp
>>>>> URL:
>>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/RegisterCoalescer.cpp?rev=292621&r1=292620&r2=292621&view=diff
>>>>> ==============================================================================
>>>>> --- llvm/trunk/lib/CodeGen/RegisterCoalescer.cpp (original)
>>>>> +++ llvm/trunk/lib/CodeGen/RegisterCoalescer.cpp Fri Jan 20 11:38:54 2017
>>>>> @@ -22,6 +22,7 @@
>>>>> #include "llvm/CodeGen/LiveRangeEdit.h"
>>>>> #include "llvm/CodeGen/MachineFrameInfo.h"
>>>>> #include "llvm/CodeGen/MachineInstr.h"
>>>>> +#include "llvm/CodeGen/MachineInstrBuilder.h"
>>>>> #include "llvm/CodeGen/MachineLoopInfo.h"
>>>>> #include "llvm/CodeGen/MachineRegisterInfo.h"
>>>>> #include "llvm/CodeGen/Passes.h"
>>>>> @@ -189,6 +190,9 @@ namespace {
>>>>>   /// This returns true if an interval was modified.
>>>>>   bool removeCopyByCommutingDef(const CoalescerPair &CP,MachineInstr *CopyMI);
>>>>>
>>>>> +    /// We found a copy which can be moved to its less frequent predecessor.
>>>>> +    bool removePartialRedundancy(const CoalescerPair &CP, MachineInstr &CopyMI);
>>>>> +
>>>>>   /// If the source of a copy is defined by a
>>>>>   /// trivial computation, replace the copy by rematerialize the definition.
>>>>>   bool reMaterializeTrivialDef(const CoalescerPair &CP, MachineInstr *CopyMI,
>>>>> @@ -861,6 +865,167 @@ bool RegisterCoalescer::removeCopyByComm
>>>>> return true;
>>>>> }
>>>>>
>>>>> +/// For copy B = A in BB2, if A is defined by A = B in BB0 which is a
>>>>> +/// predecessor of BB2, and if B is not redefined on the way from A = B
>>>>> +/// in BB0 to B = A in BB2, B = A in BB2 is partially redundant if the
>>>>> +/// execution goes through the path from BB0 to BB2. We may move B = A
>>>>> +/// to the predecessor without such reversed copy.
>>>>> +/// So we will transform the program from:
>>>>> +///   BB0:
>>>>> +///      A = B;    BB1:
>>>>> +///       ...         ...
>>>>> +///     /     \      /
>>>>> +///             BB2:
>>>>> +///               ...
>>>>> +///               B = A;
>>>>> +///
>>>>> +/// to:
>>>>> +///
>>>>> +///   BB0:         BB1:
>>>>> +///      A = B;        ...
>>>>> +///       ...          B = A;
>>>>> +///     /     \       /
>>>>> +///             BB2:
>>>>> +///               ...
>>>>> +///
>>>>> +/// A special case is when BB0 and BB2 are the same BB which is the only
>>>>> +/// BB in a loop:
>>>>> +///   BB1:
>>>>> +///        ...
>>>>> +///   BB0/BB2:  ----
>>>>> +///        B = A;   |
>>>>> +///        ...      |
>>>>> +///        A = B;   |
>>>>> +///          |-------
>>>>> +///          |
>>>>> +/// We may hoist B = A from BB0/BB2 to BB1.
>>>>> +///
>>>>> +/// The major preconditions for correctness to remove such partial
>>>>> +/// redundancy include:
>>>>> +/// 1. A in B = A in BB2 is defined by a PHI in BB2, and one operand of
>>>>> +///    the PHI is defined by the reversed copy A = B in BB0.
>>>>> +/// 2. No B is referenced from the start of BB2 to B = A.
>>>>> +/// 3. No B is defined from A = B to the end of BB0.
>>>>> +/// 4. BB1 has only one successor.
>>>>> +///
>>>>> +/// 2 and 4 implicitly ensure B is not live at the end of BB1.
>>>>> +/// 4 guarantees BB2 is hotter than BB1, so we can only move a copy to a
>>>>> +/// colder place, which not only prevent endless loop, but also make sure
>>>>> +/// the movement of copy is beneficial.
>>>>> +bool RegisterCoalescer::removePartialRedundancy(const CoalescerPair &CP,
>>>>> +                                                MachineInstr &CopyMI) {
>>>>> +  assert(!CP.isPhys());
>>>>> +  if (!CopyMI.isFullCopy())
>>>>> +    return false;
>>>>> +
>>>>> +  MachineBasicBlock &MBB = *CopyMI.getParent();
>>>>> +  if (MBB.isEHPad())
>>>>> +    return false;
>>>>> +
>>>>> +  if (MBB.pred_size() != 2)
>>>>> +    return false;
>>>>> +
>>>>> +  LiveInterval &IntA =
>>>>> +      LIS->getInterval(CP.isFlipped() ? CP.getDstReg() : CP.getSrcReg());
>>>>> +  LiveInterval &IntB =
>>>>> +      LIS->getInterval(CP.isFlipped() ? CP.getSrcReg() : CP.getDstReg());
>>>>> +
>>>>> +  // A is defined by PHI at the entry of MBB.
>>>>> +  SlotIndex CopyIdx = LIS->getInstructionIndex(CopyMI).getRegSlot(true);
>>>>> +  VNInfo *AValNo = IntA.getVNInfoAt(CopyIdx);
>>>>> +  assert(AValNo && !AValNo->isUnused() && "COPY source not live");
>>>>> +  if (!AValNo->isPHIDef())
>>>>> +    return false;
>>>>> +
>>>>> +  // No B is referenced before CopyMI in MBB.
>>>>> +  if (IntB.overlaps(LIS->getMBBStartIdx(&MBB), CopyIdx))
>>>>> +    return false;
>>>>> +
>>>>> +  // MBB has two predecessors: one contains A = B so no copy will be inserted
>>>>> +  // for it. The other one will have a copy moved from MBB.
>>>>> +  bool FoundReverseCopy = false;
>>>>> +  MachineBasicBlock *CopyLeftBB = nullptr;
>>>>> +  for (MachineBasicBlock *Pred : MBB.predecessors()) {
>>>>> +    VNInfo *PVal = IntA.getVNInfoBefore(LIS->getMBBEndIdx(Pred));
>>>>> +    MachineInstr *DefMI = LIS->getInstructionFromIndex(PVal->def);
>>>>> +    if (!DefMI || !DefMI->isFullCopy()) {
>>>>> +      CopyLeftBB = Pred;
>>>>> +      continue;
>>>>> +    }
>>>>> +    // Check DefMI is a reverse copy and it is in BB Pred.
>>>>> +    if (DefMI->getOperand(0).getReg() != IntA.reg ||
>>>>> +        DefMI->getOperand(1).getReg() != IntB.reg ||
>>>>> +        DefMI->getParent() != Pred) {
>>>>> +      CopyLeftBB = Pred;
>>>>> +      continue;
>>>>> +    }
>>>>> +    // If there is any other def of B after DefMI and before the end of Pred,
>>>>> +    // we need to keep the copy of B = A at the end of Pred if we remove
>>>>> +    // B = A from MBB.
>>>>> +    bool ValB_Changed = false;
>>>>> +    for (auto VNI : IntB.valnos) {
>>>>> +      if (VNI->isUnused())
>>>>> +        continue;
>>>>> +      if (PVal->def < VNI->def && VNI->def < LIS->getMBBEndIdx(Pred)) {
>>>>> +        ValB_Changed = true;
>>>>> +        break;
>>>>> +      }
>>>>> +    }
>>>>> +    if (ValB_Changed) {
>>>>> +      CopyLeftBB = Pred;
>>>>> +      continue;
>>>>> +    }
>>>>> +    FoundReverseCopy = true;
>>>>> +  }
>>>>> +
>>>>> +  // If no reverse copy is found in predecessors, nothing to do.
>>>>> +  if (!FoundReverseCopy)
>>>>> +    return false;
>>>>> +
>>>>> +  // If CopyLeftBB is nullptr, it means every predecessor of MBB contains
>>>>> +  // reverse copy, CopyMI can be removed trivially if only IntA/IntB is updated.
>>>>> +  // If CopyLeftBB is not nullptr, move CopyMI from MBB to CopyLeftBB and
>>>>> +  // update IntA/IntB.
>>>>> +  //
>>>>> +  // If CopyLeftBB is not nullptr, ensure CopyLeftBB has a single succ so
>>>>> +  // MBB is hotter than CopyLeftBB.
>>>>> +  if (CopyLeftBB && CopyLeftBB->succ_size() > 1)
>>>>> +    return false;
>>>>> +
>>>>> +  // Now ok to move copy.
>>>>> +  if (CopyLeftBB) {
>>>>> +    DEBUG(dbgs() << "\tremovePartialRedundancy: Move the copy to BB#"
>>>>> +                 << CopyLeftBB->getNumber() << '\t' << CopyMI);
>>>>> +
>>>>> +    // Insert new copy to CopyLeftBB.
>>>>> +    auto InsPos = CopyLeftBB->getFirstTerminator();
>>>>> +    MachineInstr *NewCopyMI = BuildMI(*CopyLeftBB, InsPos, CopyMI.getDebugLoc(),
>>>>> +                                      TII->get(TargetOpcode::COPY), IntB.reg)
>>>>> +                                  .addReg(IntA.reg);
>>>>> +    SlotIndex NewCopyIdx =
>>>>> +        LIS->InsertMachineInstrInMaps(*NewCopyMI).getRegSlot();
>>>>> +    VNInfo *VNI = IntB.getNextValue(NewCopyIdx, LIS->getVNInfoAllocator());
>>>>> +    IntB.createDeadDef(VNI);
>>>>> +  } else {
>>>>> +    DEBUG(dbgs() << "\tremovePartialRedundancy: Remove the copy from BB#"
>>>>> +                 << MBB.getNumber() << '\t' << CopyMI);
>>>>> +  }
>>>>> +
>>>>> +  // Remove CopyMI.
>>>>> +  SmallVector<SlotIndex, 8> EndPoints;
>>>>> +  VNInfo *BValNo = IntB.Query(CopyIdx.getRegSlot()).valueOutOrDead();
>>>>> +  LIS->pruneValue(IntB, CopyIdx.getRegSlot(), &EndPoints);
>>>>> +  BValNo->markUnused();
>>>>> +  LIS->RemoveMachineInstrFromMaps(CopyMI);
>>>>> +  CopyMI.eraseFromParent();
>>>>> +
>>>>> +  // Extend IntB to the EndPoints of its original live interval.
>>>>> +  LIS->extendToIndices(IntB, EndPoints);
>>>>> +
>>>>> +  shrinkToUses(&IntA);
>>>>> +  return true;
>>>>> +}
>>>>> +
>>>>> /// Returns true if @p MI defines the full vreg @p Reg, as opposed to just
>>>>> /// defining a subregister.
>>>>> static bool definesFullReg(const MachineInstr &MI, unsigned Reg) {
>>>>> @@ -1486,6 +1651,12 @@ bool RegisterCoalescer::joinCopy(Machine
>>>>>     }
>>>>>   }
>>>>>
>>>>> +    // Try and see if we can partially eliminate the copy by moving the copy to
>>>>> +    // its predecessor.
>>>>> +    if (!CP.isPartial() && !CP.isPhys())
>>>>> +      if (removePartialRedundancy(CP, *CopyMI))
>>>>> +        return true;
>>>>> +
>>>>>   // Otherwise, we are unable to join the intervals.
>>>>>   DEBUG(dbgs() << "\tInterference!\n");
>>>>>   Again = true;  // May be possible to coalesce later.
>>>>>
>>>>> Added: llvm/trunk/test/CodeGen/X86/pre-coalesce-2.ll
>>>>> URL:
>>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/pre-coalesce-2.ll?rev=292621&view=auto
>>>>> ==============================================================================
>>>>> --- llvm/trunk/test/CodeGen/X86/pre-coalesce-2.ll (added)
>>>>> +++ llvm/trunk/test/CodeGen/X86/pre-coalesce-2.ll Fri Jan 20 11:38:54 2017
>>>>> @@ -0,0 +1,281 @@
>>>>> +; RUN: llc -regalloc=greedy -verify-coalescing -mtriple=x86_64-unknown-linux-gnu < %s
>>>>> +; Check the live range is updated properly after register coalescing.
>>>>> +
>>>>> +target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
>>>>> +
>>>>> + at .str = internal unnamed_addr constant { [17 x i8], [47 x i8] } { [17 x i8]
>>>>> c"0123456789ABCDEF\00", [47 x i8] zeroinitializer }, align 32
>>>>> + at b = common local_unnamed_addr global i32 0, align 4
>>>>> + at a = common local_unnamed_addr global i32* null, align 8
>>>>> + at __sancov_gen_cov = private global [9 x i32] zeroinitializer
>>>>> +
>>>>> +; Function Attrs: nounwind sanitize_address
>>>>> +define void @fn2(i8* %p1) local_unnamed_addr #0 {
>>>>> +entry:
>>>>> +  %0 = load atomic i32, i32* inttoptr (i64 add (i64 ptrtoint ([9 x i32]*
>>>>> @__sancov_gen_cov to i64), i64 4) to i32*) monotonic, align 4
>>>>> +  %1 = icmp sge i32 0, %0
>>>>> +  br i1 %1, label %2, label %3
>>>>> +
>>>>> +; <label>:2:                                      ; preds = %entry
>>>>> +  call void @__sanitizer_cov(i32* inttoptr (i64 add (i64 ptrtoint ([9 x
>>>>> i32]* @__sancov_gen_cov to i64), i64 4) to i32*))
>>>>> +  call void asm sideeffect "", ""()
>>>>> +  br label %3
>>>>> +
>>>>> +; <label>:3:                                      ; preds = %entry, %2
>>>>> +  br label %while.cond.outer
>>>>> +
>>>>> +while.cond.outer:                                 ; preds = %75, %3
>>>>> +  %e.0.ph = phi i8* [ %e.058, %75 ], [ undef, %3 ]
>>>>> +  %c.0.ph = phi i32* [ %c.059, %75 ], [ undef, %3 ]
>>>>> +  %p1.addr.0.ph = phi i8* [ %incdec.ptr60, %75 ], [ %p1, %3 ]
>>>>> +  %4 = ptrtoint i8* %p1.addr.0.ph to i64
>>>>> +  %5 = lshr i64 %4, 3
>>>>> +  %6 = add i64 %5, 2147450880
>>>>> +  %7 = inttoptr i64 %6 to i8*
>>>>> +  %8 = load i8, i8* %7
>>>>> +  %9 = icmp ne i8 %8, 0
>>>>> +  br i1 %9, label %10, label %15
>>>>> +
>>>>> +; <label>:10:                                     ; preds =
>>>>> %while.cond.outer
>>>>> +  %11 = and i64 %4, 7
>>>>> +  %12 = trunc i64 %11 to i8
>>>>> +  %13 = icmp sge i8 %12, %8
>>>>> +  br i1 %13, label %14, label %15
>>>>> +
>>>>> +; <label>:14:                                     ; preds = %10
>>>>> +  call void @__asan_report_load1(i64 %4)
>>>>> +  call void asm sideeffect "", ""()
>>>>> +  unreachable
>>>>> +
>>>>> +; <label>:15:                                     ; preds = %10,
>>>>> %while.cond.outer
>>>>> +  %16 = load i8, i8* %p1.addr.0.ph, align 1
>>>>> +  call void @__sanitizer_cov_trace_cmp1(i8 %16, i8 0)
>>>>> +  %cmp57 = icmp eq i8 %16, 0
>>>>> +  br i1 %cmp57, label %while.cond.outer.enoent.loopexit96_crit_edge, label
>>>>> %while.body.preheader
>>>>> +
>>>>> +while.cond.outer.enoent.loopexit96_crit_edge:     ; preds = %15
>>>>> +  %17 = load atomic i32, i32* inttoptr (i64 add (i64 ptrtoint ([9 x i32]*
>>>>> @__sancov_gen_cov to i64), i64 8) to i32*) monotonic, align 4
>>>>> +  %18 = icmp sge i32 0, %17
>>>>> +  br i1 %18, label %19, label %20
>>>>> +
>>>>> +; <label>:19:                                     ; preds =
>>>>> %while.cond.outer.enoent.loopexit96_crit_edge
>>>>> +  call void @__sanitizer_cov(i32* inttoptr (i64 add (i64 ptrtoint ([9 x
>>>>> i32]* @__sancov_gen_cov to i64), i64 8) to i32*))
>>>>> +  call void asm sideeffect "", ""()
>>>>> +  br label %20
>>>>> +
>>>>> +; <label>:20:                                     ; preds =
>>>>> %while.cond.outer.enoent.loopexit96_crit_edge, %19
>>>>> +  br label %enoent.loopexit96
>>>>> +
>>>>> +while.body.preheader:                             ; preds = %15
>>>>> +  br label %while.body
>>>>> +
>>>>> +while.body:                                       ; preds = %56,
>>>>> %while.body.preheader
>>>>> +  %21 = phi i8 [ %52, %56 ], [ %16, %while.body.preheader ]
>>>>> +  %p1.addr.0.ph.pn = phi i8* [ %incdec.ptr60, %56 ], [ %p1.addr.0.ph,
>>>>> %while.body.preheader ]
>>>>> +  %c.059 = phi i32* [ %incdec.ptr18, %56 ], [ %c.0.ph,
>>>>> %while.body.preheader ]
>>>>> +  %e.058 = phi i8* [ %incdec.ptr60, %56 ], [ %e.0.ph, %while.body.preheader
>>>>> ]
>>>>> +  %incdec.ptr60 = getelementptr inbounds i8, i8* %p1.addr.0.ph.pn, i64 1
>>>>> +  %conv = sext i8 %21 to i32
>>>>> +  %call = tail call i32 (i8*, i32, ...) bitcast (i32 (...)* @fn3 to i32
>>>>> (i8*, i32, ...)*)(i8* getelementptr inbounds ({ [17 x i8], [47 x i8] }, {
>>>>> [17 x i8], [47 x i8] }* @.str, i32 0, i32 0, i64 0), i32 %conv) #2
>>>>> +  call void @__sanitizer_cov_trace_cmp4(i32 %call, i32 0)
>>>>> +  %tobool = icmp eq i32 %call, 0
>>>>> +  br i1 %tobool, label %if.end5, label %cleanup
>>>>> +
>>>>> +if.end5:                                          ; preds = %while.body
>>>>> +  call void @__sanitizer_cov_trace_cmp1(i8 %21, i8 58)
>>>>> +  %cmp6 = icmp eq i8 %21, 58
>>>>> +  br i1 %cmp6, label %if.end14, label %cleanup.thread40
>>>>> +
>>>>> +if.end14:                                         ; preds = %if.end5
>>>>> +  %22 = load i8, i8* inttoptr (i64 add (i64 lshr (i64 ptrtoint (i32** @a to
>>>>> i64), i64 3), i64 2147450880) to i8*)
>>>>> +  %23 = icmp ne i8 %22, 0
>>>>> +  br i1 %23, label %24, label %25
>>>>> +
>>>>> +; <label>:24:                                     ; preds = %if.end14
>>>>> +  call void @__asan_report_load8(i64 ptrtoint (i32** @a to i64))
>>>>> +  call void asm sideeffect "", ""()
>>>>> +  unreachable
>>>>> +
>>>>> +; <label>:25:                                     ; preds = %if.end14
>>>>> +  %26 = load i32*, i32** @a, align 8
>>>>> +  %tobool15 = icmp eq i32* %26, null
>>>>> +  br i1 %tobool15, label %cleanup.thread39, label %cleanup23.loopexit
>>>>> +
>>>>> +cleanup.thread39:                                 ; preds = %25
>>>>> +  %incdec.ptr18 = getelementptr inbounds i32, i32* %c.059, i64 1
>>>>> +  %27 = ptrtoint i32* %c.059 to i64
>>>>> +  %28 = lshr i64 %27, 3
>>>>> +  %29 = add i64 %28, 2147450880
>>>>> +  %30 = inttoptr i64 %29 to i8*
>>>>> +  %31 = load i8, i8* %30
>>>>> +  %32 = icmp ne i8 %31, 0
>>>>> +  br i1 %32, label %33, label %39
>>>>> +
>>>>> +; <label>:33:                                     ; preds =
>>>>> %cleanup.thread39
>>>>> +  %34 = and i64 %27, 7
>>>>> +  %35 = add i64 %34, 3
>>>>> +  %36 = trunc i64 %35 to i8
>>>>> +  %37 = icmp sge i8 %36, %31
>>>>> +  br i1 %37, label %38, label %39
>>>>> +
>>>>> +; <label>:38:                                     ; preds = %33
>>>>> +  call void @__asan_report_store4(i64 %27)
>>>>> +  call void asm sideeffect "", ""()
>>>>> +  unreachable
>>>>> +
>>>>> +; <label>:39:                                     ; preds = %33,
>>>>> %cleanup.thread39
>>>>> +  store i32 0, i32* %c.059, align 4
>>>>> +  %40 = ptrtoint i8* %incdec.ptr60 to i64
>>>>> +  %41 = lshr i64 %40, 3
>>>>> +  %42 = add i64 %41, 2147450880
>>>>> +  %43 = inttoptr i64 %42 to i8*
>>>>> +  %44 = load i8, i8* %43
>>>>> +  %45 = icmp ne i8 %44, 0
>>>>> +  br i1 %45, label %46, label %51
>>>>> +
>>>>> +; <label>:46:                                     ; preds = %39
>>>>> +  %47 = and i64 %40, 7
>>>>> +  %48 = trunc i64 %47 to i8
>>>>> +  %49 = icmp sge i8 %48, %44
>>>>> +  br i1 %49, label %50, label %51
>>>>> +
>>>>> +; <label>:50:                                     ; preds = %46
>>>>> +  call void @__asan_report_load1(i64 %40)
>>>>> +  call void asm sideeffect "", ""()
>>>>> +  unreachable
>>>>> +
>>>>> +; <label>:51:                                     ; preds = %46, %39
>>>>> +  %52 = load i8, i8* %incdec.ptr60, align 1
>>>>> +  call void @__sanitizer_cov_trace_cmp1(i8 %52, i8 0)
>>>>> +  %cmp = icmp eq i8 %52, 0
>>>>> +  br i1 %cmp, label %enoent.loopexit, label
>>>>> %cleanup.thread39.while.body_crit_edge
>>>>> +
>>>>> +cleanup.thread39.while.body_crit_edge:            ; preds = %51
>>>>> +  %53 = load atomic i32, i32* inttoptr (i64 add (i64 ptrtoint ([9 x i32]*
>>>>> @__sancov_gen_cov to i64), i64 12) to i32*) monotonic, align 4
>>>>> +  %54 = icmp sge i32 0, %53
>>>>> +  br i1 %54, label %55, label %56
>>>>> +
>>>>> +; <label>:55:                                     ; preds =
>>>>> %cleanup.thread39.while.body_crit_edge
>>>>> +  call void @__sanitizer_cov(i32* inttoptr (i64 add (i64 ptrtoint ([9 x
>>>>> i32]* @__sancov_gen_cov to i64), i64 12) to i32*))
>>>>> +  call void asm sideeffect "", ""()
>>>>> +  br label %56
>>>>> +
>>>>> +; <label>:56:                                     ; preds =
>>>>> %cleanup.thread39.while.body_crit_edge, %55
>>>>> +  br label %while.body
>>>>> +
>>>>> +cleanup.thread40:                                 ; preds = %if.end5
>>>>> +  %57 = load atomic i32, i32* inttoptr (i64 add (i64 ptrtoint ([9 x i32]*
>>>>> @__sancov_gen_cov to i64), i64 16) to i32*) monotonic, align 4
>>>>> +  %58 = icmp sge i32 0, %57
>>>>> +  br i1 %58, label %59, label %60
>>>>> +
>>>>> +; <label>:59:                                     ; preds =
>>>>> %cleanup.thread40
>>>>> +  call void @__sanitizer_cov(i32* inttoptr (i64 add (i64 ptrtoint ([9 x
>>>>> i32]* @__sancov_gen_cov to i64), i64 16) to i32*))
>>>>> +  call void asm sideeffect "", ""()
>>>>> +  br label %60
>>>>> +
>>>>> +; <label>:60:                                     ; preds =
>>>>> %cleanup.thread40, %59
>>>>> +  %call20 = tail call i32 (i8*, ...) bitcast (i32 (...)* @fn4 to i32 (i8*,
>>>>> ...)*)(i8* %e.058) #2
>>>>> +  br label %enoent
>>>>> +
>>>>> +cleanup:                                          ; preds = %while.body
>>>>> +  %61 = load i8, i8* inttoptr (i64 add (i64 lshr (i64 ptrtoint (i32* @b to
>>>>> i64), i64 3), i64 2147450880) to i8*)
>>>>> +  %62 = icmp ne i8 %61, 0
>>>>> +  br i1 %62, label %63, label %66
>>>>> +
>>>>> +; <label>:63:                                     ; preds = %cleanup
>>>>> +  %64 = icmp sge i8 trunc (i64 add (i64 and (i64 ptrtoint (i32* @b to i64),
>>>>> i64 7), i64 3) to i8), %61
>>>>> +  br i1 %64, label %65, label %66
>>>>> +
>>>>> +; <label>:65:                                     ; preds = %63
>>>>> +  call void @__asan_report_load4(i64 ptrtoint (i32* @b to i64))
>>>>> +  call void asm sideeffect "", ""()
>>>>> +  unreachable
>>>>> +
>>>>> +; <label>:66:                                     ; preds = %63, %cleanup
>>>>> +  %67 = load i32, i32* @b, align 4
>>>>> +  call void @__sanitizer_cov_trace_cmp4(i32 %67, i32 0)
>>>>> +  %tobool3 = icmp eq i32 %67, 0
>>>>> +  br i1 %tobool3, label %cleanup.while.cond.outer_crit_edge, label
>>>>> %cleanup.enoent.loopexit96_crit_edge
>>>>> +
>>>>> +cleanup.enoent.loopexit96_crit_edge:              ; preds = %66
>>>>> +  %68 = load atomic i32, i32* inttoptr (i64 add (i64 ptrtoint ([9 x i32]*
>>>>> @__sancov_gen_cov to i64), i64 20) to i32*) monotonic, align 4
>>>>> +  %69 = icmp sge i32 0, %68
>>>>> +  br i1 %69, label %70, label %71
>>>>> +
>>>>> +; <label>:70:                                     ; preds =
>>>>> %cleanup.enoent.loopexit96_crit_edge
>>>>> +  call void @__sanitizer_cov(i32* inttoptr (i64 add (i64 ptrtoint ([9 x
>>>>> i32]* @__sancov_gen_cov to i64), i64 20) to i32*))
>>>>> +  call void asm sideeffect "", ""()
>>>>> +  br label %71
>>>>> +
>>>>> +; <label>:71:                                     ; preds =
>>>>> %cleanup.enoent.loopexit96_crit_edge, %70
>>>>> +  br label %enoent.loopexit96
>>>>> +
>>>>> +cleanup.while.cond.outer_crit_edge:               ; preds = %66
>>>>> +  %72 = load atomic i32, i32* inttoptr (i64 add (i64 ptrtoint ([9 x i32]*
>>>>> @__sancov_gen_cov to i64), i64 24) to i32*) monotonic, align 4
>>>>> +  %73 = icmp sge i32 0, %72
>>>>> +  br i1 %73, label %74, label %75
>>>>> +
>>>>> +; <label>:74:                                     ; preds =
>>>>> %cleanup.while.cond.outer_crit_edge
>>>>> +  call void @__sanitizer_cov(i32* inttoptr (i64 add (i64 ptrtoint ([9 x
>>>>> i32]* @__sancov_gen_cov to i64), i64 24) to i32*))
>>>>> +  call void asm sideeffect "", ""()
>>>>> +  br label %75
>>>>> +
>>>>> +; <label>:75:                                     ; preds =
>>>>> %cleanup.while.cond.outer_crit_edge, %74
>>>>> +  br label %while.cond.outer
>>>>> +
>>>>> +enoent.loopexit:                                  ; preds = %51
>>>>> +  %76 = load atomic i32, i32* inttoptr (i64 add (i64 ptrtoint ([9 x i32]*
>>>>> @__sancov_gen_cov to i64), i64 28) to i32*) monotonic, align 4
>>>>> +  %77 = icmp sge i32 0, %76
>>>>> +  br i1 %77, label %78, label %79
>>>>> +
>>>>> +; <label>:78:                                     ; preds =
>>>>> %enoent.loopexit
>>>>> +  call void @__sanitizer_cov(i32* inttoptr (i64 add (i64 ptrtoint ([9 x
>>>>> i32]* @__sancov_gen_cov to i64), i64 28) to i32*))
>>>>> +  call void asm sideeffect "", ""()
>>>>> +  br label %79
>>>>> +
>>>>> +; <label>:79:                                     ; preds =
>>>>> %enoent.loopexit, %78
>>>>> +  br label %enoent
>>>>> +
>>>>> +enoent.loopexit96:                                ; preds = %71, %20
>>>>> +  br label %enoent
>>>>> +
>>>>> +enoent:                                           ; preds =
>>>>> %enoent.loopexit96, %79, %60
>>>>> +  %call22 = tail call i32* (...) @fn1() #2
>>>>> +  br label %cleanup23
>>>>> +
>>>>> +cleanup23.loopexit:                               ; preds = %25
>>>>> +  %80 = load atomic i32, i32* inttoptr (i64 add (i64 ptrtoint ([9 x i32]*
>>>>> @__sancov_gen_cov to i64), i64 32) to i32*) monotonic, align 4
>>>>> +  %81 = icmp sge i32 0, %80
>>>>> +  br i1 %81, label %82, label %83
>>>>> +
>>>>> +; <label>:82:                                     ; preds =
>>>>> %cleanup23.loopexit
>>>>> +  call void @__sanitizer_cov(i32* inttoptr (i64 add (i64 ptrtoint ([9 x
>>>>> i32]* @__sancov_gen_cov to i64), i64 32) to i32*))
>>>>> +  call void asm sideeffect "", ""()
>>>>> +  br label %83
>>>>> +
>>>>> +; <label>:83:                                     ; preds =
>>>>> %cleanup23.loopexit, %82
>>>>> +  br label %cleanup23
>>>>> +
>>>>> +cleanup23:                                        ; preds = %83, %enoent
>>>>> +  ret void
>>>>> +}
>>>>> +
>>>>> +declare i32 @fn3(...) local_unnamed_addr #1
>>>>> +
>>>>> +declare i32 @fn4(...) local_unnamed_addr #1
>>>>> +
>>>>> +declare i32* @fn1(...) local_unnamed_addr #1
>>>>> +
>>>>> +declare void @__sanitizer_cov(i32*)
>>>>> +
>>>>> +declare void @__sanitizer_cov_trace_cmp1(i8, i8)
>>>>> +
>>>>> +declare void @__sanitizer_cov_trace_cmp4(i32, i32)
>>>>> +
>>>>> +declare void @__asan_report_load1(i64)
>>>>> +
>>>>> +declare void @__asan_report_load4(i64)
>>>>> +
>>>>> +declare void @__asan_report_load8(i64)
>>>>> +
>>>>> +declare void @__asan_report_store4(i64)
>>>>> +
>>>>>
>>>>> Added: llvm/trunk/test/CodeGen/X86/pre-coalesce.ll
>>>>> URL:
>>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/pre-coalesce.ll?rev=292621&view=auto
>>>>> ==============================================================================
>>>>> --- llvm/trunk/test/CodeGen/X86/pre-coalesce.ll (added)
>>>>> +++ llvm/trunk/test/CodeGen/X86/pre-coalesce.ll Fri Jan 20 11:38:54 2017
>>>>> @@ -0,0 +1,51 @@
>>>>> +; RUN: llc -regalloc=greedy -mtriple=x86_64-unknown-linux-gnu  < %s -o - | FileCheck %s
>>>>> +;
>>>>> +; The test is to check that no redundant mov as follows will be generated in %while.body loop.
>>>>> +;  .LBB0_2:
>>>>> +;    movsbl %cl, %ecx
>>>>> +;    movl %edx, %eax   ==> This movl can be promoted outside of loop.
>>>>> +;    shll $5, %eax
>>>>> +;    ...
>>>>> +;    movl %eax, %edx
>>>>> +;    jne     .LBB0_2
>>>>> +;
>>>>> +; CHECK-LABEL: foo:
>>>>> +; CHECK: [[L0:.LBB0_[0-9]+]]: # %while.body
>>>>> +; CHECK: movl %[[REGA:.*]], %[[REGB:.*]]
>>>>> +; CHECK-NOT: movl %[[REGB]], %[[REGA]]
>>>>> +; CHECK: jne [[L0]]
>>>>> +;
>>>>> +target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
>>>>> +
>>>>> + at b = common local_unnamed_addr global i8* null, align 8
>>>>> + at a = common local_unnamed_addr global i32 0, align 4
>>>>> +
>>>>> +define i32 @foo() local_unnamed_addr {
>>>>> +entry:
>>>>> +  %t0 = load i8*, i8** @b, align 8
>>>>> +  %t1 = load i8, i8* %t0, align 1
>>>>> +  %cmp4 = icmp eq i8 %t1, 0
>>>>> +  %t2 = load i32, i32* @a, align 4
>>>>> +  br i1 %cmp4, label %while.end, label %while.body.preheader
>>>>> +
>>>>> +while.body.preheader:                             ; preds = %entry
>>>>> +  br label %while.body
>>>>> +
>>>>> +while.body:                                       ; preds =
>>>>> %while.body.preheader, %while.body
>>>>> +  %t3 = phi i32 [ %add3, %while.body ], [ %t2, %while.body.preheader ]
>>>>> +  %t4 = phi i8 [ %t5, %while.body ], [ %t1, %while.body.preheader ]
>>>>> +  %conv = sext i8 %t4 to i32
>>>>> +  %add = mul i32 %t3, 33
>>>>> +  %add3 = add nsw i32 %add, %conv
>>>>> +  store i32 %add3, i32* @a, align 4
>>>>> +  %t5 = load i8, i8* %t0, align 1
>>>>> +  %cmp = icmp eq i8 %t5, 0
>>>>> +  br i1 %cmp, label %while.end.loopexit, label %while.body
>>>>> +
>>>>> +while.end.loopexit:                               ; preds = %while.body
>>>>> +  br label %while.end
>>>>> +
>>>>> +while.end:                                        ; preds =
>>>>> %while.end.loopexit, %entry
>>>>> +  %.lcssa = phi i32 [ %t2, %entry ], [ %add3, %while.end.loopexit ]
>>>>> +  ret i32 %.lcssa
>>>>> +}
>>>>>
>>>>> Added: llvm/trunk/test/CodeGen/X86/pre-coalesce.mir
>>>>> URL:
>>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/pre-coalesce.mir?rev=292621&view=auto
>>>>> ==============================================================================
>>>>> --- llvm/trunk/test/CodeGen/X86/pre-coalesce.mir (added)
>>>>> +++ llvm/trunk/test/CodeGen/X86/pre-coalesce.mir Fri Jan 20 11:38:54 2017
>>>>> @@ -0,0 +1,122 @@
>>>>> +# RUN: llc -mtriple=x86_64-unknown-linux-gnu -run-pass simple-register-coalescing -o - %s | FileCheck %s
>>>>> +# Check there is no partially redundant copy left in the loop after register coalescing.
>>>>> +--- |
>>>>> +  ; ModuleID = '<stdin>'
>>>>> +  source_filename = "<stdin>"
>>>>> +  target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
>>>>> +  target triple = "x86_64-unknown-linux-gnu"
>>>>> +
>>>>> +  @b = common local_unnamed_addr global i8* null, align 8
>>>>> +  @a = common local_unnamed_addr global i32 0, align 4
>>>>> +
>>>>> +  define i32 @foo() local_unnamed_addr {
>>>>> +  entry:
>>>>> +    %t0 = load i8*, i8** @b, align 8
>>>>> +    %t1 = load i8, i8* %t0, align 1
>>>>> +    %cmp4 = icmp eq i8 %t1, 0
>>>>> +    %t2 = load i32, i32* @a, align 4
>>>>> +    br i1 %cmp4, label %while.end, label %while.body.preheader
>>>>> +
>>>>> +  while.body.preheader:                             ; preds = %entry
>>>>> +    br label %while.body
>>>>> +
>>>>> +  while.body:                                       ; preds = %while.body,
>>>>> %while.body.preheader
>>>>> +    %t3 = phi i32 [ %add3, %while.body ], [ %t2, %while.body.preheader ]
>>>>> +    %t4 = phi i8 [ %t5, %while.body ], [ %t1, %while.body.preheader ]
>>>>> +    %conv = sext i8 %t4 to i32
>>>>> +    %add = mul i32 %t3, 33
>>>>> +    %add3 = add nsw i32 %add, %conv
>>>>> +    store i32 %add3, i32* @a, align 4
>>>>> +    %t5 = load i8, i8* %t0, align 1
>>>>> +    %cmp = icmp eq i8 %t5, 0
>>>>> +    br i1 %cmp, label %while.end, label %while.body
>>>>> +
>>>>> +  while.end:                                        ; preds = %while.body,
>>>>> %entry
>>>>> +    %.lcssa = phi i32 [ %t2, %entry ], [ %add3, %while.body ]
>>>>> +    ret i32 %.lcssa
>>>>> +  }
>>>>> +
>>>>> +...
>>>>> +---
>>>>> +# Check A = B and B = A copies will not exist in the loop at the same time.
>>>>> +# CHECK: name: foo
>>>>> +# CHECK: [[L1:bb.3.while.body]]:
>>>>> +# CHECK: %[[REGA:.*]] = COPY %[[REGB:.*]]
>>>>> +# CHECK-NOT: %[[REGB]] = COPY %[[REGA]]
>>>>> +# CHECK: JNE_1 %[[L1]]
>>>>> +
>>>>> +name:            foo
>>>>> +alignment:       4
>>>>> +exposesReturnsTwice: false
>>>>> +legalized:       false
>>>>> +regBankSelected: false
>>>>> +selected:        false
>>>>> +tracksRegLiveness: true
>>>>> +registers:
>>>>> +  - { id: 0, class: gr64 }
>>>>> +  - { id: 1, class: gr8 }
>>>>> +  - { id: 2, class: gr32 }
>>>>> +  - { id: 3, class: gr32 }
>>>>> +  - { id: 4, class: gr8 }
>>>>> +  - { id: 5, class: gr32 }
>>>>> +  - { id: 6, class: gr8 }
>>>>> +  - { id: 7, class: gr32 }
>>>>> +  - { id: 8, class: gr32 }
>>>>> +  - { id: 9, class: gr32 }
>>>>> +  - { id: 10, class: gr32 }
>>>>> +  - { id: 11, class: gr32 }
>>>>> +  - { id: 12, class: gr8 }
>>>>> +  - { id: 13, class: gr32 }
>>>>> +frameInfo:
>>>>> +  isFrameAddressTaken: false
>>>>> +  isReturnAddressTaken: false
>>>>> +  hasStackMap:     false
>>>>> +  hasPatchPoint:   false
>>>>> +  stackSize:       0
>>>>> +  offsetAdjustment: 0
>>>>> +  maxAlignment:    0
>>>>> +  adjustsStack:    false
>>>>> +  hasCalls:        false
>>>>> +  maxCallFrameSize: 0
>>>>> +  hasOpaqueSPAdjustment: false
>>>>> +  hasVAStart:      false
>>>>> +  hasMustTailInVarArgFunc: false
>>>>> +body:             |
>>>>> +  bb.0.entry:
>>>>> +    successors: %bb.4(0x30000000), %bb.1.while.body.preheader(0x50000000)
>>>>> +
>>>>> +    %0 = MOV64rm %rip, 1, _, @b, _ :: (dereferenceable load 8 from @b)
>>>>> +    %12 = MOV8rm %0, 1, _, 0, _ :: (load 1 from %ir.t0)
>>>>> +    TEST8rr %12, %12, implicit-def %eflags
>>>>> +    %11 = MOV32rm %rip, 1, _, @a, _ :: (dereferenceable load 4 from @a)
>>>>> +    JNE_1 %bb.1.while.body.preheader, implicit killed %eflags
>>>>> +
>>>>> +  bb.4:
>>>>> +    successors: %bb.3.while.end(0x80000000)
>>>>> +
>>>>> +    %10 = COPY %11
>>>>> +    JMP_1 %bb.3.while.end
>>>>> +
>>>>> +  bb.1.while.body.preheader:
>>>>> +    successors: %bb.2.while.body(0x80000000)
>>>>> +
>>>>> +  bb.2.while.body:
>>>>> +    successors: %bb.3.while.end(0x04000000), %bb.2.while.body(0x7c000000)
>>>>> +
>>>>> +    %8 = MOVSX32rr8 %12
>>>>> +    %10 = COPY %11
>>>>> +    %10 = SHL32ri %10, 5, implicit-def dead %eflags
>>>>> +    %10 = ADD32rr %10, %11, implicit-def dead %eflags
>>>>> +    %10 = ADD32rr %10, %8, implicit-def dead %eflags
>>>>> +    MOV32mr %rip, 1, _, @a, _, %10 :: (store 4 into @a)
>>>>> +    %12 = MOV8rm %0, 1, _, 0, _ :: (load 1 from %ir.t0)
>>>>> +    TEST8rr %12, %12, implicit-def %eflags
>>>>> +    %11 = COPY %10
>>>>> +    JNE_1 %bb.2.while.body, implicit killed %eflags
>>>>> +    JMP_1 %bb.3.while.end
>>>>> +
>>>>> +  bb.3.while.end:
>>>>> +    %eax = COPY %10
>>>>> +    RET 0, killed %eax
>>>>> +
>>>>> +...
>>>>>
>>>>>
>>>>>
>>>>>
>>>
>

