<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Hi Wei,<div class=""><br class=""></div><div class="">Since that patch landed, some of our bots are hitting machine verifier failures related to the live-range update.</div><div class="">I can’t share a test case for now, but the errors I am seeing look like this:</div><div class=""><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo; background-color: rgb(255, 255, 255);" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">1. *** Bad machine code: No live segment at def ***</span></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo; background-color: rgb(255, 255, 255);" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">2. *** Bad machine code: No instruction at VNInfo def index ***</span></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo; background-color: rgb(255, 255, 255);" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">3. *** Bad machine code: Live segment doesn't end at a valid instruction ***</span></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo; background-color: rgb(255, 255, 255);" class=""><br class=""></div><div><br class=""></div><div>For #1, it seems we don’t properly extend the live-range of the definition made by the copy being moved.</div><div>For #2 and #3, I am guessing we are using the slot index of the instruction being removed.</div><div><br class=""></div><div>I’ll see how long it takes me to reduce/fix the problem, but I might have to revert this patch.</div><div><br class=""></div><div>Cheers,</div><div>-Quentin</div><div> <br class=""><blockquote type="cite" class=""><div class="">On Jan 20, 2017, at 9:38 AM, Wei Mi via llvm-commits &lt;<a href="mailto:llvm-commits@lists.llvm.org" class="">llvm-commits@lists.llvm.org</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><div class="">Author: wmi<br class="">Date: Fri Jan 20 11:38:54 2017<br class="">New Revision: 292621<br class=""><br class="">URL: <a href="http://llvm.org/viewvc/llvm-project?rev=292621&view=rev" class="">http://llvm.org/viewvc/llvm-project?rev=292621&view=rev</a><br class="">Log:<br class="">[RegisterCoalescing] Recommit the patch "Remove partial redundent copy".<br class=""><br class="">The recommit fixes a bug related with live interval update after the partial<br class="">redundent copy is moved.<br class=""><br class="">The original patch is to solve the performance problem described in PR27827.<br class="">Register coalescing sometimes cannot remove a copy because of interference.<br class="">But if we can find a reverse copy in one of the predecessor block of the copy,<br class="">the copy is partially redundent and we may remove the copy partially by moving<br class="">it to the predecessor block without the reverse copy.<br class=""><br class="">Differential Revision: <a href="https://reviews.llvm.org/D28585" 
class="">https://reviews.llvm.org/D28585</a><br class=""><br class="">Added:<br class="">    llvm/trunk/test/CodeGen/X86/pre-coalesce-2.ll<br class="">    llvm/trunk/test/CodeGen/X86/pre-coalesce.ll<br class="">    llvm/trunk/test/CodeGen/X86/pre-coalesce.mir<br class="">Modified:<br class="">    llvm/trunk/lib/CodeGen/RegisterCoalescer.cpp<br class=""><br class="">Modified: llvm/trunk/lib/CodeGen/RegisterCoalescer.cpp<br class="">URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/RegisterCoalescer.cpp?rev=292621&r1=292620&r2=292621&view=diff" class="">http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/RegisterCoalescer.cpp?rev=292621&r1=292620&r2=292621&view=diff</a><br class="">==============================================================================<br class="">--- llvm/trunk/lib/CodeGen/RegisterCoalescer.cpp (original)<br class="">+++ llvm/trunk/lib/CodeGen/RegisterCoalescer.cpp Fri Jan 20 11:38:54 2017<br class="">@@ -22,6 +22,7 @@<br class=""> #include "llvm/CodeGen/LiveRangeEdit.h"<br class=""> #include "llvm/CodeGen/MachineFrameInfo.h"<br class=""> #include "llvm/CodeGen/MachineInstr.h"<br class="">+#include "llvm/CodeGen/MachineInstrBuilder.h"<br class=""> #include "llvm/CodeGen/MachineLoopInfo.h"<br class=""> #include "llvm/CodeGen/MachineRegisterInfo.h"<br class=""> #include "llvm/CodeGen/Passes.h"<br class="">@@ -189,6 +190,9 @@ namespace {<br class="">     /// This returns true if an interval was modified.<br class="">     bool removeCopyByCommutingDef(const CoalescerPair &CP,MachineInstr *CopyMI);<br class=""><br class="">+    /// We found a copy which can be moved to its less frequent predecessor.<br class="">+    bool removePartialRedundancy(const CoalescerPair &CP, MachineInstr &CopyMI);<br class="">+<br class="">     /// If the source of a copy is defined by a<br class="">     /// trivial computation, replace the copy by rematerialize the definition.<br class="">     bool reMaterializeTrivialDef(const 
CoalescerPair &CP, MachineInstr *CopyMI,<br class="">@@ -861,6 +865,167 @@ bool RegisterCoalescer::removeCopyByComm<br class="">   return true;<br class=""> }<br class=""><br class="">+/// For copy B = A in BB2, if A is defined by A = B in BB0 which is a<br class="">+/// predecessor of BB2, and if B is not redefined on the way from A = B<br class="">+/// in BB2 to B = A in BB2, B = A in BB2 is partially redundant if the<br class="">+/// execution goes through the path from BB0 to BB2. We may move B = A<br class="">+/// to the predecessor without such reversed copy.<br class="">+/// So we will transform the program from:<br class="">+///   BB0:<br class="">+///      A = B;    BB1:<br class="">+///       ...         ...<br class="">+///     /     \      /<br class="">+///             BB2:<br class="">+///               ...<br class="">+///               B = A;<br class="">+///<br class="">+/// to:<br class="">+///<br class="">+///   BB0:         BB1:<br class="">+///      A = B;        ...<br class="">+///       ...          B = A;<br class="">+///     /     \       /<br class="">+///             BB2:<br class="">+///               ...<br class="">+///<br class="">+/// A special case is when BB0 and BB2 are the same BB which is the only<br class="">+/// BB in a loop:<br class="">+///   BB1:<br class="">+///        ...<br class="">+///   BB0/BB2:  ----<br class="">+///        B = A;   |<br class="">+///        ...      |<br class="">+///        A = B;   |<br class="">+///          |-------<br class="">+///          |<br class="">+/// We may hoist B = A from BB0/BB2 to BB1.<br class="">+///<br class="">+/// The major preconditions for correctness to remove such partial<br class="">+/// redundancy include:<br class="">+/// 1. A in B = A in BB2 is defined by a PHI in BB2, and one operand of<br class="">+///    the PHI is defined by the reversed copy A = B in BB0.<br class="">+/// 2. No B is referenced from the start of BB2 to B = A.<br class="">+/// 3. 
No B is defined from A = B to the end of BB0.<br class="">+/// 4. BB1 has only one successor.<br class="">+///<br class="">+/// 2 and 4 implicitly ensure B is not live at the end of BB1.<br class="">+/// 4 guarantees BB2 is hotter than BB1, so we can only move a copy to a<br class="">+/// colder place, which not only prevent endless loop, but also make sure<br class="">+/// the movement of copy is beneficial.<br class="">+bool RegisterCoalescer::removePartialRedundancy(const CoalescerPair &CP,<br class="">+                                                MachineInstr &CopyMI) {<br class="">+  assert(!CP.isPhys());<br class="">+  if (!CopyMI.isFullCopy())<br class="">+    return false;<br class="">+<br class="">+  MachineBasicBlock &MBB = *CopyMI.getParent();<br class="">+  if (MBB.isEHPad())<br class="">+    return false;<br class="">+<br class="">+  if (MBB.pred_size() != 2)<br class="">+    return false;<br class="">+<br class="">+  LiveInterval &IntA =<br class="">+      LIS->getInterval(CP.isFlipped() ? CP.getDstReg() : CP.getSrcReg());<br class="">+  LiveInterval &IntB =<br class="">+      LIS->getInterval(CP.isFlipped() ? CP.getSrcReg() : CP.getDstReg());<br class="">+<br class="">+  // A is defined by PHI at the entry of MBB.<br class="">+  SlotIndex CopyIdx = LIS->getInstructionIndex(CopyMI).getRegSlot(true);<br class="">+  VNInfo *AValNo = IntA.getVNInfoAt(CopyIdx);<br class="">+  assert(AValNo && !AValNo->isUnused() && "COPY source not live");<br class="">+  if (!AValNo->isPHIDef())<br class="">+    return false;<br class="">+<br class="">+  // No B is referenced before CopyMI in MBB.<br class="">+  if (IntB.overlaps(LIS->getMBBStartIdx(&MBB), CopyIdx))<br class="">+    return false;<br class="">+<br class="">+  // MBB has two predecessors: one contains A = B so no copy will be inserted<br class="">+  // for it. 
The other one will have a copy moved from MBB.<br class="">+  bool FoundReverseCopy = false;<br class="">+  MachineBasicBlock *CopyLeftBB = nullptr;<br class="">+  for (MachineBasicBlock *Pred : MBB.predecessors()) {<br class="">+    VNInfo *PVal = IntA.getVNInfoBefore(LIS->getMBBEndIdx(Pred));<br class="">+    MachineInstr *DefMI = LIS->getInstructionFromIndex(PVal->def);<br class="">+    if (!DefMI || !DefMI->isFullCopy()) {<br class="">+      CopyLeftBB = Pred;<br class="">+      continue;<br class="">+    }<br class="">+    // Check DefMI is a reverse copy and it is in BB Pred.<br class="">+    if (DefMI->getOperand(0).getReg() != IntA.reg ||<br class="">+        DefMI->getOperand(1).getReg() != IntB.reg ||<br class="">+        DefMI->getParent() != Pred) {<br class="">+      CopyLeftBB = Pred;<br class="">+      continue;<br class="">+    }<br class="">+    // If there is any other def of B after DefMI and before the end of Pred,<br class="">+    // we need to keep the copy of B = A at the end of Pred if we remove<br class="">+    // B = A from MBB.<br class="">+    bool ValB_Changed = false;<br class="">+    for (auto VNI : IntB.valnos) {<br class="">+      if (VNI->isUnused())<br class="">+        continue;<br class="">+      if (PVal->def < VNI->def && VNI->def < LIS->getMBBEndIdx(Pred)) {<br class="">+        ValB_Changed = true;<br class="">+        break;<br class="">+      }<br class="">+    }<br class="">+    if (ValB_Changed) {<br class="">+      CopyLeftBB = Pred;<br class="">+      continue;<br class="">+    }<br class="">+    FoundReverseCopy = true;<br class="">+  }<br class="">+<br class="">+  // If no reverse copy is found in predecessors, nothing to do.<br class="">+  if (!FoundReverseCopy)<br class="">+    return false;<br class="">+<br class="">+  // If CopyLeftBB is nullptr, it means every predecessor of MBB contains<br class="">+  // reverse copy, CopyMI can be removed trivially if only IntA/IntB is updated.<br class="">+  // If CopyLeftBB 
is not nullptr, move CopyMI from MBB to CopyLeftBB and<br class="">+  // update IntA/IntB.<br class="">+  //<br class="">+  // If CopyLeftBB is not nullptr, ensure CopyLeftBB has a single succ so<br class="">+  // MBB is hotter than CopyLeftBB.<br class="">+  if (CopyLeftBB && CopyLeftBB->succ_size() > 1)<br class="">+    return false;<br class="">+<br class="">+  // Now ok to move copy.<br class="">+  if (CopyLeftBB) {<br class="">+    DEBUG(dbgs() << "\tremovePartialRedundancy: Move the copy to BB#"<br class="">+                 << CopyLeftBB->getNumber() << '\t' << CopyMI);<br class="">+<br class="">+    // Insert new copy to CopyLeftBB.<br class="">+    auto InsPos = CopyLeftBB->getFirstTerminator();<br class="">+    MachineInstr *NewCopyMI = BuildMI(*CopyLeftBB, InsPos, CopyMI.getDebugLoc(),<br class="">+                                      TII->get(TargetOpcode::COPY), IntB.reg)<br class="">+                                  .addReg(IntA.reg);<br class="">+    SlotIndex NewCopyIdx =<br class="">+        LIS->InsertMachineInstrInMaps(*NewCopyMI).getRegSlot();<br class="">+    VNInfo *VNI = IntB.getNextValue(NewCopyIdx, LIS->getVNInfoAllocator());<br class="">+    IntB.createDeadDef(VNI);<br class="">+  } else {<br class="">+    DEBUG(dbgs() << "\tremovePartialRedundancy: Remove the copy from BB#"<br class="">+                 << MBB.getNumber() << '\t' << CopyMI);<br class="">+  }<br class="">+<br class="">+  // Remove CopyMI.<br class="">+  SmallVector&lt;SlotIndex, 8&gt; EndPoints;<br class="">+  VNInfo *BValNo = IntB.Query(CopyIdx.getRegSlot()).valueOutOrDead();<br class="">+  LIS->pruneValue(IntB, CopyIdx.getRegSlot(), &EndPoints);<br class="">+  BValNo->markUnused();<br class="">+  LIS->RemoveMachineInstrFromMaps(CopyMI);<br class="">+  CopyMI.eraseFromParent();<br class="">+<br class="">+  // Extend IntB to the EndPoints of its original live interval.<br class="">+  LIS->extendToIndices(IntB, EndPoints);<br class="">+<br class="">+  shrinkToUses(&IntA);<br 
class="">+  return true;<br class="">+}<br class="">+<br class=""> /// Returns true if @p MI defines the full vreg @p Reg, as opposed to just<br class=""> /// defining a subregister.<br class=""> static bool definesFullReg(const MachineInstr &MI, unsigned Reg) {<br class="">@@ -1486,6 +1651,12 @@ bool RegisterCoalescer::joinCopy(Machine<br class="">       }<br class="">     }<br class=""><br class="">+    // Try and see if we can partially eliminate the copy by moving the copy to<br class="">+    // its predecessor.<br class="">+    if (!CP.isPartial() && !CP.isPhys())<br class="">+      if (removePartialRedundancy(CP, *CopyMI))<br class="">+        return true;<br class="">+<br class="">     // Otherwise, we are unable to join the intervals.<br class="">     DEBUG(dbgs() << "\tInterference!\n");<br class="">     Again = true;  // May be possible to coalesce later.<br class=""><br class="">Added: llvm/trunk/test/CodeGen/X86/pre-coalesce-2.ll<br class="">URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/pre-coalesce-2.ll?rev=292621&view=auto" class="">http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/pre-coalesce-2.ll?rev=292621&view=auto</a><br class="">==============================================================================<br class="">--- llvm/trunk/test/CodeGen/X86/pre-coalesce-2.ll (added)<br class="">+++ llvm/trunk/test/CodeGen/X86/pre-coalesce-2.ll Fri Jan 20 11:38:54 2017<br class="">@@ -0,0 +1,281 @@<br class="">+; RUN: llc -regalloc=greedy -verify-coalescing -mtriple=x86_64-unknown-linux-gnu < %s<br class="">+; Check the live range is updated properly after register coalescing.<br class="">+<br class="">+target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"<br class="">+<br class="">+@.str = internal unnamed_addr constant { [17 x i8], [47 x i8] } { [17 x i8] c"0123456789ABCDEF\00", [47 x i8] zeroinitializer }, align 32<br class="">+@b = common local_unnamed_addr global i32 0, align 4<br class="">+@a = 
common local_unnamed_addr global i32* null, align 8<br class="">+@__sancov_gen_cov = private global [9 x i32] zeroinitializer<br class="">+<br class="">+; Function Attrs: nounwind sanitize_address<br class="">+define void @fn2(i8* %p1) local_unnamed_addr #0 {<br class="">+entry:<br class="">+  %0 = load atomic i32, i32* inttoptr (i64 add (i64 ptrtoint ([9 x i32]* @__sancov_gen_cov to i64), i64 4) to i32*) monotonic, align 4<br class="">+  %1 = icmp sge i32 0, %0<br class="">+  br i1 %1, label %2, label %3<br class="">+<br class="">+; &lt;label&gt;:2:                                      ; preds = %entry<br class="">+  call void @__sanitizer_cov(i32* inttoptr (i64 add (i64 ptrtoint ([9 x i32]* @__sancov_gen_cov to i64), i64 4) to i32*))<br class="">+  call void asm sideeffect "", ""()<br class="">+  br label %3<br class="">+<br class="">+; &lt;label&gt;:3:                                      ; preds = %entry, %2<br class="">+  br label %while.cond.outer<br class="">+<br class="">+while.cond.outer:                                 ; preds = %75, %3<br class="">+  %e.0.ph = phi i8* [ %e.058, %75 ], [ undef, %3 ]<br class="">+  %c.0.ph = phi i32* [ %c.059, %75 ], [ undef, %3 ]<br class="">+  %p1.addr.0.ph = phi i8* [ %incdec.ptr60, %75 ], [ %p1, %3 ]<br class="">+  %4 = ptrtoint i8* %p1.addr.0.ph to i64<br class="">+  %5 = lshr i64 %4, 3<br class="">+  %6 = add i64 %5, 2147450880<br class="">+  %7 = inttoptr i64 %6 to i8*<br class="">+  %8 = load i8, i8* %7<br class="">+  %9 = icmp ne i8 %8, 0<br class="">+  br i1 %9, label %10, label %15<br class="">+<br class="">+; &lt;label&gt;:10:                                     ; preds = %while.cond.outer<br class="">+  %11 = and i64 %4, 7<br class="">+  %12 = trunc i64 %11 to i8<br class="">+  %13 = icmp sge i8 %12, %8<br class="">+  br i1 %13, label %14, label %15<br class="">+<br class="">+; &lt;label&gt;:14:                                     ; preds = %10<br class="">+  call void @__asan_report_load1(i64 %4)<br class="">+  call void asm 
sideeffect "", ""()<br class="">+  unreachable<br class="">+<br class="">+; &lt;label&gt;:15:                                     ; preds = %10, %while.cond.outer<br class="">+  %16 = load i8, i8* %p1.addr.0.ph, align 1<br class="">+  call void @__sanitizer_cov_trace_cmp1(i8 %16, i8 0)<br class="">+  %cmp57 = icmp eq i8 %16, 0<br class="">+  br i1 %cmp57, label %while.cond.outer.enoent.loopexit96_crit_edge, label %while.body.preheader<br class="">+<br class="">+while.cond.outer.enoent.loopexit96_crit_edge:     ; preds = %15<br class="">+  %17 = load atomic i32, i32* inttoptr (i64 add (i64 ptrtoint ([9 x i32]* @__sancov_gen_cov to i64), i64 8) to i32*) monotonic, align 4<br class="">+  %18 = icmp sge i32 0, %17<br class="">+  br i1 %18, label %19, label %20<br class="">+<br class="">+; &lt;label&gt;:19:                                     ; preds = %while.cond.outer.enoent.loopexit96_crit_edge<br class="">+  call void @__sanitizer_cov(i32* inttoptr (i64 add (i64 ptrtoint ([9 x i32]* @__sancov_gen_cov to i64), i64 8) to i32*))<br class="">+  call void asm sideeffect "", ""()<br class="">+  br label %20<br class="">+<br class="">+; &lt;label&gt;:20:                                     ; preds = %while.cond.outer.enoent.loopexit96_crit_edge, %19<br class="">+  br label %enoent.loopexit96<br class="">+<br class="">+while.body.preheader:                             ; preds = %15<br class="">+  br label %while.body<br class="">+<br class="">+while.body:                                       ; preds = %56, %while.body.preheader<br class="">+  %21 = phi i8 [ %52, %56 ], [ %16, %while.body.preheader ]<br class="">+  %p1.addr.0.ph.pn = phi i8* [ %incdec.ptr60, %56 ], [ %p1.addr.0.ph, %while.body.preheader ]<br class="">+  %c.059 = phi i32* [ %incdec.ptr18, %56 ], [ %c.0.ph, %while.body.preheader ]<br class="">+  %e.058 = phi i8* [ %incdec.ptr60, %56 ], [ %e.0.ph, %while.body.preheader ]<br class="">+  %incdec.ptr60 = getelementptr inbounds i8, i8* %p1.addr.0.ph.pn, i64 1<br class="">+  %conv = 
sext i8 %21 to i32<br class="">+  %call = tail call i32 (i8*, i32, ...) bitcast (i32 (...)* @fn3 to i32 (i8*, i32, ...)*)(i8* getelementptr inbounds ({ [17 x i8], [47 x i8] }, { [17 x i8], [47 x i8] }* @.str, i32 0, i32 0, i64 0), i32 %conv) #2<br class="">+  call void @__sanitizer_cov_trace_cmp4(i32 %call, i32 0)<br class="">+  %tobool = icmp eq i32 %call, 0<br class="">+  br i1 %tobool, label %if.end5, label %cleanup<br class="">+<br class="">+if.end5:                                          ; preds = %while.body<br class="">+  call void @__sanitizer_cov_trace_cmp1(i8 %21, i8 58)<br class="">+  %cmp6 = icmp eq i8 %21, 58<br class="">+  br i1 %cmp6, label %if.end14, label %cleanup.thread40<br class="">+<br class="">+if.end14:                                         ; preds = %if.end5<br class="">+  %22 = load i8, i8* inttoptr (i64 add (i64 lshr (i64 ptrtoint (i32** @a to i64), i64 3), i64 2147450880) to i8*)<br class="">+  %23 = icmp ne i8 %22, 0<br class="">+  br i1 %23, label %24, label %25<br class="">+<br class="">+; &lt;label&gt;:24:                                     ; preds = %if.end14<br class="">+  call void @__asan_report_load8(i64 ptrtoint (i32** @a to i64))<br class="">+  call void asm sideeffect "", ""()<br class="">+  unreachable<br class="">+<br class="">+; &lt;label&gt;:25:                                     ; preds = %if.end14<br class="">+  %26 = load i32*, i32** @a, align 8<br class="">+  %tobool15 = icmp eq i32* %26, null<br class="">+  br i1 %tobool15, label %cleanup.thread39, label %cleanup23.loopexit<br class="">+<br class="">+cleanup.thread39:                                 ; preds = %25<br class="">+  %incdec.ptr18 = getelementptr inbounds i32, i32* %c.059, i64 1<br class="">+  %27 = ptrtoint i32* %c.059 to i64<br class="">+  %28 = lshr i64 %27, 3<br class="">+  %29 = add i64 %28, 2147450880<br class="">+  %30 = inttoptr i64 %29 to i8*<br class="">+  %31 = load i8, i8* %30<br class="">+  %32 = icmp ne i8 %31, 0<br class="">+  br i1 %32, label %33, 
label %39<br class="">+<br class="">+; &lt;label&gt;:33:                                     ; preds = %cleanup.thread39<br class="">+  %34 = and i64 %27, 7<br class="">+  %35 = add i64 %34, 3<br class="">+  %36 = trunc i64 %35 to i8<br class="">+  %37 = icmp sge i8 %36, %31<br class="">+  br i1 %37, label %38, label %39<br class="">+<br class="">+; &lt;label&gt;:38:                                     ; preds = %33<br class="">+  call void @__asan_report_store4(i64 %27)<br class="">+  call void asm sideeffect "", ""()<br class="">+  unreachable<br class="">+<br class="">+; &lt;label&gt;:39:                                     ; preds = %33, %cleanup.thread39<br class="">+  store i32 0, i32* %c.059, align 4<br class="">+  %40 = ptrtoint i8* %incdec.ptr60 to i64<br class="">+  %41 = lshr i64 %40, 3<br class="">+  %42 = add i64 %41, 2147450880<br class="">+  %43 = inttoptr i64 %42 to i8*<br class="">+  %44 = load i8, i8* %43<br class="">+  %45 = icmp ne i8 %44, 0<br class="">+  br i1 %45, label %46, label %51<br class="">+<br class="">+; &lt;label&gt;:46:                                     ; preds = %39<br class="">+  %47 = and i64 %40, 7<br class="">+  %48 = trunc i64 %47 to i8<br class="">+  %49 = icmp sge i8 %48, %44<br class="">+  br i1 %49, label %50, label %51<br class="">+<br class="">+; &lt;label&gt;:50:                                     ; preds = %46<br class="">+  call void @__asan_report_load1(i64 %40)<br class="">+  call void asm sideeffect "", ""()<br class="">+  unreachable<br class="">+<br class="">+; &lt;label&gt;:51:                                     ; preds = %46, %39<br class="">+  %52 = load i8, i8* %incdec.ptr60, align 1<br class="">+  call void @__sanitizer_cov_trace_cmp1(i8 %52, i8 0)<br class="">+  %cmp = icmp eq i8 %52, 0<br class="">+  br i1 %cmp, label %enoent.loopexit, label %cleanup.thread39.while.body_crit_edge<br class="">+<br class="">+cleanup.thread39.while.body_crit_edge:            ; preds = %51<br class="">+  %53 = load atomic i32, i32* inttoptr (i64 add (i64 
ptrtoint ([9 x i32]* @__sancov_gen_cov to i64), i64 12) to i32*) monotonic, align 4<br class="">+  %54 = icmp sge i32 0, %53<br class="">+  br i1 %54, label %55, label %56<br class="">+<br class="">+; &lt;label&gt;:55:                                     ; preds = %cleanup.thread39.while.body_crit_edge<br class="">+  call void @__sanitizer_cov(i32* inttoptr (i64 add (i64 ptrtoint ([9 x i32]* @__sancov_gen_cov to i64), i64 12) to i32*))<br class="">+  call void asm sideeffect "", ""()<br class="">+  br label %56<br class="">+<br class="">+; &lt;label&gt;:56:                                     ; preds = %cleanup.thread39.while.body_crit_edge, %55<br class="">+  br label %while.body<br class="">+<br class="">+cleanup.thread40:                                 ; preds = %if.end5<br class="">+  %57 = load atomic i32, i32* inttoptr (i64 add (i64 ptrtoint ([9 x i32]* @__sancov_gen_cov to i64), i64 16) to i32*) monotonic, align 4<br class="">+  %58 = icmp sge i32 0, %57<br class="">+  br i1 %58, label %59, label %60<br class="">+<br class="">+; &lt;label&gt;:59:                                     ; preds = %cleanup.thread40<br class="">+  call void @__sanitizer_cov(i32* inttoptr (i64 add (i64 ptrtoint ([9 x i32]* @__sancov_gen_cov to i64), i64 16) to i32*))<br class="">+  call void asm sideeffect "", ""()<br class="">+  br label %60<br class="">+<br class="">+; &lt;label&gt;:60:                                     ; preds = %cleanup.thread40, %59<br class="">+  %call20 = tail call i32 (i8*, ...) 
bitcast (i32 (...)* @fn4 to i32 (i8*, ...)*)(i8* %e.058) #2<br class="">+  br label %enoent<br class="">+<br class="">+cleanup:                                          ; preds = %while.body<br class="">+  %61 = load i8, i8* inttoptr (i64 add (i64 lshr (i64 ptrtoint (i32* @b to i64), i64 3), i64 2147450880) to i8*)<br class="">+  %62 = icmp ne i8 %61, 0<br class="">+  br i1 %62, label %63, label %66<br class="">+<br class="">+; &lt;label&gt;:63:                                     ; preds = %cleanup<br class="">+  %64 = icmp sge i8 trunc (i64 add (i64 and (i64 ptrtoint (i32* @b to i64), i64 7), i64 3) to i8), %61<br class="">+  br i1 %64, label %65, label %66<br class="">+<br class="">+; &lt;label&gt;:65:                                     ; preds = %63<br class="">+  call void @__asan_report_load4(i64 ptrtoint (i32* @b to i64))<br class="">+  call void asm sideeffect "", ""()<br class="">+  unreachable<br class="">+<br class="">+; &lt;label&gt;:66:                                     ; preds = %63, %cleanup<br class="">+  %67 = load i32, i32* @b, align 4<br class="">+  call void @__sanitizer_cov_trace_cmp4(i32 %67, i32 0)<br class="">+  %tobool3 = icmp eq i32 %67, 0<br class="">+  br i1 %tobool3, label %cleanup.while.cond.outer_crit_edge, label %cleanup.enoent.loopexit96_crit_edge<br class="">+<br class="">+cleanup.enoent.loopexit96_crit_edge:              ; preds = %66<br class="">+  %68 = load atomic i32, i32* inttoptr (i64 add (i64 ptrtoint ([9 x i32]* @__sancov_gen_cov to i64), i64 20) to i32*) monotonic, align 4<br class="">+  %69 = icmp sge i32 0, %68<br class="">+  br i1 %69, label %70, label %71<br class="">+<br class="">+; &lt;label&gt;:70:                                     ; preds = %cleanup.enoent.loopexit96_crit_edge<br class="">+  call void @__sanitizer_cov(i32* inttoptr (i64 add (i64 ptrtoint ([9 x i32]* @__sancov_gen_cov to i64), i64 20) to i32*))<br class="">+  call void asm sideeffect "", ""()<br class="">+  br label %71<br class="">+<br class="">+; &lt;label&gt;:71:        
                             ; preds = %cleanup.enoent.loopexit96_crit_edge, %70<br class="">+  br label %enoent.loopexit96<br class="">+<br class="">+cleanup.while.cond.outer_crit_edge:               ; preds = %66<br class="">+  %72 = load atomic i32, i32* inttoptr (i64 add (i64 ptrtoint ([9 x i32]* @__sancov_gen_cov to i64), i64 24) to i32*) monotonic, align 4<br class="">+  %73 = icmp sge i32 0, %72<br class="">+  br i1 %73, label %74, label %75<br class="">+<br class="">+; &lt;label&gt;:74:                                     ; preds = %cleanup.while.cond.outer_crit_edge<br class="">+  call void @__sanitizer_cov(i32* inttoptr (i64 add (i64 ptrtoint ([9 x i32]* @__sancov_gen_cov to i64), i64 24) to i32*))<br class="">+  call void asm sideeffect "", ""()<br class="">+  br label %75<br class="">+<br class="">+; &lt;label&gt;:75:                                     ; preds = %cleanup.while.cond.outer_crit_edge, %74<br class="">+  br label %while.cond.outer<br class="">+<br class="">+enoent.loopexit:                                  ; preds = %51<br class="">+  %76 = load atomic i32, i32* inttoptr (i64 add (i64 ptrtoint ([9 x i32]* @__sancov_gen_cov to i64), i64 28) to i32*) monotonic, align 4<br class="">+  %77 = icmp sge i32 0, %76<br class="">+  br i1 %77, label %78, label %79<br class="">+<br class="">+; &lt;label&gt;:78:                                     ; preds = %enoent.loopexit<br class="">+  call void @__sanitizer_cov(i32* inttoptr (i64 add (i64 ptrtoint ([9 x i32]* @__sancov_gen_cov to i64), i64 28) to i32*))<br class="">+  call void asm sideeffect "", ""()<br class="">+  br label %79<br class="">+<br class="">+; &lt;label&gt;:79:                                     ; preds = %enoent.loopexit, %78<br class="">+  br label %enoent<br class="">+<br class="">+enoent.loopexit96:                                ; preds = %71, %20<br class="">+  br label %enoent<br class="">+<br class="">+enoent:                                           ; preds = %enoent.loopexit96, %79, %60<br 
class="">+  %call22 = tail call i32* (...) @fn1() #2<br class="">+  br label %cleanup23<br class="">+<br class="">+cleanup23.loopexit:                               ; preds = %25<br class="">+  %80 = load atomic i32, i32* inttoptr (i64 add (i64 ptrtoint ([9 x i32]* @__sancov_gen_cov to i64), i64 32) to i32*) monotonic, align 4<br class="">+  %81 = icmp sge i32 0, %80<br class="">+  br i1 %81, label %82, label %83<br class="">+<br class="">+; &lt;label&gt;:82:                                     ; preds = %cleanup23.loopexit<br class="">+  call void @__sanitizer_cov(i32* inttoptr (i64 add (i64 ptrtoint ([9 x i32]* @__sancov_gen_cov to i64), i64 32) to i32*))<br class="">+  call void asm sideeffect "", ""()<br class="">+  br label %83<br class="">+<br class="">+; &lt;label&gt;:83:                                     ; preds = %cleanup23.loopexit, %82<br class="">+  br label %cleanup23<br class="">+<br class="">+cleanup23:                                        ; preds = %83, %enoent<br class="">+  ret void<br class="">+}<br class="">+<br class="">+declare i32 @fn3(...) local_unnamed_addr #1<br class="">+<br class="">+declare i32 @fn4(...) local_unnamed_addr #1<br class="">+<br class="">+declare i32* @fn1(...) 
local_unnamed_addr #1<br class="">+<br class="">+declare void @__sanitizer_cov(i32*)<br class="">+<br class="">+declare void @__sanitizer_cov_trace_cmp1(i8, i8)<br class="">+<br class="">+declare void @__sanitizer_cov_trace_cmp4(i32, i32)<br class="">+<br class="">+declare void @__asan_report_load1(i64)<br class="">+<br class="">+declare void @__asan_report_load4(i64)<br class="">+<br class="">+declare void @__asan_report_load8(i64)<br class="">+<br class="">+declare void @__asan_report_store4(i64)<br class="">+<br class=""><br class="">Added: llvm/trunk/test/CodeGen/X86/pre-coalesce.ll<br class="">URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/pre-coalesce.ll?rev=292621&view=auto" class="">http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/pre-coalesce.ll?rev=292621&view=auto</a><br class="">==============================================================================<br class="">--- llvm/trunk/test/CodeGen/X86/pre-coalesce.ll (added)<br class="">+++ llvm/trunk/test/CodeGen/X86/pre-coalesce.ll Fri Jan 20 11:38:54 2017<br class="">@@ -0,0 +1,51 @@<br class="">+; RUN: llc -regalloc=greedy -mtriple=x86_64-unknown-linux-gnu  < %s -o - | FileCheck %s<br class="">+;<br class="">+; The test is to check no redundent mov as follows will be generated in %while.body loop.<br class="">+;  .LBB0_2:<br class="">+;    movsbl<span class="Apple-tab-span" style="white-space:pre">   </span>%cl, %ecx<br class="">+;    movl<span class="Apple-tab-span" style="white-space:pre">     </span>%edx, %eax   ==> This movl can be promoted outside of loop.<br class="">+;    shll<span class="Apple-tab-span" style="white-space:pre">      </span>$5, %eax<br class="">+;    ...<br class="">+;    movl<span class="Apple-tab-span" style="white-space:pre"> </span>%eax, %edx<br class="">+;    jne     .LBB0_2<br class="">+;<br class="">+; CHECK-LABEL: foo:<br class="">+; CHECK: [[L0:.LBB0_[0-9]+]]: # %while.body<br class="">+; CHECK: movl %[[REGA:.*]], 
%[[REGB:.*]]<br class="">+; CHECK-NOT: movl %[[REGB]], %[[REGA]]<br class="">+; CHECK: jne [[L0]]<br class="">+;<br class="">+target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"<br class="">+<br class="">+@b = common local_unnamed_addr global i8* null, align 8<br class="">+@a = common local_unnamed_addr global i32 0, align 4<br class="">+<br class="">+define i32 @foo() local_unnamed_addr {<br class="">+entry:<br class="">+  %t0 = load i8*, i8** @b, align 8<br class="">+  %t1 = load i8, i8* %t0, align 1<br class="">+  %cmp4 = icmp eq i8 %t1, 0<br class="">+  %t2 = load i32, i32* @a, align 4<br class="">+  br i1 %cmp4, label %while.end, label %while.body.preheader<br class="">+<br class="">+while.body.preheader:                             ; preds = %entry<br class="">+  br label %while.body<br class="">+<br class="">+while.body:                                       ; preds = %while.body.preheader, %while.body<br class="">+  %t3 = phi i32 [ %add3, %while.body ], [ %t2, %while.body.preheader ]<br class="">+  %t4 = phi i8 [ %t5, %while.body ], [ %t1, %while.body.preheader ]<br class="">+  %conv = sext i8 %t4 to i32<br class="">+  %add = mul i32 %t3, 33<br class="">+  %add3 = add nsw i32 %add, %conv<br class="">+  store i32 %add3, i32* @a, align 4<br class="">+  %t5 = load i8, i8* %t0, align 1<br class="">+  %cmp = icmp eq i8 %t5, 0<br class="">+  br i1 %cmp, label %while.end.loopexit, label %while.body<br class="">+<br class="">+while.end.loopexit:                               ; preds = %while.body<br class="">+  br label %while.end<br class="">+<br class="">+while.end:                                        ; preds = %while.end.loopexit, %entry<br class="">+  %.lcssa = phi i32 [ %t2, %entry ], [ %add3, %while.end.loopexit ]<br class="">+  ret i32 %.lcssa<br class="">+}<br class=""><br class="">Added: llvm/trunk/test/CodeGen/X86/pre-coalesce.mir<br class="">URL: <a 
href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/pre-coalesce.mir?rev=292621&view=auto" class="">http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/pre-coalesce.mir?rev=292621&view=auto</a><br class="">==============================================================================<br class="">--- llvm/trunk/test/CodeGen/X86/pre-coalesce.mir (added)<br class="">+++ llvm/trunk/test/CodeGen/X86/pre-coalesce.mir Fri Jan 20 11:38:54 2017<br class="">@@ -0,0 +1,122 @@<br class="">+# RUN: llc -mtriple=x86_64-unknown-linux-gnu -run-pass simple-register-coalescing -o - %s | FileCheck %s<br class="">+# Check there is no partial redundent copy left in the loop after register coalescing.<br class="">+--- |<br class="">+  ; ModuleID = '<stdin>'<br class="">+  source_filename = "<stdin>"<br class="">+  target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"<br class="">+  target triple = "x86_64-unknown-linux-gnu"<br class="">+  <br class="">+  @b = common local_unnamed_addr global i8* null, align 8<br class="">+  @a = common local_unnamed_addr global i32 0, align 4<br class="">+  <br class="">+  define i32 @foo() local_unnamed_addr {<br class="">+  entry:<br class="">+    %t0 = load i8*, i8** @b, align 8<br class="">+    %t1 = load i8, i8* %t0, align 1<br class="">+    %cmp4 = icmp eq i8 %t1, 0<br class="">+    %t2 = load i32, i32* @a, align 4<br class="">+    br i1 %cmp4, label %while.end, label %while.body.preheader<br class="">+  <br class="">+  while.body.preheader:                             ; preds = %entry<br class="">+    br label %while.body<br class="">+  <br class="">+  while.body:                                       ; preds = %while.body, %while.body.preheader<br class="">+    %t3 = phi i32 [ %add3, %while.body ], [ %t2, %while.body.preheader ]<br class="">+    %t4 = phi i8 [ %t5, %while.body ], [ %t1, %while.body.preheader ]<br class="">+    %conv = sext i8 %t4 to i32<br class="">+    %add = mul i32 %t3, 33<br class="">+    
%add3 = add nsw i32 %add, %conv<br class="">+    store i32 %add3, i32* @a, align 4<br class="">+    %t5 = load i8, i8* %t0, align 1<br class="">+    %cmp = icmp eq i8 %t5, 0<br class="">+    br i1 %cmp, label %while.end, label %while.body<br class="">+  <br class="">+  while.end:                                        ; preds = %while.body, %entry<br class="">+    %.lcssa = phi i32 [ %t2, %entry ], [ %add3, %while.body ]<br class="">+    ret i32 %.lcssa<br class="">+  }<br class="">+<br class="">+...<br class="">+---<br class="">+# Check A = B and B = A copies will not exist in the loop at the same time.<br class="">+# CHECK: name: foo<br class="">+# CHECK: [[L1:bb.3.while.body]]:<br class="">+# CHECK: %[[REGA:.*]] = COPY %[[REGB:.*]]<br class="">+# CHECK-NOT: %[[REGB]] = COPY %[[REGA]]<br class="">+# CHECK: JNE_1 %[[L1]]<br class="">+<br class="">+name:            foo<br class="">+alignment:       4<br class="">+exposesReturnsTwice: false<br class="">+legalized:       false<br class="">+regBankSelected: false<br class="">+selected:        false<br class="">+tracksRegLiveness: true<br class="">+registers:       <br class="">+  - { id: 0, class: gr64 }<br class="">+  - { id: 1, class: gr8 }<br class="">+  - { id: 2, class: gr32 }<br class="">+  - { id: 3, class: gr32 }<br class="">+  - { id: 4, class: gr8 }<br class="">+  - { id: 5, class: gr32 }<br class="">+  - { id: 6, class: gr8 }<br class="">+  - { id: 7, class: gr32 }<br class="">+  - { id: 8, class: gr32 }<br class="">+  - { id: 9, class: gr32 }<br class="">+  - { id: 10, class: gr32 }<br class="">+  - { id: 11, class: gr32 }<br class="">+  - { id: 12, class: gr8 }<br class="">+  - { id: 13, class: gr32 }<br class="">+frameInfo:       <br class="">+  isFrameAddressTaken: false<br class="">+  isReturnAddressTaken: false<br class="">+  hasStackMap:     false<br class="">+  hasPatchPoint:   false<br class="">+  stackSize:       0<br class="">+  offsetAdjustment: 0<br class="">+  maxAlignment:    0<br class="">+  
adjustsStack:    false<br class="">+  hasCalls:        false<br class="">+  maxCallFrameSize: 0<br class="">+  hasOpaqueSPAdjustment: false<br class="">+  hasVAStart:      false<br class="">+  hasMustTailInVarArgFunc: false<br class="">+body:             |<br class="">+  bb.0.entry:<br class="">+    successors: %bb.4(0x30000000), %bb.1.while.body.preheader(0x50000000)<br class="">+  <br class="">+    %0 = MOV64rm %rip, 1, _, @b, _ :: (dereferenceable load 8 from @b)<br class="">+    %12 = MOV8rm %0, 1, _, 0, _ :: (load 1 from %ir.t0)<br class="">+    TEST8rr %12, %12, implicit-def %eflags<br class="">+    %11 = MOV32rm %rip, 1, _, @a, _ :: (dereferenceable load 4 from @a)<br class="">+    JNE_1 %bb.1.while.body.preheader, implicit killed %eflags<br class="">+  <br class="">+  bb.4:<br class="">+    successors: %bb.3.while.end(0x80000000)<br class="">+  <br class="">+    %10 = COPY %11<br class="">+    JMP_1 %bb.3.while.end<br class="">+  <br class="">+  bb.1.while.body.preheader:<br class="">+    successors: %bb.2.while.body(0x80000000)<br class="">+<br class="">+  bb.2.while.body:<br class="">+    successors: %bb.3.while.end(0x04000000), %bb.2.while.body(0x7c000000)<br class="">+  <br class="">+    %8 = MOVSX32rr8 %12<br class="">+    %10 = COPY %11<br class="">+    %10 = SHL32ri %10, 5, implicit-def dead %eflags<br class="">+    %10 = ADD32rr %10, %11, implicit-def dead %eflags<br class="">+    %10 = ADD32rr %10, %8, implicit-def dead %eflags<br class="">+    MOV32mr %rip, 1, _, @a, _, %10 :: (store 4 into @a)<br class="">+    %12 = MOV8rm %0, 1, _, 0, _ :: (load 1 from %ir.t0)<br class="">+    TEST8rr %12, %12, implicit-def %eflags<br class="">+    %11 = COPY %10<br class="">+    JNE_1 %bb.2.while.body, implicit killed %eflags<br class="">+    JMP_1 %bb.3.while.end<br class="">+  <br class="">+  bb.3.while.end:<br class="">+    %eax = COPY %10<br class="">+    RET 0, killed %eax<br class="">+<br class="">+...<br class=""><br class=""><br 
class="">_______________________________________________<br class="">llvm-commits mailing list<br class=""><a href="mailto:llvm-commits@lists.llvm.org" class="">llvm-commits@lists.llvm.org</a><br class="">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits<br class=""></div></div></blockquote></div><br class=""></div></body></html>