[llvm-commits] Patch for review: Speeding up ScheduleDAG computations

Fri Feb 29 10:25:05 PST 2008

BTW, please commit the depth / height patch before worrying about  
priorityqueue issue.

Thanks!

Evan

On Feb 29, 2008, at 9:27 AM, Roman Levenstein wrote:

> Hi Evan, Hi Dan,
>
> Thanks a lot for your review of my patch. I'm working on incorporating
> the feedback. See my comments below.
>
> 2008/2/28, Evan Cheng <evan.cheng at apple.com>:
>
>>  Latencies[SU->NodeNum] = 1;
>> This should be 0, not 1 since the original code is:
>>     if (SU->Succs.empty())
>>       WorkList.push_back(std::make_pair(SU, 0U));
>
> Initially, I have though exactly like you and have written
> Latencies[SU->NodeNum] = 0;
>
> But then I let both old and new algorithm to run on the same data
> side-by-side and compared the results. It turned out that if you set
> this value to 0, then ALL latencies computed by the new algorithm are
> less by 1 then those computed by the old algorithm. Therefore I
> changed this assignment to 1, which fixed the problem, even though I
> do not quite understand why it does not produce the correct result
> with 0,
>
> Do you have any ideas?
>
>> For testing, it would be good to run the original code and then the
>> new code and then verify they produce the same numbers.
>
> This is what I did while developing the algorithm. The numbers were
> exactly the same on my use-cases.
>
>> A bug in depth / height calculation is not likely to cause  
>> correctness failures
>> but something much more subtile.
> Indeed ;-)
>
> Now,  a new problem I have descovered:
> While improving my patch, I also looked at the LatencyPriorityQueue
> implementation and  the RegReductionPriorityQueue implementation. Both
> are using priority queues and have an extremely inefficient remove()
> method, that consumes 90% of the compilation time on very big BBs. So,
> I changed the data structure from the priority queue to the ordered
> std::set. Obviously, the max element is basically the last element of
> the set. This change produces a 3x speedup of compilation time on huge
> basic blocks. But there is a problem (or a bug!):
>
> the comparison operator
> td_ls_rr_sort::operator()(const SUnit* left, const SUnit* right) (see
> ScheduleDAGRRList.cpp)
> seems to be implemented incorrectly (see the code below).
>
> bool td_ls_rr_sort::operator()(const SUnit *left, const SUnit  
> *right) const {
> unsigned LPriority = SPQ->getNodePriority(left);
> unsigned RPriority = SPQ->getNodePriority(right);
>  bool LIsTarget = left->Node && left->Node->isTargetOpcode();
>  bool RIsTarget = right->Node && right->Node->isTargetOpcode();
>  bool LIsFloater = LIsTarget && left->NumPreds == 0;
>  bool RIsFloater = RIsTarget && right->NumPreds == 0;
>  unsigned LBonus = (SumOfUnscheduledPredsOfSuccs(left) == 1) ? 2 : 0;
>  unsigned RBonus = (SumOfUnscheduledPredsOfSuccs(right) == 1) ? 2 : 0;
>
>  if (left->NumSuccs == 0 && right->NumSuccs != 0)
>    return false;
>  else if (left->NumSuccs != 0 && right->NumSuccs == 0)
>    return true;
>
>  // Special tie breaker: if two nodes share a operand, the one that  
> use it
>  // as a def&use operand is preferred.
>  if (LIsTarget && RIsTarget) {
>    if (left->isTwoAddress && !right->isTwoAddress) {
>      SDNode *DUNode = left->Node->getOperand(0).Val;
>      if (DUNode->isOperand(right->Node))
>        RBonus += 2;
>    }
>    if (!left->isTwoAddress && right->isTwoAddress) {
>      SDNode *DUNode = right->Node->getOperand(0).Val;
>      if (DUNode->isOperand(left->Node))
>        LBonus += 2;
>    }
>  }
>  if (LIsFloater)
>    LBonus -= 2;
>  if (RIsFloater)
>    RBonus -= 2;
>  if (left->NumSuccs == 1)
>    LBonus += 2;
>  if (right->NumSuccs == 1)
>    RBonus += 2;
>
>  if (LPriority+LBonus < RPriority+RBonus)
>    return true;
>  else if (LPriority == RPriority) {
>    if (left->Depth < right->Depth)
>      return true;
>    else if (left->Depth == right->Depth) {
>      if (left->NumSuccsLeft > right->NumSuccsLeft)
>        return true;
>      else if (left->NumSuccsLeft == right->NumSuccsLeft) {
>        if (left->CycleBound > right->CycleBound)
>          return true;
>      }
>    }
>  }
>  return false;
> }
>
> Namely, I discovered by debugging two specific values L and R, where
> L!=R, so that this operator returns FALSE (by return statement at the
> very end of the method) for both pair (L,R) and (R,L). That means that
> L >= R and R >=L and L!=R at the same time, which cannot be.  This
> incorrect comparison operator breaks all set operations based on the
> ordering of elements, e.g. find() and insert(). Actually, I'm
> wondering why it didn't break the priority queue operations before.
>
> The same operator for  bu_ls_rr_sort has a very long disclaimer
> related to a similar issue:
>  // There used to be a special tie breaker here that looked for
>  // two-address instructions and preferred the instruction with a
>  // def&use operand.  The special case triggered diagnostics when
>  // _GLIBCXX_DEBUG was enabled because it broke the strict weak
>  // ordering that priority_queue requires. It didn't help much anyway
>  // because AddPseudoTwoAddrDeps already covers many of the cases
>  // where it would have applied.  In addition, it's counter-intuitive
>  // that a tie breaker would be the first thing attempted.  There's a
>  // "real" tie breaker below that is the operation of last resort.
>  // The fact that the "special tie breaker" would trigger when there
>  // wasn't otherwise a tie is what broke the strict weak ordering
>  // constraint.
>
> So, there seems to be a problem. But I don't have a solution for it. I
> need you help to understand how to fix it.
>
> Thanks,
>  Roman
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits