[llvm-commits] Patch for review: Speeding up ScheduleDAG computations

Roman Levenstein romix.llvm at googlemail.com
Fri Feb 29 09:27:32 PST 2008


Hi Evan, Hi Dan,

Thanks a lot for your review of my patch. I'm working on incorporating
the feedback. See my comments below.

2008/2/28, Evan Cheng <evan.cheng at apple.com>:

>   Latencies[SU->NodeNum] = 1;
>  This should be 0, not 1 since the original code is:
>      if (SU->Succs.empty())
>        WorkList.push_back(std::make_pair(SU, 0U));

Initially, I have though exactly like you and have written
 Latencies[SU->NodeNum] = 0;

But then I let both old and new algorithm to run on the same data
side-by-side and compared the results. It turned out that if you set
this value to 0, then ALL latencies computed by the new algorithm are
less by 1 then those computed by the old algorithm. Therefore I
changed this assignment to 1, which fixed the problem, even though I
do not quite understand why it does not produce the correct result
with 0,

Do you have any ideas?

>  For testing, it would be good to run the original code and then the
>  new code and then verify they produce the same numbers.

This is what I did while developing the algorithm. The numbers were
exactly the same on my use-cases.

> A bug in depth / height calculation is not likely to cause correctness failures
>  but something much more subtile.
Indeed ;-)

Now,  a new problem I have descovered:
While improving my patch, I also looked at the LatencyPriorityQueue
implementation and  the RegReductionPriorityQueue implementation. Both
are using priority queues and have an extremely inefficient remove()
method, that consumes 90% of the compilation time on very big BBs. So,
I changed the data structure from the priority queue to the ordered
std::set. Obviously, the max element is basically the last element of
the set. This change produces a 3x speedup of compilation time on huge
basic blocks. But there is a problem (or a bug!):

the comparison operator
 td_ls_rr_sort::operator()(const SUnit* left, const SUnit* right) (see
ScheduleDAGRRList.cpp)
seems to be implemented incorrectly (see the code below).

bool td_ls_rr_sort::operator()(const SUnit *left, const SUnit *right) const {
 unsigned LPriority = SPQ->getNodePriority(left);
 unsigned RPriority = SPQ->getNodePriority(right);
  bool LIsTarget = left->Node && left->Node->isTargetOpcode();
  bool RIsTarget = right->Node && right->Node->isTargetOpcode();
  bool LIsFloater = LIsTarget && left->NumPreds == 0;
  bool RIsFloater = RIsTarget && right->NumPreds == 0;
  unsigned LBonus = (SumOfUnscheduledPredsOfSuccs(left) == 1) ? 2 : 0;
  unsigned RBonus = (SumOfUnscheduledPredsOfSuccs(right) == 1) ? 2 : 0;

  if (left->NumSuccs == 0 && right->NumSuccs != 0)
    return false;
  else if (left->NumSuccs != 0 && right->NumSuccs == 0)
    return true;

  // Special tie breaker: if two nodes share a operand, the one that use it
  // as a def&use operand is preferred.
  if (LIsTarget && RIsTarget) {
    if (left->isTwoAddress && !right->isTwoAddress) {
      SDNode *DUNode = left->Node->getOperand(0).Val;
      if (DUNode->isOperand(right->Node))
        RBonus += 2;
    }
    if (!left->isTwoAddress && right->isTwoAddress) {
      SDNode *DUNode = right->Node->getOperand(0).Val;
      if (DUNode->isOperand(left->Node))
        LBonus += 2;
    }
  }
  if (LIsFloater)
    LBonus -= 2;
  if (RIsFloater)
    RBonus -= 2;
  if (left->NumSuccs == 1)
    LBonus += 2;
  if (right->NumSuccs == 1)
    RBonus += 2;

  if (LPriority+LBonus < RPriority+RBonus)
    return true;
  else if (LPriority == RPriority) {
    if (left->Depth < right->Depth)
      return true;
    else if (left->Depth == right->Depth) {
      if (left->NumSuccsLeft > right->NumSuccsLeft)
        return true;
      else if (left->NumSuccsLeft == right->NumSuccsLeft) {
        if (left->CycleBound > right->CycleBound)
          return true;
      }
    }
  }
  return false;
}

Namely, I discovered by debugging two specific values L and R, where
L!=R, so that this operator returns FALSE (by return statement at the
very end of the method) for both pair (L,R) and (R,L). That means that
L >= R and R >=L and L!=R at the same time, which cannot be.  This
incorrect comparison operator breaks all set operations based on the
ordering of elements, e.g. find() and insert(). Actually, I'm
wondering why it didn't break the priority queue operations before.

The same operator for  bu_ls_rr_sort has a very long disclaimer
related to a similar issue:
  // There used to be a special tie breaker here that looked for
  // two-address instructions and preferred the instruction with a
  // def&use operand.  The special case triggered diagnostics when
  // _GLIBCXX_DEBUG was enabled because it broke the strict weak
  // ordering that priority_queue requires. It didn't help much anyway
  // because AddPseudoTwoAddrDeps already covers many of the cases
  // where it would have applied.  In addition, it's counter-intuitive
  // that a tie breaker would be the first thing attempted.  There's a
  // "real" tie breaker below that is the operation of last resort.
  // The fact that the "special tie breaker" would trigger when there
  // wasn't otherwise a tie is what broke the strict weak ordering
  // constraint.

So, there seems to be a problem. But I don't have a solution for it. I
need you help to understand how to fix it.

Thanks,
  Roman



More information about the llvm-commits mailing list