[llvm-commits] Patch for review: Speeding up ScheduleDAG computations
Evan Cheng
evan.cheng at apple.com
Fri Feb 29 10:25:05 PST 2008
BTW, please commit the depth / height patch before worrying about
priorityqueue issue.
Thanks!
Evan
On Feb 29, 2008, at 9:27 AM, Roman Levenstein wrote:
> Hi Evan, Hi Dan,
>
> Thanks a lot for your review of my patch. I'm working on incorporating
> the feedback. See my comments below.
>
> 2008/2/28, Evan Cheng <evan.cheng at apple.com>:
>
>> Latencies[SU->NodeNum] = 1;
>> This should be 0, not 1 since the original code is:
>> if (SU->Succs.empty())
>> WorkList.push_back(std::make_pair(SU, 0U));
>
> Initially, I have though exactly like you and have written
> Latencies[SU->NodeNum] = 0;
>
> But then I let both old and new algorithm to run on the same data
> side-by-side and compared the results. It turned out that if you set
> this value to 0, then ALL latencies computed by the new algorithm are
> less by 1 then those computed by the old algorithm. Therefore I
> changed this assignment to 1, which fixed the problem, even though I
> do not quite understand why it does not produce the correct result
> with 0,
>
> Do you have any ideas?
>
>> For testing, it would be good to run the original code and then the
>> new code and then verify they produce the same numbers.
>
> This is what I did while developing the algorithm. The numbers were
> exactly the same on my use-cases.
>
>> A bug in depth / height calculation is not likely to cause
>> correctness failures
>> but something much more subtile.
> Indeed ;-)
>
> Now, a new problem I have descovered:
> While improving my patch, I also looked at the LatencyPriorityQueue
> implementation and the RegReductionPriorityQueue implementation. Both
> are using priority queues and have an extremely inefficient remove()
> method, that consumes 90% of the compilation time on very big BBs. So,
> I changed the data structure from the priority queue to the ordered
> std::set. Obviously, the max element is basically the last element of
> the set. This change produces a 3x speedup of compilation time on huge
> basic blocks. But there is a problem (or a bug!):
>
> the comparison operator
> td_ls_rr_sort::operator()(const SUnit* left, const SUnit* right) (see
> ScheduleDAGRRList.cpp)
> seems to be implemented incorrectly (see the code below).
>
> bool td_ls_rr_sort::operator()(const SUnit *left, const SUnit
> *right) const {
> unsigned LPriority = SPQ->getNodePriority(left);
> unsigned RPriority = SPQ->getNodePriority(right);
> bool LIsTarget = left->Node && left->Node->isTargetOpcode();
> bool RIsTarget = right->Node && right->Node->isTargetOpcode();
> bool LIsFloater = LIsTarget && left->NumPreds == 0;
> bool RIsFloater = RIsTarget && right->NumPreds == 0;
> unsigned LBonus = (SumOfUnscheduledPredsOfSuccs(left) == 1) ? 2 : 0;
> unsigned RBonus = (SumOfUnscheduledPredsOfSuccs(right) == 1) ? 2 : 0;
>
> if (left->NumSuccs == 0 && right->NumSuccs != 0)
> return false;
> else if (left->NumSuccs != 0 && right->NumSuccs == 0)
> return true;
>
> // Special tie breaker: if two nodes share a operand, the one that
> use it
> // as a def&use operand is preferred.
> if (LIsTarget && RIsTarget) {
> if (left->isTwoAddress && !right->isTwoAddress) {
> SDNode *DUNode = left->Node->getOperand(0).Val;
> if (DUNode->isOperand(right->Node))
> RBonus += 2;
> }
> if (!left->isTwoAddress && right->isTwoAddress) {
> SDNode *DUNode = right->Node->getOperand(0).Val;
> if (DUNode->isOperand(left->Node))
> LBonus += 2;
> }
> }
> if (LIsFloater)
> LBonus -= 2;
> if (RIsFloater)
> RBonus -= 2;
> if (left->NumSuccs == 1)
> LBonus += 2;
> if (right->NumSuccs == 1)
> RBonus += 2;
>
> if (LPriority+LBonus < RPriority+RBonus)
> return true;
> else if (LPriority == RPriority) {
> if (left->Depth < right->Depth)
> return true;
> else if (left->Depth == right->Depth) {
> if (left->NumSuccsLeft > right->NumSuccsLeft)
> return true;
> else if (left->NumSuccsLeft == right->NumSuccsLeft) {
> if (left->CycleBound > right->CycleBound)
> return true;
> }
> }
> }
> return false;
> }
>
> Namely, I discovered by debugging two specific values L and R, where
> L!=R, so that this operator returns FALSE (by return statement at the
> very end of the method) for both pair (L,R) and (R,L). That means that
> L >= R and R >=L and L!=R at the same time, which cannot be. This
> incorrect comparison operator breaks all set operations based on the
> ordering of elements, e.g. find() and insert(). Actually, I'm
> wondering why it didn't break the priority queue operations before.
>
> The same operator for bu_ls_rr_sort has a very long disclaimer
> related to a similar issue:
> // There used to be a special tie breaker here that looked for
> // two-address instructions and preferred the instruction with a
> // def&use operand. The special case triggered diagnostics when
> // _GLIBCXX_DEBUG was enabled because it broke the strict weak
> // ordering that priority_queue requires. It didn't help much anyway
> // because AddPseudoTwoAddrDeps already covers many of the cases
> // where it would have applied. In addition, it's counter-intuitive
> // that a tie breaker would be the first thing attempted. There's a
> // "real" tie breaker below that is the operation of last resort.
> // The fact that the "special tie breaker" would trigger when there
> // wasn't otherwise a tie is what broke the strict weak ordering
> // constraint.
>
> So, there seems to be a problem. But I don't have a solution for it. I
> need you help to understand how to fix it.
>
> Thanks,
> Roman
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
More information about the llvm-commits
mailing list