bug in PrescheduleNodesWithMultipleUses()

Fri Sep 6 04:29:14 PDT 2013

Hi Andrew,

Changing to Sched::Source seems fine.
Thank you.
Patch attached.

Thank you also for the answers to my questions - appreciated.

I notice that only the X86 target uses resetOperationActions(), which in turn is called by the X86TargetLowering() constructor
All other targets (including XCore) do all the work in the <target>TargetLowering() constructors.
Should anything that does not require the '<target>TargetMachine' argument be moved into resetOperationActions()?
Is this the only intended use of resetOperationActions() - to be called during the <target>TargetLowering() constructor viz:
        // TargetLowering Configuration Methods - These methods should be invoked by
        // the derived class constructor to configure this object for the target.

Robert

________________________________
From: Andrew Trick [atrick at apple.com]
Sent: 04 September 2013 06:36
To: Robert Lytton
Cc: llvm commits
Subject: Re: bug in PrescheduleNodesWithMultipleUses()

On Aug 23, 2013, at 3:11 AM, Robert Lytton <robert at xmos.com<mailto:robert at xmos.com>> wrote:
(this picks up from the thread in llvmdev: "PrescheduleNodesWithMultipleUses() causing failure in PickNodeToScheduleBottomUp() ???")

PrescheduleNodesWithMultipleUses() is a function that tries to transform a DAG where nodes have multiple uses, with the aim of reducing the live range.
(see comment in lib/CodeGen/SelectionDAG/ScheduleDAGRRList.cpp)

However, it does not check to see if callResources will be affected by the transformation.

The problem
========
 Take the following correctly scheduled DAG (arrow to predecessor):

  SetUp2                SetUp1
    ^                     ^
    |                     |
    |                     |
 Destroy2---->PredSU <----SU
    ^           ^         ^
    |           |         |
    |           |         |
    ----------- | ---------
              | | |
            Destroy1
                ^
                |

 In this example there are two successors of 'PredSU' with type
 getCallFrameDestroyOpcode (Destroy) and one is a successor of the other.
 Taking the successor of the two Destroys (Destroy1), note that it's
 matching getCallFrameSetupOpcode (Setup1) is a predecessor of 'SU'.
 In this situation, re-routing the dependency on 'PredSU' through 'SU' will
 cause a dead lock Viz:

  SetUp2      PredSU    SetUp1
    ^            ^        ^
    |            |        |
    |            |        |
 Destroy2-----> SU -------
    ^           ^ ^
    |           | |
    |           | |
    ----------- | |
              | | |
             Destroy1
                ^
                |

The new function callResourceDeadLockDanger() will check for this situation
and prevent PrescheduleNodesWithMultipleUses() from making the transformation
if it will cause a callResource dead lock.

Robert,

I appreciate that you put a lot of care into this patch. Your comments are amazing. But do you absolutely need to call PrescheduleNodesWithMultipleUses? x86_64 and arm don't use it, and it is about to be deprecated.

All targets should move to source order SD scheduling:

<target>TargetLowering::resetOperationActions() {
...
  setSchedulingPreference(Sched::Source);

If instruction scheduling is required, the target can enable MachineScheduler:

bool <target>SubTargetInfo::enableMachineScheduler() const { return true; }

If for some reason the source order scheduler won't work in the short term, you can choose one of these SD schedulers that don't run PrescheduleNodesWithMultipleUses:

setSchedulingPreference(Sched::ILP);
setSchedulingPreference(Sched::Hybrid);

For educational purposes I'll try to answer your questions below (although the only good fix I can see is deleting PrescheduleNodesWithMultipleUses()).

Outstanding issues
============

1. Is it too aggressive in searching predecessors and successors?
   Should the algorithm give up and assume the worst if the depth of search reaches a predefined limit?
   Can someone confirm that the extra processing time is not onerous.

Walking all paths in the DAG is exponential. In a case like this, a search limit is good. Otherwise you need to maintain a visited set.

I also think recursive DAG walking is very bad form. LLVM runtime compilers don’t want to see stack overflow.

2. Should the initial search for 'SetUp1' and 'Destroy1' only search along chains? viz conditional upon II->isCtrl()
   This will reduce the search space, but are getCallFrameSetupOpcode & getCallFrameDestroyOpcode always 'chained'?
    (Later searches for Destroy2 need to check all predecessors)

Yes, the call sequence nodes are chained. Not all nodes within the sequence are chained. If a node in the call sequence has a chain edge, Setup/Destroy will be reachable in that direction.

3. What is the best way to construct the test case?
    Using an IR as input does not guarantee the required DAG will be output for testing.
    The test IR only produces the correct DAG when built for the XCore target.
    (see attached test case)

That’s because other targets don’t call PrescheduleNodesWithMultipleUses.

4. Where should the test be placed?
    Currently I have placed it under test/CodeGen/XCore but it relates to lib/CodeGen/SelectionDAG/ScheduleDAGRRList.cpp

That’s fine. Generic test cases are better unless you need to CHECK the disassembly.

-Andy

Thank you

Robert

<PatchCallResourceDeadLock>_______________________________________________
llvm-commits mailing list
llvm-commits at cs.uiuc.edu<mailto:llvm-commits at cs.uiuc.edu>
http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130906/fe466b8b/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PatchScheduler
Type: application/octet-stream
Size: 2424 bytes
Desc: PatchScheduler
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130906/fe466b8b/attachment.obj>