[llvm-dev] Enforcing in post-RA scheduling to keep (two) MachineInstrs together

Sun Feb 12 19:02:37 PST 2017

   Hello.
     After looking at the debug information from llc, it actually seems that the pre-RA
scheduler (NOT the post-RA scheduler) is the one separating my INLINEASM SDNodes from the
"associated" instructions in my program (there is a simple dataflow edge between the
INLINEASM and the associated node).

     Is it possible to generate instruction bundles (or pseudo-instructions) in the pre-RA 
scheduler pass? At http://llvm.org/docs/CodeGenerator.html#machineinstr-bundles it is 
written that: "Packing / bundling of MachineInstr’s should be done as part of the register 
allocation super-pass.", etc.

     Matthias, thank you for pointing out that at least the register allocator can move 
around my 2 instructions - but note that a MachineSDNode with one destination register and 
an immediate value and a consecutive INLINEASM (which has no register) should NOT be 
separated by the register allocator. What other passes from llc (llc -O3) do you
think could separate my 2 instructions?

     I will read about mutations in the documentation (for example, 
http://llvm.org/docs/doxygen/html/classllvm_1_1ScheduleDAGMI.html and 
http://llvm.org/docs/doxygen/html/MachineScheduler_8h_source.html).

   Thank you,
     Alex


On 2/10/2017 11:36 PM, Krzysztof Parzyszek via llvm-dev wrote:
> On 2/10/2017 3:26 PM, Matthias Braun via llvm-dev wrote:
>> That said, if you use the PostMachineScheduler you can insert a schedule dag mutation
>> in createPostMachineScheduler() that adds a cluster edge between the two nodes so
>> the scheduler tries hard to keep them together. Unfortunately this doesn't always work
>> today because the schedule model is always checked for stalls first (Pending
>> vs. Available lists in the MachineScheduler) before the scheduler even checks its
>> usual cost function with the cluster heuristic.
>
> You can do that with the regular post-RA scheduler as well via
> "TargetSubtargetInfo::getPostRAMutations".
>
> -Krzysztof


With best regards,
     Alex Susu

On 2/10/2017 11:26 PM, Matthias Braun wrote:
>
>> On Feb 10, 2017, at 12:52 PM, Alex Susu via llvm-dev <llvm-dev at lists.llvm.org>
>> wrote:
>>
>> Hello. I am using the post-RA (Register Allocation) scheduler to avoid data hazards
>> by inserting other USEFUL instructions from the program (besides NOPs) and it breaks
>> apart some sequences of instructions which should remain "glued" together. More
>> exactly, in my [Target]ISelDAGToDAG.cpp it is possible that I replace for example a
>> BUILD_VECTOR with a machine SDNode called VLOAD_D_WO_IMM and an INLINEASM, the latter
>> having a simple dataflow dependence (black solid edge when outputting the DAG as a
>> .DOT after instruction selection) on the result of the former instruction. (I can
>> present the .DOT after instruction selection obtained with llc -view-sched-dags).
>> When I run the default pre-RA scheduler (which seems to be a "List Scheduling"
>> algorithm) I always obtain the generated ASM code where the string of the INLINEASM
>> follows immediately after the associated asm instruction for the VLOAD_D_WO_IMM. But
>> when I use also the post-RA scheduler (llc -post-RA-scheduler ...) I get some
>> different instructions inserted between the VLOAD_D_WO_IMM and the INLINEASM, which
>> is not correct semantically.
>>
>> How can I avoid these 2 instructions being separated by the post-RA scheduler? Can I
>> customize the behavior of the post-RA scheduler (I found some documentation at
>> http://llvm.org/docs/doxygen/html/PostRASchedulerList_8cpp.html)?
>>
>> The first natural idea was to use SelectionDAG glue edges, but I noticed that they
>> are not very reliable (sometimes I even have difficulties in creating them for
>> example in the classes [Target]ISelDAGToDAG, [Target]ISelLowering). Also I understood
>> that anyhow the scheduler can disregard the glue edges between SelectionDAG nodes.
>> For example: - from http://lists.llvm.org/pipermail/llvm-dev/2014-June/074046.html
>> <<You can't Glue the two nodes together forever. All Glue really does is keep them
>> together long enough for LLVM to put together a data dependency through "Uses" and
>> "Defs" implicit operands. Once the MachineInstrs have been created, the two
>> instructions are at the whim of the scheduler as much as any others. If you really
>> need them to remain together, you have to either create a pseudo-instruction and
>> expand it extremely late, or create a bundle (depending on what's natural for your
>> target).>> - from http://lists.llvm.org/pipermail/llvm-dev/2016-June/100885.html:
>> <<If you want to have these nodes stick together, using glue may not be sufficient.
>> After the machine instructions are generated, the scheduler may place instructions
>> between the interrupt disable/restore and the atomic load itself.  Also, the register
>> allocator may insert some spills there---there are ways that this sequence may get
>> separated. For this, the best approach may be to define a pseudo-instruction, which
>> will be expanded into real instruction in the post-RA expansion pass.>>
>>
>> Also, I don't want to use MachineInstr bundles or pseudo-instructions. MachineInstr
>> bundles seem too difficult to use and too late in the code generation (I prefer
>> working at the level of instruction selection). Also, I found little information
>> about pseudo-instructions - there is some API support, namely expandPostRAPseudo()
>> described at http://llvm.org/docs/doxygen/html/classllvm_1_1TargetInstrInfo.html.
>> Also, some documentation at
>> http://llvm.org/devmtg/2014-04/PDFs/Talks/Building%20an%20LLVM%20backend.pdf, slide
>> 55 (and 53, 54).

> Well if it is two instructions, then there is always a chance that some pass moves them
> around or inserts new instructions in between (esp. regalloc may insert
> spills/reloads/copies). The only guaranteed solution is indeed to use a pseudo instruction
> or an instruction bundle so the instructions look like a single unit to codegen.
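
To illustrate the "single unit" argument, here is a minimal toy model in C++. It is
not LLVM code and uses no LLVM API: ToyInstr, toyReorder, toyExpandLate, toyAdjacent
and the pseudo name VLOAD_PLUS_ASM_PSEUDO are all hypothetical names made up for this
sketch. The assumption is that a reordering pass only sees whole list entries, so a
pair hidden behind one pseudo entry cannot be split until it is expanded late.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for an instruction (hypothetical, not llvm::MachineInstr).
struct ToyInstr {
    std::string name;
    // Real instructions this entry stands for; empty for an ordinary instruction.
    std::vector<std::string> expansion;
};

// Stand-in for any pass that may move instructions around.
// Here it simply reverses the list, which is the harshest reordering possible.
std::vector<ToyInstr> toyReorder(std::vector<ToyInstr> code) {
    std::reverse(code.begin(), code.end());
    return code;
}

// Stand-in for a late expansion step: each pseudo entry is replaced by its
// expansion, in order, so the expanded instructions come out adjacent.
std::vector<std::string> toyExpandLate(const std::vector<ToyInstr>& code) {
    std::vector<std::string> out;
    for (const ToyInstr& mi : code) {
        if (mi.expansion.empty())
            out.push_back(mi.name);
        else
            out.insert(out.end(), mi.expansion.begin(), mi.expansion.end());
    }
    return out;
}

// True if `second` appears immediately after `first` somewhere in `code`.
bool toyAdjacent(const std::vector<std::string>& code,
                 const std::string& first, const std::string& second) {
    for (std::size_t i = 0; i + 1 < code.size(); ++i)
        if (code[i] == first && code[i + 1] == second)
            return true;
    return false;
}
```

Feeding the list ADD, VLOAD_PLUS_ASM_PSEUDO (expanding to VLOAD_D_WO_IMM followed by
INLINEASM), MUL through toyReorder and then toyExpandLate keeps the pair adjacent,
because the reordering step never saw the two instructions separately.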

> That said, if you use the PostMachineScheduler you can insert a schedule dag mutation
> in createPostMachineScheduler() that adds a cluster edge between the two nodes so the
> scheduler tries hard to keep them together. Unfortunately this doesn't always work
> today because the schedule model is always checked for stalls first (Pending vs.
> Available lists in the MachineScheduler) before the scheduler even checks its usual
> cost function with the cluster heuristic.
>
> - Matthias
>
>
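
The Pending vs. Available caveat can be sketched with a small toy list scheduler in
C++. This is only an illustration under simplifying assumptions, not the
MachineScheduler implementation: ToyNode and toySchedule are hypothetical names, each
node has at most one predecessor, one node issues per cycle, and the "cluster"
preference is reduced to a single flag. The point it tries to show is that the
cluster preference is only consulted among nodes that are already stall-free, so a
clustered successor that is still waiting on latency can be overtaken.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy scheduling node (hypothetical, not llvm::SUnit).
struct ToyNode {
    std::string name;
    int pred;      // index of the single predecessor, or -1 if none
    int latency;   // cycles after pred issues before this node is stall-free
    bool cluster;  // true if this node should directly follow its predecessor
};

// Issues one node per cycle. Assumes the pred links are acyclic.
std::vector<std::string> toySchedule(const std::vector<ToyNode>& nodes) {
    const int n = static_cast<int>(nodes.size());
    std::vector<int> issueCycle(n, -1);  // -1 means not scheduled yet
    std::vector<std::string> order;
    int last = -1;   // index of the most recently issued node
    int cycle = 0;
    while (static_cast<int>(order.size()) < n) {
        // Stall check comes first: only stall-free nodes become "available".
        // A released node that would still stall stays "pending".
        std::vector<int> available;
        for (int i = 0; i < n; ++i) {
            if (issueCycle[i] != -1) continue;             // already issued
            const int p = nodes[i].pred;
            assert(p < n);
            if (p != -1 && issueCycle[p] == -1) continue;  // pred not issued
            const int ready = (p == -1) ? 0 : issueCycle[p] + nodes[i].latency;
            if (ready <= cycle) available.push_back(i);
        }
        if (available.empty()) { ++cycle; continue; }      // nothing to issue
        // Only now is the cluster preference considered, among available nodes.
        int pick = available.front();
        for (int i : available) {
            if (nodes[i].cluster && nodes[i].pred == last) { pick = i; break; }
        }
        issueCycle[pick] = cycle;
        order.push_back(nodes[pick].name);
        last = pick;
        ++cycle;
    }
    return order;
}
```

With nodes VLOAD_D_WO_IMM, INLINEASM (clustered to the load) and an independent ADD,
a latency of 1 on the INLINEASM keeps the pair together, while a latency of 2 leaves
the INLINEASM pending for one cycle and the toy scheduler issues ADD in between.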