[llvm-dev] Enforcing in post-RA scheduling to keep (two) MachineInstrs together

Sat Feb 18 13:56:56 PST 2017

   Hello.
     I would like to report how I solved my problem of bundling groups of two MachineInstrs
     together after the pre-RA scheduler pass. In detail, I do the following:
       - I override the [Target]InstrInfo::expandPostRAPseudo(MachineInstr &MI) method.
         Note that just calling MI.bundleWithPred(), for example, does NOT seem to work.
         More precisely:
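           #include "llvm/CodeGen/MachineInstrBundle.h" // llvm::finalizeBundle()
           #include <iterator>                           // std::prev(), std::next()

           bool [Target]InstrInfo::expandPostRAPseudo(MachineInstr &MI) const {
               ...
               // We now know that MI is the INLINEASM instruction that needs to be
               // bundled with the previous instruction, predMI.
               MachineBasicBlock &MBB = *MI.getParent();
               // Use instr_iterators (which step over individual instructions, not
               // over bundles) to obtain the pred and succ of MI in the MBB.
               MachineBasicBlock::instr_iterator curMI = MI.getIterator();
               assert(curMI != MBB.instr_begin() && "MI has no predecessor");
               MachineBasicBlock::instr_iterator predMI = std::prev(curMI);
               MachineBasicBlock::instr_iterator succMI = std::next(curMI);
               /*
               We do NOT use MIBundleBuilder, just the predMI and succMI iterators.
                   finalizeBundle() bundles the half-open range [predMI, succMI),
                   so succMI = succ(MI) is required in order to bundle the
                   instructions in the interval predMI..MI.
               */
               llvm::finalizeBundle(MBB, predMI, succMI);
               ...
           }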

       - We then need to unpack these bundles by adding the following code to this method:
     void [Target]AsmPrinter::EmitInstruction(const MachineInstr *MI) {
       // Inspired by lib/Target/AMDGPU/AMDGPUMCInstLower.cpp
       if (MI->isBundle()) {
         // Skip the BUNDLE header itself and emit each instruction inside it.
         const MachineBasicBlock *MBB = MI->getParent();
         MachineBasicBlock::const_instr_iterator I = MI->getIterator();
         ++I;
         while (I != MBB->instr_end() && I->isInsideBundle()) {
           EmitInstruction(&*I);
           ++I;
         }
         return;
       }
       ...
     }

       - Then, in InstPrinter/[Target]InstPrinter.cpp, we adjust the following method to
         handle the INLINEASMs that I bundle and later unpack:
       void [Target]InstPrinter::printInst(const MCInst *MI, raw_ostream &O,
                                           StringRef Annot,
                                           const MCSubtargetInfo &STI) {
         /* For some reason, [Target]GenAsmWriter.inc cannot print the INLINEASMs
            from the MachineInstr bundles I create in [Target]InstrInfo.cpp,
            expandPostRAPseudo(), and then unpack in
            [Target]AsmPrinter::EmitInstruction(). So I handle these INLINEASMs
            myself here.
         */
         if (MI->getOpcode() == TargetOpcode::INLINEASM) { // INLINEASM == 1
           // Operand 0 of an INLINEASM holds the asm string.
           printOperand(MI, 0, O);
         } else {
           printInstruction(MI, O);
         }

         printAnnotation(O, Annot);
       }
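The iterator convention in the first step can be made concrete with a small self-contained model. This is plain C++, not LLVM code, and ToyInstr, ToyBlock, toyFinalizeBundle and bundleWithPredecessor are hypothetical names. The model assumes that llvm::finalizeBundle(MBB, First, Last) bundles the half-open range [First, Last) and inserts a BUNDLE header in front of it, which is how I read the MachineInstrBundle.h documentation. A std::list stands in for the basic block's instruction list.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>

// Hypothetical stand-in for a MachineInstr: a name plus a flag that
// mimics MachineInstr::isInsideBundle().
struct ToyInstr {
  std::string Name;
  bool InsideBundle;
};

using ToyBlock = std::list<ToyInstr>;

// Toy model of llvm::finalizeBundle(MBB, First, Last). The range is
// half-open, [First, Last). A "BUNDLE" header is inserted before First,
// and every instruction in the range is marked as being inside the bundle.
void toyFinalizeBundle(ToyBlock &MBB, ToyBlock::iterator First,
                       ToyBlock::iterator Last) {
  assert(First != Last && "cannot bundle an empty range");
  MBB.insert(First, ToyInstr{"BUNDLE", false});
  for (ToyBlock::iterator I = First; I != Last; ++I)
    I->InsideBundle = true;
}

// Bundles MI with its predecessor, mirroring the expandPostRAPseudo() step.
// The end iterator must be succ(MI); passing MI would leave MI itself out.
void bundleWithPredecessor(ToyBlock &MBB, ToyBlock::iterator MI) {
  assert(MI != MBB.begin() && "MI has no predecessor to bundle with");
  ToyBlock::iterator PredMI = std::prev(MI);
  ToyBlock::iterator SuccMI = std::next(MI);
  toyFinalizeBundle(MBB, PredMI, SuccMI);
}
```

For the block VLOAD_D_WO_IMM, INLINEASM, ADD, calling bundleWithPredecessor on the INLINEASM produces BUNDLE, VLOAD_D_WO_IMM (inside), INLINEASM (inside), ADD. If the end iterator were MI instead of succ(MI), only VLOAD_D_WO_IMM would end up in the bundle.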

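The last two steps can be modelled together. In the sketch below, which is plain C++ with hypothetical names and not LLVM code, toyEmitFunctionBody plays the role of AsmPrinter::EmitFunctionBody(). My reading of AsmPrinter.cpp, which is not confirmed in this thread, is that EmitFunctionBody() visits only top-level instructions and handles a top-level TargetOpcode::INLINEASM itself. Everything else goes to the target's EmitInstruction(). Under that reading, an INLINEASM hidden inside a bundle bypasses the generic path and reaches the target's printInst(), and the TableGen-generated printer has no case for it. That would explain the "For some reason" comment in printInst(). The second test points to one possible reason why MI.bundleWithPred() alone did not work: it marks MI as inside a bundle without creating a BUNDLE header, so an emitter that skips bundled instructions never prints MI.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>

// Hypothetical opcodes for the model; only their roles matter here.
enum ToyOpcode { TOY_INLINEASM, TOY_BUNDLE, TOY_VLOAD, TOY_ADD };

// Hypothetical stand-in for a MachineInstr (not the LLVM class).
struct ToyInstr {
  ToyOpcode Opcode;
  std::string Text;  // asm string for INLINEASM, assembly text otherwise
  bool InsideBundle; // mimics MachineInstr::isInsideBundle()
};

using ToyBlock = std::list<ToyInstr>;

// Stand-in for the TableGen-generated printInstruction(): it only knows the
// target's own opcodes and has no entry for INLINEASM or BUNDLE.
bool toyGeneratedPrint(const ToyInstr &MI, std::string &O) {
  if (MI.Opcode == TOY_INLINEASM || MI.Opcode == TOY_BUNDLE)
    return false;
  O += MI.Text + "\n";
  return true;
}

// Stand-in for [Target]InstPrinter::printInst(): INLINEASM is printed by
// hand (like printOperand(MI, 0, O)), everything else by the generated code.
void toyPrintInst(const ToyInstr &MI, std::string &O) {
  if (MI.Opcode == TOY_INLINEASM) {
    O += MI.Text + "\n";
    return;
  }
  bool Printed = toyGeneratedPrint(MI, O);
  assert(Printed && "generated printer has no entry for this opcode");
  (void)Printed;
}

// Stand-in for [Target]AsmPrinter::EmitInstruction(): a BUNDLE header is not
// printed itself; the instructions inside it are emitted one by one instead.
void toyTargetEmit(const ToyBlock &MBB, ToyBlock::const_iterator MI,
                   std::string &O) {
  if (MI->Opcode == TOY_BUNDLE) {
    for (auto I = std::next(MI); I != MBB.end() && I->InsideBundle; ++I)
      toyTargetEmit(MBB, I, O);
    return;
  }
  toyPrintInst(*MI, O);
}

// Stand-in for AsmPrinter::EmitFunctionBody(): visits only top-level
// instructions (like a bundle iterator) and handles a top-level INLINEASM
// itself, so only bundled INLINEASMs reach the target code above.
std::string toyEmitFunctionBody(const ToyBlock &MBB) {
  std::string O;
  for (auto I = MBB.begin(); I != MBB.end(); ++I) {
    if (I->InsideBundle)
      continue;
    if (I->Opcode == TOY_INLINEASM)
      O += I->Text + "\n"; // generic inline-asm path
    else
      toyTargetEmit(MBB, I, O);
  }
  return O;
}
```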
     These changes in the back end allow me to keep some INLINEASMs attached to their
     preceding instructions, so that they stay next to each other after the post-RA
     scheduler, which I run to avoid some data hazards in the resulting ASM program.
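TargetPassConfig::addMachinePasses() appears to run the ExpandPostRAPseudos pass (which calls expandPostRAPseudo()) before the post-RA scheduling pass. If so, the scheduler already sees a single BUNDLE rather than two separate instructions. The toy below illustrates the effect. It is a hypothetical greedy "hazard filler" in plain C++, not LLVM's PostRASchedulerList or PostMachineScheduler. The register numbers and the extra MUL and ADD instructions are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy instruction: a name plus the registers it defines and uses.
struct ToyInstr {
  std::string Name;
  std::set<int> Defs;
  std::set<int> Uses;
};

// A scheduling unit: either one instruction or a bundle of instructions that
// can only be moved as a whole.
using Unit = std::vector<ToyInstr>;

std::set<int> unitDefs(const Unit &U) {
  std::set<int> R;
  for (const ToyInstr &MI : U)
    R.insert(MI.Defs.begin(), MI.Defs.end());
  return R;
}

std::set<int> unitUses(const Unit &U) {
  std::set<int> R;
  for (const ToyInstr &MI : U)
    R.insert(MI.Uses.begin(), MI.Uses.end());
  return R;
}

bool intersects(const std::set<int> &A, const std::set<int> &B) {
  for (int Reg : A)
    if (B.count(Reg))
      return true;
  return false;
}

// True if X and Y have no register dependence (RAW, WAR or WAW) between them.
bool independent(const Unit &X, const Unit &Y) {
  std::set<int> DX = unitDefs(X), DY = unitDefs(Y);
  return !intersects(DX, unitUses(Y)) && !intersects(DY, unitUses(X)) &&
         !intersects(DX, DY);
}

// Greedy toy hazard filler: whenever unit I+1 reads a register written by
// unit I, move the first later unit J that is independent of units I..J-1
// into the slot between them. Only whole units move, so a bundle never splits.
std::vector<std::string> toyFillHazards(std::vector<Unit> Units) {
  for (std::size_t I = 0; I + 1 < Units.size(); ++I) {
    if (!intersects(unitDefs(Units[I]), unitUses(Units[I + 1])))
      continue; // no read-after-write hazard between these neighbours
    for (std::size_t J = I + 2; J < Units.size(); ++J) {
      bool CanMove = true;
      for (std::size_t K = I; K < J && CanMove; ++K)
        CanMove = independent(Units[J], Units[K]);
      if (!CanMove)
        continue;
      Unit Filler = Units[J];
      Units.erase(Units.begin() + static_cast<std::ptrdiff_t>(J));
      Units.insert(Units.begin() + static_cast<std::ptrdiff_t>(I + 1), Filler);
      break;
    }
  }
  std::vector<std::string> Order;
  for (const Unit &U : Units)
    for (const ToyInstr &MI : U)
      Order.push_back(MI.Name);
  return Order;
}
```

With VLOAD_D_WO_IMM and INLINEASM as separate units, ADD is moved between them. With the two bundled into one unit, ADD fills the hazard slot before MUL instead, and the pair stays adjacent.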

   Best regards,
     Alex


On 2/13/2017 5:02 AM, Alex Susu wrote:
>   Hello.
>     After looking at the debug information from llc, it actually seems that the pre-RA
> scheduler (NOT the post-RA scheduler) is the one separating my INLINEASM SDNodes from the
> "associated" instructions in my program (there is a simple dataflow edge between the
> INLINEASM and the associated node). Note that the removal of this dependence/dataflow
> edge is THE reason why the post-RA scheduler generates ASM code with other instructions
> in between.
>
>     Is it possible to generate instruction bundles (or pseudo-instructions) in the pre-RA
> scheduler pass? At http://llvm.org/docs/CodeGenerator.html#machineinstr-bundles it is
> written that: "Packing / bundling of MachineInstr’s should be done as part of the register
> allocation super-pass.", etc.
>
>     Matthias, thank you for pointing out that at least the register allocator can move
> my 2 instructions around - but note that a MachineSDNode with one destination register and
> an immediate value, followed by an INLINEASM (which has no register), should NOT be
> separated by the register allocator. Which other llc passes (at llc -O3) do you think
> could separate my 2 instructions?
>
>     I will read about mutations in the documentation (for example,
> http://llvm.org/docs/doxygen/html/classllvm_1_1ScheduleDAGMI.html and
> http://llvm.org/docs/doxygen/html/MachineScheduler_8h_source.html).
>
>   Thank you,
>     Alex
>
>
> On 2/10/2017 11:36 PM, Krzysztof Parzyszek via llvm-dev wrote:
>> On 2/10/2017 3:26 PM, Matthias Braun via llvm-dev wrote:
>>> That said, if you use the PostMachineScheduler you can insert a schedule dag mutation
>>> in createPostMachineScheduler() that adds a cluster edge between the two nodes so
>>> the scheduler tries hard to keep them together. Unfortunately this doesn't always
>>> work today because the scheduling model is always checked for stalls first (Pending
>>> vs. Available lists in the MachineScheduler) before the scheduler even checks its
>>> usual cost function with the cluster heuristic.
>>
>> You can do that with the regular post-RA scheduler as well via
>> "TargetSubtargetInfo::getPostRAMutations".
>>
>> -Krzysztof
>
>
>
>
> With best regards,
>     Alex Susu
>
> On 2/10/2017 11:26 PM, Matthias Braun wrote:
>>
>>> On Feb 10, 2017, at 12:52 PM, Alex Susu via llvm-dev <llvm-dev at lists.llvm.org>
>>> wrote:
>>>
>>> Hello. I am using the post-RA (Register Allocation) scheduler to avoid data hazards
>>> by inserting other USEFUL instructions from the program (besides NOPs), and it breaks
>>> apart some sequences of instructions which should remain "glued" together. More
>>> precisely, in my [Target]ISelDAGToDAG.cpp I may replace, for example, a
>>> BUILD_VECTOR with a machine SDNode called VLOAD_D_WO_IMM and an INLINEASM, the latter
>>> having a simple dataflow dependence (a black solid edge when outputting the DAG as a
>>> .DOT after instruction selection) on the result of the former instruction. (I can
>>> present the .DOT after instruction selection obtained with llc -view-sched-dags.)
>>> When I run the default pre-RA scheduler (which seems to be a "List Scheduling"
>>> algorithm), I always obtain generated ASM code where the string of the INLINEASM
>>> immediately follows the associated asm instruction for the VLOAD_D_WO_IMM. But
>>> when I also use the post-RA scheduler (llc -post-RA-scheduler ...), I get some
>>> different instructions inserted between the VLOAD_D_WO_IMM and the INLINEASM, which
>>> is semantically incorrect.
>>>
>>> How can I prevent these 2 instructions from being separated by the post-RA scheduler?
>>> Can I customize the behavior of the post-RA scheduler (I found some documentation at
>>> http://llvm.org/docs/doxygen/html/PostRASchedulerList_8cpp.html)?
>>>
>>> The first natural idea was to use SelectionDAG glue edges, but I noticed that they
>>> are not very reliable (sometimes I even have difficulty creating them, for
>>> example in the classes [Target]ISelDAGToDAG and [Target]ISelLowering). I also
>>> understood that the scheduler can disregard the glue edges between SelectionDAG
>>> nodes anyway. For example:
>>> - from http://lists.llvm.org/pipermail/llvm-dev/2014-June/074046.html:
>>> <<You can't Glue the two nodes together forever. All Glue really does is keep them
>>> together long enough for LLVM to put together a data dependency through "Uses" and
>>> "Defs" implicit operands. Once the MachineInstrs have been created, the two
>>> instructions are at the whim of the scheduler as much as any others. If you really
>>> need them to remain together, you have to either create a pseudo-instruction and
>>> expand it extremely late, or create a bundle (depending on what's natural for your
>>> target).>>
>>> - from http://lists.llvm.org/pipermail/llvm-dev/2016-June/100885.html:
>>> <<If you want to have these nodes stick together, using glue may not be sufficient.
>>> After the machine instructions are generated, the scheduler may place instructions
>>> between the interrupt disable/restore and the atomic load itself. Also, the register
>>> allocator may insert some spills there---there are ways that this sequence may get
>>> separated. For this, the best approach may be to define a pseudo-instruction, which
>>> will be expanded into real instructions in the post-RA expansion pass.>>
>>>
>>> Also, I don't want to use MachineInstr bundles or pseudo-instructions. MachineInstr
>>> bundles seem too difficult to use and come too late in code generation (I prefer
>>> working at the level of instruction selection). Also, I found little information
>>> about pseudo-instructions: there is some API support, namely expandPostRAPseudo(),
>>> described at http://llvm.org/docs/doxygen/html/classllvm_1_1TargetInstrInfo.html,
>>> and some documentation at
>>> http://llvm.org/devmtg/2014-04/PDFs/Talks/Building%20an%20LLVM%20backend.pdf, slide
>>> 55 (and 53, 54).
>
>> Well, if it is two instructions, then there is always a chance that some pass moves them
>> around or inserts new instructions in between (esp. regalloc may insert
>> spills/reloads/copies). The only guaranteed solution is indeed to use a pseudo instruction
>> or an instruction bundle so the instructions look like a single unit to codegen.
>
>> That said, if you use the PostMachineScheduler you can insert a schedule dag mutation
>> in createPostMachineScheduler() that adds a cluster edge between the two nodes so the
>> scheduler tries hard to keep them together. Unfortunately this doesn't always work
>> today because the scheduling model is always checked for stalls first (Pending vs.
>> Available lists in the MachineScheduler) before the scheduler even checks its usual
>> cost function with the cluster heuristic.
>>
>> - Matthias
>>
>>
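Matthias's caveat about cluster edges can be illustrated with a toy list scheduler. It is plain C++ with hypothetical names, not the MachineScheduler or PostRASchedulerList code. addClusterEdge() stands in for a DAG mutation such as one added in createPostMachineScheduler() or returned from TargetSubtargetInfo::getPostRAMutations(). The picker prefers the cluster successor, but only among nodes that are already available. So when INLINEASM is still pending because of a latency stall, an unrelated instruction issues in between. The latencies and the extra ADD are made up for the example.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy DAG node (not LLVM's SUnit): predecessor edges carry a
// latency, and ClusterSucc names a node the scheduler should try to pick
// right after this one (-1 means none).
struct ToyNode {
  std::string Name;
  std::vector<std::pair<int, int>> Preds; // (predecessor index, latency)
  int ClusterSucc;
};

// Plays the role of a DAG mutation that adds a cluster edge A -> B.
void addClusterEdge(std::vector<ToyNode> &DAG, int A, int B) {
  DAG[A].ClusterSucc = B;
}

// Toy in-order list scheduler issuing at most one node per cycle. Nodes whose
// operands are not ready yet stay "pending"; the cluster preference is only
// applied among the "available" ones, so a latency stall can break a cluster.
std::vector<std::string> toySchedule(const std::vector<ToyNode> &DAG) {
  const int N = static_cast<int>(DAG.size());
  std::vector<int> IssueCycle(N, -1); // -1 = not scheduled yet
  std::vector<std::string> Order;
  int Last = -1;
  for (int Now = 0; static_cast<int>(Order.size()) < N; ++Now) {
    std::vector<int> Available;
    for (int I = 0; I < N; ++I) {
      if (IssueCycle[I] >= 0)
        continue;
      bool Ready = true;
      for (const auto &P : DAG[I].Preds)
        if (IssueCycle[P.first] < 0 || IssueCycle[P.first] + P.second > Now)
          Ready = false;
      if (Ready)
        Available.push_back(I);
    }
    if (Available.empty())
      continue; // stall cycle: every remaining node is pending
    int Pick = Available.front(); // default: original node order
    if (Last >= 0 && DAG[Last].ClusterSucc >= 0)
      for (int I : Available)
        if (I == DAG[Last].ClusterSucc)
          Pick = I; // cluster heuristic wins, but only if it is available
    IssueCycle[Pick] = Now;
    Order.push_back(DAG[Pick].Name);
    Last = Pick;
  }
  return Order;
}

// Example from this thread: INLINEASM reads the result of VLOAD_D_WO_IMM with
// the given latency; ADD is independent and sits between them in node order.
std::vector<ToyNode> makeExampleDAG(int Latency, bool WithCluster) {
  std::vector<ToyNode> DAG;
  DAG.push_back({"VLOAD_D_WO_IMM", {}, -1});
  DAG.push_back({"ADD", {}, -1});
  DAG.push_back({"INLINEASM", {std::make_pair(0, Latency)}, -1});
  if (WithCluster)
    addClusterEdge(DAG, 0, 2);
  return DAG;
}
```

With latency 1 the cluster edge keeps VLOAD_D_WO_IMM and INLINEASM together. With latency 2 the stall takes precedence and ADD is issued between them. In this model, only a bundle or a pseudo-instruction gives a guarantee.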