[llvm-dev] Enforcing in post-RA scheduling to keep (two) MachineInstrs together
Alex Susu via llvm-dev
llvm-dev at lists.llvm.org
Sat Feb 18 13:56:56 PST 2017
Hello.
I would like to report how I solved my problem of bundling pairs of
MachineInstrs together after the pre-RA scheduler pass. In detail, I do the following:
- I override the [Target]InstrInfo::expandPostRAPseudo(MachineInstr &MI) method.
Note that just calling MI.bundleWithPred(), for example, does NOT seem to work. More precisely:
bool [Target]InstrInfo::expandPostRAPseudo(MachineInstr &MI) const {
  ...
  // We now know that MI is the INLINEASM instruction that needs to be
  // bundled with the previous instruction, predMI.
  // I have to iterate through the MBB (Machine Basic Block) to obtain the
  // predecessor and successor of MI.
  /*
    We do NOT use MIBundleBuilder, just the predMI and succMI iterators.
    Note that succMI is required because finalizeBundle() bundles the
    half-open interval [predMI, succMI), i.e. predMI..MI, where
    succMI = succ(MI).
  */
  llvm::finalizeBundle(MBB,
                       (MachineBasicBlock::instr_iterator)predMI,
                       (MachineBasicBlock::instr_iterator)succMI);
  ...
}
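finalizeBundle() takes a half-open range, which is why succMI rather than MI is passed as the second iterator. The following is a self-contained toy model, not LLVM code (ToyInstr, Block and toyFinalizeBundle are hypothetical names), of what LLVM documents finalizeBundle(MBB, FirstMI, LastMI) as doing: it inserts a BUNDLE instruction at FirstMI and ties every instruction up to, but excluding, LastMI to it. The real function also copies externally visible defs/uses onto the BUNDLE and marks internal reads, which the sketch omits. It also suggests one plausible reason a bare MI.bundleWithPred() was not enough with the AsmPrinter code below: that call only sets the bundling flags and creates no BUNDLE header, so the isBundle() branch would never fire.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>

// Toy stand-in for a MachineInstr: an opcode name plus the two flags that
// chain bundle members together (hypothetical simplification).
struct ToyInstr {
  std::string Opcode;
  bool BundledPred = false;  // glued to the previous instruction
  bool BundledSucc = false;  // glued to the next instruction
  bool isInsideBundle() const { return BundledPred; }
};

using Block = std::list<ToyInstr>;

// Mimics the contract of llvm::finalizeBundle(MBB, First, Last): the range
// [First, Last) is half-open, so Last must point one past the final member.
// A BUNDLE header is inserted before First and every member is chained to it.
Block::iterator toyFinalizeBundle(Block &MBB, Block::iterator First,
                                  Block::iterator Last) {
  assert(First != Last && "bundle must contain at least one instruction");
  Block::iterator Header = MBB.insert(First, ToyInstr{"BUNDLE"});
  for (Block::iterator I = Header; I != Last; ++I) {
    Block::iterator Next = std::next(I);
    if (Next == Last)
      break;  // Last itself is never pulled into the bundle
    I->BundledSucc = true;
    Next->BundledPred = true;
  }
  return Header;
}
```

Passing MI instead of succMI as the end iterator would leave the INLINEASM outside the bundle, because the end of the range is excluded.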
- We need to unpack these bundles by adding the following code to this method:
void [Target]AsmPrinter::EmitInstruction(const MachineInstr *MI) {
  // Inspired by lib/Target/AMDGPU/AMDGPUMCInstLower.cpp
  if (MI->isBundle()) {
    const MachineBasicBlock *MBB = MI->getParent();
    MachineBasicBlock::const_instr_iterator I = MI->getIterator();
    ++I;
    // Emit every instruction that belongs to this bundle, in order.
    while (I != MBB->instr_end() && I->isInsideBundle()) {
      EmitInstruction(&*I);
      ++I;
    }
    return;
  }
  ...
}
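The unpacking step can be modelled the same way. In this hypothetical sketch (ToyInstr, emitInstruction and emitBlock are illustrative names, not LLVM API), the top-level loop visits only instructions that are not inside a bundle, which is how the AsmPrinter's walk over a basic block behaves, and a BUNDLE header expands into its members in order, as in the EmitInstruction() override above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical flat instruction stream: a BUNDLE header followed by members
// whose InsideBundle flag is set, mirroring MachineInstr::isInsideBundle().
struct ToyInstr {
  std::string Text;
  bool IsBundle = false;
  bool InsideBundle = false;
};

// Emits one top-level instruction. A BUNDLE header produces no text itself;
// instead each member that follows it is emitted in turn.
void emitInstruction(const std::vector<ToyInstr> &MBB, std::size_t Idx,
                     std::vector<std::string> &Out) {
  if (MBB[Idx].IsBundle) {
    for (std::size_t I = Idx + 1; I < MBB.size() && MBB[I].InsideBundle; ++I)
      emitInstruction(MBB, I, Out);
    return;
  }
  Out.push_back(MBB[Idx].Text);
}

// Top-level loop: bundle members are skipped here because their header
// already emitted them (hypothetical simplification of the AsmPrinter).
std::vector<std::string> emitBlock(const std::vector<ToyInstr> &MBB) {
  std::vector<std::string> Out;
  for (std::size_t I = 0; I < MBB.size(); ++I)
    if (!MBB[I].InsideBundle)
      emitInstruction(MBB, I, Out);
  return Out;
}
```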
- Then, in InstPrinter/[Target]InstPrinter.cpp, we adjust the following method to
handle the INLINEASMs that I bundle and later unpack:
void [Target]InstPrinter::printInst(const MCInst *MI, raw_ostream &O,
                                    StringRef Annot,
                                    const MCSubtargetInfo &STI) {
  /* For some reason, [Target]GenAsmWriter.inc cannot print INLINEASM from the
     MachineInstr bundles I create in [Target]InstrInfo.cpp, expandPostRAPseudo(),
     and then unpack in [Target]AsmPrinter::EmitInstruction().
     So I handle these INLINEASMs myself here.
  */
  if (MI->getOpcode() == TargetOpcode::INLINEASM) {
    printOperand(MI, 0, O);
  } else {
    printInstruction(MI, O);
  }
  printAnnotation(O, Annot);
}
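The literal 1 in the original comparison appears to be the numeric value of the target-independent INLINEASM opcode; a named constant keeps the intent visible if the numbering ever changes. The toy below (ToyOpcode, ToyMCInst and printInst are hypothetical, and this is not the TableGen-generated printer) shows the same dispatch: an inline-asm instruction prints its operand-0 string verbatim, and everything else goes through a generic mnemonic printer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical opcode numbering; only the named constant is compared below.
enum ToyOpcode : unsigned { TOY_PHI = 0, TOY_INLINEASM = 1, TOY_VLOAD = 100 };

struct ToyMCInst {
  unsigned Opcode;
  std::vector<std::string> Operands;  // for inline asm, operand 0 is the text
};

std::string printInst(const ToyMCInst &MI) {
  if (MI.Opcode == TOY_INLINEASM)
    return MI.Operands.at(0);  // print the raw asm string, like printOperand(MI, 0, O)
  // Stand-in for the generated printInstruction(): mnemonic then operands.
  std::string S = MI.Opcode == TOY_VLOAD ? "vload" : "unknown";
  for (std::size_t I = 0; I < MI.Operands.size(); ++I)
    S += (I == 0 ? " " : ", ") + MI.Operands[I];
  return S;
}
```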
These back-end changes let me keep some INLINEASMs attached to their
preceding instructions, so that they stay next to each other after the post-RA scheduler,
which I run to avoid some data hazards in the generated ASM program.
Best regards,
Alex
On 2/13/2017 5:02 AM, Alex Susu wrote:
> Hello.
> After looking at the debug information from llc, it seems that it is actually the pre-RA
> scheduler (NOT the post-RA scheduler) that separates my INLINEASM SDNodes from the
> "associated" instructions in my program (there is a simple dataflow edge between the
> INLINEASM and the associated node). Note that removing this dependence/dataflow edge is
> THE reason why the post-RA scheduler generates ASM code with other instructions
> in between.
>
> Is it possible to generate instruction bundles (or pseudo-instructions) in the pre-RA
> scheduler pass? At http://llvm.org/docs/CodeGenerator.html#machineinstr-bundles it is
> written that: "Packing / bundling of MachineInstr’s should be done as part of the register
> allocation super-pass.", etc.
>
> Matthias, thank you for pointing out that at least the register allocator can move
> my 2 instructions around. Note, however, that a MachineSDNode with one destination
> register and an immediate value, followed by an INLINEASM (which uses no registers),
> should NOT be separated by the register allocator. Which other llc passes (at llc -O3)
> do you think could separate my 2 instructions?
>
> I will read about mutations in the documentation (for example,
> http://llvm.org/docs/doxygen/html/classllvm_1_1ScheduleDAGMI.html and
> http://llvm.org/docs/doxygen/html/MachineScheduler_8h_source.html).
>
> Thank you,
> Alex
>
>
> On 2/10/2017 11:36 PM, Krzysztof Parzyszek via llvm-dev wrote:
>> On 2/10/2017 3:26 PM, Matthias Braun via llvm-dev wrote:
>>> That said, if you use the PostMachineScheduler you can insert a schedule dag mutation
>>> in createPostMachineScheduler() that adds a cluster edge between the two nodes so
>>> the scheduler tries hard to keep them together. Unfortunately this doesn't always
>>> work today because the scheduling model is always checked for stalls first (Pending
>>> vs. Available lists in the MachineScheduler) before the scheduler even checks its
>>> usual cost function with the cluster heuristic.
>>
>> You can do that with the regular post-RA scheduler as well via
>> "TargetSubtargetInfo::getPostRAMutations".
>>
>> -Krzysztof
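Krzysztof's suggestion can be illustrated with a self-contained model. Everything here (ToySUnit, ToyDAG, ToyMutation, ClusterInlineAsmWithLoad, getToyPostRAMutations) is hypothetical; it only mirrors the shape of LLVM's ScheduleDAGMutation::apply() and TargetSubtargetInfo::getPostRAMutations(), where a target appends mutations to a list that runs before scheduling. In real code the apply() body would add an SDep::Cluster edge through the ScheduleDAGInstrs it receives; the exact edge-adding API should be checked for the LLVM version in use.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical miniature scheduling DAG: each unit knows its opcode and the
// units whose results it reads. Cluster hints are stored separately.
struct ToySUnit {
  std::string Opcode;
  std::vector<int> DataPreds;  // indices of units whose result this one reads
  int ClusterPred = -1;        // unit this one would like to follow directly
};

struct ToyDAG {
  std::vector<ToySUnit> SUnits;
};

// Same shape as a schedule DAG mutation: one hook that may add edges.
struct ToyMutation {
  virtual ~ToyMutation() = default;
  virtual void apply(ToyDAG &DAG) = 0;
};

// Adds a cluster hint from a VLOAD_D_WO_IMM to an INLINEASM that reads it.
struct ClusterInlineAsmWithLoad : ToyMutation {
  void apply(ToyDAG &DAG) override {
    for (ToySUnit &SU : DAG.SUnits) {
      if (SU.Opcode != "INLINEASM")
        continue;
      for (int P : SU.DataPreds)
        if (DAG.SUnits[P].Opcode == "VLOAD_D_WO_IMM") {
          SU.ClusterPred = P;
          break;
        }
    }
  }
};

// Analogue of a target's getPostRAMutations() override: it only appends
// mutations; the scheduler applies them later.
void getToyPostRAMutations(std::vector<std::unique_ptr<ToyMutation>> &Mutations) {
  Mutations.push_back(std::make_unique<ClusterInlineAsmWithLoad>());
}
```

A cluster edge is only a hint: it asks the scheduler to keep the pair together but does not forbid other instructions in between.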
>
>
>
>
> With best regards,
> Alex Susu
>
> On 2/10/2017 11:26 PM, Matthias Braun wrote:
>>
>>> On Feb 10, 2017, at 12:52 PM, Alex Susu via llvm-dev <llvm-dev at lists.llvm.org>
>>> wrote:
>>>
>>> Hello. I am using the post-RA (Register Allocation) scheduler to avoid data hazards
>>> by inserting other USEFUL instructions from the program (besides NOPs), and it breaks
>>> apart some sequences of instructions that should remain "glued" together. More
>>> precisely, in my [Target]ISelDAGToDAG.cpp I may replace, for example, a
>>> BUILD_VECTOR with a machine SDNode called VLOAD_D_WO_IMM and an INLINEASM, the latter
>>> having a simple dataflow dependence (a solid black edge when outputting the DAG as a
>>> .DOT file after instruction selection) on the result of the former. (I can share
>>> the .DOT after instruction selection, obtained with llc -view-sched-dags.)
>>> When I run the default pre-RA scheduler (which seems to be a "List Scheduling"
>>> algorithm), the generated ASM code always has the INLINEASM string immediately
>>> after the asm instruction for the VLOAD_D_WO_IMM. But when I also use the
>>> post-RA scheduler (llc -post-RA-scheduler ...), other instructions get inserted
>>> between the VLOAD_D_WO_IMM and the INLINEASM, which is semantically incorrect.
>>>
>>> How can I prevent these 2 instructions from being separated by the post-RA scheduler?
>>> Can I customize the behavior of the post-RA scheduler (I found some documentation at
>>> http://llvm.org/docs/doxygen/html/PostRASchedulerList_8cpp.html)?
>>>
>>> The first natural idea was to use SelectionDAG glue edges, but I noticed that they
>>> are not very reliable (sometimes I even have difficulty creating them, for example
>>> in the classes [Target]ISelDAGToDAG and [Target]ISelLowering). I also understood
>>> that the scheduler can disregard glue edges between SelectionDAG nodes anyway.
>>> For example:
>>> - from http://lists.llvm.org/pipermail/llvm-dev/2014-June/074046.html:
>>> <<You can't Glue the two nodes together forever. All Glue really does is keep them
>>> together long enough for LLVM to put together a data dependency through "Uses" and
>>> "Defs" implicit operands. Once the MachineInstrs have been created, the two
>>> instructions are at the whim of the scheduler as much as any others. If you really
>>> need them to remain together, you have to either create a pseudo-instruction and
>>> expand it extremely late, or create a bundle (depending on what's natural for your
>>> target).>>
>>> - from http://lists.llvm.org/pipermail/llvm-dev/2016-June/100885.html:
>>> <<If you want to have these nodes stick together, using glue may not be sufficient.
>>> After the machine instructions are generated, the scheduler may place instructions
>>> between the interrupt disable/restore and the atomic load itself. Also, the register
>>> allocator may insert some spills there---there are ways that this sequence may get
>>> separated. For this, the best approach may be to define a pseudo-instruction, which
>>> will be expanded into real instruction in the post-RA expansion pass.>>
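The pseudo-instruction advice quoted above can be illustrated with a toy hazard-avoiding scheduler. In this hypothetical sketch (ToyInstr, independentOf, toyPostRASchedule, toyExpandPseudos and the PSEUDO_VLOAD_ASM opcode are invented for illustration, and registers are plain integers), separate VLOAD_D_WO_IMM and INLINEASM instructions get split when the scheduler fills the read-after-write gap, while a single pseudo that is expanded only after scheduling, in the spirit of expandPostRAPseudo(), stays intact.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct ToyInstr {
  std::string Name;
  int Def = -1;  // register written, -1 if none
  int Use = -1;  // register read, -1 if none
};

// True if C has no register dependence on any instruction in Seq[From, To).
bool independentOf(const ToyInstr &C, const std::vector<ToyInstr> &Seq,
                   std::size_t From, std::size_t To) {
  for (std::size_t K = From; K < To; ++K) {
    const ToyInstr &O = Seq[K];
    if ((C.Use >= 0 && C.Use == O.Def) ||
        (C.Def >= 0 && (C.Def == O.Use || C.Def == O.Def)))
      return false;
  }
  return true;
}

// Hypothetical post-RA scheduler: when an instruction reads the register its
// predecessor writes, pull a later independent instruction into the gap.
void toyPostRASchedule(std::vector<ToyInstr> &Seq) {
  for (std::size_t I = 0; I + 1 < Seq.size(); ++I) {
    if (Seq[I].Def < 0 || Seq[I + 1].Use != Seq[I].Def)
      continue;  // no read-after-write hazard between I and I+1
    for (std::size_t J = I + 2; J < Seq.size(); ++J) {
      if (independentOf(Seq[J], Seq, I, J)) {
        // Move Seq[J] to position I+1, shifting [I+1, J) down by one.
        std::rotate(Seq.begin() + I + 1, Seq.begin() + J, Seq.begin() + J + 1);
        break;
      }
    }
  }
}

// Stand-in for post-RA pseudo expansion: the pair only becomes two real
// instructions after scheduling, so nothing can be placed between them.
void toyExpandPseudos(std::vector<ToyInstr> &Seq) {
  std::vector<ToyInstr> Out;
  for (const ToyInstr &MI : Seq) {
    if (MI.Name == "PSEUDO_VLOAD_ASM") {
      Out.push_back({"VLOAD_D_WO_IMM", MI.Def, -1});
      Out.push_back({"INLINEASM", -1, MI.Def});
    } else {
      Out.push_back(MI);
    }
  }
  Seq = Out;
}
```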
>>>
>>> Also, I don't want to use MachineInstr bundles or pseudo-instructions. MachineInstr
>>> bundles seem too difficult to use and come too late in code generation (I prefer
>>> working at the level of instruction selection). Also, I found little information
>>> about pseudo-instructions: there is some API support, namely expandPostRAPseudo(),
>>> described at http://llvm.org/docs/doxygen/html/classllvm_1_1TargetInstrInfo.html,
>>> and some documentation at
>>> http://llvm.org/devmtg/2014-04/PDFs/Talks/Building%20an%20LLVM%20backend.pdf, slide
>>> 55 (and slides 53, 54).
>
>> Well, if it is two instructions, then there is always a chance that some pass moves them
>> around or inserts new instructions in between (esp. regalloc may insert
>> spills/reloads/copies). The only guaranteed solution is indeed to use a pseudo instruction
>> or an instruction bundle, so the instructions look like a single unit to codegen.
>
>> That said, if you use the PostMachineScheduler you can insert a schedule dag mutation
>> in createPostMachineScheduler() that adds a cluster edge between the two nodes so the
>> scheduler tries hard to keep them together. Unfortunately this doesn't always work
>> today because the scheduling model is always checked for stalls first (Pending vs.
>> Available lists in the MachineScheduler) before the scheduler even checks its usual
>> cost function with the cluster heuristic.
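Matthias's caveat, that stall checks come before the cluster heuristic, can be illustrated with a toy two-stage pick. This is a hypothetical simplification (ToyCand and pickNext are invented; LLVM's GenericScheduler is considerably more involved): candidates that would stall are set aside as pending, and the cluster preference is applied only among stall-free ones, so a cluster successor that is one cycle late loses to any available instruction.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct ToyCand {
  std::string Name;
  unsigned ReadyCycle;  // earliest cycle at which it issues without a stall
  bool IsClusterSucc;   // has a cluster edge from the last scheduled instr
};

// Stage 1 filters out candidates that would stall ("pending"); stage 2
// prefers the cluster successor only among the remaining "available" ones.
std::string pickNext(const std::vector<ToyCand> &Ready, unsigned CurrCycle) {
  const ToyCand *Best = nullptr;
  for (const ToyCand &C : Ready) {
    if (C.ReadyCycle > CurrCycle)
      continue;  // would stall: never compared by the cluster heuristic
    if (!Best || (C.IsClusterSucc && !Best->IsClusterSucc))
      Best = &C;
  }
  if (!Best)  // nothing available: fall back to the earliest pending one
    for (const ToyCand &C : Ready)
      if (!Best || C.ReadyCycle < Best->ReadyCycle)
        Best = &C;
  return Best ? Best->Name : std::string();
}
```

With both candidates ready, the INLINEASM follows its load; if the INLINEASM would stall for one cycle, the ADD is picked first and lands between the pair.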
>>
>> - Matthias
>>
>>