[llvm-dev] Specifying conditional blocks for the back end
Alex Susu via llvm-dev
llvm-dev at lists.llvm.org
Sat Mar 11 16:36:38 PST 2017
Hello.
I wanted to let you know that I managed to codegen the LLVM VSELECT instruction
correctly by following the steps described below.
Can somebody help me with the PredicateInstruction() problem I describe at point 3
below? Although I managed to avoid using PredicateInstruction(), I am curious why it
does not work.
To codegen the LLVM VSELECT instruction correctly (I will be very explicit, in case you
run into similar issues, so bear with me):
- 1. I declare in TableGen an instruction WHERE_EQ (I assume without loss of
generality that VSELECT has a seteq predicate), which will implement the VSELECT in terms
of my processor's WHERE blocks.
- 2. In ISelLowering::Lower() I replace the VSELECT with WHERE_EQ. (Previously I
generated the entire list of MachineSDNode instructions equivalent to VSELECT in
ISelLowering::Lower(), but the scheduler and the DCE (Dead Code Elimination) pass
messed up the order of the instructions, resulting in incorrect semantics.) Note
that I pass the SDNode operands of VSELECT to WHERE_EQ as inputs, so that I can
access them later in the PassCreateWhereBlocks pass mentioned below.
- 3. I registered a pass PassCreateWhereBlocks in addInstSelector() in
[Target]TargetMachine.cpp, which runs immediately after instruction selection and the
first scheduling phase that follows it.
When I try to predicate the instructions inside the WHERE block in PassCreateWhereBlocks,
the method PredicateInstruction() fails by returning false, which means it did not add a
predicated flag to the instructions I wanted. As I said before, this results in incorrect
optimizations, such as useful instructions being removed, because the compiler does not
understand that the code in my WHERE blocks is predicated (conditional), so it assumes
that code is always executed. As a side note, I see that the ARM and SystemZ back ends
override the PredicateInstruction() method, but their code is a bit complex and I did not
spend much time trying to understand how they predicate their instructions, e.g. for the
ARM Thumb2 "it" instruction - are there any links documenting their work?
Therefore I started using bundles instead of predicated instructions. As far as I can
see, DCE cannot be performed inside bundled instructions (see
http://llvm.org/docs/doxygen/html/DeadMachineInstructionElim_8cpp_source.html, which does
NOT handle bundles; this implies that it does not look at the instructions inside a bundle
and can only see the "header" instruction of a bundle. I therefore believe it is safe to
bundle instructions to avoid DCE, as long as we can at least infer that the "header"
instruction of the bundle will never be DCE-ed). Using bundles also prevents the
scheduler from changing the order of the bundled instructions. To create the bundle I use
MIBundleBuilder, since calling finalizeBundle() directly in this pass
(PassCreateWhereBlocks) results in an error like "llc:
/llvm/lib/CodeGen/MachineInstrBundle.cpp:149: void
llvm::finalizeBundle(llvm::MachineBasicBlock&, llvm::MachineBasicBlock::instr_iterator,
llvm::MachineBasicBlock::instr_iterator): Assertion
`TargetRegisterInfo::isPhysicalRegister(Reg)' failed."
So for VSELECT pred, Vreg_true, Vreg_false I create an equivalent sequence of
MachineInstrs:
// pred is computed before
R31 = OR Rfalse, Rfalse // copy Rfalse to R31
WHERE_EQ
R31 = OR Rtrue, Rtrue // copy Rtrue to R31
ENDWHERE
Note that I use a physical register (R31, a vector register; I also reserve
this register in [Target]RegisterInfo::getReservedRegs(), to avoid a MachineVerifier.cpp
error that sometimes occurred, like "Bad machine code: Using an undefined
physical register"). I cannot use a virtual register instead of R31 in
PassCreateWhereBlocks (or in ISelLowering::Lower()), since I need to assign to it twice
(for both the then and else branches of the VSELECT instruction) and virtual registers
follow the SSA single-assignment rule (so I get the following error when assigning twice
to a virtual register: <<MachineRegisterInfo.cpp:339 [...] "getVRegDef assumes a single
definition or no definition"' failed.>>). I also tried, without success, to use
MachineRegisterInfo::leaveSSA() to get around the single-assignment restriction, but then
other passes such as MachineLICM fail in llc with errors like <<MachineLICM.cpp:409: [...]
Assertion `TargetRegisterInfo::isPhysicalRegister(Reg) && "Not expecting virtual
register!"' failed.>>, because MachineRegisterInfo::isSSA() returns false, which makes the
pass assume that register allocation has finished and only physical registers remain,
which unfortunately is NOT the case.
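As a sanity check on the four-instruction sequence in point 3, here is a minimal standalone C++ sketch (not LLVM code) that emulates it lane by lane and compares it with the per-lane meaning of VSELECT. The lane count, the 16-bit element type, and the names vselect, copyLanes, whereSequence and checkEquivalent are all assumptions made up for this illustration; it also assumes that WHERE_EQ enables exactly the lanes where the predicate is set.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical lane count and element type, chosen only for this sketch.
constexpr std::size_t kLanes = 8;
using Vec = std::array<std::int16_t, kLanes>;
using Mask = std::array<bool, kLanes>;

// Reference meaning of VSELECT: pick per lane between the two inputs.
Vec vselect(const Mask &pred, const Vec &onTrue, const Vec &onFalse) {
  Vec out{};
  for (std::size_t i = 0; i < kLanes; ++i)
    out[i] = pred[i] ? onTrue[i] : onFalse[i];
  return out;
}

// Emulates "Rdst = OR Rsrc, Rsrc" (a plain copy), writing only active lanes.
void copyLanes(Vec &dst, const Vec &src, const Mask &active) {
  for (std::size_t i = 0; i < kLanes; ++i)
    if (active[i])
      dst[i] = src[i];
}

// Emulates the sequence from the mail, assuming WHERE_EQ activates exactly
// the lanes where pred is set and ENDWHERE re-activates all lanes.
Vec whereSequence(const Mask &pred, const Vec &onTrue, const Vec &onFalse) {
  Mask all{};
  all.fill(true);
  Vec r31{};
  copyLanes(r31, onFalse, all);  // R31 = OR Rfalse, Rfalse
                                 // WHERE_EQ
  copyLanes(r31, onTrue, pred);  // R31 = OR Rtrue, Rtrue
                                 // ENDWHERE
  return r31;
}

// Asserts that the emulated sequence agrees with VSELECT on every lane.
void checkEquivalent(const Mask &pred, const Vec &onTrue, const Vec &onFalse) {
  assert(whereSequence(pred, onTrue, onFalse) == vselect(pred, onTrue, onFalse));
}
```

The sketch also shows why both writes must survive: the unconditional copy supplies the lanes the WHERE block leaves untouched, so removing it changes the result in every lane where the predicate is false.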
- 4. I also register a pass PassFinalizeBundles in the addPreSched2() method of
[Target]TargetMachine.cpp, and call finalizeBundle() on the instruction bundle I created
earlier in PassCreateWhereBlocks, because I want to avoid later errors like
<</llvm/lib/CodeGen/PostRASchedulerList.cpp:357: virtual bool
{anonymous}::PostRAScheduler::runOnMachineFunction(llvm::MachineFunction&): Assertion
`Count == 0 && "Instruction count mismatch!"' failed.>> (IIRC)
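To illustrate the DCE concern behind the steps above, here is a toy C++ model (not LLVM's DeadMachineInstructionElim, just a sketch under stated assumptions) of a backward dead-definition pass over one straight-line block. The ToyInstr struct, the eliminateDead and makeWhereBlock functions, the register numbers, and the sideEffects flag on WHERE_EQ/ENDWHERE are all hypothetical. The point it shows: if the write inside the WHERE block is treated as a full definition of R31, the earlier copy into R31 looks dead and gets dropped; if it is treated as a predicated (partial) definition, the earlier copy stays.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy instruction: at most one defined register plus the registers it reads.
struct ToyInstr {
  std::string name;
  int def;                // defined register, or -1 if none
  std::vector<int> uses;  // registers read
  bool predicated;        // true if the def writes only some lanes
  bool sideEffects;       // true if the instruction must never be deleted
};

// Backward dead-definition elimination over one block.
// `live` starts as the set of registers read after the block.
std::vector<ToyInstr> eliminateDead(const std::vector<ToyInstr> &block,
                                    std::set<int> live) {
  std::vector<ToyInstr> keptReversed;
  for (auto it = block.rbegin(); it != block.rend(); ++it) {
    bool defLive = it->def >= 0 && live.count(it->def) > 0;
    if (!defLive && !it->sideEffects)
      continue;  // result is never read: drop the instruction
    // Only a full (unpredicated) definition ends the liveness of the old value.
    if (it->def >= 0 && !it->predicated)
      live.erase(it->def);
    for (int u : it->uses)
      live.insert(u);
    keptReversed.push_back(*it);
  }
  return std::vector<ToyInstr>(keptReversed.rbegin(), keptReversed.rend());
}

// The sequence from point 3; registers 1 and 2 stand in for Rfalse and Rtrue.
std::vector<ToyInstr> makeWhereBlock(bool innerDefIsPredicated) {
  return {
      {"R31 = OR Rfalse, Rfalse", 31, {1}, false, false},
      {"WHERE_EQ", -1, {}, false, true},
      {"R31 = OR Rtrue, Rtrue", 31, {2}, innerDefIsPredicated, false},
      {"ENDWHERE", -1, {}, false, true},
  };
}
```

Calling eliminateDead(makeWhereBlock(false), {31}) keeps three instructions and loses the first copy, while eliminateDead(makeWhereBlock(true), {31}) keeps all four. Bundling sidesteps the question in a different way, by hiding the inner instructions from the pass altogether.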
Best regards,
Alex
On 3/7/2017 8:12 AM, Alex Susu wrote:
> Hello.
> Because I see optimizations (DCE, out-of-order scheduling) that break the
> semantics of the list of instructions lowered in ISelLowering from the LLVM VSELECT
> instruction, and these bad transformations happen even before scheduling, in later ISel
> subpasses, I am trying to fix the problem by lowering VSELECT to a single
> pseudo-instruction and LATER translating it to a list of instructions, using bundles and
> maybe also PredicateInstruction(), which is also used in IfConversion.cpp.
> More exactly I'm trying to use a pseudo-instruction that will get translated to a
> sequence of 4 MachineInstr, namely:
> // These 4 instructions replace the pseudo-instruction I use for LLVM's VSELECT
> R31 = OR srcVselectFalse, srcVselectFalse
> WHERE_EQ
> R31 = OR srcVselectTrue, srcVselectTrue
> ENDWHERE
> I plan to do this as early as possible, normally in a pass registered in
> addInstSelector(), which gets executed immediately after the first scheduling phase.
> If anybody sees a problem with this, please let me know.
>
> I think it is OK to specify empty semantics (an empty DAG pattern in TableGen) for my
> WHERE_EQ/ENDWHERE instructions delimiting the predication/conditional block.
>
> Eli, thank you for the pointers. The "it" ARM Thumb2 instruction is very interesting,
> maybe even unique among mainstream processors, handling predicated execution of 2
> contiguous blocks of instructions; I found some specs for it at
> https://community.arm.com/processors/b/blog/posts/condition-codes-3-conditional-execution-in-thumb-2.
> This instruction is quite similar to my conditional-block instructions WHERExy/ENDWHERE
> (xy can be EQ, LT, CRY).
>
> Thank you,
> Alex
>
>
> On 3/3/2017 8:59 PM, Friedman, Eli wrote:
>> On 3/2/2017 7:07 PM, Alex Susu via llvm-dev wrote:
>>> Hello.
>>> For my back end for the Connex SIMD research processor I want to implement
>>> conditional blocks (I guess the better term is predicated blocks). Predicated blocks are
>>> bordered by two instructions WHEREEQ (or WHERELT, etc) and ENDWHERE.
>>> For example, the following code executes the instructions inside the WHERE block
>>> only for the lanes where R0 == R1:
>>> EQ R0, R1;
>>> WHEREEQ
>>> vector_asm_instr1;
>>> ...
>>> vector_asm_instrk;
>>> ENDWHERE
>>>
>>> I was able to generate such a block at instruction selection by writing custom C++
>>> selection code, but I don't know how I can inform the back end that the instructions
>>> inside the WHERE block get executed conditionally, not always.
>>> This seems to matter only for the llc optimization levels -O1/2/3, not for -O0.
>>> At -O1/2/3 I saw cases where the WHEREEQ and ENDWHERE
>>> instructions were simply removed and vector_asm_instr1..k were executed
>>> unconditionally, etc. - and this is NOT good.
>>>
>>> Could you please tell me how I can inform the back end that the instructions inside
>>> my WHERE blocks get executed conditionally, not always?
>>
>> There's some existing infrastructure in the backend for predication; see
>> lib/CodeGen/IfConversion.cpp (and the target hooks PredicateInstruction etc.). For
>> forming blocks, you might want to follow what the ARM backend does for Thumb2; see
>> Thumb2ITBlockPass.cpp .
>>
>> -Eli
>>