[llvm-dev] Specifying conditional blocks for the back end

Alex Susu via llvm-dev llvm-dev at lists.llvm.org
Sat Mar 11 16:36:38 PST 2017


   Hello.
     I wanted to let you know that I managed to correctly codegen the LLVM VSELECT
instruction by following the steps described below.
     Can somebody help me with the problem with the PredicateInstruction() method that I
describe below in point 3? Although I managed to avoid using PredicateInstruction(), I am
curious why it doesn't work.

     To correctly codegen the LLVM VSELECT instruction (I will be very explicit, so bear
with me if you have similar issues):
       - 1. I declare in TableGen an instruction WHERE_EQ (I assume without loss of 
generality that VSELECT has a seteq predicate), which will implement the VSELECT in terms 
of my processor's WHERE blocks.
       - 2. In ISelLowering::Lower() I replace the VSELECT with WHERE_EQ. (Previously I
generated the entire list of MachineSDNode instructions equivalent to VSELECT in
ISelLowering::Lower(), but the scheduler and the DCE (Dead Code Elimination) pass
reordered and removed some of these instructions, which broke the semantics.) Note that I
give the SDNode operands of VSELECT as inputs to WHERE_EQ, so that I can access them later
in the PassCreateWhereBlocks pass mentioned below;

       - 3. I registered a pass, PassCreateWhereBlocks, in addInstSelector() in
[Target]TargetMachine.cpp; it runs immediately after instruction selection and the first
scheduling phase.
         When I try to predicate the instructions inside the WHERE block in
PassCreateWhereBlocks, PredicateInstruction() fails by returning false, which means it
did not add a predicate to the instructions I wanted. As I said before, this results in
incorrect program optimizations, such as useful instructions being removed, because the
compiler does not understand that the code in my WHERE blocks is predicated (conditional)
and assumes it is always executed. As a side note, I see that the ARM and SystemZ back
ends override the PredicateInstruction() method, but their code is a bit complex and I
did not dig much into how they manage to predicate their instructions (e.g., for the ARM
Thumb2 "it" instruction). Are there some links documenting their work?
         Therefore I started using bundles instead of predicated instructions. As far as I
can see, DCE cannot be performed inside bundles: DeadMachineInstructionElim (see
http://llvm.org/docs/doxygen/html/DeadMachineInstructionElim_8cpp_source.html) does NOT
handle bundles, which implies that it does not look at the instructions inside a bundle
and only sees the "header" instruction of the bundle. Therefore, I believe it is safe to
bundle instructions to protect them from DCE, as long as we can ensure that the "header"
instruction of the bundle is never removed by DCE. Using bundles also prevents the
scheduler from reordering the bundled instructions. To create the bundle I use
MIBundleBuilder, because calling finalizeBundle() directly in this pass
(PassCreateWhereBlocks) results in an error like "llc:
/llvm/lib/CodeGen/MachineInstrBundle.cpp:149: void
llvm::finalizeBundle(llvm::MachineBasicBlock&, llvm::MachineBasicBlock::instr_iterator,
llvm::MachineBasicBlock::instr_iterator): Assertion
`TargetRegisterInfo::isPhysicalRegister(Reg)' failed."
         So for VSELECT pred, Vreg_true, Vreg_false I create the following equivalent
sequence of MachineInstrs:
           // pred is computed before
           R31 = OR Rfalse, Rfalse // copy Rfalse to R31
           WHERE_EQ
             R31 = OR Rtrue, Rtrue // copy Rtrue to R31
           ENDWHERE

         Note that I use a physical register (R31, a vector register), which I also
reserve in [Target]RegisterInfo::getReservedRegs() to avoid an error from
MachineVerifier.cpp that sometimes happened: "Bad machine code: Using an undefined
physical register". I cannot use a virtual register instead of R31 in
PassCreateWhereBlocks (and ISelLowering::Lower()), since I need to assign to it twice (for
both the then and else branches of the VSELECT instruction), and virtual registers follow
the SSA single-assignment rule (assigning twice to a virtual register gives the following
error: <<MachineRegisterInfo.cpp:339 [...] "getVRegDef assumes a single definition or no
definition"' failed.>>). I also tried, without success, to use
MachineRegisterInfo::leaveSSA() to get around this single-assignment restriction, but
then other passes such as MachineLICM fail in llc with errors like <<MachineLICM.cpp:409:
[...] Assertion `TargetRegisterInfo::isPhysicalRegister(Reg) && "Not expecting virtual
register!"' failed.>>, because MachineRegisterInfo::isSSA() returns false, which makes the
pass assume that register allocation has finished and that only physical registers
remain, which unfortunately is NOT the case.

       - 4. I also register a pass, PassFinalizeBundles, in the addPreSched2() method in
[Target]TargetMachine.cpp; it calls finalizeBundle() on the instruction bundle I created
earlier in PassCreateWhereBlocks, in order to avoid later errors such as
<</llvm/lib/CodeGen/PostRASchedulerList.cpp:357: virtual bool
{anonymous}::PostRAScheduler::runOnMachineFunction(llvm::MachineFunction&): Assertion
`Count == 0 && "Instruction count mismatch!"' failed.>> (IIRC)
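
     To double-check that the four-instruction sequence in point 3 really computes VSELECT,
here is a small standalone C++ model of lane-masked execution. It is only a sketch under
assumptions: the lane count (8), the 16-bit lane type and the helper names (eq, orMasked,
expandVselect, referenceVselect) are hypothetical and come neither from the Connex back end
nor from LLVM. The model assumes that an instruction inside WHERE_EQ ... ENDWHERE writes
only the lanes whose EQ flag is set and leaves the other lanes unchanged.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed lane count and lane type, for illustration only (not the real Connex width).
constexpr std::size_t kLanes = 8;
using VReg = std::array<std::int16_t, kLanes>;
using Mask = std::array<bool, kLanes>;

// EQ a, b: per-lane comparison whose result WHERE_EQ uses as its lane mask.
Mask eq(const VReg& a, const VReg& b) {
    Mask m{};
    for (std::size_t i = 0; i < kLanes; ++i) m[i] = (a[i] == b[i]);
    return m;
}

// dst = OR a, b, executed only on the lanes enabled in 'active';
// disabled lanes keep their previous contents.
void orMasked(VReg& dst, const VReg& a, const VReg& b, const Mask& active) {
    for (std::size_t i = 0; i < kLanes; ++i)
        if (active[i]) dst[i] = static_cast<std::int16_t>(a[i] | b[i]);
}

// The sequence from point 3:
//   R31 = OR Rfalse, Rfalse    // all lanes (outside the WHERE block)
//   WHERE_EQ
//     R31 = OR Rtrue, Rtrue    // only lanes where pred holds
//   ENDWHERE
VReg expandVselect(const Mask& pred, const VReg& rTrue, const VReg& rFalse) {
    Mask allLanes;
    allLanes.fill(true);
    VReg r31{};
    orMasked(r31, rFalse, rFalse, allLanes);
    orMasked(r31, rTrue, rTrue, pred);
    return r31;
}

// Reference semantics of vselect: lane i is pred[i] ? t[i] : f[i].
VReg referenceVselect(const Mask& pred, const VReg& t, const VReg& f) {
    VReg out{};
    for (std::size_t i = 0; i < kLanes; ++i) out[i] = pred[i] ? t[i] : f[i];
    return out;
}
```

     For example, if the two compared vectors agree only in lanes 0, 2, 4 and 6, then
expandVselect() takes Rtrue in those lanes and Rfalse in the others, matching
referenceVselect(). The model also shows why the first OR must survive: without it, the
masked-off lanes of R31 would never be written.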

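     The DCE problem from point 3 can be reproduced with an equally small toy model. The
sketch below is NOT LLVM's DeadMachineInstructionElim; it is a hypothetical backward
dead-definition pass, and the Inst struct, eliminateDeadDefs() and whereBlock() are made up
for illustration. It shows that if the OR inside the WHERE block is treated as an
unconditional write to R31, the first "R31 = OR Rfalse, Rfalse" looks dead and is deleted.
Once that inner write is marked as predicated (so it does not fully overwrite R31), the
first OR stays live.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy instruction: at most one destination register, a list of
// source registers, and a flag telling whether the write is predicated
// (i.e. masked-off lanes keep the old value of the register).
struct Inst {
    std::string text;
    std::string def;                // empty = defines no register
    std::vector<std::string> uses;
    bool predicated;
};

// Toy backward dead-definition elimination (not LLVM's DeadMachineInstructionElim).
// A def is dead if its register is not live below it. An unconditional def
// kills the liveness of its register; a predicated def does not, because the
// previous value still reaches the masked-off lanes.
// Instructions that define no register are always kept.
std::vector<Inst> eliminateDeadDefs(const std::vector<Inst>& code,
                                    std::set<std::string> live) {
    std::vector<bool> keep(code.size(), true);
    for (std::size_t k = code.size(); k-- > 0;) {
        const Inst& mi = code[k];
        if (!mi.def.empty()) {
            if (live.count(mi.def) == 0) {
                keep[k] = false;    // nobody reads this value: delete it
                continue;           // its sources do not become live
            }
            if (!mi.predicated) live.erase(mi.def);
        }
        for (const std::string& u : mi.uses) live.insert(u);
    }
    std::vector<Inst> out;
    for (std::size_t k = 0; k < code.size(); ++k)
        if (keep[k]) out.push_back(code[k]);
    return out;
}

// The VSELECT expansion from point 3; markPredicated says whether the OR
// inside the WHERE block is known to be a conditional write.
std::vector<Inst> whereBlock(bool markPredicated) {
    return {
        {"R31 = OR Rfalse, Rfalse", "R31", {"Rfalse"}, false},
        {"WHERE_EQ", "", {}, false},
        {"R31 = OR Rtrue, Rtrue", "R31", {"Rtrue"}, markPredicated},
        {"ENDWHERE", "", {}, false},
    };
}
```

     With R31 live after ENDWHERE, the unmarked block loses its first OR (three
instructions remain, starting with WHERE_EQ), while the marked block keeps all four. In
this model WHERE_EQ and ENDWHERE define no register and are therefore never deleted; the
sketch does not try to reproduce the -O1/2/3 cases in which they were removed as well.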

   Best regards,
     Alex


On 3/7/2017 8:12 AM, Alex Susu wrote:
>   Hello.
>     Because I experience optimizations (DCE, OoO schedule) which mess the correct
> semantics of the list of instructions lowered in ISelLowering from the VSELECT LLVM
> instruction, and these bad transformations happen even before scheduling, at later I-sel
> subpasses, I try to fix this problem by lowering VSELECT to only one pseudo-instruction
> and LATER translate it to a list of instructions and use bundles and maybe also
> PredicateInstruction(), which is employed also in IfConversion.cpp.
>     More exactly, I am trying to use a pseudo-instruction that gets translated into a
> sequence of 4 MachineInstrs, namely:
>         // These 4 instructions replace the pseudo-instruction I use for LLVM's VSELECT
>         R31 = OR srcVselectFalse, srcVselectFalse
>         WHERE_EQ
>            R31 = OR srcVselectTrue, srcVselectTrue
>         ENDWHERE
>     I plan to do this as early as possible, normally in a pass registered in
> addInstSelector(), which gets executed immediately after the first scheduling phase.
>     If anybody sees a problem with this, please let me know.
>
>     I think it is OK to specify empty semantics (an empty DAG pattern in TableGen) for my
> WHERE_EQ/ENDWHERE instructions delimiting the predicated/conditional block.
>
>     Eli, thank you for the pointers. The ARM Thumb2 "it" instruction is very interesting,
> maybe even unique among mainstream processors: it predicates up to 4 following
> instructions, each executed under either the condition or its inverse. I found some
> specs for it at
> https://community.arm.com/processors/b/blog/posts/condition-codes-3-conditional-execution-in-thumb-2.
> This instruction is quite similar to my conditional-block instructions WHERExy/ENDWHERE
> (xy can be EQ, LT, CRY).
>
>   Thank you,
>     Alex
>
>
> On 3/3/2017 8:59 PM, Friedman, Eli wrote:
>> On 3/2/2017 7:07 PM, Alex Susu via llvm-dev wrote:
>>>   Hello.
>>>     For my back end for the Connex SIMD research processor I want to implement
>>> conditional blocks (I guess the better term is predicated blocks). Predicated blocks are
>>> delimited by two instructions: WHEREEQ (or WHERELT, etc.) and ENDWHERE.
>>>     For example, the following code executes the instructions inside the WHERE block
>>> only for the lanes where R0 == R1:
>>>         EQ R0, R1;
>>>         WHEREEQ
>>>           vector_asm_instr1;
>>>           ...
>>>           vector_asm_instrk;
>>>         ENDWHERE
>>>
>>>     I was able to generate such a block during instruction selection by writing custom
>>> C++ selection code, but I don't know how to inform the back end that the instructions
>>> inside the WHERE block are executed conditionally, not always.
>>>     It seems this matters only at llc optimization levels -O1/2/3, not at -O0. At
>>> -O1/2/3, I experienced cases where the WHEREEQ and ENDWHERE instructions were simply
>>> removed and vector_asm_instr1..k were executed unconditionally, etc., and this is NOT
>>> good.
>>>
>>>     Could you please tell me how I can inform the back end that the instructions inside
>>> my WHERE blocks are executed conditionally, not always?
>>
>> There's some existing infrastructure in the backend for predication; see
>> lib/CodeGen/IfConversion.cpp (and the target hooks PredicateInstruction etc.).  For
>> forming blocks, you might want to follow what the ARM backend does for Thumb2; see
>> Thumb2ITBlockPass.cpp .
>>
>> -Eli
>>

