[LLVMdev] "Machine LICM" for Constants?

Matt Johnson johnso87 at crhc.illinois.edu
Wed Mar 7 18:28:08 PST 2012


Thanks for the tip!  I looked into it and it looks like the problem as 
of SVN HEAD is that the lui and ori instructions in Mips are considered 
cheap (1-cycle def-use latency) by MachineLICM::IsCheapInstruction(), 
but are not trivially rematerializable because their register operands are 
not always available.  This makes MachineLICM::IsProfitableToHoist() 
return false, preventing the hoist even though 
MachineLICM::IsLoopInvariantInst() returns true.

The comment in IsProfitableToHoist() is:

// If the instruction is cheap, only hoist if it is re-materilizable [sic].
// LICM will increase register pressure. It's probably not worth it if the
// instruction is cheap.

The function then proceeds to actually *estimate* register pressure for 
non-cheap instructions to determine whether or not to hoist them.
This heuristic seems reasonable, but doesn't seem to do the right thing 
in this case.  Hacking the instruction itineraries to make the 
instructions not seem cheap doesn't seem like the right answer either.  
I'm guessing the motivation for this heuristic is that, in a loop with 
many possible hoists, some cheap and some expensive, we would prefer to 
hoist the expensive ones rather than wasting all our register slack on 
the cheap ones.
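
To make the behavior concrete, here is how I read the decision, as a 
standalone sketch.  This is not the LLVM source: the Candidate struct, its 
fields, and profitableToHoist() are names I made up, and the register 
pressure estimate is reduced to a single comparison.

```cpp
// Hypothetical, simplified model of a hoist candidate; not an LLVM type.
struct Candidate {
  bool loopInvariant;     // what IsLoopInvariantInst() would report
  bool cheap;             // what IsCheapInstruction() would report
  bool rematerializable;  // whether it is trivially rematerializable
  int regsDefined;        // registers kept live across the loop if hoisted
};

// Mirrors the decision as described above: a cheap instruction that is not
// rematerializable is rejected before register pressure is ever consulted.
bool profitableToHoist(const Candidate &c, int freeRegs) {
  if (!c.loopInvariant)
    return false;
  if (c.cheap)
    return c.rematerializable;
  // Crude stand-in for the real register pressure estimate.
  return c.regsDefined <= freeRegs;
}
```

With this model, a loop-invariant lui/ori (cheap, not rematerializable) is 
rejected no matter how many registers are free, which matches what I see.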

Is there another way to accomplish this goal while still performing the 
hoist in situations where register pressure is low enough?  Say, 
considering the instructions in a loop for hoisting in descending order 
of cost, rather than in program order?
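
A rough sketch of what I mean, again standalone and purely illustrative.  
HoistCandidate, pickHoists(), and the idea of a single "free register" 
budget are my own simplifications, not anything in MachineLICM today.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical candidate record; not an LLVM type.
struct HoistCandidate {
  int id;          // position in program order
  int cost;        // e.g. def-use latency in cycles
  int regsNeeded;  // registers that stay live across the loop if hoisted
};

// Visit candidates from most to least expensive and hoist each one that
// still fits in the remaining register budget.  Returns the ids of the
// hoisted candidates in the order they were chosen.
std::vector<int> pickHoists(std::vector<HoistCandidate> cands, int freeRegs) {
  // stable_sort keeps program order among candidates of equal cost.
  std::stable_sort(cands.begin(), cands.end(),
                   [](const HoistCandidate &a, const HoistCandidate &b) {
                     return a.cost > b.cost;
                   });
  std::vector<int> hoisted;
  for (const HoistCandidate &c : cands) {
    if (c.regsNeeded <= freeRegs) {
      freeRegs -= c.regsNeeded;
      hoisted.push_back(c.id);
    }
  }
  return hoisted;
}
```

Expensive instructions still get first claim on the register slack, but 
cheap ones are hoisted too whenever slack is left over.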

Note that ARM gets around this by creating a pseudo-instruction for 
32-bit immediate loads (MOVi32imm), rather than putting a pattern 
directly in ARMInstrInfo.td.  This fused instruction *is* 
rematerializable (since it defines the entire register), even though 
either of the two half-register instructions by themselves cannot be.  
This is one way my target and Mips could hack around the problem, but 
for my target at least it has the disadvantage of having to add an 
ExpandPseudo pass to my backend and put logic in C++ that seems (IMO) to 
belong in TableGen.
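
For reference, the arithmetic such an expansion would have to do is small.  
A minimal sketch, assuming a lui/ori-style pair where the second instruction 
zero-extends its 16-bit immediate (ImmHalves and splitImm32() are names I 
made up, not LLVM APIs):

```cpp
#include <cstdint>

// Hypothetical helper type: the two 16-bit immediates of a lui/ori pair.
struct ImmHalves {
  uint16_t hi;  // operand for the "load upper" instruction
  uint16_t lo;  // operand for the "or immediate" instruction
};

// Split a 32-bit constant into the halves a two-instruction
// materialization needs.  Assumes the low half is zero-extended when it
// is OR'ed in, so no carry adjustment of the high half is required.
ImmHalves splitImm32(uint32_t imm) {
  return {static_cast<uint16_t>(imm >> 16),
          static_cast<uint16_t>(imm & 0xFFFFu)};
}
```

For 0x80808080 this gives 32896/32896, and for -(0x01010101) it gives 
65278/65279, which are the lui/ori operands in the loop from my original 
message quoted below.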

-Matt

On 03/07/2012 02:45 PM, Evan Cheng wrote:
> Yes machine-licm can and should hoist constant materialization instructions out of the loop. If it's not doing that, it's probably because the target is not modeling the instruction correctly. I would walk through MachineLICM::IsLoopInvariantInst() in the debugger to figure it out. You can also try compiling the same bitcode for a target like ARM or X86 as a comparison.
>
> Evan
>
> On Mar 7, 2012, at 10:38 AM, Matt Johnson wrote:
>
>> Hi All,
>>      I work on a backend for a target similar to Mips, where large
>> immediates are loaded into registers with 2 instructions, 1 to load the
>> MSBits and 1 to load the LSBits.  I've noticed a recurring pattern
>> where, despite low register pressure, these constants will be
>> rematerialized in every iteration of a loop, rather than being hoisted.
>> Here's an example using the mips-unknown-unknown target and Clang/LLVM
>> HEAD.  From newlib's implementation of strncat:
>>
>> #define DETECTNULL(X) (((X) - 0x01010101) & ~(X) & 0x80808080)
>> while (!DETECTNULL (*aligned_s1))
>>    aligned_s1++;
>>
>> This loop gets lowered under -O3 to:
>>
>> $BB0_5:
>>    lui $3, 32896
>>    lui $7, 65278
>>    ori $3, $3, 32896 ###### Materialize 0x80808080
>>    lw  $8, 4($2)
>>    nop
>>    and $9, $8, $3
>>    ori $7, $7, 65279 ###### Materialize -(0x01010101)
>>    addiu $2, $2, 4
>>    xor $3, $9, $3
>>    addu  $7, $8, $7
>>    and $3, $3, $7
>>    beq $3, $zero, $BB0_5
>>
>>
>> There are a ton of unused caller-saved registers in this small function,
>> so I expected the constant materialization to be hoisted out of the
>> tight loop.  I'm still learning about the new register allocator and am
>> not immediately able to make sense of its debug output (and the
>> 'problem' may be elsewhere in any case).  I'm happy to post the results
>> of -debug-only=regalloc if they're useful.
>>
>> Is my desire to hoist the constants out of the loop reasonable?  Is
>> there something I can do (hints or passes in my backend, clang/opt flag,
>> etc.) to make this happen today?  If not, what is the root cause?  Maybe
>> there's no way to hoist things out of a loop once IR is lowered into a
>> SelectionDAG?
>>
>> Thanks,
>> Matt


