[llvm-commits] [llvm] r113670 - in /llvm/trunk: include/llvm/Target/TargetInstrInfo.h lib/CodeGen/PeepholeOptimizer.cpp lib/Target/ARM/ARMBaseInstrInfo.cpp lib/Target/ARM/ARMBaseInstrInfo.h

Jim Grosbach grosbach at apple.com
Mon Sep 13 13:50:24 PDT 2010


On Sep 13, 2010, at 1:22 PM, Bill Wendling wrote:

> On Sep 13, 2010, at 1:05 PM, Gabor Greif wrote:
> 
>>> Here's your patch from another email. My comments are in-line:
>>> 
>>> gabor at google8:~/llvm-build$ svn diff /home/gabor/llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp
>>> Index: /home/gabor/llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp
>>> ===================================================================
>>> --- /home/gabor/llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp	(revision 113747)
>>> +++ /home/gabor/llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp	(working copy)
>>> @@ -1372,6 +1372,19 @@
>>>   SrcReg = MI->getOperand(0).getReg();
>>>   CmpValue = MI->getOperand(1).getImm();
>>>   return true;
>>> +  case ARM::TSTri: {
>>> +      if (MI->getParent()->begin() == MachineBasicBlock::const_iterator(MI))
>>> +        return false;
>>> +      const MachineInstr *AND = llvm::prior(MI);
>>> 
>>> The AND doesn't have to be the immediately prior instruction. It could happen
>>> further up in the BB, just as long as there isn't an intervening instruction
>>> which sets CPSR.
>> 
>> Yes, but the instruction selector tends to create sequences like this:
>> 
>> rN = ...
>> rM = and rN, #mask
>> ... ; intervening instructions
>> tst rN, #mask
>> 
>> Note that both "and" and "tst" consume rN! But if we want to change
>> the form of "and" to "andS" we have to find it. My first guess is to
>> look immediately before the "tst". I may relax that (to, say, 5
>> preceding instructions), but I do not want to risk a quadratic algorithm.
>> 
> The algorithm is only O(n), where n is the number of instructions in the basic block. But limiting it to 5 or so is probably not bad. My guess is that the back-end generates the AND directly before the TST in most cases.

Likely. The scheduler may well move it around a bit, though. Previously, I've dealt with this by scanning backwards in the instructions 'til I find the <def> of the input register or the start of the basic block. Having a hard limit on the number of instructions to scan and just punting if it's not found by then sounds reasonable.

For example, it's not too unusual to see sequences like the following:
        cmp     r5, #15
        vdup.32 q1, r1
        vadd.i32        q2, q1, q1
        vadd.i32        q1, q0, q1
        bls     LBB0_8

When there's no other benefit, we'd like to prevent the post-RA scheduler from doing things like that, but there's no logic in there to do so at the moment. It's on my plate to do that at some point.
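The bounded backward scan described above (walk back until the <def> of the input register or the start of the block, and punt past a hard limit) could be sketched roughly like this. This is a standalone illustration, not LLVM code: the instruction model, function name, and the limit of 5 are all made up for the example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified model of a basic block -- not LLVM's actual
// API.  Each instruction records only the register it defines (-1 if none).
struct Instr {
  int DefReg;
};

// Scan backwards from position Pos looking for the instruction that
// defines Reg, giving up at the start of the block or after MaxScan
// instructions (the hard limit is illustrative).
// Returns the index of the <def>, or -1 to punt.
int findDefBackwards(const std::vector<Instr> &BB, std::size_t Pos,
                     int Reg, unsigned MaxScan = 5) {
  unsigned Scanned = 0;
  while (Pos > 0 && Scanned < MaxScan) {
    --Pos;
    ++Scanned;
    if (BB[Pos].DefReg == Reg)
      return static_cast<int>(Pos);  // found the <def>
  }
  return -1;  // not found within the limit: punt
}
```

With the limit in place the scan is O(MaxScan) per compare, so the overall pass stays linear even if the scheduler has moved the AND a few instructions away.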

-Jim



> 
>>> +      if (AND->getOpcode() != ARM::ANDri)
>>> +        return false;
>>> +      if (MI->getOperand(0).getReg() == AND->getOperand(1).getReg() && // FIXME: or == AND
>>> +          MI->getOperand(1).getImm() == AND->getOperand(2).getImm()) { // FIXME: subset
>>> +        SrcReg = AND->getOperand(0).getReg();
>>> +        CmpValue = 0;
>>> 
>>> CmpValue might not be 0 here.
>> 
>> Whatever the mask is, the comparison is always against zero for the
>> TSTri opcode. I hope I am not terribly mistaken with this statement
>> :-)
>> 
> The CmpValue here means the mask value in this case. The terminology is messing us up.
> 
>>> 
>>> +        return true;
>>> +      }
>>> +    }
>>> 
>>> There's no need for this to be as complex here. The logic should be pretty
>>> much identical to the CMP cases. In fact, I think it is identical when you
>>> have a TST of an immediate. This logic should reside in OptimizeCompareInstr:
>> 
>> I only attempt to cover the TSTri case. I do *not* think it is just as
>> easy as "CMPri", because of the search for a suitable ANDri
>> instruction that feeds from the same input
>> and uses the same (super-)mask. Only in that case will the "andS" be
>> equivalent.
>> 
> But that logic should still reside in the OptimizeCompareInstr method. It will have to be modified some, but it's not complex.
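As a rough illustration of the operand check being debated, the condition under which the `tst` can reuse the flags of a preceding `and` converted to `andS` could be written as below. This is a hedged sketch with made-up names, not LLVM code, and it deliberately handles only the exact-match case; the mask-subset generalisation from the FIXMEs in the patch is left out.

```cpp
#include <cstdint>

// Hypothetical helper, not LLVM API: decide whether `tst rN, #TstMask`
// is redundant after `rM = and rN, #AndMask` has been converted to the
// flag-setting `andS` form.  Both must read the same source register
// and use the same immediate mask; then andS sets Z exactly as the
// tst would, and the tst can be removed.
bool tstCoveredByAndS(unsigned TstSrcReg, uint32_t TstMask,
                      unsigned AndSrcReg, uint32_t AndMask) {
  return TstSrcReg == AndSrcReg && TstMask == AndMask;
}
```

Relaxing the equality on the masks (the "subset" FIXME) would need a separate argument that the Z result is still identical, which is why the sketch keeps the strict form.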
> 
>>> case ARM::SUBri:
>>> case ARM::t2ADDri:
>>> case ARM::t2SUBri:
>>> 
>>> Doing the optimization you suggest should be almost as easy as simply adding
>>> the TST* cases to the switch statements and removing the "if (CmpValue != 0)
>>> return false;" from OptimizeCompareInstr. :-)
>> 
>> Are you talking about TSTrr and TSTrs here?
>> 
> TST against the immediate.
> 
> -bw
> 
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits




