[llvm-commits] [llvm] r114506 - in /llvm/trunk: lib/Target/ARM/ARMBaseInstrInfo.cpp test/CodeGen/ARM/2010-09-21-OptCmpBug.ll

Evan Cheng evan.cheng at apple.com
Fri Sep 24 16:21:13 PDT 2010


On Sep 22, 2010, at 3:50 AM, Gabor Greif wrote:

> Hi Evan, All,
> 
> my observations inline below...
> 
> On Sep 22, 1:49 am, Evan Cheng <evan.ch... at apple.com> wrote:
>> Author: evancheng
>> Date: Tue Sep 21 18:49:07 2010
>> New Revision: 114506
>> 
>> URL:http://llvm.org/viewvc/llvm-project?rev=114506&view=rev
>> Log:
>> OptimizeCompareInstr should avoid iterating pass the beginning of the MBB when the 'and' instruction is after the comparison.
>> 
>> Added:
>>     llvm/trunk/test/CodeGen/ARM/2010-09-21-OptCmpBug.ll
>> Modified:
>>     llvm/trunk/lib/Target/ARM/ARMBaseInstrInfo.cpp
>> 
>> Modified: llvm/trunk/lib/Target/ARM/ARMBaseInstrInfo.cpp
>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/ARM/ARMBase...
>> ==============================================================================
>> --- llvm/trunk/lib/Target/ARM/ARMBaseInstrInfo.cpp (original)
>> +++ llvm/trunk/lib/Target/ARM/ARMBaseInstrInfo.cpp Tue Sep 21 18:49:07 2010
>> @@ -1455,7 +1455,8 @@
>> 
>>    // Check that CPSR isn't set between the comparison instruction and the one we
>>    // want to change.
>> -  MachineBasicBlock::const_iterator I = CmpInstr, E = MI;
>> +  MachineBasicBlock::const_iterator I = CmpInstr, E = MI,
>> +    B = MI->getParent()->begin();
>>    --I;
>>    for (; I != E; --I) {
>>      const MachineInstr &Instr = *I;
>> @@ -1469,6 +1470,10 @@
>>        if (MO.getReg() == ARM::CPSR)
>>          return false;
>>      }
>> +
>> +    if (I == B)
>> +      // The 'and' is below the comparison instruction.
>> +      return false;
>>    }
>> 
>>    // Set the "zero" bit in CPSR.
>> 
>> Added: llvm/trunk/test/CodeGen/ARM/2010-09-21-OptCmpBug.ll
>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/ARM/2010-...
>> ==============================================================================
>> --- llvm/trunk/test/CodeGen/ARM/2010-09-21-OptCmpBug.ll (added)
>> +++ llvm/trunk/test/CodeGen/ARM/2010-09-21-OptCmpBug.ll Tue Sep 21 18:49:07 2010
>> @@ -0,0 +1,84 @@
>> +; RUN: llc < %s -mtriple=thumbv7-apple-darwin10
>> +
>> +declare noalias i8* @malloc(i32) nounwind
>> +
>> +define internal void @gl_DrawPixels(i32 %width, i32 %height, i32 %format, i32 %type, i8* %pixels) nounwind {
>> +entry:
>> +  br i1 undef, label %bb3.i, label %bb3
>> +
>> +bb3.i:                                            ; preds = %entry
>> +  unreachable
>> +
>> +gl_error.exit:                                    ; preds = %bb22
>> +  ret void
>> +
>> +bb3:                                              ; preds = %entry
>> +  br i1 false, label %bb5, label %bb4
>> +
>> +bb4:                                              ; preds = %bb3
>> +  br label %bb5
>> +
>> +bb5:                                              ; preds = %bb4, %bb3
>> +  br i1 undef, label %bb19, label %bb22
>> +
>> +bb19:                                             ; preds = %bb5
>> +  switch i32 %type, label %bb3.i6.i [
>> +    i32 5120, label %bb1.i13
>> +    i32 5121, label %bb1.i13
>> +    i32 6656, label %bb9.i.i6
>> +  ]
>> +
>> +bb9.i.i6:                                         ; preds = %bb19
>> +  br label %bb1.i13
>> +
>> +bb3.i6.i:                                         ; preds = %bb19
>> +  unreachable
>> +
>> +bb1.i13:                                          ; preds = %bb9.i.i6, %bb19, %bb19
>> +  br i1 undef, label %bb3.i17, label %bb2.i16
>> +
>> +bb2.i16:                                          ; preds = %bb1.i13
>> +  unreachable
>> +
>> +bb3.i17:                                          ; preds = %bb1.i13
>> +  br i1 undef, label %bb4.i18, label %bb23.i
>> +
>> +bb4.i18:                                          ; preds = %bb3.i17
>> +  %0 = mul nsw i32 %height, %width
>> +  %1 = and i32 %0, 7
>> +  %not..i = icmp ne i32 %1, 0
>> +  %2 = zext i1 %not..i to i32
>> +  %storemerge2.i = add i32 0, %2
>> +  %3 = call noalias i8* @malloc(i32 %storemerge2.i) nounwind
>> +  br i1 undef, label %bb3.i9, label %bb9.i
> 
> This seems to instruction select to:
> 
> BB#10: derived from LLVM BB %bb4.i18
>    Predecessors according to CFG: BB#9
>        %reg16402<def> = COPY %reg16385; rGPR:%reg16402 GPR:%reg16385
>        %reg16403<def> = COPY %reg16384; rGPR:%reg16403 GPR:%reg16384
>        %reg16401<def> = t2MUL %reg16402, %reg16403, pred:14, pred:
> %reg0; rGPR:%reg16401,16402,16403
>        %reg16404<def> = t2MOVi 0, pred:14, pred:%reg0, opt:%reg0;
> rGPR:%reg16404
>        t2TSTri %reg16401, 7, pred:14, pred:%reg0, %CPSR<imp-def>;
> rGPR:%reg16401
>        %reg16405<def> = t2MOVCCi %reg16404, 1, pred:1, pred:%CPSR;
> rGPR:%reg16405,16404
>        ADJCALLSTACKDOWN 0, pred:14, pred:%reg0, %SP<imp-def,dead>,
> %SP<imp-use>
>        %R0<def> = COPY %reg16405; rGPR:%reg16405
>        tBLXi_r9 <ga:@malloc>, %R0, %R0<imp-def>, %R1<imp-def,dead>,
> %R3<imp-def,dead>, %CPSR<imp-def,dead>, ...
>        ADJCALLSTACKUP 0, 0, pred:14, pred:%reg0, %SP<imp-def,dead>,
> %SP<imp-use>
>        %reg16407<def> = t2ANDri %reg16401, 7, pred:14, pred:%reg0,
> opt:%reg0; rGPR:%reg16407,16401
>        %reg16387<def> = COPY %reg16407; GPR:%reg16387 rGPR:%reg16407
>        %reg16408<def> = IMPLICIT_DEF; GPR:%reg16408
>        t2CMPzri %reg16408<kill>, 0, pred:14, pred:%reg0, %CPSR<imp-
> def>; GPR:%reg16408
>        t2Bcc <BB#18>, pred:1, pred:%CPSR
>    Successors according to CFG: BB#18 BB#11
> 
> The TST and the AND indeed both consume %reg16401 and the AND is below
> the TST. Your fix is correct to bail out at the start of the BB.
> 
> But I see a missed opportunity here, and I'd like to understand
> whether our arsenal is good enough to recognize it.
> 
> Finally the above ends up to output:
> 
> @ BB#4:                                 @ %bb.i24.i.preheader
>        and     r0, r4, #7
>        movs    r1, #0        ;; why movS? can we change to mov?
>        cmp     r0, #0        ;; equivalent to tst r0, #7, elidable
>        it      ne
>        movne   r1, #1
>        b       LBB0_6
> 
> But I guess the optimal sequence would be:
> 
> @ BB#4:                                 @ %bb.i24.i.preheader
>        andS     r0, r4, #7
>        ite     ne
>        movne   r1, #1
>        moveq   r1, #0  ;; or perhaps equivalently: moveq r1, r0
>        b       LBB0_6

Yes, the schedule is non-optimal. If the optimization is done at isel time, the reuse would have forced a better schedule. That's an issue with doing this optimization after scheduling.

> 
> Please correct me if I missed something :-)
> 
> Anyway, to get here, I must understand a few things:
> 
>        %reg16387<def> = COPY %reg16407; GPR:%reg16387 rGPR:%reg16407
> 
> 1) Why is this copy present at all?

This means the result of t2AND is being used in a successor block.

> 
>        %reg16408<def> = IMPLICIT_DEF; GPR:%reg16408
> 
> 2) What is defined implicitly here?

That's what's being compared to zero. It's likely due to bugpoint reduction.

> 
> 3) Third question: which MachineInst is responsible for the "movs r1,
> #0" above?

t2MOVi will be shortened into a tMOVi which always sets CPSR. Since the optimization is done late it has no effect on the optimization.

Evan


> 
> Hopefully someone can explain this all to me ;-)
> 
> Cheers,
> 
>   Gabor
> 
> 
>> +
>> +bb9.i:                                            ; preds = %bb4.i18
>> +  br i1 undef, label %bb13.i19, label %bb.i24.i
>> +
>> +bb13.i19:                                         ; preds = %bb9.i
>> +  br i1 undef, label %bb14.i20, label %bb15.i
>> +
>> +bb14.i20:                                         ; preds = %bb13.i19
>> +  unreachable
>> +
>> +bb15.i:                                           ; preds = %bb13.i19
>> +  unreachable
>> +
>> +bb.i24.i:                                         ; preds = %bb.i24.i, %bb9.i
>> +  %storemerge1.i21.i = phi i32 [ %4, %bb.i24.i ], [ 0, %bb9.i ]
>> +  %4 = add i32 %storemerge1.i21.i, 1
>> +  %exitcond47.i = icmp eq i32 %4, %storemerge2.i
>> +  br i1 %exitcond47.i, label %bb22, label %bb.i24.i
>> +
>> +bb23.i:                                           ; preds = %bb3.i17
>> +  unreachable
>> +
>> +bb3.i9:                                           ; preds = %bb4.i18
>> +  unreachable
>> +
>> +bb22:                                             ; preds = %bb.i24.i, %bb5
>> +  br i1 undef, label %gl_error.exit, label %bb23
>> +
>> +bb23:                                             ; preds = %bb22
>> +  ret void
>> +}
>> 
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-comm... at cs.uiuc.eduhttp://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits





More information about the llvm-commits mailing list