[llvm] r287014 - Revert r286999 which caused buildbot test failures. Some testcases need to be made target specific.

Wei Mi via llvm-commits llvm-commits at lists.llvm.org
Tue Nov 22 09:37:07 PST 2016


Hi Renato,

Any comments about the test modification?

Thanks,
Wei.


On Thu, Nov 17, 2016 at 10:03 AM, Wei Mi <wmi at google.com> wrote:
> Hi,
>
> I found a simpler way to make the test remain valid. By changing the
> %indvars.iv from the last index of an array to the second last index,
> LSR will not use reg({0,+,1}<nuw><nsw><%for.body4.us>) in the LSR use
> of array address below (The second LSR Use in the trace) because
> otherwise it needs to use one more register. As a result,
> reg({0,+,1}<nuw><nsw><%for.body4.us>) will only be used in the Basic
> LSR Use, not being involved in the address calculation, so it can stay
> as an int32 variable and the test will still be valid.
>
> Do you think it makes sense to change the testcase like this?
>
> The LSR trace after changing the testcase:
> The chosen solution requires 2 regs, with addrec cost 2, plus 1 setup cost:
>   LSR Use: Kind=Basic, Offsets={0}, widest fixup type: i32
>     reg({0,+,1}<nuw><nsw><%for.body4.us>)  ### InsertInitialFormula ###
>   LSR Use: Kind=Address of i8 in addrspace(0), Offsets={0,0}, widest
> fixup type: i8*
>     reg({(%inv + %a)<nsw>,+,8}<nw><%for.body4.us>)  ### InsertInitialFormula ###
>
> Index: test/CodeGen/AArch64/eliminate-trunc.ll
> ===================================================================
> --- test/CodeGen/AArch64/eliminate-trunc.ll (revision 286999)
> +++ test/CodeGen/AArch64/eliminate-trunc.ll (working copy)
> @@ -1,39 +1,39 @@
>  ; RUN: llc -verify-machineinstrs -o - %s
> -mtriple=aarch64-none-apple-ios7.0 -mcpu=cyclone | FileCheck %s
>
>  ; Check  trunc i64 operation is translated as a subregister access
>  ; eliminating an i32 induction varible.
>
>  ; CHECK-NOT: add {{x[0-9]+}}, {{x[0-9]+}}, #1
>  ; CHECK: add {{w[0-9]+}}, {{w[0-9]+}}, #1
>  ; CHECK-NEXT: cmp {{w[0-9]+}}, {{w[0-9]+}}
> -define void @test1_signed([8 x i8]* nocapture %a, i8* nocapture
> readonly %box, i8 %limit) minsize {
> +define void @test1_signed([8 x i8]* nocapture %a, i8* nocapture
> readonly %box, i8 %limit, i64 %inv) minsize {
>  entry:
>    %conv = zext i8 %limit to i32
>    %cmp223 = icmp eq i8 %limit, 0
>    br i1 %cmp223, label %for.end15, label %for.body4.lr.ph.us
>
>  for.body4.us:
>    %indvars.iv = phi i64 [ 0, %for.body4.lr.ph.us ], [
> %indvars.iv.next, %for.body4.us ]
> -  %arrayidx6.us = getelementptr inbounds [8 x i8], [8 x i8]* %a, i64
> %indvars.iv26, i64 %indvars.iv
> +  %arrayidx6.us = getelementptr inbounds [8 x i8], [8 x i8]* %a, i64
> %indvars.iv, i64 %inv
>    %0 = load i8, i8* %arrayidx6.us, align 1
>    %idxprom7.us = zext i8 %0 to i64
>    %arrayidx8.us = getelementptr inbounds i8, i8* %box, i64 %idxprom7.us
>    %1 = load i8, i8* %arrayidx8.us, align 1
>    store i8 %1, i8* %arrayidx6.us, align 1
>    %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
>    %2 = trunc i64 %indvars.iv.next to i32
>    %cmp2.us = icmp slt i32 %2, %conv
>    br i1 %cmp2.us, label %for.body4.us, label %for.cond1.for.inc13_crit_edge.us
>
>  for.body4.lr.ph.us:
>    %indvars.iv26 = phi i64 [ %indvars.iv.next27,
> %for.cond1.for.inc13_crit_edge.us ], [ 0, %entry ]
>    br label %for.body4.us
>
>  for.cond1.for.inc13_crit_edge.us:
>    %indvars.iv.next27 = add nuw nsw i64 %indvars.iv26, 1
>    %exitcond28 = icmp eq i64 %indvars.iv26, 3
>    br i1 %exitcond28, label %for.end15, label %for.body4.lr.ph.us
>
>  for.end15:
>    ret void
>  }
>
> Thanks,
> Wei.
>
> On Tue, Nov 15, 2016 at 5:12 PM, Wei Mi <wmi at google.com> wrote:
>> Hi Renato,
>>
>> What I meant is "test/Transforms/LoopStrengthReduce/nested-loop.ll"
>> should be target specific, not
>> test/CodeGen/AArch64/eliminate-trunc.ll. nested-loop.ll failed on some
>> targets but passed on AArch64.
>>
>> I feel my patch generates reasonable code for
>> test/CodeGen/AArch64/eliminate-trunc.ll:
>>
>> LSR trace without my patch. The trace shows the final decision about
>> how to generate induction variables for LSR.
>> The chosen solution requires 2 regs, with addrec cost 2, plus 1 setup cost:
>>   LSR Use: Kind=Basic, Offsets={0}, widest fixup type: i32
>>     reg({0,+,1}<nuw><nsw><%for.body4.us>)
>>   LSR Use: Kind=Address of i8 in addrspace(0), Offsets={0,0}, widest
>> fixup type: i8*
>>     reg({{%a,+,8}<nsw><%for.body4.lr.ph.us>,+,1}<nw><%for.body4.us>)
>>
>> LSR trace with my patch:
>> The chosen solution requires 2 regs, with addrec cost 1:
>>   LSR Use: Kind=Basic, Offsets={0}, widest fixup type: i32
>>     reg({0,+,1}<nuw><nsw><%for.body4.us>)
>>   LSR Use: Kind=Address of i8 in addrspace(0), Offsets={0,0}, widest
>> fixup type: i8*
>>     reg({%a,+,8}<nsw><%for.body4.lr.ph.us>) +
>> 1*reg({0,+,1}<nuw><nsw><%for.body4.us>)
>>
>> What my patch does is to allow the formula including induction value
>> from outerloop as a separate Reg which was dropped unconditionally
>> before -- reg({%a,+,8}<nsw><%for.body4.lr.ph.us>) +
>> 1*reg({0,+,1}<nuw><nsw><%for.body4.us>). It provides more alternative
>> for LSR to choose, and apparently LSR chooses it in the final
>> decision, because it will generate one less addrec operation.
>>
>> However, although there is one less addrec operation in IR after LSR,
>> there is no much difference in the final assembly w/wo my patch --
>> same instruction number. That is because AArch64 supports
>> post-load/store increment, which hides the cost of addrec.
>>
>> innerloop code without my patch.
>> LBB0_3:                                 ; %for.body4.us
>>                                         ;   Parent Loop BB0_2 Depth=1
>>                                         ; =>  This Inner Loop Header: Depth=2
>>   ldrb w12, [x11]
>>   ldrb w12, [x1, x12]
>>   strb w12, [x11], #1
>>   add w10, w10, #1            ; =1
>>   cmp w10, w8
>>   b.lt LBB0_3
>>
>> innerloop code with my patch.
>> LBB0_3:                                 ; %for.body4.us
>>                                         ;   Parent Loop BB0_2 Depth=1
>>                                         ; =>  This Inner Loop Header: Depth=2
>>   ldrb w11, [x0, x10]
>>   ldrb w11, [x1, x11]
>>   strb w11, [x0, x10]
>>   add x10, x10, #1            ; =1
>>   cmp w10, w8
>>   b.lt LBB0_3
>>
>> So after the change, the test fails just because the precondition of
>> the test disappears. It is not the patch generating unreasonable code.
>> I try some effort to change the test and prevent my patch from kicking
>> in, like if I change the test by removing its outerloop (but keep the
>> innerloop the same), it will still use reg({0,+,1}) in both LSR uses
>> and the test still fails. It further proves that the test can pass
>> before only because the limitation about formula with induction value
>> Reg from outerloop exists.
>>
>> Do you have any suggestions about how to change the test, or is it ok
>> to remove it?
>>
>> Thanks,
>> Wei.
>>
>>
>>
>> On Tue, Nov 15, 2016 at 11:59 AM, Renato Golin <renato.golin at linaro.org> wrote:
>>> On 15 November 2016 at 19:42, Wei Mi via llvm-commits
>>> <llvm-commits at lists.llvm.org> wrote:
>>>> Author: wmi
>>>> Date: Tue Nov 15 13:42:05 2016
>>>> New Revision: 287014
>>>>
>>>> URL: http://llvm.org/viewvc/llvm-project?rev=287014&view=rev
>>>> Log:
>>>> Revert r286999 which caused buildbot test failures. Some testcases need to be made target specific.
>>>
>>> Hi Wei,
>>>
>>> I'm not sure making tests target specific will fix anything. One of
>>> the failures was that a truncation shouldn't have happened and it did.
>>>
>>> I think there's more than meets the eye. On your next round, please
>>> include Tim, James and me on the review, so we can understand the
>>> changes to the tests and the code better.
>>>
>>> cheers,
>>> --renato


More information about the llvm-commits mailing list