[llvm] r287014 - Revert r286999 which caused buildbot test failures. Some testcases need to be made target specific.

Wed Nov 30 12:59:54 PST 2016

Ping.

On Tue, Nov 22, 2016 at 9:37 AM, Wei Mi <wmi at google.com> wrote:
> Hi Renato,
>
> Any comments about the test modification?
>
> Thanks,
> Wei.
>
>
> On Thu, Nov 17, 2016 at 10:03 AM, Wei Mi <wmi at google.com> wrote:
>> Hi,
>>
>> I found a simpler way to make the test remain valid. By changing the
>> %indvars.iv from the last index of an array to the second last index,
>> LSR will not use reg({0,+,1}<nuw><nsw><%for.body4.us>) in the LSR use
>> of array address below (The second LSR Use in the trace) because
>> otherwise it needs to use one more register. As a result,
>> reg({0,+,1}<nuw><nsw><%for.body4.us>) will only be used in the Basic
>> LSR Use, not being involved in the address calculation, so it can stay
>> as an int32 variable and the test will still be valid.
>>
>> Do you think it makes sense to change the testcase like this?
>>
>> The LSR trace after changing the testcase:
>> The chosen solution requires 2 regs, with addrec cost 2, plus 1 setup cost:
>>   LSR Use: Kind=Basic, Offsets={0}, widest fixup type: i32
>>     reg({0,+,1}<nuw><nsw><%for.body4.us>)  ### InsertInitialFormula ###
>>   LSR Use: Kind=Address of i8 in addrspace(0), Offsets={0,0}, widest
>> fixup type: i8*
>>     reg({(%inv + %a)<nsw>,+,8}<nw><%for.body4.us>)  ### InsertInitialFormula ###
>>
>> Index: test/CodeGen/AArch64/eliminate-trunc.ll
>> ===================================================================
>> --- test/CodeGen/AArch64/eliminate-trunc.ll (revision 286999)
>> +++ test/CodeGen/AArch64/eliminate-trunc.ll (working copy)
>> @@ -1,39 +1,39 @@
>>  ; RUN: llc -verify-machineinstrs -o - %s
>> -mtriple=aarch64-none-apple-ios7.0 -mcpu=cyclone | FileCheck %s
>>
>>  ; Check  trunc i64 operation is translated as a subregister access
>>  ; eliminating an i32 induction varible.
>>
>>  ; CHECK-NOT: add {{x[0-9]+}}, {{x[0-9]+}}, #1
>>  ; CHECK: add {{w[0-9]+}}, {{w[0-9]+}}, #1
>>  ; CHECK-NEXT: cmp {{w[0-9]+}}, {{w[0-9]+}}
>> -define void @test1_signed([8 x i8]* nocapture %a, i8* nocapture
>> readonly %box, i8 %limit) minsize {
>> +define void @test1_signed([8 x i8]* nocapture %a, i8* nocapture
>> readonly %box, i8 %limit, i64 %inv) minsize {
>>  entry:
>>    %conv = zext i8 %limit to i32
>>    %cmp223 = icmp eq i8 %limit, 0
>>    br i1 %cmp223, label %for.end15, label %for.body4.lr.ph.us
>>
>>  for.body4.us:
>>    %indvars.iv = phi i64 [ 0, %for.body4.lr.ph.us ], [
>> %indvars.iv.next, %for.body4.us ]
>> -  %arrayidx6.us = getelementptr inbounds [8 x i8], [8 x i8]* %a, i64
>> %indvars.iv26, i64 %indvars.iv
>> +  %arrayidx6.us = getelementptr inbounds [8 x i8], [8 x i8]* %a, i64
>> %indvars.iv, i64 %inv
>>    %0 = load i8, i8* %arrayidx6.us, align 1
>>    %idxprom7.us = zext i8 %0 to i64
>>    %arrayidx8.us = getelementptr inbounds i8, i8* %box, i64 %idxprom7.us
>>    %1 = load i8, i8* %arrayidx8.us, align 1
>>    store i8 %1, i8* %arrayidx6.us, align 1
>>    %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
>>    %2 = trunc i64 %indvars.iv.next to i32
>>    %cmp2.us = icmp slt i32 %2, %conv
>>    br i1 %cmp2.us, label %for.body4.us, label %for.cond1.for.inc13_crit_edge.us
>>
>>  for.body4.lr.ph.us:
>>    %indvars.iv26 = phi i64 [ %indvars.iv.next27,
>> %for.cond1.for.inc13_crit_edge.us ], [ 0, %entry ]
>>    br label %for.body4.us
>>
>>  for.cond1.for.inc13_crit_edge.us:
>>    %indvars.iv.next27 = add nuw nsw i64 %indvars.iv26, 1
>>    %exitcond28 = icmp eq i64 %indvars.iv26, 3
>>    br i1 %exitcond28, label %for.end15, label %for.body4.lr.ph.us
>>
>>  for.end15:
>>    ret void
>>  }
>>
>> Thanks,
>> Wei.
>>
>> On Tue, Nov 15, 2016 at 5:12 PM, Wei Mi <wmi at google.com> wrote:
>>> Hi Renato,
>>>
>>> What I meant is "test/Transforms/LoopStrengthReduce/nested-loop.ll"
>>> should be target specific, not
>>> test/CodeGen/AArch64/eliminate-trunc.ll. nested-loop.ll failed on some
>>> targets but passed on AArch64.
>>>
>>> I feel my patch generates reasonable code for
>>> test/CodeGen/AArch64/eliminate-trunc.ll:
>>>
>>> LSR trace without my patch. The trace shows the final decision about
>>> how to generate induction variables for LSR.
>>> The chosen solution requires 2 regs, with addrec cost 2, plus 1 setup cost:
>>>   LSR Use: Kind=Basic, Offsets={0}, widest fixup type: i32
>>>     reg({0,+,1}<nuw><nsw><%for.body4.us>)
>>>   LSR Use: Kind=Address of i8 in addrspace(0), Offsets={0,0}, widest
>>> fixup type: i8*
>>>     reg({{%a,+,8}<nsw><%for.body4.lr.ph.us>,+,1}<nw><%for.body4.us>)
>>>
>>> LSR trace with my patch:
>>> The chosen solution requires 2 regs, with addrec cost 1:
>>>   LSR Use: Kind=Basic, Offsets={0}, widest fixup type: i32
>>>     reg({0,+,1}<nuw><nsw><%for.body4.us>)
>>>   LSR Use: Kind=Address of i8 in addrspace(0), Offsets={0,0}, widest
>>> fixup type: i8*
>>>     reg({%a,+,8}<nsw><%for.body4.lr.ph.us>) +
>>> 1*reg({0,+,1}<nuw><nsw><%for.body4.us>)
>>>
>>> What my patch does is to allow the formula including induction value
>>> from outerloop as a separate Reg which was dropped unconditionally
>>> before -- reg({%a,+,8}<nsw><%for.body4.lr.ph.us>) +
>>> 1*reg({0,+,1}<nuw><nsw><%for.body4.us>). It provides more alternative
>>> for LSR to choose, and apparently LSR chooses it in the final
>>> decision, because it will generate one less addrec operation.
>>>
>>> However, although there is one less addrec operation in IR after LSR,
>>> there is no much difference in the final assembly w/wo my patch --
>>> same instruction number. That is because AArch64 supports
>>> post-load/store increment, which hides the cost of addrec.
>>>
>>> innerloop code without my patch.
>>> LBB0_3:                                 ; %for.body4.us
>>>                                         ;   Parent Loop BB0_2 Depth=1
>>>                                         ; =>  This Inner Loop Header: Depth=2
>>>   ldrb w12, [x11]
>>>   ldrb w12, [x1, x12]
>>>   strb w12, [x11], #1
>>>   add w10, w10, #1            ; =1
>>>   cmp w10, w8
>>>   b.lt LBB0_3
>>>
>>> innerloop code with my patch.
>>> LBB0_3:                                 ; %for.body4.us
>>>                                         ;   Parent Loop BB0_2 Depth=1
>>>                                         ; =>  This Inner Loop Header: Depth=2
>>>   ldrb w11, [x0, x10]
>>>   ldrb w11, [x1, x11]
>>>   strb w11, [x0, x10]
>>>   add x10, x10, #1            ; =1
>>>   cmp w10, w8
>>>   b.lt LBB0_3
>>>
>>> So after the change, the test fails just because the precondition of
>>> the test disappears. It is not the patch generating unreasonable code.
>>> I try some effort to change the test and prevent my patch from kicking
>>> in, like if I change the test by removing its outerloop (but keep the
>>> innerloop the same), it will still use reg({0,+,1}) in both LSR uses
>>> and the test still fails. It further proves that the test can pass
>>> before only because the limitation about formula with induction value
>>> Reg from outerloop exists.
>>>
>>> Do you have any suggestions about how to change the test, or is it ok
>>> to remove it?
>>>
>>> Thanks,
>>> Wei.
>>>
>>>
>>>
>>> On Tue, Nov 15, 2016 at 11:59 AM, Renato Golin <renato.golin at linaro.org> wrote:
>>>> On 15 November 2016 at 19:42, Wei Mi via llvm-commits
>>>> <llvm-commits at lists.llvm.org> wrote:
>>>>> Author: wmi
>>>>> Date: Tue Nov 15 13:42:05 2016
>>>>> New Revision: 287014
>>>>>
>>>>> URL: http://llvm.org/viewvc/llvm-project?rev=287014&view=rev
>>>>> Log:
>>>>> Revert r286999 which caused buildbot test failures. Some testcases need to be made target specific.
>>>>
>>>> Hi Wei,
>>>>
>>>> I'm not sure making tests target specific will fix anything. One of
>>>> the failures was that a truncation shouldn't have happened and it did.
>>>>
>>>> I think there's more than meets the eye. On your next round, please
>>>> include Tim, James and me on the review, so we can understand the
>>>> changes to the tests and the code better.
>>>>
>>>> cheers,
>>>> --renato