[llvm] r287014 - Revert r286999 which caused buildbot test failures. Some testcases need to be made target specific.

Thu Nov 17 10:03:06 PST 2016

Hi,

I found a simpler way to make the test remain valid. By changing the
%indvars.iv from the last index of an array to the second last index,
LSR will not use reg({0,+,1}<nuw><nsw><%for.body4.us>) in the LSR use
of array address below (The second LSR Use in the trace) because
otherwise it needs to use one more register. As a result,
reg({0,+,1}<nuw><nsw><%for.body4.us>) will only be used in the Basic
LSR Use, not being involved in the address calculation, so it can stay
as an int32 variable and the test will still be valid.

Do you think it makes sense to change the testcase like this?

The LSR trace after changing the testcase:
The chosen solution requires 2 regs, with addrec cost 2, plus 1 setup cost:
  LSR Use: Kind=Basic, Offsets={0}, widest fixup type: i32
    reg({0,+,1}<nuw><nsw><%for.body4.us>)  ### InsertInitialFormula ###
  LSR Use: Kind=Address of i8 in addrspace(0), Offsets={0,0}, widest
fixup type: i8*
    reg({(%inv + %a)<nsw>,+,8}<nw><%for.body4.us>)  ### InsertInitialFormula ###

Index: test/CodeGen/AArch64/eliminate-trunc.ll
===================================================================

--- test/CodeGen/AArch64/eliminate-trunc.ll (revision 286999)
+++ test/CodeGen/AArch64/eliminate-trunc.ll (working copy)
@@ -1,39 +1,39 @@
 ; RUN: llc -verify-machineinstrs -o - %s
-mtriple=aarch64-none-apple-ios7.0 -mcpu=cyclone | FileCheck %s

 ; Check  trunc i64 operation is translated as a subregister access
 ; eliminating an i32 induction varible.

 ; CHECK-NOT: add {{x[0-9]+}}, {{x[0-9]+}}, #1
 ; CHECK: add {{w[0-9]+}}, {{w[0-9]+}}, #1
 ; CHECK-NEXT: cmp {{w[0-9]+}}, {{w[0-9]+}}
-define void @test1_signed([8 x i8]* nocapture %a, i8* nocapture
readonly %box, i8 %limit) minsize {
+define void @test1_signed([8 x i8]* nocapture %a, i8* nocapture
readonly %box, i8 %limit, i64 %inv) minsize {
 entry:
   %conv = zext i8 %limit to i32
   %cmp223 = icmp eq i8 %limit, 0
   br i1 %cmp223, label %for.end15, label %for.body4.lr.ph.us

 for.body4.us:
   %indvars.iv = phi i64 [ 0, %for.body4.lr.ph.us ], [
%indvars.iv.next, %for.body4.us ]
-  %arrayidx6.us = getelementptr inbounds [8 x i8], [8 x i8]* %a, i64
%indvars.iv26, i64 %indvars.iv
+  %arrayidx6.us = getelementptr inbounds [8 x i8], [8 x i8]* %a, i64
%indvars.iv, i64 %inv
   %0 = load i8, i8* %arrayidx6.us, align 1
   %idxprom7.us = zext i8 %0 to i64
   %arrayidx8.us = getelementptr inbounds i8, i8* %box, i64 %idxprom7.us
   %1 = load i8, i8* %arrayidx8.us, align 1
   store i8 %1, i8* %arrayidx6.us, align 1
   %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
   %2 = trunc i64 %indvars.iv.next to i32
   %cmp2.us = icmp slt i32 %2, %conv
   br i1 %cmp2.us, label %for.body4.us, label %for.cond1.for.inc13_crit_edge.us

 for.body4.lr.ph.us:
   %indvars.iv26 = phi i64 [ %indvars.iv.next27,
%for.cond1.for.inc13_crit_edge.us ], [ 0, %entry ]
   br label %for.body4.us

 for.cond1.for.inc13_crit_edge.us:
   %indvars.iv.next27 = add nuw nsw i64 %indvars.iv26, 1
   %exitcond28 = icmp eq i64 %indvars.iv26, 3
   br i1 %exitcond28, label %for.end15, label %for.body4.lr.ph.us

 for.end15:
   ret void
 }

Thanks,
Wei.

On Tue, Nov 15, 2016 at 5:12 PM, Wei Mi <wmi at google.com> wrote:
> Hi Renato,
>
> What I meant is "test/Transforms/LoopStrengthReduce/nested-loop.ll"
> should be target specific, not
> test/CodeGen/AArch64/eliminate-trunc.ll. nested-loop.ll failed on some
> targets but passed on AArch64.
>
> I feel my patch generates reasonable code for
> test/CodeGen/AArch64/eliminate-trunc.ll:
>
> LSR trace without my patch. The trace shows the final decision about
> how to generate induction variables for LSR.
> The chosen solution requires 2 regs, with addrec cost 2, plus 1 setup cost:
>   LSR Use: Kind=Basic, Offsets={0}, widest fixup type: i32
>     reg({0,+,1}<nuw><nsw><%for.body4.us>)
>   LSR Use: Kind=Address of i8 in addrspace(0), Offsets={0,0}, widest
> fixup type: i8*
>     reg({{%a,+,8}<nsw><%for.body4.lr.ph.us>,+,1}<nw><%for.body4.us>)
>
> LSR trace with my patch:
> The chosen solution requires 2 regs, with addrec cost 1:
>   LSR Use: Kind=Basic, Offsets={0}, widest fixup type: i32
>     reg({0,+,1}<nuw><nsw><%for.body4.us>)
>   LSR Use: Kind=Address of i8 in addrspace(0), Offsets={0,0}, widest
> fixup type: i8*
>     reg({%a,+,8}<nsw><%for.body4.lr.ph.us>) +
> 1*reg({0,+,1}<nuw><nsw><%for.body4.us>)
>
> What my patch does is to allow the formula including induction value
> from outerloop as a separate Reg which was dropped unconditionally
> before -- reg({%a,+,8}<nsw><%for.body4.lr.ph.us>) +
> 1*reg({0,+,1}<nuw><nsw><%for.body4.us>). It provides more alternative
> for LSR to choose, and apparently LSR chooses it in the final
> decision, because it will generate one less addrec operation.
>
> However, although there is one less addrec operation in IR after LSR,
> there is no much difference in the final assembly w/wo my patch --
> same instruction number. That is because AArch64 supports
> post-load/store increment, which hides the cost of addrec.
>
> innerloop code without my patch.
> LBB0_3:                                 ; %for.body4.us
>                                         ;   Parent Loop BB0_2 Depth=1
>                                         ; =>  This Inner Loop Header: Depth=2
>   ldrb w12, [x11]
>   ldrb w12, [x1, x12]
>   strb w12, [x11], #1
>   add w10, w10, #1            ; =1
>   cmp w10, w8
>   b.lt LBB0_3
>
> innerloop code with my patch.
> LBB0_3:                                 ; %for.body4.us
>                                         ;   Parent Loop BB0_2 Depth=1
>                                         ; =>  This Inner Loop Header: Depth=2
>   ldrb w11, [x0, x10]
>   ldrb w11, [x1, x11]
>   strb w11, [x0, x10]
>   add x10, x10, #1            ; =1
>   cmp w10, w8
>   b.lt LBB0_3
>
> So after the change, the test fails just because the precondition of
> the test disappears. It is not the patch generating unreasonable code.
> I try some effort to change the test and prevent my patch from kicking
> in, like if I change the test by removing its outerloop (but keep the
> innerloop the same), it will still use reg({0,+,1}) in both LSR uses
> and the test still fails. It further proves that the test can pass
> before only because the limitation about formula with induction value
> Reg from outerloop exists.
>
> Do you have any suggestions about how to change the test, or is it ok
> to remove it?
>
> Thanks,
> Wei.
>
>
>
> On Tue, Nov 15, 2016 at 11:59 AM, Renato Golin <renato.golin at linaro.org> wrote:
>> On 15 November 2016 at 19:42, Wei Mi via llvm-commits
>> <llvm-commits at lists.llvm.org> wrote:
>>> Author: wmi
>>> Date: Tue Nov 15 13:42:05 2016
>>> New Revision: 287014
>>>
>>> URL: http://llvm.org/viewvc/llvm-project?rev=287014&view=rev
>>> Log:
>>> Revert r286999 which caused buildbot test failures. Some testcases need to be made target specific.
>>
>> Hi Wei,
>>
>> I'm not sure making tests target specific will fix anything. One of
>> the failures was that a truncation shouldn't have happened and it did.
>>
>> I think there's more than meets the eye. On your next round, please
>> include Tim, James and me on the review, so we can understand the
>> changes to the tests and the code better.
>>
>> cheers,
>> --renato