[llvm] r287014 - Revert r286999 which caused buildbot test failures. Some testcases need to be made target specific.

Wei Mi via llvm-commits llvm-commits at lists.llvm.org
Tue Dec 13 14:46:42 PST 2016


Ping. To explain better why I need to change the testcase:

The patch removed the limitation to the formula containing multiple
SCEVAddRecExpr type regs, as a result LSR can choose to use a single
basic induction variable for eliminate-trunc.ll to reduce the cost of
addrec instructions (There are no instruction number difference in
final asm because before the patch, LSR uses two basic induction
variables but the addrec cost were absorbed by post-increment store).
For eliminate-trunc.ll to still be able to check that LSR can
eliminate trunc, it requires the LSR to still use two different basic
induction variables, which can only be achieved by adjusting the
testcase a little bit.

In other words, the patch just provides more option for LSR to choose
and now LSR always choose a different result for eliminate-trunc.ll.
Maybe LSR can be improved to consider the impact of post-increment
load/store in a better way, but it is an unrelated issue. We just need
to adjust the testcase for it to remain valid.

Does it sound reasonable to you?

Wei.

On Wed, Nov 30, 2016 at 12:59 PM, Wei Mi <wmi at google.com> wrote:
> Ping.
>
> On Tue, Nov 22, 2016 at 9:37 AM, Wei Mi <wmi at google.com> wrote:
>> Hi Renato,
>>
>> Any comments about the test modification?
>>
>> Thanks,
>> Wei.
>>
>>
>> On Thu, Nov 17, 2016 at 10:03 AM, Wei Mi <wmi at google.com> wrote:
>>> Hi,
>>>
>>> I found a simpler way to make the test remain valid. By changing the
>>> %indvars.iv from the last index of an array to the second last index,
>>> LSR will not use reg({0,+,1}<nuw><nsw><%for.body4.us>) in the LSR use
>>> of array address below (The second LSR Use in the trace) because
>>> otherwise it needs to use one more register. As a result,
>>> reg({0,+,1}<nuw><nsw><%for.body4.us>) will only be used in the Basic
>>> LSR Use, not being involved in the address calculation, so it can stay
>>> as an int32 variable and the test will still be valid.
>>>
>>> Do you think it makes sense to change the testcase like this?
>>>
>>> The LSR trace after changing the testcase:
>>> The chosen solution requires 2 regs, with addrec cost 2, plus 1 setup cost:
>>>   LSR Use: Kind=Basic, Offsets={0}, widest fixup type: i32
>>>     reg({0,+,1}<nuw><nsw><%for.body4.us>)  ### InsertInitialFormula ###
>>>   LSR Use: Kind=Address of i8 in addrspace(0), Offsets={0,0}, widest
>>> fixup type: i8*
>>>     reg({(%inv + %a)<nsw>,+,8}<nw><%for.body4.us>)  ### InsertInitialFormula ###
>>>
>>> Index: test/CodeGen/AArch64/eliminate-trunc.ll
>>> ===================================================================
>>> --- test/CodeGen/AArch64/eliminate-trunc.ll (revision 286999)
>>> +++ test/CodeGen/AArch64/eliminate-trunc.ll (working copy)
>>> @@ -1,39 +1,39 @@
>>>  ; RUN: llc -verify-machineinstrs -o - %s
>>> -mtriple=aarch64-none-apple-ios7.0 -mcpu=cyclone | FileCheck %s
>>>
>>>  ; Check  trunc i64 operation is translated as a subregister access
>>>  ; eliminating an i32 induction varible.
>>>
>>>  ; CHECK-NOT: add {{x[0-9]+}}, {{x[0-9]+}}, #1
>>>  ; CHECK: add {{w[0-9]+}}, {{w[0-9]+}}, #1
>>>  ; CHECK-NEXT: cmp {{w[0-9]+}}, {{w[0-9]+}}
>>> -define void @test1_signed([8 x i8]* nocapture %a, i8* nocapture
>>> readonly %box, i8 %limit) minsize {
>>> +define void @test1_signed([8 x i8]* nocapture %a, i8* nocapture
>>> readonly %box, i8 %limit, i64 %inv) minsize {
>>>  entry:
>>>    %conv = zext i8 %limit to i32
>>>    %cmp223 = icmp eq i8 %limit, 0
>>>    br i1 %cmp223, label %for.end15, label %for.body4.lr.ph.us
>>>
>>>  for.body4.us:
>>>    %indvars.iv = phi i64 [ 0, %for.body4.lr.ph.us ], [
>>> %indvars.iv.next, %for.body4.us ]
>>> -  %arrayidx6.us = getelementptr inbounds [8 x i8], [8 x i8]* %a, i64
>>> %indvars.iv26, i64 %indvars.iv
>>> +  %arrayidx6.us = getelementptr inbounds [8 x i8], [8 x i8]* %a, i64
>>> %indvars.iv, i64 %inv
>>>    %0 = load i8, i8* %arrayidx6.us, align 1
>>>    %idxprom7.us = zext i8 %0 to i64
>>>    %arrayidx8.us = getelementptr inbounds i8, i8* %box, i64 %idxprom7.us
>>>    %1 = load i8, i8* %arrayidx8.us, align 1
>>>    store i8 %1, i8* %arrayidx6.us, align 1
>>>    %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
>>>    %2 = trunc i64 %indvars.iv.next to i32
>>>    %cmp2.us = icmp slt i32 %2, %conv
>>>    br i1 %cmp2.us, label %for.body4.us, label %for.cond1.for.inc13_crit_edge.us
>>>
>>>  for.body4.lr.ph.us:
>>>    %indvars.iv26 = phi i64 [ %indvars.iv.next27,
>>> %for.cond1.for.inc13_crit_edge.us ], [ 0, %entry ]
>>>    br label %for.body4.us
>>>
>>>  for.cond1.for.inc13_crit_edge.us:
>>>    %indvars.iv.next27 = add nuw nsw i64 %indvars.iv26, 1
>>>    %exitcond28 = icmp eq i64 %indvars.iv26, 3
>>>    br i1 %exitcond28, label %for.end15, label %for.body4.lr.ph.us
>>>
>>>  for.end15:
>>>    ret void
>>>  }
>>>
>>> Thanks,
>>> Wei.
>>>
>>> On Tue, Nov 15, 2016 at 5:12 PM, Wei Mi <wmi at google.com> wrote:
>>>> Hi Renato,
>>>>
>>>> What I meant is "test/Transforms/LoopStrengthReduce/nested-loop.ll"
>>>> should be target specific, not
>>>> test/CodeGen/AArch64/eliminate-trunc.ll. nested-loop.ll failed on some
>>>> targets but passed on AArch64.
>>>>
>>>> I feel my patch generates reasonable code for
>>>> test/CodeGen/AArch64/eliminate-trunc.ll:
>>>>
>>>> LSR trace without my patch. The trace shows the final decision about
>>>> how to generate induction variables for LSR.
>>>> The chosen solution requires 2 regs, with addrec cost 2, plus 1 setup cost:
>>>>   LSR Use: Kind=Basic, Offsets={0}, widest fixup type: i32
>>>>     reg({0,+,1}<nuw><nsw><%for.body4.us>)
>>>>   LSR Use: Kind=Address of i8 in addrspace(0), Offsets={0,0}, widest
>>>> fixup type: i8*
>>>>     reg({{%a,+,8}<nsw><%for.body4.lr.ph.us>,+,1}<nw><%for.body4.us>)
>>>>
>>>> LSR trace with my patch:
>>>> The chosen solution requires 2 regs, with addrec cost 1:
>>>>   LSR Use: Kind=Basic, Offsets={0}, widest fixup type: i32
>>>>     reg({0,+,1}<nuw><nsw><%for.body4.us>)
>>>>   LSR Use: Kind=Address of i8 in addrspace(0), Offsets={0,0}, widest
>>>> fixup type: i8*
>>>>     reg({%a,+,8}<nsw><%for.body4.lr.ph.us>) +
>>>> 1*reg({0,+,1}<nuw><nsw><%for.body4.us>)
>>>>
>>>> What my patch does is to allow the formula including induction value
>>>> from outerloop as a separate Reg which was dropped unconditionally
>>>> before -- reg({%a,+,8}<nsw><%for.body4.lr.ph.us>) +
>>>> 1*reg({0,+,1}<nuw><nsw><%for.body4.us>). It provides more alternative
>>>> for LSR to choose, and apparently LSR chooses it in the final
>>>> decision, because it will generate one less addrec operation.
>>>>
>>>> However, although there is one less addrec operation in IR after LSR,
>>>> there is no much difference in the final assembly w/wo my patch --
>>>> same instruction number. That is because AArch64 supports
>>>> post-load/store increment, which hides the cost of addrec.
>>>>
>>>> innerloop code without my patch.
>>>> LBB0_3:                                 ; %for.body4.us
>>>>                                         ;   Parent Loop BB0_2 Depth=1
>>>>                                         ; =>  This Inner Loop Header: Depth=2
>>>>   ldrb w12, [x11]
>>>>   ldrb w12, [x1, x12]
>>>>   strb w12, [x11], #1
>>>>   add w10, w10, #1            ; =1
>>>>   cmp w10, w8
>>>>   b.lt LBB0_3
>>>>
>>>> innerloop code with my patch.
>>>> LBB0_3:                                 ; %for.body4.us
>>>>                                         ;   Parent Loop BB0_2 Depth=1
>>>>                                         ; =>  This Inner Loop Header: Depth=2
>>>>   ldrb w11, [x0, x10]
>>>>   ldrb w11, [x1, x11]
>>>>   strb w11, [x0, x10]
>>>>   add x10, x10, #1            ; =1
>>>>   cmp w10, w8
>>>>   b.lt LBB0_3
>>>>
>>>> So after the change, the test fails just because the precondition of
>>>> the test disappears. It is not the patch generating unreasonable code.
>>>> I try some effort to change the test and prevent my patch from kicking
>>>> in, like if I change the test by removing its outerloop (but keep the
>>>> innerloop the same), it will still use reg({0,+,1}) in both LSR uses
>>>> and the test still fails. It further proves that the test can pass
>>>> before only because the limitation about formula with induction value
>>>> Reg from outerloop exists.
>>>>
>>>> Do you have any suggestions about how to change the test, or is it ok
>>>> to remove it?
>>>>
>>>> Thanks,
>>>> Wei.
>>>>
>>>>
>>>>
>>>> On Tue, Nov 15, 2016 at 11:59 AM, Renato Golin <renato.golin at linaro.org> wrote:
>>>>> On 15 November 2016 at 19:42, Wei Mi via llvm-commits
>>>>> <llvm-commits at lists.llvm.org> wrote:
>>>>>> Author: wmi
>>>>>> Date: Tue Nov 15 13:42:05 2016
>>>>>> New Revision: 287014
>>>>>>
>>>>>> URL: http://llvm.org/viewvc/llvm-project?rev=287014&view=rev
>>>>>> Log:
>>>>>> Revert r286999 which caused buildbot test failures. Some testcases need to be made target specific.
>>>>>
>>>>> Hi Wei,
>>>>>
>>>>> I'm not sure making tests target specific will fix anything. One of
>>>>> the failures was that a truncation shouldn't have happened and it did.
>>>>>
>>>>> I think there's more than meets the eye. On your next round, please
>>>>> include Tim, James and me on the review, so we can understand the
>>>>> changes to the tests and the code better.
>>>>>
>>>>> cheers,
>>>>> --renato


More information about the llvm-commits mailing list