[PATCH] D34583: [LSR] Narrow search space by filtering non-optimal formulae with the same ScaledReg and Scale.

Evgeny Astigeevich via llvm-commits llvm-commits at lists.llvm.org
Mon Aug 7 10:18:08 PDT 2017


I've created a bug report: https://bugs.llvm.org/show_bug.cgi?id=34106


The fix is in progress.


Thanks,

Evgeny

________________________________
From: Evgeny Astigeevich
Sent: Monday, August 7, 2017 1:27:15 PM
To: Wei Mi
Cc: Quentin Colombet; reviews+D34583+public+14017146d8617607 at reviews.llvm.org; Evgeny Stupachenko; Xinliang David Li; Sanjoy Das; Matthias Braun; Dehao Chen; llvm-commits; Michael Zolotukhin; nd
Subject: Re: [PATCH] D34583: [LSR] Narrow search space by filtering non-optimal formulae with the same ScaledReg and Scale.


Hi Wei,


Thank you for the patch.

I think you have spotted a bug in ARMTargetLowering::isLegalAddressingMode.

Thumb1 addressing modes do not support scaling, so false must be returned whenever AM.Scale is non-zero. However, ARMTargetLowering::isLegalAddressingMode returns false only when Scale equals 1 and returns true for other scale values.

I am testing a fix for the issue. I'll also raise a bug report about it.
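
To make the problem concrete, here is a minimal, self-contained model of that control flow as I understand it from the hunk quoted further down in the thread. It is a sketch only: the helper name is made up, the surrounding cases are simplified, and the final power-of-two test stands in for the real r + r << imm check.

#include <cstdint>
#include <iostream>

// Simplified model (hypothetical helper) of the scaled-address legality
// check discussed above. Assumption: Thumb1 explicitly rejects only
// Scale == 1; any other non-zero scale (e.g. -1) falls through to the
// generic ARM path, where it is negated and then accepted.
static bool isLegalScaledAddrModel(int64_t Scale, bool IsThumb1) {
  if (Scale == 0)
    return true;                        // plain "r" or "r + imm"
  if (IsThumb1 && Scale == 1)
    return false;                       // the only case Thumb1 rejects today
  if (Scale < 0)
    Scale = -Scale;                     // the negation the thread is debating
  if (Scale == 1)
    return true;                        // r + r
  return (Scale & (Scale - 1)) == 0;    // stand-in for the r + r << imm test
}

int main() {
  // Thumb1 has no scaled (and no negative-register) addressing modes,
  // yet Scale == -1 is reported legal by this model:
  std::cout << isLegalScaledAddrModel(-1, /*IsThumb1=*/true) << '\n'; // 1
  std::cout << isLegalScaledAddrModel(1,  /*IsThumb1=*/true) << '\n'; // 0
}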


Thanks,

Evgeny

________________________________
From: Wei Mi <wmi at google.com>
Sent: Friday, August 4, 2017 7:27:47 PM
To: Evgeny Astigeevich
Cc: Quentin Colombet; reviews+D34583+public+14017146d8617607 at reviews.llvm.org; Evgeny Stupachenko; Xinliang David Li; Sanjoy Das; Matthias Braun; Dehao Chen; llvm-commits; Michael Zolotukhin; nd
Subject: Re: [PATCH] D34583: [LSR] Narrow search space by filtering non-optimal formulae with the same ScaledReg and Scale.

On Fri, Aug 4, 2017 at 11:16 AM, Evgeny Astigeevich
<Evgeny.Astigeevich at arm.com> wrote:
> I compared the generated code with and without this check:
>
>
> --- check is ON ---
>
> @ BB#0:                                 @ %entry
>         cmp     r2, #0
>         moveq   pc, lr
>         add     r0, r0, r2, lsl #2
>         mov     r12, #0
>         mov     r3, #0
> .LBB1_1:                                @ %bb
>                                         @ =>This Inner Loop Header: Depth=1
>         str     r1, [r0, -r3, lsl #2]
>         str     r3, [r12]
>         add     r3, r3, #1
>         cmp     r2, r3
>         bne     .LBB1_1
>
> --- check is OFF ---
>
> @ BB#0:                                 @ %entry
>         cmp     r2, #0
>         moveq   pc, lr
>         mov     r12, #0
>         mov     r3, #0
> .LBB1_1:                                @ %bb
>                                         @ =>This Inner Loop Header: Depth=1
>         str     r1, [r0, r2, lsl #2]
>         subs    r2, r2, #1
>         str     r3, [r12]
>         add     r3, r3, #1
>         bne     .LBB1_1
>
> The main difference with the check OFF is that we need two registers to keep
> the values. As '-Rm' cannot be used in Thumb code, that form does not work
> for v6m.
>
>
> Thanks,
>
> Evgeny

Thanks, Evgeny. So can we add a subtarget check around this Scale < 0 check?

Index: lib/Target/ARM/ARMISelLowering.cpp
===================================================================
--- lib/Target/ARM/ARMISelLowering.cpp (revision 309240)
+++ lib/Target/ARM/ARMISelLowering.cpp (working copy)
@@ -12411,7 +12411,7 @@ bool ARMTargetLowering::isLegalAddressin
     case MVT::i1:
     case MVT::i8:
     case MVT::i32:
-      if (Scale < 0) Scale = -Scale;
+      if (!Subtarget->isThumb() && Scale < 0) Scale = -Scale;
       if (Scale == 1)
         return true;
       // r + r << imm
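
To illustrate what the proposed guard changes, here is the same kind of simplified model with the subtarget check added. Again, this is a sketch, not the actual patch: the helper name is hypothetical and the power-of-two test is a stand-in for the real r + r << imm check.

#include <cstdint>
#include <iostream>

// Sketch: simplified legality model with the proposed subtarget guard
// around the negation. Assumption: only the negation is guarded;
// everything else stays as in the quoted hunk.
static bool isLegalScaledAddrGuarded(int64_t Scale, bool IsThumb) {
  if (Scale == 0)
    return true;
  if (!IsThumb && Scale < 0)
    Scale = -Scale;                     // negative strides still fold in ARM mode
  if (Scale == 1)
    return true;
  return Scale > 0 && (Scale & (Scale - 1)) == 0;
}

int main() {
  std::cout << isLegalScaledAddrGuarded(-1, /*IsThumb=*/true) << '\n';  // 0
  std::cout << isLegalScaledAddrGuarded(-1, /*IsThumb=*/false) << '\n'; // 1
}

This appears to match what is wanted here: arm-negative-stride.ll keeps its negative-stride addressing in ARM mode, while the Thumb path no longer reports the mode as legal.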

Wei.

>
>
> ________________________________
> From: Evgeny Astigeevich
> Sent: Friday, August 4, 2017 6:42:53 PM
> To: Wei Mi; Quentin Colombet
> Cc: reviews+D34583+public+14017146d8617607 at reviews.llvm.org; Evgeny
> Stupachenko; Xinliang David Li; Sanjoy Das; Matthias Braun; Dehao Chen;
> llvm-commits; Michael Zolotukhin; nd
>
> Subject: Re: [PATCH] D34583: [LSR] Narrow search space by filtering
> non-optimal formulae with the same ScaledReg and Scale.
>
>
> Hi Wei,
>
>
> The fix will cause a failure of CodeGen/ARM/arm-negative-stride.ll. The
> check was introduced in r35859 (10 years ago):
>
>
> ------------------------------------------------------------------------
> r35859 | lattner | 2007-04-10 04:48:29 +0100 (Tue, 10 Apr 2007) | 2 lines
>
> restore support for negative strides
>
> ------------------------------------------------------------------------
>
> The test has a triple: arm-eabi. So code is generated for arm7tdmi (armv4).
> Maybe we need a case for the M-profile. I'll try to figure it out.
>
> Thanks,
> Evgeny Astigeevich
> The ARM Compiler Optimization team leader
>
> ________________________________
> From: Wei Mi <wmi at google.com>
> Sent: Friday, August 4, 2017 5:23:47 PM
> To: Quentin Colombet
> Cc: reviews+D34583+public+14017146d8617607 at reviews.llvm.org; Evgeny
> Stupachenko; Xinliang David Li; Sanjoy Das; Evgeny Astigeevich; Matthias
> Braun; Dehao Chen; llvm-commits; Michael Zolotukhin
> Subject: Re: [PATCH] D34583: [LSR] Narrow search space by filtering
> non-optimal formulae with the same ScaledReg and Scale.
>
> On Thu, Aug 3, 2017 at 6:08 PM, Quentin Colombet <qcolombet at apple.com>
> wrote:
>>
>>> On Aug 3, 2017, at 4:46 PM, Wei Mi <wmi at google.com> wrote:
>>>
>>> I found isAMCompletelyFolded returns true for a formula like
>>> reg(%c1.0104.us.i.i) + -1*reg({0,+,-4}<nsw><%for.body8.us.i.i>) +
>>> imm(4) on ARM. Does that make sense?
>>
>>
>> That sounds wrong.
>
> With the following fix, the regression in test.ll is fixed, but I am not
> familiar with ARM, so I am not sure whether allowing a negative scale has
> any other use. Quentin, does the fix look correct to you?
>
> Index: lib/Target/ARM/ARMISelLowering.cpp
> ===================================================================
> --- lib/Target/ARM/ARMISelLowering.cpp (revision 309240)
> +++ lib/Target/ARM/ARMISelLowering.cpp (working copy)
> @@ -12411,7 +12411,6 @@ bool ARMTargetLowering::isLegalAddressin
>      case MVT::i1:
>      case MVT::i8:
>      case MVT::i32:
> -      if (Scale < 0) Scale = -Scale;
>        if (Scale == 1)
>          return true;
>        // r + r << imm
>
>
>
>>
>>> The fact is we cannot fold it into
>>> load/store and it leads to the additional instructions causing
>>> performance regressions.
>>>
>>>
>>> On Thu, Aug 3, 2017 at 11:12 AM, Wei Mi via Phabricator
>>> <reviews at reviews.llvm.org> wrote:
>>>> wmi added a comment.
>>>>
>>>> In https://reviews.llvm.org/D34583#830286, @eastig wrote:
>>>>
>>>>> F4171725: test.ll <https://reviews.llvm.org/F4171725>
>>>>>
>>>>> F4171724: test.good.ll <https://reviews.llvm.org/F4171724>
>>>>>
>>>>> F4171723: test.bad.ll <https://reviews.llvm.org/F4171723>
>>>>>
>>>>> This patch caused regressions from 5% to 23% in two of our internal
>>>>> benchmarks on Cortex-M23 and Cortex-M0+. I attached test.ll, which is reduced
>>>>> from the benchmarks. I used LLVM revision 309830. 'test.good.ll' is the result
>>>>> when filtering is disabled. 'test.bad.ll' is the result when filtering is
>>>>> enabled.
>>>>> Comparing them, I can see that this optimization changes how an
>>>>> induction variable is updated. Originally it is incremented from 0 to 256.
>>>>> The optimization changes this into decrementing from 0 to -256. This
>>>>> induction variable is also used as an offset into memory. So, to preserve the
>>>>> semantics, a conversion of the induction variable from a negative value to a
>>>>> positive value is inserted. This is lowered to additional instructions, which
>>>>> causes performance regressions.
>>>>>
>>>>> Could you please have a look at this issue?
>>>>>
>>>>> Thanks,
>>>>> Evgeny Astigeevich
>>>>> The ARM Compiler Optimization team leader
>>>>
>>>>
>>>> Hi Evgeny,
>>>>
>>>> Thanks for providing the testcase.
>>>>
>>>> It looks like an existing issue in the LSR cost evaluation that is exposed by
>>>> the patch. Comparing the traces obtained with -debug-only=loop-reduce, all
>>>> the candidates chosen by LSR without filtering are still kept in the candidate
>>>> set after adding the filtering patch. However, the filtering patch provides
>>>> some additional candidates for the LSR cost model to choose from, and LSR
>>>> chooses a different set of candidates in the final result which it thinks is
>>>> better (1 less base add) but is actually not. We can see that in the trace:
>>>>
>>>> LSR without the filtering patch:
>>>> The chosen solution requires 5 regs, with addrec cost 2, plus 2 base
>>>> adds, plus 4 imm cost, plus 1 setup cost:
>>>>
>>>>  LSR Use: Kind=Address of i8 in addrspace(0), Offsets={0}, widest fixup
>>>> type: i8*
>>>>    reg({%ptr1,+,256}<%for.cond6.preheader.us.i.i>) +
>>>> 1*reg({0,+,1}<nuw><nsw><%for.body8.us.i.i>)
>>>>  LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32
>>>>    reg(256) + -1*reg({0,+,1}<nuw><nsw><%for.body8.us.i.i>)
>>>>  LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup
>>>> type: i32*
>>>>    reg(%c0.0103.us.i.i) + 4*reg({0,+,1}<nuw><nsw><%for.body8.us.i.i>)
>>>>  LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0,4}, widest
>>>> fixup type: i32*
>>>>    reg({(-4 + %c1.0104.us.i.i)<nsw>,+,4}<nsw><%for.body8.us.i.i>)
>>>>  LSR Use: Kind=Special, Offsets={0}, all-fixups-outside-loop, widest
>>>> fixup type: i32*
>>>>    reg(%c0.0103.us.i.i)
>>>>  LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup
>>>> type: i32*
>>>>    reg(%c1.0104.us.i.i) + 4*reg({0,+,1}<nuw><nsw><%for.body8.us.i.i>) +
>>>> imm(4)
>>>>  LSR Use: Kind=Special, Offsets={0}, all-fixups-outside-loop, widest
>>>> fixup type: i32*
>>>>    reg(%c1.0104.us.i.i)
>>>>
>>>> LSR with the filtering patch:
>>>> The chosen solution requires 5 regs, with addrec cost 2, plus 1 base
>>>> add, plus 4 imm cost, plus 1 setup cost:
>>>>
>>>>  LSR Use: Kind=Address of i8 in addrspace(0), Offsets={0}, widest fixup
>>>> type: i8*
>>>>    reg({%ptr1,+,256}<%for.cond6.preheader.us.i.i>) +
>>>> -1*reg({0,+,-1}<nw><%for.body8.us.i.i>)
>>>>  LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32
>>>>    reg(-256) + -1*reg({0,+,-1}<nw><%for.body8.us.i.i>)
>>>>  LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup
>>>> type: i32*
>>>>    reg(%c0.0103.us.i.i) + -4*reg({0,+,-1}<nw><%for.body8.us.i.i>)
>>>>  LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0,4}, widest
>>>> fixup type: i32*
>>>>    reg({(4 + %c1.0104.us.i.i)<nsw>,+,4}<nsw><%for.body8.us.i.i>) +
>>>> imm(-8)
>>>>  LSR Use: Kind=Special, Offsets={0}, all-fixups-outside-loop, widest
>>>> fixup type: i32*
>>>>    reg(%c0.0103.us.i.i)
>>>>  LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup
>>>> type: i32*
>>>>    reg({(4 + %c1.0104.us.i.i)<nsw>,+,4}<nsw><%for.body8.us.i.i>)
>>>>  LSR Use: Kind=Special, Offsets={0}, all-fixups-outside-loop, widest
>>>> fixup type: i32*
>>>>    reg(%c1.0104.us.i.i)
>>>>
>>>> The real problem is that LSR has no idea about the cost of producing a
>>>> negative value. It thinks 4*reg({0,+,-1} and -4*reg({0,+,-1} have the same
>>>> cost.
>>>>
>>>>  LSR Use: Kind=Address of i8 in addrspace(0), Offsets={0}, widest fixup
>>>> type: i8*
>>>>    reg({%ptr1,+,256}<%for.cond6.preheader.us.i.i>) +
>>>> -1*reg({0,+,-1}<nw><%for.body8.us.i.i>)
>>>>  LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup
>>>> type: i32*
>>>>    reg(%c0.0103.us.i.i) + -4*reg({0,+,-1}<nw><%for.body8.us.i.i>)
>>>>
>>>> I will think about how to fix it.
>>>>
>>>> Wei.
>>>>
>>>>
>>>> Repository:
>>>>  rL LLVM
>>>>
>>>> https://reviews.llvm.org/D34583
>>>>
>>>>
>>>>
>>