[PATCH] D34583: [LSR] Narrow search space by filtering non-optimal formulae with the same ScaledReg and Scale.
Wei Mi via llvm-commits
llvm-commits at lists.llvm.org
Fri Aug 4 11:27:47 PDT 2017
On Fri, Aug 4, 2017 at 11:16 AM, Evgeny Astigeevich
<Evgeny.Astigeevich at arm.com> wrote:
> I compared the generated code with and without this check:
>
>
> --- check is ON ---
>
> @ BB#0: @ %entry
> cmp r2, #0
> moveq pc, lr
> add r0, r0, r2, lsl #2
> mov r12, #0
> mov r3, #0
> .LBB1_1: @ %bb
> @ =>This Inner Loop Header: Depth=1
> str r1, [r0, -r3, lsl #2]
> str r3, [r12]
> add r3, r3, #1
> cmp r2, r3
> bne .LBB1_1
>
> --- check is OFF ---
>
> @ BB#0: @ %entry
> cmp r2, #0
> moveq pc, lr
> mov r12, #0
> mov r3, #0
> .LBB1_1: @ %bb
> @ =>This Inner Loop Header: Depth=1
> str r1, [r0, r2, lsl #2]
> subs r2, r2, #1
> str r3, [r12]
> add r3, r3, #1
> bne .LBB1_1
>
> The main difference with the check OFF is that two registers are needed to
> keep the values. Since '-Rm' cannot be used in Thumb code, it does not work
> for v6m.
>
>
> Thanks,
>
> Evgeny
Thanks, Evgeny. So can we add a subtarget check around this Scale &lt; 0 check?
Index: lib/Target/ARM/ARMISelLowering.cpp
===================================================================
--- lib/Target/ARM/ARMISelLowering.cpp (revision 309240)
+++ lib/Target/ARM/ARMISelLowering.cpp (working copy)
@@ -12411,7 +12411,7 @@ bool ARMTargetLowering::isLegalAddressin
case MVT::i1:
case MVT::i8:
case MVT::i32:
- if (Scale < 0) Scale = -Scale;
+ if (!Subtarget->isThumb() && Scale < 0) Scale = -Scale;
if (Scale == 1)
return true;
// r + r << imm
Wei.
>
>
> ________________________________
> From: Evgeny Astigeevich
> Sent: Friday, August 4, 2017 6:42:53 PM
> To: Wei Mi; Quentin Colombet
> Cc: reviews+D34583+public+14017146d8617607 at reviews.llvm.org; Evgeny
> Stupachenko; Xinliang David Li; Sanjoy Das; Matthias Braun; Dehao Chen;
> llvm-commits; Michael Zolotukhin; nd
>
> Subject: Re: [PATCH] D34583: [LSR] Narrow search space by filtering
> non-optimal formulae with the same ScaledReg and Scale.
>
>
> Hi Wei,
>
>
> The fix will cause a failure of CodeGen/ARM/arm-negative-stride.ll. The
> check was introduced in r35859 (10 years ago):
>
>
> ------------------------------------------------------------------------
> r35859 | lattner | 2007-04-10 04:48:29 +0100 (Tue, 10 Apr 2007) | 2 lines
>
> restore support for negative strides
>
> ------------------------------------------------------------------------
>
> The test has the triple arm-eabi, so code is generated for arm7tdmi (armv4).
> Maybe we need a case for the M-profile. I'll try to figure it out.
>
> Thanks,
> Evgeny Astigeevich
> The ARM Compiler Optimization team leader
>
> ________________________________
> From: Wei Mi <wmi at google.com>
> Sent: Friday, August 4, 2017 5:23:47 PM
> To: Quentin Colombet
> Cc: reviews+D34583+public+14017146d8617607 at reviews.llvm.org; Evgeny
> Stupachenko; Xinliang David Li; Sanjoy Das; Evgeny Astigeevich; Matthias
> Braun; Dehao Chen; llvm-commits; Michael Zolotukhin
> Subject: Re: [PATCH] D34583: [LSR] Narrow search space by filtering
> non-optimal formulae with the same ScaledReg and Scale.
>
> On Thu, Aug 3, 2017 at 6:08 PM, Quentin Colombet <qcolombet at apple.com>
> wrote:
>>
>>> On Aug 3, 2017, at 4:46 PM, Wei Mi <wmi at google.com> wrote:
>>>
>>> I found that isAMCompletelyFolded returns true for a formula like
>>> reg(%c1.0104.us.i.i) + -1*reg({0,+,-4}<nsw><%for.body8.us.i.i>) +
>>> imm(4) on ARM. Does that make sense?
>>
>>
>> That sounds wrong.
>
> With the following fix, the regression in test.ll is fixed, but I am
> not familiar with ARM, so I am not sure whether allowing a negative
> scale has any other uses. Quentin, does the fix look correct to you?
>
> Index: lib/Target/ARM/ARMISelLowering.cpp
> ===================================================================
> --- lib/Target/ARM/ARMISelLowering.cpp (revision 309240)
> +++ lib/Target/ARM/ARMISelLowering.cpp (working copy)
> @@ -12411,7 +12411,6 @@ bool ARMTargetLowering::isLegalAddressin
> case MVT::i1:
> case MVT::i8:
> case MVT::i32:
> - if (Scale < 0) Scale = -Scale;
> if (Scale == 1)
> return true;
> // r + r << imm
>
>
>
>>
>>> The fact is we cannot fold it into a
>>> load/store, and that leads to additional instructions causing
>>> performance regressions.
>>>
>>>
>>> On Thu, Aug 3, 2017 at 11:12 AM, Wei Mi via Phabricator
>>> <reviews at reviews.llvm.org> wrote:
>>>> wmi added a comment.
>>>>
>>>> In https://reviews.llvm.org/D34583#830286, @eastig wrote:
>>>>
>>>>> F4171725: test.ll <https://reviews.llvm.org/F4171725>
>>>>>
>>>>> F4171724: test.good.ll <https://reviews.llvm.org/F4171724>
>>>>>
>>>>> F4171723: test.bad.ll <https://reviews.llvm.org/F4171723>
>>>>>
>>>>> This patch caused regressions from 5% to 23% in two of our internal
>>>>> benchmarks on Cortex-M23 and Cortex-M0+. I attached test.ll, which is
>>>>> reduced from the benchmarks. I used LLVM revision 309830. 'test.good.ll'
>>>>> is the result when filtering is disabled; 'test.bad.ll' is the result
>>>>> when filtering is enabled.
>>>>> Comparing them, I can see that this optimization changes how an
>>>>> induction variable is updated. Originally it is incremented from 0 to
>>>>> 256. The optimization changes this into decrementing from 0 to -256.
>>>>> This induction variable is also used as an offset into memory. So, to
>>>>> preserve the semantics, a conversion of the induction variable from a
>>>>> negative value to a positive value is inserted. This is lowered to
>>>>> additional instructions, which causes the performance regressions.
>>>>>
>>>>> Could you please have a look at this issue?
>>>>>
>>>>> Thanks,
>>>>> Evgeny Astigeevich
>>>>> The ARM Compiler Optimization team leader
>>>>
>>>>
>>>> Hi Evgeny,
>>>>
>>>> Thanks for providing the testcase.
>>>>
>>>> It looks like an existing issue in LSR cost evaluation exposed by the
>>>> patch. Comparing the traces obtained with -debug-only=loop-reduce, all
>>>> the candidates chosen by LSR without filtering are still kept in the
>>>> candidate set after adding the filtering patch. However, the filtering
>>>> patch surfaces some additional candidates for the LSR cost model to
>>>> choose from, and LSR picks a different final set of candidates that it
>>>> considers better (one fewer base add) but that is actually worse. We can
>>>> see that in the trace:
>>>>
>>>> LSR without the filtering patch:
>>>> The chosen solution requires 5 regs, with addrec cost 2, plus 2 base
>>>> adds, plus 4 imm cost, plus 1 setup cost:
>>>>
>>>> LSR Use: Kind=Address of i8 in addrspace(0), Offsets={0}, widest fixup
>>>> type: i8*
>>>> reg({%ptr1,+,256}<%for.cond6.preheader.us.i.i>) +
>>>> 1*reg({0,+,1}<nuw><nsw><%for.body8.us.i.i>)
>>>> LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32
>>>> reg(256) + -1*reg({0,+,1}<nuw><nsw><%for.body8.us.i.i>)
>>>> LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup
>>>> type: i32*
>>>> reg(%c0.0103.us.i.i) + 4*reg({0,+,1}<nuw><nsw><%for.body8.us.i.i>)
>>>> LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0,4}, widest
>>>> fixup type: i32*
>>>> reg({(-4 + %c1.0104.us.i.i)<nsw>,+,4}<nsw><%for.body8.us.i.i>)
>>>> LSR Use: Kind=Special, Offsets={0}, all-fixups-outside-loop, widest
>>>> fixup type: i32*
>>>> reg(%c0.0103.us.i.i)
>>>> LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup
>>>> type: i32*
>>>> reg(%c1.0104.us.i.i) + 4*reg({0,+,1}<nuw><nsw><%for.body8.us.i.i>) +
>>>> imm(4)
>>>> LSR Use: Kind=Special, Offsets={0}, all-fixups-outside-loop, widest
>>>> fixup type: i32*
>>>> reg(%c1.0104.us.i.i)
>>>>
>>>> LSR with the filtering patch:
>>>> The chosen solution requires 5 regs, with addrec cost 2, plus 1 base
>>>> add, plus 4 imm cost, plus 1 setup cost:
>>>>
>>>> LSR Use: Kind=Address of i8 in addrspace(0), Offsets={0}, widest fixup
>>>> type: i8*
>>>> reg({%ptr1,+,256}<%for.cond6.preheader.us.i.i>) +
>>>> -1*reg({0,+,-1}<nw><%for.body8.us.i.i>)
>>>> LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32
>>>> reg(-256) + -1*reg({0,+,-1}<nw><%for.body8.us.i.i>)
>>>> LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup
>>>> type: i32*
>>>> reg(%c0.0103.us.i.i) + -4*reg({0,+,-1}<nw><%for.body8.us.i.i>)
>>>> LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0,4}, widest
>>>> fixup type: i32*
>>>> reg({(4 + %c1.0104.us.i.i)<nsw>,+,4}<nsw><%for.body8.us.i.i>) +
>>>> imm(-8)
>>>> LSR Use: Kind=Special, Offsets={0}, all-fixups-outside-loop, widest
>>>> fixup type: i32*
>>>> reg(%c0.0103.us.i.i)
>>>> LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup
>>>> type: i32*
>>>> reg({(4 + %c1.0104.us.i.i)<nsw>,+,4}<nsw><%for.body8.us.i.i>)
>>>> LSR Use: Kind=Special, Offsets={0}, all-fixups-outside-loop, widest
>>>> fixup type: i32*
>>>> reg(%c1.0104.us.i.i)
>>>>
>>>> The real problem is that LSR has no idea about the cost of
>>>> materializing a negative value. It thinks 4*reg({0,+,-1}) and
>>>> -4*reg({0,+,-1}) have the same cost.
>>>>
>>>> LSR Use: Kind=Address of i8 in addrspace(0), Offsets={0}, widest fixup
>>>> type: i8*
>>>> reg({%ptr1,+,256}<%for.cond6.preheader.us.i.i>) +
>>>> -1*reg({0,+,-1}<nw><%for.body8.us.i.i>)
>>>> LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup
>>>> type: i32*
>>>> reg(%c0.0103.us.i.i) + -4*reg({0,+,-1}<nw><%for.body8.us.i.i>)
>>>>
>>>> I will think about how to fix it.
>>>>
>>>> Wei.
>>>>
>>>>
>>>> Repository:
>>>> rL LLVM
>>>>
>>>> https://reviews.llvm.org/D34583
>>>>
>>>>
>>>>
>>