<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Exchange Server">
<!-- converted from text --><style><!-- .EmailQuote { margin-left: 1pt; padding-left: 4pt; border-left: #800000 2px solid; } --></style>
</head>
<body>
<meta content="text/html; charset=UTF-8">
<style type="text/css" style="">
<!--
p
{margin-top:0;
margin-bottom:0}
-->
</style>
<div dir="ltr">
<div id="x_divtagdefaultwrapper" dir="ltr" style="font-size:11pt; color:#000000; font-family:Calibri,Helvetica,sans-serif">
<p>Hi Wei,</p>
<p><br>
</p>
<p>Thank you for the patch.</p>
<p>I think you have spotted a bug in <span>ARMTargetLowering::isLegalAddressingMode.<span style="font-family:Calibri,Helvetica,sans-serif,EmojiFont,"Apple Color Emoji","Segoe UI Emoji",NotoColorEmoji,"Segoe UI Symbol","Android Emoji",EmojiSymbols; font-size:14.6667px"></span></span></p>
<p><span>Thumb1 addressing modes do not support scaling. So false must be returned if AM.Scale is not zero. <span style="font-family:Calibri,Helvetica,sans-serif,EmojiFont,"Apple Color Emoji","Segoe UI Emoji",NotoColorEmoji,"Segoe UI Symbol","Android Emoji",EmojiSymbols; font-size:14.6667px">ARMTargetLowering::isLegalAddressingMode
return false only when Scale equals 1 and returns true for other values.</span></span></p>
<p><span>I am testing a fix of the issue. I'll also raise a bug report about this issue.</span></p>
<p><span><br>
</span></p>
<p><span>Thnaks,</span></p>
<p><span>Evgeny</span></p>
</div>
<hr tabindex="-1" style="display:inline-block; width:98%">
<div id="x_divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> Wei Mi <wmi@google.com><br>
<b>Sent:</b> Friday, August 4, 2017 7:27:47 PM<br>
<b>To:</b> Evgeny Astigeevich<br>
<b>Cc:</b> Quentin Colombet; reviews+D34583+public+14017146d8617607@reviews.llvm.org; Evgeny Stupachenko; Xinliang David Li; Sanjoy Das; Matthias Braun; Dehao Chen; llvm-commits; Michael Zolotukhin; nd<br>
<b>Subject:</b> Re: [PATCH] D34583: [LSR] Narrow search space by filtering non-optimal formulae with the same ScaledReg and Scale.</font>
<div> </div>
</div>
</div>
<font size="2"><span style="font-size:10pt;">
<div class="PlainText">On Fri, Aug 4, 2017 at 11:16 AM, Evgeny Astigeevich<br>
<Evgeny.Astigeevich@arm.com> wrote:<br>
> I compared the generated code with and without this check:<br>
><br>
><br>
> --- check is ON ---<br>
><br>
> @ BB#0: @ %entry<br>
> cmp r2, #0<br>
> moveq pc, lr<br>
> add r0, r0, r2, lsl #2<br>
> mov r12, #0<br>
> mov r3, #0<br>
> .LBB1_1: @ %bb<br>
> @ =>This Inner Loop Header: Depth=1<br>
> str r1, [r0, -r3, lsl #2]<br>
> str r3, [r12]<br>
> add r3, r3, #1<br>
> cmp r2, r3<br>
> bne .LBB1_1<br>
><br>
> --- check is OFF ---<br>
><br>
> @ BB#0: @ %entry<br>
> cmp r2, #0<br>
> moveq pc, lr<br>
> mov r12, #0<br>
> mov r3, #0<br>
> .LBB1_1: @ %bb<br>
> @ =>This Inner Loop Header: Depth=1<br>
> str r1, [r0, r2, lsl #2]<br>
> subs r2, r2, #1<br>
> str r3, [r12]<br>
> add r3, r3, #1<br>
> bne .LBB1_1<br>
><br>
> The main difference of 'check is OFF' is that we need two registers to keep<br>
> values. As '-Rm' cannot be used in Thumb code it is not working for v6m.<br>
><br>
><br>
> Thanks,<br>
><br>
> Evgeny<br>
<br>
Thanks Evgeny. So can we add a subtarget check around this scale<0 check?<br>
<br>
Index: lib/Target/ARM/ARMISelLowering.cpp<br>
===================================================================<br>
--- lib/Target/ARM/ARMISelLowering.cpp (revision 309240)<br>
+++ lib/Target/ARM/ARMISelLowering.cpp (working copy)<br>
@@ -12411,7 +12411,7 @@ bool ARMTargetLowering::isLegalAddressin<br>
case MVT::i1:<br>
case MVT::i8:<br>
case MVT::i32:<br>
- if (Scale < 0) Scale = -Scale;<br>
+ if (!Subtarget->isThumb() && Scale < 0) Scale = -Scale;<br>
if (Scale == 1)<br>
return true;<br>
// r + r << imm<br>
<br>
Wei.<br>
<br>
><br>
><br>
> ________________________________<br>
> From: Evgeny Astigeevich<br>
> Sent: Friday, August 4, 2017 6:42:53 PM<br>
> To: Wei Mi; Quentin Colombet<br>
> Cc: reviews+D34583+public+14017146d8617607@reviews.llvm.org; Evgeny<br>
> Stupachenko; Xinliang David Li; Sanjoy Das; Matthias Braun; Dehao Chen;<br>
> llvm-commits; Michael Zolotukhin; nd<br>
><br>
> Subject: Re: [PATCH] D34583: [LSR] Narrow search space by filtering<br>
> non-optimal formulae with the same ScaledReg and Scale.<br>
><br>
><br>
> Hi Wei,<br>
><br>
><br>
> The fix will cause a failure of CodeGen/ARM/arm-negative-stride.ll. The<br>
> check was introduced in r35859 (10 years ago):<br>
><br>
><br>
> ------------------------------------------------------------------------<br>
> r35859 | lattner | 2007-04-10 04:48:29 +0100 (Tue, 10 Apr 2007) | 2 lines<br>
><br>
> restore support for negative strides<br>
><br>
> ------------------------------------------------------------------------<br>
><br>
> The test has a triple: arm-eabi. So code is generated for arm7tdmi (armv4).<br>
> Maybe we need a case for the M-profile. I'll try to figure it out.<br>
><br>
> Thanks,<br>
> Evgeny Astigeevich<br>
> The ARM Compiler Optimization team leader<br>
><br>
> ________________________________<br>
> From: Wei Mi <wmi@google.com><br>
> Sent: Friday, August 4, 2017 5:23:47 PM<br>
> To: Quentin Colombet<br>
> Cc: reviews+D34583+public+14017146d8617607@reviews.llvm.org; Evgeny<br>
> Stupachenko; Xinliang David Li; Sanjoy Das; Evgeny Astigeevich; Matthias<br>
> Braun; Dehao Chen; llvm-commits; Michael Zolotukhin<br>
> Subject: Re: [PATCH] D34583: [LSR] Narrow search space by filtering<br>
> non-optimal formulae with the same ScaledReg and Scale.<br>
><br>
> On Thu, Aug 3, 2017 at 6:08 PM, Quentin Colombet <qcolombet@apple.com><br>
> wrote:<br>
>><br>
>>> On Aug 3, 2017, at 4:46 PM, Wei Mi <wmi@google.com> wrote:<br>
>>><br>
>>> I found isAMCompletelyFolded return true for formula like<br>
>>> reg(%c1.0104.us.i.i) + -1*reg({0,+,-4}<nsw><%for.body8.us.i.i>) +<br>
>>> imm(4) on ARM. Does it make sense?<br>
>><br>
>><br>
>> That sounds wrong.<br>
><br>
> With the following fix, the regression of test.ll will be fixed, but I<br>
> am not familiar with ARM so I am not sure whether allowing negative<br>
> scale has any other usage. Quentin, does the fix looks correct to you?<br>
><br>
> Index: lib/Target/ARM/ARMISelLowering.cpp<br>
> ===================================================================<br>
> --- lib/Target/ARM/ARMISelLowering.cpp (revision 309240)<br>
> +++ lib/Target/ARM/ARMISelLowering.cpp (working copy)<br>
> @@ -12411,7 +12411,6 @@ bool ARMTargetLowering::isLegalAddressin<br>
> case MVT::i1:<br>
> case MVT::i8:<br>
> case MVT::i32:<br>
> - if (Scale < 0) Scale = -Scale;<br>
> if (Scale == 1)<br>
> return true;<br>
> // r + r << imm<br>
><br>
><br>
><br>
>><br>
>>> The fact is we cannot fold it into<br>
>>> load/store and it leads to the additional instructions causing<br>
>>> performance regressions.<br>
>>><br>
>>><br>
>>> On Thu, Aug 3, 2017 at 11:12 AM, Wei Mi via Phabricator<br>
>>> <reviews@reviews.llvm.org> wrote:<br>
>>>> wmi added a comment.<br>
>>>><br>
>>>> In <a href="https://reviews.llvm.org/D34583#830286">https://reviews.llvm.org/D34583#830286</a>, @eastig wrote:<br>
>>>><br>
>>>>> F4171725: test.ll <<a href="https://reviews.llvm.org/F4171725">https://reviews.llvm.org/F4171725</a>><br>
>>>>><br>
>>>>> F4171724: test.good.ll <<a href="https://reviews.llvm.org/F4171724">https://reviews.llvm.org/F4171724</a>><br>
>>>>><br>
>>>>> F4171723: test.bad.ll <<a href="https://reviews.llvm.org/F4171723">https://reviews.llvm.org/F4171723</a>><br>
>>>>><br>
>>>>> This patch caused regressions from 5% to 23% in two our internal<br>
>>>>> benchmarks on Cortex-M23 and Cortex-M0+. I attached test.ll which is reduced<br>
>>>>> from the benchmarks. I used LLVM revision 309830. 'test.good.ll' is a result<br>
>>>>> when filtering is disabled. 'test.bad.ll' is a result when filtering is<br>
>>>>> enabled.<br>
>>>>> Comparing them I can see that this optimization changes how an<br>
>>>>> induction variable is changed. Originally it is incremented from 0 to 256.<br>
>>>>> The optimization changes this into decrementing from 0 to -256. This<br>
>>>>> induction variable is also used as an offset to memory. So to preserve this<br>
>>>>> semantic conversion of the induction variable from a negative value to a<br>
>>>>> positive value is inserted. This is lowered to additional instructions which<br>
>>>>> causes performance regressions.<br>
>>>>><br>
>>>>> Could you please have a look at this issue?<br>
>>>>><br>
>>>>> Thanks,<br>
>>>>> Evgeny Astigeevich<br>
>>>>> The ARM Compiler Optimization team leader<br>
>>>><br>
>>>><br>
>>>> Hi Evgeny,<br>
>>>><br>
>>>> Thanks for providing the testcase.<br>
>>>><br>
>>>> It looks like an existing issue in LSR cost evaluation exposed by the<br>
>>>> patch. Actually, comparing the trace by adding -debug-only=loop-reduce, all<br>
>>>> the candidates choosen by LSR without filtering are kept in the candidate<br>
>>>> set after adding the filter patch. However filtering patch provides some<br>
>>>> more candidates interesting for LSR cost model to choose, and LSR chooses a<br>
>>>> different set of candidates in the final result which it thinks better (1<br>
>>>> less base add) but actually not. We can see that in the trace:<br>
>>>><br>
>>>> LSR without the filtering patch:<br>
>>>> The chosen solution requires 5 regs, with addrec cost 2, plus 2 base<br>
>>>> adds, plus 4 imm cost, plus 1 setup cost:<br>
>>>><br>
>>>> LSR Use: Kind=Address of i8 in addrspace(0), Offsets={0}, widest fixup<br>
>>>> type: i8*<br>
>>>> reg({%ptr1,+,256}<%for.cond6.preheader.us.i.i>) +<br>
>>>> 1*reg({0,+,1}<nuw><nsw><%for.body8.us.i.i>)<br>
>>>> LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32<br>
>>>> reg(256) + -1*reg({0,+,1}<nuw><nsw><%for.body8.us.i.i>)<br>
>>>> LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup<br>
>>>> type: i32*<br>
>>>> reg(%c0.0103.us.i.i) + 4*reg({0,+,1}<nuw><nsw><%for.body8.us.i.i>)<br>
>>>> LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0,4}, widest<br>
>>>> fixup type: i32*<br>
>>>> reg({(-4 + %c1.0104.us.i.i)<nsw>,+,4}<nsw><%for.body8.us.i.i>)<br>
>>>> LSR Use: Kind=Special, Offsets={0}, all-fixups-outside-loop, widest<br>
>>>> fixup type: i32*<br>
>>>> reg(%c0.0103.us.i.i)<br>
>>>> LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup<br>
>>>> type: i32*<br>
>>>> reg(%c1.0104.us.i.i) + 4*reg({0,+,1}<nuw><nsw><%for.body8.us.i.i>) +<br>
>>>> imm(4)<br>
>>>> LSR Use: Kind=Special, Offsets={0}, all-fixups-outside-loop, widest<br>
>>>> fixup type: i32*<br>
>>>> reg(%c1.0104.us.i.i)<br>
>>>><br>
>>>> LSR with the filtering patch:<br>
>>>> The chosen solution requires 5 regs, with addrec cost 2, plus 1 base<br>
>>>> add, plus 4 imm cost, plus 1 setup cost:<br>
>>>><br>
>>>> LSR Use: Kind=Address of i8 in addrspace(0), Offsets={0}, widest fixup<br>
>>>> type: i8*<br>
>>>> reg({%ptr1,+,256}<%for.cond6.preheader.us.i.i>) +<br>
>>>> -1*reg({0,+,-1}<nw><%for.body8.us.i.i>)<br>
>>>> LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32<br>
>>>> reg(-256) + -1*reg({0,+,-1}<nw><%for.body8.us.i.i>)<br>
>>>> LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup<br>
>>>> type: i32*<br>
>>>> reg(%c0.0103.us.i.i) + -4*reg({0,+,-1}<nw><%for.body8.us.i.i>)<br>
>>>> LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0,4}, widest<br>
>>>> fixup type: i32*<br>
>>>> reg({(4 + %c1.0104.us.i.i)<nsw>,+,4}<nsw><%for.body8.us.i.i>) +<br>
>>>> imm(-8)<br>
>>>> LSR Use: Kind=Special, Offsets={0}, all-fixups-outside-loop, widest<br>
>>>> fixup type: i32*<br>
>>>> reg(%c0.0103.us.i.i)<br>
>>>> LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup<br>
>>>> type: i32*<br>
>>>> reg({(4 + %c1.0104.us.i.i)<nsw>,+,4}<nsw><%for.body8.us.i.i>)<br>
>>>> LSR Use: Kind=Special, Offsets={0}, all-fixups-outside-loop, widest<br>
>>>> fixup type: i32*<br>
>>>> reg(%c1.0104.us.i.i)<br>
>>>><br>
>>>> The real problem is that LSR has no idea about the cost of getting<br>
>>>> negative value. It thinks 4*reg({0,+,-1} and -4*reg({0,+,-1} have the same<br>
>>>> cost.<br>
>>>><br>
>>>> LSR Use: Kind=Address of i8 in addrspace(0), Offsets={0}, widest fixup<br>
>>>> type: i8*<br>
>>>> reg({%ptr1,+,256}<%for.cond6.preheader.us.i.i>) +<br>
>>>> -1*reg({0,+,-1}<nw><%for.body8.us.i.i>)<br>
>>>> LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup<br>
>>>> type: i32*<br>
>>>> reg(%c0.0103.us.i.i) + -4*reg({0,+,-1}<nw><%for.body8.us.i.i>)<br>
>>>><br>
>>>> I will think about how to fix it.<br>
>>>><br>
>>>> Wei.<br>
>>>><br>
>>>><br>
>>>> Repository:<br>
>>>> rL LLVM<br>
>>>><br>
>>>> <a href="https://reviews.llvm.org/D34583">https://reviews.llvm.org/D34583</a><br>
>>>><br>
>>>><br>
>>>><br>
>><br>
</div>
</span></font>
</body>
</html>