<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">

<meta name="Generator" content="Microsoft Exchange Server">

<!-- converted from text --><style><!-- .EmailQuote { margin-left: 1pt; padding-left: 4pt; border-left: #800000 2px solid; } --></style>

</head>

<body>

<style type="text/css" style="display:none;"><!-- P {margin-top:0;margin-bottom:0;} --></style>

<div id="divtagdefaultwrapper" style="font-size:11pt;color:#000000;font-family:Calibri,Helvetica,sans-serif;" dir="ltr">

<p>I submitted <a href="https://reviews.llvm.org/D36467" class="OWAAutoLink">https://reviews.llvm.org/D36467</a> which fixes the issue.</p>

<p>I tested the performance is back. We even get additional ~6% improvements on the benchmarks. We also found we had another issue related to LSR and caused by this bug in <span>ARMTargetLowering::isLegalAddressingMode. The patch fixed it as well.</span></p>

<p><span><br>

</span></p>

<p>Thank you, Wei, for identifying the root cause.</p>

<p><br>

</p>

<p>-Evgeny</p>

</div>

<hr style="display:inline-block;width:98%" tabindex="-1">

<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Evgeny Astigeevich<br>

<b>Sent:</b> Monday, August 7, 2017 6:18:08 PM<br>

<b>To:</b> Wei Mi<br>

<b>Cc:</b> Quentin Colombet; reviews+D34583+public+14017146d8617607@reviews.llvm.org; Evgeny Stupachenko; Xinliang David Li; Sanjoy Das; Matthias Braun; Dehao Chen; llvm-commits; Michael Zolotukhin; nd<br>

<b>Subject:</b> Re: [PATCH] D34583: [LSR] Narrow search space by filtering non-optimal formulae with the same ScaledReg and Scale.</font>

<div> </div>

</div>

<div><style type="text/css" style="display:none;"><!-- P {margin-top:0;margin-bottom:0;} --></style>

<div id="divtagdefaultwrapper" style="font-size:11pt;color:#000000;font-family:Calibri,Helvetica,sans-serif;" dir="ltr">

<p>I've created a bug report: <a href="https://bugs.llvm.org/show_bug.cgi?id=34106" class="OWAAutoLink">https://bugs.llvm.org/show_bug.cgi?id=34106</a></p>

<p><br>

</p>

<p>The fix is in progress.</p>

<p><br>

</p>

<p>Thanks,</p>

<p>Evgeny</p>

</div>

<hr style="display:inline-block;width:98%" tabindex="-1">

<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Evgeny Astigeevich<br>

<b>Sent:</b> Monday, August 7, 2017 1:27:15 PM<br>

<b>To:</b> Wei Mi<br>

<b>Cc:</b> Quentin Colombet; reviews+D34583+public+14017146d8617607@reviews.llvm.org; Evgeny Stupachenko; Xinliang David Li; Sanjoy Das; Matthias Braun; Dehao Chen; llvm-commits; Michael Zolotukhin; nd<br>

<b>Subject:</b> Re: [PATCH] D34583: [LSR] Narrow search space by filtering non-optimal formulae with the same ScaledReg and Scale.</font>

<div> </div>

</div>

<div>

<meta content="text/html; charset=UTF-8">

<style type="text/css" style="">

<!--

p

        {margin-top:0;

        margin-bottom:0}

-->

</style>

<div dir="ltr">

<div id="x_divtagdefaultwrapper" dir="ltr" style="font-size:11pt; color:#000000; font-family:Calibri,Helvetica,sans-serif">

<p>Hi Wei,</p>

<p><br>

</p>

<p>Thank you for the patch.</p>

<p>I think you have spotted a bug in <span>ARMTargetLowering::isLegalAddressingMode.<span style="font-family:Calibri,Helvetica,sans-serif,EmojiFont,"Apple Color Emoji","Segoe UI Emoji",NotoColorEmoji,"Segoe UI Symbol","Android Emoji",EmojiSymbols; font-size:14.6667px"></span></span></p>

<p><span>Thumb1 addressing modes do not support scaling. So false must be returned if AM.Scale is not zero. <span style="font-family:Calibri,Helvetica,sans-serif,EmojiFont,"Apple Color Emoji","Segoe UI Emoji",NotoColorEmoji,"Segoe UI Symbol","Android Emoji",EmojiSymbols; font-size:14.6667px">ARMTargetLowering::isLegalAddressingMode

 return false only when Scale equals 1 and returns true for other values.</span></span></p>

<p><span>I am testing a fix of the issue. I'll also raise a bug report about this issue.</span></p>

<p><span><br>

</span></p>

<p><span>Thnaks,</span></p>

<p><span>Evgeny</span></p>

</div>

<hr tabindex="-1" style="display:inline-block; width:98%">

<div id="x_divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> Wei Mi <wmi@google.com><br>

<b>Sent:</b> Friday, August 4, 2017 7:27:47 PM<br>

<b>To:</b> Evgeny Astigeevich<br>

<b>Cc:</b> Quentin Colombet; reviews+D34583+public+14017146d8617607@reviews.llvm.org; Evgeny Stupachenko; Xinliang David Li; Sanjoy Das; Matthias Braun; Dehao Chen; llvm-commits; Michael Zolotukhin; nd<br>

<b>Subject:</b> Re: [PATCH] D34583: [LSR] Narrow search space by filtering non-optimal formulae with the same ScaledReg and Scale.</font>

<div> </div>

</div>

</div>

<font size="2"><span style="font-size:10pt;">

<div class="PlainText">On Fri, Aug 4, 2017 at 11:16 AM, Evgeny Astigeevich<br>

<Evgeny.Astigeevich@arm.com> wrote:<br>

> I compared the generated code with and without this check:<br>

><br>

><br>

> --- check is ON ---<br>

><br>

> @ BB#0:                                 @ %entry<br>

>         cmp     r2, #0<br>

>         moveq   pc, lr<br>

>         add     r0, r0, r2, lsl #2<br>

>         mov     r12, #0<br>

>         mov     r3, #0<br>

> .LBB1_1:                                @ %bb<br>

>                                         @ =>This Inner Loop Header: Depth=1<br>

>         str     r1, [r0, -r3, lsl #2]<br>

>         str     r3, [r12]<br>

>         add     r3, r3, #1<br>

>         cmp     r2, r3<br>

>         bne     .LBB1_1<br>

><br>

> --- check is OFF ---<br>

><br>

> @ BB#0:                                 @ %entry<br>

>         cmp     r2, #0<br>

>         moveq   pc, lr<br>

>         mov     r12, #0<br>

>         mov     r3, #0<br>

> .LBB1_1:                                @ %bb<br>

>                                         @ =>This Inner Loop Header: Depth=1<br>

>         str     r1, [r0, r2, lsl #2]<br>

>         subs    r2, r2, #1<br>

>         str     r3, [r12]<br>

>         add     r3, r3, #1<br>

>         bne     .LBB1_1<br>

><br>

> The main difference of 'check is OFF' is that we need two registers to keep<br>

> values. As '-Rm' cannot be used in Thumb code it is not working for v6m.<br>

><br>

><br>

> Thanks,<br>

><br>

> Evgeny<br>

<br>

Thanks Evgeny. So can we add a subtarget check around this scale<0 check?<br>

<br>

Index: lib/Target/ARM/ARMISelLowering.cpp<br>

===================================================================<br>

--- lib/Target/ARM/ARMISelLowering.cpp (revision 309240)<br>

+++ lib/Target/ARM/ARMISelLowering.cpp (working copy)<br>

@@ -12411,7 +12411,7 @@ bool ARMTargetLowering::isLegalAddressin<br>

     case MVT::i1:<br>

     case MVT::i8:<br>

     case MVT::i32:<br>

-      if (Scale < 0) Scale = -Scale;<br>

+      if (!Subtarget->isThumb() && Scale < 0) Scale = -Scale;<br>

       if (Scale == 1)<br>

         return true;<br>

       // r + r << imm<br>

<br>

Wei.<br>

<br>

><br>

><br>

> ________________________________<br>

> From: Evgeny Astigeevich<br>

> Sent: Friday, August 4, 2017 6:42:53 PM<br>

> To: Wei Mi; Quentin Colombet<br>

> Cc: reviews+D34583+public+14017146d8617607@reviews.llvm.org; Evgeny<br>

> Stupachenko; Xinliang David Li; Sanjoy Das; Matthias Braun; Dehao Chen;<br>

> llvm-commits; Michael Zolotukhin; nd<br>

><br>

> Subject: Re: [PATCH] D34583: [LSR] Narrow search space by filtering<br>

> non-optimal formulae with the same ScaledReg and Scale.<br>

><br>

><br>

> Hi Wei,<br>

><br>

><br>

> The fix will cause a failure of CodeGen/ARM/arm-negative-stride.ll. The<br>

> check was introduced in r35859 (10 years ago):<br>

><br>

><br>

> ------------------------------------------------------------------------<br>

> r35859 | lattner | 2007-04-10 04:48:29 +0100 (Tue, 10 Apr 2007) | 2 lines<br>

><br>

> restore support for negative strides<br>

><br>

> ------------------------------------------------------------------------<br>

><br>

> The test has a triple: arm-eabi. So code is generated for arm7tdmi (armv4).<br>

> Maybe we need a case for the M-profile. I'll try to figure it out.<br>

><br>

> Thanks,<br>

> Evgeny Astigeevich<br>

> The ARM Compiler Optimization team leader<br>

><br>

> ________________________________<br>

> From: Wei Mi <wmi@google.com><br>

> Sent: Friday, August 4, 2017 5:23:47 PM<br>

> To: Quentin Colombet<br>

> Cc: reviews+D34583+public+14017146d8617607@reviews.llvm.org; Evgeny<br>

> Stupachenko; Xinliang David Li; Sanjoy Das; Evgeny Astigeevich; Matthias<br>

> Braun; Dehao Chen; llvm-commits; Michael Zolotukhin<br>

> Subject: Re: [PATCH] D34583: [LSR] Narrow search space by filtering<br>

> non-optimal formulae with the same ScaledReg and Scale.<br>

><br>

> On Thu, Aug 3, 2017 at 6:08 PM, Quentin Colombet <qcolombet@apple.com><br>

> wrote:<br>

>><br>

>>> On Aug 3, 2017, at 4:46 PM, Wei Mi <wmi@google.com> wrote:<br>

>>><br>

>>> I found isAMCompletelyFolded return true for formula like<br>

>>> reg(%c1.0104.us.i.i) + -1*reg({0,+,-4}<nsw><%for.body8.us.i.i>) +<br>

>>> imm(4) on ARM. Does it make sense?<br>

>><br>

>><br>

>> That sounds wrong.<br>

><br>

> With the following fix, the regression of test.ll will be fixed, but I<br>

> am not familiar with ARM so I am not sure whether allowing negative<br>

> scale has any other usage. Quentin, does the fix looks correct to you?<br>

><br>

> Index: lib/Target/ARM/ARMISelLowering.cpp<br>

> ===================================================================<br>

> --- lib/Target/ARM/ARMISelLowering.cpp (revision 309240)<br>

> +++ lib/Target/ARM/ARMISelLowering.cpp (working copy)<br>

> @@ -12411,7 +12411,6 @@ bool ARMTargetLowering::isLegalAddressin<br>

>      case MVT::i1:<br>

>      case MVT::i8:<br>

>      case MVT::i32:<br>

> -      if (Scale < 0) Scale = -Scale;<br>

>        if (Scale == 1)<br>

>          return true;<br>

>        // r + r << imm<br>

><br>

><br>

><br>

>><br>

>>> The fact is we cannot fold it into<br>

>>> load/store and it leads to the additional instructions causing<br>

>>> performance regressions.<br>

>>><br>

>>><br>

>>> On Thu, Aug 3, 2017 at 11:12 AM, Wei Mi via Phabricator<br>

>>> <reviews@reviews.llvm.org> wrote:<br>

>>>> wmi added a comment.<br>

>>>><br>

>>>> In <a href="https://reviews.llvm.org/D34583#830286">https://reviews.llvm.org/D34583#830286</a>, @eastig wrote:<br>

>>>><br>

>>>>> F4171725: test.ll <<a href="https://reviews.llvm.org/F4171725">https://reviews.llvm.org/F4171725</a>><br>

>>>>><br>

>>>>> F4171724: test.good.ll <<a href="https://reviews.llvm.org/F4171724">https://reviews.llvm.org/F4171724</a>><br>

>>>>><br>

>>>>> F4171723: test.bad.ll <<a href="https://reviews.llvm.org/F4171723">https://reviews.llvm.org/F4171723</a>><br>

>>>>><br>

>>>>> This patch caused regressions from 5% to 23% in two our internal<br>

>>>>> benchmarks on Cortex-M23 and Cortex-M0+. I attached test.ll which is reduced<br>

>>>>> from the benchmarks. I used LLVM revision 309830. 'test.good.ll' is a result<br>

>>>>> when filtering is disabled. 'test.bad.ll' is a result when filtering is<br>

>>>>> enabled.<br>

>>>>> Comparing them I can see that this optimization changes how an<br>

>>>>> induction variable is changed. Originally it is incremented from 0 to 256.<br>

>>>>> The optimization changes this into decrementing from 0 to -256. This<br>

>>>>> induction variable is also used as an offset to memory. So to preserve this<br>

>>>>> semantic conversion of the induction variable from a negative value to a<br>

>>>>> positive value is inserted. This is lowered to additional instructions which<br>

>>>>> causes performance regressions.<br>

>>>>><br>

>>>>> Could you please have a look at this issue?<br>

>>>>><br>

>>>>> Thanks,<br>

>>>>> Evgeny Astigeevich<br>

>>>>> The ARM Compiler Optimization team leader<br>

>>>><br>

>>>><br>

>>>> Hi Evgeny,<br>

>>>><br>

>>>> Thanks for providing the testcase.<br>

>>>><br>

>>>> It looks like an existing issue in LSR cost evaluation exposed by the<br>

>>>> patch. Actually, comparing the trace by adding -debug-only=loop-reduce, all<br>

>>>> the candidates choosen by LSR without filtering are kept in the candidate<br>

>>>> set after adding the filter patch. However filtering patch provides some<br>

>>>> more candidates interesting for LSR cost model to choose, and LSR chooses a<br>

>>>> different set of candidates in the final result which it thinks better (1<br>

>>>> less base add) but actually not. We can see that in the trace:<br>

>>>><br>

>>>> LSR without the filtering patch:<br>

>>>> The chosen solution requires 5 regs, with addrec cost 2, plus 2 base<br>

>>>> adds, plus 4 imm cost, plus 1 setup cost:<br>

>>>><br>

>>>>  LSR Use: Kind=Address of i8 in addrspace(0), Offsets={0}, widest fixup<br>

>>>> type: i8*<br>

>>>>    reg({%ptr1,+,256}<%for.cond6.preheader.us.i.i>) +<br>

>>>> 1*reg({0,+,1}<nuw><nsw><%for.body8.us.i.i>)<br>

>>>>  LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32<br>

>>>>    reg(256) + -1*reg({0,+,1}<nuw><nsw><%for.body8.us.i.i>)<br>

>>>>  LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup<br>

>>>> type: i32*<br>

>>>>    reg(%c0.0103.us.i.i) + 4*reg({0,+,1}<nuw><nsw><%for.body8.us.i.i>)<br>

>>>>  LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0,4}, widest<br>

>>>> fixup type: i32*<br>

>>>>    reg({(-4 + %c1.0104.us.i.i)<nsw>,+,4}<nsw><%for.body8.us.i.i>)<br>

>>>>  LSR Use: Kind=Special, Offsets={0}, all-fixups-outside-loop, widest<br>

>>>> fixup type: i32*<br>

>>>>    reg(%c0.0103.us.i.i)<br>

>>>>  LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup<br>

>>>> type: i32*<br>

>>>>    reg(%c1.0104.us.i.i) + 4*reg({0,+,1}<nuw><nsw><%for.body8.us.i.i>) +<br>

>>>> imm(4)<br>

>>>>  LSR Use: Kind=Special, Offsets={0}, all-fixups-outside-loop, widest<br>

>>>> fixup type: i32*<br>

>>>>    reg(%c1.0104.us.i.i)<br>

>>>><br>

>>>> LSR with the filtering patch:<br>

>>>> The chosen solution requires 5 regs, with addrec cost 2, plus 1 base<br>

>>>> add, plus 4 imm cost, plus 1 setup cost:<br>

>>>><br>

>>>>  LSR Use: Kind=Address of i8 in addrspace(0), Offsets={0}, widest fixup<br>

>>>> type: i8*<br>

>>>>    reg({%ptr1,+,256}<%for.cond6.preheader.us.i.i>) +<br>

>>>> -1*reg({0,+,-1}<nw><%for.body8.us.i.i>)<br>

>>>>  LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32<br>

>>>>    reg(-256) + -1*reg({0,+,-1}<nw><%for.body8.us.i.i>)<br>

>>>>  LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup<br>

>>>> type: i32*<br>

>>>>    reg(%c0.0103.us.i.i) + -4*reg({0,+,-1}<nw><%for.body8.us.i.i>)<br>

>>>>  LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0,4}, widest<br>

>>>> fixup type: i32*<br>

>>>>    reg({(4 + %c1.0104.us.i.i)<nsw>,+,4}<nsw><%for.body8.us.i.i>) +<br>

>>>> imm(-8)<br>

>>>>  LSR Use: Kind=Special, Offsets={0}, all-fixups-outside-loop, widest<br>

>>>> fixup type: i32*<br>

>>>>    reg(%c0.0103.us.i.i)<br>

>>>>  LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup<br>

>>>> type: i32*<br>

>>>>    reg({(4 + %c1.0104.us.i.i)<nsw>,+,4}<nsw><%for.body8.us.i.i>)<br>

>>>>  LSR Use: Kind=Special, Offsets={0}, all-fixups-outside-loop, widest<br>

>>>> fixup type: i32*<br>

>>>>    reg(%c1.0104.us.i.i)<br>

>>>><br>

>>>> The real problem is that LSR has no idea about the cost of getting<br>

>>>> negative value. It thinks 4*reg({0,+,-1} and -4*reg({0,+,-1} have the same<br>

>>>> cost.<br>

>>>><br>

>>>>  LSR Use: Kind=Address of i8 in addrspace(0), Offsets={0}, widest fixup<br>

>>>> type: i8*<br>

>>>>    reg({%ptr1,+,256}<%for.cond6.preheader.us.i.i>) +<br>

>>>> -1*reg({0,+,-1}<nw><%for.body8.us.i.i>)<br>

>>>>  LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup<br>

>>>> type: i32*<br>

>>>>    reg(%c0.0103.us.i.i) + -4*reg({0,+,-1}<nw><%for.body8.us.i.i>)<br>

>>>><br>

>>>> I will think about how to fix it.<br>

>>>><br>

>>>> Wei.<br>

>>>><br>

>>>><br>

>>>> Repository:<br>

>>>>  rL LLVM<br>

>>>><br>

>>>> <a href="https://reviews.llvm.org/D34583">https://reviews.llvm.org/D34583</a><br>

>>>><br>

>>>><br>

>>>><br>

>><br>

</div>

</span></font></div>

</div>

</body>

</html>