<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">

<head>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

<meta name="Generator" content="Microsoft Word 15 (filtered medium)">

<style><!--

/* Font Definitions */

@font-face

        {font-family:"Cambria Math";

        panose-1:2 4 5 3 5 4 6 3 2 4;}

@font-face

        {font-family:Calibri;

        panose-1:2 15 5 2 2 2 4 3 2 4;}

@font-face

        {font-family:"Times New Roman \(Body CS\)";

        panose-1:2 11 6 4 2 2 2 2 2 4;}

/* Style Definitions */

p.MsoNormal, li.MsoNormal, div.MsoNormal

        {margin:0cm;

        margin-bottom:.0001pt;

        font-size:11.0pt;

        font-family:"Calibri",sans-serif;}

a:link, span.MsoHyperlink

        {mso-style-priority:99;

        color:blue;

        text-decoration:underline;}

a:visited, span.MsoHyperlinkFollowed

        {mso-style-priority:99;

        color:purple;

        text-decoration:underline;}

p.msonormal0, li.msonormal0, div.msonormal0

        {mso-style-name:msonormal;

        mso-margin-top-alt:auto;

        margin-right:0cm;

        mso-margin-bottom-alt:auto;

        margin-left:0cm;

        font-size:11.0pt;

        font-family:"Calibri",sans-serif;}

span.im

        {mso-style-name:im;}

span.EmailStyle19

        {mso-style-type:personal-reply;

        font-family:"Calibri",sans-serif;

        color:windowtext;

        font-weight:normal;

        font-style:normal;}

.MsoChpDefault

        {mso-style-type:export-only;

        font-size:10.0pt;}

@page WordSection1

        {size:612.0pt 792.0pt;

        margin:72.0pt 72.0pt 72.0pt 72.0pt;}

div.WordSection1

        {page:WordSection1;}

--></style>

</head>

<body lang="EN-GB" link="blue" vlink="purple">

<div class="WordSection1">

<p class="MsoNormal"><span style="mso-fareast-language:EN-US">Hi Sanjay,<o:p></o:p></span></p>

<p class="MsoNormal"><span style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>

<p class="MsoNormal"><span style="mso-fareast-language:EN-US">Thank you for your help.<o:p></o:p></span></p>

<p class="MsoNormal"><span style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>

<p class="MsoNormal"><span style="mso-fareast-language:EN-US">I have looked at one of Cortex-M regressions David talked about. The patch improves IR:<o:p></o:p></span></p>

<p class="MsoNormal"><span style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>

<p class="MsoNormal"><span style="mso-fareast-language:EN-US">====Before:<o:p></o:p></span></p>

<p class="MsoNormal"><span style="mso-fareast-language:EN-US">%conv10 = zext i8 %40 to i32<o:p></o:p></span></p>

<p class="MsoNormal"><span style="mso-fareast-language:EN-US">%xor = xor i32 %conv10, %conv14<o:p></o:p></span></p>

<p class="MsoNormal"><span style="mso-fareast-language:EN-US">%conv16 = trunc i32 %xor to i8<o:p></o:p></span></p>

<p class="MsoNormal"><span style="mso-fareast-language:EN-US">store i8 %conv16, i8* %storemerge185, align 1<o:p></o:p></span></p>

<p class="MsoNormal"><span style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>

<p class="MsoNormal"><span style="mso-fareast-language:EN-US">====After:<o:p></o:p></span></p>

<p class="MsoNormal"><span style="mso-fareast-language:EN-US">%xor = xor i8 %40, %conv14<o:p></o:p></span></p>

<p class="MsoNormal"><span style="mso-fareast-language:EN-US">store i8 %xor, i8* %storemerge185, align 1<o:p></o:p></span></p>

<p class="MsoNormal"><span style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>

<p class="MsoNormal"><span style="mso-fareast-language:EN-US">However it affects register allocation causing more spill/fills. I am trying to figure out why it happens.<o:p></o:p></span></p>

<p class="MsoNormal"><span style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>

<p class="MsoNormal"><span style="mso-fareast-language:EN-US">Thanks,<o:p></o:p></span></p>

<p class="MsoNormal"><span style="mso-fareast-language:EN-US">Evgeny Astigeevich<o:p></o:p></span></p>

<p class="MsoNormal"><span style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>

<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0cm 0cm 0cm">

<p class="MsoNormal"><b><span style="font-size:12.0pt;color:black">From: </span></b><span style="font-size:12.0pt;color:black">llvm-dev <llvm-dev-bounces@lists.llvm.org> on behalf of Sanjay Patel via llvm-dev <llvm-dev@lists.llvm.org><br>

<b>Reply-To: </b>Sanjay Patel <spatel@rotateright.com><br>

<b>Date: </b>Monday, 22 January 2018 at 18:00<br>

<b>To: </b>David Green <David.Green@arm.com><br>

<b>Cc: </b>llvm-dev <llvm-dev@lists.llvm.org>, nd <nd@arm.com><br>

<b>Subject: </b>Re: [llvm-dev] always allow canonicalizing to 8- and 16-bit ops?<o:p></o:p></span></p>

</div>

<div>

<p class="MsoNormal"><o:p> </o:p></p>

</div>

<div>

<div>

<div>

<div>

<p class="MsoNormal" style="margin-bottom:12.0pt"><a name="_MailOriginalBody">Thanks for the perf testing. I assume that DAG legalization is equipped to handle these cases fairly well, or someone would've complained by now...<o:p></o:p></a></p>

</div>

<p class="MsoNormal" style="margin-bottom:12.0pt"><span style="mso-bookmark:_MailOriginalBody">FWIW (and at least some of this can be blamed on me), instcombine already does the narrowing transforms without checking shouldChangeType() for binops like and/or/xor/udiv.

 The justification was that narrower ops are always better for bit-tracking. If we can eliminate casts, then that improves analysis even more and allows more follow-on transforms.<o:p></o:p></span></p>

</div>

<p class="MsoNormal"><span style="mso-bookmark:_MailOriginalBody">If there are no objections here, I think you can post your patch for review on Phabricator. If we can squash any of the regressions before that goes in, that would be even better.

<o:p></o:p></span></p>

</div>

</div>

<div>

<p class="MsoNormal"><span style="mso-bookmark:_MailOriginalBody"><o:p> </o:p></span></p>

<div>

<p class="MsoNormal"><span style="mso-bookmark:_MailOriginalBody">On Mon, Jan 22, 2018 at 3:10 AM, David Green <</span><a href="mailto:David.Green@arm.com" target="_blank"><span style="mso-bookmark:_MailOriginalBody">David.Green@arm.com</span><span style="mso-bookmark:_MailOriginalBody"></span></a><span style="mso-bookmark:_MailOriginalBody">>

 wrote:<o:p></o:p></span></p>

<blockquote style="border:none;border-left:solid #CCCCCC 1.0pt;padding:0cm 0cm 0cm 6.0pt;margin-left:4.8pt;margin-right:0cm">

<p class="MsoNormal" style="margin-bottom:12.0pt"><span style="mso-bookmark:_MailOriginalBody">Hello<br>

<br>

Thanks for looking into this.<br>

<br>

I can't be very confident what the knock on result of a change like that would be,<br>

especially on architectures that are not Arm. What I can do though, is run some<br>

benchmarks and look at that results.<br>

<br>

Using this patch:<br>

<br>

--- a/lib/Transforms/InstCombine/InstructionCombining.cpp<br>

+++ b/lib/Transforms/InstCombine/InstructionCombining.cpp<br>

@@ -150,6 +150,9 @@ bool InstCombiner::shouldChangeType(unsigned FromWidth,<br>

   bool FromLegal = FromWidth == 1 || DL.isLegalInteger(FromWidth);<br>

   bool ToLegal = ToWidth == 1 || DL.isLegalInteger(ToWidth);<br>

<br>

+  if (FromLegal && ToWidth < FromWidth && (ToWidth == 8 || ToWidth == 16))<br>

+    return true;<br>

+<br>

   // If this is a legal integer from type, and the result would be an illegal<br>

   // type, don't do the transformation.<br>

   if (FromLegal && !ToLegal)<br>

<br>

<br>

Running on a little A core, in the llvm test suite I am seeing these changes:<br>

<br>

MultiSource/Benchmarks/BitBench/uudecode/uudecode<br>

        3.38%<br>

SingleSource/Benchmarks/Adobe-C++/simple_types_constant_folding<br>

        -35.04%<br>

MultiSource/Benchmarks/Trimaran/enc-pc1/enc-pc1<br>

        -17.92%<br>

SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant<br>

        -8.57%<br>

External/SPEC/CINT2000/253.perlbmk/253.perlbmk<br>

        -3.43%<br>

MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm<br>

        -3.36%<br>

MultiSource/Benchmarks/TSVC/CrossingThresholds-dbl/CrossingThresholds-dbl<br>

        -1.34%<br>

<br>

+ve for these is bad, -ve is good. So overall looks like a good change, especially in<br>

simple_types_constant_folding. There may be some alignment issues that can<br>

causing wilder swings than they should, but the results here look good. The list for<br>

aarch64 is roughly the same, just a slightly longer list of minor improvements.<br>

<br>

On our internal cortex-m tests we are seeing more regressions but it's still a net<br>

positive in most cases.<br>

<br>

I would say that at least for these results, it looks like a profitable idea. Like I said<br>

I can't be sure about other architectures though.<br>

<span class="im">Dave</span><br>

<br>

<span class="im">________________________________________</span><br>

<span class="im">From: Sanjay Patel <</span></span><a href="mailto:spatel@rotateright.com"><span style="mso-bookmark:_MailOriginalBody">spatel@rotateright.com</span><span style="mso-bookmark:_MailOriginalBody"></span></a><span style="mso-bookmark:_MailOriginalBody"><span class="im">></span></span><span style="mso-bookmark:_MailOriginalBody"><br>

<span class="im">Sent: 17 January 2018 22:50</span><br>

<span class="im">To: llvm-dev</span><br>

<span class="im">Cc: David Green</span><br>

<span class="im">Subject: always allow canonicalizing to 8- and 16-bit ops?</span><o:p></o:p></span></p>

<div>

<div>

<p class="MsoNormal" style="margin-bottom:12.0pt"><span style="mso-bookmark:_MailOriginalBody">Example:<br>

define i8 @narrow_add(i8 %x, i8 %y) {<br>

  %x32 = zext i8 %x to i32<br>

  %y32 = zext i8 %y to i32<br>

  %add = add nsw i32 %x32, %y32<br>

  %tr = trunc i32 %add to i8<br>

  ret i8 %tr<br>

}<br>

<br>

With no data-layout or with an x86 target where 8-bit integer is in the data-layout, we reduce to:<br>

<br>

$ ./opt -instcombine narrowadd.ll -S<br>

define i8 @narrow_add(i8 %x, i8 %y) {<br>

  %add = add i8 %x, %y<br>

  ret i8 %add<br>

}<br>

<br>

But on a target that has 32-bit registers without explicit subregister ops, we don't do that transform because we avoid changing operations from a legal (as specified in the data-layout) width to an illegal width - see InstCombiner::shouldChangeType().<br>

<br>

Should we make an exception to allow narrowing for the common cases of i8 and i16?<br>

<br>

In the motivating example from PR35875 ( </span><a href="https://bugs.llvm.org/show_bug.cgi?id=35875" target="_blank"><span style="mso-bookmark:_MailOriginalBody">https://bugs.llvm.org/show_bug.cgi?id=35875</span><span style="mso-bookmark:_MailOriginalBody"></span></a><span style="mso-bookmark:_MailOriginalBody">

 ), an ARM target is stuck at 19 IR instructions:<br>

<br>

declare void @use4(i8, i8, i8, i8)<br>

define void @min_of_3_vals(i8 %x, i8 %y, i8 %z) {<br>

  %nx = xor i8 %x, -1<br>

  %ny = xor i8 %y, -1<br>

  %nz = xor i8 %z, -1<br>

  %zx = zext i8 %nx to i32<br>

  %zy = zext i8 %ny to i32<br>

  %zz = zext i8 %nz to i32<br>

<br>

  %cmpxz = icmp ult i32 %zx, %zz<br>

  %minxz = select i1 %cmpxz, i32 %zx, i32 %zz<br>

  %cmpyz = icmp ult i32 %zy, %zz<br>

  %minyz = select i1 %cmpyz, i32 %zy, i32 %zz<br>

  %cmpyx = icmp ult i8 %y, %x<br>

  %minxyz = select i1 %cmpyx, i32 %minxz, i32 %minyz<br>

  %tr_minxyz = trunc i32 %minxyz to i8<br>

<br>

  %new_zx = sub nsw i32 %zx, %minxyz<br>

  %new_zy = sub nsw i32 %zy, %minxyz<br>

  %new_zz = sub nsw i32 %zz, %minxyz<br>

  %new_x = trunc i32 %new_zx to i8<br>

  %new_y = trunc i32 %new_zy to i8<br>

  %new_z = trunc i32 %new_zz to i8<br>

<br>

  call void @use4(i8 %tr_minxyz, i8 %new_x, i8 %new_y, i8 %new_z)<br>

  ret void<br>

}<br>

<br>

...but x86 gets to shrink the subs which leads to a bunch of other transforms, and we grind this down to 10 instructions between instcombine and early-cse:<br>

<br>

define void @min_of_3_vals(i8 %x, i8 %y, i8 %z) {<br>

  %nx = xor i8 %x, -1<br>

  %ny = xor i8 %y, -1<br>

  %nz = xor i8 %z, -1<br>

  %cmpxz = icmp ult i8 %nx, %nz<br>

  %minxz = select i1 %cmpxz, i8 %nx, i8 %nz<br>

  %1 = icmp ult i8 %minxz, %ny<br>

  %minxyz = select i1 %1, i8 %minxz, i8 %ny<br>

  %new_x = sub i8 %nx, %minxyz<br>

  %new_y = sub i8 %ny, %minxyz<br>

  %new_z = sub i8 %nz, %minxyz<br>

<br>

  call void @use4(i8 %minxyz, i8 %new_x, i8 %new_y, i8 %new_z)<br>

  ret void<br>

}<o:p></o:p></span></p>

</div>

</div>

</blockquote>

</div>

<p class="MsoNormal"><span style="mso-bookmark:_MailOriginalBody"><o:p> </o:p></span></p>

</div>

</div>

</body>

</html>