<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]--><style><!--
/* Font Definitions */
@font-face
{font-family:Helvetica;
panose-1:0 0 0 0 0 0 0 0 0 0;}
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
span.EmailStyle19
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-GB" link="blue" vlink="purple" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal"><span style="mso-fareast-language:EN-US">> Just out of curiosity, can you perhaps tell us more about how you would like to persuade/force hardware loops with an assume? There are some options at the moment (but they would apply to all loops in the compilation unit), and I don't think we have looked into e.g. a pragma, so it sounds interesting. I guess this is a hint about the iteration count?<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US">Yeah, basically a hint about the iteration count so the user can emit hardware loops on a per-loop basis. We were also thinking about using pragmas (and I believe that would be a more user-friendly way to emit hardware loops), but I noticed that I had most of the tools at hand (AssumptionCache, computeKnownBits) to possibly get it working with assume intrinsics as an "easy" first step.<o:p></o:p></span></p>
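<p class="MsoNormal"><span style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US">As a rough sketch of what such a hint could look like on the user side: Clang's __builtin_assume is lowered to an assume intrinsic, while the LOOP_ASSUME macro, the sumFirstN function and the bound of 256 are made up purely for illustration. Whether a target's isHardwareLoopProfitable acts on the hint is up to that target.<o:p></o:p></span></p>
<pre>
```cpp
#include <cstdint>

// __builtin_assume is a Clang builtin; other compilers get a no-op here.
#if defined(__clang__)
#define LOOP_ASSUME(cond) __builtin_assume(cond)
#else
#define LOOP_ASSUME(cond) ((void)0)
#endif

// Hypothetical bound, chosen only for illustration.
constexpr std::uint32_t kMaxTripCount = 256;

std::uint32_t sumFirstN(const std::uint32_t *data, std::uint32_t n) {
  // Promise the optimizer that the trip count fits in 8 bits.
  // Breaking this promise at run time is undefined behaviour.
  LOOP_ASSUME(n < kMaxTripCount);
  std::uint32_t total = 0;
  for (std::uint32_t i = 0; i < n; ++i)
    total += data[i];
  return total;
}
```
</pre>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US">With the assume in place, the upper bits of n should be recoverable as known zero by computeKnownBits through the AssumptionCache, provided the assume still exists when the pass runs.<o:p></o:p></span></p>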
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US">> As Sjoerd said, can we re-populate it?<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US">I believe that after CodeGenPrepare (CGP) all assume intrinsics will be gone, so I don't think that will be possible (unless there's another way to find the assumptions and repopulate the AssumptionCache). Moving the HardwareLoops pass before CGP sounds like a possibility, but I'm not sure about the impact of doing so (in terms of the number of hardware loops emitted).<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US">> And in the `CodeMetrics` class, when it calculates the instruction count, it first excludes the ephemeral values (instructions related to llvm.assume); that's why we need the assumption cache analysis.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US">I did notice that isHardwareLoopProfitable is also called from TTI's canSaveCmp in PPC, which may explain the need for the AssumptionCache in isHardwareLoopProfitable.<o:p></o:p></span></p>
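<p class="MsoNormal"><span style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US">To make the ephemeral-value exclusion quoted above concrete, here is a small self-contained model of the idea. It is not LLVM code: ToyInst, collectEphemeral and countNonEphemeral are hypothetical names. In LLVM itself the corresponding step is CodeMetrics::collectEphemeralValues, which uses the AssumptionCache to find the assumes to start from.<o:p></o:p></span></p>
<pre>
```cpp
#include <cstddef>
#include <vector>

// Toy stand-in for an IR instruction; not an LLVM type.
struct ToyInst {
  bool isAssume = false;              // models a call to llvm.assume
  bool hasSideEffects = false;        // e.g. a store or an opaque call
  std::vector<std::size_t> operands;  // indices of the instructions it uses
};

// Marks every instruction that exists only to feed an assume.
std::vector<bool> collectEphemeral(const std::vector<ToyInst> &insts) {
  const std::size_t n = insts.size();

  // users[i] lists the instructions that take instruction i as an operand.
  std::vector<std::vector<std::size_t>> users(n);
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t op : insts[i].operands)
      users[op].push_back(i);

  // Seed the worklist with the assumes themselves.
  std::vector<bool> ephemeral(n, false);
  std::vector<std::size_t> worklist;
  for (std::size_t i = 0; i < n; ++i) {
    if (insts[i].isAssume) {
      ephemeral[i] = true;
      worklist.push_back(i);
    }
  }

  // An operand becomes ephemeral once all of its users are ephemeral
  // and it has no side effects of its own.
  while (!worklist.empty()) {
    const std::size_t cur = worklist.back();
    worklist.pop_back();
    for (std::size_t op : insts[cur].operands) {
      if (ephemeral[op] || insts[op].hasSideEffects)
        continue;
      bool allUsersEphemeral = true;
      for (std::size_t u : users[op]) {
        if (!ephemeral[u]) {
          allUsersEphemeral = false;
          break;
        }
      }
      if (allUsersEphemeral) {
        ephemeral[op] = true;
        worklist.push_back(op);
      }
    }
  }
  return ephemeral;
}

// Instruction count with the ephemeral values left out.
std::size_t countNonEphemeral(const std::vector<ToyInst> &insts) {
  const std::vector<bool> eph = collectEphemeral(insts);
  std::size_t count = 0;
  for (bool e : eph)
    if (!e)
      ++count;
  return count;
}
```
</pre>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US">In this model, if there are no assumes left (as after CGP), nothing is marked ephemeral and the count is simply the number of instructions.<o:p></o:p></span></p>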
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US">Kind regards,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US">Janek van Oirschot<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span style="font-size:12.0pt;color:black">From: </span></b><span style="font-size:12.0pt;color:black">Zheng CZ Chen <czhengsz@cn.ibm.com><br>
<b>Date: </b>Thursday, 25 March 2021 at 10:50<br>
<b>To: </b>Sam Parker <Sam.Parker@arm.com>, David Green <David.Green@arm.com>, Janek Van Oirschot <janekvo@graphcore.ai>, Sjoerd Meijer <Sjoerd.Meijer@arm.com><br>
<b>Cc: </b>"llvm-dev@lists.llvm.org" <llvm-dev@lists.llvm.org><br>
<b>Subject: </b>RE: isHardwareLoopProfitable() called with empty assumption cache in hwloops pass<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<p><span style="font-size:10.0pt">The use of `AssumptionCache` in the PPC hardware loop pass was introduced in commit 0724fea2da637883f1461e12ff46d596a816f758.</span><br>
<br>
<span style="font-size:10.0pt">For the PPC hardware loop pass, we want to make sure that we do not convert small loops into hardware loops. A small loop is identified by the number of instructions in the loop, so we use the `CodeMetrics` class to calculate that number.</span><br>
<br>
<span style="font-size:10.0pt">And in the `CodeMetrics` class, when it calculates the instruction count, it first excludes the ephemeral values (instructions related to llvm.assume); that's why we need the assumption cache analysis.</span><br>
<br>
<span style="font-size:10.0pt">I think it is OK to have an empty `AssumptionCache` in the PPC hardware loop pass, as it is true that after CGP there are no ephemeral values any more.</span><br>
<br>
<span style="font-size:10.0pt">But maybe it makes more sense to have other APIs in the `CodeMetrics` class to calculate the instruction count without an `AssumptionCache`.</span><br>
<br>
<span style="font-size:10.0pt">Thanks.</span><br>
<br>
<span style="font-size:10.0pt">BRS//</span><br>
<span style="font-size:10.0pt">Chen Zheng</span><br>
<span style="font-size:10.0pt">Power Compiler Backend Developer</span><br>
<br>
<br>
<br>
<span style="font-size:10.0pt;color:#5F5F5F">From: </span><span style="font-size:10.0pt">Sam Parker <Sam.Parker@arm.com></span><br>
<span style="font-size:10.0pt;color:#5F5F5F">To: </span><span style="font-size:10.0pt">Sjoerd Meijer <Sjoerd.Meijer@arm.com>, Janek Van Oirschot <janekvo@graphcore.ai>, "llvm-dev@lists.llvm.org" <llvm-dev@lists.llvm.org></span><br>
<span style="font-size:10.0pt;color:#5F5F5F">Cc: </span><span style="font-size:10.0pt">David Green <David.Green@arm.com>, "czhengsz@cn.ibm.com" <czhengsz@cn.ibm.com></span><br>
<span style="font-size:10.0pt;color:#5F5F5F">Date: </span><span style="font-size:10.0pt">2021/03/25 05:44 PM</span><br>
<span style="font-size:10.0pt;color:#5F5F5F">Subject: </span><span style="font-size:10.0pt">[EXTERNAL] Re: isHardwareLoopProfitable() called with empty assumption cache in hwloops pass</span><o:p></o:p></p>
<div class="MsoNormal">
<hr size="0" width="100%" noshade="" style="color:#8091A5" align="left">
</div>
<p class="MsoNormal"><br>
<br>
<br>
<span style="font-family:"Arial",sans-serif">Indeed, it's just there because the original PPC implementation used it. Looking back through the commits, I didn't move the pass into a different phase, so PPC has either never had a populated assumption cache or never noticed the change when it was cleared. </span><br>
<br>
<span style="font-family:"Arial",sans-serif">As Sjoerd said, can we re-populate it?</span><br>
<br>
<span style="font-family:"Arial",sans-serif">As long as it runs after LSR, I can't immediately think of anything that would affect the Arm implementation (famous last words!) if we moved the transform a bit earlier in the pipeline.</span><br>
<br>
<span style="font-family:"Arial",sans-serif">Regards,</span><br>
<span style="font-family:"Arial",sans-serif">Sam</span><br>
<br>
<span style="font-family:Helvetica">Sam Parker</span><br>
<span style="font-family:Helvetica">Compilation Tools Engineer | Arm</span><br>
<span style="font-family:Helvetica">. . . . . . . . . . . . . . . . . . . . . . . . . . .</span><br>
<span style="font-family:Helvetica">Arm.com</span><o:p></o:p></p>
<div class="MsoNormal">
<hr size="0" width="100%" align="left">
</div>
<p class="MsoNormal"><br>
<b>From:</b> Sjoerd Meijer <Sjoerd.Meijer@arm.com><b><br>
Sent:</b> 25 March 2021 09:32<b><br>
To:</b> Janek Van Oirschot <janekvo@graphcore.ai>; llvm-dev@lists.llvm.org <llvm-dev@lists.llvm.org><b><br>
Cc:</b> Sam Parker <Sam.Parker@arm.com>; David Green <David.Green@arm.com><b><br>
Subject:</b> Re: isHardwareLoopProfitable() called with empty assumption cache in hwloops pass
<br>
<br>
<span style="font-family:"Arial",sans-serif">I can't imagine that being the intended behaviour. I don't think we have paid much attention to the assumption cache in the ARM implementation. Some parts of the hardware loop infrastructure were factored out from the initial PPC implementation, which I think explains why it is there and used by PPC, but not by the ARM implementation. But perhaps Sam knows more.</span><br>
<br>
<span style="font-family:"Arial",sans-serif">I have never looked into the AssumptionCache, but I assume there's a way to retrigger and repopulate it (after CGP)?</span><br>
<br>
<span style="font-family:"Arial",sans-serif">Just out of curiosity, can you perhaps tell us more about how you would like to persuade/force hardware loops with an assume? There are some options at the moment (but they would apply to all loops in the compilation unit), and I don't think we have looked into e.g. a pragma, so it sounds interesting. I guess this is a hint about the iteration count?</span><br>
<br>
<span style="font-family:"Arial",sans-serif">Cheers,<br>
Sjoerd.</span><o:p></o:p></p>
<div class="MsoNormal">
<hr size="0" width="100%" align="left">
</div>
<p class="MsoNormal"><br>
<b>From:</b> Janek Van Oirschot <janekvo@graphcore.ai><b><br>
Sent:</b> 24 March 2021 17:27<b><br>
To:</b> llvm-dev@lists.llvm.org <llvm-dev@lists.llvm.org><b><br>
Cc:</b> Sam Parker <Sam.Parker@arm.com>; Sjoerd Meijer <Sjoerd.Meijer@arm.com>; David Green <David.Green@arm.com><b><br>
Subject:</b> isHardwareLoopProfitable() called with empty assumption cache in hwloops pass
<br>
<br>
Hey all,<br>
<br>
It seems that when HardwareLoops calls the isHardwareLoopProfitable TTI hook, it never has a populated AssumptionCache. Some debugging revealed that HardwareLoops runs during the PreISel phase for ARM and PPC. However, the CodeGenPrepare pass runs before PreISel and removes all assumes, meaning that the AssumptionCache in HardwareLoops ends up empty.<br>
<br>
From what I gather (and let me know if I'm wrong), only PPC uses the AssumptionCache in isHardwareLoopProfitable but only to aid in some cost analysis. I was wondering whether it's intended behaviour to have an empty AssumptionCache during HardwareLoops? I
ask because I was looking into using assumes to persuade HardwareLoops into emitting hardware intrinsics for our (downstream) target.<br>
<br>
Kind regards,<br>
Janek van Oirschot<o:p></o:p></p>
</div>
</body>
</html>