<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Dec 3, 2014 at 4:23 AM, Robert Lougher <span dir="ltr"><<a href="mailto:rob.lougher@gmail.com" target="_blank">rob.lougher@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Hi,<br>
<br>
In feedback from game studios, a common issue is the replacement of<br>
loops with calls to memcpy/memset. These loops are often<br>
hand-optimised, and highly-efficient, and the developers strongly want<br>
a way to control the compiler (i.e. leave my loop alone).<br></blockquote><div><br></div><div>Please provide examples of such "hand-optimised, and highly-efficient" routines and test cases (and execution conditions) that demonstrate a performance improvement.</div><div><br></div><div>-- Sean Silva</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<br>
The culprit is, of course, the loop-idiom recognizer. This replaces any<br>
loop that looks like a memset/memcpy with a call to the corresponding<br>
library function. This affects loops with both a variable and a constant<br>
trip count. The question is: does this make sense in all cases? Also,<br>
should the compiler provide a way to turn it off for certain types of<br>
loop, or for individual loops? The standard answer is to use<br>
-fno-builtin, but this does not provide fine-grained control (e.g. we<br>
may want the loop-idiom recognizer to handle loops with a constant trip<br>
count but not those with a variable one).<br>
<br>
As an example, it could be argued that replacing loops with a constant<br>
trip count always makes sense. Here the compiler knows how big the<br>
memset/memcpy is and can make an accurate decision. For small sizes the<br>
memcpy/memset will be expanded inline, while for larger sizes it will<br>
remain a call, but at those sizes the call overhead is negligible.<br>
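<br>
As a hypothetical illustration (the function name and the restrict<br>
qualifiers are assumptions, not taken from the studio code), the loop<br>
below has a constant trip count of 16. If the loop-idiom recognizer<br>
rewrites it as memcpy(dst, src, 16), the size is a compile-time<br>
constant, so the backend is free to expand the copy inline. The<br>
restrict qualifiers tell the compiler that the buffers do not overlap;<br>
whether the rewrite actually happens depends on the LLVM version and<br>
optimisation level.<br>
<pre>
```c
/* Hypothetical example: a byte-copy loop with a constant trip count.
   At -O2 the loop-idiom recognizer may turn this loop into
   memcpy(dst, src, 16); because the size is a compile-time constant,
   the backend can then expand that memcpy inline. */
void copy16(unsigned char *restrict dst,
            const unsigned char *restrict src)
{
    for (int i = 0; i < 16; i++)
        dst[i] = src[i];
}
```
</pre>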
<br>
On the other hand, the compiler knows very little about variable loops<br>
(the loop could be used primarily for copying 10 bytes or 10 MB; the<br>
compiler doesn't know). The compiler will replace it with a call, but<br>
because the size is variable the call will not be expanded inline. In<br>
this case, small sizes may see significant overhead in comparison to<br>
the original loop. The game studio examples all fall into this category.<br>
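<br>
A hypothetical counterpart with a variable trip count is sketched<br>
below (again, the function name is illustrative only). If the<br>
loop-idiom recognizer rewrites it as memset(dst, 0, n), the size is<br>
only known at run time, so the result is typically an out-of-line call<br>
to the library memset; when callers mostly pass small values of n, that<br>
call overhead may outweigh the work done by the original loop.<br>
<pre>
```c
/* Hypothetical example: a byte-fill loop with a variable trip count.
   At -O2 the loop-idiom recognizer may turn this loop into
   memset(dst, 0, n). Since n is only known at run time, the
   memset is normally emitted as an out-of-line library call. */
void clear_bytes(unsigned char *dst, unsigned long n)
{
    for (unsigned long i = 0; i < n; i++)
        dst[i] = 0;
}
```
</pre>
Compiling with clang -O2 -S and looking for a call to memset in the<br>
generated assembly is one way to see whether the rewrite took place.<br>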
<br>
The loop-idiom recognizer also has no notion of "quality" - it always<br>
assumes that replacing the loop makes sense. While it might be the<br>
case for a naive byte-copy, some of the examples we've seen have been<br>
carefully tuned.<br>
<br>
So, to summarise, we feel that there's sufficient justification to add<br>
some sort of user control. However, rather than suggest a solution, we<br>
would prefer to start a discussion and gather opinions. So, to start,<br>
how do people feel about:<br>
<br>
- A switch to disable the loop-idiom recognizer completely?<br>
<br>
- A switch to disable the loop-idiom recognizer for loops with a variable trip count?<br>
<br>
- A switch to disable the loop-idiom recognizer for loops with a constant<br>
trip count (we can't see this being of much use)?<br>
<br>
- Per-function control of the loop-idiom recognizer (which must work with LTO)?<br>
<br>
Thanks for any feedback!<br>
Rob.<br>
<br>
--<br>
Robert Lougher<br>
SN Systems - Sony Computer Entertainment Group<br>
_______________________________________________<br>
LLVM Developers mailing list<br>
<a href="mailto:LLVMdev@cs.uiuc.edu">LLVMdev@cs.uiuc.edu</a> <a href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a><br>
<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br>
</blockquote></div><br></div></div>