<div dir="ltr">What if we had a pragma or attribute that lowered to metadata indicating that a variable trip count is expected to be small?<div><br></div><div>Then backends could choose to lower short memcpys to an inlined, slightly widened loop, for example 'rep movsq' on x86_64.</div><div><br></div><div>That seems nice from the compiler perspective, since it preserves the canonical form and we get the same kind of information from profiling. Then again, I can imagine most game dev users just want control and don't want to change their code.</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Dec 2, 2014 at 11:23 AM, Robert Lougher <span dir="ltr">&lt;<a href="mailto:rob.lougher@gmail.com" target="_blank">rob.lougher@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>
<br>
In feedback from game studios, a common issue is the replacement of<br>
loops with calls to memcpy/memset. These loops are often<br>
hand-optimised and highly efficient, and the developers strongly want<br>
a way to control the compiler (i.e. "leave my loop alone").<br>
<br>
The culprit is of course the loop-idiom recognizer. This replaces any<br>
loop that looks like a memset/memcpy with a call. This affects loops<br>
with both a variable and a constant trip count. The question is, does<br>
this make sense in all cases? Also, should the compiler provide a way<br>
to turn it off for certain types of loop, or for individual loops?<br>
The standard answer is to use -fno-builtin, but this does not provide<br>
fine-grained control (e.g. we may want the loop-idiom recognizer to handle<br>
constant loops but not variable loops).<br>
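<br>
As a rough illustration (a sketch only, not taken from any studio's code; the function names are made up), these are the two kinds of loop in question. At -O2, clang will typically turn the first into a memset of known size and the second into a memcpy call with a variable size:<br>

```c
#include <stddef.h>

/* Constant trip count: the size (16 bytes) is known at compile time,
   so a recognised memset can be expanded inline. */
void clear_block(unsigned char *dst)
{
    for (size_t i = 0; i < 16; ++i)
        dst[i] = 0;
}

/* Variable trip count: a candidate for memcpy(dst, src, n), where n
   is only known at run time. restrict tells the compiler the buffers
   do not overlap. */
void copy_bytes(unsigned char *restrict dst,
                const unsigned char *restrict src, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i];
}
```

Comparing the generated assembly with and without -fno-builtin is the simplest way to see what the recognizer did to each loop.<br>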
<br>
As an example, it could be argued that replacing constant loops always<br>
makes sense. Here the compiler knows how big the memset/memcpy is and<br>
can make an accurate decision. For small sizes the memcpy/memset<br>
will be expanded inline, while larger sizes will remain a call, where<br>
the call overhead is negligible relative to the amount of data copied.<br>
<br>
On the other hand, the compiler knows very little about variable loops<br>
(the same loop could be used primarily for copying 10 bytes or 10 Mbytes;<br>
the compiler cannot tell). The compiler will replace it with a call,<br>
but because the size is variable the call will not be expanded inline. In this case<br>
small sizes may see significant call overhead in comparison to the<br>
original loop. The game studio examples all fall into this category.<br>
<br>
The loop-idiom recognizer also has no notion of "quality" - it always<br>
assumes that replacing the loop makes sense. While it might be the<br>
case for a naive byte-copy, some of the examples we've seen have been<br>
carefully tuned.<br>
<br>
So, to summarise, we feel that there's sufficient justification to add<br>
some sort of user-control. However, we do not want to suggest a<br>
solution, but prefer to start a discussion, and obtain opinions. So<br>
to start, how do people feel about:<br>
<br>
- A switch to disable loop-idiom recognizer completely?<br>
<br>
- A switch to disable loop-idiom recognizer for loops with variable trip count?<br>
<br>
- A switch to disable loop-idiom recognizer for loops with constant<br>
trip count (can't see this being much use)?<br>
<br>
- Per-function control of loop-idiom recognizer (which must work with LTO)?<br>
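<br>
For the per-function option, one possible shape is a function attribute. The sketch below assumes a clang that accepts a no_builtin attribute naming memcpy; the KEEP_MY_LOOP macro and the function name are hypothetical, and on compilers without the attribute the macro expands to nothing. Whether this actually keeps the loop intact should be checked against the generated code for the compiler in use:<br>

```c
#include <stddef.h>

/* Assumption: a clang that accepts no_builtin. On any other compiler
   KEEP_MY_LOOP is empty and the function is compiled as usual. */
#if defined(__has_attribute)
#  if __has_attribute(no_builtin)
#    define KEEP_MY_LOOP __attribute__((no_builtin("memcpy")))
#  endif
#endif
#ifndef KEEP_MY_LOOP
#  define KEEP_MY_LOOP
#endif

/* Hand-written copy intended for small n; the attribute asks the
   compiler not to treat memcpy as a builtin inside this function. */
KEEP_MY_LOOP
void small_copy(unsigned char *restrict dst,
                const unsigned char *restrict src, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i];
}
```

An attribute attached to the function definition would travel with it through LTO, which a command-line switch applied per translation unit does not obviously do.<br>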
<br>
Thanks for any feedback!<br>
Rob.<br>
<br>
--<br>
Robert Lougher<br>
SN Systems - Sony Computer Entertainment Group<br>
</blockquote></div><br></div>