<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Dec 3, 2014 at 4:23 AM, Robert Lougher <span dir="ltr"><<a href="mailto:rob.lougher@gmail.com" target="_blank">rob.lougher@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Hi,<br>

<br>

In feedback from game studios a common issue is the replacement of<br>

loops with calls to memcpy/memset.  These loops are often<br>

hand-optimised, and highly-efficient and the developers strongly want<br>

a way to control the compiler (i.e. leave my loop alone).<br></blockquote><div><br></div><div>Please provide examples of such "hand-optimised, and highly-efficient" routines and test cases (and execution conditions) that demonstrate a performance improvement.</div><div><br></div><div>-- Sean Silva</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

<br>

The culprit is of course the loop-idiom recognizer.  This replaces any<br>

loop that looks like a memset/memcpy with a call to that function.  This affects loops<br>

with both a variable and a constant trip-count.  The question is, does<br>

this make sense in all cases?  Also, should the compiler provide a way<br>

to turn it off for certain types of loop, or on a loop individually?<br>

The standard answer is to use -fno-builtin but this does not provide<br>

fine-grained control (e.g. we may want the loop-idiom recognizer to handle<br>

constant trip-count loops but not variable ones).<br>
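<br>
For illustration, here is a minimal sketch of the kind of loop under discussion.<br>
The names fill_bytes and copy_bytes are hypothetical and do not come from any<br>
studio code; whether a particular compiler version rewrites them as calls is<br>
something to confirm in the generated assembly.<br>

```c
#include <stddef.h>

/* Hypothetical byte-fill loop. An optimiser with loop-idiom recognition
   may replace the whole loop with a call to memset. */
void fill_bytes(unsigned char *dst, unsigned char value, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = value;
}

/* Hypothetical byte-copy loop. The restrict qualifiers promise that the
   buffers do not overlap, which is what makes a memcpy replacement valid. */
void copy_bytes(unsigned char *restrict dst,
                const unsigned char *restrict src, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i];
}
```

Compiling this with clang -O2 -S, once with and once without -fno-builtin, and<br>
comparing the output is one way to see whether the replacement happened.<br>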

<br>

As an example, it could be argued that replacing constant loops always<br>

makes sense.  Here the compiler knows how big the memset/memcpy is and<br>

can make an accurate decision.  For small sizes the memcpy/memset<br>

will be expanded inline, while larger sizes will remain a call, but<br>

at those sizes the call overhead will be negligible.<br>

<br>

On the other hand, the compiler knows very little about variable loops<br>

(the loop could be used primarily for copying 10 bytes or 10 MB;<br>

the compiler cannot tell).  The compiler will replace the loop with a call,<br>

but as the size is variable the call will not be expanded inline.  In this case<br>

small sizes may see significant call overhead in comparison to the<br>

original loop.  The game studio examples all fall into this category.<br>
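<br>
The constant versus variable distinction can be sketched as follows. Both<br>
function names and the HEADER_BYTES size are hypothetical, chosen only to show<br>
the two shapes of loop; what a given compiler actually emits for each is an<br>
assumption to verify, not a guarantee.<br>

```c
#include <stddef.h>

enum { HEADER_BYTES = 16 };  /* hypothetical fixed size known at compile time */

/* Constant trip count: if this becomes a memset, the size is known, so the
   compiler can choose between expanding it inline and keeping a call. */
void clear_header(unsigned char *dst) {
    for (size_t i = 0; i < HEADER_BYTES; ++i)
        dst[i] = 0;
}

/* Variable trip count: n is only known at run time, so a memset replacement
   stays a call whether n turns out to be 10 or 10 million. */
void clear_payload(unsigned char *dst, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = 0;
}
```

A caller that mostly passes small values of n to clear_payload is the case<br>
where the call overhead described above would matter most.<br>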

<br>

The loop-idiom recognizer also has no notion of "quality" - it always<br>

assumes that replacing the loop makes sense.  While it might be the<br>

case for a naive byte-copy, some of the examples we've seen have been<br>

carefully tuned.<br>

<br>

So, to summarise, we feel that there's sufficient justification to add<br>

some sort of user-control.  However, we do not want to suggest a<br>

solution, but prefer to start a discussion, and obtain opinions.  So<br>

to start, how do people feel about:<br>

<br>

- A switch to disable the loop-idiom recognizer completely?<br>

<br>

- A switch to disable the loop-idiom recognizer for loops with a variable trip count?<br>

<br>

- A switch to disable the loop-idiom recognizer for loops with a constant<br>

trip count (can't see this being much use)?<br>

<br>

- Per-function control of the loop-idiom recognizer (which must work with LTO)?<br>

<br>

Thanks for any feedback!<br>

Rob.<br>

<br>

--<br>

Robert Lougher<br>

SN Systems - Sony Computer Entertainment Group<br>


</blockquote></div><br></div></div>