[LLVMdev] Memset/memcpy: user control of loop-idiom recognizer
Sean Silva
chisophugis at gmail.com
Thu Dec 4 22:49:23 PST 2014
On Wed, Dec 3, 2014 at 4:23 AM, Robert Lougher <rob.lougher at gmail.com>
wrote:
> Hi,
>
> In feedback from game studios, a common issue is the replacement of
> loops with calls to memcpy/memset. These loops are often
> hand-optimised and highly efficient, and the developers strongly want
> a way to control the compiler (i.e. "leave my loop alone").
>
Please provide examples of such "hand-optimised and highly efficient"
routines, along with test cases (and execution conditions) that
demonstrate a performance improvement.
-- Sean Silva
>
> The culprit is, of course, the loop-idiom recognizer. It replaces any
> loop that looks like a memset/memcpy with a call to that function.
> This affects loops with both variable and constant trip counts. The
> question is, does this make sense in all cases? Also, should the
> compiler provide a way to turn it off for certain types of loop, or
> for individual loops? The standard answer is to use -fno-builtin, but
> this does not provide fine-grained control (e.g. we may want the
> loop-idiom recognizer to handle constant loops but not variable loops).
>
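
For reference, the loops under discussion have roughly the following
shape. This is only a minimal sketch, not code from any studio:
fill_bytes and copy_bytes are hypothetical names, and whether a given
compiler version rewrites them depends on the optimisation level and
target.

```c
#include <stddef.h>

/* Hypothetical fill loop: stores the same byte to consecutive
 * addresses. An optimising build may turn this into a memset call. */
void fill_bytes(unsigned char *dst, unsigned char value, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = value;
}

/* Hypothetical copy loop: `restrict` tells the compiler the buffers
 * do not overlap, which is the precondition memcpy itself has. An
 * optimising build may turn this into a memcpy call. */
void copy_bytes(unsigned char *restrict dst,
                const unsigned char *restrict src, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i];
}
```

Comparing the assembly from an optimised build with and without
-fno-builtin is one way to check whether the loops survived.
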
> As an example, it could be argued that replacing constant loops always
> makes sense. Here the compiler knows how big the memset/memcpy is and
> can make an accurate decision. For small sizes the memcpy/memset
> will be expanded inline, while for larger sizes it will remain a call,
> but at that size the call overhead will be negligible.
>
> On the other hand, the compiler knows very little about variable loops
> (the loop could be used primarily for copying 10 bytes or 10 MB; the
> compiler doesn't know). The compiler will replace it with a call,
> but because the size is variable, the call will not be expanded inline.
> In this case small sizes may see significant overhead in comparison to
> the original loop. The game studio examples all fall into this category.
>
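
The constant/variable distinction can be sketched as follows. The names
(HEADER_WORDS, clear_header, clear_words) are made up for illustration
and do not come from the examples mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed size, visible to the compiler at compile time. */
enum { HEADER_WORDS = 4 };

/* Constant trip count: the store covers exactly
 * HEADER_WORDS * sizeof(uint32_t) bytes, so the compiler can decide
 * between inline stores and a memset call with full knowledge. */
void clear_header(uint32_t *hdr) {
    for (size_t i = 0; i < HEADER_WORDS; ++i)
        hdr[i] = 0;
}

/* Variable trip count: `count` is only known at run time, so a
 * recognised memset stays a call even when count is tiny. */
void clear_words(uint32_t *buf, size_t count) {
    for (size_t i = 0; i < count; ++i)
        buf[i] = 0;
}
```

If clear_words is mostly called with a handful of words, the call
overhead is the cost being objected to; clear_header does not have that
problem because its size is fixed.
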
> The loop-idiom recognizer also has no notion of "quality" - it always
> assumes that replacing the loop makes sense. While it might be the
> case for a naive byte-copy, some of the examples we've seen have been
> carefully tuned.
>
> So, to summarise, we feel that there's sufficient justification to add
> some sort of user-control. However, we do not want to suggest a
> solution, but prefer to start a discussion, and obtain opinions. So
> to start, how do people feel about:
>
> - A switch to disable the loop-idiom recognizer completely?
>
> - A switch to disable the loop-idiom recognizer for loops with a
> variable trip count?
>
> - A switch to disable the loop-idiom recognizer for loops with a
> constant trip count (can't see this being much use)?
>
> - Per-function control of the loop-idiom recognizer (which must work
> with LTO)?
>
> Thanks for any feedback!
> Rob.
>
> --
> Robert Lougher
> SN Systems - Sony Computer Entertainment Group