[LLVMdev] Memset/memcpy: user control of loop-idiom recognizer

Tue Dec 2 11:45:43 PST 2014

What if we had a pragma or attribute that lowered to metadata indicating
that the variable trip count is expected to be small?
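
To make the case concrete, here is a minimal sketch in C of the kind of loop
in question. The function name copy_bytes is made up for illustration, and
the annotation appears only as a comment because no such pragma exists; it
is purely hypothetical. With the restrict qualifiers in place, a loop of
this shape is a candidate for being rewritten into a memcpy call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: a plain byte-copy loop with a variable trip count.
 * The restrict qualifiers tell the compiler the buffers do not overlap,
 * which is what makes this loop look like a memcpy. */
static void copy_bytes(unsigned char *restrict dst,
                       const unsigned char *restrict src,
                       size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));

    /* A hint such as "expected trip count is small" would attach here.
     * No such pragma or attribute exists; this comment only marks the spot. */
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i];
}
```

The point of a hint like this would be that the source stays as written and
only the lowering decision changes.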

Then backends could choose to lower short memcpys to an inlined, slightly
widened loop. For example, 'rep movsq' on x86_64.
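
As a rough illustration of what "inlined, slightly widened" could mean, here
is a sketch in portable C rather than actual backend output. The helper name
copy_small and the choice of 8-byte chunks are my assumptions; a real
lowering would be target-specific. It moves 8 bytes per iteration and
finishes with a byte tail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of a widened copy loop: 8 bytes per iteration,
 * followed by a byte-at-a-time tail for the remaining 0..7 bytes.
 * The buffers are assumed not to overlap. */
static void copy_small(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (dst != NULL && src != NULL));

    while (n >= 8) {
        uint64_t w;
        /* Fixed-size memcpy is the portable way to express an unaligned
         * 8-byte load and store; compilers typically inline it. */
        memcpy(&w, s, sizeof w);
        memcpy(d, &w, sizeof w);
        d += 8;
        s += 8;
        n -= 8;
    }

    /* Tail: copy whatever is left one byte at a time. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
}
```

For a copy of a dozen bytes or so, this avoids the call overhead and the
size dispatch inside a general-purpose memcpy.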

That seems nice from the compiler perspective, since it preserves the
canonical form and carries the same kind of information we would get from
profiling. Then again, I can imagine most game dev users just want control
and don't want to change their code.

On Tue, Dec 2, 2014 at 11:23 AM, Robert Lougher <rob.lougher at gmail.com>
wrote:

> Hi,
>
> In feedback from game studios, a common issue is the replacement of
> loops with calls to memcpy/memset.  These loops are often
> hand-optimised and highly efficient, and the developers strongly want
> a way to control the compiler (i.e. "leave my loop alone").
>
> The culprit is of course the loop-idiom recognizer.  This replaces any
> loop that looks like a memset/memcpy with calls.  This affects loops
> with both a variable and a constant trip-count.  The question is, does
> this make sense in all cases?  Also, should the compiler provide a way
> to turn it off for certain types of loop, or on a loop individually?
> The standard answer is to use -fno-builtin, but this does not provide
> fine-grained control (e.g. we may want the loop-idiom recognizer to
> handle constant loops but not variable loops).
>
> As an example, it could be argued that replacing constant loops always
> makes sense.  Here the compiler knows how big the memset/memcpy is and
> can make an accurate decision.  For small values the memcpy/memset
> will be expanded inline, while larger values will remain a call, but
> due to the size the overhead will be negligible.
>
> On the other hand, the compiler knows very little about variable loops
> (the loop could be used primarily for copying 10 bytes or 10 Mbytes;
> the compiler doesn't know).  The compiler will replace it with a call,
> but as the size is variable the call will not be expanded inline.  In
> this case small values may see significant overhead in comparison to
> the original loop.  The game studio examples all fall into this category.
>
> The loop-idiom recognizer also has no notion of "quality" - it always
> assumes that replacing the loop makes sense.  While it might be the
> case for a naive byte-copy, some of the examples we've seen have been
> carefully tuned.
>
> So, to summarise, we feel that there's sufficient justification to add
> some sort of user-control.  However, we do not want to suggest a
> solution, but prefer to start a discussion, and obtain opinions.  So
> to start, how do people feel about:
>
> - A switch to disable loop-idiom recognizer completely?
>
> - A switch to disable loop-idiom recognizer for loops with variable trip
> count?
>
> - A switch to disable loop-idiom recognizer for loops with constant
> trip count (can't see this being much use)?
>
> - Per-function control of loop-idiom recognizer (which must work with LTO)?
>
> Thanks for any feedback!
> Rob.
>
> --
> Robert Lougher
> SN Systems - Sony Computer Entertainment Group