[LLVMdev] Memset/memcpy: user control of loop-idiom recognizer

Philip Reames listmail at philipreames.com
Fri Dec 5 10:06:39 PST 2014

On 12/05/2014 08:02 AM, Robert Lougher wrote:
> On 5 December 2014 at 06:49, Sean Silva <chisophugis at gmail.com> wrote:
>> On Wed, Dec 3, 2014 at 4:23 AM, Robert Lougher <rob.lougher at gmail.com>
>> wrote:
>>> Hi,
>>> In feedback from game studios a common issue is the replacement of
>>> loops with calls to memcpy/memset.  These loops are often
>>> hand-optimised, and highly-efficient and the developers strongly want
>>> a way to control the compiler (i.e. leave my loop alone).
>> Please provide examples of such "hand-optimised, and highly-efficient"
>> routines and test cases (and execution conditions) that demonstrate a
>> performance improvement.
> This sounds like a cop-out, but we can't share customer code (even if
> we could get a small runnable example).  But this is all getting
> beside the point.  I discussed performance issues to try and justify
> why the user should have control.  That was probably a mistake as it
> has subverted the conversation.  The blunt fact is that game
> developers don't like their loops being replaced and they want user
> control.  The real conversation I wanted was what form should this
> user control take.  To be honest, I am surprised at the level of
> resistance to giving users *any* control over their codegen.
If you want to maintain a custom branch of clang with an additional 
option added, no one would object or care.  If you were to submit a 
patch to add such a flag, it might even be accepted.

So far, the discussion has focused on what the compiler is doing wrong 
in this case.  You have requested a workaround for what is clearly a 
compiler optimization bug.  Before agreeing to support the workaround,
it is clearly right to first consider how hard the bug itself would be
to fix.

Having said all of that, the existing push/pop optimization scopes (a
gcc extension) should either already work for what you're trying to do
with the workaround, or could be relatively easy to adapt.  If there's an
-OX setting that excludes the optimization you consider problematic, try:
#pragma GCC optimize("-OX")

You could also try the optnone function attribute, spelled
__attribute__((optnone)) or [[clang::optnone]] in Clang.
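
A minimal sketch of the two suggestions above, assuming a C99 compiler.
The names KEEP_LOOP, fill_bytes, and copy_bytes are hypothetical and
appear nowhere in this thread.  The attribute is applied only where the
compiler reports support for it, and the push/pop pragmas are guarded so
that only GCC sees them, since Clang may not honor "#pragma GCC optimize".
Both mechanisms turn off optimization for the whole function, not just
the memset/memcpy idiom, so they are blunt tools rather than a targeted
switch.

```c
#include <stddef.h>

/* Older compilers may lack __has_attribute; treat every query as "no". */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* KEEP_LOOP is a hypothetical macro name, not an existing compiler flag. */
#if __has_attribute(optnone)
#define KEEP_LOOP __attribute__((optnone))
#else
#define KEEP_LOOP /* attribute unavailable: expands to nothing */
#endif

/* Variant 1: per-function attribute.  With optnone the optimizer skips
 * this function, so the store loop should not be rewritten as memset. */
KEEP_LOOP
void fill_bytes(unsigned char *dst, unsigned char value, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = value;
}

/* Variant 2: GCC push/pop optimization scope around one function.
 * O0 is an assumption standing in for the "-OX" placeholder. */
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC push_options
#pragma GCC optimize ("O0")
#endif

void copy_bytes(unsigned char *dst, const unsigned char *src, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i];
}

#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC pop_options
#endif
```

Whether the loop really survives is best confirmed by inspecting the
generated assembly for a call to memset or memcpy, since the result
depends on the compiler and its version.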

