[LLVMdev] Memset/memcpy: user control of loop-idiom recognizer
Philip Reames
listmail at philipreames.com
Fri Dec 5 10:06:39 PST 2014
On 12/05/2014 08:02 AM, Robert Lougher wrote:
> On 5 December 2014 at 06:49, Sean Silva <chisophugis at gmail.com> wrote:
>>
>> On Wed, Dec 3, 2014 at 4:23 AM, Robert Lougher <rob.lougher at gmail.com>
>> wrote:
>>> Hi,
>>>
>>> In feedback from game studios a common issue is the replacement of
>>> loops with calls to memcpy/memset. These loops are often
>>> hand-optimised, and highly-efficient and the developers strongly want
>>> a way to control the compiler (i.e. leave my loop alone).
>>
>> Please provide examples of such "hand-optimised, and highly-efficient"
>> routines and test cases (and execution conditions) that demonstrate a
>> performance improvement.
>>
> This sounds like a cop-out, but we can't share customer code (even if
> we could get a small runnable example). But this is all getting
> beside the point. I discussed performance issues to try and justify
> why the user should have control. That was probably a mistake as it
> has subverted the conversation. The blunt fact is that game
> developers don't like their loops being replaced and they want user
> control. The real conversation I wanted was what form should this
> user control take. To be honest, I am surprised at the level of
> resistance to giving users *any* control over their codegen.
If you want to maintain a custom branch of clang with an additional
option added, no one would object or care. If you were to submit a
patch to add such a flag, it might even be accepted.
So far, the discussion has focused on what the compiler is doing wrong
in this case. You have requested a workaround for what is clearly a
compiler optimization bug. Before agreeing to support the workaround,
it is clearly the right approach to consider how hard the bug itself
would be to fix.
Having said all of that, the existing push/pop optimization scopes (a
gcc extension) should either already work for what you're trying to do
with the workaround or be relatively easy to adapt. If there's an -OX
setting that excludes the optimization you consider problematic, try:
#pragma GCC optimize("-OX")
You could also try the clang::optnone function attribute.
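A rough sketch of how those two suggestions might be applied to a single
hand-written loop. Assumptions: the option string
"no-tree-loop-distribute-patterns" is chosen because GCC's
-ftree-loop-distribute-patterns is the pass that turns such loops into
memset/memcpy calls; fill_bytes and copy_bytes are hypothetical example
functions. Whether a given clang release honours #pragma GCC optimize is
not established here, so the pragma is guarded to GCC only, and clang
gets __attribute__((optnone)), the GNU spelling of clang::optnone.

```c
#include <stddef.h>

/* clang: optnone disables optimization (including loop idiom
   recognition) for the marked function; noinline keeps it separate. */
#if defined(__clang__)
#define KEEP_LOOP __attribute__((optnone, noinline))
#else
#define KEEP_LOOP
#endif

/* GCC: save the current options, then turn off the pass that
   rewrites loops into library calls for the functions that follow. */
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC push_options
#pragma GCC optimize("no-tree-loop-distribute-patterns")
#endif

/* Hypothetical hand-written fill loop the author wants left alone. */
KEEP_LOOP
void fill_bytes(unsigned char *dst, unsigned char value, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = value;
}

/* Hypothetical hand-written copy loop the author wants left alone. */
KEEP_LOOP
void copy_bytes(unsigned char *dst, const unsigned char *src, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i];
}

/* GCC: restore the options saved by push_options. */
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC pop_options
#endif
```

Note that optnone turns off optimization for the whole function, not
just the memset/memcpy idiom, so it is a blunt instrument for a loop
that was tuned for speed; the GCC pragma is narrower because it only
disables one pass.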
Philip