<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<br>
<div class="moz-cite-prefix">On 12/02/2014 11:45 AM, Reid Kleckner
wrote:<br>
</div>
<blockquote
cite="mid:CACs=tyKyUSu-8CCddzNzo9ndk1_a1uOXkiS_+bDzUmsRX68VBw@mail.gmail.com"
type="cite">
<div dir="ltr">What if we had a pragma or attribute that lowered
down to metadata indicating that the variable length trip count
was small?
<div><br>
</div>
<div>Then backends could choose to lower short memcpys to an
inlined, slightly widened loop. For example, 'rep movsq' on
x86_64.</div>
<div><br>
</div>
<div>That seems nice from the compiler perspective, since it
preserves the canonical form and we get the same kind of
information from profiling. Then again, I can imagine most
game dev users just want control and don't want to change
their code.</div>
</div>
</blockquote>
I like this general idea. Here's another possibility...<br>
<br>
We actually already have such a construct in the form of the expect
builtins.
<a class="moz-txt-link-freetext" href="http://llvm.org/docs/BranchWeightMetadata.html#built-in-expect-instructions">http://llvm.org/docs/BranchWeightMetadata.html#built-in-expect-instructions</a><br>
<br>
One way to structure this would be:<br>
if (__builtin_expect(N &lt; SmallSize, 1)) {<br>
&nbsp; // small loop here<br>
} else {<br>
&nbsp; // memcpy here,<br>
&nbsp; // or unreachable if you're really brave<br>
}<br>
<br>
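As a rough sketch of the structure above in compilable C: the function name copy_bytes and the threshold SMALL_SIZE (set to 16 here) are made up for illustration, and nothing in this thread says what a good threshold would be. It assumes a compiler that provides __builtin_expect (GCC or Clang).<br>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical threshold; pick it from profiling, not from this sketch. */
enum { SMALL_SIZE = 16 };

/* Illustrative helper, not an LLVM or libc API. */
static void copy_bytes(unsigned char *dst, const unsigned char *src, size_t n)
{
    if (__builtin_expect(n < SMALL_SIZE, 1)) {
        /* Expected path: short copy done with a plain byte loop. */
        for (size_t i = 0; i < n; ++i)
            dst[i] = src[i];
    } else {
        /* Unexpected path: large copy, where call overhead is amortised. */
        memcpy(dst, src, n);
    }
}
```

The hint only says which branch is likely. It does not by itself stop the optimizer from turning the small loop back into a memcpy call, so the generated code would still need checking.<br>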
I could see us failing to exploit this of course. :)<br>
<br>
<br>
<blockquote
cite="mid:CACs=tyKyUSu-8CCddzNzo9ndk1_a1uOXkiS_+bDzUmsRX68VBw@mail.gmail.com"
type="cite">
<div class="gmail_extra"><br>
<div class="gmail_quote">On Tue, Dec 2, 2014 at 11:23 AM, Robert
            Lougher <span dir="ltr">&lt;<a moz-do-not-send="true"
              href="mailto:rob.lougher@gmail.com" target="_blank">rob.lougher@gmail.com</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>
<br>
In feedback from game studios a common issue is the
replacement of<br>
            loops with calls to memcpy/memset.  These loops are often<br>
            hand-optimised and highly efficient, and the developers
            strongly want<br>
            a way to control the compiler (i.e. leave my loop alone).<br>
<br>
The culprit is of course the loop-idiom recognizer. This
replaces any<br>
loop that looks like a memset/memcpy with calls. This
affects loops<br>
with both a variable and a constant trip-count. The
question is, does<br>
this make sense in all cases? Also, should the compiler
provide a way<br>
to turn it off for certain types of loop, or on a loop
individually?<br>
            The standard answer is to use -fno-builtin, but this does not
            provide<br>
            fine-grained control (e.g. we may want the loop-idiom to
            recognise<br>
            constant loops but not variable loops).<br>
<br>
As an example, it could be argued that replacing constant
loops always<br>
makes sense. Here the compiler knows how big the
memset/memcpy is and<br>
can make an accurate decision. For small values the
memcpy/memset<br>
will be expanded inline, while larger values will remain a
call, but<br>
due to the size the overhead will be negligible.<br>
<br>
On the other hand, the compiler knows very little about
variable loops<br>
(the loop could be used primarily for copying 10 bytes or 10
Mbytes,<br>
the compiler doesn't know). The compiler will replace it
with a call,<br>
but as it is variable it will not be expanded inline. In
this case<br>
small values may see significant overhead in comparison to
the<br>
original loop. The game studio examples all fall into this
category.<br>
<br>
The loop-idiom recognizer also has no notion of "quality" -
it always<br>
assumes that replacing the loop makes sense. While it might
be the<br>
case for a naive byte-copy, some of the examples we've seen
have been<br>
carefully tuned.<br>
<br>
So, to summarise, we feel that there's sufficient
justification to add<br>
some sort of user-control. However, we do not want to
suggest a<br>
solution, but prefer to start a discussion, and obtain
opinions. So<br>
to start, how do people feel about:<br>
<br>
- A switch to disable loop-idiom recognizer completely?<br>
<br>
- A switch to disable loop-idiom recognizer for loops with
variable trip count?<br>
<br>
- A switch to disable loop-idiom recognizer for loops with
constant<br>
trip count (can't see this being much use)?<br>
<br>
- Per-function control of loop-idiom recognizer (which must
work with LTO)?<br>
<br>
Thanks for any feedback!<br>
Rob.<br>
<br>
--<br>
Robert Lougher<br>
SN Systems - Sony Computer Entertainment Group<br>
_______________________________________________<br>
LLVM Developers mailing list<br>
<a moz-do-not-send="true" href="mailto:LLVMdev@cs.uiuc.edu">LLVMdev@cs.uiuc.edu</a>
<a moz-do-not-send="true"
href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a><br>
<a moz-do-not-send="true"
href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev"
target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br>
</blockquote>
</div>
<br>
</div>
<br>
</blockquote>
<br>
</body>
</html>