<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<br>
<div class="moz-cite-prefix">On 08/05/2014 03:46 PM, Robin Morisset
wrote:<br>
</div>
<blockquote
cite="mid:CAKUVmHotXx6AvUry+1AG2bFUzLwcxiiF1wR7muATMBu1G8F-1Q@mail.gmail.com"
type="cite">
<div dir="ltr">
<p dir="ltr"
style="font-family:arial,sans-serif;font-size:13px;line-height:1.15;margin-top:0pt;margin-bottom:0pt"><span
style="font-size:15px;font-family:Arial;color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent">Hello
everyone,</span></p>
<br style="font-family:arial,sans-serif;font-size:13px">
<p dir="ltr"
style="font-family:arial,sans-serif;font-size:13px;line-height:1.15;margin-top:0pt;margin-bottom:0pt"><span
style="font-size:15px;font-family:Arial;color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent">I
have recently started on optimizing C11/C++11 atomics in
LLVM, and plan to focus on that for the next two months as
an intern in the PNaCl team. I’ve sent two patches on this
topic to Phabricator that fix </span><a
moz-do-not-send="true"
href="http://llvm.org/bugs/show_bug.cgi?id=17281"
target="_blank" style="text-decoration:none"><span
style="font-size:15px;font-family:Arial;text-decoration:underline;vertical-align:baseline;white-space:pre-wrap;background-color:transparent">http://llvm.org/bugs/show_bug.cgi?id=17281</span></a><span
style="font-size:15px;font-family:Arial;color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent">:</span></p>
<p dir="ltr"
style="font-family:arial,sans-serif;font-size:13px;margin-top:0pt;margin-bottom:0pt"><span
style="vertical-align:baseline;background-color:transparent"><font
color="#000000" face="Arial"><span
style="font-size:15px;line-height:17.25px;white-space:pre-wrap"><a
moz-do-not-send="true"
href="http://reviews.llvm.org/D4796" target="_blank">http://reviews.llvm.org/D4796</a></span></font><br>
</span></p>
<p dir="ltr"
style="font-family:arial,sans-serif;font-size:13px;margin-top:0pt;margin-bottom:0pt"><span
style="vertical-align:baseline;background-color:transparent"><font
color="#000000" face="Arial"><span
style="font-size:15px;line-height:17.25px;white-space:pre-wrap"><a
moz-do-not-send="true"
href="http://reviews.llvm.org/D4797" target="_blank">http://reviews.llvm.org/D4797</a></span><br>
</font></span></p>
</div>
</blockquote>
<font face="Arial">First, welcome! This is definitely an area which
needs some attention. As you've already seen, I'll be happy to
help review some of these patches. We will need to find other
qualified reviewers though. As you've already seen, some of these
issues are *extremely* subtle. <br>
<br>
Fair warning, given how challenging this is to get right, and how
hard it is to debug when wrong, you should expect very slow
progress and quite a bit of time required for reviews. Split your
reviews into the smallest possible pieces you can to give reviewers
a chance at understanding them. <br>
</font>
<blockquote
cite="mid:CAKUVmHotXx6AvUry+1AG2bFUzLwcxiiF1wR7muATMBu1G8F-1Q@mail.gmail.com"
type="cite">
<div dir="ltr">
<p dir="ltr"
style="font-family:arial,sans-serif;font-size:13px;margin-top:0pt;margin-bottom:0pt"><span
style="vertical-align:baseline;background-color:transparent"><font
color="#000000" face="Arial">
</font></span></p>
<p dir="ltr"
style="font-family:arial,sans-serif;font-size:13px;line-height:1.15;margin-top:0pt;margin-bottom:0pt"><span
style="font-size:15px;font-family:Arial;color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent"><br>
</span></p>
<p dir="ltr"
style="font-family:arial,sans-serif;font-size:13px;line-height:1.15;margin-top:0pt;margin-bottom:0pt"><span
style="font-size:15px;font-family:Arial;color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent">The
first patch is X86-specific, and tries to apply operations
with immediates to atomics without going through a register.
The main trouble here is that the X86 backend appears to
respect LLVM memory model instead of the x86-TSO memory
model, and may reorder instructions. In order to prevent
illegal reordering of atomic accesses, the backend converts
atomic accesses to pseudo-instructions in
X86InstrCompiler.td (RELEASE_MOV* and ACQUIRE_MOV*) that are
opaque to most of the rest of the backend, and only lowers
those at the very end of the pipeline. I have decided to
follow the same approach, just adding some more RELEASE_*
pseudo-instructions rather than trying to find every
possibly misbehaving part of the backend in order to do
early lowering. This lowers the risk and complexity of the
patch, but at the cost of possibly missing some optimization
possibilities.</span></p>
<p dir="ltr"
style="font-family:arial,sans-serif;font-size:13px;line-height:1.15;margin-top:0pt;margin-bottom:0pt"><span
style="font-size:15px;font-family:Arial;color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent">Another
trouble I had with this patch is a failure of TableGen type
inference when adding negate/not to the scheme. As a result
I have left these two instructions commented out in this
patch. Does anyone have an idea for how to proceed with this
?</span></p>
<br style="font-family:arial,sans-serif;font-size:13px">
<p dir="ltr"
style="font-family:arial,sans-serif;font-size:13px;line-height:1.15;margin-top:0pt;margin-bottom:0pt"><span
style="font-size:15px;font-family:Arial;color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent">The
second patch is more straightforward: in the C11/C++11
memory model (that LLVM basically borrows), optimizations
like DSE can safely fire across atomic accesses in many
cases, basically as long as they are not operating across a
release-acquire pair (see paper referenced in the comments).
So I tweaked MemoryDependenceAnalysis to track such pairs
and only return a clobber result in such cases.</span></p>
</div>
</blockquote>
I've already made comments on these in the respective review
threads. <br>
<br>
Other readers - Please, we need extra reviewers on these! If you
have experience reasoning about memory model issues, please chime
in. <br>
<blockquote
cite="mid:CAKUVmHotXx6AvUry+1AG2bFUzLwcxiiF1wR7muATMBu1G8F-1Q@mail.gmail.com"
type="cite">
<div dir="ltr">
<br style="font-family:arial,sans-serif;font-size:13px">
<p dir="ltr"
style="font-family:arial,sans-serif;font-size:13px;line-height:1.15;margin-top:0pt;margin-bottom:0pt"><span
style="font-size:15px;font-family:Arial;color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent">My
next steps will probably be to improve the compilation of
acquire atomics in the ARM backend. In particular, they are
currently compiled by a load + dmb, while a load + dependant
useless branch + isb is also valid (see </span><a
moz-do-not-send="true"
href="http://www.cl.cam.ac.uk/%7Epes20/cpp/cpp0xmappings.html"
target="_blank" style="text-decoration:none"><span
style="font-size:15px;font-family:Arial;text-decoration:underline;vertical-align:baseline;white-space:pre-wrap;background-color:transparent">http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html</span></a><span
style="font-size:15px;font-family:Arial;color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent">
for example) and may be faster. Even better: if there is
already a dependant branch (such as the loop for the
lowering of CAS), it is just a cheap isb. The main step will
be switching off the InsertFencesForAtomic flag, and do the
lowering of atomics in the backend, because once an acquire
load has been transformed in an acquire fence, too much
information has been lost to apply this mapping.</span></p>
</div>
</blockquote>
I'm unfamiliar with the details of the ARM architecture, so I won't
be able to help you much here. <br>
<blockquote
cite="mid:CAKUVmHotXx6AvUry+1AG2bFUzLwcxiiF1wR7muATMBu1G8F-1Q@mail.gmail.com"
type="cite">
<div dir="ltr">
<p dir="ltr"
style="font-family:arial,sans-serif;font-size:13px;line-height:1.15;margin-top:0pt;margin-bottom:0pt"><span
style="font-size:15px;font-family:Arial;color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent">Longer
term, I hope to improve the fence elimination of the ARM
backend with a kind of PRE algorithm. Both of these
improvements to the ARM backend should be fairly
straightforward to port to the POWER architecture later, and
I hope to also do that.</span></p>
</div>
</blockquote>
Any reason these couldn't be done at the IR level?
<blockquote
cite="mid:CAKUVmHotXx6AvUry+1AG2bFUzLwcxiiF1wR7muATMBu1G8F-1Q@mail.gmail.com"
type="cite">
<div dir="ltr"><br
style="font-family:arial,sans-serif;font-size:13px">
<p dir="ltr"
style="font-family:arial,sans-serif;font-size:13px;line-height:1.15;margin-top:0pt;margin-bottom:0pt"><span
style="font-size:15px;font-family:Arial;color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent">Does
this approach seem worthwhile to you ? Can I do anything to
help the review process ?</span></p>
</div>
</blockquote>
Small patches. Lots of justification. Many test cases. Ping folks
periodically. <br>
<br>
Philip<br>
</body>
</html>