<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>Currently the AtomicExpandPass will lower the following IR:</p>
<p>define i1 @foo(i32* %obj, i32 %old, i32 %new) {<br>
entry:<br>
%v0 = cmpxchg weak volatile i32* %obj, i32 %old, i32 %new <u><b>release
acquire</b></u><br>
%v1 = extractvalue { i32, i1 } %v0, 1<br>
ret i1 %v1<br>
}<br>
</p>
<p>to the equivalent of the following on AArch64:<br>
</p>
<p><tt> <u><b>ldxr w8, [x0]</b></u></tt><tt><br>
</tt><tt> cmp w8, w1</tt><tt><br>
</tt><tt> b.ne .LBB0_3</tt><tt><br>
</tt><tt>// BB#1: //
%cmpxchg.trystore</tt><tt><br>
</tt><tt> stlxr w8, w2, [x0]</tt><tt><br>
</tt><tt> cbz w8, .LBB0_4</tt><tt><br>
</tt><tt>// BB#2: //
%cmpxchg.failure</tt><tt><br>
</tt><tt> mov w0, wzr</tt><tt><br>
</tt><tt> ret</tt><tt><br>
</tt><tt>.LBB0_3: //
%cmpxchg.nostore</tt><tt><br>
</tt><tt> clrex</tt><tt><br>
</tt><tt> mov w0, wzr</tt><tt><br>
</tt><tt> ret</tt><tt><br>
</tt><tt>.LBB0_4:</tt><tt><br>
</tt><tt> orr w0, wzr, #0x1</tt><tt><br>
</tt><tt> ret</tt><br>
</p>
<p>GCC instead generates an <tt>ldaxr</tt> (load-acquire exclusive) for
the initial load, which seems more correct to me, since it honors the
acquire ordering requested for the failure case. I'd like to get other
opinions on this before filing a bug.<br>
</p>
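<p>For anyone who wants to compare the two compilers, a C++ reproducer along the following lines should produce roughly the IR above. This is only a sketch: the parameter names are hypothetical, and it assumes C++17 or later, where the failure order is allowed to be stronger than the success order.</p>

```cpp
#include <atomic>

// Hypothetical reproducer; the name foo mirrors the IR function above.
// Success ordering is release, failure ordering is acquire.
bool foo(volatile std::atomic<int> *obj, int old_val, int new_val) {
  // old_val is a local copy, so the value written back on failure is
  // discarded; only the success flag is returned, as in the IR.
  return obj->compare_exchange_weak(old_val, new_val,
                                    std::memory_order_release,
                                    std::memory_order_acquire);
}
```

<p>Compiling this for an AArch64 target with optimization enabled and looking at the first exclusive load shows whether the compiler picked <tt>ldxr</tt> or <tt>ldaxr</tt>. Because the exchange is weak, callers must be prepared for spurious failures.</p>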
<p>I believe the code in <tt>AtomicExpand::expandAtomicCmpXchg()</tt> is
responsible for this discrepancy, since it only uses the failure
case memory order for targets that use fences (i.e. when
<tt>TLI-&gt;shouldInsertFencesForAtomic(CI)</tt> is true).<br>
</p>
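<p>To make the expected behavior concrete, here is a toy model of how the orderings of the two exclusive instructions could be chosen on a target without fences. It is not LLVM's actual code: the names <tt>Order</tt>, <tt>LLSCOrders</tt> and <tt>pickOrders</tt> are made up for illustration, and it assumes that the failure path executes only the load, so any acquire requirement on failure has to be carried by the load itself.</p>

```cpp
// Toy model only; none of these names exist in LLVM.
enum class Order { Monotonic, Acquire, Release, AcqRel, SeqCst };

constexpr bool hasAcquire(Order o) {
  return o == Order::Acquire || o == Order::AcqRel || o == Order::SeqCst;
}

constexpr bool hasRelease(Order o) {
  return o == Order::Release || o == Order::AcqRel || o == Order::SeqCst;
}

struct LLSCOrders {
  bool loadAcquire;   // true: ldaxr, false: ldxr
  bool storeRelease;  // true: stlxr, false: stxr
};

// The load runs on both the success and the failure path, so it needs
// acquire semantics if either ordering asks for them. The store runs
// only on success, so it depends on the success ordering alone.
constexpr LLSCOrders pickOrders(Order success, Order failure) {
  return {hasAcquire(success) || hasAcquire(failure), hasRelease(success)};
}
```

<p>Under this model, the <tt>release</tt>/<tt>acquire</tt> pair from the example gives an acquire load and a release store, which matches what GCC emits.</p>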
<pre class="moz-signature" cols="72">--
Geoff Berry
Employee of Qualcomm Datacenter Technologies, Inc.</pre>
</body>
</html>