[PATCH] D110069: AArch64: use `CAS` instead of `LDX`/`STX` for more ops if available

Mon Dec 12 18:21:17 PST 2022

efriedma added a comment.

Didn't realize this was up for review; happened to spot it on the list.

Some of these sequences seem extremely long. It should be a little better on main, since we improved ccmp formation, but can we rearrange the operations so that the fast path needs fewer `mov` instructions?

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:17674
+  return Subtarget->hasLSE() ? AtomicExpansionKind::CmpXChg
+                             : AtomicExpansionKind::LLSC;
 }
----------------
A comment here might be useful.
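Background for the hunk above: roughly speaking, `AtomicExpansionKind::CmpXChg` asks the atomic expansion pass to rewrite an `atomicrmw` as a load followed by a compare-and-swap retry loop. On an LSE target that loop can use `CAS` rather than an `LDXR`/`STXR` sequence. The following is only a source-level C++ analogue (not LLVM code), using `nand` as an example of an operation with no single LSE instruction; the function name `fetchNandViaCas` is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Source-level analogue of an atomicrmw nand expanded into a CAS loop.
// Returns the value that was in memory before the update, like atomicrmw.
uint64_t fetchNandViaCas(std::atomic<uint64_t> &mem, uint64_t operand) {
  // Initial load of the current value.
  uint64_t expected = mem.load(std::memory_order_relaxed);
  uint64_t desired;
  do {
    desired = ~(expected & operand);
    // On failure (including a spurious one), compare_exchange_weak writes
    // the value it observed into `expected`, so the retry uses fresh data.
  } while (!mem.compare_exchange_weak(expected, desired,
                                      std::memory_order_seq_cst,
                                      std::memory_order_relaxed));
  // On success `expected` still holds the old value.
  return expected;
}
```

A short comment in the hunk could say the same thing: with LSE, prefer the CAS loop because it avoids the exclusive-monitor loop.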

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:17712
   // succeed. So at -O0 lower this operation to a CAS loop.
-  if (getTargetMachine().getOptLevel() == CodeGenOpt::None)
+  if (getTargetMachine().getOptLevel() == CodeGenOpt::None || Subtarget->hasLSE())
     return AtomicExpansionKind::CmpXChg;
----------------
This line exceeds 80 columns.

Comment needs to be updated.
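One possible shape for addressing both points: break the condition after `||`, and extend the comment so it also covers the LSE case. The sketch below is self-contained and uses hypothetical stand-in types (`OptLevel`, `ExpansionKind`, `FakeTarget`, `chooseRMWExpansion`) in place of the real LLVM declarations. It only illustrates the control flow and comment wording and is not a proposed diff.

```cpp
#include <cassert>

// Hypothetical stand-ins, NOT the real LLVM types.
enum class OptLevel { None, Less, Default, Aggressive };
enum class ExpansionKind { None, LLSC, CmpXChg };

struct FakeTarget {
  OptLevel optLevel;
  bool hasLSE;
};

ExpansionKind chooseRMWExpansion(const FakeTarget &T) {
  // ...so at -O0 lower this operation to a CAS loop. When LSE is available,
  // also use a CAS loop, since it can use the CAS instruction instead of an
  // LDXR/STXR sequence.
  if (T.optLevel == OptLevel::None ||
      T.hasLSE)  // condition wrapped to stay within 80 columns
    return ExpansionKind::CmpXChg;
  return ExpansionKind::LLSC;
}
```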

================
Comment at: llvm/test/CodeGen/AArch64/arm64-atomic-128.ll:836
+; OUTLINE-NEXT:    // =>This Inner Loop Header: Depth=1
+; OUTLINE-NEXT:    ldaxp xzr, x8, [x2]
+; OUTLINE-NEXT:    stlxp w8, x0, x1, [x2]
----------------
This bug got fixed, right?

================
Comment at: llvm/test/CodeGen/AArch64/atomicrmw-xchg-fp.ll:107
+; LSE-NEXT:    mov x4, x6
+; LSE-NEXT:    mov x5, x7
+; LSE-NEXT:    caspal x4, x5, x2, x3, [x0]
----------------
These moves seem very strange.
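For context on where such moves can come from: `CASP`/`CASPAL` always overwrite the first register pair (`Xs`, `X(s+1)`) with the value read from memory. Both pairs must also be even-numbered consecutive registers, which constrains allocation. A retry loop that still needs the original expected value after the instruction therefore has to copy it first. That is plausibly the role of `mov x4, x6` / `mov x5, x7` here, although whether the copy is unavoidable in this test is exactly the open question. Below is a simplified single-threaded model that ignores memory ordering. `Pair`, `caspModel`, and `exchangeViaCasp` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

struct Pair {
  uint64_t lo, hi;
};

// Model of CASP register semantics: store `newVal` only if memory matches
// `cmp`, and always write the observed memory value back into `cmp`.
void caspModel(Pair &cmp, Pair newVal, Pair &mem) {
  Pair observed = mem;
  if (observed.lo == cmp.lo && observed.hi == cmp.hi)
    mem = newVal;
  cmp = observed;  // Xs:X(s+1) always receives the old memory value
}

// 128-bit exchange as a CAS loop. The copy into `scratch` corresponds to
// the movs: the expected value must survive the instruction so success can
// be decided by comparing against it afterwards.
Pair exchangeViaCasp(Pair &mem, Pair newVal) {
  Pair expected = mem;  // initial load
  for (;;) {
    Pair scratch = expected;  // caspModel clobbers its first operand
    caspModel(scratch, newVal, mem);
    if (scratch.lo == expected.lo && scratch.hi == expected.hi)
      return scratch;  // success: old value
    expected = scratch;  // retry with the freshly observed value
  }
}
```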

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D110069