[PATCH] Lower idempotent RMWs to fence+load

JF Bastien jfb at chromium.org
Mon Sep 22 14:11:45 PDT 2014


================
Comment at: lib/CodeGen/AtomicExpandPass.cpp:96
@@ -91,3 +95,3 @@
     } else if (RMWI && TargetLowering->shouldExpandAtomicRMWInIR(RMWI)) {
       MadeChange |= expandAtomicRMW(RMWI);
     } else if (CASI && TargetLowering->hasLoadLinkedStoreConditional()) {
----------------
I find that this code isn't quite obvious: if the RMW is idempotent then you try to simplify it in a target-defined way, and if that fails you call `expandAtomicRMW` from `simplifyIdempotentRMW`. There should be only one place where you call `expandAtomicRMW`, and it should be here.

Maybe add a `virtual shouldSimplifyIdempotentRMWInIR` function, or change the control flow here to first try to simplify; if that fails, check whether you should expand; and if that also fails, then keep going (I think the latter is better).
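
Roughly what the second option could look like (just a sketch: `isIdempotentRMW` is my name for the predicate your new switch implies, and `simplifyIdempotentRMW` would return false when the target declines, rather than expanding itself):

    } else if (RMWI) {
      // Try the target-defined simplification first; if the target
      // declines, fall back to the normal expansion. expandAtomicRMW
      // is then called from exactly one place.
      if (isIdempotentRMW(RMWI) && simplifyIdempotentRMW(RMWI))
        MadeChange = true;
      else if (TargetLowering->shouldExpandAtomicRMWInIR(RMWI))
        MadeChange |= expandAtomicRMW(RMWI);
    } else if (CASI && TargetLowering->hasLoadLinkedStoreConditional()) {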

================
Comment at: lib/CodeGen/AtomicExpandPass.cpp:481
@@ +480,3 @@
+      return C->isMinusOne();
+    default:
+      return false;
----------------
You could handle `min`/`max`/`umin`/`umax`, but those seem somewhat useless. I guess other optimizations won't do partial evaluation into an atomic instruction, so constant propagation will feed a constant up to the value operand of the atomic but no further, which means this optimization could still fire on those cases.
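
If you do want them, the extra cases would look something like this (a sketch, assuming the switch is over the RMW operation with `C` the constant value operand; `ConstantInt::isMinValue`/`isMaxValue` take an is-signed flag):

    case AtomicRMWInst::Max:
      return C->isMinValue(/*isSigned=*/true);   // max(x, INT_MIN) == x
    case AtomicRMWInst::Min:
      return C->isMaxValue(/*isSigned=*/true);   // min(x, INT_MAX) == x
    case AtomicRMWInst::UMax:
      return C->isZero();                        // umax(x, 0) == x
    case AtomicRMWInst::UMin:
      return C->isMaxValue(/*isSigned=*/false);  // umin(x, UINT_MAX) == x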

================
Comment at: lib/CodeGen/AtomicExpandPass.cpp:490
@@ +489,3 @@
+  auto ResultingLoad = TLI->lowerIdempotentRMWIntoFencedLoad(RMWI);
+  if (ResultingLoad) {
+    if (TLI->shouldExpandAtomicLoadInIR(ResultingLoad))
----------------
Merge the two lines above; LLVM usually follows that style.
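
That is, something like:

  if (auto ResultingLoad = TLI->lowerIdempotentRMWIntoFencedLoad(RMWI)) {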

================
Comment at: lib/CodeGen/AtomicExpandPass.cpp:492
@@ +491,3 @@
+    if (TLI->shouldExpandAtomicLoadInIR(ResultingLoad))
+      expandAtomicLoad(ResultingLoad);
+    return true;
----------------
Could you instead add the `LoadInst` to the `AtomicInsts` that you're iterating through? Same as before, I don't really like the repeated logic since it makes the code harder to modify and follow.
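
Something like this is what I have in mind (a sketch only; I'm assuming `AtomicInsts` is the pass's `SmallVector` worklist, so iterating by index stays valid across `push_back`):

  for (unsigned I = 0; I < AtomicInsts.size(); ++I) {
    Instruction *Inst = AtomicInsts[I];
    // ... existing dispatch on loads/stores/RMWs/CASs ...
    if (auto ResultingLoad = TLI->lowerIdempotentRMWIntoFencedLoad(RMWI)) {
      // Queue the new load so the existing shouldExpandAtomicLoadInIR /
      // expandAtomicLoad logic handles it on a later iteration, instead
      // of duplicating that logic here.
      AtomicInsts.push_back(ResultingLoad);
      MadeChange = true;
    }
  }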

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:17029
@@ +17028,3 @@
+  // there is no benefit in turning such RMWs into loads, and it is actually
+  // harmfull as it introduces a mfence.
+  if (MemType->getPrimitiveSizeInBits() > NativeWidth)
----------------
s/harmfull/harmful/

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:17030
@@ +17029,3 @@
+  // harmfull as it introduces a mfence.
+  if (MemType->getPrimitiveSizeInBits() > NativeWidth)
+    return nullptr;
----------------
Not all primitive types smaller than native width have an appropriate load. Will this still work in these cases (or is it impossible to get here with types that aren't 8/16/32/64)?
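
If odd widths can get here, an explicit check might be safer (a sketch; whether it's needed depends on what is guaranteed to reach this hook):

  unsigned Size = MemType->getPrimitiveSizeInBits();
  // Only 8/16/32/64-bit integers have a plain x86 mov to pair with the
  // fence; bail out on anything else (e.g. i24) rather than assuming one.
  if (Size != 8 && Size != 16 && Size != 32 && Size != 64)
    return nullptr;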

================
Comment at: test/CodeGen/X86/atomic_idempotent.ll:6
@@ +5,3 @@
+; (such as atomic add 0) can be replaced by an mfence followed by a mov.
+; This is explained (with the motivation for such an optimisation) in
+; http://www.hpl.hp.com/techreports/2012/HPL-2012-68.pdf
----------------
s/optimisation/optimization/

================
Comment at: test/CodeGen/X86/atomic_idempotent.ll:47
@@ +46,2 @@
+  ret i128 %1
+}
----------------
Could you also try weird integer sizes?

I also think that testing more RMW operations at least for 32-bit would be good, especially `and`.

http://reviews.llvm.org/D5422